Sr. Production Support Engineer
Shyft6
Remote · US · Full-time · Senior · $115k – $145k/yr · Posted today
About the role
We are seeking a Senior Production Support Engineer to support and maintain AI-driven applications, data platforms, and client-facing solutions in a production environment. This role is responsible for ensuring system stability, performance, and reliability across AWS, Azure, Tableau, Power BI, and DealCloud CRM integrations.
The ideal candidate brings strong troubleshooting skills, experience with cloud and data ecosystems, and the ability to support complex, integrated systems in a fast-paced, AI-focused environment.
Key Responsibilities
- Provide L2/L3 production support for applications, data pipelines, and AI-driven solutions
- Monitor system performance and respond to incidents, alerts, and service disruptions
- Perform root cause analysis (RCA) and implement fixes or coordinate with engineering teams
- Support data pipelines (ETL/ELT) and ensure accuracy of data feeding into reporting tools (Tableau, Power BI)
- Troubleshoot and resolve issues related to API integrations and microservices
- Support CRM integrations (DealCloud) and related data workflows
- Maintain and improve monitoring, logging, and alerting systems
- Execute runbooks and standard operating procedures (SOPs) for issue resolution
- Collaborate with development, QA, and data teams to ensure smooth deployment and production readiness
- Participate in on-call rotations and provide after-hours support as needed
- Identify opportunities for automation and process improvement within support operations
Requirements
Required Qualifications
- 5+ years of experience in Production Support, Application Support, or Site Reliability Engineering (SRE)
- Strong experience supporting systems in AWS and/or Azure environments
- Experience troubleshooting data pipelines, ETL/ELT processes, and data-related issues
- Strong SQL skills for data investigation and validation
- Experience with monitoring and observability tools (e.g., Datadog, Splunk, New Relic, CloudWatch, Azure Monitor)
- Experience with API troubleshooting and microservices-based architectures
- Familiarity with incident management and ticketing systems (e.g., ServiceNow, Jira)
- Basic scripting or programming experience (e.g., Python, Bash, or PowerShell)
Key Traits for Success
- Strong analytical and troubleshooting mindset
- Ability to remain calm and effective under pressure
- Proactive approach to identifying and preventing issues
- Strong collaboration skills across technical teams
- Ownership mentality and commitment to system reliability
Skills
AWS, Azure, Bash, CloudWatch, Datadog, DealCloud, ETL, Jira, Microservices, New Relic, Power BI, PowerShell, Python, ServiceNow, Site Reliability Engineering, SQL, Splunk, Tableau