Sr. Production Support Engineer

Shyft6

Remote · US · Full-time · Senior · $115k – $145k/yr · Posted today

About the role

We are seeking a Senior Production Support Engineer to support and maintain AI-driven applications, data platforms, and client-facing solutions in a production environment. This role is responsible for ensuring system stability, performance, and reliability across AWS, Azure, Tableau, Power BI, and DealCloud CRM integrations.

The ideal candidate brings strong troubleshooting skills, experience with cloud and data ecosystems, and the ability to support complex, integrated systems in a fast-paced, AI-focused environment.

Key Responsibilities

  • Provide L2/L3 production support for applications, data pipelines, and AI-driven solutions
  • Monitor system performance and respond to incidents, alerts, and service disruptions
  • Perform root cause analysis (RCA), then implement fixes directly or coordinate them with engineering teams
  • Support data pipelines (ETL/ELT) and ensure accuracy of data feeding into reporting tools (Tableau, Power BI)
  • Troubleshoot and resolve issues related to API integrations and microservices
  • Support CRM integrations (DealCloud) and related data workflows
  • Maintain and improve monitoring, logging, and alerting systems
  • Execute runbooks and standard operating procedures (SOPs) for issue resolution
  • Collaborate with development, QA, and data teams to ensure smooth deployment and production readiness
  • Participate in on-call rotations and provide after-hours support as needed
  • Identify opportunities for automation and process improvement within support operations

Requirements

Required Qualifications

  • 5+ years of experience in Production Support, Application Support, or Site Reliability Engineering (SRE)
  • Strong experience supporting systems in AWS and/or Azure environments
  • Experience troubleshooting data pipelines, ETL/ELT processes, and data-related issues
  • Strong SQL skills for data investigation and validation
  • Experience with monitoring and observability tools (e.g., Datadog, Splunk, New Relic, CloudWatch, Azure Monitor)
  • Experience with API troubleshooting and microservices-based architectures
  • Familiarity with incident management and ticketing systems (e.g., ServiceNow, Jira)
  • Basic scripting or programming experience (e.g., Python, Bash, or PowerShell)

Key Traits for Success

  • Strong analytical and troubleshooting mindset
  • Ability to remain calm and effective under pressure
  • Proactive approach to identifying and preventing issues
  • Strong collaboration skills across technical teams
  • Ownership mentality and commitment to system reliability

Skills

AWS, Azure, Bash, CloudWatch, Datadog, DealCloud, ETL, Jira, Microservices, New Relic, Power BI, PowerShell, Python, ServiceNow, Site Reliability Engineering, SQL, Splunk, Tableau
