Incident Manager (SRE / Operations)
Jobs via Dice
Philadelphia · On-site · Contract · Senior · Posted yesterday
About the role
Job Summary
We are seeking experienced Incident Managers with strong expertise in SRE, operations engineering, and incident command. The ideal candidate will lead high-impact incident response, ensure system reliability, and drive cross-functional coordination during outages and large-scale system events.
Key Responsibilities
- Lead incident command and management for critical production issues
- Coordinate cross-functional teams during high-severity incidents
- Drive root cause analysis (RCA) and implement preventive measures
- Manage system reliability and operational stability
- Collaborate with SRE, DevOps, and engineering teams
- Ensure effective communication with stakeholders and leadership
- Drive automation and observability improvements
- Handle large-scale change events and system outages
- Maintain incident reports, documentation, and post-mortem analysis
- Continuously improve incident response processes and frameworks
Required Skills & Experience
- 6–8 years of experience in:
  - Incident Management / Production Support / SRE roles
- Strong expertise in:
  - Incident Command & Crisis Management
  - Site Reliability Engineering (SRE)
  - Operations Engineering
- Strong knowledge of:
  - Reliability architecture and system design
  - Automation and observability tools
- Proven ability to:
  - Lead teams during high-impact outages
  - Drive systemic problem resolution
- Excellent executive communication and stakeholder management skills
Technical Skills
- Incident Management
- SRE / Operations Engineering
- Monitoring & Observability Tools
- Automation & Reliability Engineering
Preferred Qualifications
- Experience in enterprise-scale production environments
- Strong analytical and problem-solving skills
- Ability to work in high-pressure, fast-paced environments
Key Deliverables
- Rapid and effective incident resolution
- Improved system reliability and uptime
- Well-documented RCA and post-incident reports
- Strong coordination across technical and business teams
Skills
Automation, Incident Command, Incident Management, Monitoring, Observability, Operations Engineering, Production Support, Reliability Engineering, SRE