Expert Site Reliability Engineer for Cloud
Confluent
About the role
Drive proactive reliability improvements as an Expert Site Reliability Engineer. Utilize deep expertise in incident management and cloud platforms to enhance the performance of a large-scale data streaming environment.
In this key role, you'll leverage your 10+ years of SRE experience to analyze and prevent incidents through system improvements and effective training programs. You'll work with diverse teams globally and facilitate structured incident management practices that support a seamless cloud experience.
Responsibilities
- Lead analysis on systemic failure patterns
- Own SLO/SLA definitions and framework maintenance
- Drive improvements in incident response practices
- Edit customer-facing documentation for clarity
- Partner with engineering leaders on reliability
Requirements
- 10+ years of relevant engineering experience
- Hands-on experience with AWS, GCP, or Azure
- Proficiency with incident management tools
- Strong background in observability and metrics
- Familiarity with Kafka or event streaming preferred
Enhance organizational reliability and foster a collaborative engineering culture through expert incident management solutions.