Cloud Reliability Engineering Specialist
Confluent
About the role
Become a key player in cloud reliability as a Cloud Reliability Engineering Specialist. In this remote role, you will focus on improving incident management processes and educating teams to maintain operational excellence across platforms.
This position combines technical engineering with program management expertise and requires at least 10 years in incident management or reliability engineering. You'll analyze incident patterns, develop training, and collaborate with leaders to improve reliability across large engineering organizations.
Experience with cloud services is essential for this role.
Key Responsibilities:
- Analyze systemic failures to prevent incidents
- Manage the tooling and workflows used for incident management
- Define SLO/SLA frameworks and apply error budgets
- Improve incident response standards continuously
- Provide training and elevate engineering practices
Requirements:
- 10+ years in SRE or related reliability roles
- Background in AWS, GCP, or Azure
- Deep experience with incident management technologies
- Strong grasp of distributed systems and observability
- Familiarity with CI/CD pipelines and strong cultural adaptation skills
Drive significant reliability enhancements and support a culture of continuous improvement and teamwork across engineering teams.