Cloud Reliability Engineering Specialist

Confluent

Remote · Canada · Full-time · Senior · 1mo ago

About the role

As a Cloud Reliability Engineering Specialist, you will improve incident management processes and train teams to maintain operational excellence across platforms. The role is remote.

This position combines technical engineering with program management expertise, requiring at least 10 years in incident management or reliability engineering. You'll analyze patterns, develop training, and collaborate with leaders to enhance reliability across large engineering organizations.

Experience with cloud services is essential for this role.

Key Responsibilities:

  • Analyze systemic failures to prevent incidents
  • Manage incident management tooling and workflows
  • Define SLO/SLA frameworks and utilize error budgets
  • Improve incident response standards continuously
  • Provide training and elevate engineering practices

Requirements:

  • 10+ years in SRE or reliability fields
  • Background in AWS, GCP, or Azure
  • Deep experience with incident management technologies
  • Strong grasp of distributed systems and observability
  • CI/CD pipeline familiarity and cultural adaptation skills

Drive significant reliability enhancements and support a culture of continuous improvement and teamwork across engineering teams.

Skills

AWS, Azure, CI/CD, GCP, observability
