Cloud Reliability Engineering Specialist

Confluent

Remote · Canada · Full-time · Senior · 3mo ago

About the role

Become a key player in cloud reliability as a Cloud Reliability Engineering Specialist. Focus on improving incident management processes and educating teams to maintain operational excellence across platforms in a remote setting.

This position combines technical engineering with program management expertise, requiring at least 10 years in incident management or reliability engineering. You'll analyze patterns, develop training, and collaborate with leaders to enhance reliability across large engineering organizations.

Experience with cloud services is essential for this role.

Key Responsibilities:

Analyze systemic failures to prevent incidents
Manage incident management tooling and workflows
Define SLO/SLA frameworks and utilize error budgets
Improve incident response standards continuously
Provide training and elevate engineering practices

Requirements:

10+ years in SRE or reliability fields
Background in AWS, GCP, or Azure
Deep experience with incident management technologies
Strong grasp of distributed systems and observability
CI/CD pipeline familiarity and cultural adaptation skills

Drive significant reliability enhancements and support a culture of continuous improvement and teamwork across engineering teams.

Skills

AWS, Azure, CI/CD, GCP, observability
