Expert Site Reliability Engineer for Cloud

Confluent

Chaput Hughes · On-site · Full-time · Senior · 1w ago

About the role

Drive proactive reliability improvements as an Expert Site Reliability Engineer. Utilize deep expertise in incident management and cloud platforms to enhance the performance of a large-scale data streaming environment.

In this key role, you'll leverage your 10+ years of SRE experience to analyze and prevent incidents through system improvements and effective training programs. You'll engage with diverse teams globally while facilitating structured incident management practices that support a seamless cloud experience.

Responsibilities

  • Lead analysis on systemic failure patterns
  • Own SLO/SLA definitions and framework maintenance
  • Drive improvements in incident response practices
  • Edit customer-facing documentation for clarity
  • Partner with engineering leaders on reliability

Requirements

  • 10+ years of relevant engineering experience
  • Hands-on experience with AWS, GCP, or Azure
  • Proficiency with incident management tools
  • Strong background in observability and metrics
  • Familiarity with Kafka or event streaming preferred

Enhance organizational reliability and foster a collaborative engineering culture through expert incident management solutions.

Skills

AWS, Azure, GCP, Kafka
