Expert Site Reliability Engineer for Cloud

Confluent

Chaput Hughes · On-site · Full-time · Senior · 1mo ago

About the role

About

Drive proactive reliability improvements as an Expert Site Reliability Engineer. Utilize deep expertise in incident management and cloud platforms to enhance the performance of a large-scale data streaming environment.

In this key role, you'll leverage your 10+ years of SRE experience to analyze and prevent incidents through system improvements and effective training programs. Engage with diverse teams globally while facilitating structured incident management practices that support a seamless cloud experience.

Responsibilities

Lead analysis on systemic failure patterns
Own SLO/SLA definitions and framework maintenance
Drive improvements in incident response practices
Edit customer-facing documentation for clarity
Partner with engineering leaders on reliability

Requirements

10+ years of relevant engineering experience
Hands-on experience with AWS, GCP, or Azure
Proficiency with incident management tools
Strong background in observability and metrics
Familiarity with Kafka or event streaming preferred

Enhance organizational reliability and foster a collaborative engineering culture through expert incident management solutions.

Skills

AWS, Azure, GCP, Kafka
