Principal Site Reliability Engineer
Jobgether
About the role
Enhance and manage cloud infrastructure as a Principal Site Reliability Engineer. Drive reliability, scalability, and security across AWS/EKS environments while collaborating with engineering and customer teams.
In this hands-on role, you will take ownership of mission-critical workloads, optimize their performance, and implement automated solutions. Your deep technical expertise will be essential for leading incident response, ensuring security compliance, and supporting CI/CD processes. The position suits a self-motivated individual who thrives in a dynamic environment, and it focuses on shaping infrastructure strategies that directly affect customer success.
Key Responsibilities:
- Own and enhance cloud infrastructure to improve availability
- Manage the operation and health of Kubernetes clusters
- Lead incident response and systemic fixes to reduce downtime
- Oversee cloud security and IAM governance
- Drive infrastructure design and cost optimization strategies
Requirements:
- 5+ years in SRE or DevOps roles
- Strong AWS and Kubernetes expertise
- Proficiency in infrastructure-as-code tools such as Terraform
- Experience with monitoring tools and CI/CD pipelines
- Strong scripting skills in Python and Bash
Utilize your expertise to optimize a complex cloud infrastructure and ensure seamless operation across mission-critical environments.