Site Reliability Engineer
TES The Employment Solution
About the role
As a Site Reliability Engineer, you will design and improve highly available systems while applying Agile principles and facilitating Scrum ceremonies for the team.
In this role, you'll design, operate, and improve resilient systems. Key responsibilities include defining SLOs and SLIs, maintaining observability, and automating operations through CI/CD practices. You will collaborate closely with development teams to build reliability in from the start, and you will review incidents through a blameless post-mortem process.
Key Responsibilities
- Design and operate highly reliable systems
- Define and monitor SLIs and SLOs, and track compliance with SLAs
- Automate operations using CI/CD and IaC
- Lead blameless post-mortems and root cause analysis (RCA)
- Facilitate Scrum ceremonies and Agile adoption
Requirements
- Strong expertise in cloud environments (AWS)
- Proficient in Kubernetes and Docker
- Experience with automation tools (Terraform, Ansible)
- Familiarity with Linux systems administration and security
- Strong communication and facilitation skills
Lead the way in system reliability while fostering team collaboration and technological excellence.