ASE Compute - Site Reliability Engineering (SRE) Manager
Apple
About the role
We are looking for an SRE Manager to lead a team that keeps this infrastructure reliable, performant, and ready for the next order of magnitude of growth.
This is a hands-on leadership role. You will set the technical direction for reliability and operational excellence while mentoring engineers, driving automation, and partnering closely with software and infrastructure teams to ship improvements that matter.
You will have direct impact on the platform that underpins Apple's most critical services. You will work alongside world-class engineers solving problems at a scale few organizations encounter - and you will build a team culture that makes reliability engineering sustainable and rewarding.
3+ years of engineering management experience leading infrastructure or SRE teams
Deep experience operating large-scale, multi-tenant Kubernetes environments in production
Strong systems background - comfortable troubleshooting across the full stack (network, OS, container runtime, application)
Experience with configuration management at scale (Puppet, Ansible, or equivalent)
Track record of building high-performing teams through coaching, clear expectations, and psychological safety
Demonstrated ability to drive cross-functional initiatives to completion
Strong written and verbal communication skills

Experience with third-party cloud platforms (AWS, GCP, or Azure)
Familiarity with bare-metal provisioning and lifecycle management at datacenter scale
Experience with Java, Go, or Python services in production
Understanding of cloud-native observability (Prometheus, Thanos, Splunk, or similar)
CNCF Certified Kubernetes Administrator (CKA) or equivalent hands-on certification
Experience running infrastructure as an internal managed service with defined SLAs