Principal Site Reliability Engineer
Parallel Domain
About the role
As Principal Site Reliability Engineer, you will improve the reliability and security of our cloud systems and take charge of the infrastructure that supports advanced simulation workloads for autonomous vehicle development.
This high-ownership position involves overseeing AWS/EKS environments while collaborating closely with a small team of platform engineers and with cross-functional engineering groups. You'll lead proactive incident management and infrastructure improvements, ensuring our platform meets the highest standards for performance and security.
Key Responsibilities:
- Lead improvements to AWS-based infrastructure reliability
- Manage EKS cluster operations, including node strategies
- Implement GitOps for streamlined application management
- Address complex networking including DNS and load balancing
- Drive incident investigations and root cause analyses
Requirements:
- 5+ years of experience in SRE or infrastructure roles
- Solid skills in Terraform and infrastructure-as-code
- Strong familiarity with AWS (EKS, VPC, and IAM are essential)
- Kubernetes operations knowledge and automation skills
- Experience with observability tools such as Grafana and Elasticsearch
Join us in shaping reliable cloud operations for innovative technology while strengthening overall system security and performance.