Principal Site Reliability Engineer for AWS and Kubernetes Operations

Parallel Domain

Barrie · On-site · Full-time · Lead · 1mo ago

About the role

About

Shape cloud reliability as a Principal Site Reliability Engineer leading AWS and Kubernetes operations. Enhance infrastructure for high-stakes simulation workloads in autonomous vehicle technology.

In this role, you’ll own the overall health and performance of a scalable cloud architecture, manage EKS clusters, and ensure security compliance. You will also drive incident response and proactive issue prevention while collaborating across engineering and customer-facing teams to deliver a seamless experience.

Key Responsibilities:

Evolve AWS infrastructure and enhance platform performance
Lead incident investigations and deploy automated solutions
Oversee security governance for cloud services
Collaborate with customer teams for optimal availability
Improve CI/CD pipelines for seamless development

Requirements:

5+ years in infrastructure engineering or SRE
Proficient in infrastructure-as-code with Terraform
Deep knowledge of AWS services and Kubernetes
Strong networking fundamentals and security awareness
Experience with monitoring tools like Prometheus

Become a key player in building reliable and high-performance cloud systems that foster innovation in autonomous vehicle simulation and beyond.

Skills

AWS, CI/CD, Kubernetes, Prometheus, Terraform

