Site Reliability Engineer

Hiive

On-site, Full-time, posted 1w ago

About the role

Join a dynamic infrastructure team as a Site Reliability Engineer. You will focus on platform reliability and availability and support AI workloads, with direct impact on the platform's operational performance. Collaborating with DevOps and engineering teams, you will help build scalable infrastructure and respond to incidents. You'll also play a key role in implementing security measures and improving observability for AI systems.

Key Responsibilities

Maintain platform reliability and availability
Optimize and secure infrastructure systems
Proactively address scaling and reliability challenges
Configure monitoring and incident response strategies
Support AI/ML infrastructure and workloads

Requirements

Experience in Site Reliability Engineering or similar
Proven skills with AWS, particularly EKS and RDS
Familiarity with Kubernetes for production environments
Proficient in Terraform for infrastructure development
Strong background in PostgreSQL and observability tools

Enhance system performance and contribute to a vibrant engineering culture while supporting AI innovations.

Skills

AWS
EKS
Kubernetes
PostgreSQL
RDS
Terraform
