Sr. DevOps/Site Reliability Engineer
Jobs via Dice
Arlington Heights · Hybrid · Contract · Senior · 2w ago
About the role
We are looking for a Senior Site Reliability Engineer (SRE) with deep experience in AWS infrastructure, automation, observability, and production support. As an SRE, you will ensure our cloud-native systems are resilient, scalable, and efficient, driving reliability through code, not just processes.
Key Responsibilities
- Design, implement, and maintain scalable, secure, and highly available infrastructure on AWS
- Develop and improve CI/CD pipelines and Infrastructure as Code (IaC) using Terraform and Harness
- Own and implement monitoring, alerting, logging, and distributed tracing with tools like Dynatrace or Datadog
- Troubleshoot production incidents, conduct blameless postmortems, and improve incident response processes
- Optimize systems for cost, performance, and reliability
- Drive chaos engineering and resilience testing
- Collaborate with development teams to embed SRE practices like SLAs, SLOs, and error budgets
- Mentor junior SREs and promote DevOps/SRE culture across the organization
Basic Qualifications
- Strong experience in SRE, DevOps, or cloud engineering
- Expertise in AWS core services (EC2, ECS/EKS, Lambda, S3, VPC, RDS, IAM, CloudFront, etc.)
- Hands-on experience with Terraform, Ansible, or other IaC tools
- Strong scripting/coding skills (Python, Go, Shell, etc.)
- Experience with Kubernetes, containerization, and orchestration
- Deep knowledge of Linux systems and networking
Preferred Qualifications
- Experience with service meshes (e.g., Istio, App Mesh)
- Familiarity with AWS Well-Architected Framework
- Experience building self-healing systems and automated remediation
- Background in security, compliance, or multi-account/multi-region AWS architectures
Certifications (Optional/Preferred)
- AWS Certified DevOps Engineer – Professional
- AWS Certified Solutions Architect – Professional
Skills
Ansible, AWS, CloudFront, Datadog, DevOps, Docker, Dynatrace, EC2, ECS, EKS, Error budgets, Go, Harness, IAM, IaC, Istio, Kubernetes, Lambda, Linux, Monitoring, Networking, Observability, Python, RDS, Reliability, S3, Scalability, Security, Shell, Site Reliability Engineering, SLAs, SLOs, Terraform, VPC