Site Reliability Engineer

Avrioc Technologies

UAE · On-site · Senior · Today

About the role

We’re looking for a seasoned DevOps & Site Reliability Engineer (SRE) Lead to design, scale, and enhance our cloud infrastructure and observability ecosystem.

If you’re passionate about automation, resilience, and reliability, this role is for you!

• Architect and deploy scalable, highly available cloud infrastructure for production workloads.
• Lead and implement SRE best practices, ensuring system reliability, performance, and scalability.
• Oversee and optimize CI/CD pipelines (Jenkins, Argo CD, or similar) for seamless deployments.
• Define and monitor SLOs & SLIs to ensure service reliability and uptime.
• Design and manage observability frameworks: monitoring, logging, and alerting (Elastic Stack, Prometheus, Grafana, Dynatrace, New Relic).
• Manage and optimize Kubernetes clusters and Helm charts for efficient orchestration and streamlined releases.
• Implement auto-healing and proactive monitoring systems to prevent outages.
• Drive fault injection testing & chaos engineering (Chaos Mesh, Litmus, AWS FIS) for resilience validation.
• Collaborate with engineering and product teams to embed reliability into every phase of development.
• Maintain clear documentation on infrastructure, incidents, and operational processes.
• 8+ years of experience as a DevOps/SRE professional, leading enterprise SRE implementations.
• Hands-on with AWS, GCP, or Azure (EC2, S3, RDS, Lambda, etc.).
• Strong with IaC tools (Terraform, CloudFormation, Ansible).
• Proven experience in CI/CD automation, monitoring, and incident response.
• Skilled in observability tools: Elastic Stack, Grafana, Prometheus, Dynatrace, New Relic.
• Experience with AWS managed & self-managed databases (MySQL, Cassandra, etc.).
• Skilled in Python, Bash, or Go scripting.
• Experience designing and testing BCP/DR strategies.
• Proactive in capacity planning, ensuring scalability and resilience across cloud environments.
• Excellent communication, documentation, and troubleshooting skills.
• Comply with Avrioc’s Information Security & Service Management policies.
• Maintain the confidentiality and integrity of all information assets.
• Attend mandatory information security trainings.
• Report any security incidents through official channels.

Requirements

8+ years of experience as a DevOps/SRE professional, leading enterprise SRE implementations.
Hands-on with AWS, GCP, or Azure.
Strong with IaC tools (Terraform, CloudFormation, Ansible).
Proven experience in CI/CD automation, monitoring, and incident response.
Skilled in observability tools — Elastic Stack, Grafana, Prometheus, Dynatrace, New Relic.
Experience with AWS managed & self-managed databases (MySQL, Cassandra, etc.).
Skilled in Python, Bash, or Go scripting.
Experience designing and testing BCP/DR strategies.
Proactive in capacity planning, ensuring scalability and resilience across cloud environments.
Excellent communication, documentation, and troubleshooting skills.

Responsibilities

Architect and deploy scalable, highly available cloud infrastructure for production workloads.
Lead and implement SRE best practices, ensuring system reliability, performance, and scalability.
Oversee and optimize CI/CD pipelines for seamless deployments.
Define and monitor SLOs & SLIs to ensure service reliability and uptime.
Design and manage observability frameworks — monitoring, logging, and alerting.
Manage and optimize Kubernetes clusters and Helm charts for efficient orchestration and streamlined releases.
Implement auto-healing and proactive monitoring systems to prevent outages.
Drive fault injection testing & chaos engineering for resilience validation.
Collaborate with engineering and product teams to embed reliability into every phase of development.
Maintain clear documentation on infrastructure, incidents, and operational processes.

Benefits

Comply with Avrioc’s Information Security & Service Management policies.
Maintain the confidentiality and integrity of all information assets.
Attend mandatory information security trainings.
Report any security incidents through official channels.

Skills

DevOps, SRE, Cloud infrastructure, Automation, Resilience, Reliability, CI/CD, Kubernetes, Helm, Observability, Monitoring, Logging, Alerting, AWS, GCP, Azure, Terraform, CloudFormation, Ansible, Python, Bash, Go, Elastic Stack, Grafana, Prometheus, Dynatrace, New Relic
