Site Reliability Engineer

Avrioc Technologies

UAE · On-site · Senior · Today

About the role

We’re looking for a seasoned DevOps & Site Reliability Engineer (SRE) Lead to design, scale, and improve our cloud infrastructure and observability ecosystem.

If you’re passionate about automation, resilience, and reliability, this role is for you!

  • Architect and deploy scalable, highly available cloud infrastructure for production workloads.
  • Lead and implement SRE best practices, ensuring system reliability, performance, and scalability.
  • Oversee and optimize CI/CD pipelines (Jenkins, Argo CD, or similar) for seamless deployments.
  • Define and monitor SLOs & SLIs to ensure service reliability and uptime.
  • Design and manage observability frameworks: monitoring, logging, and alerting (Elastic Stack, Prometheus, Grafana, Dynatrace, New Relic).
  • Manage and optimize Kubernetes clusters and Helm charts for efficient orchestration and streamlined releases.
  • Implement auto-healing and proactive monitoring systems to prevent outages.
  • Drive fault injection testing & chaos engineering (Chaos Mesh, Litmus, AWS FIS) for resilience validation.
  • Collaborate with engineering and product teams to embed reliability into every phase of development.
  • Maintain clear documentation on infrastructure, incidents, and operational processes.

  • 8+ years of experience as a DevOps/SRE professional, leading enterprise SRE implementations.
  • Hands-on with AWS, GCP, or Azure (EC2, S3, RDS, Lambda, etc.).
  • Strong with IaC tools (Terraform, CloudFormation, Ansible).
  • Proven experience in CI/CD automation, monitoring, and incident response.
  • Skilled in observability tools: Elastic Stack, Grafana, Prometheus, Dynatrace, New Relic.
  • Experience with AWS managed & self-managed databases (MySQL, Cassandra, etc.).
  • Skilled in Python, Bash, or Go scripting.
  • Experience designing and testing BCP/DR strategies.
  • Proactive in capacity planning, ensuring scalability and resilience across cloud environments.
  • Excellent communication, documentation, and troubleshooting skills.

  • Comply with Avrioc’s Information Security & Service Management policies.
  • Maintain the confidentiality and integrity of all information assets.
  • Attend mandatory information security trainings.
  • Report any security incidents through official channels.

Requirements

  • 8+ years of experience as a DevOps/SRE professional, leading enterprise SRE implementations.
  • Hands-on with AWS, GCP, or Azure.
  • Strong with IaC tools (Terraform, CloudFormation, Ansible).
  • Proven experience in CI/CD automation, monitoring, and incident response.
  • Skilled in observability tools — Elastic Stack, Grafana, Prometheus, Dynatrace, New Relic.
  • Experience with AWS managed & self-managed databases (MySQL, Cassandra, etc.).
  • Skilled in Python, Bash, or Go scripting.
  • Experience designing and testing BCP/DR strategies.
  • Proactive in capacity planning, ensuring scalability and resilience across cloud environments.
  • Excellent communication, documentation, and troubleshooting skills.

Responsibilities

  • Architect and deploy scalable, highly available cloud infrastructure for production workloads.
  • Lead and implement SRE best practices, ensuring system reliability, performance, and scalability.
  • Oversee and optimize CI/CD pipelines for seamless deployments.
  • Define and monitor SLOs & SLIs to ensure service reliability and uptime.
  • Design and manage observability frameworks — monitoring, logging, and alerting.
  • Manage and optimize Kubernetes clusters and Helm charts for efficient orchestration and streamlined releases.
  • Implement auto-healing and proactive monitoring systems to prevent outages.
  • Drive fault injection testing & chaos engineering for resilience validation.
  • Collaborate with engineering and product teams to embed reliability into every phase of development.
  • Maintain clear documentation on infrastructure, incidents, and operational processes.

Information security and compliance

  • Comply with Avrioc’s Information Security & Service Management policies.
  • Maintain the confidentiality and integrity of all information assets.
  • Attend mandatory information security trainings.
  • Report any security incidents through official channels.

Skills

DevOps, SRE, Cloud infrastructure, Automation, Resilience, Reliability, CI/CD, Kubernetes, Helm, Observability, Monitoring, Logging, Alerting, AWS, GCP, Azure, Terraform, CloudFormation, Ansible, Python, Bash, Go, Elastic Stack, Grafana, Prometheus, Dynatrace, New Relic
