Site Reliability Engineering

Viraaj HR Solutions Private Limited

Indore · On-site · Full-time

About the role

We are a technology services organization in the IT Services / HR Technology sector, delivering cloud-hosted platforms and managed infrastructure for enterprise customers. We build and run production-grade SaaS solutions with a focus on reliability, performance, and secure operations across public cloud environments. This role is for an on-site Site Reliability Engineer supporting critical production systems in India.

Role & Responsibilities

Maintain service reliability and uptime for production systems through proactive monitoring, incident response, and root‑cause analysis.
Implement and operate infrastructure as code to provision, scale, and secure cloud resources across AWS environments.
Design, build, and maintain container orchestration platforms, CI/CD pipelines, and automated deployment workflows.
Develop and operate observability tooling (metrics, logs, traces) and dashboards to surface SLIs/SLOs and reduce MTTR.
Automate repetitive operational tasks with scripts or small services and own runbooks for on‑call rotations.
Collaborate with development teams to improve application resiliency, capacity planning, and release practices.

Skills & Qualifications

Must‑Have

Kubernetes
Docker
Linux
AWS
Terraform
Prometheus
Grafana
Jenkins

Preferred

Python
Golang
HashiCorp Vault

Additional Qualifications

Proven experience operating production services with a strong focus on reliability, automation, and observability.
Familiarity with on‑call practices, incident management workflows, and post‑incident remediation.
Ability to work on‑site in India and collaborate across engineering, product, and support teams.

Benefits & Culture Highlights

Hands‑on, outcome‑driven engineering culture with ownership of end‑to‑end production systems.
Opportunity to influence architecture, tooling, and SRE practices for mission‑critical platforms.
Structured on‑call support, knowledge‑sharing forums, and career growth into platform engineering roles.

Skills: kubernetes, docker, aws, jenkins, prometheus, grafana, site reliability engineering, linux, python, terraform
