Site Reliability Architect/Lead

Coforge

Hyderabad · On-site · Full-time · Lead · Today

About the role

Title

Site Reliability Architect/Lead

Location

Hyderabad

Experience

12-16 Years

Responsibilities

The Site Reliability Architect/Lead will implement and operationalize SRE practices across production systems, including defining and enforcing SLIs, SLOs, and error budgets.
The role involves active participation in system and architecture-level design decisions to ensure high availability, scalability, resilience, and performance.
The individual will own observability standards, including hands-on dashboard creation, alert design, and continuous tuning to reduce false alerts.
They will lead infrastructure and application deployments, ensure reliable CI/CD pipelines, and drive automation to eliminate operational toil.
They will also manage incident response and RCAs, act as an escalation point during critical outages, and mentor SREs while promoting a reliability-first engineering culture.

Skill Stack

Strong hands-on experience in observability and monitoring tools such as Prometheus, Grafana, Datadog, Dynatrace, New Relic, or ELK
Infrastructure and application deployment using Kubernetes and cloud platforms (AWS, Azure, or GCP)
CI/CD and GitOps tools such as Helm, Argo CD, Flux, Jenkins, GitHub Actions, or GitLab CI
Infrastructure as Code using Terraform, CloudFormation, or ARM templates
SRE automation using scripting languages such as Python, Go, or Bash/Shell

Requirements

Proven experience working with distributed systems, microservices, and large-scale production environments.

Skills

AWS, Argo CD, Azure, Bash/Shell, CloudFormation, Datadog, Docker, Dynatrace, ELK, GCP, GitOps, GitLab CI, Go, Grafana, Helm, Infrastructure as Code, Jenkins, Kubernetes, New Relic, Prometheus, Python, Terraform
