Site Reliability Engineer

Ascendion

Vandiperiyar · Hybrid · Full-time · Senior · 1w ago

About the role

Job Title

Site Reliability Engineer

Location

Bengaluru (Hybrid, 2-3 days onsite per week)

Minimum relevant years of experience

10+ Years

Role Overview

We are recruiting multiple Site Reliability Engineers (SREs) to embed directly within engineering teams. These engineers will become expert practitioners of the systems they support: understanding how they are built, how they behave in production, and how to troubleshoot them forensically when issues arise. This is a hands‑on, technical role that demands both engineering rigour and operational instinct.

Key Responsibilities

Embed within engineering squads to build deep system knowledge — understanding architecture, data flows, failure modes, and dependencies.
Instrument systems with comprehensive observability — metrics, logs, traces, and alerting — to provide a full forensic picture of production behaviour.
Participate in on‑call rotas and lead technical incident response, using structured troubleshooting and tooling to diagnose and resolve production issues rapidly.
Proactively identify reliability risks and work with engineering teams to address them before they impact production.
Build and maintain runbooks, playbooks, and diagnostic tooling to support efficient incident management.
Monitor system performance continuously, validating both infrastructure health and functional correctness of data pipelines and application behaviour.
Support the SRE Lead in establishing team‑wide standards for monitoring, alerting, and incident response.

Essential Skills & Experience

Solid software engineering or platform engineering background with production operations experience.
Hands‑on experience with observability and monitoring tooling (e.g. Datadog, Grafana, ELK stack, Prometheus, or equivalent).
Experience troubleshooting complex distributed systems, with strong diagnostic skills and a methodical approach to incident investigation.
Comfortable reading and understanding application code as well as infrastructure configuration.
Experience working in Agile engineering teams with shared ownership of reliability outcomes.

Desirable / Nice-to-Have

Experience in financial services, particularly with data‑intensive or calculation‑heavy systems (e.g. index calculation, pricing engines, market data pipelines).
Familiarity with AI‑assisted diagnostic tools or agentic troubleshooting workflows.
Knowledge of data ingestion patterns and how to validate the accuracy and completeness of processed data.
Experience writing tooling or automation to improve operational workflows and reduce toil.

About Ascendion

Ascendion is transforming the future of technology with AI‑driven software engineering. Our global team accelerates innovation and delivers future‑ready solutions for some of the world’s most important industry leaders. Our applied AI, software engineering, cloud, data, experience design, and talent transformation capabilities accelerate innovation for Global 2000 clients. Join us to build transformative experiences, pioneer cutting‑edge solutions, and thrive in a vibrant, inclusive culture, powered by AI and driven by bold ideas.

Skills

Datadog, ELK stack, Grafana, Prometheus
