CSB/Site Reliability Architect
Acura Solution
Navi Mumbai · On-site · Full-time · Lead · Posted today
About the role
Designation: Site Reliability Architect
Location: Turbhe Office, Mumbai
CTC: As per company norms
About
- The Site Reliability Architect is a key leadership role, responsible for designing and implementing the architectural vision for our production systems, with a primary focus on reliability, scalability, and performance.
- This individual will work closely with development, operations, and product teams to define and enforce SRE best practices, develop robust and resilient system designs, and drive the adoption of automation and observability across the organization.
Responsibilities
- Champion the architectural principles and long-term strategy for site reliability.
- Design and review system architectures to ensure they meet high standards for reliability, scalability, and fault tolerance.
- Enforce SRE principles such as Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Error Budgets.
- Oversee the design and implementation of continuous integration/continuous deployment (CI/CD) pipelines.
- Lead the response to major incidents, guiding teams through diagnosis and resolution.
- Design and test disaster recovery and business continuity plans.
- Collaborate with engineering teams to embed reliability into the software development lifecycle from the initial design phase.
- Communicate complex technical concepts and reliability metrics to both technical and non-technical stakeholders.
- Implement Chaos Engineering practices to proactively test system resilience.
Requirements
- M.Tech/B.Tech or an equivalent bachelor's degree
- Min Experience: 10 years
- Max Experience: 16 years
- 10-16 years of experience in software engineering, systems administration, or a related role, with at least 5 years in a dedicated SRE or senior DevOps position.
Skills
CI/CD, Chaos Engineering, DevOps, Error Budgets, Observability, SRE, SLI, SLO