Senior Site Reliability Engineer - SRE

tsworks

Bengaluru · On-site Full-time Senior 4d ago

About the role

About tsworks:

tsworks is a leading technology innovator, providing transformative products and services designed for the digital-first world. Our mission is to provide domain expertise, innovative solutions and thought leadership to drive exceptional user and customer experiences. Demonstrating this commitment, we have a proven track record of championing digital transformation for industries such as Banking, Travel and Hospitality, and Retail (including e-commerce and omnichannel), as well as Distribution and Supply Chain, delivering impactful solutions that drive efficiency and growth. We take pride in fostering a workplace where your skills, ideas, and attitude shape meaningful customer engagements.

About Team:

We are looking for an experienced and highly skilled Senior Site Reliability Engineer (SRE) to join our team and play a key role in ensuring the high availability, scalability, and reliability of our infrastructure. The ideal candidate will have 7+ years of experience in site reliability engineering, cloud computing, infrastructure automation, and monitoring, with a deep understanding of modern DevOps and SRE practices.

Responsibilities:
• Architect, design, and maintain highly available, scalable, and resilient infrastructure to support business-critical applications.
• Lead the implementation and management of Infrastructure as Code (IaC) using AWS CDK, ensuring infrastructure is automated, repeatable, and secure.
• Develop and optimize automation for deployments, configuration management, and infrastructure provisioning across cloud (AWS) and container orchestration platforms (Kubernetes, EKS, ECS).
• Enhance and maintain CI/CD pipelines, ensuring smooth and automated application and infrastructure deployments.
• Design and implement monitoring and observability solutions using tools such as Datadog, Prometheus, and Grafana, ensuring proactive identification and resolution of performance bottlenecks and failures.
• Lead incident response and root cause analysis efforts, ensuring high levels of service availability and quick resolution of infrastructure issues.
• Continuously improve infrastructure performance, scalability, and reliability through best practices, automation, and innovation.
• Mentor and coach junior engineers, sharing knowledge, best practices, and expertise in site reliability engineering.

Requirements

Key Attributes and Qualifications:
• 7-10+ years of experience in Site Reliability Engineering, DevOps, or a related field.
• Expertise in cloud computing, particularly AWS, with deep knowledge of infrastructure design and best practices.
• Experience with multi-cloud environments, including Azure and GCP, is highly desirable.
• Proficiency with AWS CDK is essential, with additional experience in Terraform and Ansible considered a strong advantage.
• Strong experience with Kubernetes and container orchestration platforms (EKS, ECS), including deploying, scaling, and managing workloads.
• Advanced scripting and programming skills (Python, Bash, or similar) for automation and infrastructure management.
• In-depth knowledge of monitoring, logging, and observability tools (Datadog, Prometheus, Grafana, ELK, etc.).
• Preferred knowledge of Content Delivery Networks (CDNs) for optimizing application performance and scalability.
• Excellent communication and leadership skills, with experience mentoring junior engineers and driving technical excellence.

Mandatory Work Experience in Project:
• Kubernetes and Docker
• CI/CD Pipeline
• Scripting: Terraform, Helm
• Monitoring

Good to Have: Application Knowledge (Java/Maven/Angular)
