Sr Site Reliability Engineer

Bizmatics India Private Limited

India · On-site · Full-time · Senior · 2d ago

About the role

Business Unit:

STChealth focuses on vaccine intelligence and immunization data management, connecting public and private healthcare sources to deliver real-time immunization information.

Its platform is used by thousands of locations, and the company emphasizes data integrity, real-time analytics, and better decision-making in public health. Headquarters: Phoenix, Arizona (US).

Job Summary:

The Site Reliability Engineer (SRE) supports a U.S. public health SaaS platform that processes protected health information (PHI) under HIPAA. The role emphasizes automation, monitoring, and reliability engineering for regulated environments. The SRE will partner closely with U.S.-based teams to enhance observability, CI/CD automation, and operational maturity in non-production and staging systems, while maintaining compliance with HIPAA, SOC2, and corporate data protection standards.

Core Responsibilities

- Automate infrastructure provisioning, configuration, and maintenance using Terraform, Ansible, and Python.
- Build, enhance, and maintain CI/CD pipelines using Jenkins, GitHub Actions, or AWS CodePipeline for continuous delivery and consistency across environments.
- Implement and optimize monitoring solutions using Datadog, Prometheus, Grafana, and ELK/EFK stacks to ensure high service reliability.
- Develop alerting strategies and escalation paths aligned to service-level objectives (SLOs) and key performance indicators (KPIs).
- Build custom scripts and automation for patching, validation, and system health checks.
- Partner with U.S. SREs and Engineering teams on environment management, change control, and incident response improvements.
- Analyze logs and performance metrics to identify stability issues, optimize cloud costs, and drive continuous improvement.
- Maintain detailed runbooks, SOPs, and documentation supporting operational readiness and knowledge transfer.
- Contribute to open-source or internal tooling that enhances automation, monitoring, or observability capabilities.
- Conduct periodic reliability reviews, performance tests, and failover simulations to validate readiness.
- Support adoption of infrastructure-as-code, immutable environments, and container orchestration (Docker/Kubernetes).
- Promote DevOps and SRE best practices across the engineering organization.

Tools & Technologies

AWS (EC2, S3, Lambda, CloudWatch, IAM, RDS, ECS/EKS), Terraform, Ansible, Python, Bash, Jenkins, GitHub Actions, Docker, Kubernetes, Prometheus, Grafana, ELK/EFK, Loki, Jira, Confluence.

Qualifications

- 5–7 years in SRE, DevOps, or Infrastructure Engineering.
- Bachelor’s degree in computer science or related field of study preferred, or equivalent experience.
- Experience supporting U.S. healthcare or other regulated SaaS systems (HIPAA, SOC2, ISO27001).
- Strong scripting and automation (Ansible, Jenkins, Python, Bash, Terraform, CloudFormation).
- Understanding of CI/CD, networking, and secure cloud architecture.
- Proven collaboration with U.S. teams across time zones; clear written and spoken English.
- Familiarity with EHR, HL7/FHIR, or state/federal public health systems preferred.
- Knowledge of data privacy frameworks (HIPAA, HITRUST, GDPR) and ITIL-based change/incident management.

Work Model

- Aligns with U.S. Eastern hours for daily collaboration, stand-ups, and sprint planning.
- Documents work thoroughly to ensure audit readiness and operational transparency.
- Works closely with U.S. SRE leadership on automation priorities, sprint goals, and production readiness activities.

Soft Skills

- Analytical problem-solver with attention to detail.
- Self-driven, collaborative, and process-oriented.
- Excellent communication and time management across distributed teams.
- Passionate about automation, reliability, and continuous improvement.

Example Contributions

- Automated patching pipeline for pre-production validation of security updates.
- Designed Grafana dashboards reducing alert noise by 40%.
- Built Python scripts automating AWS cleanup, saving 15% cloud spend.
- Implemented environment consistency checks improving deployment success rates.
- Introduced CI/CD optimizations reducing release time by 25%.

Work Mode: Remote

Shift Timings: 6:30pm to 3:30am IST

Location: Mumbai – Remote

Benefits:

- Annual Public Holidays as applicable
- 30 days total leave per calendar year
- Mediclaim policy
- Lifestyle Rewards Program
- Group Term Life Insurance
- Gratuity
- ...and more
