Site Reliability Engineer (SRE) – AI & Incident Management
Praxis HR Solution
About the role
Job Title
Site Reliability Engineer (SRE) – AI & Incident Management
Location
Pune | Gurugram | Noida (Hybrid / On-site)
Employment Type
Full-Time
Notice Period
Immediate Joiners to 30 Days
Job Summary
We are looking for a highly motivated Site Reliability Engineer (SRE) with strong expertise in AI-driven systems and Incident Management. The ideal candidate will be responsible for ensuring the reliability, scalability, and performance of critical production systems. This role requires hands-on experience in automation, monitoring, and incident response to maintain high system availability.
Key Responsibilities
• Ensure high availability, reliability, and performance of production systems.
• Monitor infrastructure and applications to detect and resolve issues proactively.
• Manage incident response, troubleshooting, and root cause analysis (RCA).
• Implement automation to improve operational efficiency and reduce manual effort.
• Work closely with development teams to improve system reliability and deployment processes.
• Use AI/ML tools or AI-enabled platforms to enhance monitoring and incident prediction.
• Maintain SLA, SLO, and SLI metrics for system reliability.
• Build and maintain observability solutions (logging, metrics, tracing).
• Participate in on-call rotations and handle production incidents.
Required Skills
• Strong experience in Site Reliability Engineering (SRE)
• Hands-on experience with Incident Management and Production Support
• Knowledge of AI tools / AI-driven automation / AI-based monitoring
• Experience with Cloud Platforms (AWS / Azure / GCP)
• Familiarity with Monitoring Tools (Prometheus, Grafana, Datadog, Splunk, etc.)
• Experience with Linux / scripting (Python, Bash)
• Knowledge of CI/CD pipelines and DevOps practices
• Understanding of containerization (Docker, Kubernetes)
Preferred Qualifications
• Experience with AIOps platforms
• Knowledge of Infrastructure as Code (Terraform / Ansible)
• Strong debugging and problem-solving skills
• Experience working in high-availability distributed systems
Why Join Us
• Opportunity to work on modern AI-driven infrastructure
• Exposure to large-scale production environments
• Collaborative and growth-focused work culture
How to Apply
Interested candidates who can join immediately or within 30 days can apply via Indeed or share their updated resume.
Job Types: Full-time, Permanent
Pay: ₹1,200,000.00 per year
Benefits:
• Cell phone reimbursement
• Food provided
• Health insurance
• Paid sick time
• Paid time off
• Provident Fund
• Work from home
Work Location: In person