Senior Site Reliability Engineer Ensuring High-Availability and Optimization
EPAM Systems Inc
About the role
Step into a leading role focused on reliability and performance in trading environments. As a Site Reliability Engineer, you’ll drive critical performance monitoring, observability, and system optimization initiatives.
This leadership role combines strategic oversight with hands-on technical expertise, including managing a team of site reliability engineers. You will define reliability standards and improve monitoring frameworks to ensure operational excellence in high-availability contexts. You will also lead incident management and identify automation opportunities.
Key Responsibilities
- Establish a reliability strategy covering the trading portfolio
- Lead and mentor the Site Reliability Engineering team
- Own SLA/SLO/SLI framework management
- Configure extensive monitoring and alerting systems
- Analyze incidents for root cause identification
Requirements
- Over 8 years of experience in site reliability engineering or a related field
- Proven leadership in technical direction and mentorship
- Strong knowledge of SLA/SLO/SLI governance
- Experience with Microsoft Azure suite
- Proficiency in configuring and using Dynatrace
Champion system reliability and performance across digital trading environments through effective leadership and strategic optimization initiatives.