Drive Reliability and Performance as a Senior Site Reliability Engineer
EPAM Systems Inc
About the role
Become a pivotal force in enhancing system reliability and performance for digital trading products. As a Lead Site Reliability Engineer, you’ll spearhead monitoring initiatives to ensure high availability and continuous improvement.
This role requires leading a team of site reliability engineers and overseeing infrastructure and application performance. You'll define a strategic reliability vision while ensuring stable connectivity to external partners. Responsibilities include optimizing monitoring systems and leading incident management efforts in a high-stakes environment.
Key Responsibilities:
- Define a reliability vision for the trading portfolio
- Oversee the SRE team, providing mentorship and guidance
- Own SLA/SLO/SLI frameworks and service health reporting
- Configure and optimize monitoring systems
- Analyze performance and manage critical incidents
Requirements:
- 8+ years in Site Reliability Engineering or DevOps
- Proven leadership experience in technical roles
- Strong experience with SLA/SLO/SLI governance
- Hands-on knowledge of Microsoft Azure environments
- Proficiency with Dynatrace in production settings
Elevate system reliability and performance through strategic initiatives, mentorship, and effective incident management in a dynamic trading environment.