Senior Site Reliability Engineer Ensuring High-Availability and Optimization

EPAM Systems Inc

Apohaqui · On-site · Full-time · Senior · 5d ago

About the role

Step into a leading role focused on reliability and performance in trading environments. As a Site Reliability Engineer, you’ll drive critical performance monitoring, observability, and system optimization initiatives.

This leadership role combines strategic oversight with hands-on technical work, including managing a team of site reliability engineers. You will define reliability standards and improve monitoring frameworks to ensure operational excellence in high-availability contexts. You will also lead incident management and identify automation opportunities.

Key Responsibilities

  • Establish a reliability strategy covering the trading portfolio
  • Lead and mentor the Site Reliability Engineering team
  • Own SLA/SLO/SLI framework management
  • Configure extensive monitoring and alerting systems
  • Analyze incidents for root cause identification

Requirements

  • Over 8 years of Site Reliability or related experience
  • Proven leadership in technical direction and mentorship
  • Strong knowledge of SLA/SLO/SLI governance
  • Experience with Microsoft Azure suite
  • Proficient in Dynatrace configuration and applications

Champion system reliability and performance across digital trading environments through effective leadership and strategic optimization initiatives.

Skills

Azure · Dynatrace
