Lead Site Reliability Engineer

EPAM Systems

Canada · On-site · Full-time · Senior · Posted yesterday

About the role

Become a pivotal force in enhancing system reliability and performance for digital trading products. As a Lead Site Reliability Engineer, you'll spearhead monitoring initiatives to ensure high availability and continuous improvement. You'll lead a team of site reliability engineers and oversee infrastructure and application performance. You'll also define a strategic reliability vision while ensuring stable connectivity to external partners, optimizing monitoring systems, and leading incident management in a high-stakes environment.

Key Responsibilities

  • Define a reliability vision for the trading portfolio
  • Oversee the SRE team, providing mentorship and guidance
  • Own SLA/SLO/SLI frameworks and service health reporting
  • Configure and optimize monitoring systems
  • Analyze performance and manage critical incidents

Requirements

  • 8+ years in Site Reliability Engineering or DevOps
  • Proven leadership experience in technical roles
  • Strong experience with SLA/SLO/SLI governance
  • Hands‑on knowledge of Microsoft Azure environments
  • Proficiency with Dynatrace in production settings

Skills

Azure, DevOps, Dynatrace, SRE
