Lead Site Reliability Engineer
EPAM Systems
Canada · On-site · Full-time · Senior · Posted yesterday
About the role
Become a pivotal force in enhancing system reliability and performance for digital trading products. As a Lead Site Reliability Engineer, you’ll spearhead monitoring initiatives to ensure high availability and continuous improvement. You’ll lead a team of site reliability engineers and oversee infrastructure and application performance. You’ll also define a strategic reliability vision while ensuring stable connectivity to external partners, optimizing monitoring systems, and leading incident management efforts in a high-stakes environment.
Key Responsibilities
- Define a reliability vision for the trading portfolio
- Oversee the SRE team, providing mentorship and guidance
- Own SLA/SLO/SLI frameworks and service health reporting
- Configure and optimize monitoring systems
- Analyze performance and manage critical incidents
Requirements
- 8+ years in Site Reliability Engineering or DevOps
- Proven leadership experience in technical roles
- Strong experience with SLA/SLO/SLI governance
- Hands‑on knowledge of Microsoft Azure environments
- Proficiency with Dynatrace in production settings
Skills
Azure, DevOps, Dynatrace, SRE