Site Reliability Engineer (SRE) / Scrum Master

COFOMO

Montreal · On-site · Full-time · 1mo ago

About the role

Design, operate, and improve highly available, resilient, and secure systems
Define and track SLOs, SLIs, and SLAs
Implement and maintain observability (monitoring, logging, alerting)
Automate operations (CI/CD, infrastructure as code, self‑remediation)
Handle incidents (blameless post‑mortems, root cause analysis)
Collaborate with development teams to build reliability in earlier (shift‑left)
Participate in architectural decisions and technical reviews
Optimize cost, performance, and system capacity
Facilitate Scrum ceremonies (Sprint Planning, Daily, Review, Retrospective)
Support the team in the adoption of Agile and DevOps principles
Remove obstacles and protect the team from external interruptions
Foster collaboration between teams (Dev, Ops, Security, Product)
Work with the Product Owner on the backlog (prioritization, quality of user stories)
Measure and improve team performance (velocity, flow, quality)
Encourage a culture of continuous improvement and collective responsibility
Act as an Agile servant leader and coach

Requirements

Possess Scrum certifications (CSM, PSM, SAFe), as well as AWS certification (an asset)
Have solid experience with Kubernetes and Docker
Have CI/CD experience (GitHub Actions, DevOps, etc.)
Have proven experience as a Scrum Master or in a similar role
Demonstrate experience in high‑criticality environments (an asset)
Have experience with observability tools (Splunk, Datadog, etc.)
Have a solid understanding of Linux systems, networks and security
Have an excellent understanding of cloud environments (AWS)
Have good scripting skills (Python, Bash, Go, etc.)

AWS · Bash · CI/CD · Docker · DevOps · Go · GitHub Actions · Kubernetes · Linux · Python · SAFe · Scrum · Splunk
