Site Reliability Engineer (SRE) / Scrum Master
COFOMO
Montreal · On-site · Full-time · 4d ago
About the role
Responsibilities
- Design, operate, and improve highly available, resilient, and secure systems
- Define and track SLOs, SLIs, and SLAs
- Implement and maintain observability (monitoring, logging, alerting)
- Automate operations (CI/CD, infrastructure as code, self‑remediation)
- Handle incidents (blameless post-mortems, RCA)
- Collaborate with development teams to shift reliability left
- Participate in architectural decisions and technical reviews
- Optimize cost, performance, and system capacity
- Facilitate Scrum ceremonies (Sprint Planning, Daily, Review, Retrospective)
- Support the team in the adoption of Agile and DevOps principles
- Remove obstacles and protect the team from external interruptions
- Foster collaboration between teams (Dev, Ops, Security, Product)
- Work with the Product Owner on the backlog (prioritization, quality of user stories)
- Measure and improve team performance (velocity, flow, quality)
- Encourage a culture of continuous improvement and collective responsibility
- Act as an Agile servant leader and coach
Requirements
- Hold Scrum certifications (CSM, PSM, SAFe) and an AWS certification (an asset)
- Have solid experience with Kubernetes and Docker
- Have CI/CD experience (GitHub Actions, DevOps, etc.)
- Have proven experience as a Scrum Master or in a similar role
- Demonstrate experience in high‑criticality environments (an asset)
- Have experience with observability tools (Splunk, Datadog, etc.)
- Have a solid understanding of Linux systems, networks and security
- Have an excellent understanding of cloud environments (AWS)
- Have good scripting skills (Python, Bash, Go, etc.)
Skills
AWS, Bash, CI/CD, Docker, DevOps, Go, GitHub Actions, Kubernetes, Linux, Python, SAFe, Scrum, Splunk