Senior Site Reliability Engineer
Sage Recruiting Inc.
About the role
Location
Quispamsis
Base Pay Range
CA$/yr - CA$/yr
This range is provided by Sage Recruiting Inc. Your actual pay will be based on your skills and experience; talk with your recruiter to learn more.
Title
Senior Site Reliability Engineer (Founding Role)
Location
Canada
About the Role
This team is building a brand-new fintech platform from the ground up and is looking for an experienced Senior Site Reliability Engineer to join as one of its founding members. This is a high-impact, staff-level position where you’ll shape the early architecture, define reliability practices, and directly influence how the platform scales.
If you love solving hard problems, building things that last, and having your fingerprints all over a greenfield project, this one’s for you.
What You’ll Do
- Build and own the SRE program from scratch, including processes, tools, and best practices
- Lead incident response, on-call, alerting, escalation, and post-incident reviews
- Partner closely with engineering and infrastructure to design reliable, scalable systems
- Define and meet uptime SLAs using metrics, tracing, and observability tools
- Develop automation to improve deployment speed, reliability, and performance
- Strengthen CI/CD pipelines (GitHub Actions, ArgoCD, or similar)
- Contribute to architectural decisions that shape the product’s long-term success
What You Bring
- 10+ years of overall technical experience, including 5–8+ years in SRE, DevOps, or Systems Engineering roles in high-volume, 24×7 production environments
- Proficiency with AWS and Linux systems
- Deep experience with containerization and orchestration (Docker, Kubernetes, EKS)
- Strong understanding of observability (logs, traces, metrics)
- Programming/scripting in Python or Bash for automation
- CI/CD experience (GitHub Actions, ArgoCD, or similar)
- Excellent problem-solving skills and attention to detail
- Clear communication skills and a collaborative mindset
Nice to Have
- Fintech or startup experience
- Familiarity with TypeScript/Node.js
- Experience with Kafka, Redis, and RDS
The Challenge
This is a “build it from the ground up” role. You’ll initially take full ownership of reliability and on-call, which means a heavier on-call load early on until the team grows. Once additional hires are made, this will transition into a more balanced rotation. You’ll be instrumental in designing the playbooks, tools, and culture for reliability excellence.
Benefits
- Unlimited vacation
- Comprehensive health and dental benefits
Seniority Level
Director
Employment Type
Full-time
Job Function
Information Technology
Industries
Technology, Information and Media