Site Reliability Engineer / Platform Operations Engineer

Targeted Talent

Remote · Canada · Full-time · Senior · 4mo ago

About the role

📣 Site Reliability Engineer (Platform Operations) – Remote → Relocation (Calgary / Winnipeg)

Client: Global enterprise‑scale technology company (the product is one you probably use every day)
Employment type: Permanent, full‑time
Location: Remote start (full remote) – later relocation to Calgary, AB or Winnipeg, MB (company will support the move)

Why Join?

Work on a high‑impact, globally distributed AWS platform that serves millions of users.
Lead the design and execution of “Wargames” – realistic failure‑injection drills that shape the future of our reliability posture.
Directly influence the roadmap for Platform & Service Operations Engineering.
Competitive salary + comprehensive benefits + relocation assistance + flexible work‑from‑home policy.

What You’ll Own

Area	Responsibilities
Platform Roadmap	Lead development projects, provide technical guidance, and deliver on the Platform & Service Operations Engineering roadmap.
Wargames & Chaos Engineering	Design, implement, and run operational wargames to test response processes, surface weaknesses, and drive continuous improvement.
Incident Management	Act as the technical and managerial escalation point for SOC engineers; lead major‑incident response, post‑mortems, and remediation.
Production Troubleshooting	Reproduce, diagnose, and mitigate issues in production environments; own end‑to‑end resolution.
Mentorship	Coach and mentor junior engineers; foster a culture of learning and knowledge sharing.
AWS Operations at Scale	Operate, monitor, and continuously improve a global, multi‑region AWS footprint.

What You Bring

Must‑Have	Nice‑to‑Have
Strong troubleshooting & investigative mindset	Experience with Ansible, Terraform, Python
Hands‑on experience with AWS (or other major cloud provider)	Serverless & container orchestration (e.g., EKS, Fargate, Lambda)
Production‑grade Java development	ELK stack, Prometheus/Grafana, Graphite
Major‑incident leadership on large‑scale platforms	Use of distributed tracing tools (Jaeger, Zipkin, OpenTelemetry)
Deep understanding of distributed web applications	Prior work in Chaos Engineering / Failure Injection
Automation of operational tasks (any language)	Agile‑scrum experience
Relational & NoSQL data modeling	Prior SRE‑specific role
Proven mentorship & influence

Bonus Points

Built or maintained Infrastructure‑as‑Code pipelines (Terraform, CloudFormation).
Developed CI/CD pipelines for Java micro‑services.
Implemented observability dashboards and alerting strategies.
Conducted post‑mortems that drove measurable reliability improvements.

How to Apply

If you thrive on solving complex reliability challenges, love automating the mundane, and want to shape the future of a global platform, we’d love to hear from you.

Submit your résumé and a brief cover letter (max 300 words) highlighting:

A recent incident you owned from detection to resolution.
A wargame or chaos‑engineering experiment you designed or participated in.
Your experience with AWS at scale and any IaC tools you’ve used.

Apply directly through the job posting link or email [recruiter@yourcompany.com] with the subject line “SRE – Remote/Calgary/Winnipeg”.

Quick Posting Tips

Platform	Why It Works	Suggested Tagline
LinkedIn	Professional network, strong SRE community	“Scale‑first SRE needed for global AWS platform – remote start, relocate to Canada!”
Indeed	High traffic, easy to filter candidates	“Site Reliability Engineer – Remote → Calgary/Winnipeg”
Stack Overflow Jobs	Developers actively looking for engineering roles	“Lead SRE – Own AWS platform, design chaos‑engineered wargames”
GitHub Jobs (or community boards)	Engineers who contribute to open‑source & love automation	“SRE – Build, break, and fix a global cloud platform”
Reddit – r/remotejobs, r/devops, r/aws	Niche communities, high engagement	“Remote SRE role (later relocate to Canada) – work on massive AWS infra”

Add a “Benefits” section (health, 401(k)/RRSP matching, learning budget, conference tickets, relocation stipend).
Include a salary range (e.g., CAD 120k‑150k + bonuses) – transparency attracts more qualified applicants.
Use keywords: “Site Reliability Engineer”, “SRE”, “AWS”, “Java”, “Chaos Engineering”, “Incident Management”, “Terraform”, “Observability”.

Ready to Go?

Copy the formatted description above into your ATS or posting platform, tweak the company‑specific details (benefits, salary, recruiter email), and you’ll have a compelling, SEO‑friendly ad that speaks directly to the talent you need.

Good luck finding the perfect SRE! 🚀

Skills

AWS, Ansible, ELK, Grafana, Graphite, Java, NoSQL, Prometheus, Python, Terraform
