Site Reliability Engineer / Platform Operations Engineer

Targeted Talent

Remote · Canada · Full-time · Senior · 3w ago

About the role



📣 Site Reliability Engineer (Platform Operations) – Remote → Relocation (Calgary / Winnipeg)

Client: Global enterprise‑scale technology company (the product is one you probably use every day)
Employment type: Permanent, full‑time
Location: Remote start (full remote) – later relocation to Calgary, AB or Winnipeg, MB (company will support the move)

Why Join?

  • Work on a high‑impact, globally‑distributed AWS platform that serves millions of users.
  • Lead the design and execution of “Wargames” – realistic failure‑injection drills that shape the future of our reliability posture.
  • Directly influence the roadmap for Platform & Service Operations Engineering.
  • Competitive salary + comprehensive benefits + relocation assistance + flexible work‑from‑home policy.

What You’ll Own

Area | Responsibilities
Platform Roadmap | Lead development projects, provide technical guidance, and deliver on the Platform & Service Operations Engineering roadmap.
Wargames & Chaos Engineering | Design, implement, and run operational wargames to test response processes, surface weaknesses, and drive continuous improvement.
Incident Management | Act as the technical and managerial escalation point for SOC engineers; lead major‑incident response, post‑mortems, and remediation.
Production Troubleshooting | Reproduce, diagnose, and mitigate issues in production environments; own end‑to‑end resolution.
Mentorship | Coach and mentor junior engineers; foster a culture of learning and knowledge sharing.
AWS Operations at Scale | Operate, monitor, and continuously improve a global, multi‑region AWS footprint.

What You Bring

Must‑Have | Nice‑to‑Have
Strong troubleshooting & investigative mindset | Experience with Ansible, Terraform, Python
Hands‑on experience with AWS (or other major cloud provider) | Serverless & container orchestration (e.g., EKS, Fargate, Lambda)
Production‑grade Java development | ELK stack, Prometheus/Grafana, Graphite
Major‑incident leadership on large‑scale platforms | Use of distributed tracing tools (Jaeger, Zipkin, OpenTelemetry)
Deep understanding of distributed web applications | Prior work in Chaos Engineering / Failure Injection
Automation of operational tasks (any language) | Agile‑scrum experience
Relational & NoSQL data modeling | Prior SRE‑specific role
Proven mentorship & influence |

Bonus Points

  • Built or maintained Infrastructure‑as‑Code pipelines (Terraform, CloudFormation).
  • Developed CI/CD pipelines for Java micro‑services.
  • Implemented observability dashboards and alerting strategies.
  • Conducted post‑mortems that drove measurable reliability improvements.

How to Apply

If you thrive on solving complex reliability challenges, love automating the mundane, and want to shape the future of a global platform, we’d love to hear from you.

Submit your résumé and a brief cover letter (max 300 words) highlighting:

  1. A recent incident you owned from detection to resolution.
  2. A wargame or chaos‑engineering experiment you designed or participated in.
  3. Your experience with AWS at scale and any IaC tools you’ve used.

Apply directly through the job posting link or email [recruiter@yourcompany.com] with the subject line “SRE – Remote/Calgary/Winnipeg”.


Quick Posting Tips

Platform | Why It Works | Suggested Tagline
LinkedIn | Professional network, strong SRE community | “Scale‑first SRE needed for global AWS platform – remote start, relocate to Canada!”
Indeed | High traffic, easy to filter candidates | “Site Reliability Engineer – Remote → Calgary/Winnipeg”
Stack Overflow Jobs | Developers actively looking for engineering roles | “Lead SRE – Own AWS platform, design chaos‑engineered wargames”
GitHub Jobs (or community boards) | Engineers who contribute to open‑source & love automation | “SRE – Build, break, and fix a global cloud platform”
Reddit – r/remotejobs, r/devops, r/aws | Niche communities, high engagement | “Remote SRE role (later relocate to Canada) – work on massive AWS infra”

  • Add a “Benefits” section (health, 401(k)/RRSP matching, learning budget, conference tickets, relocation stipend).
  • Include a salary range (e.g., CAD 120‑150k + bonuses) – transparency attracts more qualified applicants.
  • Use keywords: “Site Reliability Engineer”, “SRE”, “AWS”, “Java”, “Chaos Engineering”, “Incident Management”, “Terraform”, “Observability”.

Ready to Go?

Copy the formatted description above into your ATS or posting platform, tweak the company‑specific details (benefits, salary, recruiter email), and you’ll have a compelling, SEO‑friendly ad that speaks directly to the talent you need.

Good luck finding the perfect SRE! 🚀

Requirements

  • Strong troubleshooting, problem-solving, and investigative skills
  • Experience with AWS or other cloud providers
  • Experience developing in Java
  • Major incident management experience on production platforms operating at scale
  • Experience working with distributed web applications
  • Experience automating operational tasks and processes using other languages
  • Understanding of relational and/or NoSQL data structures
  • Experience mentoring and influencing peers
  • Ability to identify improvements, weigh risks against benefits, and translate them into technical requirements

Responsibilities

  • Own development projects, providing technical guidance and delivering against the Platform & Service Operations Engineering roadmap.
  • Design and implement wargames to test our operational response and identify areas of weakness in our platforms.
  • Act as the technical and management escalation point for Service Operations Centre (SOC) engineers, including during major incidents.
  • Troubleshoot, reproduce, and mitigate issues in our production environments.
  • Mentor other team members.
  • Operate global AWS platforms at scale.

Skills

AWS, Ansible, ELK, Grafana, Graphite, Java, NoSQL, Prometheus, Python, Terraform
