Site Reliability Engineer II - PC Financial & Services

Loblaw

Remote · Canada · Full-time · Mid Level · CA$100k – CA$132k/yr

About the role

About Loblaw Companies Limited

Come make your difference in communities across Canada, where authenticity, trust and making connections are valued – as we shape the future of Canadian retail, together. Our unique position as one of the country's largest employers, coupled with our commitment to positively impact the lives of all Canadians, provides our colleagues a range of opportunities and experiences to help Canadians Live Life Well®.

At Loblaw Companies Limited, we succeed through collaboration and commitment and set a high bar for ourselves and those around us. Whether you are just starting your career, re‑entering the workforce, or looking for a new job, this is where you belong.

Loblaw Technology powers large‑scale retail and digital platforms that serve millions of Canadians every day. The Site Reliability Engineering team builds and operates the enterprise observability and reliability stack across Azure and GCP, enabling product teams to build systems that are reliable by design.

This role will support PC Financial & Services.

Role Overview

Site Reliability Engineer II – SRE team

  • Design, operate, and continuously improve our shared observability and reliability platform.
  • Hands‑on engineering role with clear ownership of key platform components, reporting directly to the SRE Lead.
  • Work closely with application and platform teams to ensure systems are observable, resilient, and reliable by design, while enabling scale through automation, standards, and self‑service patterns.

What You’ll Do

  • Design, build, and improve our enterprise observability stack (metrics, logs, traces, dashboards, alerting)
  • Own and operate platform components across Azure and GCP using Infrastructure as Code and GitOps practices
  • Improve reliability of Kubernetes workloads and platform services through automation, standards, and performance optimization
  • Define and enforce CI/CD patterns for SRE tooling, Helm charts, Terraform modules, and pipelines
  • Support production systems and participate in incident response, root cause analysis, and reliability improvements

What You Bring

  • A software engineering mindset, with hands‑on experience in Go, Python, Java, or similar languages
  • Experience with Kubernetes (AKS, GKE, or OpenShift), Linux, Helm, and cloud‑native architectures
  • Experience operating modern observability stacks (Prometheus, Alertmanager, Grafana, Elasticsearch, APM tools such as Sentry)
  • Hands‑on experience with Terraform, CI/CD systems (Jenkins or GitLab CI), and GitOps tools (ArgoCD or equivalent)
  • Excellent problem‑solving skills in distributed systems and a passion for automation and reducing operational toil

Nice to Have

  • Experience defining SLIs, SLOs, and SLAs and managing error budgets
  • Exposure to regulated or enterprise‑scale environments
  • Experience with secrets management and policy enforcement
  • Knowledge of application instrumentation patterns (Micrometer, Spring Boot Actuator, Prometheus metrics)

How Success Is Measured

Success in this role is not defined only by uptime. It is measured by how effectively you:

  • Automate repetitive operational work
  • Standardize reliability practices across teams
  • Enable application teams to self‑serve observability patterns
  • Improve platform resilience and scalability

What Loblaw Offers You

  • Flexibility and balance, and an environment that sets you up for success no matter where your workspace is located
  • A fast‑paced technology environment supporting stores, colleagues, and customers across Canada
  • Commitment to building diverse and inclusive teams – if your experience does not match every requirement but you believe you would be a strong contributor, we encourage you to apply
  • Accommodation is available upon request throughout the recruitment process

Commitment to Sustainability and Social Impact

Our approach to sustainability and social impact is based on three pillars – Environment, Sourcing and Community – and we are constantly looking for ways to demonstrate leadership in these important areas. Our CORE Values – Care, Ownership, Respect and Excellence – guide all our decision‑making and come to life through our Blue Culture.

We offer progressive careers, comprehensive training, flexibility, and other competitive benefits – reasons why we are one of Canada’s Top Employers, Canada’s Best Diversity Employers, Canada’s Greenest Employers & Canada’s Top Employers for Young People.

Diversity, Equity, and Inclusion

We have a long‑standing focus on diversity, equity and inclusion because we know it will make our company a better place to work and shop. We are committed to creating accessible environments for our colleagues, candidates and customers. Requests for accommodation due to a disability (visible or invisible, temporary or permanent) can be made at any stage of application and employment.

Application Details

  • Candidates who are 18 years or older are required to complete a criminal background check. Details will be provided through the application process.

Hiring Range

  • $100,000.00 – $132,000.00 per year

A candidate’s experience and knowledge as well as the geographical region in which the position is located may be factored into the pay a candidate receives for this position. This posting is for an existing vacancy. The Company uses artificial intelligence for the purpose of screening, assessing and/or selecting applicants for this position.

Tags

#EN
#SS #LTnA #ON

Benefits

Progressive careers, comprehensive training, flexibility

Skills

APM, ArgoCD, Azure, CI/CD, Docker, Elasticsearch, GCP, Git, GitOps, GitLab CI, Go, Grafana, Helm, Infrastructure as Code, Java, Jenkins, Kubernetes, Linux, Prometheus, Python, Sentry, Terraform
