SRE / Observability Engineer

Realign

Toronto · On-site · Full-time · Mid Level · Posted today

About the role

We are looking for a Mid-Level Observability Engineer to help implement, operate, and improve observability capabilities across our applications and platforms. This role focuses on hands-on onboarding, instrumentation, dashboarding, and alerting, working under established standards and guidance from senior engineers. You will collaborate with application, SRE, and operations teams to ensure systems are observable, supportable, and production-ready.

Responsibilities

  • Implement and maintain metrics, logs, and traces for applications and infrastructure.
  • Assist with onboarding applications into observability platforms (e.g., Dynatrace, ELK, Datadog).
  • Configure dashboards, alerts, and basic anomaly detection.
  • Work with development teams to enable structured logging, basic distributed tracing, and core metrics.
  • Validate observability requirements during Production Readiness Reviews (PRR).
  • Troubleshoot missing or low-quality telemetry.
  • Configure alerts based on golden signals (latency, errors, traffic, saturation).
  • Help reduce alert noise by tuning thresholds and alert logic.
  • Support incident response by gathering logs, metrics, and traces.
  • Support root cause analysis using observability tools.
  • Maintain dashboards and documentation used by on-call and support teams.
  • Participate in on-call rotations (as applicable).
  • Assist in automating observability onboarding and validation tasks.
  • Create and maintain reusable dashboards and alert templates.
  • Follow established observability standards and best practices.

Required Qualifications

  • 4 years of experience in observability or SRE.
  • Working knowledge of metrics, logs, and basic tracing concepts.
  • Hands-on experience with at least one observability platform (Dynatrace, Elastic ELK, Datadog, New Relic, etc.).
  • Basic understanding of SLIs, SLOs, and service health indicators.
  • Experience with cloud platforms or hybrid environments.
  • Ability to write scripts (Python, Bash, PowerShell) for automation and troubleshooting.

Preferred Qualifications

  • Experience with OpenTelemetry or APM agents.
  • Familiarity with Kubernetes or containerized workloads.
  • Experience working with incident management tools (PagerDuty, ServiceNow).
  • Exposure to Dynatrace, Kibana, ELK, or similar cloud-native monitoring tools.
  • Experience in regulated or enterprise environments.

Skills

Bash, Datadog, Dynatrace, Elastic ELK, Kubernetes, New Relic, PagerDuty, PowerShell, Python, ServiceNow
