SRE / Observability Engineer
Realign
Toronto · On-site · Full-time · Mid Level · Today
About the role
We are looking for a Mid-Level Observability Engineer to help implement, operate, and improve observability capabilities across our applications and platforms. This role focuses on hands-on onboarding, instrumentation, dashboarding, and alerting, working under established standards and guidance from senior engineers. You will collaborate with application, SRE, and operations teams to ensure systems are observable, supportable, and production-ready.
Responsibilities
- Implement and maintain metrics, logs, and traces for applications and infrastructure.
- Assist with onboarding applications into observability platforms (e.g., Dynatrace, ELK, Datadog).
- Configure dashboards, alerts, and basic anomaly detection.
- Work with development teams to enable structured logging, basic distributed tracing, and core metrics.
- Validate observability requirements during Production Readiness Reviews (PRR).
- Troubleshoot missing or low-quality telemetry.
- Configure alerts based on golden signals (latency, errors, traffic, saturation).
- Help reduce alert noise by tuning thresholds and alert logic.
- Support incident response by gathering logs, metrics, and traces.
- Support root cause analysis using observability tools.
- Maintain dashboards and documentation used by on-call and support teams.
- Participate in on-call rotations (as applicable).
- Assist in automating observability onboarding and validation tasks.
- Create and maintain reusable dashboards and alert templates.
- Follow established observability standards and best practices.
Required Qualifications
- 4 years of experience in observability or SRE.
- Working knowledge of metrics, logs, and basic tracing concepts.
- Hands-on experience with at least one observability platform (Dynatrace, Elastic ELK, Datadog, New Relic, etc.).
- Basic understanding of SLIs, SLOs and service health indicators.
- Experience with cloud platforms or hybrid environments.
- Ability to write scripts (Python, Bash, PowerShell) for automation and troubleshooting.
Preferred Qualifications
- Experience with OpenTelemetry or APM agents.
- Familiarity with Kubernetes or containerized workloads.
- Experience working with incident management tools (PagerDuty, ServiceNow).
- Exposure to Dynatrace, Kibana, ELK, or similar cloud-native monitoring tools.
- Experience in regulated or enterprise environments.
Skills
Bash, Datadog, Dynatrace, Elastic ELK, Kubernetes, New Relic, PagerDuty, PowerShell, Python, ServiceNow