Observability Engineer – Production Support & Monitoring (SRE)

Mississauga · Hybrid · Contract · Mid Level

About the role

Contract

6 months (high likelihood of extension)

Location

Core downtown Toronto

Schedule

Hybrid – 2 days onsite

Rate

Market rate (looking for the best experience/rate ratio)

Main Deliverables

  • Ensure reliability, performance, and capacity of enterprise production platforms
  • Own and operate observability and monitoring tooling across infrastructure and applications
  • Execute automation and operational hygiene to support roadmap-driven growth

Technical Stack

  • Monitoring & Observability: ITRS Geneos (primary), ISINGA / Insignia, Faddom, Corvil, Dynatrace
  • Infrastructure: Linux / Unix, VMware, AWS (CloudWatch)
  • Scripting & Automation: Perl, Bash / Shell, Python
  • Messaging / Middleware: IBM MQ, Market Data Monitoring
  • Databases: SQL-based relational databases (operational support)
  • ITSM & Collaboration: ServiceNow, Microsoft Teams
  • Legacy / Transition: SCOM (planned decommissioning)

Must‑Haves

  • 5+ years of experience in Production Support, SRE, or Operations Engineering
  • Strong, hands-on ITRS Geneos experience in enterprise production environments
  • Advanced scripting skills in Perl, Bash/Shell, and Python
  • Experience supporting large-scale production environments (hundreds to thousands of servers)
  • Strong Linux / Unix systems knowledge
  • Experience with enterprise monitoring platforms (Geneos, Dynatrace, Corvil, Faddom)
  • Experience with incident and event management using ServiceNow
  • Operational SQL skills for troubleshooting and validation
  • Willingness to participate in a defined on-call rotation

Other Requirements

  • Experience monitoring infrastructure and applications (CPU, memory, disk, network, processes)
  • Experience with capacity planning, trend analysis, and platform scaling
  • Familiarity with monitoring integrations: AWS CloudWatch, VMware, IBM MQ, synthetic monitoring, and market data monitoring
  • Experience integrating alerts with ServiceNow, Microsoft Teams, email, and webhook-based notifications
  • Exposure to hybrid environments (on‑prem + cloud)

Responsibilities

  • Provide L2/L3 production support for business‑critical platforms
  • Operate and enhance enterprise monitoring platforms, with Geneos as the core solution
  • Perform capacity planning and infrastructure performance analysis
  • Develop automation to:
    • Execute hygiene routines (log cleanup, validation, health checks)
    • Reduce alert noise and manual operational effort
    • Support reporting and alert validation
  • Configure monitoring for infrastructure, applications, APIs, logs, batch jobs, FIX, file watches, and databases
  • Maintain runbooks, SOPs, and monitoring configuration lifecycle
  • Participate in incident response, RCA, and post‑incident remediation
  • Support monitoring platform rollouts, onboarding, and gateway scaling
  • Improve on-call effectiveness through tuning, automation, and proactive monitoring

Skills

AWS CloudWatch, Bash, Corvil, Dynatrace, Faddom, IBM MQ, ITRS Geneos, ISINGA / Insignia, Linux, Market Data Monitoring, Microsoft Teams, Perl, Python, ServiceNow, Shell, SQL, SCOM, Unix, VMware
