Cloud Operations Engineer – Monitoring Lead

Extreme Networks, Inc.

Pincher Creek · Hybrid · Full-time · Lead · $120k – $140k/yr · 3mo ago

About the role

Extreme is a technology leader in the Gartner Magic Quadrant and promotes an internal culture that embraces diversity, inclusion, and equality. The company is expanding its portfolio through acquisitions, which creates significant growth opportunities in the region. We are looking for a Cloud Operations Engineer – Monitoring Lead (Hybrid, Thornhill, Toronto) to join our growing Cloud Operations team.

Responsibilities

Lead the design, implementation, and continuous improvement of our end‑to‑end monitoring and alerting framework for cloud infrastructure (AWS, Azure, GCP), applications, and services.
Define key performance indicators (KPIs), service level indicators (SLIs), and service level objectives (SLOs) for critical systems.
Evaluate, select, and integrate monitoring tools (e.g., Prometheus, Grafana, Datadog, Splunk, CloudWatch, Azure Monitor, GCP Operations Suite) to meet evolving needs.
Develop and implement automation scripts and tools (e.g., Python, Bash, PowerShell) to streamline monitoring deployment, configuration, and incident remediation.
Build and maintain dashboards, alerts, and reports that provide actionable insights into system performance, health, and availability.
Analyze monitoring data to identify performance bottlenecks, resource inefficiencies, and potential cost‑optimization opportunities.
Collaborate with engineering teams to implement performance improvements and cost‑saving measures.
Create and maintain comprehensive documentation for monitoring systems, procedures, and best practices.
Proactively identify areas for improvement in our cloud operations and monitoring capabilities.
Provide 24/7 support for cloud services.
Participate in cloud security and compliance implementation.

Ideal Qualifications

BS‑level technical degree required; Computer Science or Engineering background preferred.
8+ years of progressive experience in Cloud Operations, DevOps, or Site Reliability Engineering roles, with a strong focus on monitoring.
Deep expertise with at least one major public cloud platform (AWS, Azure, or Google Cloud Platform).
Proven experience as a technical lead or senior contributor in a monitoring‑focused role.
Working knowledge of container‑based architecture and deployment (Docker, Kubernetes).
Extensive experience with various monitoring and observability tools (e.g., Prometheus, Grafana, Datadog, Splunk, ELK Stack, vendor‑specific monitoring solutions).
Excellent problem‑solving, analytical, and troubleshooting skills.
Working knowledge of Elasticsearch, PostgreSQL, Redis, Ignite, Kafka, and RabbitMQ.
Comfortable working within a distributed team located in multiple time zones.

Compensation

$120,000 – $140,000 per year.

Skills

AWS, AWS CloudWatch, Azure, Azure Monitor, Bash, Cloud Operations, Docker, ELK Stack, Elasticsearch, GCP, GCP Operations Suite, Grafana, Ignite, Kafka, Kubernetes, Monitoring, PowerShell, Prometheus, RabbitMQ, Redis, Site Reliability Engineering, Splunk, SQL, Python
