Senior / Staff DevOps Engineer (Platform & Reliability)

Peerlogic

Remote · Canada · Full-time · Senior · Posted 3w ago

The Role

Peerlogic is hiring a Senior / Staff DevOps Engineer to own the platform, infrastructure, and reliability of a production system that spans application services, AI/ML workloads, and real-time voice infrastructure.

You are replacing a strong DevOps leader, not building from scratch. The system works. Your job is to make it exceptional.

This is not a support role. This is not a ticket-driven role.

What You’ll Own

Platform & Infrastructure

  • End-to-end ownership of cloud + hybrid infrastructure (AWS, GCP, and physical environments)
  • Multi-region architecture targeting 99.999% uptime
  • Kubernetes clusters and container orchestration across all services
  • CI/CD pipelines (GitHub Actions): reliability, speed, and developer experience
  • Infrastructure as Code (Terraform, Ansible)

Reliability & Observability

  • Design and enforce SLOs, SLIs, and error budgets
  • Build a best-in-class observability stack (metrics, logs, traces)
  • Drive incident response, postmortems, and systemic fixes (not band-aids)
  • Reduce MTTR and eliminate repeat incidents
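
For a sense of scale, the 99.999% uptime target and the error-budget bullet above can be turned into numbers. The sketch below is illustrative only and is not Peerlogic's tooling: the function names, the 30-day and 365-day windows, and the sample downtime figure are all assumptions made for the example.

```python
def allowed_downtime_seconds(slo: float, window_days: float) -> float:
    """Error budget: total downtime an availability SLO permits over a window.

    `slo` is a fraction (0.99999 for "five nines"); `window_days` is the
    measurement window. Both parameters are hypothetical, for illustration.
    """
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    return (1.0 - slo) * window_days * 24 * 60 * 60


def budget_remaining(slo: float, window_days: float, downtime_seconds: float) -> float:
    """Fraction of the error budget still unspent (negative once overspent)."""
    budget = allowed_downtime_seconds(slo, window_days)
    return (budget - downtime_seconds) / budget


if __name__ == "__main__":
    # Five nines over a 365-day year: roughly 315 seconds, a little over 5 minutes.
    print(round(allowed_downtime_seconds(0.99999, 365), 2))
    # The same target over a 30-day window: roughly 26 seconds.
    print(round(allowed_downtime_seconds(0.99999, 30), 2))
    # An assumed 12.96 s of downtime in that window spends about half the budget.
    print(round(budget_remaining(0.99999, 30, 12.96), 3))
```

At five nines a single incident of a few minutes can consume a full year's budget, which is why the role pairs this target with multi-region architecture and systemic fixes rather than manual recovery.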

Data & Event Systems

  • Ownership of event-driven architecture (RabbitMQ or equivalent)
  • Ensure durability, replayability, and correctness of pipelines
  • Design and maintain backfill and recovery strategies
  • Improve debuggability of asynchronous systems

AI / ML Infrastructure

Skills

Ansible, AWS, CI/CD, Docker, GCP, GitHub Actions, Infrastructure as Code, Kubernetes, ML, RabbitMQ, Terraform
