Senior / Staff DevOps Engineer (Platform & Reliability)
Peerlogic
Remote · Canada · Full-time · Senior · Posted 3w ago
The Role
Peerlogic is hiring a Senior / Staff DevOps Engineer to own the platform, infrastructure, and reliability of a production system that spans application services, AI/ML workloads, and real-time voice infrastructure.
You are replacing a strong DevOps leader, not building from scratch. The system works. Your job is to make it exceptional.
This is not a support role. This is not a ticket-driven role.
What You’ll Own
Platform & Infrastructure
- End-to-end ownership of cloud + hybrid infrastructure (AWS, GCP, and physical environments)
- Multi-region architecture targeting 99.999% uptime
- Kubernetes clusters and container orchestration across all services
- CI/CD pipelines (GitHub Actions): reliability, speed, and developer experience
- Infrastructure as Code (Terraform, Ansible)
Reliability & Observability
- Design and enforce SLOs, SLIs, and error budgets
- Build a best-in-class observability stack (metrics, logs, traces)
- Drive incident response, postmortems, and systemic fixes (not band-aids)
- Reduce MTTR and eliminate repeat incidents
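For a sense of scale, here is a minimal sketch of what the 99.999% uptime target implies as an error budget. It assumes the budget is measured purely as allowed downtime over a fixed window; the function name `allowed_downtime_seconds` and the 30-day and 365-day windows are illustrative choices, not something the posting specifies.

```python
def allowed_downtime_seconds(slo: float, window_days: float) -> float:
    """Error budget: the downtime an availability SLO permits over a window.

    Hypothetical helper for illustration; the posting does not define
    how Peerlogic measures its SLOs.
    """
    seconds_in_window = window_days * 24 * 60 * 60
    return (1.0 - slo) * seconds_in_window


# Assumed windows: a rolling 30 days and a 365-day year.
budget_30d = allowed_downtime_seconds(0.99999, 30)
budget_365d = allowed_downtime_seconds(0.99999, 365)

print(f"{budget_30d:.1f} s per 30 days")          # about 25.9 s
print(f"{budget_365d / 60:.2f} min per 365 days")  # about 5.26 min
```

Under these assumptions, five nines leaves roughly 26 seconds of downtime per month, which is why the role pairs the uptime target with MTTR reduction and the elimination of repeat incidents.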
Data & Event Systems
- Ownership of event-driven architecture (RabbitMQ or equivalent)
- Ensure durability, replayability, and correctness of pipelines
- Design and maintain backfill and recovery strategies
- Improve debuggability of asynchronous systems
AI / ML Infrastructure
Skills
Ansible, AWS, CI/CD, Docker, GCP, GitHub Actions, Infrastructure as Code, Kubernetes, ML, RabbitMQ, Terraform