Sr. Infra DevOps Engineer (Toronto, ON, Hybrid)
TestingXperts
About the role
Title: Sr. Infra DevOps Engineer
Duration: 6+ months
Location: Toronto, ON (Hybrid)
Job Description:
Platform Infrastructure, CI/CD, EKS Operations, IaC & Cloud Cost Management
ROLE OVERVIEW & KEY RESPONSIBILITIES:
• Infrastructure Operations & On-Call: Own the on-call rotation for infrastructure-layer incidents; manage EKS cluster health, node scaling, networking, and availability; perform RCAs for infra failures.
• CI/CD Pipeline Management: Operate and maintain GitHub Actions pipelines; manage Argo CD GitOps deployments across dev, QA, and production; handle pipeline failures and improve reliability.
• DORA Metrics (Infrastructure Lens): Track Lead Time for Changes and Deployment Frequency at the infrastructure level; identify pipeline bottlenecks and drive continuous improvement.
• Infrastructure as Code (IaC): Write and maintain OpenTofu/Terraform scripts for AWS infrastructure provisioning; manage EKS, VPC, IAM roles, S3, RDS, and networking configurations.
• Kong API Gateway Operations: Administer Kong instances (K8s-deployed); manage plugins, routing policies, rate limits, JWT auth configuration, and gateway health monitoring.
• Security & Compliance Operations: Manage IAM roles and service account roles (SAR); rotate credentials and secrets; ensure SSDLC compliance for all infra changes; coordinate security reviews.
• Cost & Capacity Management: Monitor AWS spend; identify and act on cost optimization opportunities; manage resource right-sizing; report on infrastructure cost per service.
• Artifactory & Tooling: Operate Artifactory for image and artifact management; manage registry access controls; ensure pipeline dependencies are pinned and auditable.
REQUIRED SKILLS & EXPERIENCE:
AWS Infrastructure (Strong)
• 5+ years of hands-on AWS experience: EKS, EC2, VPC, IAM, S3, RDS, CloudWatch, Route53
• Strong Kubernetes administration: cluster setup, node groups, namespaces, RBAC, Helm charts
• Experience with AWS networking: VPC design, subnets, NAT gateways, security groups, peering
• Familiarity with AWS cost management tools and FinOps practices
DevOps & CI/CD (Core Competency)
• Deep experience with GitHub Actions or equivalent CI/CD platforms
• Hands-on Argo CD or Flux GitOps experience: deployment strategies, rollback, progressive delivery
• Container image management: Docker, Artifactory or ECR, image scanning
• Experience with secret management: HashiCorp Vault, AWS Secrets Manager, or equivalent
Infrastructure as Code
• Proficient with Terraform or OpenTofu — modules, state management, remote backends
• Experience writing IaC for EKS, VPC, and IAM from scratch
• Familiarity with Helm chart authoring and management
Operational Excellence (Core Competency)
• DORA metrics tracking at the infrastructure and pipeline level
• Experience running on-call rotations with structured incident management
• Runbook authoring for infrastructure failure modes
• SRE principles: error budgets, toil reduction, reliability engineering
Nice to Have:
• Kong API Gateway — plugin configuration, deck/declarative config, admin API
• OpenStack experience (reference: existing Kong test instance)
• Multi-cloud exposure (GCP or Azure) alongside AWS as the primary cloud
• Familiarity with Langfuse, Temporal, or data pipeline infrastructure
OpEx Ownership: This role owns Lead Time for Changes and pipeline reliability metrics, as well as the monthly cloud cost report. Targets: pipeline success rate > 95%; infrastructure incident MTTR < 1 hour; zero unplanned infra outages per sprint.