Sr. Infra DevOps Engineer

TestingXperts

Toronto · Hybrid · Contract · Senior · Posted 1mo ago


About the role

ROLE OVERVIEW & KEY RESPONSIBILITIES:

Infrastructure Operations & On-Call
- Own on-call rotation for infrastructure-layer incidents; manage EKS cluster health, node scaling, networking, and availability; perform RCAs for infra failures.
CI/CD Pipeline Management
- Operate and maintain GitHub Actions pipelines; manage Argo CD GitOps deployments across dev, QA, and production; handle pipeline failures and improve reliability.
DORA Metrics — Infrastructure Lens
- Track Lead Time for Changes and Deployment Frequency at the infrastructure level; identify pipeline bottlenecks and drive continuous improvement.
Infrastructure as Code (IaC)
- Write and maintain OpenTofu/Terraform scripts for AWS infrastructure provisioning; manage EKS, VPC, IAM roles, S3, RDS, and networking configurations.
Kong API Gateway Operations
- Administer Kong instances (K8s-deployed); manage plugins, routing policies, rate limits, JWT auth configuration, and gateway health monitoring.
Security & Compliance Operations
- Manage IAM roles and service account roles (SAR); rotate credentials and secrets; ensure SSDLC compliance for all infra changes; coordinate security reviews.
Cost & Capacity Management
- Monitor AWS spend; identify and act on cost optimization opportunities; manage resource right-sizing; report on infrastructure cost per service.
Artifactory & Tooling
- Operate Artifactory for image and artifact management; manage registry access controls; ensure pipeline dependencies are pinned and auditable.

REQUIRED SKILLS & EXPERIENCE:

AWS Infrastructure (Strong)
- 5+ years of hands-on AWS experience: EKS, EC2, VPC, IAM, S3, RDS, CloudWatch, Route53
- Strong Kubernetes administration: cluster setup, node groups, namespaces, RBAC, Helm charts
- Experience with AWS networking: VPC design, subnets, NAT gateways, security groups, peering
- Familiarity with AWS cost management tools and FinOps practices
DevOps & CI/CD (Core Competency)
- Deep experience with GitHub Actions or equivalent CI/CD platforms
- Hands-on Argo CD or Flux GitOps — deployment strategies, rollback, progressive delivery
- Container image management: Docker, Artifactory or ECR, image scanning
- Experience with secret management: HashiCorp Vault, AWS Secrets Manager, or equivalent
Infrastructure as Code
- Proficient with Terraform or OpenTofu — modules, state management, remote backends
- Experience writing IaC for EKS, VPC, and IAM from scratch
- Familiarity with Helm chart authoring and management
Operational Excellence (Core Competency)
- DORA metrics tracking at the infrastructure and pipeline level
- Experience running on-call rotations with structured incident management
- Runbook authoring for infrastructure failure modes
- SRE principles: error budgets, toil reduction, reliability engineering

Nice to Have:

- Kong API Gateway: plugin configuration, decK/declarative config, Admin API
- OpenStack experience (reference: existing Kong test instance)
- Multi-cloud exposure (GCP or Azure) alongside AWS as the primary cloud
- Familiarity with Langfuse, Temporal, or data pipeline infrastructure

OpEx Ownership:

This role owns the Lead Time for Changes and pipeline reliability metrics, as well as the monthly cloud cost report. Targets: pipeline success rate > 95%; infrastructure incident MTTR < 1 hour; zero unplanned infra outages per sprint.

Skills

AWS, Argo CD, Artifactory, CloudWatch, Docker, EC2, ECR, EKS, FinOps, GCP, GitHub Actions, HashiCorp Vault, Helm, IAM, Kong, Kubernetes, Langfuse, OpenStack, OpenTofu, RDS, Route53, S3, Security, SRE, Terraform, Temporal, VPC
