DevOps Infrastructure Engineer

Rivago Infotech Inc

Toronto · Hybrid · Full-time · Senior · 2w ago

About the role

ROLE OVERVIEW & KEY RESPONSIBILITIES

  • Infrastructure Operations & On-Call: Own the on-call rotation for infrastructure-layer incidents; manage EKS cluster health, node scaling, networking, and availability; perform RCAs for infra failures.
  • CI/CD Pipeline Management: Operate and maintain GitHub Actions pipelines; manage Argo CD GitOps deployments across dev, QA, and production; handle pipeline failures and improve reliability.
  • DORA Metrics (Infrastructure Lens): Track Lead Time for Changes and Deployment Frequency at the infrastructure level; identify pipeline bottlenecks and drive continuous improvement.
  • Infrastructure as Code (IaC): Write and maintain OpenTofu/Terraform code for AWS infrastructure provisioning; manage EKS, VPC, IAM roles, S3, RDS, and networking configurations.
  • Kong API Gateway Operations: Administer Kong instances (K8s-deployed); manage plugins, routing policies, rate limits, JWT auth configuration, and gateway health monitoring.
  • Security & Compliance Operations: Manage IAM roles and service account roles (SAR); rotate credentials and secrets; ensure SSDLC compliance for all infra changes; coordinate security reviews.
  • Cost & Capacity Management: Monitor AWS spend; identify and act on cost optimization opportunities; manage resource right-sizing; report on infrastructure cost per service.
  • Artifactory & Tooling: Operate Artifactory for image and artifact management; manage registry access controls; ensure pipeline dependencies are pinned and auditable.

AWS Infrastructure (Strong)

  • 5+ years of hands-on AWS experience: EKS, EC2, VPC, IAM, S3, RDS, CloudWatch, Route53
  • Strong Kubernetes administration: cluster setup, node groups, namespaces, RBAC, Helm charts
  • Experience with AWS networking: VPC design, subnets, NAT gateways, security groups, peering
  • Familiarity with AWS cost management tools and FinOps practices

DevOps & CI/CD (Core Competency)

  • Deep experience with GitHub Actions or equivalent CI/CD platforms
  • Hands-on Argo CD or Flux GitOps — deployment strategies, rollback, progressive delivery
  • Container image management: Docker, Artifactory or ECR, image scanning
  • Experience with secret management: HashiCorp Vault, AWS Secrets Manager, or equivalent

Infrastructure as Code

  • Proficient with Terraform or OpenTofu — modules, state management, remote backends
  • Experience writing IaC for EKS, VPC, and IAM from scratch
  • Familiarity with Helm chart authoring and management

Operational Excellence (Core Competency)

  • DORA metrics tracking at the infrastructure and pipeline level
  • Experience running on-call rotations with structured incident management
  • Runbook authoring for infrastructure failure modes
  • SRE principles: error budgets, toil reduction, reliability engineering

Nice to Have

  • Kong API Gateway — plugin configuration, deck/declarative config, admin API
  • OpenStack experience (reference: existing Kong test instance)
  • Multi-cloud exposure (GCP or Azure) alongside AWS as the primary cloud
  • Familiarity with Langfuse, Temporal, or data pipeline infrastructure

OpEx Ownership

This role owns Lead Time for Changes and pipeline reliability metrics. Targets: pipeline success rate > 95%; infrastructure incident MTTR < 1 hour; zero unplanned infra outages per sprint. The role also owns the monthly cloud cost report.

Skills

AWS · Argo CD · Artifactory · CloudWatch · Docker · EC2 · ECR · EKS · FinOps · GCP · GitHub Actions · Helm · IAM · Kong · Kubernetes · Langfuse · OpenStack · OpenTofu · RDS · Route53 · S3 · Terraform · Temporal · VPC · Vault
