DevOps Infrastructure Engineer
Rivago Infotech Inc
Toronto · Hybrid · Full-time · Senior · 2w ago
About the role
ROLE OVERVIEW & KEY RESPONSIBILITIES
- Infrastructure Operations & On-Call: Own the on-call rotation for infrastructure-layer incidents; manage EKS cluster health, node scaling, networking, and availability; perform root cause analyses (RCAs) for infrastructure failures.
- CI/CD Pipeline Management: Operate and maintain GitHub Actions pipelines; manage Argo CD GitOps deployments across dev, QA, and production; handle pipeline failures and improve reliability.
- DORA Metrics (Infrastructure Lens): Track Lead Time for Changes and Deployment Frequency at the infrastructure level; identify pipeline bottlenecks and drive continuous improvement.
- Infrastructure as Code (IaC): Write and maintain OpenTofu/Terraform code for AWS infrastructure provisioning; manage EKS, VPC, IAM roles, S3, RDS, and networking configurations.
- Kong API Gateway Operations: Administer Kong instances (K8s-deployed); manage plugins, routing policies, rate limits, JWT auth configuration, and gateway health monitoring.
- Security & Compliance Operations: Manage IAM roles and service account roles (SAR); rotate credentials and secrets; ensure SSDLC compliance for all infra changes; coordinate security reviews.
- Cost & Capacity Management: Monitor AWS spend; identify and act on cost optimization opportunities; manage resource right-sizing; report on infrastructure cost per service.
- Artifactory & Tooling: Operate Artifactory for image and artifact management; manage registry access controls; ensure pipeline dependencies are pinned and auditable.
AWS Infrastructure (Strong)
- 5+ years of hands-on AWS experience: EKS, EC2, VPC, IAM, S3, RDS, CloudWatch, Route53
- Strong Kubernetes administration: cluster setup, node groups, namespaces, RBAC, Helm charts
- Experience with AWS networking: VPC design, subnets, NAT gateways, security groups, peering
- Familiarity with AWS cost management tools and FinOps practices
DevOps & CI/CD (Core Competency)
- Deep experience with GitHub Actions or equivalent CI/CD platforms
- Hands-on Argo CD or Flux GitOps: deployment strategies, rollback, progressive delivery
- Container image management: Docker, Artifactory or ECR, image scanning
- Experience with secret management: HashiCorp Vault, AWS Secrets Manager, or equivalent
Infrastructure as Code
- Proficient with Terraform or OpenTofu: modules, state management, remote backends
- Experience writing IaC for EKS, VPC, and IAM from scratch
- Familiarity with Helm chart authoring and management
Operational Excellence (Core Competency)
- DORA metrics tracking at the infrastructure and pipeline level
- Experience running on-call rotations with structured incident management
- Runbook authoring for infrastructure failure modes
- SRE principles: error budgets, toil reduction, reliability engineering
Nice to Have
- Kong API Gateway: plugin configuration, decK/declarative config, Admin API
- OpenStack experience (reference: existing Kong test instance)
- Multi-cloud exposure (GCP or Azure) alongside AWS as the primary cloud
- Familiarity with Langfuse, Temporal, or data pipeline infrastructure
OpEx Ownership
This role owns Lead Time for Changes and pipeline reliability metrics. Targets: pipeline success rate > 95%; infrastructure incident MTTR < 1 hour; zero unplanned infra outages per sprint. The role also owns the monthly cloud cost report.
Skills
AWS, Argo CD, Artifactory, CloudWatch, Docker, EC2, ECR, EKS, FinOps, GCP, GitHub Actions, Helm, IAM, Kong, Kubernetes, Langfuse, OpenStack, OpenTofu, RDS, Route53, S3, Terraform, Temporal, VPC, Vault