Infrastructure & SRE Engineer

Intuitive.ai

Canada · On-site Full-time Today

About the role

**About us:**

Intuitive is an **innovation-led engineering company delivering business outcomes** for hundreds of enterprises globally. With the reputation of being a **Tiger Team** and a **Trusted Partner** of enterprise technology leaders, we help solve the most complex Digital Transformation challenges across the following Intuitive Superpowers:

**Modernization & Migration**
• Application & Database Modernization
• Platform Engineering (IaC/EaC, DevSecOps & SRE)
• Cloud Native Engineering, Migration to Cloud, VMware Exit
• FinOps

**Data & AI/ML**
• Data (Cloud Native / Databricks / Snowflake)
• Machine Learning, AI/GenAI

**Cybersecurity**
• Infrastructure Security
• Application Security
• Data Security
• AI/Model Security

**SDx & Digital Workspace (M365, G-suite)**
• SDDC, SD-WAN, SDN, NetSec, Wireless/Mobility
• Email, Collaboration, Directory Services, Shared Files Services

**Intuitive Services:**
• Professional and Advisory Services
• Elastic Engineering Services
• Managed Services
• Talent Acquisition & Platform Resell Services

**About the job:**

**Title:** Infrastructure & SRE Engineer

**Start Date:** Immediately

**# of Positions:** 1

**Position Type:** Full Time / Contract

**Location:** Remote across Canada (occasional travel to USA)

**About the Role:**

The Staff Infrastructure & SRE Engineer will own the full lifecycle of our cloud-native platform — from provisioning and sizing AWS and Kubernetes infrastructure, to maintaining reliability through observability, release engineering, and incident response. This is a deeply hands-on engineering role with real production ownership, where you'll balance technical depth with operational leadership to keep our platform reliable and scalable.

You will write Terraform, Python, and Shell scripts daily, manage EKS clusters at scale, integrate applications into APM and monitoring systems, and enforce DevOps best practices including change control and uptime monitoring. Your focus will be on platform reliability and operational excellence: building the automation, observability, and infrastructure-as-code foundations that make our cloud platform programmable, observable, and resilient. We value engineers who automate relentlessly, own their systems end-to-end, and drive reliability improvements through data and discipline.

**Key Responsibilities**

**As a Staff Infrastructure & SRE Engineer, you will:**

• Own AWS infrastructure provisioning and operations, ensuring production reliability across VPCs, EC2, RDS, S3, IAM, Route 53, and ALB/NLB in multi-account environments following AWS Well-Architected Framework principles; implement cost optimization, right-sizing, and resource tagging strategies
• Lead Kubernetes platform operations end-to-end, from provisioning EKS clusters from scratch through full lifecycle management: sizing and capacity planning with Cluster Autoscaler/Karpenter, version upgrades, node group rotations, and breaking-change migrations
• Drive infrastructure as code excellence, setting standards for Terraform/OpenTofu module development with automated testing (terratest, plan validation), reliable state management with remote backends, and governance enforcement through policy checks (OPA/Rego, tflint)
• Own end-to-end observability and APM integration, ensuring full visibility across infrastructure and applications: design monitoring frameworks with Prometheus, Grafana, Loki, Tempo, and OpenTelemetry; instrument applications for distributed tracing and structured logging; define and track SLIs/SLOs for platform services
• Lead release engineering and change control from planning through production deployment: coordinate infrastructure and application releases with rollback plans, validation gates, maintenance windows, and audit trails for all production changes
• Drive incident response and platform reliability, building on-call rotations, escalation paths, actionable runbooks, and blameless postmortem processes; implement chaos engineering practices to proactively identify platform weaknesses
• Own environment provisioning pipelines, ensuring repeatable, automated infrastructure delivery from bare AWS accounts to fully operational platforms across dev, staging, and production
• Build GitOps workflows, implementing ArgoCD or Flux for declarative cluster and application management and ensuring all changes flow through Git with PR-based review and automated validation
• Develop automation and tooling, writing Python CLI tools, Bash scripts, and CI/CD pipelines (GitLab CI/GitHub Actions) for infrastructure provisioning, deployment, health chec
