Staff Infrastructure & SRE Engineer

Intuitive.ai

Remote · Canada · Full-time · Lead · 3mo ago

About the role

About Intuitive

Intuitive is an innovation-led engineering company delivering business outcomes for hundreds of enterprises globally. With a reputation as a Tiger Team and a trusted partner of enterprise technology leaders, we help solve the most complex digital transformation challenges across the following Intuitive Superpowers:

Modernization & Migration

Application & Database Modernization
Cloud Native Engineering, Migration to Cloud, VMware Exit
FinOps

Data & AI/ML

Cybersecurity

Infrastructure Security
Application Security
Data Security

SDx & Digital Workspace (M365, G-suite)

SDDC, SD-WAN, SDN, Net Sec, Wireless/Mobility
Email, Collaboration, Directory Services, Shared Files Services

Intuitive Services:

Professional and Advisory Services
Elastic Engineering Services
Managed Services
Talent Acquisition & Platform Resell Services

About the job

Start Date: Immediately

Number of Positions: 1

Position Type: Full Time / Contract

Location: Remote across Canada (occasional travel to USA)

About the Role

The Staff Infrastructure & SRE Engineer will own the full lifecycle of our cloud-native platform — from provisioning and sizing AWS and Kubernetes infrastructure, to maintaining reliability through observability, release engineering, and incident response. This is a deeply hands‑on engineering role with real production ownership, where you'll balance technical depth with operational leadership to keep our platform reliable and scalable.

You will write Terraform, Python, and shell scripts daily, manage EKS clusters at scale, integrate applications into APM and monitoring systems, and enforce DevOps best practices including change control and uptime monitoring. Your focus will be on platform reliability and operational excellence: building the automation, observability, and infrastructure-as-code foundations that make our cloud platform programmable, observable, and resilient. We value engineers who automate relentlessly, own their systems end-to-end, and drive reliability improvements through data and discipline.

Key Responsibilities

Own AWS infrastructure provisioning and operations: ensure production reliability across VPCs, EC2, RDS, S3, IAM, Route 53, and ALB/NLB in multi-account environments, following AWS Well-Architected Framework principles; implement cost optimization, right-sizing, and resource tagging strategies
Lead Kubernetes platform operations end-to-end: provision EKS clusters from scratch and manage their full lifecycle, including sizing and capacity planning with Cluster Autoscaler/Karpenter, version upgrades, node group rotations, and breaking-change migrations
Drive infrastructure-as-code excellence: set standards for Terraform/OpenTofu module development with automated testing (Terratest, plan validation), reliable state management with remote backends, and governance enforcement through policy checks (OPA/Rego, tflint)
Own end-to-end observability and APM integration: ensure full visibility across infrastructure and applications; design monitoring frameworks with Prometheus, Grafana, Loki, Tempo, and OpenTelemetry; instrument applications for distributed tracing and structured logging; define and track SLIs/SLOs for platform services
Lead release engineering and change control from planning through production deployment: coordinate infrastructure and application releases with rollback plans, validation gates, maintenance windows, and audit trails for all production changes
Drive incident response and platform reliability: build on-call rotations, escalation paths, actionable runbooks, and blameless postmortem processes; implement chaos engineering practices to proactively identify platform weaknesses
Own environment provisioning pipelines: ensure repeatable, automated infrastructure delivery from bare AWS accounts to fully operational platforms across dev, staging, and production
Build GitOps workflows: implement ArgoCD or Flux for declarative cluster and application management, ensuring all changes flow through Git with PR-based review and automated validation
Develop automation and tooling: write Python CLI tools, Bash scripts, and CI/CD pipelines (GitLab CI/GitHub Actions) for infrastructure provisioning

Skills

AWS, ArgoCD, Bash, CI/CD, Cloud Native, Docker, EKS, Flux, Git, GitLab CI, GitHub Actions, Grafana, IAM, Kubernetes, Loki, M365, Monitoring, Net Sec, OpenTelemetry, OpenTofu, OPA, Prometheus, Python, Rego, Route 53, Shell, SDN, SDDC, SD-WAN, S3, Tempo, Terraform, Tflint, VPC, VMware, AWS Lambda, AWS Well-Architected Framework, APM, EC2, GitOps, G-suite, Karpenter, NLB, ALB, RDS, Terratest
