Site Reliability Engineer (SRE)

Remote · Canada · Full-time · Mid Level · $50k – $70k/yr · Posted today

About the role

About Us

We’re hiring an SRE who takes production personally: someone who loses sleep over p99 latency, gets excited about runbook automation, and believes on-call should be boring because the systems are already resilient, observable, and well designed.

This role is for an engineer who thrives at the intersection of infrastructure, automation, reliability, and developer experience. You’ll work across cloud platforms, CI/CD systems, distributed applications, and production operations to ensure our systems remain scalable, secure, and highly available.

What You’ll Do

  • Lead reliability and operational excellence initiatives across our cloud infrastructure spanning AWS, Azure, GCP, and hybrid/private cloud environments
  • Design, implement, and maintain scalable infrastructure using Terraform, Ansible, and infrastructure-as-code best practices
  • Own and improve production systems running on AWS services including ECS, Fargate, Lambda, Aurora MySQL, RDS, ElastiCache, and S3
  • Maintain healthy, observable deployments across Azure App Services and Netlify environments
  • Manage and optimize Cloudflare configurations including WAF, DNS, caching, Workers, and edge security policies
  • Build and improve CI/CD pipelines using GitHub Actions, Jenkins, and related tooling with a focus on deployment safety, rollback strategies, and release velocity
  • Define and enforce SLOs, SLAs, error budgets, monitoring standards, and incident response processes
  • Drive postmortems that produce measurable operational improvements — not just documentation
  • Develop automation tools and scripts using Python, Bash, Go, PowerShell, or Ruby to reduce manual operational work
  • Manage and support Kubernetes and Docker-based containerized environments for microservices architectures
  • Monitor system performance, troubleshoot production issues proactively, and optimize availability, latency, and scalability
  • Collaborate closely with engineering teams to design resilient systems and improve application reliability from development through production
  • Support secure cloud operations through implementation of access controls, firewalls, VPNs, and infrastructure security best practices
  • Maintain clear operational documentation, runbooks, and architecture standards
  • Participate in incident response rotations and reliability planning initiatives

What We’re Looking For

  • 3–5 or more years of hands-on experience in Site Reliability Engineering, Platform Engineering, DevOps, or Infrastructure Engineering
  • Strong expertise in AWS and production experience with ECS, Lambda, managed databases, and cloud-native architectures
  • Experience working with Azure and/or GCP environments in production
  • Strong knowledge of Kubernetes, Docker, and microservices-based systems
  • Experience with Infrastructure as Code and configuration management tools such as Terraform, Ansible, or Puppet
  • Solid Linux/Unix systems administration skills; Windows Server experience is a plus
  • Strong scripting and automation experience with Python, Bash, Go, PowerShell, or Ruby
  • Experience building and maintaining CI/CD pipelines using GitHub Actions, Jenkins, or similar tools
  • Experience configuring and debugging Cloudflare in production environments — beyond basic DNS management
  • Familiarity with observability and monitoring practices including metrics, logging, tracing, and alerting systems
  • Experience with relational and NoSQL databases including MySQL, PostgreSQL, MongoDB, Cassandra, or similar technologies
  • Understanding of distributed systems, REST APIs, SOA, and modern application deployment practices
  • Ability to read and understand application codebases (Node.js, Next.js, or similar) and evaluate infrastructure implications
  • Strong communication skills across engineering teams, leadership stakeholders, and incident response channels

Nice to Have

  • Experience with private cloud or virtualization platforms such as OpenStack, VMware, Citrix, or VirtualBox
  • Familiarity with SaaS/PaaS environments and large-scale distributed systems
  • Exposure to security engineering, edge networking, or performance optimization
  • Experience supporting high-traffic production environments with strict uptime requirements
  • Background in Agile development environments and SDLC best practices

Why Join Us

You’ll have the opportunity to work on mission-critical systems using modern cloud-native technologies while shaping the reliability culture of the organization. We value engineers who automate relentlessly, think systematically, and care deeply about operational excellence.

Compensation

Pay: $50,000.00-$70,000.00 per year

Work Location

Remote

Skills

Ansible, AWS, AWS Aurora MySQL, AWS ECS, AWS ElastiCache, AWS Fargate, AWS Lambda, AWS RDS, AWS S3, Azure, Bash, CI/CD, Cloudflare, Docker, GCP, GitHub Actions, Go, Infrastructure as Code, Jenkins, Kubernetes, Linux, Microservices, MongoDB, MySQL, Netlify, Node.js, Observability, PostgreSQL, PowerShell, Python, Ruby, Terraform, Unix, Virtualization, VMware, Windows Server
