Site Reliability Engineer Focused on Systems Resilience

Newton

Remote · Canada · Full-time · Mid Level · 4d ago

About the role

Join us as a Site Reliability Engineer to improve service reliability and operational efficiency. You will work with a remote team focused on continuous improvement of the cloud environment. In this role you will refine system design, manage incident response, and lead post-mortems so that the infrastructure stays scalable and resilient. You will also play an essential part in maintaining service availability and improving system observability.

Key Responsibilities

  • Enhance infrastructure performance and scalability
  • Manage incidents and automate manual practices
  • Respond to alerts with on-call support
  • Define SLIs, SLOs, and error budgets
  • Improve monitoring, alerting, and documentation

Requirements

  • Proven experience with AWS or similar platforms
  • Skilled in chaos engineering techniques
  • Ability to debug live systems effectively
  • Experience with programming and scripting languages
  • Proactive mindset in a dynamic work environment

Additional Information

You will drive innovation and operational excellence, ensuring the system stays operationally ready as it grows.

Skills

AWS