Site Reliability Engineer Focused on Systems Resilience
Newton
Remote · Canada · Full-time · Mid Level · Posted 4d ago
About the Role
Join as a Site Reliability Engineer to improve service reliability and operational efficiency. You will work with a remote team committed to innovative solutions and continuous improvement of its cloud environment. In this role, you will refine system design and directly shape operational outcomes. From managing incident response to leading post-mortems, your work will keep the infrastructure scalable and resilient. You will play an essential part in maintaining service availability and improving system observability.
Key Responsibilities
- Enhance infrastructure performance and scalability
- Manage incidents and automate manual operational tasks
- Provide on-call support and respond to alerts
- Define SLIs, SLOs, and error budgets
- Improve monitoring, alerting, and documentation
Requirements
- Proven experience with AWS or similar cloud platforms
- Skilled in chaos engineering techniques
- Ability to debug live systems effectively
- Experience with programming and scripting languages
- Proactive mindset in a dynamic work environment
Additional Information
Drive innovation and excellence while ensuring operational readiness as the system grows.
Skills
AWS