AWS SRE Engineer
Marks Sattin
Glasgow · On-site · Full-time · Senior · 2w ago
About the role
Overview
We’re hiring an experienced AWS SRE Engineer to lead observability for a cloud platform. The role focuses on building and maintaining actionable Grafana dashboards, defining and measuring reliability (SLIs/SLOs/SLAs), owning alerting strategy, and driving improvements to platform resilience. This is an opportunity to shape operational excellence and influence engineering decisions across the stack.
What you’ll do (key responsibilities)
- Design, build and maintain Grafana dashboards that deliver actionable insights into performance, availability and capacity.
- Implement and improve observability for AWS-hosted applications and infrastructure (metrics, logs, traces).
- Define and track SLIs, SLOs and SLAs; manage error budgets and translate reliability targets into engineering priorities.
- Monitor services using the four golden signals (latency, traffic, errors, saturation) and run an effective, low-noise alerting strategy.
- Support incident response, run RCA processes and drive continuous reliability improvements.
- Embed observability into CI/CD and cloud operations; collaborate with platform, engineering and ops teams to improve operational efficiency.
Must-have skills and experience
- 6+ years in SRE, Cloud Reliability or Cloud Operations roles.
- Strong, hands-on AWS experience.
- Proven expertise building Grafana dashboards and working in observability/monitoring stacks.
- Solid understanding of SRE fundamentals (SLA, SLO, SLI, error budgets, golden signals).
- Track record of troubleshooting production systems and improving platform reliability.
- Strong communicator and team collaborator.
Nice-to-have
- Experience with Snowflake or Databricks.
- Familiarity with IaC, automation and cloud-native operational tooling.
Skills
AWS, Grafana, SRE