SRE
Raas Infotek
Fulford Harbour · Hybrid · Full-time · Senior · Posted today
About the role
Own reliability, resiliency, and operational excellence for mission-critical AWS serverless platforms, ensuring high availability, low MTTR, and strong production governance through Dynatrace-driven observability.
- Resiliency strategy for serverless architectures (Lambda, API Gateway, async/event-driven systems)
- SLOs / SLIs / Error Budgets for critical APIs
- Incident analysis and post-incident reviews
- Dynatrace observability: dashboards, alert tuning, dependency mapping, RCA acceleration
- Operational excellence improvements: incident reduction, MTTR improvement, toil automation
- Reliability guardrails embedded into CI/CD and production readiness reviews
Core Responsibilities
- Design & enforce resiliency patterns: timeouts, retries, circuit breakers, throttling, graceful degradation
- Lead major incidents and drive actionable RCAs with sustained fixes
- Build signal-driven alerts aligned to SLOs (noise reduction focus)
- Enable automation & self-healing where feasible
Required Experience
- 5-6+ years in SRE/DevOps/Production Engineering
- Deep hands-on experience with AWS serverless (Lambda, API Gateway, SQS/SNS, DynamoDB/RDS)
- Strong expertise in Dynatrace for serverless monitoring & triage
- Proven success improving availability, MTTR, and incident trends
- Solid coding/scripting (Python / Java / Node.js)
Skills
API Gateway, AWS Lambda, CI/CD, DynamoDB, Dynatrace, Java, Node.js, Python, RDS, SRE, SQS, SNS