
SRE

Raas Infotek

Fulford Harbour · Hybrid · Full-time · Senior · Posted today

About the role


This role owns reliability, resiliency, and operational excellence for mission-critical AWS serverless platforms. The goal is high availability, low MTTR (mean time to recovery), and strong production governance, using Dynatrace-driven observability. Key focus areas:

  • Resiliency strategy for serverless architectures (Lambda, API Gateway, async/event-driven systems)
  • SLOs / SLIs / error budgets for critical APIs
  • Incident analysis and post-incident reviews
  • Dynatrace observability: dashboards, alert tuning, dependency mapping, RCA acceleration
  • Operational excellence improvements: incident reduction, MTTR improvement, toil automation
  • Reliability guardrails embedded into CI/CD and production readiness reviews

Core Responsibilities

  • Design and enforce resiliency patterns: timeouts, retries, circuit breakers, throttling, graceful degradation
  • Lead major incidents and drive actionable RCAs that result in lasting fixes
  • Build signal-driven alerts aligned to SLOs, with a focus on noise reduction
  • Enable automation and self-healing where feasible

Required Experience

  • 5-6+ years in SRE/DevOps/Production Engineering
  • Deep hands-on experience with AWS serverless (Lambda, API Gateway, SQS/SNS, DynamoDB/RDS)
  • Strong expertise in Dynatrace for serverless monitoring & triage
  • Proven success improving availability, MTTR, and incident trends
  • Solid coding/scripting (Python / Java / Node.js)

Skills

API Gateway · AWS Lambda · CI/CD · DynamoDB · Dynatrace · Java · Node.js · Python · RDS · SRE · SQS · SNS
