SRE AWS

Raas Infotek

Canada · Hybrid · Full-time · Senior · Today

About the role

Resiliency & Operational Excellence — AWS Serverless | Dynatrace

Own reliability, resiliency, and operational excellence for mission-critical AWS serverless platforms, ensuring high availability, low MTTR, and strong production governance through Dynatrace-driven observability.

  • Resiliency strategy for serverless architectures (Lambda, API Gateway, async/event-driven systems)
  • SLOs / SLIs / Error Budgets for critical APIs
  • Incident analysis and post-incident reviews
  • Dynatrace observability: dashboards, alert tuning, dependency mapping, RCA acceleration
  • Operational excellence improvements: incident reduction, MTTR improvement, toil automation
  • Reliability guardrails embedded into CI/CD and production readiness reviews

Core Responsibilities

  • Design & enforce resiliency patterns: timeouts, retries, circuit breakers, throttling, graceful degradation
  • Lead major incidents and drive actionable RCAs with sustained fixes
  • Build signal-driven alerts aligned to SLOs (noise reduction focus)
  • Enable automation & self-healing where feasible

Required Experience

  • 5-6+ years in SRE/DevOps/Production Engineering
  • Deep hands-on experience with AWS serverless (Lambda, API Gateway, SQS/SNS, DynamoDB/RDS)
  • Strong expertise in Dynatrace for serverless monitoring & triage
  • Proven success improving availability, MTTR, and incident trends
  • Solid coding/scripting (Python / Java / Node.js)

Skills

API Gateway, AWS Lambda, DynamoDB, Dynatrace, Java, Node.js, Python, RDS, SQS, SNS
