SRE – AWS

KTek Resourcing

Toronto · Hybrid · Full-time · Senior · 1mo ago

About the role

Role Overview

Own reliability, resiliency, and operational excellence for mission-critical AWS serverless platforms, ensuring high availability, low MTTR, and strong production governance through Dynatrace-driven observability.

Key Focus Areas

Resiliency strategy for serverless architectures (Lambda, API Gateway, async/event-driven systems)
SLOs / SLIs / Error Budgets for critical APIs
Incident analysis and post-incident reviews
Dynatrace observability: dashboards, alert tuning, dependency mapping, RCA acceleration
Operational excellence improvements: incident reduction, MTTR improvement, toil automation
Reliability guardrails embedded into CI/CD and production readiness reviews

Core Responsibilities

Design & enforce resiliency patterns: timeouts, retries, circuit breakers, throttling, graceful degradation
Lead major incidents and drive actionable RCAs with sustained fixes
Build signal-driven alerts aligned to SLOs (noise reduction focus)
Enable automation & self-healing where feasible

Required Experience

5–6+ years in SRE / DevOps / Production Engineering
Deep hands-on experience with AWS serverless (Lambda, API Gateway, SQS/SNS, DynamoDB/RDS)
Strong expertise in Dynatrace for serverless monitoring & triage
Proven success improving availability, MTTR, and incident trends
Solid coding/scripting (Python / Java / Node.js)

Skills

API Gateway, AWS Lambda, DynamoDB, Dynatrace, Java, Node.js, Python, RDS, SQS, SNS
