
AWS Cloud Engineering Ops Lead (Application Support)

Conglomerate-IT

Atlanta · On-site · Contract · Lead · Posted 1 month ago

About the role

Our Mission

Keep our AWS platforms and customer-facing apps available, observable, recoverable, secure, and cost‑sensible. Make the runbook path the easiest path, so on-call engineers stay calm and releases stay predictable.

Scope of the role

  • AWS operations: EC2, EKS, RDS, ALB/CloudFront, IAM/OIDC, VPC/TGW/SGs, patching, and hygiene.
  • Application support: release readiness, runbooks, post-deploy smoke checks, performance baselines, and clean rollback paths.
  • Visibility: dashboards, logs, metrics, traces, synthetics, error budgets, and alert health.
  • Backup & DR: policies, schedules, retention, cross-region copies, restore testing, and DR runbooks (RPO/RTO owned and measured).
  • Incident leadership: run Sev‑1/2 bridges, keep comms clear, and land post‑mortems with actions that actually close.
  • Cost hygiene: tagging, right-sizing, Savings Plans/Reserved Instances (SP/RI) coverage, and lifecycle cleanups (unattached EBS volumes, idle Elastic IPs, stale AMIs).
  • Team enablement: guardrails, golden runbooks, and small automations that remove toil.
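The lifecycle cleanups under cost hygiene usually start with a report of candidates. Below is a minimal Python sketch. It assumes the input dicts have the shape returned by boto3's EC2 `describe_volumes` and `describe_addresses` calls. In that shape, an EBS volume in the `available` state is not attached to any instance, and an Elastic IP without an `AssociationId` is not associated with anything. The function names are hypothetical, and the sketch only reports candidates; it deletes nothing.

```python
from typing import Any


def unattached_volumes(describe_volumes_resp: dict[str, Any]) -> list[str]:
    """IDs of EBS volumes in the 'available' state, i.e. not attached to an instance."""
    return [
        vol["VolumeId"]
        for vol in describe_volumes_resp.get("Volumes", [])
        if vol.get("State") == "available"
    ]


def unassociated_eips(describe_addresses_resp: dict[str, Any]) -> list[str]:
    """Allocation IDs of Elastic IPs that have no association (idle addresses)."""
    return [
        addr.get("AllocationId", addr.get("PublicIp", "unknown"))
        for addr in describe_addresses_resp.get("Addresses", [])
        if "AssociationId" not in addr
    ]


# Hand-written sample data in the boto3 response shape (not real account data).
volumes = {"Volumes": [
    {"VolumeId": "vol-aaa", "State": "in-use"},
    {"VolumeId": "vol-bbb", "State": "available"},
]}
addresses = {"Addresses": [
    {"AllocationId": "eipalloc-111", "AssociationId": "eipassoc-1"},
    {"AllocationId": "eipalloc-222"},
]}
print(unattached_volumes(volumes))   # ['vol-bbb']
print(unassociated_eips(addresses))  # ['eipalloc-222']
```

Against a live account, you would pass the output of `boto3.client("ec2").describe_volumes()` instead. For large fleets, use the client's paginator so no pages are missed.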

Day‑to‑day (what this looks like)

  • Triage overnight alerts and hot issues, set priorities, and make sure owners are clear.
  • Keep dashboards honest; fix flapping or missing alerts before they wake people up.
  • Check backups and recent restore points; open tickets for any gaps and track to done.
  • Unblock releases; verify smoke checks; keep environments tidy and predictable.
  • Lead or delegate break/fix; no lingering “mystery” incidents.
  • Write down what we learned in the runbook so the next person can fix it faster.
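Fixing flapping alerts starts with spotting them. One common heuristic flags an alert whose state changed more than N times inside a sliding time window. This is an assumption for illustration, not a stated team policy. The one-hour window and four-transition threshold below are hypothetical defaults to tune.

```python
from datetime import datetime, timedelta


def is_flapping(
    transitions: list[datetime],
    window: timedelta = timedelta(hours=1),
    max_transitions: int = 4,
) -> bool:
    """True if more than `max_transitions` state changes fall within any `window`."""
    ts = sorted(transitions)
    start = 0
    for end in range(len(ts)):
        # Advance the left edge until the window spans at most `window`.
        while ts[end] - ts[start] > window:
            start += 1
        if end - start + 1 > max_transitions:
            return True
    return False


base = datetime(2024, 1, 1, 3, 0)
noisy = [base + timedelta(minutes=2 * i) for i in range(6)]  # 6 flips in 10 minutes
steady = [base + timedelta(hours=3 * i) for i in range(6)]   # 6 flips over 15 hours
print(is_flapping(noisy), is_flapping(steady))  # True False
```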

Weekly rhythm

  • Ops review: incidents, alerts, deploys, costs, capacity, and backup status in one short readout.
  • Observability tune‑up: delete noise, add missing signals, and run a synthetic check from the edge.
  • Backup/DR: run a small restore test and record RPO/RTO evidence.
  • Patch and change review: what shipped, what rolled back, why.
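Recording RPO/RTO evidence from a restore test comes down to two subtractions:

- **Achieved RPO** is the gap between the simulated failure and the recovery point actually restored.
- **Achieved RTO** is the time from the failure until the restored service passes its smoke checks.

Below is a minimal Python sketch. The field names and the 1-hour/2-hour targets are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RestoreTest:
    failure_time: datetime      # when the (simulated) failure was declared
    recovery_point: datetime    # timestamp of the snapshot/backup that was restored
    service_restored: datetime  # when smoke checks passed on the restored system

    @property
    def achieved_rpo(self) -> timedelta:
        # Data-loss window: writes after the recovery point are lost.
        return self.failure_time - self.recovery_point

    @property
    def achieved_rto(self) -> timedelta:
        # Downtime: failure until the restored service is verified.
        return self.service_restored - self.failure_time


def meets_targets(test: RestoreTest, rpo: timedelta, rto: timedelta) -> bool:
    """True if both achieved values are within their targets."""
    return test.achieved_rpo <= rpo and test.achieved_rto <= rto


run = RestoreTest(
    failure_time=datetime(2024, 5, 6, 12, 0),
    recovery_point=datetime(2024, 5, 6, 11, 45),
    service_restored=datetime(2024, 5, 6, 13, 30),
)
print(run.achieved_rpo, run.achieved_rto)  # 0:15:00 1:30:00
print(meets_targets(run, rpo=timedelta(hours=1), rto=timedelta(hours=2)))  # True
```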

Monthly outcomes

  • Share availability/SLOs, MTTR, change failure rate, observability coverage, backup compliance, and costs in plain English.
  • Close the top recurring issues (noisy alerts, flaky deploys).
  • Refresh the most‑used runbooks; validate DR for one critical workload (tabletop or live restore).
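The monthly readout is easier to trust when the numbers are computed the same way every time. Below is a minimal Python sketch built on two assumptions, each of which a team would pin down for itself:

- MTTR means mean time to restore, measured from detection to resolution.
- A failed change is any production deploy that needed a rollback, a hotfix, or an incident.

```python
from datetime import datetime, timedelta


def mttr(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Mean of (resolved - detected) across incidents; zero if there were none."""
    if not incidents:
        return timedelta(0)
    total = sum((resolved - detected for detected, resolved in incidents), timedelta(0))
    return total / len(incidents)


def change_failure_rate(total_changes: int, failed_changes: int) -> float:
    """Share of production changes that failed; 0.0 when nothing shipped."""
    return failed_changes / total_changes if total_changes else 0.0


t0 = datetime(2024, 6, 1, 9, 0)
incidents = [
    (t0, t0 + timedelta(minutes=30)),
    (t0 + timedelta(days=3), t0 + timedelta(days=3, minutes=90)),
]
print(mttr(incidents))                       # 1:00:00
print(f"{change_failure_rate(40, 3):.1%}")  # 7.5%
```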

Core responsibilities

  • Own production readiness and stability for assigned AWS accounts and apps.
  • Lead incidents and land post‑mortems; make the fixes stick.
  • Keep monitoring/logging/tracing standards real; enforce SLOs and error budgets.
  • Own backup strategy end-to-end, including monthly restore tests and DR docs.
  • Keep access least‑privileged and auditable; rotate secrets and certs on time.
  • Drive cost posture and mentor the team; make on-call humane.
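Enforcing an error budget starts with turning the SLO into allowed downtime: budget = (1 - SLO) × period. For example, a 99.9% availability SLO over 30 days allows about 43.2 minutes of downtime (0.001 × 43,200 minutes). Below is a minimal Python sketch. The 30-day default period and the sample numbers are assumptions.

```python
from datetime import timedelta


def error_budget(slo: float, period: timedelta = timedelta(days=30)) -> timedelta:
    """Downtime allowed by an availability SLO (e.g. 0.999) over `period`."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction between 0 and 1")
    return period * (1.0 - slo)


def budget_remaining(slo: float, downtime: timedelta,
                     period: timedelta = timedelta(days=30)) -> float:
    """Fraction of the budget left (negative once the budget is blown)."""
    return 1.0 - downtime / error_budget(slo, period)


budget = error_budget(0.999)
print(round(budget.total_seconds() / 60, 1))  # 43.2
print(round(budget_remaining(0.999, timedelta(minutes=21, seconds=36)), 2))  # 0.5
```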

What “good” looks like

  • Visibility: one clear dashboard per service, clean alert routing, low false positives.
  • Backups: 100% of jobs green (or successfully retried), documented RPO/RTO, and monthly restore tests that pass.
  • Reliability: MTTR trending down; most issues solved by the first responder with a runbook.
  • Change: predictable releases with smoke and rollback; fewer failed changes month over month.
  • Cost: flat or down relative to growth; tag coverage at or above 95%.
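Tag coverage can be measured as the share of resources that carry every required tag key. Below is a minimal Python sketch. The required keys (`Owner`, `Environment`, `CostCenter`) are a hypothetical policy, and resource tags are modeled as plain dicts rather than any specific AWS API response.

```python
REQUIRED_TAGS = frozenset({"Owner", "Environment", "CostCenter"})  # hypothetical policy


def tag_coverage(resource_tags: list[dict[str, str]],
                 required: frozenset[str] = REQUIRED_TAGS) -> float:
    """Fraction of resources whose tags include every required key (1.0 if none)."""
    if not resource_tags:
        return 1.0
    # issubset() iterates the dict, i.e. checks the tag keys.
    compliant = sum(1 for tags in resource_tags if required.issubset(tags))
    return compliant / len(resource_tags)


fleet = [{"Owner": "ops", "Environment": "prod", "CostCenter": "42"}] * 19 + [{"Owner": "ops"}]
cov = tag_coverage(fleet)
print(f"{cov:.0%}", cov >= 0.95)  # 95% True
```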

Minimum Experience Required

  • 8–10+ years in cloud/app operations with strong AWS hands-on experience.
  • Comfortable leading incidents, shaping dashboards and alerts, and automating the boring bits (Terraform, Ansible, Python).
  • Experience running backups/DR in AWS and proving it with real restore tests.
  • Cloud networking experience (e.g., VPC, Transit Gateway, security groups).

Preferred Experience

  • AWS Certified Solutions Architect certification
  • Any professional networking certifications
  • ITIL Certification

Benefits

  • Health insurance

Work Location

In person

Skills

AWS, Ansible, Cloud network, EC2, EKS, IAM/OIDC, Python, RDS, Terraform, VPC/TGW/SGs
