Senior Site Reliability Engineer (SRE) – Cloud & Distributed Systems

Dutech Systems

Austin · On-site · Full-time · Senior · 3mo ago

About the role

Location: Austin, TX
Date Posted: 4/2/2026 2:48:14 PM
Job Number: DTS1017187676
Job Type: Contract

Skills: SRE, DevOps, AWS, GCP, Kubernetes, Docker, Python, Go, Linux, Distributed Systems, Monitoring, Logging, SLIs, SLOs, CI/CD, Observability

Job Description

We are seeking an experienced Senior Site Reliability Engineer (SRE) to design, build, and operate highly scalable and reliable cloud-based systems. The ideal candidate will have a strong background in DevOps, distributed systems, and cloud infrastructure, with a focus on automation, observability, and system reliability.

This role involves working in a fast-paced environment to ensure system uptime, performance, and operational excellence.

Key Responsibilities

Design, implement, and manage highly available, distributed systems
Maintain and optimize cloud infrastructure (AWS/GCP)
Develop automation scripts using Python, Go, Java, or Bash
Manage containerized environments using Docker and Kubernetes
Define and monitor SLIs, SLOs, and error budgets
Implement monitoring, logging, and alerting solutions
Lead incident management, root cause analysis (RCA), and postmortems
Ensure system security and compliance within operational workflows
Improve system reliability through performance tuning and optimization
Collaborate with engineering teams to enhance deployment and release processes
Create and maintain runbooks, dashboards, and operational documentation

Required Qualifications

8+ years of experience in SRE, DevOps, or Systems Engineering
Strong expertise in Linux/Unix systems and system internals
Proficiency in at least one programming/scripting language (Python, Go, Java, Bash)
Experience designing and operating distributed systems
Hands‑on experience with cloud platforms (AWS or GCP)
Experience with Docker and Kubernetes
Strong understanding of monitoring, alerting, and logging concepts
Experience managing SLIs, SLOs, and error budgets
Experience with incident management and RCA processes

Preferred Qualifications

Experience with observability tools (Prometheus, Grafana, Datadog, Splunk, Application Insights)
Experience supporting 24x7 production environments and on‑call rotations
Knowledge of chaos engineering and resiliency testing
Experience with canary deployments, feature flags, and progressive delivery
Strong documentation and communication skills

Skills

AWS, CI/CD, DevOps, Docker, GCP, Go, Kubernetes, Linux, Logging, Monitoring, Observability, Python, SRE, SLI, SLO, Distributed Systems
