Site Reliability Engineer (SRE)

Technology Ventures

Reston · Hybrid · Full-time · Senior · $175k – $185k/yr · Posted 2mo ago

About the role

About Us

We are seeking a highly skilled and hands-on Site Reliability Engineer (SRE) with deep Kubernetes expertise to support and enhance our enterprise platform engineering environment. This role is ideal for a self-starter who enjoys learning, solving complex infrastructure challenges, improving observability, and partnering closely with engineering teams to streamline CI/CD and platform operations.

The ideal candidate will have strong experience managing Kubernetes environments, preferably Red Hat OpenShift in an on-premises enterprise setting, along with a passion for automation, reliability engineering, and operational excellence.

Key Responsibilities

Manage, maintain, and optimize Kubernetes/OpenShift platform environments to ensure high availability, scalability, and operational reliability.
Provide ongoing “care and feeding” of Kubernetes clusters, including cluster administration, upgrades, troubleshooting, and performance tuning.
Improve end-to-end observability across the platform using tools such as Grafana, Prometheus, and Datadog.
Lead incident response efforts, root cause analysis, and postmortems to continuously improve platform reliability and resiliency.
Partner closely with Scrum and development teams to support CI/CD pipelines, deployments, routing, configuration management, and troubleshooting.
Build and maintain automation and deployment pipelines that support engineering and development teams.
Develop scripts and automation solutions using Bash, Python, Go, or PowerShell to reduce manual intervention and improve operational efficiency.
Support and maintain platform services such as HashiCorp Vault, AMQ/Kafka, Keycloak, and related infrastructure components.
Create and maintain technical documentation, operational procedures, deployment guides, and incident response plans.
Participate in an on-call rotation and support production environments as needed.

Required Qualifications

5–7+ years of experience in Site Reliability Engineering, Platform Engineering, DevOps, or related infrastructure engineering roles.
Deep hands-on experience with Kubernetes administration and troubleshooting.
Strong experience with Red Hat OpenShift, including operators, ingress/routing, and cluster management.
Experience supporting enterprise infrastructure in on-premises environments.
Strong scripting and automation skills using Bash and/or Python.
Experience with observability and monitoring tools such as Grafana, Prometheus, and Datadog.
Experience troubleshooting complex production issues using logs, metrics, traces, packet captures, and Kubernetes debugging tools.
Experience working with CI/CD pipelines and collaborating directly with Agile/Scrum development teams.
Familiarity with Azure cloud services and hybrid infrastructure environments.
Experience with technologies such as HashiCorp Vault, Kafka/AMQ, Redis, and Keycloak is preferred.
Strong communication skills and ability to work collaboratively across teams.
Bachelor’s degree in computer science or related field, or equivalent practical experience.

Preferred Qualities

Self-motivated engineer with a strong desire to learn and continuously improve.
Ability to thrive in fast-paced, highly collaborative enterprise environments.
Experience working in heavily audited or compliance-focused organizations is a plus.

Interview Process

3 rounds total:
- Round 1: Virtual interview with Hiring Manager
- Round 2: Virtual panel interview with team members
- Round 3: Final onsite interview in Reston, VA
The entire interview process is expected to be completed within 7–10 days.

Compensation

Target base salary range: $175,000 – $185,000
Bonus: 7.5% – 10% performance-based
Exceptional candidates may be considered for higher compensation.

Skills

AMQ, Azure, Bash, Datadog, Grafana, HashiCorp Vault, Keycloak, Kubernetes, Kafka, OpenShift, Prometheus, Python, Redis, Site Reliability Engineering, Go, PowerShell
