
Site Reliability Engineer (SRE)

Technology Ventures

Reston, VA · Hybrid · Full-time · Senior · $175k – $185k/yr · Posted 2w ago

About the role

About Us

We are seeking a highly skilled and hands-on Site Reliability Engineer (SRE) with deep Kubernetes expertise to support and enhance our enterprise platform engineering environment. This role is ideal for a self-starter who enjoys learning, solving complex infrastructure challenges, improving observability, and partnering closely with engineering teams to streamline CI/CD and platform operations.

The ideal candidate will have strong experience managing Kubernetes environments, preferably Red Hat OpenShift in an on-premises enterprise setting, along with a passion for automation, reliability engineering, and operational excellence.

Key Responsibilities

  • Manage, maintain, and optimize Kubernetes/OpenShift platform environments to ensure high availability, scalability, and operational reliability.
  • Provide ongoing “care and feeding” of Kubernetes clusters, including cluster administration, upgrades, troubleshooting, and performance tuning.
  • Improve end-to-end observability across the platform using tools such as Grafana, Prometheus, and Datadog.
  • Lead incident response efforts, root cause analysis, and postmortems to continuously improve platform reliability and resiliency.
  • Partner closely with Scrum and development teams to support CI/CD pipelines, deployments, routing, configuration management, and troubleshooting.
  • Build and maintain automation and deployment pipelines that support engineering and development teams.
  • Develop scripts and automation solutions using Bash, Python, Go, or PowerShell to reduce manual intervention and improve operational efficiency.
  • Support and maintain platform services such as HashiCorp Vault, AMQ/Kafka, Keycloak, and related infrastructure components.
  • Create and maintain technical documentation, operational procedures, deployment guides, and incident response plans.
  • Participate in an on-call rotation and support production environments as needed.

Required Qualifications

  • 5–7+ years of experience in Site Reliability Engineering, Platform Engineering, DevOps, or related infrastructure engineering roles.
  • Deep hands-on experience with Kubernetes administration and troubleshooting.
  • Strong experience with Red Hat OpenShift, including operators, ingress/routing, and cluster management.
  • Experience supporting enterprise infrastructure in on-premises environments.
  • Strong scripting and automation skills using Bash and/or Python.
  • Experience with observability and monitoring tools such as Grafana, Prometheus, and Datadog.
  • Experience troubleshooting complex production issues using logs, metrics, traces, packet captures, and Kubernetes debugging tools.
  • Experience working with CI/CD pipelines and collaborating directly with Agile/Scrum development teams.
  • Familiarity with Azure cloud services and hybrid infrastructure environments.
  • Experience with technologies such as HashiCorp Vault, Kafka/AMQ, Redis, and Keycloak is preferred.
  • Strong communication skills and ability to work collaboratively across teams.
  • Bachelor’s degree in computer science or related field, or equivalent practical experience.

Preferred Qualities

  • Self-motivated engineer with a strong desire to learn and continuously improve.
  • Ability to thrive in fast-paced, highly collaborative enterprise environments.
  • Experience working in heavily audited or compliance-focused organizations is a plus.

Interview Process

  • 3 rounds total:
    • Round 1: Virtual interview with Hiring Manager
    • Round 2: Virtual panel interview with team members
    • Round 3: Final onsite interview in Reston, VA
  • Entire interview process expected to be completed within 7–10 days.

Compensation

  • Target base salary range: $175,000 – $185,000
  • Bonus: 7.5% – 10% performance-based
  • Exceptional candidates may be considered for higher compensation.

Skills

AMQ, Azure, Bash, Datadog, Grafana, HashiCorp Vault, Keycloak, Kubernetes, Kafka, OpenShift, Prometheus, Python, Redis, Site Reliability Engineering, Go, PowerShell
