Senior Site Reliability Engineer
UnitedHealth Group
About the role
Optum Tech is a global leader in health care innovation. Our teams develop cutting‑edge solutions that help people live healthier lives and help make the health system work better for everyone. From advanced data analytics and AI to cybersecurity, we use innovative approaches to solve some of health care's most complex challenges. Your contributions here have the potential to change lives. Ready to build the next breakthrough? Join us to start Caring. Connecting. Growing together.
As a Senior Site Reliability Engineer, you will improve the reliability, security, and efficiency of the Optum Consumer Payment Network. You will leverage modern cloud technologies, raise engineering standards, and advance observability while strengthening DevOps culture across engineering teams.
You'll enjoy the flexibility to work remotely* from anywhere within the U.S. as you take on some tough challenges.
Primary Responsibilities
- Enable teams to define, measure, and meet reliability goals (SLIs/SLOs) by strengthening post‑incident learning, reducing alert noise, and helping teams create and maintain quality runbooks
- Build and enhance shared observability capabilities (metrics, monitoring, logging, dashboards, and alerting) to support >99.95% availability for business‑critical applications
- Partner with software engineers across the organization to provide hands‑on guidance by establishing patterns for engineering excellence initiatives (zero‑downtime deployments, automated remediation)
- Use AI‑assisted tooling to improve engineering productivity (e.g., incident analysis, automation, and documentation)
- Provide 24/7 production support via a rotating on‑call schedule
Required Qualifications
- 5 years of experience with DevOps, security best practices, CI/CD, infrastructure as code (IaC) and observability (e.g., GitHub, Datadog, New Relic or Dynatrace, Terraform, PagerDuty)
- 3 years of experience operating production applications, including Kubernetes‑based workloads, in enterprise‑scale hybrid environments (on‑premises and public cloud)
- 2 years of proficiency with a programming or scripting language for automation/tooling (e.g., .NET/C#, Java, Python, Go)
- 1 year of experience with AIOps and/or AI‑powered coding and analysis tools for faster RCA, alert noise reduction, and anomaly detection
Preferred Qualifications
- Bachelor's or master's degree in computer science, software engineering, or a related field
- Working knowledge of cloud networking, cloud security, containerization, centralized logging, and monitoring
- Experience with cloud security controls such as DDoS protection, vulnerability management, and patching
- Experience with payment industry standards, protocols, and security best practices
- Solid foundation in Linux and/or Windows operating systems and troubleshooting tools
*All employees working remotely will be required to adhere to UnitedHealth Group's Telecommuter Policy.
Compensation & Benefits
Pay is based on several factors including but not limited to local labor markets, education, work experience, and certifications. In addition to your salary, we offer a comprehensive benefits package, incentive and recognition programs, equity stock purchase, and 401k contribution (all benefits are subject to eligibility requirements). No matter where or when you begin a career with us, you'll find a far‑reaching choice of benefits and incentives.
The salary for this role will range from $91,700 to $163,700 annually based on full‑time employment. We comply with all minimum wage laws as applicable.
Application Details
- Application Deadline: This position will be posted for a minimum of 2 business days or until a sufficient candidate pool has been collected. The posting may come down early due to the volume of applicants.
Mission Statement
At UnitedHealth Group, our mission is to help people live healthier lives and make the health system work better for everyone. We believe everyone—of every race, gender, sexuality, age, location and income—deserves the opportunity to live their healthiest life. Today, however, there are still far too many barriers to good health which are disproportionately experienced by people of color, historically marginalized groups and those with lower incomes. We are committed to mitigating our impact on the environment and enabling and delivering equitable care that addresses health disparities and improves health outcomes—an enterprise priority reflected in our mission.
Equal Employment Opportunity
UnitedHealth Group is an Equal Employment Opportunity employer under applicable law and qualified applicants will receive consideration for employment without regard to race, national origin, religion, age, color, sex, sexual orientation, gender identity, disability, or protected veteran status, or any other characteristic protected by local, state, or federal laws, rules, or regulations.
Drug‑Free Workplace
UnitedHealth Group is a drug‑free workplace. Candidates are required to pass a drug test before beginning employment.