Senior Site Reliability Engineer (SRE)
Oracle
About the role
About Oracle Health
Be a part of something groundbreaking at Oracle Health as we create a new organization, Oracle Health Data & Analytics Platform. Join us in building innovative products that transform healthcare technology and create a lasting impact on billions of lives. We encourage an entrepreneurial spirit where creativity thrives and where your contributions will directly help establish a world-class engineering center focused on excellence.
You'll have the unique opportunity to:
- Reach billions of people with our products and services.
- Develop technology that truly makes a difference.
- Make a substantial impact on technology development.
- Experience unlimited growth potential through inspiring work.
- Collaborate with top talent in the industry.
- Work in a diverse, open, and highly productive environment.
About The Job
In this pivotal role, you will provide technical leadership for our core data platforms within Oracle Health's Data & Analytics Platform. As a Senior Site Reliability Engineer (SRE), you will own mission-critical systems that serve multiple products and teams.
Your responsibilities will include supporting the design and operation of large-scale, stateful distributed platforms built on Hadoop ecosystem components (HDFS, YARN, HBase) deployed on Oracle Big Data Service (BDS), as well as Kafka and Storm. These multi-tenant platforms are managed through automation using Ansible and Terraform, and they require a strong architectural approach to handle scale, change, and wide-reaching impact.
What You'll Do
Platform Ownership & Technical Leadership
- Oversee the reliability, scalability, and operability of shared data platforms.
- Define platform standards, architectural direction, and operational guidelines.
- Influence cross-team technical decisions and the long-term platform strategy.
- Advance platform evolution and drive reliability strategies across the data ecosystem.
Architecture & Design
- Articulate system behavior, dependencies, and potential failure modes.
- Balance reliability, performance, cost, and complexity in design decisions.
- Provide guidance that enables effective and safe use of platforms by downstream teams.
Operations Engineering
- Establish capacity models, scaling strategies, and operational best practices.
- Design platforms to operate predictably under varying loads, failures, and changes.
- Manage platform lifecycle events, including upgrades and recovery.
Distributed Systems Expertise
- Operate and evolve stateful distributed systems that manage critical data.
- Understand failure modes, including backpressure and replication lag.
Security
- Maintain secure Kerberized platforms, focusing on authentication and secure communications.
- Integrate security as a fundamental aspect of architectural design.
Automation
- Develop and enhance an automation framework using Ansible and Terraform.
- Treat automation as production software: keep it versioned and rigorously tested.
- Reduce operational workload by embedding reliability measures into the platform.
Incident Leadership & Prevention
- Act as the principal point of contact for complex or ambiguous incidents.
- Aim to eliminate recurring failure types rather than resolving isolated issues.
Representation
- Represent SRE and platform engineering in high-stakes discussions.
- Communicate effectively with engineering leaders and partner teams.
Responsibilities
The team collaborates within the Oracle Health Data & Analytics Platform, supporting HealtheIntent, one of Oracle Health's cornerstone products. We manage big data and streaming environments that empower teams to create reliable, customer-facing solutions while enhancing operational efficiency.
Required Experience
- 4+ years operating large-scale, customer-facing distributed platforms.
- In-depth experience with HDFS, YARN, HBase, Kafka, Storm, or similar technologies.
- Strong background in Linux, networking, and troubleshooting distributed systems.
- Skilled in Infrastructure-as-Code using Ansible and Terraform.
- Ability to automate processes using Python, Ruby, and Bash.
- Hands-on experience with Kerberized environments.
- Proven track record in defining technical architecture for complex systems.
- Experience managing shared platforms with broad impact and many downstream users.
- Expertise in observability and capacity modeling for distributed systems.
Required Qualifications
- U.S. Citizenship and eligibility for a Federal Security Clearance.
- 5+ years of relevant technical experience.
- Excellent communication skills and the ability to build rapport with colleagues.
- Bachelor's or Master's degree in Computer Science, or equivalent.
Salary and Benefits
Hiring Range in USD: $79,100 to $158,200 per annum. There may be eligibility for bonus and equity.
Oracle provides a wide salary range for its roles to consider variations in knowledge, skills, experience, market conditions, and locations while ensuring internal equity.
Oracle US offers a comprehensive benefits package that includes:
- Medical, dental, and vision insurance, including expert medical opinion.
- Short-term and long-term disability coverage.
- Life insurance and AD&D.
- Supplemental life insurance (Employee/Spouse/Child).
- Health and dependent care Flexible Spending Accounts.
- Pre-tax commuter and parking benefits.
- 401(k) Savings and Investment Plan with company match.
- Paid time off: Flexible Vacation policy for salaried employees, and specified vacation accrual for eligible hourly employees.
- 11 paid holidays annually.
- Paid sick leave of 72 hours upon hire, refreshing each calendar year.
- Paid parental leave.
- Adoption assistance.
- Employee Stock Purchase Plan.
- Financial planning and group legal assistance.
- Voluntary benefits, including auto, homeowner, and pet insurance.
About Us
Oracle is at the forefront of innovation, maintaining a unique position by unifying data, infrastructure, applications, and expertise. With AI integrated throughout our offerings, we empower customers to build a better future for all. Join us in shaping this future and discover limitless potential at a company leading advancements in AI and cloud solutions that impact billions of lives.
Oracle is committed to supporting a diverse and inclusive workforce. We offer competitive benefits and encourage community engagement through our volunteer programs. Accessibility is a priority; if you need assistance or accommodation during the application process, please let us know.
Oracle is an Equal Employment Opportunity Employer. All qualified applicants will be considered for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability, veteran status, or any other characteristic protected by law.