Principal Site Reliability Engineer

Oracle

Washington · On-site · Full-time · Lead · Posted 1w ago

About the role

Join our innovative team at Oracle Health, where we're building a modern, automated healthcare platform that enhances product deployment, sustainability, and troubleshooting. This is a unique opportunity to help establish a world-class engineering organization dedicated to excellence, innovation, and impactful real-world solutions.

As a Principal Site Reliability Engineer, you will be instrumental in operating and scaling a Clinical AI Assistant platform that serves healthcare professionals globally. Your contributions will enhance the quality, safety, and efficiency of care for billions of patients worldwide. This is a chance to influence the reliability and performance of crucial AI-driven systems relied upon in critical healthcare environments.

This role transcends conventional SRE functions, allowing you to apply AI/ML methodologies to develop AIOps solutions that proactively manage system reliability, detect anomalies, automate responses, and enhance service performance continually. You will help shape the future of reliability engineering within intelligent, AI-enhanced healthcare systems.

Key Responsibilities:

  • Lead the architecture, design, implementation, and operations of core platform and AI-driven system services.
  • Guarantee the reliability, availability, and performance of the Clinical AI Assistant platform used in healthcare settings.
  • Develop and manage AIOps capabilities such as intelligent alerting, anomaly detection, automated remediation, and predictive scaling.
  • Enhance systems through automation, self-healing methods, and real-time visibility.
  • Design and implement software to boost system scalability, efficiency, and durability.
  • Collaborate with cross-functional teams to prototype and deliver innovative platform services.
  • Lead capacity planning, demand forecasting, performance tuning, and cost optimization efforts.
  • Resolve complex challenges in cloud-native distributed systems and establish engineering best practices to prevent future issues.
  • Contribute to best practices in platform engineering, including infrastructure as code, CI/CD, and service reliability standards.
  • Keep abreast of emerging technologies in cloud, distributed systems, and AI/ML operations.

Essential Qualifications:

  • Must obtain and maintain a federal security clearance (US citizenship required).
  • Over 8 years of experience in Site Reliability Engineering, DevOps, or related fields.
  • Demonstrated experience managing large-scale, distributed production systems with high availability requirements.
  • Strong experience with containers and container orchestration (Docker, Kubernetes, etc.).
  • Expertise in Infrastructure as Code (e.g., Terraform, Ansible).
  • Experience building and maintaining CI/CD pipelines (GitLab, Jenkins, etc.).
  • Proficiency in scripting and automation (Bash, Python, etc.).
  • Experience with at least one major cloud provider (OCI, AWS, Azure).
  • Strong expertise in Linux systems.
  • Experience with observability tools (monitoring, logging, tracing) and performance optimization.

Preferred Qualifications:

  • Experience supporting AI/ML or LLM-based systems in production.
  • Familiarity with AIOps, intelligent automation, or ML-driven observability.
  • Experience in healthcare or regulated environments (HIPAA compliance).
  • Background in high-throughput, low-latency systems for mission-critical workloads.
  • Software engineering experience in Java, Python, or C++.

Oracle offers a comprehensive range of benefits, including medical, dental, and vision coverage, retirement plans, paid time off, and support for volunteering opportunities. This pivotal role supports a new line of business driven by an entrepreneurial spirit, and it is an exceptional chance to make a significant impact in healthcare technology.

Skills

Ansible, AWS, Azure, Bash, CI/CD, Docker, GitLab, Infrastructure as Code, Jenkins, Kubernetes, Linux, OCI, Python, Terraform
