Senior Site Reliability Engineer
Oracle
About the role
Overview: Site Reliability Engineering Excellence
Oracle is seeking a Senior Site Reliability Engineer to strengthen the architectural resilience and operational reliability of our key Software as a Service (SaaS) offerings. This role is pivotal in ensuring the continued availability, security, and compliance of our services for some of the world's leading organizations. As a Senior Site Reliability Engineer, you will lead collaborative efforts across teams within SaaS and Oracle Cloud Infrastructure (OCI), applying established best practices to deliver cloud services that are both reliable and able to scale with future demand.
Your primary responsibilities will include maintaining service uptime, automating processes, improving monitoring systems, and managing critical incidents within a 24x7 operational environment. We seek individuals with exceptional leadership abilities, a comprehensive understanding of technology stacks, and a history of managing large-scale service operations, all with a focus on proactive system hardening and ongoing improvement. Excellent communication skills are vital: you will engage with stakeholders at all levels of the organization, including executive management, and mentor junior engineers.
Successful candidates will showcase proficiency in compliance frameworks, Linux environments, cloud networking, programming/scripting languages, and DevOps tools. Prior experience in managing secure cloud environments and providing web services to customers at scale is essential. A strong commitment to customer service and the ability to perform effectively under pressure are also key attributes we value.
Key Responsibilities
- Design, develop, and maintain robust, secure, and highly available cloud services.
- Drive collaborative efforts across teams to enhance service reliability, compliance, and operational excellence.
- Ensure service uptime through continuous monitoring, automated solutions, and immediate incident response.
- Implement site reliability engineering best practices focusing on automation, monitoring, and continuous enhancement.
- Address critical incidents efficiently and communicate effectively with diverse stakeholders.
- Guarantee compliance and enforce security measures in cloud environments.
- Mentor team members and promote a culture of technical leadership and growth.
- Engage directly with customers and stakeholders, including executive personnel, aligning technical solutions with business objectives.
- Support capacity planning, establish architectural standards, and harden systems for reliability.
- Utilize DevOps tools and scripting/programming to streamline operational processes.
About Us
Oracle is a leader in integrating data, infrastructure, applications, and expertise to drive innovations that can change lives. By embedding AI into our products and services, we empower customers to create a better future. We are dedicated to fostering a workplace that values diversity and offers competitive benefits to our employees, including options for health care, life insurance, and retirement planning. We also encourage community involvement through our volunteer initiatives.
Disclaimer
Certain US‑based roles may require compliance with specific health‑related mandates. The hiring range for this position is between $139,400 and $291,800 per year, with eligibility for bonuses and stock options.
Requirements
- Proficiency in compliance frameworks
- Proficiency in Linux environments
- Proficiency in cloud networking
- Proficiency in programming/scripting languages
- Proficiency in DevOps tools
- Prior experience in managing secure cloud environments
- Prior experience providing web services to customers at scale