Site Reliability Engineer, GovCloud Incident Response Team (Herndon)

Salesforce.Com Inc

On-site | Full-time | Mid Level | $117k – $177k/yr | Posted 1mo ago

About the role

Application Guidance

To ensure a smooth application process, we recommend applying for no more than three roles within a 12‑month period to avoid duplication of effort.

Job Category

Software Engineering

About Salesforce

Salesforce is the leading AI CRM, where the fusion of human expertise and advanced technology drives exceptional customer success. At Salesforce, ambition meets action, technology meets trust, and innovation is integral to our culture. We're seeking passionate individuals, known as Trailblazers, who are committed to transforming business and society through AI and prioritizing Salesforce's core values.

Looking to enhance your career at the forefront of workforce transformation? You've found the right opportunity! Agentforce represents the future of AI, and we believe you can help shape that future at Salesforce.

Applications will be accepted until 08/22/2026.

Join our team and contribute to the operational excellence of Salesforce GovCloud!

Role Overview

Site Reliability Engineer – GovCloud
Are you dedicated to maintaining the reliability and performance of vital cloud services? Salesforce is eager to welcome a skilled Site Reliability Engineer to our vibrant team. This role will support our GovCloud environment and play a crucial part in achieving a 99.99% uptime for our customer‑facing services. Working within a collaborative and innovative culture, you will partner with talented engineers to tackle complex challenges and drive continuous process improvements.

Please note: This position requires passing a background investigation and obtaining the necessary U.S. government clearance. Further details will be shared during the interview process.

Shift Requirements: This role involves shift work, including night shifts, as part of a 24/7 support team on a rotating schedule. Shift differential pay applies.

About the Role

As a member of the Site Reliability team, you will be integral to our cloud operations, working relentlessly to ensure the availability and security of our services. You will specifically support the GovCloud Incident Response team, addressing daily alerts, providing hands‑on support, and leading thorough incident management efforts, including retrospectives and long‑term remediation strategies.

Responsibilities

Ensure 99.99% uptime of customer‑facing services by actively monitoring and maintaining system health, which directly influences customer satisfaction.
Play a vital role during major incidents (e.g., Sev0, Sev1) and engage in technical reviews for incident management.
Contribute to Problem Management by conducting Root Cause Analyses (RCAs) and collaborating with the Global Solutions team.
Ensure all Site Reliability activities are compliant with internal policies and guidelines.
Collaborate with technical staff to address complex technical issues and customer‑related concerns.
Mentor team members, help them stay informed about industry innovations, and assist in team development.
Excel in a fast‑paced environment, adeptly managing multiple priorities and solving complex issues efficiently.
Automate the detection and resolution of recurring issues within the production environment.
Assist in refining existing processes to minimize operational and engineering downtime, including implementing AI‑driven automation for routine tasks.

Basic Requirements

Citizenship: U.S. citizen (either by birth or naturalization) without dual citizenship, agreeing to undergo a Minimum Background Investigation (MBI) for a Moderate Public Trust position.
Education: Bachelor's degree in Computer Science, Engineering, Information Technology, or a related field.
Experience: Systems engineering background in an enterprise‑scale internet service engineering or support role.
Technical Skills:
- Expertise in TCP/IP technologies (network protocols, programming, etc.).
- Proficiency in CLI enterprise support for Unix variants (Linux/Solaris/BSD) with significant experience in Red Hat Enterprise Linux and Solaris.
- Strong understanding of security systems and administration monitoring.
- Experience with AWS/C2S infrastructure and systems.
- Scripting proficiency in Python, Go, or other relevant languages.
Communication: Excellent written and verbal communication skills.
Incident Management: Experience in Incident Management and familiarity with ITIL service operations.
Availability: Willingness to participate in a 24/7 on‑call rotation for large data center operations and shift work.

Preferred Qualifications

Experience with Chef/Puppet or automated deployment tools to optimize infrastructure management.
Experience with Jenkins/Bamboo/Spinnaker pipeline executions for continuous integration and deployment.
Experience in monitoring and alert systems maintenance for proactive issue detection.
Experience supporting and maintaining Java applications.
Hands‑on experience with AWS configuration and management using CLI/SDKs.
Certifications in Linux+, Red Hat, and AWS are valuable.
Experience supporting Kubernetes‑based applications and services.
Familiarity with Agile Process and DevOps methodologies to enhance collaboration.
Participation in blameless retrospectives and post‑incident investigations, with a focus on utilizing AI for root cause analysis and pattern identification.
Knowledge of resilience engineering concepts, with a particular interest in how to leverage AI for proactive risk assessment.
Familiarity with AI/ML tools for operational insights, predictive maintenance, or intelligent automation.
Experience with data analysis and visualization tools for interpreting AI‑generated insights.

Compensation

The typical base salary range for this role is $117,200 – $176,700 annually.
This range represents base salary only and does not include potential bonuses, incentives for sales roles, equity, or applicable benefits.
In the U.S., compensation is determined by factors like location, job level, relevant skills, and experience. Specific roles may offer additional incentives, equity, and benefits.

Benefits

Salesforce provides various benefits to promote well‑being, including:

Time off
Medical, dental, vision
Mental health support
Parental leave
Life and disability insurance
401(k)
Employee stock purchasing options

More details are available regarding company benefits.

Unleash Your Potential

Joining Salesforce means you can achieve greatness in all aspects of life. Our benefits and resources will empower you to find balance and perform at your best, while our AI agents amplify your impact. Together, we can harness the power of Agentforce to provide remarkable experiences to organizations of all sizes. Apply today to shape the future for yourself, AI, and the world.

Accommodations

If you need assistance due to a disability in applying for open positions, please submit a request via the Accommodations Request Form.

Posting Statement

Salesforce is committed to equal opportunity employment and maintains a non‑discrimination policy for all employees and applicants. This means we believe in equality for all, leading to an inclusive workplace free from discrimination. Know your rights: workplace discrimination is illegal. Evaluation of employees or applicants will be based on merit, competence, and qualifications—without regard to race, religion, color, national origin, sex, sexual orientation, gender identity or expression, age, disability, veteran or marital status, political viewpoint, or other classifications protected by law. This policy applies to all employees and applicants, throughout the hiring process and beyond. Recruiting, hiring, and promotion decisions are rooted in fairness and merit.

Skills

AWS, AWS/C2S, CLI, Go, ITIL, Java, Jenkins, Linux, Linux+, Puppet, Python, Red Hat Enterprise Linux, Solaris, TCP/IP
