Sr. Site Reliability Engineer (Hybrid)

Broadridge

Newark · On-site · Full-time · Senior · $100k – $110k/yr · Posted 1w ago

About the role

At Broadridge, we've built a culture where the highest goal is to empower others to accomplish more. If you're passionate about developing your career while helping others along the way, come join the Broadridge team.

We are seeking a Senior Site Reliability Engineer (SRE) to design, build, and operate highly reliable, scalable, and secure platforms that support business-critical applications across hybrid (on-prem and cloud) environments. This role blends software engineering, systems engineering, and operational excellence, with a strong focus on automation, resiliency, observability, and cost efficiency. The SRE will partner closely with application development, infrastructure, security, and product teams to reduce operational toil, improve system reliability, and enable faster, safer delivery of services.

Responsibilities

  • Reliability & Resiliency Engineering

    • Design and implement high-availability, fault-tolerant architectures across on-prem and cloud platforms (AWS).
    • Lead multi-region disaster recovery (DR) planning, implementation, and testing, including RTO/RPO definition and validation.
    • Define and enforce SLOs, SLIs, and error budgets to balance reliability with delivery velocity.
    • Drive self-healing automation and proactive remediation strategies.
  • Automation & Infrastructure as Code

    • Build and maintain infrastructure using Terraform and configuration management tools (e.g., Chef).
    • Develop automation to eliminate manual operational tasks (toil reduction).
    • Create reusable modules, pipelines, and guardrails for standardized deployments.
    • Automate certificate lifecycle management, key rotation, and security updates.
  • Observability & Monitoring

    • Design and implement end-to-end observability (metrics, logs, traces, synthetic monitoring).
    • Build dashboards, alerts, and runbooks to enable fast detection and resolution of incidents.
    • Improve signal-to-noise ratio in alerting to reduce operational fatigue.
    • Perform root cause analysis (RCA) and lead post-incident reviews with actionable follow-ups.
  • Cloud & Platform Engineering

    • Engineer and operate platforms on AWS, including services such as:
      • EKS, EC2, RDS/Aurora, Lambda, API Gateway
      • CloudFront, WAF, ALB/NLB
      • CloudWatch, X-Ray, IAM, Secrets Manager
    • Lead cloud migrations and modernization initiatives, including legacy system refactoring.
    • Implement secure networking patterns (VPCs, private subnets, controlled egress).
  • Performance, Scalability & Cost Optimization

    • Identify and resolve performance bottlenecks through testing and analysis.
    • Drive FinOps initiatives to optimize infrastructure cost without compromising reliability.
    • Implement capacity planning and autoscaling strategies.
  • CI/CD & SDLC Enablement

    • Design and support CI/CD pipelines enabling safe, repeatable deployments.
    • Embed reliability practices into the SDLC (testing, rollout strategies, rollback).
    • Partner with development teams to improve operability of applications before production.
  • Security & Compliance

    • Partner with security and legal teams to meet regulatory and compliance requirements (e.g., data residency, GDPR-related controls).
    • Implement secure access controls, secrets management, and encryption best practices.
    • Participate in security reviews, audits, and risk assessments.
  • Leadership & Collaboration

    • Act as a technical leader and mentor for engineers transitioning into SRE roles.
    • Influence architecture and design decisions across multiple teams.
    • Communicate effectively with engineering leadership, product owners, and non-technical stakeholders.
    • Drive a culture of operational excellence, blameless postmortems, and continuous improvement.

Qualifications

  • 3+ years of experience in Site Reliability Engineering, Platform Engineering, DevOps, or Systems Engineering
  • Strong programming experience in Python, Java, or similar languages
  • Deep experience with Linux/Unix systems
  • Hands-on expertise with AWS and cloud-native architectures
  • Proven experience with Terraform and Infrastructure as Code
  • Strong understanding of networking, security, and distributed systems
  • Experience operating mission-critical, high-volume platforms

Preferred Qualifications

  • Experience in financial services or highly regulated environments
  • Experience with EKS/Kubernetes at scale
  • Familiarity with Chaos Engineering and resilience testing
  • Experience leading cloud cost optimization (FinOps) initiatives
  • Prior experience transitioning traditional infrastructure teams into SRE practices

Compensation

  • Salary Range: $100,000 - $110,000 USD
    (Broadridge considers various factors when evaluating a candidate's final salary including, but not limited to, relevant experience, skills, and education.)

  • Bonus Eligibility: Eligible

Benefits

  • Please visit www.broadridgebenefits.com for information on our comprehensive benefit offerings for this role.
  • All Colorado employees receive paid sick leave in compliance with the Colorado Healthy Families and Workplaces Act and other legally required benefits, as applicable.

Application Details

  • Apply by clicking the application link and submitting your information.
  • The deadline to apply for this role is May 1st, 2026.

Inclusion & Culture

We are dedicated to fostering a collaborative, engaging, and inclusive environment and are committed to providing a workplace that empowers associates to be authentic and bring their best to work. We believe that associates do their best when they feel safe, understood, and valued, and we work diligently and collaboratively to ensure Broadridge is a company—and ultimately a community—that recognizes and celebrates everyone's unique perspective.

Use of AI in Hiring

As part of the recruiting process, Broadridge may use technology, including artificial intelligence (AI)-based tools, to help review and evaluate applications. These tools are used only to support our recruiters and hiring managers, and all employment decisions include human review to ensure fairness, accuracy, and compliance with applicable laws. Please note that honesty and transparency are critical to our hiring process. Any attempt to falsify, misrepresent, or disguise information in an application, resume, assessment, or interview will result in disqualification from consideration.

EEOC Notice (U.S. Applicants)

US applicants: see the EEOC "Know Your Rights" poster.

Disability Assistance

We recognize that ensuring our long-term success means creating an environment where everyone is welcome, where everyone's strengths are valued, and where everyone can perform at their best. Broadridge provides equal employment opportunities to all associates and applicants for employment without regard to race, color, religion, sex (including sexual orientation, gender identity or expression, and pregnancy), marital status, national origin, ethnic origin, age, disability, genetic information, military or veteran status, and other characteristics protected by applicable federal, state, or local laws.

If you need assistance or would like to request reasonable accommodations during the application and/or hiring process, please contact us at 888-237-7769 or by sending an email to BRcareers@broadridge.com.

Benefits

  • Health insurance
  • Dental insurance
  • Vision insurance
  • Paid sick leave

Skills

API Gateway, ALB/NLB, AWS, AWS CloudWatch, AWS EC2, AWS EKS, AWS Lambda, AWS Secrets Manager, AWS X-Ray, Chef, CloudFront, Docker, IAM, Java, Kubernetes, Linux, Python, RDS/Aurora, Terraform, Unix, VPC, WAF