Sr. Principal Infrastructure Services
Northern Trust
About the role
As a Senior Principal Site Reliability Engineer at Northern Trust, you will focus on developing observability and automation to ensure the reliability and performance of the company's systems and services. Your expertise in software engineering and system operations will drive continuous improvements in platform reliability. This role will involve working with cross‑functional teams to enhance the efficiency of services and bring complete observability across all technologies.
Key Responsibilities
- Lead the design and evolution of highly reliable, scalable, and performant distributed systems
- Partner with engineering and architecture teams to influence system design decisions
- Drive an automation‑first approach by designing and developing tools and platforms
- Participate in and lead incident response for production systems
- Architect and implement end‑to‑end observability across systems
- Identify reliability gaps through data analysis and drive improvement initiatives
- Create and maintain clear documentation and knowledge sharing practices
- Collaborate with product, development, platform, security, and operations teams
- Manage and prioritize multiple reliability‑focused initiatives
Qualifications Required
- Bachelor's degree in Computer Science, Engineering, or related discipline
- 15+ years of progressive experience in systems engineering with a strong emphasis on site reliability
- 7+ years of experience in a technical leadership role
- Strong proficiency in one or more modern programming languages
- Hands‑on experience with containerization and container orchestration technologies
- Proven ability to design and implement observability solutions
- Deep understanding of distributed systems, networking fundamentals, and modern software architectures
- Exceptional problem‑solving skills and stakeholder orientation
- Prior experience designing and delivering Infrastructure as Code (IaC)
- Demonstrated success in mentoring and developing technical teams
- Hands‑on expertise in implementing automated remediation and corrective actions
Closing Statement
This role at Northern Trust offers you the opportunity to play a pivotal part in ensuring the reliability and performance of the company's systems and services. Your contributions will help drive continuous improvements in platform reliability and efficiency, making a meaningful impact on the organization's success.