Principal Software Engineer - Manufacturing & Factory
NVIDIA
About the role
About NVIDIA
NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It's a unique legacy of innovation that's fueled by great technology and amazing people. Today, we're tapping into the unlimited potential of AI to define the next era of computing, an era in which our GPU acts as the brains of computers, robots, and self‑driving cars that can understand the world. Doing what's never been done before takes vision, innovation, and the world's best talent. As an NVIDIAN, you'll be immersed in a diverse, supportive environment where everyone is inspired to do their best work.
NVIDIA has a rapidly expanding ecosystem of data‑center platform & node designs, from single‑node HGX/DGX systems all the way up to large multi‑node NVLink domain rack architectures. These designs have become core to NVIDIA's rapidly growing enterprise and cloud‑provider businesses. Each brings together the full power of NVIDIA GPUs, NVIDIA NVLink, NVIDIA InfiniBand networking, NVIDIA Grace CPUs, and a fully optimized NVIDIA AI and HPC software stack.
We are searching for a highly motivated, technical leader to design, drive, and operationalize rack‑scale factory and deployment flows for next‑generation data‑center products. The ideal candidate will combine deep systems expertise, decisive technical leadership, and a passion for building reliable, debuggable, and scalable manufacturing and deployment solutions.
Responsibilities
- Lead and drive rack‑scale/L11 flows for factory and initial data‑center deployment.
- Design and implement end‑to‑end factory workflows, including firmware flashing sequences, security provisioning, and deployment of software mitigations.
- Collaborate with data‑center architects, ODMs, and OEMs to define factory and data‑center requirements that ensure efficient and reliable production ramp.
- Champion reliability, debuggability, and optimization in firmware, diagnostic, and deployment tool design.
- Drive pre‑silicon readiness for factory & manufacturing workflows for rack‑scale products using NVIDIA's industry‑leading simulation & emulation technology.
- Mentor architects and engineering teams to grow them into future leaders.
- Make key technical decisions even when faced with ambiguity.
Requirements
- BS or MS degree in Computer Engineering, Computer Science, or related field, or equivalent experience.
- 15+ years of experience in system architecture and design.
- Deep experience in designing architecture for scalable and performant server systems, particularly at the SW/HW interface.
- Strong understanding of networking technology & protocols (e.g., Ethernet, InfiniBand).
- Previous experience working with complex system software for accelerators such as GPUs, DPUs, or FPGAs.
- Expertise in out‑of‑band and in‑band management architectures.
- Knowledge of system management protocols such as Redfish and IPMI.
- Demonstrable experience in implementing a left‑shift strategy to de‑risk program execution.
- Excellent written and verbal communication skills.
Ways to Stand Out
- Knowledge of large‑scale cloud and cluster‑level deployment and management systems.
- Demonstrated track record of leading data‑center products across the entire lifecycle, spanning inception, pre‑silicon development, post‑silicon bring‑up, manufacturing, and deployment.
Compensation & Benefits
- Base salary range: $272,000 – $431,250 USD (determined based on location, experience, and comparable employee pay).
- Eligibility for equity and benefits.
Application Details
- Applications will be accepted at least until March 25, 2026.
- This posting is for an existing vacancy.
- NVIDIA uses AI tools in its recruiting processes.
Equal Opportunity Statement
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. We highly value diversity in our current and future employees and do not discriminate (including in hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law.