Infrastructure Reliability Engineer
Amazon Web Services (AWS)
About the Role
As an Infrastructure Reliability Engineer, you will proactively drive reliability risk identification, assessment, and mitigation for datacenter infrastructure equipment (e.g., LV generators, MV transformers, LV switchgear, breakers, UPS, HV transformers). You will also own root cause analysis of critical equipment failures and drive continuous improvements to enhance datacenter availability for AWS customers. You will collaborate closely with internal and external partners, including suppliers, to drive key aspects of product specification, risk identification plans, and execution. Success in this role requires an ownership mindset, independence, and an action- and results-oriented approach within an open, collaborative environment.
Candidate Profile
The ideal candidate will have experience using a Physics-of-Failure-based approach to develop and implement both analytical and empirical methods for product quality/reliability risk identification and assessment during product design, manufacturing, and deployment stages. You should be capable of driving AWS application-specific requirements in lifecycle environmental and operational stress analysis, including thermal, electrical, chemical, and mechanical stresses, to identify overstress and fatigue-related product weaknesses. The role also involves evaluating product design quality/reliability risks and assessing electronics manufacturing process-related quality/reliability issues. Knowledge of statistical techniques and models is essential for analyzing test and field data.
At the component level, you will drive critical component identification and associated vendor selection and qualification requirements. You will leverage knowledge of process capability for electronic component production and system-level performance requirements to establish critical-to-quality and reliability metrics.
At the system level, you will develop datacenter system-level reliability models and conduct related reliability quantification and risk analysis for datacenter configuration optimization. Familiarity with system reliability engineering tools such as reliability block diagrams, statistical modeling, and data analytics is expected.
During the sustaining stage, you will be responsible for monitoring product performance in the field, driving root cause analysis of critical failures, and implementing associated corrective and preventive actions. You should also be able to drive effective vendor auditing and quarterly review processes to foster continuous improvements in datacenter availability.
Qualifications
Basic Qualifications
- 6+ years of experience in data center design, construction, operations, or facility maintenance
- 6+ years of industrial or commercial engineering experience in mission-critical facilities, including but not limited to data centers, power generation, or oil and gas facilities
- Bachelor's degree in Engineering or a related field
- Knowledge of critical data center mechanical and electrical equipment
- Experience in data center design, construction, operations, or facility maintenance
- Experience in industrial or commercial engineering in mission-critical facilities, including but not limited to: data centers, power generation, or oil and gas facilities
- Experience researching new designs, technologies, and construction methods for data center equipment and facilities
- 5+ years of root cause analysis, troubleshooting, or problem-solving experience
- 10+ years of Reliability Engineering work experience in a high-reliability industry
- 3+ years of experience with accelerated life testing, stress analysis, and finite element analysis
Preferred Qualifications
- Professional Engineering or Architectural License
- Knowledge of building codes and regulations for your region
- Experience carrying design concepts through exploration, development, and into deployment or mass production
- Experience reading, interpreting, and creating construction drawings, specifications, and submittal documents
- Bachelor's degree in Electrical or Mechanical Engineering, Engineering Technology, or Reliability Engineering, or 10+ years of experience managing, analyzing, and communicating results to senior leadership
- Master's or Ph.D. in Reliability Engineering, Physics, Electrical, Mechanical, or Materials Engineering, or a related field
- Experience applying proactive, cost-effective reliability approaches throughout product design, manufacturing, and deployment stages
- Proven experience working with external design and manufacturing supply chain partners
- Familiarity with major data center infrastructure equipment reliability performance
- Ability to manage multiple qualification activities and development schedules
About AWS Infrastructure Services
AWS Infrastructure Services owns the design, planning, delivery, and operation of all AWS global infrastructure. We are the team that keeps the cloud running, supporting all AWS data centers and the servers, storage, networking, power, and cooling equipment that ensure our customers have continual access to the innovation they rely on. We tackle challenging problems with thousands of variables impacting the supply chain and are looking for talented individuals to join us.
You’ll join a diverse team of software, hardware, and network engineers, supply chain specialists, security experts, operations managers, and other vital roles. You’ll collaborate with people across AWS to help us deliver the highest standards for safety and security while providing seemingly infinite capacity at the lowest possible cost for our customers. You’ll experience an inclusive culture that welcomes bold ideas and empowers you to own them to completion.
Why AWS
Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating—that’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.
Our Culture
Diverse Experiences
AWS values diverse experiences. Even if you do not meet all of the preferred qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.
Work/Life Balance
We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.
Inclusive Team Culture
Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empowers us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness.
Mentorship and Career Growth
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship, and other career-advancing resources here to help you develop into a better-rounded professional.
Compensation and Benefits
The base salary range for this position is listed below. Your Amazon package will include sign-on payments and restricted stock units (RSUs). Final compensation will be determined based on factors including experience, qualifications, and location. Amazon also offers comprehensive benefits including health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage), 401(k) matching, paid time off, and parental leave. Learn more about our benefits at https://amazon.jobs/en/benefits.