Sr. Mechanical Engineer, Annapurna Labs, Machine Learning Hardware
Annapurna Labs (U.S.) Inc.
Seattle · On-site · Full-time · Senior · $159k – $215k/yr · Posted today
About the role
Job Overview – Thermal/Mechanical Engineer (Machine‑Learning Acceleration, Annapurna Labs – AWS)
| Item | Details |
|---|---|
| Location | Austin, TX or Seattle, WA (U.S.) |
| Salary range | $159,200 – $215,300 USD per year (base) + sign‑on, RSUs, full Amazon benefits |
| Team | Annapurna Labs (AWS) – silicon, system‑level, and software organization that builds the custom chips and accelerators powering AWS (Graviton, Nitro, Inferentia, Trainium, etc.). |
| Mission | Design, model, and validate thermal‑mechanical solutions for the world’s largest data‑center servers and custom SoCs, ensuring performance, reliability, cost, and power‑efficiency at massive scale. |
Key Responsibilities
- End‑to‑end thermal/mechanical design for air‑cooled and liquid‑cooled server platforms and SoC packages, from concept through high‑volume production.
- Thermal modeling & simulation (CFD, compact RC models) of chips, packages, and full systems; develop and validate transient thermal response models.
- Performance measurement & characterization on silicon, boards, and complete servers (power, temperature, airflow, coolant flow, etc.).
- Cross‑functional collaboration with global hardware teams, ODMs, heatsink vendors, and internal design groups to triage, debug, and resolve thermal issues.
- Specification development for product teams, providing detailed thermal budgets, mechanical constraints, and cooling‑system requirements.
- Prototype and test thermal control strategies (fan/valve algorithms, liquid‑cooling loops, TIM selection) and integrate hardware/software power‑management schemes.
- Optimization of thermal solutions under PPA (performance‑power‑area) constraints, balancing cost, manufacturability, and reliability.
Required Qualifications (Must‑Have)
| Area | Details |
|---|---|
| Education | BS or MS in Mechanical/Thermal Engineering |
| Experience | • 10+ years in mechanical & thermal design of systems (servers, data‑center hardware) • 3+ years specifically in SoC thermal modeling & IC‑package transient response |
| Technical Skills | • Thermal & performance measurement on SoCs, boards, servers • Design of air‑cooled and liquid‑cooled solutions • Collaboration across multiple sites, ODMs, and vendor partners |
| Tools & Methods | Familiarity with CFD/thermal analysis tools (Ansys Icepak, FloTherm, Cadence Celsius) and mechanical CAD (PTC Creo, SolidWorks) is expected. |
| Programming / Scripting | Bash/shell scripting and Python in a Linux environment (Linux familiarity is listed as an added advantage); useful for automating simulations, data analysis, and test rigs. |
Preferred / Nice‑to‑Have Skills
- Deep knowledge of SoC thermal/mechanical design methodology, power‑modeling, and thermal‑analysis techniques.
- Hands‑on experience with fans, valves, chillers, CDUs, and liquid‑cooling infrastructure.
- Expertise in heatsink technologies, TIMs, and liquid‑cooling solutions (e.g., cold plates, micro‑channel coolers).
- Ability to develop compact RC thermal models and integrate them into system‑level simulations.
- Understanding of hardware‑/software‑based thermal/power management algorithms (e.g., dynamic fan control, coolant flow regulation).
- Proven track record of optimizing thermal solutions under PPA constraints and delivering cost‑effective designs for high‑volume production.
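
Dynamic fan control, named in the list above, is usually discussed as a simple feedback loop. As a minimal sketch under stated assumptions, the following proportional-integral (PI) controller maps a temperature reading to a fan duty cycle, with output clamping and a basic anti-windup rule. The class name `FanPI`, the gains, the setpoint, and the duty limits are hypothetical and are not drawn from any AWS or vendor implementation.

```python
class FanPI:
    """PI controller that turns a temperature reading into a fan duty cycle (%)."""

    def __init__(self, target_c, kp, ki, min_duty=20.0, max_duty=100.0):
        self.target_c = target_c    # temperature setpoint, deg C
        self.kp = kp                # % duty per deg C of error
        self.ki = ki                # % duty per (deg C * s) of accumulated error
        self.min_duty = min_duty    # floor that keeps the fan spinning
        self.max_duty = max_duty
        self.integral = 0.0

    def update(self, temp_c, dt_s):
        """Return the clamped duty cycle for one control step of dt_s seconds."""
        error = temp_c - self.target_c              # positive when running hot
        candidate = self.integral + error * dt_s
        duty = self.min_duty + self.kp * error + self.ki * candidate
        # Anti-windup: accumulate the integral only while the output is unsaturated.
        if self.min_duty <= duty <= self.max_duty:
            self.integral = candidate
        return min(self.max_duty, max(self.min_duty, duty))


if __name__ == "__main__":
    # Hypothetical setpoint, gains, and temperature trace, for illustration only.
    controller = FanPI(target_c=80.0, kp=5.0, ki=0.5)
    for temp in (70.0, 82.0, 86.0, 100.0, 84.0):
        print(f"T={temp:5.1f} C -> duty={controller.update(temp, dt_s=1.0):5.1f} %")
```

Freezing the integral term while the output is clamped is one of several anti-windup options. It keeps the fan from staying pinned at full speed long after a thermal excursion has ended.
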
What You’ll Be Working On
- Next‑generation AWS chips (e.g., Inferentia, Trainium) and the server platforms that host them.
- Air‑cooled and liquid‑cooled data‑center racks that must meet strict thermal budgets while supporting ever‑growing compute density.
- Thermal‑control firmware/software that interacts with the silicon’s power‑management block to keep temperatures within safe limits under variable workloads.
- Collaboration with external ODMs, heatsink vendors, and internal design teams to qualify new heatsink designs, TIMs, and cooling loops for production.
How to Position Yourself as a Strong Candidate
| Area | What to Highlight in Your Resume / Application |
|---|---|
| Thermal Modeling | Specific projects where you built CFD or compact RC models for SoCs or server boards; include tools used (Icepak, FloTherm, Celsius) and validation results (e.g., < 5 % error vs. hardware). |
| Air & Liquid Cooling | Experience designing both air‑cooled heat sinks and liquid‑cooled cold plates; mention flow rates, pressure drop calculations, and any novel TIM selections. |
| Measurement & Validation | Detail hands‑on thermal characterization (IR thermal imaging, thermocouples, power‑to‑temperature correlation) and how you closed the loop between model and silicon. |
| Cross‑Functional Leadership | Examples of leading multi‑site teams, working with ODMs, or managing vendor relationships to resolve thermal issues. |
| Programming / Automation | Scripts or tools you built in Python/Bash to automate simulation runs, data extraction, or test‑bench control. |
| Impact | Quantify outcomes (e.g., reduced thermal resistance by X %, enabled Y % higher compute density, saved $Z in cooling‑system cost). |
| Publications / Patents | Any technical papers, conference talks, or patents related to thermal management, especially for high‑performance computing. |
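
For the validation figure suggested in the Thermal Modeling row, it helps to state how the error is computed. One common convention, assumed here, expresses the model-versus-hardware difference as a percentage of the measured temperature rise above ambient. The function names and the sample temperatures below are hypothetical.

```python
def rise_error_pct(simulated_c, measured_c, ambient_c):
    """Model error as a percentage of the measured rise above ambient."""
    measured_rise = measured_c - ambient_c
    if measured_rise <= 0:
        raise ValueError("measured temperature must exceed ambient")
    return 100.0 * (simulated_c - measured_c) / measured_rise


def worst_case_error_pct(points, ambient_c):
    """Largest absolute error over (simulated_c, measured_c) sensor pairs."""
    return max(abs(rise_error_pct(s, m, ambient_c)) for s, m in points)


if __name__ == "__main__":
    # Hypothetical sensor readings in deg C: (simulated, measured).
    readings = [(86.0, 85.0), (71.5, 73.0), (60.0, 60.5)]
    print(f"worst-case error: {worst_case_error_pct(readings, 35.0):.1f} %")
```

Normalizing by the rise above ambient, rather than by absolute temperature, keeps the percentage from looking artificially small. State whichever convention you used next to the number.
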
Interview Preparation Tips
- Deep‑Dive on Thermal Fundamentals – Be ready to discuss heat transfer modes, thermal resistance networks, transient vs. steady‑state analysis, and how you model them for a multi‑die package.
- CFD Case Study – Prepare a concise story: problem statement, simulation setup (mesh, boundary conditions), validation methodology, and final design decision.
- Liquid‑Cooling Design – Know the trade‑offs: coolant properties, pump sizing, pressure drop, reliability concerns, and how you integrate sensors/feedback loops.
- Collaboration Scenarios – Have examples of working with ODMs or vendor partners where you had to negotiate specifications, manage schedule risks, or resolve a “show‑stopper” thermal issue.
- Programming/Automation – Show a snippet or describe a script you wrote to batch‑run simulations or process thermal‑camera data. Emphasize reproducibility and version control.
- System‑Level Thinking – Demonstrate how you balance thermal constraints with power, performance, area, and cost (PPA) in a real product.
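
The thermal-resistance-network and transient-analysis topics above lend themselves to a small worked example. The sketch below evaluates the step response of a two-stage Foster RC network, one common form of compact thermal model. The function name `foster_step_response` and every resistance, capacitance, power, and ambient value are illustrative assumptions, not figures from the posting or from any real package.

```python
import math


def foster_step_response(power_w, stages, t_s, t_ambient_c=25.0):
    """Junction temperature (deg C) at t_s seconds after a power step.

    stages is a list of (R_i, C_i) pairs in K/W and J/K. Each Foster stage
    contributes R_i * (1 - exp(-t / (R_i * C_i))) to the thermal impedance.
    """
    z_th = sum(r * (1.0 - math.exp(-t_s / (r * c))) for r, c in stages)
    return t_ambient_c + power_w * z_th


# Hypothetical two-stage network: a fast die/package stage and a slow heatsink stage.
STAGES = [(0.05, 2.0), (0.10, 50.0)]  # (K/W, J/K)

if __name__ == "__main__":
    for t in (0.0, 0.1, 1.0, 10.0, 100.0):
        tj = foster_step_response(300.0, STAGES, t)
        print(f"t={t:6.1f} s  Tj={tj:6.2f} C")
```

At t = 0 the junction sits at ambient, and at long times it settles at ambient plus power times the summed resistances, which is the steady-state result. Checking both limits is a quick sanity test when explaining transient versus steady-state behavior.
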
Next Steps
- Tailor your resume to the “Required” and “Preferred” sections above, using the same terminology (e.g., “SoC transient thermal response”, “Ansys Icepak”, “liquid‑cooled cold plate”).
- Write a concise cover letter that connects your 10+ years of experience to the mission of building AWS’s next‑gen data‑center hardware. Highlight one or two marquee projects that align with the job’s focus.
- Prepare a portfolio (if applicable) with simulation screenshots, thermal‑budget tables, and validation plots, ready to share if the recruiter asks.
- Apply through the Amazon Jobs portal (link provided in the posting) and be sure to select the correct location (Austin, TX or Seattle, WA).
Requirements
- BS or MS degree in Mechanical/Thermal Engineering
- 10+ years industry experience in Mechanical and Thermal design of Systems
- Experience in thermal and performance measurements and characterization on SoCs, Servers, and Systems
- 3+ years of experience in SoC thermal modeling and IC package transient thermal response
- Experience with chip-package and system-level mechanical and thermal design for air-cooled and liquid-cooled systems
- Collaborate effectively with teams spanning multiple sites and develop detailed specifications for product teams to use
- Work with ODMs, heatsink vendors, and internal design teams on cross-boundary triaging, debugging, and resolving issues across the organization
- Familiarity with working in a Linux environment is an added advantage
- Working knowledge of fans, valves, chillers, and CDUs
- Knowledge of various types of technologies used for Heatsink solutions, Thermal Interface Materials (TIMs) and liquid cooling technologies
- Develop detailed CFD and compact RC models for SoC and Package thermal analysis
- Knowledge of hardware and software based thermal / power management control algorithms
- Optimize thermal solutions under PPA and system design constraints
- Validate thermal models through power/thermal measurements on Hardware
Responsibilities
- As a member of the Machine Learning Acceleration team you’ll be responsible for the design and optimization of hardware in our data centers
- You’ll provide leadership in the application of new technologies to large scale server deployments in a continuous effort to deliver a world-class customer experience
- This is a fast-paced, intellectually challenging position, and you’ll work with thought leaders in multiple technology areas
- As a Thermal/Mechanical Engineer, you design and build the systems that are the heart of the world's largest and most powerful computing infrastructure
- Simulate and prototype thermal control strategies
Benefits
- Health insurance
- Paid time off
- Dental coverage
Skills
- Bash
- Cadence Celsius
- CFD
- Linux
- Lua
- Python
- Shell script
- SolidWorks