Forward Deployed Validation Engineer
Arena
Who we are
Arena Physica is building electromagnetic superintelligence. Our name is inspired by Theodore Roosevelt's 'Citizenship in a Republic' speech. To us, entering the Arena means committing fully and accepting the risk of failure in pursuit of an audacious, worthy cause. We believe the future belongs to those brave enough to build it. Our team of 50 combines AI engineering and applied physics expertise with deep experience in enterprise deployments. We're headquartered in NYC with presences in San Francisco and Los Angeles, backed by $62M from Initialized, Founders Fund, Goldcrest Capital, Fifth Down Capital, and Shield Capital. If you're ready to do the most important work of your career, join us in the Arena.
What we do
Our AI platform Atlas operationalizes physics-grounded intelligence to verify, debug, and optimize hardware across its lifecycle. Atlas is already trusted globally by the world's most advanced hardware companies, including AMD, Anduril, Sivers Semiconductors, and Bausch & Lomb, for applications across R&D, integration testing, production assembly, and field repair.
About the role
As a Forward Deployed Validation Engineer, you will be the domain expert who makes Atlas indispensable for our customers' datacenter and cluster validation workflows. You'll work at the intersection of deep technical expertise in system validation and cluster testing, customer engagement, and product development — using Atlas to solve real problems for hardware validation teams at leading companies while translating those workflows and insights back to our software and ML teams. Most validation engineers work inside discrete platforms, executing program by program. Here, your expertise will be the training signal that compounds Atlas’s intelligence for every customer. You'll own outcomes across the technical, product, and customer dimensions. If you've ever wanted your domain knowledge to scale beyond your direct work, this is how. A truly unique role!
How you will contribute
Be the validation & performance expert
- Execute datacenter validation and cluster performance testing across GPU/CPU/memory/BIOS/BMC/networking/storage subsystems; benchmark, profile, and optimize system and cluster performance; debug complex hardware/firmware/software interactions and drive root-cause analysis.
Deploy Atlas with customers
- Embed at customer sites to validate datacenter hardware using Atlas as your primary tool, augmenting with your own expertise where needed. Build credibility through technical depth and results.
Codify and scale
- Your value here isn’t just what you fix in the field — it’s what you teach Atlas. Establish validation methodologies for Atlas across common subsystems and testing phases (EVT, DVT, PVT). Alongside these, translate customer workflows and pain points into product requirements and work closely with our engineering team to encode that expertise into Atlas. Every deployment should compound value for Atlas more broadly.
You have
Elite datacenter validation expertise
- 4+ years with AI/ML datacenter infrastructure, GPU cluster validation, or large-scale hardware validation at leading hardware companies or cloud providers; you're the person hardware teams call to debug complex system issues.
Full-stack hardware debugging mastery
- Deep understanding of GPU/CPU architecture, memory subsystems, BIOS/UEFI/BMC firmware, high-speed interconnects (PCIe/CXL/InfiniBand/RoCE), NVMe storage, and power/thermal management; experience validating systems from deployment through production at node and cluster scale; proven track record debugging issues across hardware, firmware, drivers, and software in distributed ML infrastructure.
Performance optimization at scale
- Strong experience benchmarking and tuning GPU clusters at multiple scales (cluster/rack/node); expertise with profiling tools, GPU utilization patterns, memory bandwidth bottlenecks, interconnect performance, and distributed training efficiency.
Customer-facing technical leadership
- You earn trust through technical credibility, understand workflows and pain points, communicate complex concepts clearly, and build strong relationships.
Automation & software engineering skills
- Proficiency in Python, Bash, or similar for building validation frameworks and automating tests at scale; comfortable with APIs, CI/CD environments, and collaborating with software engineers to productize workflows.
Platform expertise
- Experience with AMD and/or NVIDIA hardware and software stacks: EPYC CPUs, Instinct GPUs, the ROCm software stack, or AMD networking technologies, and/or NVIDIA Grace CPUs, H100/B200/GB200 GPUs, the CUDA/cuDNN/NCCL/TensorRT software stack, and InfiniBand/NVLink networking technologies.
- Willingness to travel domestically and internationally (30-40% of your time) to deploy with customers and validate hardware in the field.
Preferred
- Work in person at Arena Physica’s NYC headquarters when not deployed.
Benefits & Perks Include
- 100% of the monthly premium for Aetna medical insurance, plus vision and dental coverage
- 401(k) Retirement Plan
- Unlimited PTO
- Lunch every day from local restaurants via Sharebite
- Relocation support provided
The base salary range for this position is $150,000 - $250,000 per year. However, the base pay offered may vary depending on job-related knowledge, skills, and experience. In addition to base salary, we also offer competitive equity and benefits packages.
This position may require access to information protected under U.S. export control laws and regulations, including the Export Administration Regulations (EAR) and the International Traffic in Arms Regulations (ITAR). Please note that any offer for employment may be conditioned on authorization to receive software or technology controlled under these U.S. export control laws and regulations without sponsorship for an export license.