Senior Machine Learning Performance Architect

Fintal Partners

New York · On-site · Full-time · Senior · Posted 2mo ago

About the role

A market-leading high-frequency trading firm is building a next-generation AI infrastructure platform to support large-scale model training and ultra-low-latency inference workloads. The firm is seeking a Senior Machine Learning Performance Architect to operate at the intersection of ML research, systems engineering, and cutting-edge hardware.

This role is designed for engineers who bridge the gap between research teams developing state-of-the-art models and the hardware/platform teams responsible for maximizing performance across GPU infrastructure. The focus is on end-to-end optimization of ML workloads across compute, networking, memory, and distributed systems layers.

You will work closely with researchers to understand workload characteristics and partner with hardware and infrastructure engineers to ensure models are fully optimized for modern accelerator architectures. The work spans profiling, benchmarking, systems tuning, distributed training performance, and hardware-aware optimization.

Responsibilities

Optimize large-scale training and inference workloads across GPU clusters
Partner with ML researchers to improve model efficiency and hardware utilization
Profile and analyze bottlenecks across compute, memory, networking, and storage layers
Drive performance improvements across distributed training systems and inference pipelines
Work closely with hardware teams on accelerator performance, topology optimization, and scaling efficiency
Build tooling and benchmarks to evaluate system-level ML performance
Improve throughput, latency, reliability, and cluster efficiency for production AI workloads
Contribute to low-level optimization work across CUDA, NCCL, PyTorch, and distributed systems infrastructure

Requirements

Strong background in machine learning systems and performance engineering
Deep understanding of GPU architecture, distributed systems, and hardware-aware optimization
Experience with CUDA, PyTorch, NCCL, Triton, or similar ML infrastructure technologies
Strong systems programming skills in Python and/or C++
Experience profiling large-scale ML workloads and optimizing GPU utilization
Understanding of networking technologies such as InfiniBand, RDMA, or high-performance interconnects
Experience working closely with research teams on productionizing and scaling models
Degree in Computer Science, Engineering, Physics, Mathematics, or a related technical field preferred

The environment is highly technical, collaborative, and performance-driven, offering the opportunity to work on some of the most advanced AI infrastructure challenges in industry alongside leading researchers and engineers.

Skills

C++, CUDA, InfiniBand, Machine Learning, NCCL, Networking, Nvidia Triton, Performance Engineering, Python, PyTorch, RDMA
