Senior Machine Learning Engineers

Fintal Partners

On-site, Full-time, Senior, posted 3w ago

About the role

A market-leading high-frequency trading firm is seeking Senior Machine Learning Engineers to join a specialist performance engineering team focused on low-level optimization for large-scale AI workloads.

This role focuses heavily on GPU performance, CUDA kernel optimization, and systems-level acceleration in the later stages of the ML pipeline. The team extracts maximum performance from modern hardware architectures to support highly demanding training and inference workloads.

You will work close to the metal, optimizing critical components across CUDA, C++, memory management, and GPU execution paths. The work combines deep systems engineering with cutting-edge machine learning infrastructure.

Responsibilities

  • Develop and optimize CUDA kernels for high-performance ML workloads
  • Improve GPU utilization, memory efficiency, and execution performance
  • Profile and optimize bottlenecks across training and inference pipelines
  • Work on compiler/runtime-level optimizations and kernel fusion strategies
  • Collaborate with ML systems and infrastructure teams on end-to-end acceleration
  • Build highly optimized C++ components for latency- and throughput-sensitive systems

Requirements

  • Strong C++ and CUDA development experience
  • Deep understanding of GPU architecture and performance optimization
  • Experience profiling and debugging GPU workloads using tools such as Nsight
  • Knowledge of PyTorch internals, Triton, NCCL, CUTLASS, or similar frameworks
  • Strong systems programming background with a focus on performance engineering
  • Experience working on high-throughput or low-latency distributed systems
  • Computer Science, Mathematics, Physics, Engineering, or related technical degree preferred

This is an opportunity to work on some of the most technically challenging AI infrastructure problems in the industry, within an environment that values engineering excellence, autonomy, and performance.

Skills

C++, CUDA, CUTLASS, ML, NCCL, Nsight, PyTorch, Triton
