Sr. Applied ML Specialist, Research Eng.

Vector Institute

Toronto · Hybrid · Full-time · Senior · $126k – $157k/yr · Posted yesterday

About the role

Senior Applied Machine Learning Specialist, Research Engineering

POSITION SUMMARY

As a Senior Applied Machine Learning Specialist, Research Engineering, you will build and scale the tools, infrastructure, and systems that accelerate applied ML research at Vector and across its partner ecosystem. Working closely with applied ML scientists and researchers, you will implement research ideas in code, extend them to broader datasets, model families, and compute regimes, and develop the research engineering foundations that turn research prototypes into reproducible, scalable capabilities.

KEY RESPONSIBILITIES

Design, build, and maintain scalable ML research infrastructure, including training pipelines, experiment orchestration, evaluation harnesses, and data processing systems, that enable researchers to iterate faster across models, datasets, and compute configurations;
Implement and extend ML research prototypes from papers and internal work, taking them from proof-of-concept to robust, reproducible systems capable of scaling across hardware and data regimes;
Develop internal tooling and libraries that reduce friction in the research lifecycle, covering data ingestion and preprocessing, model training and fine-tuning, benchmarking, and results tracking;
Scale applied research efforts by engineering efficient pipelines for multi-dataset, multi-model, and distributed compute workloads, optimizing for both researcher productivity and resource efficiency;
Build and ship open-source research software, reference implementations, and model toolkits following engineering best practices (testing, versioning, documentation, CI/CD);
Collaborate with Applied ML Scientists and researchers to translate research requirements into concrete AI engineering specifications, ensuring systems are designed for extensibility as research directions evolve;
Take ownership of complex, high-effort research engineering initiatives, defining system architecture, leading implementation, and driving delivery end-to-end for large-scale projects that require significant engineering depth and coordination across research and engineering teams;
Communicate engineering progress, system design decisions, and tooling capabilities through technical documentation, demos, and internal presentations; and,
Other related duties as assigned from time to time.

KEY SUCCESS MEASURES

Measurable improvement in research throughput, i.e., researchers running more experiments across more models and datasets with less engineering overhead;
Delivery of reliable, well-documented research tooling and infrastructure that becomes a shared foundation for applied research efforts;
Successful scaling of research prototypes to broader compute, data, and model configurations with reproducible results; and,
Active contribution to research engineering culture through code quality, documentation standards, and knowledge-sharing with the broader team.

PROFILE OF THE IDEAL CANDIDATE

Bachelor's degree in computer science, mathematics, electrical engineering, or a related discipline; MSc/MEng preferred, particularly in a machine learning or systems-adjacent field;
Minimum of four years of experience in research engineering, ML infrastructure, or applied ML, with a track record of building systems that directly accelerate research or experimentation workflows;
Demonstrated experience as a technical lead on research engineering or applied ML projects, including owning system architecture, tooling decisions, and delivery from prototype to scalable implementation;
Experience mentoring or leading a team of engineers or researchers is an asset;
Strong proficiency in Python, with emphasis on writing clean, well-tested, and reusable research code;
Hands‑on experience building and maintaining ML training and evaluation pipelines, including handling large‑scale, heterogeneous, and real‑world datasets;
Deep familiarity with leading ML frameworks such as PyTorch, HuggingFace Transformers, and JAX; experience with CUDA or low‑level GPU optimization is a strong asset;
Strong command of the ML tooling ecosystem, spanning experiment tracking (e.g., MLflow, W&B), model evaluation and benchmarking, dataset versioning, and model registries;
Experience with distributed training, multi‑GPU/multi‑node compute orchestration, and cloud‑native infrastructure including Kubernetes, Docker, and managed cloud services (GCP/AWS/Azure); familiarity with job schedulers (e.g., SLURM) is an asset;
Familiarity with the full ML research lifecycle, from problem formulation and data curation through training, evaluation, scaling, and reproducibility;
Experience contributing to or maintaining open‑source ML libraries, research codebases, or shared internal tooling is strongly preferred; and,
Strong written and verbal communication skills, with the ability to translate research requirements into engineering specifications and document systems clearly for both technical and non‑technical audiences.

TOTAL REWARDS

The expected salary for this position is $125,800 to $157,300 per year, plus benefits if applicable. The final salary offer will reflect the successful candidate's experience, skills, and qualifications, in alignment with the Vector Institute's Compensation Policy, and may differ from the range above.

The Vector Institute’s Total Rewards approach extends beyond traditional compensation and benefits. Full‑time employees are eligible for a comprehensive suite of supports that recognize and value employees, including vacation time, floater days, GRRSP, a Health Spending Account, a Summer Hours program, and flexible work arrangements.

POSITION STATUS

This posting is for an existing vacancy.

INCLUSION AND EQUAL OPPORTUNITY EMPLOYMENT

Vector believes AI powers possibility by advancing cutting‑edge research and translating it into real‑world impact through collaboration with research, industry, and government. Vector is committed to fostering a diverse and inclusive culture that reflects its values.

The Vector Institute welcomes applications from all qualified candidates, including those who are Indigenous, 2SLGBTQIA+, racialized persons/visible minorities, women, and people with disabilities.

If you require an accommodation at any stage of the recruitment or selection process, please contact hr@vectorinstitute.ai. The Vector Institute team will be happy to work with you to ensure your experience is as inclusive and accessible as possible.

JOIN OUR COMMUNITY

Check out the Vector Institute’s Careers Page to explore open opportunities at Vector, and follow Vector on X, LinkedIn, and Bluesky to stay connected with the latest developments in Ontario's AI ecosystem and the Vector Institute.

Benefits

Vacation time, floater days, GRRSP, Health Spending Account, Summer Hours program, flexible work arrangements

Skills

AWS, Azure, CUDA, Docker, GCP, HuggingFace Transformers, JAX, Kubernetes, MLflow, PyTorch, Python, SLURM, W&B
