Senior ML Research Engineer

TwelveLabs

Hybrid, Full-time, Senior

WHO WE ARE

We are looking for talented people to build the global standard for video understanding AI with us!

Twelve Labs is building the world's best AI models designed specifically for video. These models efficiently process vast amounts of video data and power search, analysis, summarization, and insight generation.

The world's largest sports leagues use Twelve Labs models to select highlights quickly and accurately from vast amounts of game footage, delivering hyper-personalized viewing experiences. Integrated control centers in Korea use Twelve Labs to search CCTV footage efficiently and respond quickly to crises, and major broadcasters and studios worldwide use Twelve Labs models to produce content for billions of viewers.

Twelve Labs is a deep tech startup with offices in San Francisco and Seoul, and CB Insights has named it one of the top 100 AI startups globally for four consecutive years. We have raised over $110 million from world-class VCs and corporations such as NVIDIA, NEA, Index Ventures, Databricks, and Snowflake, and our models are the only Korean-developed AI models offered through Amazon Bedrock. We build innovative products with exceptional colleagues and grow alongside customers worldwide.

Twelve Labs is guided by the following core values:

An attitude of honesty and reflection towards oneself and the team
Perseverance and humility, unafraid of failure and feedback
A mindset of continuously learning and enhancing the team's capabilities together

If you enjoy the process of growing by solving challenging problems together, the opportunity is here at Twelve Labs.

ABOUT THE TEAM

This team is responsible for the research and development of Twelve Labs' multimodal embedding model, Marengo. We research and develop models that integrate various modalities such as video, audio, and text into a single embedding space.

We cover research topics including contrastive learning, temporal video understanding, and multimodal representation learning. We own the entire model development process, from building large-scale training data pipelines to designing model architectures, optimizing distributed training, and designing evaluation systems. With access to world-class GPU resources such as NVIDIA B300, we run large-scale experiments rapidly.

In an environment where the gap between research and production is very short, we collaborate closely with the Search, Product, and Infrastructure teams to continuously improve the quality of models used by thousands of customers worldwide.

ABOUT THE ROLE

As a Senior ML Research Engineer on the Marengo team, you will drive the research and development of TwelveLabs' multimodal embedding models, from data strategy and training pipeline optimization to model architecture experimentation and evaluation.

This is a research-heavy engineering role at the intersection of multimodal representation learning, large-scale distributed training, and data engineering. We're looking for a strong engineer-researcher who can take well-scoped research problems with moderate ambiguity, design rigorous experiments, and deliver reproducible results that ship to production.

IN THIS ROLE, YOU WILL

Design and execute experiments to improve multimodal embedding model quality, spanning model architecture, training methodology, data composition, and evaluation
Build and optimize large-scale distributed training pipelines (multi-node, multi-GPU) for contrastive and representation learning
Develop and improve data curation, filtering, and quality assessment pipelines at scale
Conduct ablation studies to systematically evaluate design choices and communicate findings to guide technical direction
Implement evaluation frameworks and benchmarks that rigorously measure embedding model quality
Collaborate with the search/serving team to ensure model improvements translate to end-to-end retrieval quality gains

Even if you don't check every box, we encourage you to apply.

If you're a zero-to-one achiever, a ferocious learner, and a kind team player who motivates others, you'll find a home at TwelveLabs.

YOU MAY BE A GOOD FIT IF YOU HAVE

4–7 years of industry experience in computer vision, NLP, or multimodal learning, with a track record of shipping ML systems to production
Strong proficiency in Python and PyTorch, with hands-on experience in distributed model training
Experience in contrastive learning, representation learning, or embedding models, demonstrated through shipped products, publications, or open-source contributions
End-to-end ownership experience: taking a model from research idea through training to production deployment, not just running experiments in isolation
Ability to independently drive research projects from problem definition through experiment design to conclusions
Effective communication skills for collaborating with colleagues from diverse backgrounds

We evaluate based on relevant technical skills and industry impact rather than degrees alone. This role is typically a strong fit for engineers with an MS and meaningful industry experience building ML systems at scale.

PREFERRED QUALIFICATIONS

Experience with temporal video understanding (segmentation, boundary detection, temporal grounding)
Experience with large-scale data curation (filtering, deduplication, quality scoring) for model training
Experience with training infrastructure optimization (mixed precision, gradient checkpointing, communication backends)
Familiarity with experiment tracking and reproducibility tools
Experience with petabyte-scale data processing

WHAT MAKES THIS ROLE UNIQUE

The gap between research and production is remarkably short here. Models you build will be used by thousands of companies worldwide within months. We work as a unified team toward the broader goal of video understanding, rather than solving isolated problems. Our research philosophy balances rigorous experimentation with real-world application: we aim to build multimodal systems that are powerful, trustworthy, and genuinely useful.

OTHERS

Work Location: Itaewon office (Seoul) and Pangyo satellite office
Additional Info: Candidates may enlist in or transfer to the Professional Research Personnel program (alternative military service for researchers).

HIRING PROCESS

Application Review → Recruiter Interview (Remote/30 min) → Loop Interview [Hiring Manager Interview & Live Coding Test Interview] (On-site/approx. 90 min) → Loop Interview [System Design & Final Round Interview] (Remote/approx. 90 min) → Reference Check → Offer

BENEFITS AND PERKS

A global team growing alongside global B2B customers
Hybrid work with both autonomy and collaboration
A MacBook and a 700,000 KRW remote-work equipment allowance for all employees, with equipment replaced by the latest models every 3 years
A corporate card with a monthly limit of 600,000 KRW for discretionary spending on meals, transportation, and more
Office snack bar (snacks, coffee, fresh food provided)
A 2-week winter break at the end of the year
Annual health check-up support
English education program support

Skills

AWS Bedrock, Computer Vision, Docker, NLTK, NLP, NVIDIA B300, NVIDIA GPU, Open-source, Python, PyTorch, Search, Snowflake, SQL, System Design, Temporal Video Understanding, Text, Transformer, Video, VPC