Senior ML Research Engineer
TwelveLabs
About the role
WHO WE ARE
We are looking for talent to build the global standard for video understanding AI together!
Twelve Labs is building the world's best video-specific AI models, which effectively process vast amounts of video data to power search, analysis, summarization, and insight generation.
The world's largest sports leagues use Twelve Labs models to quickly and accurately select highlights from vast amounts of game footage, delivering hyper-personalized viewing experiences. Integrated control centers in Korea use Twelve Labs to search CCTV footage efficiently and respond quickly to crises, and major broadcasters and studios worldwide use Twelve Labs models in content production for billions of viewers.
Twelve Labs is a deep tech startup with offices in San Francisco and Seoul, named one of CB Insights' top 100 AI startups globally for four consecutive years. We have secured over $110 million in funding from world-class VCs and corporations such as NVIDIA, NEA, Index Ventures, Databricks, and Snowflake, and our models are the only Korean-developed AI models offered through Amazon Bedrock. We build innovative products with exceptional colleagues and grow alongside customers worldwide.
Twelve Labs works around the following core values:
- An attitude of honesty and reflection towards oneself and the team
- Perseverance and humility, unafraid of failure and feedback
- A mindset of continuously learning and enhancing the team's capabilities together
If you enjoy the process of growing by solving challenging problems together, the opportunity is here at Twelve Labs.
ABOUT THE TEAM
This team is responsible for the research and development of Twelve Labs' multimodal embedding model, Marengo. We research and develop models that integrate various modalities such as video, audio, and text into a single embedding space.
We cover various research topics including contrastive learning, temporal video understanding, and multimodal representation learning. We are responsible for the entire model development process, from building large-scale training data pipelines to designing model architectures, optimizing distributed training, and designing evaluation systems. We conduct large-scale experiments rapidly with access to world-class GPU resources like NVIDIA B300.
In an environment where the gap between research and production is very short, we collaborate closely with the Search, Product, and Infrastructure teams to continuously improve the quality of models used by thousands of customers worldwide.
ABOUT THE ROLE
As a Senior ML Research Engineer on the Marengo team, you will drive the research and development of TwelveLabs' multimodal embedding models, from data strategy and training pipeline optimization to model architecture experimentation and evaluation.
This is a research-heavy engineering role at the intersection of multimodal representation learning, large-scale distributed training, and data engineering. We're looking for a strong engineer-researcher who can take well-scoped research problems with moderate ambiguity, design rigorous experiments, and deliver reproducible results that ship to production.
IN THIS ROLE, YOU WILL
- Design and execute experiments to improve multimodal embedding model quality, spanning model architecture, training methodology, data composition, and evaluation
- Build and optimize large-scale distributed training pipelines (multi-node, multi-GPU) for contrastive and representation learning
- Develop and improve data curation, filtering, and quality assessment pipelines at scale
- Conduct ablation studies to systematically evaluate design choices and communicate findings to guide technical direction
- Implement evaluation frameworks and benchmarks that rigorously measure embedding model quality
- Collaborate with the search/serving team to ensure model improvements translate to end-to-end retrieval quality gains
Even if you don't check every box, we encourage you to apply.
If you're a zero-to-one achiever, a ferocious learner, and a kind team player who motivates others, you'll find a home at TwelveLabs.
YOU MAY BE A GOOD FIT IF YOU HAVE
- 4–7 years of industry experience in computer vision, NLP, or multimodal learning, with a track record of shipping ML systems to production
- Strong proficiency in Python and PyTorch, with hands-on experience in distributed model training
- Experience in contrastive learning, representation learning, or embedding models, demonstrated through shipped products, publications, or open-source contributions
- End-to-end ownership experience: taking a model from research idea through training to production deployment, not just running experiments in isolation
- Ability to independently drive research projects from problem definition through experiment design to conclusions
- Effective communication skills for collaborating with colleagues from diverse backgrounds
We evaluate based on relevant technical skills and industry impact rather than degrees alone. This role is typically a strong fit for engineers with an MS and meaningful industry experience building ML systems at scale.
PREFERRED QUALIFICATIONS
- Experience with temporal video understanding (segmentation, boundary detection, temporal grounding)
- Experience with large-scale data curation (filtering, deduplication, quality scoring) for model training
- Experience with training infrastructure optimization (mixed precision, gradient checkpointing, communication backends)
- Familiarity with experiment tracking and reproducibility tools
- Experience with petabyte-scale data processing
WHAT MAKES THIS ROLE UNIQUE
The gap between research and production is remarkably short here. Models you build will be used by thousands of companies worldwide within months. We work as a unified team toward the broader goal of video understanding, rather than solving isolated problems. Our research philosophy balances rigorous experimentation with real-world application: we aim to build multimodal systems that are powerful, trustworthy, and genuinely useful.
OTHERS
- Work Location: Seoul Itaewon office + Pangyo satellite office
- Additional Info: Candidates may enlist in, or transfer into, this role under the Specialized Research Personnel program (alternative military service).
HIRING PROCESS
Application Review → Recruiter Interview (Remote/30 min) → Loop Interview [Hiring Manager Interview & Live Coding Test Interview] (On-site/approx. 90 min) → Loop Interview [System Design & Final Round Interview] (Remote/approx. 90 min) → Reference Check → Offer
BENEFITS AND PERKS
- A global team growing alongside global B2B customers
- Hybrid work with both autonomy and collaboration
- A MacBook and 700,000 KRW in remote-work equipment support for all employees, with equipment replaced by the latest models every 3 years
- Corporate card with a monthly limit of 600,000 KRW for discretionary spending on meals, transportation, and more
- Office snack bar (snacks, coffee, fresh food provided)
- A 2-week winter break at the end of the year
- Annual health check-up support
- English education program support