Senior Research Scientist
Ivoclar Vivadent Manufacturing GmbH
About the role
Your role in the team
At Canva, our mission is to empower the world to design. We’re building AI that feels magical and lands real impact for millions of people - helping anyone create with confidence.
We’re looking for a senior research scientist who lives and breathes reinforcement learning, agentic systems and mixture-of-experts (MoE) models to push the frontier of reasoning, tool use, latency and reliability - and ship it to users.
You will steer research directions and take a leading role in hands-on work across the agent stack: reward design and policy optimization, planning, memory, and tool orchestration, dataset construction, and post-training, including the development of novel post-training approaches.
You will design precise experiments, iterate rapidly, and arrive at reliable conclusions.
Most importantly, you’ll help convert research into reliable, safe, and high‑quality product experiences.
Responsibilities
- Develop agent systems (planning, multimodal tool use, retrieval, novel training approaches, modeling ablations) for real tasks in design, vision, and language.
- Scale post-training and RL across distributed systems (PyTorch) with efficient data loaders, tracing/telemetry, stable training of mixture-of-experts (MoE) architectures, and reproducible pipelines; profile, debug, and optimize.
- Contribute to the research agenda for RL/agentic systems aligned with Canva’s product goals; identify high‑leverage bets and retire dead ends quickly.
- Build reward models and learning loops: RLHF/RLAIF, preference modeling, DPO/IPO‑style objectives, offline/online RL, curriculum learning, and credit assignment for multi‑step reasoning.
- Develop simulation and sandbox tasks that surface failure modes (planning errors, tool‑use brittleness, hallucination, unsafe actions) and turn them into measurable targets.
- Help align on rigorous evaluation for agents (task success, reliability, latency, safety, regressions).
- Set up offline suites and online A/B tests; favor simple, controlled experiments that generalize.
- Collaborate and ship: work shoulder‑to‑shoulder with product, design, safety, and platform to land research as reliable features—then iterate.
- Share and elevate: mentor teammates, present findings internally, and contribute back to the community when it helps the field and our users.
What we offer
Achieving our crazy big goals motivates us to work hard - and we do - but you'll experience lots of moments of magic, connectivity and fun woven throughout life at Canva, too.
We also offer a stack of benefits to set you up for every success in and outside of work.
Here's a taste of what's on offer:
- Equity packages - we want our success to be yours too.
- Inclusive parental leave policy that supports all parents & carers.
- An annual Vibe & Thrive allowance to support your wellbeing, social connection, home office setup & more.
- Flexible leave options that empower you to be a force for good, take time to recharge and support you personally.
Technologies and skills
- Python
- PyTorch
Our expectations:
Qualifications
- Deep experience implementing and post-training MoEs, LLMs, VLMs, or diffusion models, with a track record of shipped research or publications in MoEs, RL, or agents.
- Fluency in Python and PyTorch; you’re comfortable in large ML codebases and can profile, debug, and optimize training and inference.
Experience
- Experience modifying and adapting open-source models.
- Strong experience in experimental design: tight baselines, clean ablations, reproducibility, and clear, data-driven conclusions.
- Practical experience building agent loops (planning, tool invocation, retrieval, memory) and evaluating multi-step reasoning quality.
- Hands-on experience with policy optimization, reward modeling, and preference learning (e.g., RLHF/RLAIF, DPO/IPO, actor-critic/PPO, offline RL).
- Experience with large‑scale training (distributed training, experiment tracking, evaluation harnesses) and cloud multimodal tooling.
- Experience with RL for MoE architectures.
Benefits
- Mental Health Care
- Fresh Fruit
- Relaxation Rooms
- Meal Vouchers
- Excellent Traffic Connections
- Tabletop Soccer, etc.
- Health Care Benefits
- Employee Stock Option
- Snacks, Sweets
- Coffee, Tea, etc.
- Flexible Working Hours
- Public Transport Allowance
- No All-In Contracts
- Bicycle Parking Space
- Company Notebook for Private Use