Multimodal AI Engineer

Studio Jadu

Zürich · On-site · Posted 1mo ago

About the role

We’re looking for an engineer to design, build, and improve the core AI workflows behind our product.

This is not a traditional ML role. The work is centered on LLMs, VLMs, and image/video/audio generation models used as part of real production workflows. We’re looking for someone who has strong hands-on experience building systems around these models, evaluating them in practice, and improving quality, reliability, and efficiency over time.

What you’ll do

  • Design and build end-to-end workflows powered by LLMs, VLMs, and multimodal generation models
  • Integrate, manage, and benchmark models for text, image, audio, and video
  • Run experiments on prompts, system prompts, model configurations, and inference pipelines
  • Build evaluation frameworks using human review, automated benchmarks, and LLM-as-judge approaches where appropriate
  • Analyze model behavior and failure modes, and turn findings into better prompts, better routing, and better workflows
  • Develop scoring, ranking, and recommendation layers for multimodal outputs
  • Build APIs and internal tools that make these systems reusable, reproducible, and efficient

What we’re looking for

  • Proven hands-on experience building applications or internal systems where LLMs, VLMs, or generative media models were central
  • Strong understanding of how these models behave in practice, including prompting, evaluation, reliability, and cost/latency tradeoffs
  • Experience working with image, video, and/or audio generation models, including evaluating output quality and deciding what works in production
  • Strong Python skills and solid software engineering fundamentals
  • Ability to design experiments and iterate quickly based on evidence

Nice to have

  • Experience fine-tuning or training LLMs/VLMs
  • Experience with multimodal retrieval, ranking, or orchestration systems
  • Experience building human-in-the-loop workflows for creative tools

About us

We are a startup at the intersection of AI and media. We are building the next generation of tools that help creatives transform stories into production-ready video and reach their fans. Our mission is to be at the service of good stories.

Skills

LLM, Python, VLM
