Backend Engineer, Multimodal & Media

Nucleus AI

Nagpur · On-site · Full-time · 2w ago

About the role

At Nucleus, we’re building AI systems that can understand and work across more than text — from images and video to rich multimodal signals that make intelligence more grounded, expressive, and useful in the real world.

We’re hiring a Backend Engineer, Multimodal & Media to build the backend services and infrastructure that power video, image, and multimodal pipelines at scale. This role is for an engineer who enjoys working on high-throughput systems, media-heavy workloads, and the backend foundations that make advanced AI products reliable in production.

As a Backend Engineer on Multimodal & Media, you will design and operate the services, APIs, and processing pipelines that support multimodal data ingestion, transformation, storage, retrieval, and serving across Nucleus products and research systems.

You’ll work closely with product, infrastructure, research, and applied AI teams to build systems that handle complex media workflows efficiently and reliably. This is a deeply technical role with broad impact: your work will help define how multimodal inputs move through Nucleus systems, how they are processed at scale, and how they are made usable for downstream models and product experiences.

What you’ll do

• Design, build, and maintain backend services for video, image, audio, and other multimodal processing pipelines.
• Develop APIs and service layers that support media ingestion, transformation, metadata management, storage, retrieval, and delivery.
• Build scalable backend infrastructure for high-volume, latency-sensitive multimodal workloads in production.
• Optimize processing pipelines for throughput, reliability, cost efficiency, and operational simplicity.
• Partner with AI/ML teams to support training, inference, and evaluation workflows that depend on large-scale multimodal data systems.
• Improve observability, monitoring, and debugging for distributed media and pipeline infrastructure.
• Work with product and platform teams to expose multimodal capabilities through clean, dependable backend interfaces.
• Help define backend best practices for media handling, reliability, security, and long-term maintainability.

What we’re looking for

• Strong backend engineering experience with Python, Go, Java, or Node.js/TypeScript.
• Experience building and operating production backend services, APIs, and distributed systems.
• Familiarity with media or data-intensive systems involving video, images, audio, or large binary assets.
• Solid understanding of backend fundamentals such as system design, asynchronous processing, queues, storage systems, caching, and failure recovery.
• Experience with cloud platforms and infrastructure tooling such as AWS/GCP/Azure, Docker, CI/CD, logging, and monitoring.
• Comfort working on high-throughput pipelines where performance, scalability, and cost matter.
• Experience supporting AI/ML, data platform, or multimodal product workflows is a strong plus.
• Strong collaboration skills and the ability to work across research, infrastructure, and product teams.

What makes Nucleus different

Nucleus is building AI systems that interact with the world in richer ways: seeing, interpreting, and reasoning over complex multimodal inputs at meaningful scale. Doing that well requires backend systems that are both technically rigorous and deeply aligned with product and research needs.

In this role, you’ll help create the infrastructure that turns multimodal capability into real-world utility. Your work will shape how media moves through our systems, how models access it, and how multimodal intelligence becomes reliable enough for production use.

If you’re excited by backend engineering for video, image, and multimodal systems at the frontier of AI, we’d love to hear from you.
