
Senior Software Engineer - Machine Learning

Caffeine

On-site · Senior · Posted 3 weeks ago

About Caffeine.ai

Caffeine.ai is building the platform for self-writing apps — where natural language produces full-stack, production-ready applications deployed to the Internet Computer, an open sovereign cloud. Our mission is to make building software as simple as a conversation: ideas become live systems in minutes, with no code required.

What sets Caffeine apart is the infrastructure beneath it. While other self-writing platforms build on traditional stacks, Caffeine runs on a different foundation — one where apps are tamperproof by design, data is guaranteed safe on every update, and backend code is written in Motoko, a language built specifically for AI code generation. This is a platform built for real production software, not just prototypes.

We are a cross-functional team of engineers and researchers building the AI that powers this new paradigm.

About the Role

As a Senior Software Engineer — Machine Learning, you will own the layer between our agentic core and everything the user sees and touches. That means multi-agent orchestration, real-time streaming pipelines, and the persistence layer that holds the state of applications that were never manually written. This is not prompt engineering — it's the industrial‑grade plumbing underneath it.

What You'll Do

  • Own real-time streaming infrastructure: Build and operate the SSE pipeline that delivers agentic job state from backend to client — designing for latency, reliability, and graceful failure at every step.
  • Build the job orchestration layer: Coordinate multi-agent workflows end‑to‑end — dispatch, retries, state recovery, and context continuity across long‑running, non‑deterministic workloads.
  • Design schemas and persistence strategies: Own the database layer for agentic work — jobs, artifacts, agent memory, and the user's evolving application state.
  • Bridge agent output to product: Transform raw agent output into the structured data models the frontend and other services depend on.
  • Instrument the full pipeline: Measure latency, throughput, and failure surfaces — and stay close to production behaviour across every release.
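To make the streaming responsibility concrete, here is a minimal TypeScript sketch of serializing agentic job state into Server-Sent Events frames with resumable IDs. The `JobState` shape, event name, and ID scheme are illustrative assumptions, not Caffeine's actual schema.

```typescript
// A hypothetical job-state record; field names are assumptions for illustration.
type JobState = {
  jobId: string;
  status: "queued" | "running" | "failed" | "done";
  seq: number; // monotonic sequence number, enabling client-side resume
  payload?: unknown;
};

// SSE frames are plain text: optional `id:` and `event:` lines, one or more
// `data:` lines, terminated by a blank line (hence the trailing "\n\n").
function toSseFrame(state: JobState): string {
  return [
    `id: ${state.jobId}:${state.seq}`, // lets clients resume via Last-Event-ID
    `event: job-state`,
    `data: ${JSON.stringify(state)}`,
    "",
    "",
  ].join("\n");
}

const frame = toSseFrame({ jobId: "j1", status: "running", seq: 3 });
console.log(frame);
```

Encoding a monotonic sequence into the event ID is one common way to let a reconnecting client tell the server where it left off, which is the kind of reliability concern the role describes.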

Who You Are

  • Streaming systems experience: You've designed real-time streaming systems in production — SSE, event‑driven architectures, or similar — and you know where they fail under load.
  • Database‑as‑design‑surface thinker: You think about databases not just as storage but as a design surface — schema decisions, consistency guarantees, and state lifecycle are things you get opinionated about.
  • Agentic/LLM pipeline experience: You've worked with agentic or LLM pipelines in a backend context and understand the operational challenges of long‑running, non‑deterministic workloads.
  • Product‑aware infrastructure mindset: You care about the user‑facing effect of your infrastructure choices — latency, dropped events, and stale state are product problems as much as engineering ones.
  • High autonomy: Ambiguity doesn't stall you — you scope the surface, make a call, and ship something you can measure.
  • Small‑team energy: You're energised by small teams where your work reaches real users within days, not quarters.

Bonus Points

  • Experience with TypeScript backend frameworks (Node.js, NestJS, Fastify).
  • Familiarity with multi‑agent architectures or AI orchestration systems.
  • Experience with event‑driven architectures and message queues.
  • Knowledge of DevOps (Docker, Kubernetes, Observability).
  • Interest in Web3 or sovereign cloud infrastructure.

Location

This is an on‑site role. We work together in person, every day — it's core to how we build. We don't offer remote or hybrid arrangements.

Skills

Docker, Kubernetes, LLM, Motoko, Node.js, NestJS, Fastify, SSE, TypeScript, Web3
