Senior Software Engineer - Machine Learning
Caffeine
About Caffeine.ai
Caffeine.ai is building the platform for self-writing apps — where natural language produces full-stack, production-ready applications deployed to the Internet Computer, an open sovereign cloud. Our mission is to make building software as simple as a conversation: ideas become live systems in minutes, with no code required.
What sets Caffeine apart is the infrastructure beneath it. While other self-writing platforms build on traditional stacks, Caffeine runs on a different foundation — one where apps are tamperproof by design, data is guaranteed safe on every update, and backend code is written in Motoko, a language built specifically for AI code generation. This is a platform built for real production software, not just prototypes.
We are a cross-functional team of engineers and researchers building the AI that powers this new paradigm.
About the Role
As a Senior Software Engineer — Machine Learning, you will own the layer between our agentic core and everything the user sees and touches. That means multi-agent orchestration, real-time streaming pipelines, and the persistence layer that holds the state of applications that were never manually written. This is not prompt engineering — it's the industrial‑grade plumbing underneath it.
What You'll Do
- Own real-time streaming infrastructure: Build and operate the SSE pipeline that delivers agentic job state from backend to client — designing for latency, reliability, and graceful failure at every step.
- Build the job orchestration layer: Coordinate multi-agent workflows end‑to‑end — dispatch, retries, state recovery, and context continuity across long‑running, non‑deterministic workloads.
- Design schemas and persistence strategies: Own the database layer for agentic work — jobs, artifacts, agent memory, and the user's evolving application state.
- Bridge agent output to product: Transform raw agent output into the structured data models the frontend and other services depend on.
- Instrument the full pipeline: Measure latency, throughput, and failure surfaces — and stay close to production behaviour across every release.
Who You Are
- Streaming systems experience: You've designed real-time streaming systems in production — SSE, event‑driven architectures, or similar — and you know where they fail under load.
- Database‑as‑design‑surface thinker: You think about databases not just as storage but as a design surface — schema decisions, consistency guarantees, and state lifecycle are things you get opinionated about.
- Agentic/LLM pipeline experience: You've worked with agentic or LLM pipelines in a backend context and understand the operational challenges of long‑running, non‑deterministic workloads.
- Product‑aware infrastructure mindset: You care about the user‑facing effect of your infrastructure choices — latency, dropped events, and stale state are product problems as much as engineering ones.
- High autonomy: Ambiguity doesn't stall you — you scope the surface, make a call, and ship something you can measure.
- Small‑team energy: You're energised by small teams where your work reaches real users within days, not quarters.
Bonus Points
- Experience with TypeScript backend frameworks (Node.js, NestJS, Fastify).
- Familiarity with multi‑agent architectures or AI orchestration systems.
- Experience with event‑driven architectures and message queues.
- Knowledge of DevOps (Docker, Kubernetes, Observability).
- Interest in Web3 or sovereign cloud infrastructure.
Location
This is an on‑site role. We work together in person, every day — it's core to how we build. We don't offer remote or hybrid arrangements.