Research Engineer, Reinforcement Learning

techire ai

Millbrae · On-site · Full-time · Today

About the role

About

  • Want to build the large-scale RL environments frontier labs use to train agents that can truly reason and act?
  • This team are creating complex reinforcement learning environments - simulations where advanced agents learn to plan, adapt, and solve multi-step problems that stretch beyond standard benchmarks.
  • The focus isn't on training the models themselves, but on building the worlds that make meaningful learning and evaluation possible - the foundation for more capable, aligned systems.

Responsibilities

  • You'll work end-to-end across environment design, reward dynamics, and scalable simulation - developing the feedback loops that define what "good" looks like for intelligent behaviour.
  • It's open-ended, research-driven work where the task definition, data, and reward structure are often the hardest and most important problems to solve.
  • You'll collaborate closely with researchers tackling unsolved challenges in reinforcement learning and agent behaviour, shaping experiments, scaling infrastructure, and refining how agents learn in the loop.

Requirements

  • The role suits someone with strong ML and RL experience, deep intuition for agent dynamics, and the curiosity to explore problems that don't come with clear instructions.

Location

  • On-site in San Francisco.

Compensation

  • Compensation up to $300K base (negotiable, depending on experience), plus equity.

Application

  • If you want to help build the environments that teach the next generation of AI systems how to think, act, and adapt, we'd love to hear from you.

Skills

ML, RL
