AI Prompt Engineer / ML Engineer

ShineBask Technologies LLC

San Francisco · On-site · Full-time · Mid Level · 2w ago

About the role

We're hiring an AI Prompt & Agent Developer to own behavioral slice(s) of our voice agents. That behavior splits into two categories: behavior shared across every deployment, and behavior specific to a subset of deployments. You'll write prompts, design subagent architectures, build evals, and push automation rates up one small, measurable win at a time.

Responsibilities

  • Write and maintain the prompts that run in production. Intent classification, information extraction, availability negotiation, closing phrases, insurance verification flows, objection handling, edge-case recovery. You own behavior that touches every customer call.
  • Ship iteratively against real call data. Every morning, you'll listen to failed calls from yesterday. Every afternoon, you'll deploy a fix. You’ll be using and helping to develop dashboards, call review tooling, and automated agents to accelerate the work.
  • Build evaluation harnesses. You'll develop offline eval sets, run automated prompt optimization (we use GEPA-style approaches), and establish the test suites that let us ship changes without breaking live deployments.
  • Human-in-the-loop onboarding. New customers come online constantly. You'll work with and iterate on our internal AI agents that translate a practice's intake form, their scheduling rules, and their quirks into an agent configuration. Every week, you'll be designing new evaluation metrics for these customers and helping to improve existing ones.
  • QA and continuous improvement. You'll simulate real-world customer scenarios, measure outcomes, and monitor production agent performance so you can catch drift early and fix it fast.

What we're looking for

  • You've shipped prompts that broke production. Doesn't matter if it was at OpenAI, a chatbot startup, a research lab, or your own project. What matters is that you've felt the specific pain of a prompt that worked beautifully in dev and broke the second it hit real users.
  • You're meticulous and careful. Looking at data for long stretches energizes you, as long as there's a signal. You stay organized when five things are in flight. We deploy multiple times a day, and we also run healthcare workflows where a bad change costs real money for real practices. You know the difference between moving fast and breaking things.
  • Writing sensibility. The best prompt engineers are good writers. You notice register, rhythm, and word choice. You can tell why "Hello, cornerside dental? This is Ava, how can I help you out today?" sounds warmer than "Hello, Cornerside Dental, this is Ava. How can I help you out today" out of a TTS.
  • Analytical and empirical. You are relentlessly data-driven. Before you make changes, you proactively run experiments and measure. You don't ship because "I think this is better." You justify a change with "this moved booking rate from 78.2% to 81.4% on n=412 calls."
  • Comfort with code. You don't need to be a senior engineer, but you should read Python fluently and TypeScript comfortably, and you can get almost any coding task done by pairing with modern AI coding tools.

Requirements

  • 2+ years of experience with AI/ML, NLP, or prompt engineering in production
  • Strong analytical and problem-solving mindset; comfort with ambiguity
  • Excellent written and verbal communication skills
  • Bachelor's degree and/or extensive experience in one or more of: Computer Science, Engineering, Math, Philosophy, Linguistics, Cognitive Science, English, Medicine, or a related field

Preferred Qualifications

  • Python chops beyond reading: APIs, data pipelines, testing frameworks
  • Prior work with voice AI, TTS, ASR, or telephony platforms (Twilio, etc.)
  • Contact center, SaaS, or customer-facing tech background
  • Healthcare or medical operations experience — you know what an NPI is, you've worked a front desk, you understand the weird chaos of dental scheduling
  • Automated prompt optimization experience (DSPy, GEPA, MIPROv2)
  • Fine-tuning experience

Additional Information

  • Full Time Role & On-Site Interview
  • Locations: San Francisco (222 Columbus Ave, San Francisco, CA 94133)
  • In-office minimum 5 days a week
  • Relocation assistance: Yes
  • Only US Citizens or Green Card holders
  • Expect 60+ hours/week for now

Interview Process

  • Stage 1: 30min coding with founder - 30% pass rate
  • Stage 2: 45min deeper coding with Founder/CTO - 50% pass rate
  • Final: 2.5hr onsite mini project with Founder/CTO (end-to-end system design, any tools allowed)

Skills

Python, TypeScript, AI, ML
