AI Prompt Engineer / ML Engineer
ShineBask Technologies LLC
San Francisco · On-site · Full-time · Mid Level · 2w ago
About the role
We're hiring an AI Prompt & Agent Developer to own behavioral slice(s) of our voice agents. That behavior splits into two categories: behavior shared across every deployment, and behavior specific to a subset of deployments. You'll write prompts, design subagent architectures, build evals, and push automation rates up one small, measurable win at a time.
Responsibilities
- Write and maintain the prompts that run in production. Intent classification, information extraction, availability negotiation, closing phrases, insurance verification flows, objection handling, edge-case recovery. You own behavior that touches every customer call.
- Ship iteratively against real call data. Every morning, you'll listen to failed calls from yesterday. Every afternoon, you'll deploy a fix. You'll use and help develop dashboards, call review tooling, and automated agents to accelerate the work.
- Build evaluation harnesses. You'll develop offline eval sets, run automated prompt optimization (we use GEPA-style approaches), and establish the test suites that let us ship changes without breaking live deployments.
- Human-in-the-loop onboarding. New customers come online constantly. You'll work with and iterate on our internal AI agents that translate a practice's intake form, their scheduling rules, and their quirks into an agent configuration. Every week, you'll be designing new evaluation metrics for these customers and helping to improve existing ones.
- QA and continuous improvement. You'll simulate real-world customer scenarios, measure outcomes, and monitor production agent performance so you can catch drift early and fix it fast.
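As a rough illustration of the offline eval idea in the list above, here is a minimal sketch of a regression gate for an intent-classification prompt. Everything in it is hypothetical: `EvalCase`, `pass_rate`, `safe_to_ship`, the intent labels, and the keyword classifier that stands in for a real prompt-backed model call. It is not a description of the team's actual tooling.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    """One labeled example in a hypothetical offline eval set."""
    utterance: str
    expected_intent: str


def pass_rate(classify: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Fraction of cases where the classifier returns the expected intent."""
    hits = sum(classify(case.utterance) == case.expected_intent for case in cases)
    return hits / len(cases)


def safe_to_ship(candidate_rate: float, baseline_rate: float, tolerance: float = 0.0) -> bool:
    """Gate a change: the candidate must not fall below baseline minus tolerance."""
    return candidate_rate >= baseline_rate - tolerance


def keyword_classifier(utterance: str) -> str:
    """Stand-in for a prompt-backed classifier (assumption, for illustration only)."""
    text = utterance.lower()
    if "cancel" in text:
        return "cancel"
    if "insurance" in text:
        return "insurance"
    return "book"


CASES = [
    EvalCase("I need to cancel my cleaning on Friday", "cancel"),
    EvalCase("Do you take my insurance?", "insurance"),
    EvalCase("Can I get an appointment next week?", "book"),
    EvalCase("I want to move my appointment", "reschedule"),
]

rate = pass_rate(keyword_classifier, CASES)
print(f"pass rate: {rate:.2f}, ship: {safe_to_ship(rate, baseline_rate=0.75)}")
```

In practice the stand-in classifier would be replaced by a call to the deployed prompt, and the baseline would come from the last shipped version's score on the same eval set.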
What we're looking for
- You've shipped prompts that broke production. Doesn't matter if it was at OpenAI, a chatbot startup, a research lab, or your own project. What matters is that you've felt the specific pain of a prompt that worked beautifully in dev and broke the second it hit real users.
- You're meticulous and careful. Looking at data for long stretches energizes you, as long as there's a signal. You stay organized when five things are in flight. We deploy multiple times a day, and we also run healthcare workflows where a bad change costs real money for real practices. You know the difference between moving fast and breaking things.
- Writing sensibility. The best prompt engineers are good writers. You notice register, rhythm, and word choice. You can tell why "Hello, Cornerside Dental? This is Ava, how can I help you out today?" sounds warmer than "Hello, Cornerside Dental, this is Ava. How can I help you out today" out of a TTS.
- Analytical and empirical. You are relentlessly data-driven. Before you make changes, you proactively run experiments and measure. You don't ship because "I think this is better." You justify a change with "this moved booking rate from 78.2% to 81.4% on n=412 calls."
- Comfort with code. You don't need to be a senior engineer, but you should read Python fluently and TypeScript comfortably, and you can get almost any coding task done by pairing with modern AI coding tools.
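To make the "measure before you ship" point above concrete, here is a minimal sketch of how one might check whether a lift like 78.2% to 81.4% is distinguishable from noise, using a pooled two-proportion z-test. It assumes n=412 calls in each arm (the posting does not say whether 412 is per arm or total), and the function name `two_proportion_z` is hypothetical.

```python
import math


def two_proportion_z(p_before: float, p_after: float, n_before: int, n_after: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference between two proportions."""
    # Pooled proportion under the null hypothesis that both rates are equal.
    pooled = (p_before * n_before + p_after * n_after) / (n_before + n_after)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_before + 1 / n_after))
    z = (p_after - p_before) / standard_error
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Assumption: 412 calls before the change and 412 calls after it.
z, p_value = two_proportion_z(0.782, 0.814, 412, 412)
print(f"z = {z:.2f}, two-sided p = {p_value:.3f}")
```

A check like this is only one input. Sample size, how calls were assigned to each arm, and day-to-day variation in call mix all affect how much weight a single comparison deserves.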
Requirements
- 2+ years of experience with AI/ML, NLP, or prompt engineering in production
- Strong analytical and problem-solving mindset; comfort with ambiguity
- Excellent written and verbal communication skills
- Bachelor's degree and/or extensive experience in one or more of: Computer Science, Engineering, Math, Philosophy, Linguistics, Cognitive Science, English, Medicine, or a related field
Preferred Qualifications
- Python chops beyond reading: APIs, data pipelines, testing frameworks
- Prior work with voice AI, TTS, ASR, or telephony platforms (Twilio, etc.)
- Contact center, SaaS, or customer-facing tech background
- Healthcare or medical operations experience — you know what an NPI is, you've worked a front desk, you understand the weird chaos of dental scheduling
- Automated prompt optimization experience (DSPy, GEPA, MIPROv2)
- Fine-tuning experience
Additional Information
- Full Time Role & On-Site Interview
- Locations: San Francisco (222 Columbus Ave, San Francisco, CA 94133)
- In-office minimum 5 days a week
- Relocation assistance: Yes
- Only US Citizens or Green Card holders
- Expect 60+ hours/week for now
Interview Process
- Stage 1: 30min coding with Founder - 30% pass rate
- Stage 2: 45min deeper coding with Founder/CTO - 50% pass rate
- Final: 2.5hr onsite mini project with Founder/CTO (end-to-end system design, any tools allowed)
Skills
Python, TypeScript, AI, ML