AI Prompt Engineer / ML Engineer

ShineBask Technologies LLC

San Francisco · On-site · Full-time · Mid Level · 2mo ago

About the role

We're hiring an AI Prompt & Agent Developer to own behavioral slice(s) of our voice agents. That behavior splits into two categories: behavior shared across every deployment, and behavior specific to a subset of deployments. You'll write prompts, design subagent architectures, build evals, and push automation rates up one small, measurable win at a time.

Responsibilities

Write and maintain the prompts that run in production. Intent classification, information extraction, availability negotiation, closing phrases, insurance verification flows, objection handling, edge-case recovery. You own behavior that touches every customer call.
Ship iteratively against real call data. Every morning, you'll listen to failed calls from yesterday. Every afternoon, you'll deploy a fix. You'll be using and helping to develop dashboards, call review tooling, and automated agents to accelerate the work.
Build evaluation harnesses. You'll develop offline eval sets, run automated prompt optimization (we use GEPA-style approaches), and establish the test suites that let us ship changes without breaking live deployments.
Human-in-the-loop onboarding. New customers come online constantly. You'll work with and iterate on our internal AI agents that translate a practice's intake form, their scheduling rules, and their quirks into an agent configuration. Every week, you'll be designing new evaluation metrics for these customers and helping to improve existing ones.
QA and continuous improvement. You'll simulate real-world customer scenarios, measure outcomes, and monitor production agent performance so you can catch drift early and fix it fast.

What we're looking for

You've shipped prompts that broke production. Doesn't matter if it was at OpenAI, a chatbot startup, a research lab, or your own project. What matters is that you've felt the specific pain of a prompt that worked beautifully in dev and broke the second it hit real users.
You're meticulous and careful. Looking at data for long stretches energizes you, as long as there's a signal. You stay organized when five things are in flight. We deploy multiple times a day, and we also run healthcare workflows where a bad change costs real money for real practices. You know the difference between moving fast and breaking things.
Writing sensibility. The best prompt engineers are good writers. You notice register, rhythm, and word choice. You can tell why "Hello, cornerside dental? This is Ava, how can I help you out today?" sounds warmer than "Hello, Cornerside Dental, this is Ava. How can I help you out today" when it comes out of a TTS.
Analytical and empirical. You are relentlessly data-driven. Before you make changes, you proactively run experiments and measure. You don't ship because "I think this is better." You justify a change with "this moved booking rate from 78.2% to 81.4% on n=412 calls."
Comfort with code. You don't need to be a senior engineer, but you should read Python fluently and TypeScript comfortably, and you can get almost any coding task done by pairing with modern AI coding tools.

Requirements

2+ years of experience with AI/ML, NLP, or prompt engineering in production
Strong analytical and problem-solving mindset; comfort with ambiguity
Excellent written and verbal communication skills
Bachelor's degree and/or extensive experience in one or more of: Computer Science, Engineering, Math, Philosophy, Linguistics, Cognitive Science, English, Medicine, or a related field

Preferred Qualifications

Python chops beyond reading: APIs, data pipelines, testing frameworks
Prior work with voice AI, TTS, ASR, or telephony platforms (Twilio, etc.)
Contact center, SaaS, or customer-facing tech background
Healthcare or medical operations experience — you know what an NPI is, you've worked a front desk, you understand the weird chaos of dental scheduling
Automated prompt optimization experience (DSPy, GEPA, MIPROv2)
Fine-tuning experience

Additional Information

Full Time Role & On-Site Interview
Locations: San Francisco (222 Columbus Ave, San Francisco, CA 94133)
In-office minimum 5 days a week
Relocation assistance: Yes
Only US Citizens or Green Card holders
Expect 60+ hours/week for now

Interview Process

Stage 1: 30min coding with founder - 30% pass rate
Stage 2: 45min deeper coding with Founder/CTO - 50% pass rate
Final: 2.5hr onsite mini project with Founder/CTO
End-to-end system design, any tools allowed.

Skills

Python, TypeScript, AI, ML
