AI/ML Engineer — Clinical Intelligence
eMedicalPractice
About
eMedicalPractice, based in Delray Beach, Florida, is a healthcare technology company dedicated to streamlining operations for ambulatory practices and medical management groups. Since 2008, the company has provided fully integrated, customizable solutions, including Electronic Health Records (EHR), Practice Management, Telemedicine, Billing, and Patient Engagement platforms. Trusted nationwide by healthcare professionals across a range of specialties, eMedicalPractice empowers organizations to improve workflow efficiency and deliver high-quality patient care.
Role Description
This is an on-site, full-time AI/ML Engineer — Clinical Intelligence position located in Mangalagiri. The role involves developing and implementing machine learning models and algorithms for clinical insights, with the goal of improving patient care and healthcare operations. The engineer will contribute to exploratory data analysis, pattern recognition, and neural network design while collaborating with cross-functional teams to build innovative data-driven solutions. Additional responsibilities include evaluating and optimizing existing algorithms, interpreting statistical data, and driving actionable insights toward clinical and operational goals.
What You’ll Work On
- LLM fine-tuning and inference serving in production
- Real-time NLP pipelines on live audio transcription streams
- Per-user model personalization using LoRA adapters
- Evaluation, monitoring, and continuous model improvement
- HIPAA-compliant AI data pipelines
Requirements
Required
- 3+ years building and deploying LLMs in production
- Hands‑on experience with vLLM, TGI, or similar serving frameworks
- Practical LoRA / QLoRA fine‑tuning experience (PEFT)
- Strong Python — inference services and data pipelines
- GPU infrastructure — CUDA, quantization, VRAM management
- NLP — NER, entity extraction, text classification
- Linux, Docker, systemd
Preferred
- Healthcare AI or clinical NLP experience
- RAG systems — vector stores, embedding models
- Python ↔ .NET/C# integration
- OCI, AWS, or Azure GPU instances