Lead Engineer Text to Speech

Mihup.ai

India · On-site · Full-time · Lead · Yesterday

About the role

As a Lead Engineer - Text to Speech (TTS) at Mihup, your role involves leading the development of TTS systems, designing and implementing algorithms, and collaborating with cross-functional teams to improve voice synthesis technologies. You will be responsible for driving innovation, optimizing pipelines, and ensuring high-quality deliverables that align with Mihup's vision in the Voice AI domain.

Key Responsibilities:
- Lead the design and development of state-of-the-art Text-to-Speech (TTS) systems
- Own end-to-end TTS pipeline: data preparation, model training, evaluation, and deployment
- Architect scalable, low-latency, production-grade speech synthesis solutions
- Drive experimentation with modern TTS architectures (Tacotron, FastSpeech, VITS, etc.)
- Improve voice naturalness, prosody, and expressiveness across use cases
- Define and track quality metrics (MOS, latency, intelligibility)
- Build, mentor, and manage a team of ML engineers and speech scientists
- Establish best practices for model development, code quality, and experimentation
- Partner with hiring teams to scale TTS capabilities
- Collaborate with Product, ASR, and NLP teams to deliver end-to-end voice AI experiences
- Translate business and product requirements into technical roadmaps
- Work closely with stakeholders on voice customization and deployment needs
- Drive innovation in areas like voice cloning, multilingual TTS, and emotion modeling
- Stay updated with latest research and integrate advancements into production systems
- Define data strategy including collection, annotation, and dataset curation
- Optimize model performance for cost, speed, and scalability in cloud environments

Qualifications:
- Bachelor's / Master's / PhD in Computer Science, Machine Learning, Signal Processing, or related field
- 6-10+ years in ML / Speech AI, with 3+ years in TTS or speech synthesis
- Hands-on experience building production-grade ML systems
- Proven experience leading teams or owning large-scale ML systems
- Strong experience with deep learning frameworks (PyTorch / TensorFlow)
- Hands-on experience with TTS architectures (Tacotron, FastSpeech, VITS, etc.)
- Knowledge of DSP (Digital Signal Processing) fundamentals
- Experience with model deployment, optimization, and scaling (ONNX, TensorRT, etc.)
- Familiarity with cloud infrastructure (AWS / GCP / Azure)
