Senior Generative AI Engineer
Meril
About the role
Job Title: Senior Generative AI Engineer (Drafting & RAG Systems)
Role Overview
We are looking for a Senior Generative AI Engineer to lead the development and deployment of our next-generation Automated Drafting Tool. You will be responsible for the entire lifecycle of the AI features, from local prototyping with Ollama to scaling globally via OpenAI APIs.
The ideal candidate has a "Full-Stack AI" mindset: you understand how to retrieve context using RAG, manage high-dimensional data in vector databases, and ensure the final drafted output is coherent, accurate, and contextually aware.
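At its core, the retrieval step of a RAG pipeline is nearest-neighbour search over embeddings. As a flavor of the concepts this role works with, here is a minimal, dependency-free sketch; a production system would delegate this to a vector database such as Pinecone or pgvector, and the names and data here are purely illustrative:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec: list[float], store: list[tuple[str, list[float]]], top_k: int = 2) -> list[str]:
    """Return the top_k stored texts ranked by similarity to the query embedding.

    `store` is a toy in-memory stand-in for a vector database:
    a list of (text, embedding) pairs.
    """
    scored = sorted(store, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)
    return [text for text, _ in scored[:top_k]]
```

The retrieved texts are then packed into the prompt as grounding context for the drafting model.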
Key Responsibilities
1. AI Architecture & Drafting Logic
• Design and implement end-to-end Retrieval-Augmented Generation (RAG) pipelines optimized specifically for document drafting.
• Develop advanced prompt engineering strategies to handle complex drafting constraints (tone, legal/technical compliance, and formatting).
• Implement hybrid model strategies: Ollama for local development, testing, and privacy-sensitive tasks, with OpenAI (GPT-4o/o1) orchestrated for production-level reasoning.
2. Data & Vector Engineering
• Build and maintain scalable vector databases (e.g., Pinecone, Weaviate, Milvus, or FAISS).
• Optimize document ingestion pipelines: chunking strategies, embedding model selection, and metadata filtering to improve retrieval precision.
• Implement "agentic RAG", where the system can self-correct or reason through a draft in multiple steps.
3. Deployment & MLOps (Local to Cloud)
• Bridge the gap between local ideation (models running on Ollama/local GPUs) and cloud production environments.
• Deploy AI services using containerization (Docker/Kubernetes) and manage API latency, rate limits, and token costs.
• Establish monitoring for AI performance, including hallucination detection and "groundedness" metrics.
Required Skills & Qualifications
Mandatory Experience
• Experience: 3+ years of professional experience in AI/machine learning or backend engineering with a heavy GenAI focus.
• LLM Orchestration: Deep hands-on experience with LangChain or LlamaIndex.
• Model Proficiency: Expert knowledge of the OpenAI API ecosystem and local model runners such as Ollama.
• Vector Expertise: Proven track record of implementing and optimizing vector databases and RAG workflows.
• Programming: Mastery of Python (FastAPI/Flask) and asynchronous programming.
• JIRA and Confluence exposure is a must-have.
Technical Stack
• Models: OpenAI (GPT-4), Ollama (Llama 3, Mistral, Mixtral).
• Tools: LangChain, LlamaIndex, LangSmith (for tracing).
• Databases: Pinecone, ChromaDB, or pgvector.
• Infrastructure: Docker, AWS/GCP/Azure, GitHub Actions for CI/CD.
What We Look For (The "Hacker" Mindset)
• Production Proven: You have moved at least one GenAI product from a Jupyter notebook/local script to a live environment with real users.
• Problem Solver: You know how to handle the stochastic nature of LLMs and can build guardrails to prevent hallucinations in drafting.
• Architecture First: You care about token optimization and latency just as much as you care about the quality of the generated text.
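The chunking strategies mentioned under Data & Vector Engineering can be as simple as fixed-size windows with overlap, so that context spanning a chunk boundary survives in the neighbouring chunk. A minimal pure-Python sketch, for illustration only; in practice a library splitter (e.g., one of LangChain's text splitters) would typically be used, and the parameter values here are arbitrary:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks, overlapping each chunk with the
    previous one so boundary-spanning context is retained for retrieval."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the final chunk already reaches the end of the text
    return chunks
```

Each chunk would then be embedded and written to the vector store alongside metadata (source document, section, date) to support the metadata filtering described above.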