Generative AI Engineer
Infojini Inc
Philadelphia · On-site · Contract · 1mo ago
About the role
Core Experience
- Hands-on experience deploying open-source LLMs such as Meta Llama 3 and Mistral / Mixtral in on-prem or private environments
- Strong proficiency in Python for LLM inference, prompt engineering, and integration
- Experience with CPU-based inference, model quantization, and performance tuning
Vector Databases & RAG
- Practical experience with open-source vector databases such as Qdrant, Chroma, Milvus, or pgvector
- Proven implementation of Retrieval-Augmented Generation (RAG) pipelines
- Experience generating and managing embeddings and applying metadata filtering
Security & Governance
- Understanding of data privacy, air-gapped deployments, and enterprise security requirements
- Experience implementing access controls and audit logging
Nice to Have
- Experience with LangChain or LlamaIndex
- Exposure to Rust, Go, or C++ for high-performance services
- Familiarity with Docker and Kubernetes for on-prem deployments
- Knowledge of inference frameworks (e.g., vLLM, llama.cpp, Hugging Face Transformers)
- Prior work in regulated or enterprise environments
Deliverables
- Reference architecture and deployment guidance
- Working prototype (LLM + vector DB + RAG)
- Documentation and knowledge transfer to internal teams
Skills
C++, Chroma, Docker, Go, Hugging Face Transformers, Kubernetes, LangChain, Llama 3, LlamaIndex, llama.cpp, Milvus, Mistral, Mixtral, pgvector, Python, Qdrant, Rust, vLLM