
Generative AI Engineer

Infojini Inc

Philadelphia · On-site · Contract · Posted 1mo ago

About the role

Core Experience

  • Hands-on experience deploying open-source LLMs such as Meta Llama 3 and Mistral / Mixtral in on-prem or private environments
  • Strong proficiency in Python for LLM inference, prompt engineering, and integration
  • Experience with CPU-based inference, model quantization, and performance tuning

Vector Databases & RAG

  • Practical experience with open-source vector databases such as Qdrant, Chroma, Milvus, or pgvector
  • Proven implementation of Retrieval-Augmented Generation (RAG) pipelines
  • Experience generating and managing embeddings and applying metadata filtering

Security & Governance

  • Understanding of data privacy, air-gapped deployments, and enterprise security requirements
  • Experience implementing access controls and audit logging

Nice to Have

  • Experience with LangChain or LlamaIndex
  • Exposure to Rust, Go, or C++ for high-performance services
  • Familiarity with Docker and Kubernetes for on-prem deployments
  • Knowledge of inference frameworks (e.g., vLLM, llama.cpp, Hugging Face Transformers)
  • Prior work in regulated or enterprise environments

Deliverables

  • Reference architecture and deployment guidance
  • Working prototype (LLM + vector DB + RAG)
  • Documentation and knowledge transfer to internal teams

Skills

C++, Chroma, Docker, Go, Hugging Face Transformers, Kubernetes, LangChain, Llama 3, LlamaIndex, llama.cpp, Milvus, Mistral, Mixtral, pgvector, Python, Qdrant, Rust, vLLM
