
Senior Data Science & AI Engineer

S2Integrators

Hyderabad · On-site · Full-time · Senior · Posted today

About the role

Responsibilities (technical)

  • Build robust analytics code in Python using pandas/numpy to compute, validate, and reconcile KPIs (costing, margins, QBR metrics, operational metrics).
  • Write efficient transformations (vectorization, memory optimization) and implement repeatable pipelines with tests and data validation.
  • Develop SQL to extract/shape datasets from enterprise sources and/or a cloud data warehouse; optimize queries as needed.
  • Implement a governed GenAI “ask the data” prototype:
    • Use Llama-family models via Ollama (or llama.cpp/vLLM as needed)
    • Build RAG over structured + semi-structured data (chunking, embeddings, retrieval, reranking)
    • Produce structured outputs (tables/JSON) and drill‑down‑ready answers
    • Add basic guardrails: grounded responses, citations/traceback to data, and safe handling of sensitive fields.
  • Apply light‑to‑moderate ML where useful:
    • anomaly detection (cost variances, outliers, feed failures)
    • simple forecasting / trend analysis for key metrics
    • model evaluation and error analysis
  • Create reproducible experimentation and evaluation:
    • test question sets for the LLM
    • accuracy/groundedness checks
    • latency profiling and performance tuning
  • Package deliverables for deployment (Docker, config management) and produce technical documentation and runbooks.
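
The KPI reconciliation work described above could look something like this minimal pandas sketch. It assumes a hypothetical ledger with `cost_center`, `revenue`, and `cost` columns and a reported-KPI table with a `margin_reported` column. The column names, the margin definition, and the tolerance are illustrative assumptions, not the team's actual schema.

```python
import pandas as pd


def reconcile_margin(ledger: pd.DataFrame, reported: pd.DataFrame, tol: float = 0.01) -> pd.DataFrame:
    """Recompute gross margin per cost center from ledger lines and compare it to reported values.

    Column names (cost_center, revenue, cost, margin_reported) are hypothetical.
    """
    # Aggregate revenue and cost per cost center with a vectorized groupby (no Python loops).
    agg = ledger.groupby("cost_center", as_index=False)[["revenue", "cost"]].sum()
    agg["margin_calc"] = (agg["revenue"] - agg["cost"]) / agg["revenue"]
    # Outer join with an indicator so centers missing on either side are surfaced, not dropped.
    merged = agg.merge(reported, on="cost_center", how="outer", indicator=True)
    merged["abs_diff"] = (merged["margin_calc"] - merged["margin_reported"]).abs()
    # A row is a break if it exists on only one side or differs by more than the tolerance.
    merged["is_break"] = (merged["_merge"] != "both") | (merged["abs_diff"] > tol)
    return merged
```

The outer join with `indicator=True` keeps cost centers that appear on only one side. Those one-sided rows are often where reconciliation breaks hide.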
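
For the anomaly-detection item (for example, cost variances), one simple baseline is the modified z-score. It uses the median and the median absolute deviation (MAD), so a single extreme value does not inflate the spread estimate. The 3.5 cutoff is a commonly used default and is an assumption here, not something the role specifies.

```python
import numpy as np


def flag_outliers(values, threshold=3.5):
    """Return a boolean mask marking outliers by modified z-score (median/MAD)."""
    x = np.asarray(values, dtype=float)
    median = np.median(x)
    mad = np.median(np.abs(x - median))
    if mad == 0:
        # Degenerate spread (most values identical): flag anything that differs from the median.
        return x != median
    # 0.6745 is the 0.75 quantile of the standard normal; it rescales MAD so the
    # score is comparable to an ordinary z-score when the data are roughly normal.
    modified_z = 0.6745 * (x - median) / mad
    return np.abs(modified_z) > threshold
```

For example, `flag_outliers([100, 102, 98, 101, 99, 250])` flags only the last value.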
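
The retrieval step of the RAG prototype can be sketched as cosine-similarity ranking over chunk embeddings. This assumes the embeddings already exist. How they are produced (for example, by an embedding model served locally) is outside this sketch, and the `top_k_chunks` helper and toy vectors are hypothetical. A vector store such as FAISS or pgvector performs the same kind of ranking at scale, using an index instead of a brute-force scan.

```python
import numpy as np


def top_k_chunks(query_vec, chunk_vecs, chunk_ids, k=3):
    """Rank chunks by cosine similarity to the query; return the k best (id, score) pairs.

    Brute-force scan over all chunks; a hypothetical helper for illustration only.
    """
    q = np.asarray(query_vec, dtype=float)
    m = np.asarray(chunk_vecs, dtype=float)
    # L2-normalize so that a plain dot product equals cosine similarity.
    q = q / np.linalg.norm(q)
    m = m / np.linalg.norm(m, axis=1, keepdims=True)
    scores = m @ q
    # Indices of the k highest scores, best first.
    order = np.argsort(-scores)[:k]
    return [(chunk_ids[i], float(scores[i])) for i in order]
```

A reranking stage, as mentioned in the responsibilities, would then rescore this short list with a more expensive model.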
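
The "citations/traceback to data" guardrail can start as a plain validation step on the model's structured output. This sketch assumes a hypothetical JSON schema with `answer` and `citations` fields, in which each citation is the ID of a retrieved chunk. It rejects answers that are not valid JSON, that lack citations, or that cite chunks which were never retrieved.

```python
import json


def check_grounding(raw_answer, retrieved_ids):
    """Validate a model answer expected as JSON; return a list of problems (empty list = pass).

    The schema ({"answer": ..., "citations": [chunk_id, ...]}) is a hypothetical example.
    """
    try:
        answer = json.loads(raw_answer)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(answer, dict):
        return ["top-level JSON value must be an object"]
    problems = []
    if "answer" not in answer:
        problems.append("missing field: answer")
    citations = answer.get("citations")
    if not isinstance(citations, list) or not citations:
        problems.append("answer must cite at least one source")
    else:
        # Each citation must be the ID of a chunk actually retrieved for this question.
        unknown = [c for c in citations if not isinstance(c, str) or c not in retrieved_ids]
        if unknown:
            problems.append(f"citations not in retrieved set: {unknown}")
    return problems
```

Returning a list of problems, rather than raising on the first one, makes it straightforward to aggregate failure reasons across a test question set.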

Required skills & experience

  • 7+ years of hands‑on experience in data science, analytics engineering, or ML engineering as an individual contributor.
  • Expert in Python, especially:
    • pandas, numpy
    • data cleaning, joins/merges, windowed calculations, time‑series handling
    • performance optimization (vectorization, profiling, memory management)
  • Strong SQL (complex joins, aggregates, window functions; tuning mindset).
  • Solid fundamentals in statistics and ML:
    • feature engineering basics, evaluation metrics, overfitting awareness
    • scikit‑learn (or equivalent) for quick modeling
  • GenAI implementation experience:
    • Llama models (or comparable open LLMs)
    • Ollama for local inference (or similar)
    • RAG frameworks (LangChain/LlamaIndex) or custom retrieval pipelines
    • embeddings + vector stores (FAISS/pgvector/Weaviate/Pinecone)
  • Good engineering habits:
    • unit tests, data tests, logging, error handling
    • Git, CI basics
    • Docker and environment management
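
The window-function SQL listed above can be illustrated with Python's built-in `sqlite3` module, an in-memory database, and a hypothetical `costs` table. This sketch computes a running total per cost center. SQLite supports window functions only from version 3.25, so the sketch assumes a reasonably recent Python build. The same `SUM(...) OVER (PARTITION BY ... ORDER BY ...)` pattern also works in Snowflake and most other warehouses.

```python
import sqlite3


def running_cost_by_center(rows):
    """Compute a per-cost-center running total with a SQL window function.

    rows: iterable of (month, cost_center, cost) tuples; the costs table is hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE costs (month TEXT, cost_center TEXT, cost REAL)")
        conn.executemany("INSERT INTO costs VALUES (?, ?, ?)", rows)
        query = """
            SELECT month,
                   cost_center,
                   cost,
                   SUM(cost) OVER (
                       PARTITION BY cost_center
                       ORDER BY month
                       ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
                   ) AS running_cost
            FROM costs
            ORDER BY cost_center, month
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()
```

ISO-formatted month strings (`YYYY-MM`) sort correctly as text, which is why `ORDER BY month` works here without a date type.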

Nice-to-have

  • Snowflake experience (or similar modern cloud data platform).
  • dbt experience (modeling, tests, docs).
  • Experience with enterprise “semantic layers” or metric definitions at scale.
  • Experience building lightweight APIs (FastAPI) for analytics/LLM endpoints.
  • Familiarity with security constraints (RBAC concepts, masking, audit logs).

Tools / Stack (typical)

Python, pandas, numpy, SQL, scikit‑learn, Jupyter, Git, Docker, FastAPI (optional), LangChain/LlamaIndex (optional), Ollama, Llama models, vector DB (FAISS/pgvector/Weaviate), cloud data warehouse (Snowflake or equivalent).

Skills

Docker, FastAPI, FAISS, Git, GenAI, Jupyter, LangChain, Llama, LlamaIndex, LLM, ML, Ollama, Pandas, Pinecone, Python, RAG, SQL, Scikit-learn, Snowflake, Weaviate
