Senior Data Science & AI Engineer
S2Integrators
Hyderabad · On-site · Full-time · Senior
Responsibilities (technical)
- Build robust analytics code in Python using pandas/numpy to compute, validate, and reconcile KPIs (costing, margins, QBR metrics, operational metrics).
- Write efficient transformations (vectorization, memory optimization) and implement repeatable pipelines with tests and data validation.
- Develop SQL to extract and shape datasets from enterprise sources and/or a cloud data warehouse; optimize queries as needed.
- Implement a governed GenAI "ask the data" prototype:
  - Use Llama-family models via Ollama (or llama.cpp/vLLM as needed)
  - Build RAG over structured and semi-structured data (chunking, embeddings, retrieval, reranking)
  - Produce structured outputs (tables/JSON) and drill-down-ready answers
  - Add basic guardrails: grounded responses, citations/traceability to data, and safe handling of sensitive fields
- Apply light-to-moderate ML where useful:
  - anomaly detection (cost variances, outliers, feed failures)
  - simple forecasting / trend analysis for key metrics
  - model evaluation and error analysis
- Create reproducible experimentation and evaluation:
  - test question sets for the LLM
  - accuracy/groundedness checks
  - latency profiling and performance tuning
- Package deliverables for deployment (Docker, config management) and produce technical documentation/runbooks.
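To give a flavor of the KPI work above, here is a minimal sketch of a vectorized margin calculation with the kind of validation checks a repeatable pipeline would run before publishing a metric. The column names and figures are invented for illustration.

```python
import pandas as pd

# Hypothetical order-level data (columns and values are invented).
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "revenue":  [100.0, 250.0, 80.0, 120.0],
    "cost":     [60.0, 200.0, 90.0, 70.0],
})

# Vectorized KPI computation -- whole-column arithmetic, no Python loops.
orders["margin"] = orders["revenue"] - orders["cost"]
orders["margin_pct"] = orders["margin"] / orders["revenue"]

# Lightweight data-validation / reconciliation checks.
assert orders["revenue"].gt(0).all(), "non-positive revenue detected"
negative_margin = orders.loc[orders["margin"] < 0, "order_id"].tolist()

total_margin = orders["margin"].sum()
```

In a real pipeline these assertions would live in a test suite or a data-validation layer rather than inline, and flagged orders (here, those with negative margin) would feed a reconciliation report.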
Required skills & experience
- 7+ years hands-on in data science / analytics engineering / ML engineering (individual contributor).
- Expert in Python, especially:
  - pandas, numpy
  - data cleaning, joins/merges, windowed calculations, time-series handling
  - performance optimization (vectorization, profiling, memory management)
- Strong SQL (complex joins, aggregates, window functions; tuning mindset).
- Solid fundamentals in statistics and ML:
  - feature engineering basics, evaluation metrics, overfitting awareness
  - scikit-learn (or equivalent) for quick modeling
- GenAI implementation experience:
  - Llama models (or comparable open LLMs)
  - Ollama for local inference (or similar)
  - RAG frameworks (LangChain/LlamaIndex) or custom retrieval pipelines
  - embeddings + vector stores (FAISS/pgvector/Weaviate/Pinecone)
- Good engineering habits:
  - unit tests, data tests, logging, error handling
  - Git, CI basics
  - Docker and environment management
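As an example of the "light-to-moderate ML" expected here, this is a toy anomaly-detection sketch using a robust z-score (median/MAD) to flag cost spikes. A production version might use scikit-learn's IsolationForest instead; the cost series below is invented.

```python
import numpy as np

def flag_anomalies(costs, threshold=3.5):
    """Flag values whose robust (median/MAD) z-score exceeds the threshold."""
    costs = np.asarray(costs, dtype=float)
    median = np.median(costs)
    mad = np.median(np.abs(costs - median))
    if mad == 0:
        # No spread at all -- nothing can be called anomalous.
        return np.zeros(len(costs), dtype=bool)
    robust_z = 0.6745 * (costs - median) / mad
    return np.abs(robust_z) > threshold

# Invented daily cost feed; the last value is an obvious spike.
daily_costs = [102, 98, 101, 99, 100, 97, 480]
mask = flag_anomalies(daily_costs)
```

The median/MAD variant is preferred over a mean/std z-score for cost data because a single large spike inflates the mean and standard deviation enough to mask itself.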
Nice-to-have
- Snowflake experience (or similar modern cloud data platform).
- dbt experience (modeling, tests, docs).
- Experience with enterprise “semantic layers” or metric definitions at scale.
- Experience building lightweight APIs (FastAPI) for analytics/LLM endpoints.
- Familiarity with security constraints (RBAC concepts, masking, audit logs).
Tools / Stack (typical)
Python, pandas, numpy, SQL, scikit‑learn, Jupyter, Git, Docker, FastAPI (optional), LangChain/LlamaIndex (optional), Ollama, Llama models, vector DB (FAISS/pgvector/Weaviate), cloud data warehouse (Snowflake or equivalent).
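To make the RAG side of this stack concrete, here is a minimal sketch of the retrieval step only: cosine similarity over pre-computed embeddings. In practice the embeddings would come from a model (e.g. one served via Ollama) and live in a vector store such as FAISS or pgvector; the three-dimensional toy vectors and document labels below are invented.

```python
import numpy as np

# Toy document embeddings (in reality: model-generated, stored in a vector DB).
doc_embeddings = np.array([
    [1.0, 0.0, 0.0],   # doc 0: e.g. "Q3 costing summary"
    [0.0, 1.0, 0.0],   # doc 1: e.g. "vendor margin table"
    [0.7, 0.7, 0.0],   # doc 2: e.g. "QBR metrics overview"
])

def top_k(query_vec, embeddings, k=2):
    """Return indices of the k rows most cosine-similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = e @ q                      # cosine similarity per document
    return np.argsort(-sims)[:k].tolist()

hits = top_k(np.array([1.0, 0.1, 0.0]), doc_embeddings)
```

The retrieved documents would then be passed as grounded context to the LLM, with their identifiers kept alongside the answer to support the citation/traceability guardrails mentioned above.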
Skills
Docker, FastAPI, FAISS, Git, GenAI, Jupyter, LangChain, Llama, LlamaIndex, LLM, ML, Ollama, Pandas, Pinecone, Python, RAG, SQL, Scikit-learn, Snowflake, Weaviate