Senior Data Science & AI Engineer
S2Integrators
Hyderabad · On-site · Full-time · Senior
Responsibilities (technical)
- Build robust analytics code in Python using pandas/numpy to compute, validate, and reconcile KPIs (costing, margins, QBR metrics, operational metrics).
- Write efficient transformations (vectorization, memory optimization) and implement repeatable pipelines with tests and data validation.
- Develop SQL to extract and shape datasets from enterprise sources and/or a cloud data warehouse; optimize queries as needed.
- Implement a governed GenAI "ask the data" prototype:
  - Use Llama-family models via Ollama (or llama.cpp/vLLM as needed)
  - Build RAG over structured and semi-structured data (chunking, embeddings, retrieval, reranking)
  - Produce structured outputs (tables/JSON) and drill-down-ready answers
  - Add basic guardrails: grounded responses, citations/traceability to data, and safe handling of sensitive fields
- Apply light-to-moderate ML where useful:
  - anomaly detection (cost variances, outliers, feed failures)
  - simple forecasting / trend analysis for key metrics
  - model evaluation and error analysis
- Create reproducible experimentation and evaluation:
  - test question sets for the LLM
  - accuracy/groundedness checks
  - latency profiling and performance tuning
- Package deliverables for deployment (Docker, config management) and produce technical documentation/runbooks.
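To give a flavor of the KPI work above, here is a minimal sketch of a vectorized margin calculation with the kind of validation checks a repeatable pipeline would run before publishing a metric. The column names and figures are invented for illustration.

```python
import pandas as pd

# Hypothetical order-level data (columns and values are invented).
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "revenue":  [100.0, 250.0, 80.0, 120.0],
    "cost":     [60.0, 200.0, 90.0, 70.0],
})

# Vectorized KPI computation -- whole-column arithmetic, no Python loops.
orders["margin"] = orders["revenue"] - orders["cost"]
orders["margin_pct"] = orders["margin"] / orders["revenue"]

# Lightweight data-validation / reconciliation checks.
assert orders["revenue"].gt(0).all(), "non-positive revenue detected"
negative_margin = orders.loc[orders["margin"] < 0, "order_id"].tolist()

total_margin = orders["margin"].sum()
```

In a real pipeline these assertions would live in a test suite or a data-validation layer rather than inline, and flagged orders (here, those with negative margin) would feed a reconciliation report.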
Required skills & experience
- 7+ years hands-on in data science / analytics engineering / ML engineering (individual contributor).
- Expert in Python, especially:
  - pandas, numpy
  - data cleaning, joins/merges, windowed calculations, time-series handling
  - performance optimization (vectorization, profiling, memory management)
- Strong SQL (complex joins, aggregates, window functions; tuning mindset).
- Solid fundamentals in statistics and ML:
  - feature engineering basics, evaluation metrics, overfitting awareness
  - scikit-learn (or equivalent) for quick modeling
- GenAI implementation experience:
  - Llama models (or comparable open LLMs)
  - Ollama for local inference (or similar)
  - RAG frameworks (LangChain/LlamaIndex) or custom retrieval pipelines
  - embeddings + vector stores (FAISS/pgvector/Weaviate/Pinecone)
- Good engineering habits:
  - unit tests, data tests, logging, error handling
  - Git, CI basics
  - Docker and environment management
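As an example of the "light-to-moderate ML" expected here, this is a toy anomaly-detection sketch using a robust z-score (median/MAD) to flag cost spikes. A production version might use scikit-learn's IsolationForest instead; the cost series below is invented.

```python
import numpy as np

def flag_anomalies(costs, threshold=3.5):
    """Flag values whose robust (median/MAD) z-score exceeds the threshold."""
    costs = np.asarray(costs, dtype=float)
    median = np.median(costs)
    mad = np.median(np.abs(costs - median))
    if mad == 0:
        # No spread at all -- nothing can be called anomalous.
        return np.zeros(len(costs), dtype=bool)
    robust_z = 0.6745 * (costs - median) / mad
    return np.abs(robust_z) > threshold

# Invented daily cost feed; the last value is an obvious spike.
daily_costs = [102, 98, 101, 99, 100, 97, 480]
mask = flag_anomalies(daily_costs)
```

The median/MAD variant is preferred over a mean/std z-score for cost data because a single large spike inflates the mean and standard deviation enough to mask itself.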
Nice-to-have
- Snowflake experience (or similar modern cloud data platform).
- dbt experience (modeling, tests, docs).
- Experience with enterprise “semantic layers” or metric definitions at scale.
- Experience building lightweight APIs (FastAPI) for analytics/LLM endpoints.
- Familiarity with security constraints (RBAC concepts, masking, audit logs).
Tools / Stack (typical)
Python, pandas, numpy, SQL, scikit‑learn, Jupyter, Git, Docker, FastAPI (optional), LangChain/LlamaIndex (optional), Ollama, Llama models, vector DB (FAISS/pgvector/Weaviate), cloud data warehouse (Snowflake or equivalent).
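To make the RAG side of this stack concrete, here is a minimal sketch of the retrieval step only: cosine similarity over pre-computed embeddings. In practice the embeddings would come from a model (e.g. one served via Ollama) and live in a vector store such as FAISS or pgvector; the three-dimensional toy vectors and document labels below are invented.

```python
import numpy as np

# Toy document embeddings (in reality: model-generated, stored in a vector DB).
doc_embeddings = np.array([
    [1.0, 0.0, 0.0],   # doc 0: e.g. "Q3 costing summary"
    [0.0, 1.0, 0.0],   # doc 1: e.g. "vendor margin table"
    [0.7, 0.7, 0.0],   # doc 2: e.g. "QBR metrics overview"
])

def top_k(query_vec, embeddings, k=2):
    """Return indices of the k rows most cosine-similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = e @ q                      # cosine similarity per document
    return np.argsort(-sims)[:k].tolist()

hits = top_k(np.array([1.0, 0.1, 0.0]), doc_embeddings)
```

The retrieved documents would then be passed as grounded context to the LLM, with their identifiers kept alongside the answer to support the citation/traceability guardrails mentioned above.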
Skills
Docker, FastAPI, FAISS, Git, GenAI, Jupyter, LangChain, Llama, LlamaIndex, LLM, ML, Ollama, Pandas, Pinecone, Python, RAG, SQL, Scikit-learn, Snowflake, Weaviate