Principal AI/ML Architect

altysys

India · Hybrid · Full-time · Lead · Posted today

About the role

Role Overview

We are looking for an AI/ML Architect with strong hands‑on experience in Python‑based ML systems and GenAI solutions. This role focuses on designing and deploying production‑grade AI systems, especially leveraging LLMs, RAG pipelines, and MLOps practices.

You will work closely with engineering and product teams to build scalable, secure, and efficient AI‑powered applications.

Key Responsibilities

AI/ML System Design
- Design and implement end‑to‑end ML pipelines (data ingestion, training, evaluation, deployment).
- Architect LLM‑based solutions using advanced prompting strategies, RAG (Retrieval‑Augmented Generation) and agentic workflows.
- Define scalable patterns for ML/GenAI application development.
Model Development & Optimization
- Perform data analysis, quality benchmarking, lineage detection, curation, and ingestion into vector stores.
- Handle statistical model training, evaluation, hyperparameter tuning, and feature engineering.
- Fine‑tune LLMs for specific tasks and optimize prompts (building large‑scale models from scratch is not expected).
- Evaluate and select appropriate models (open‑weight or closed‑weight).
- Collaborate with data teams for feature engineering and dataset readiness.
MLOps & Deployment
- Implement MLOps best practices:
  - Model versioning
  - Experiment tracking
  - Monitoring & retraining pipelines
  - Prompt versioning
  - Drift detection
  - Token cost tracking
- Handle model deployment in production environments (APIs, batch, streaming).
- Ensure performance, scalability, and reliability of AI systems.
Platform & Integration
- Integrate AI solutions with existing microservices and backend systems.
- Work with vector databases, caching, and APIs for GenAI use cases.
- Ensure security and governance in AI deployments.
Collaboration
- Partner with product managers and engineers to translate business problems into AI solutions.
- Mentor engineers on AI/ML and GenAI best practices.

Must Have Skills

Core
- 10–15 years of experience in software engineering / ML systems.
- Strong programming skills in Python (mandatory).
- Experience in building production‑grade ML systems (not just notebooks).
AI/ML & GenAI
- Hands‑on experience with:
  - Data Analysis and curation
  - Feature engineering
  - Statistical model training, evaluation & hyperparameter tuning
  - LLMs / GenAI applications
  - RAG pipeline design
  - Prompt engineering & model tuning
- Experience with frameworks like TensorFlow, PyTorch, scikit‑learn, LangChain, LlamaIndex, or similar.
- Understanding of embeddings, vector search, and retrieval systems.
- Exposure to custom model fine‑tuning (good to have, not mandatory).
MLOps & Deployment
- Experience with:
  - Model deployment (API‑based or batch)
  - CI/CD pipelines for ML
  - Monitoring and logging
- Familiarity with tools like MLflow, Kubeflow, or similar (any one is fine).
Cloud & Scalability
- Experience with at least one cloud: AWS / Azure / GCP.
- Understanding of scalable system design and APIs.
Data & Systems
- Working knowledge of databases (SQL/NoSQL).
- Experience with vector databases (Milvus, Pinecone, Weaviate, FAISS, etc.).

Good to Have (Optional)

- Experience in AIOps or AI for observability/use‑case automation.
- Background in data engineering or analytics pipelines.
- Exposure to Kubernetes/Docker.
- Experience in telecom or high‑scale product environments.

Location

Hyderabad / Bangalore (Work from office)

Skills

AWS, Azure, CI/CD, Docker, Embeddings, FAISS, Feature engineering, GCP, GenAI, Kubeflow, Kubernetes, LangChain, LlamaIndex, LLM, MLOps, MLflow, Milvus, Monitoring, NoSQL, Pinecone, Prompt engineering, PyTorch, Python, RAG, scikit-learn, SQL, TensorFlow, Vector databases, Weaviate
