
QA/Validation Engineer – Agentic AI & Machine Learning

E-Solutions

Toronto · On-site Contract 4d ago

About the role

**QA / Validation Engineer – Agentic AI & Machine Learning**
**Toronto (Hybrid)**

We are seeking a QA / Validation Engineer to assure the quality, safety, and reliability of Machine Learning and Agentic AI solutions (LLM/RAG/tool-using agents) from development through production. This is a hands-on engineering role focused on designing test strategies, building automated evaluation pipelines, and implementing quality gates for data, models, prompts, tools, and end-to-end agent workflows. You will work primarily in Python and leverage an open-source AI/ML stack, collaborating closely with ML/GenAI engineers, data engineering, and platform teams in environments that may include Databricks and Spark.

Key Objectives

  • Define and execute an AI/ML quality strategy covering data, model, and agent behavior validation across offline evaluation and production monitoring.
  • Build repeatable, automated evaluation and regression patterns for ML models and agentic workflows (including prompt, tool, and retrieval changes).
  • Improve reliability, safety, and user trust by systematically reducing hallucinations, tool misuse, regressions, and unintended behaviors.
  • Partner with engineering and platform teams to implement scalable, governed validation pipelines (including Databricks/Spark where applicable) while meeting security, privacy, and Responsible AI requirements.
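To make the "repeatable regression patterns" objective concrete, here is a minimal sketch of a release-blocking regression gate that compares a candidate model's evaluation metrics against a stored baseline. The metric names, values, and the 0.01 tolerance are illustrative placeholders, not part of any specific framework.

```python
def regression_gate(baseline: dict, candidate: dict, tolerance: float = 0.01) -> list:
    """Return the metrics that regressed beyond `tolerance`.

    An empty list means the candidate passes the gate; a CI job could
    fail the build whenever this list is non-empty.
    """
    regressions = []
    for metric, base_value in baseline.items():
        cand_value = candidate.get(metric)
        if cand_value is None:
            # A metric missing from the candidate run counts as a failure.
            regressions.append(metric)
        elif cand_value < base_value - tolerance:
            regressions.append(metric)
    return regressions

baseline = {"task_success_rate": 0.92, "grounding_score": 0.88}
candidate = {"task_success_rate": 0.93, "grounding_score": 0.84}
print(regression_gate(baseline, candidate))  # ['grounding_score']
```

In practice the baseline would be loaded from an experiment tracker (e.g. MLflow) rather than hard-coded, but the gating logic stays this simple.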

Primary Responsibilities

  • Own end-to-end QA/validation for ML and Agentic AI solutions: requirements-to-metrics, test planning, execution, defect triage, and release sign-off.
  • Design and maintain evaluation frameworks for agentic systems: task success rate, tool-call correctness, grounding/citation quality (where used), latency/cost, and regression detection.
  • Build automated test suites in Python for: data validation (schema, drift, anomalies), feature/label quality checks, model inference correctness, and agent workflow validation (multi-step, tool-using, and memory-based flows).
  • Implement LLM/agent-specific quality checks: hallucination and factuality testing, prompt injection and jailbreak resistance testing, PII leakage checks, toxicity/safety filters, and policy conformance.
  • Validate RAG systems end-to-end: document chunking/embedding quality, retrieval accuracy (precision/recall), reranking behavior, and answer faithfulness to retrieved context.
  • Establish test data and "golden" datasets: curated evaluation sets, adversarial test cases, synthetic data generation (where appropriate), and clear acceptance criteria.
  • Integrate quality gates into CI/CD: unit/integration tests, evaluation runs, reporting dashboards, and release-blocking thresholds.
  • Partner with engineers to instrument observability: tracing, structured logs, metrics, error cohorts, and production monitoring for drift, degradation, bias, latency, and cost.
  • Collaborate with platform teams to run validations at scale (Databricks jobs, Spark pipelines, scheduled workflows) and ensure governance over data/model access.
  • Document validation approaches, test evidence, and risk assessments; support audits and compliance needs for regulated or high-impact use cases.
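The retrieval-accuracy part of RAG validation mentioned above reduces to standard ranking metrics. A minimal sketch of precision@k and recall@k for a single query, with made-up document ids purely for illustration:

```python
def precision_recall_at_k(retrieved: list, relevant: set, k: int) -> tuple:
    """Compute precision@k and recall@k for one query.

    `retrieved` is the ranked list of document ids returned by the
    retriever; `relevant` is the set of ids judged relevant for the query.
    """
    top_k = retrieved[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# One relevant doc ("d1") appears in the top 3 of this ranked list.
p, r = precision_recall_at_k(["d3", "d1", "d7", "d2"], {"d1", "d2", "d5"}, k=3)
```

An evaluation harness would average these over a golden set of queries and feed the aggregates into the same regression gates used for model metrics.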

Required Skills & Experience

  • Strong QA engineering experience with a focus on AI/ML systems, including validation strategies beyond traditional functional testing.
  • Strong Python skills (test design, automation frameworks, packaging, code quality, and performance awareness).
  • Experience validating ML models and pipelines: dataset splits, leakage checks, metric selection, thresholding, and regression testing.
  • Hands-on familiarity with an open-source AI stack (examples: scikit-learn, PyTorch/TensorFlow, XGBoost/LightGBM, Hugging Face ecosystem).
  • Experience testing GenAI/LLM/agentic systems: prompt/version management, evaluation harnesses, and quality metrics for non-deterministic outputs.
  • Understanding of RAG concepts (embeddings, vector search, retrieval, reranking) and how to evaluate them.
  • Working knowledge of MLOps/LLMOps practices: experiment tracking, model/prompt versioning, reproducibility, and monitoring (e.g., MLflow or equivalent).
  • Experience with CI/CD, containerization (Docker), and test reporting; ability to integrate evaluations into automated pipelines.
  • Strong data skills: SQL fundamentals and experience with data analysis/validation using pandas/NumPy.
  • Clear communication and stakeholder management: able to translate quality risks into actionable engineering work.
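The "leakage checks" called out above typically mean verifying that no record appears in both the train and test splits. A minimal sketch using plain Python (the `id` key field is a hypothetical example; real checks key on whatever uniquely identifies a record):

```python
def leakage_check(train_rows, test_rows, key_fields):
    """Return the set of keys that appear in both splits.

    Each row is a dict; `key_fields` names the fields that identify a
    record. A non-empty result indicates train/test leakage.
    """
    train_keys = {tuple(row[f] for f in key_fields) for row in train_rows}
    test_keys = {tuple(row[f] for f in key_fields) for row in test_rows}
    return train_keys & test_keys

train = [{"id": 1, "label": 0}, {"id": 2, "label": 1}]
test = [{"id": 2, "label": 1}, {"id": 3, "label": 0}]
print(leakage_check(train, test, ["id"]))  # {(2,)} -> row 2 leaked
```

The same set-intersection idea scales to DataFrames (e.g. a pandas merge on the key columns) or Spark joins for large datasets.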

Preferred / Nice to Have

  • Awareness of Databricks concepts (workspaces, notebooks, jobs, clusters) and how QA/validation can be operationalized via Databricks workflows.
  • Experience with Spark for large-scale data validation and distributed test execution.
  • Familiarity with Databricks MLflow.

Requirements

  • Strong QA engineering experience with a focus on AI/ML systems
  • Strong Python skills
  • Experience validating ML models and pipelines
  • Hands-on familiarity with an open-source AI stack
  • Experience testing GenAI/LLM/agentic systems
  • Understanding of RAG concepts
  • Working knowledge of MLOps/LLMOps practices
  • Experience with CI/CD, containerization, and test reporting
  • Strong data skills

Responsibilities

  • Own end-to-end QA/validation for ML and Agentic AI solutions
  • Design and maintain evaluation frameworks for agentic systems
  • Build automated test suites in Python
  • Implement LLM/agent-specific quality checks
  • Validate RAG systems end-to-end
  • Establish test data and 'golden' datasets
  • Integrate quality gates into CI/CD
  • Partner with engineers to instrument observability
  • Collaborate with platform teams to run validations at scale
  • Document validation approaches, test evidence, and risk assessments

Benefits

Hybrid work environment

Skills

Python · AI/ML · scikit-learn · PyTorch/TensorFlow · XGBoost/LightGBM · Hugging Face ecosystem · GenAI/LLM/agentic systems · RAG concepts · MLOps/LLMOps · CI/CD · Docker · SQL · pandas/NumPy
