AI Solutions Architect - Machine Learning Operations (MLOps)
WhatJobs Direct
About the role
Our client is seeking an experienced AI Solutions Architect with a strong focus on Machine Learning Operations (MLOps) to join their innovative team. This position is office‑based, fostering intensive collaboration and hands‑on problem‑solving. You will be instrumental in designing, building, and deploying robust, scalable, and production‑ready machine learning systems. This involves architecting the infrastructure, pipelines, and tools necessary for efficient model development, training, deployment, monitoring, and management. The ideal candidate will possess a deep understanding of the entire ML lifecycle, cloud technologies, containerization, and automation.
Responsibilities
- Design and implement end‑to‑end MLOps pipelines for automated model training, validation, deployment, and monitoring.
- Architect scalable and reliable cloud infrastructure (AWS, Azure, GCP) to support ML workloads.
- Select and integrate appropriate ML tools, frameworks, and platforms (e.g., Kubeflow, MLflow, SageMaker, Azure ML).
- Develop strategies for model versioning, artifact management, and reproducibility.
- Implement robust monitoring solutions to track model performance, detect drift, and trigger retraining.
- Collaborate closely with data scientists and software engineers to ensure seamless integration of ML models into production applications.
- Automate infrastructure provisioning and configuration using tools like Terraform or CloudFormation.
- Ensure security best practices are implemented throughout the ML lifecycle.
- Troubleshoot and resolve complex issues related to ML system performance, deployment, and operation.
- Stay current with the latest advancements in MLOps, AI, and cloud technologies.
- Provide technical guidance and mentorship to team members on MLOps best practices.
- Document system architecture, processes, and operational procedures.
Qualifications
- Master's or Bachelor's degree in Computer Science, Engineering, or a related quantitative field.
- Minimum of 5 years of experience in software engineering or cloud architecture, with a significant focus on MLOps and machine learning systems.
- Hands‑on experience with cloud platforms (AWS, Azure, or GCP) and their ML services.
- Proficiency in containerization technologies (Docker) and orchestration platforms (Kubernetes).
- Strong scripting and programming skills (e.g., Python) and experience with infrastructure‑as‑code tools.
- Familiarity with ML frameworks (e.g., TensorFlow, PyTorch, scikit‑learn) and MLOps tools.
- Solid understanding of CI/CD principles and practices.
- Excellent problem‑solving, analytical, and communication skills.
- Ability to work effectively in a collaborative, team‑oriented environment.
- Experience in designing and managing production ML systems is highly desirable.
Location
- On‑site position in Bauchi, Bauchi State, Nigeria, offering a collaborative and dynamic work environment.