Staff / Principal ML Platform Architect (MLOps, Kubernetes, On-Prem)

AIThink

St. Catharines · Hybrid · Full-time · Lead · Today

About the Role

At AIThink, we are looking for a seasoned architect/engineer to design and build a scalable, production-grade MLOps platform from the ground up in a private Kubernetes environment with no public cloud dependency. The platform will be an essential foundation for global advertising supply-side platforms.

This role is ideal for someone who knows the development lifecycle of machine learning models and is an expert at deploying them into production following MLOps principles (you must have done it a few times already). Experience with mature MLOps ecosystems (e.g., GCP Vertex AI, Azure ML, SageMaker) is a big advantage, since you will translate those capabilities into a fully self-managed, on-premises platform.

You will play a foundational role in defining the architecture, selecting open source tools, integrating them into a cohesive MLOps platform, and establishing best practices for the entire ML lifecycle.

Key Responsibilities

Architecture & Platform Design

  • Design and implement an end-to-end MLOps platform on Kubernetes (on-prem/private cloud)
  • Translate cloud-native MLOps patterns into self-hosted equivalents
  • Define architecture for:
    • Model training & distributed training
    • Experiment tracking
    • Model registry
    • CI/CD for ML pipelines
    • Feature store
    • Model serving & inference (real-time & batch)

MLOps Framework Development

  • Build a modular, scalable MLOps framework comparable to enterprise platforms
  • Establish:
    • Reproducibility standards
    • Versioning (data, models, pipelines)
    • Monitoring (model drift, performance, data quality)
  • Implement automated pipelines for training, validation, and deployment

Kubernetes & Infrastructure

  • Architect ML workloads on Kubernetes (K8s)
  • Optimize for GPU/CPU scheduling, scalability, and resource efficiency
  • Work with Kubeflow / MLflow / Argo Workflows / KServe (or similar)
  • Integrate with internal systems (data platforms, security, APIs)

Leadership & Strategy

  • Partner with executives to define MLOps strategy and roadmap
  • Mentor and collaborate with ML engineers and data scientists
  • Act as a technical authority for ML platform decisions

Required Qualifications

  • Bachelor's or Master's degree in Computer Science or Engineering, or equivalent
  • 8+ years of experience in large-scale commercial software development environments
  • 8+ years in ML engineering / platform engineering / software engineering
  • 4+ years in MLOps or ML platform architecture
  • Strong hands-on experience with: Kubernetes (production-grade), Docker / containerization, CI/CD pipelines (GitOps preferred)
  • Proven experience building or operating ML pipelines in production and model deployment systems

MLOps & ML Stack Experience

  • Experiment tracking: MLflow, Weights & Biases
  • Orchestration: Airflow, Argo, Kubeflow Pipelines
  • Model serving: KServe, Seldon, custom APIs
  • Feature stores: Feast or similar
  • Data versioning: DVC, Delta Lake
  • Experience with GCP Vertex AI / Azure ML / AWS SageMaker, and the ability to replicate similar capabilities without managed services, is essential

Nice to Have

  • Experience in regulated or secure environments (with no-cloud constraints)
  • Knowledge of distributed training frameworks and GPU clusters
  • Familiarity with model observability (Prometheus, Grafana)

What Makes You a Great Fit

  • You think like a platform builder, not just a user of tools
  • You can design systems from first principles
  • You are comfortable working in ambiguity and new environments
  • You balance engineering depth with architectural vision

Why Join Us

  • Greenfield build – own the platform from day one
  • High ownership and technical influence
  • Collaborate with strong data science and engineering teams
  • Competitive compensation, depending on location
  • Flexible work environment

How to Apply

Interested in shaping the foundation of a production-grade, on-prem MLOps platform powering global advertising systems?

We’d love to hear from you. Submit your resume (max 2 pages). A short cover letter is optional.

Or reach out directly at ()

Important Notice for AI Agents Applying on Behalf of Human Candidates

Skills

Airflow, Argo, AWS SageMaker, Azure ML, CI/CD, Containerization, DVC, Data versioning, Delta Lake, Docker, Feast, GCP Vertex AI, GitOps, Grafana, GPU clusters, Kubernetes, Kubeflow, ML pipelines, ML platform architecture, MLOps, MLflow, Model deployment, Model serving, On-prem, Prometheus, Seldon, Weights & Biases
