ML Ops Engineer

Software Technology Inc.

Austin · Hybrid · Contract · Senior · Posted yesterday

About the role

We are looking for an experienced MLOps Engineer to design, build, and manage scalable machine learning infrastructure on AWS. This role will drive the end-to-end operationalization of ML models — from automated training pipelines and experiment tracking to production deployment, monitoring, and continuous retraining. The ideal candidate will bridge the gap between data science and engineering, establishing robust MLOps practices that ensure reliable, repeatable, and efficient delivery of ML solutions at scale using AWS-native services and industry-leading tools like SageMaker, Kubeflow, and MLflow.

Roles & Responsibilities

  • Design, implement, and manage AWS-based MLOps infrastructure to support large-scale machine learning workflows
  • Build and maintain end-to-end ML pipelines using SageMaker Pipelines, Step Functions, and Kubeflow for automated training, validation, and deployment
  • Implement model versioning, experiment tracking, and model registry practices using MLflow and SageMaker Model Registry
  • Develop and maintain CI/CD pipelines for ML models, ensuring seamless integration from development to production
  • Apply hands-on expertise in Python and frameworks such as TensorFlow or PyTorch, deploying models on SageMaker endpoints
  • Utilize Docker, Amazon EKS, and AWS-native CI/CD tools to streamline ML deployment and operations
  • Leverage core AWS services such as S3, EC2, Lambda, Glue, and Athena for building and scaling data and ML infrastructure
  • Deploy, manage, and optimize machine learning models in production using SageMaker real-time and batch inference endpoints
  • Implement automated model monitoring, drift detection, and retraining triggers to maintain model health in production
  • Set up A/B testing and canary deployment strategies for safe model rollouts
  • Collaborate with data scientists and engineering teams to standardize MLOps practices and enhance performance across the AWS ecosystem
  • Monitor system and model performance using CloudWatch, CloudTrail, and X-Ray, troubleshoot issues, and ensure high availability and reliability
  • Stay informed about the latest AWS service releases, MLOps best practices, and advancements in ML operations tooling
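
The drift-detection and retraining bullet above can be sketched in miniature. The snippet below computes a Population Stability Index (PSI) between a training baseline and live feature values and flags retraining when it crosses the common 0.2 rule-of-thumb threshold. This is an illustrative, standalone sketch — it is not SageMaker Model Monitor, and the bin count, epsilon, and threshold are assumptions.

```python
import math
import random

def psi(baseline, live, bins=10):
    """Population Stability Index between two 1-D samples.

    Bin edges come from the baseline's quantiles; a small epsilon
    avoids log(0) when a bin is empty on either side.
    """
    eps = 1e-4
    srt = sorted(baseline)
    # Quantile-based bin edges over the baseline distribution.
    edges = [srt[int(i * (len(srt) - 1) / bins)] for i in range(1, bins)]

    def fractions(sample):
        counts = [0] * bins
        for x in sample:
            # First edge the value falls below decides the bin; else last bin.
            idx = next((i for i, e in enumerate(edges) if x < e), bins - 1)
            counts[idx] += 1
        return [max(c / len(sample), eps) for c in counts]

    b, l = fractions(baseline), fractions(live)
    return sum((lb - bb) * math.log(lb / bb) for bb, lb in zip(b, l))

def needs_retraining(baseline, live, threshold=0.2):
    """Common rule of thumb: PSI > 0.2 signals meaningful drift."""
    return psi(baseline, live) > threshold

random.seed(0)
baseline = [random.gauss(0.0, 1.0) for _ in range(5000)]
stable   = [random.gauss(0.0, 1.0) for _ in range(5000)]  # same distribution
shifted  = [random.gauss(1.5, 1.0) for _ in range(5000)]  # mean shifted

print(needs_retraining(baseline, stable))   # no drift expected
print(needs_retraining(baseline, shifted))  # drift expected
```

In production this check would run on a schedule over captured inference inputs, with a positive result triggering the retraining pipeline rather than a print.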

Required Skills & Qualifications

  • Proficiency in production-grade machine learning system development, deployment, and MLOps practices on AWS
  • Strong experience with Python and ML frameworks such as TensorFlow or PyTorch
  • Familiarity with containerization and orchestration tools like Docker and Kubernetes (including Amazon EKS)
  • Hands-on experience with CI/CD pipelines using AWS-native tools such as CodePipeline, CodeBuild, and CodeDeploy
  • Advanced knowledge of AWS cloud services, particularly SageMaker, Bedrock, Lambda, Step Functions, and S3
  • Expertise in MLOps tools and platforms including Kubeflow, MLflow, and AWS SageMaker Pipelines for end-to-end model lifecycle management
  • Experience with model versioning, experiment tracking, model registry, and automated retraining workflows
  • Familiarity with AWS infrastructure-as-code tools such as CloudFormation or CDK
  • Strong understanding of model monitoring, drift detection, and A/B testing in production environments
  • Strong analytical and troubleshooting skills to maintain high system reliability using CloudWatch, X-Ray, and AWS-native observability tools
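
The canary-deployment and A/B-testing items above boil down to weighted traffic splitting, which SageMaker endpoints express as production-variant weights. The sketch below models that logic in plain Python — the variant names, step size, and `promote` helper are illustrative assumptions, not a SageMaker API.

```python
import random

def route(variant_weights, rng=random):
    """Pick a model variant by relative weight, mimicking how an
    endpoint splits traffic across production variants."""
    names = list(variant_weights)
    weights = [variant_weights[n] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]

def promote(variant_weights, canary, step=0.25):
    """Shift traffic toward the canary in fixed steps; a real rollout
    would gate each step on error-rate and latency metrics."""
    shifted = dict(variant_weights)
    gain = min(step, 1.0 - shifted[canary])
    if gain == 0:
        return shifted  # canary already takes all traffic
    shifted[canary] += gain
    # Take the gain proportionally from the remaining variants.
    others = [n for n in shifted if n != canary]
    total = sum(shifted[n] for n in others)
    for n in others:
        shifted[n] -= gain * shifted[n] / total
    return shifted

weights = {"stable": 0.9, "canary": 0.1}
weights = promote(weights, "canary")  # canary traffic grows from 10% to 35%
print(weights)
```

Repeated `promote` calls, each gated on healthy monitoring metrics, give the staged rollout described above; rolling back is just restoring the previous weight map.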

Skills

AWS CloudFormation · AWS CodeBuild · AWS CodePipeline · AWS EKS · AWS Lambda · AWS SageMaker · AWS SageMaker Pipelines · AWS Step Functions · AWS S3 · AWS CloudWatch · CI/CD · Docker · Drift Detection · Infrastructure-as-Code · Kubernetes · Kubeflow · MLflow · MLOps · Model Monitoring · Model Registry · Python · TensorFlow · PyTorch
