Senior ML Engineer, ML Systems and Infrastructure
Autodesk
Canada · On-site · Full-time · Senior
About the Role
Autodesk is seeking a Senior ML Engineer, ML Systems and Infrastructure to design and scale the systems that enable machine learning across research and product development. You will help build the infrastructure behind large-scale data pipelines, distributed training systems, evaluation frameworks, and production ML workflows that support foundation models and ML-powered product features. You will operate independently across multiple parts of the stack and help define strong engineering practices for reliability, performance, and maintainability.
What You'll Do
- Design and build scalable systems for ML training, evaluation, deployment, and monitoring
- Develop and improve data pipelines that process large-scale structured and semi-structured technical datasets
- Optimize distributed workflows for performance, reliability, resource utilization, and cost efficiency
- Build platform capabilities such as experiment tracking, model versioning, checkpointing, reproducibility, and observability
- Contribute to model deployment, inference services, and production monitoring workflows
- Improve data quality, lineage, provenance, and operational transparency across ML pipelines
- Contribute to architecture and design discussions across the team
- Identify and resolve bottlenecks in data, compute, orchestration, and observability layers
- Mentor engineers through code reviews, design guidance, and knowledge sharing
- Collaborate closely with researchers, product engineers, and platform partners to turn ML workflows into robust engineering systems
Who You Are
This role is ideal for an engineer who is deeply interested in scalable systems and production-grade ML infrastructure.
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field, or equivalent industry experience
- At least 3 to 4 years of industry experience building and operating production software, ML systems, distributed infrastructure, or large-scale data pipelines
- Strong experience in software engineering, distributed systems, backend systems, or ML infrastructure
- Strong proficiency in Python and experience delivering production-quality systems
- Experience designing and operating scalable data or compute pipelines
- Experience with cloud platforms such as AWS, Azure, or GCP
- Familiarity with containers, CI/CD, observability, and release quality practices
- Ability to independently drive technical execution on complex work with limited oversight
- Experience building data pipelines for large-scale structured and semi-structured technical datasets
- Experience with data lineage, provenance, governance, and responsible data usage in ML systems
- Experience with distributed data processing and orchestration systems such as Ray, Airflow, Spark, or similar platforms
- Experience with model deployment, inference services, monitoring, and observability for production ML systems
- Experience building ML-ready representations for geometry, graph, hierarchical, or multimodal data
- Experience with distributed ML frameworks such as PyTorch, Lightning, DeepSpeed, FSDP, Megatron, or similar
- Familiarity with AEC workflows, design data, BIM/CAD formats, or Autodesk products
- Thinks like a systems engineer and executes like a strong software developer
- Can balance short-term delivery with long-term platform health
- Brings strong technical judgment and ownership
- Improves team effectiveness through mentoring and engineering rigor
- Enjoys solving scaling, performance, and reliability challenges
Skills
AWS, Azure, BIM/CAD, CI/CD, Cloud, Container, Data Lineage, DeepSpeed, FSDP, GCP, Graph, Inference, Lightning, Megatron, ML, Model Deployment, Observability, Orchestration, PyTorch, Python, Ray, Spark, SQL, System Engineering, Training