Senior Cloud Engineer – ML/AI Platform
KTek Resourcing
Toronto · On-site · Full-time · Senior · Today
About the Role
We are seeking a Senior Cloud Engineer with deep expertise in AWS and Azure AI/ML services to drive our enterprise ML/AI platform capabilities. You will evaluate and enable cloud AI/ML services, build reusable architectural patterns, and develop automated MLOps solutions in a highly regulated banking environment. This role requires hands-on experience with modern AI/ML platforms and the ability to design secure, compliant solutions that accelerate AI adoption across the organization.
What You Will Do
- Evaluate and enable AWS and Azure AI/ML services (SageMaker, Bedrock, Azure OpenAI, Azure AI Foundry) through proofs of concept and comprehensive assessments
- Design and implement reusable architectural patterns for secure AI/ML integrations including private endpoints, customer-managed keys, and service-to-service authentication
- Build end-to-end MLOps platforms and automated ML pipelines for model training, evaluation, deployment, and monitoring
- Produce technical reports on security, networking, compliance, guardrails, and cost analysis for AI/ML service enablement
- Develop frameworks, infrastructure-as-code, and automation to accelerate AI/ML adoption
- Implement observability solutions with model monitoring, metrics, and drift detection
- Partner with Enterprise Architecture and senior stakeholders to align platform capabilities with strategic roadmaps
- Provide technical leadership and mentorship on AI/ML cloud best practices
What You Need to Succeed
Must Have
- 5–7 years of cloud engineering experience, including 3+ years focused on AI/ML platforms
- Deep hands-on expertise with AWS AI/ML services: SageMaker (training, pipelines, inference, JumpStart), Bedrock
- Deep hands-on expertise with Azure AI/ML services: Azure Machine Learning, Azure OpenAI, Azure AI Foundry
- Experience building MLOps platforms and automated ML pipelines
- Strong knowledge of LLMOps, LLM lifecycle management, agentic AI, RAG (retrieval-augmented generation), and prompt engineering
- Experience implementing guardrails and governance for LLM services
- Proficiency in Python and infrastructure-as-code (Terraform, CloudFormation, ARM/Bicep)
- Experience with MLflow (or a similar tool), experiment tracking, and model registries
- Expertise in cloud security patterns including private endpoints, customer-managed keys, and network isolation for AI/ML services
- Strong understanding of cloud networking architecture in regulated environments
- Experience working in highly regulated industries with compliance requirements
- Agile delivery experience
Nice to Have
- AWS or Azure AI/ML certifications
- Experience with vector databases and embedding models
- Knowledge of model optimization and inference acceleration
- Background in financial services or banking
Skills
AWS, AWS CloudFormation, AWS SageMaker, Azure, Azure AI Foundry, Azure AI ML, Azure OpenAI, Docker, Kubernetes, LLMOps, MLflow, Python, RAG, Terraform