
AI/ML Cloud Engineer

Jobs via Dice

Bloomfield · Hybrid · Contract · Mid Level · Today

About the role

Key Responsibilities:

Cloud Infrastructure Management

  • Design, deploy, and manage cloud infrastructure supporting AI/ML workloads on AWS and Azure.
  • Manage compute, networking, and storage resources such as EC2, Azure Virtual Machines, GPU instances, EKS, ECS, Lambda, Kubernetes clusters, VPC, Route 53, and S3.
  • Provision and configure storage, networking, and security services for AI platforms.
  • Ensure high availability, scalability, and reliability of AI environments.

AI Platform Support

  • Deploy and maintain AI/ML services such as:
    • Amazon SageMaker and Microsoft Foundry
    • Azure Machine Learning
    • AI model training and inference environments
  • Support data scientists and ML engineers by providing optimized infrastructure for model training and deployment.

Automation & Infrastructure as Code

  • Implement Infrastructure as Code (IaC) using tools such as:
    • Terraform
    • CloudFormation
    • ARM templates/Bicep
    • Dockerfiles
  • Set up and automate environment provisioning, patching, and scaling.

Containerization & Orchestration

  • Deploy and manage containerized AI workloads using:
    • Docker
    • Kubernetes
    • Amazon EKS
    • Azure Kubernetes Service (AKS)
    • ECS

Monitoring & Performance Optimization

  • Monitor system health, performance, and resource utilization using tools like:
    • CloudWatch
    • Azure Monitor
    • Datadog / Prometheus
  • Optimize infrastructure for cost, performance, and GPU utilization.

Security & Compliance

  • Implement cloud security best practices including:
    • IAM / RBAC management
    • Network security groups
    • Encryption and secrets management
  • Ensure compliance with organizational and regulatory standards.

CI/CD & DevOps Integration

  • Integrate AI infrastructure with CI/CD pipelines.
  • Support automated deployment of models and AI services.

Required Qualifications

  • Bachelor's degree in Computer Science, Information Systems, or related field.
  • 5+ years of experience in infrastructure administration or cloud engineering.
  • Strong hands-on experience with AI/ML infrastructure or data platforms.
  • Proficiency with Linux administration, scripting (Python, Bash, PowerShell), and IaC tooling (Terraform, Terragrunt).
  • Experience with Docker and Kubernetes.
  • Experience with GitHub Actions.
  • Experience with LLM infrastructure setup.
  • Experience working in a centralized team with triage responsibilities.
  • Experience with AWS cloud services.
  • Experience with Microsoft Azure cloud services.

Skills

ARM templates, AWS, AWS CloudFormation, AWS EKS, AWS Lambda, AWS SageMaker, Bash, Bicep, CloudWatch, Datadog, Docker, Dockerfiles, EC2, ECS, GitHub Actions, GPU, IAM, Kubernetes, Linux, Microsoft Azure, Microsoft Azure AKS, Microsoft Azure Machine Learning, Microsoft Foundry, PowerShell, Prometheus, Python, RBAC, Terraform, Terragrunt, VPC
