AI Engineer [AI Infrastructure | Multi-GPU | Distributed Training] - Singapore-based

GRIT

US · On-site · Full-time · 2w ago

About the role

We’re partnering with a fast-growing technology company in Singapore building at the intersection of AI, high-performance compute, and cloud infrastructure. The team works closely with advanced AI workloads and is focused on helping technical customers run demanding systems more efficiently, reliably, and at scale.

This is a company with strong momentum in a high-growth market, offering the chance to work on next-generation AI infrastructure, complex distributed environments, and products used by sophisticated engineering teams. The environment is hands-on, technical, and suited to people who enjoy solving hard systems problems with real-world impact.

This role must be based in Singapore.

Responsibilities

  • Design, build, and improve systems that support large-scale AI / ML workloads
  • Work on performance, reliability, and scalability across compute, training, inference, or platform infrastructure
  • Develop internal or customer-facing tooling, frameworks, APIs, or workflows for AI-related systems
  • Contribute to workload orchestration, resource management, scheduling, or platform automation
  • Partner with engineers, researchers, and technical stakeholders to improve system efficiency and user experience
  • Help define best practices, reusable templates, benchmarks, and operational standards
  • Troubleshoot complex system issues across software, infrastructure, and distributed environments

Requirements

  • Strong software engineering background with experience in distributed systems, infrastructure, platform engineering, or machine learning systems
  • Experience in one or more of the following areas:
    • AI / ML infrastructure
    • model training or inference systems
    • workload orchestration or scheduling
    • cloud platform or compute infrastructure
    • performance optimization for large-scale systems
  • Solid coding skills in at least one common backend or systems language such as Python, Go, Java, or C++
  • Familiarity with modern infrastructure and platform tooling such as Kubernetes, containerized environments, cloud platforms, or cluster management systems
  • Ability to work in technically complex environments with a practical, problem-solving mindset
  • Comfortable collaborating across engineering and product stakeholders in a fast-moving setting

Nice to have

  • Experience with GPU workloads, high-performance computing, or large-scale distributed training
  • Exposure to ML frameworks, model optimization, benchmarking, or platform observability
  • Background building internal developer platforms, APIs, job systems, or shared infrastructure tools
  • Experience supporting technical customers, internal engineering users, or research teams

Skills

C++, Go, Java, Kubernetes, Python
