Senior MLOps / LLMOps Engineer: Kubernetes & AI Inference Platforms

ITCAPS LLC

Jersey City · On-site · Contract · Senior · 1mo ago

About the role

Job Summary

We are seeking a highly skilled Senior MLOps / LLMOps Engineer to design, deploy, and support enterprise-scale AI/LLM platforms in production environments. The ideal candidate will have strong experience with Kubernetes/OpenShift, NVIDIA TensorRT-LLM, Triton Inference Server, and scalable AI infrastructure. This role focuses on building reliable, secure, and high-performance inference platforms for mission-critical AI applications.

Key Responsibilities

  • Deploy, manage, and troubleshoot containerized AI/LLM applications on Kubernetes/OpenShift platforms.
  • Configure, optimize, and support LLM inference workloads using NVIDIA TensorRT-LLM and Triton Inference Server.
  • Design and maintain scalable MLOps/LLMOps and container deployment pipelines.
  • Build CI/CD workflows for AI models, containers, and infrastructure deployments.
  • Package and deploy AI models across UAT, testing, and production environments.
  • Monitor platform performance, GPU utilization, availability, and operational health.
  • Implement logging, alerting, monitoring, and automated operational support processes.
  • Troubleshoot model deployment, scaling, networking, and load balancing issues.
  • Support model optimization techniques including quantization, pruning, and performance tuning.
  • Create operational runbooks, deployment procedures, health checks, and support documentation.
  • Support backup, restore, disaster recovery, failover, and business continuity planning.
  • Maintain platform security, RBAC, compliance, and governance standards.
  • Collaborate with AI, infrastructure, DevOps, and operations teams to deliver scalable AI solutions.

Required Qualifications

  • 5+ years of experience in Kubernetes/OpenShift administration and containerized environments.
  • Strong hands-on experience with NVIDIA TensorRT-LLM and Triton Inference Server.
  • Experience deploying and supporting LLM/AI inference services in production.
  • Strong knowledge of Docker, microservices, and API-based architectures.
  • Experience building and supporting MLOps/LLMOps pipelines and CI/CD workflows.
  • Expertise in monitoring, logging, and troubleshooting distributed systems.
  • Experience with NVIDIA GPU infrastructure and AI workload optimization.
  • Understanding of incident management, change management, and operational best practices.
  • Strong problem-solving, communication, and collaboration skills.

Preferred Qualifications

  • Experience with OpenShift AI and enterprise AI platforms.
  • Knowledge of model optimization and inference acceleration techniques.
  • Experience with cloud platforms such as AWS, Azure, or Google Cloud Platform.
  • Familiarity with Infrastructure as Code (Terraform, Ansible, Helm, etc.).
  • Kubernetes/OpenShift or cloud certifications are a plus.

Skills

AI, Ansible, API, AWS, Azure, CI/CD, Docker, GCP, GPU, Helm, Kubernetes, LLM, Microservices, MLOps, NVIDIA TensorRT-LLM, OpenShift, Triton Inference Server, Terraform
