Senior MLOps / LLMOps Engineer, Kubernetes & AI Inference Platforms

ITCAPS LLC

Jersey City · On-site · Contract · Senior · Posted 2 months ago

About the role

Job Summary

We are seeking a highly skilled Senior MLOps / LLMOps Engineer to design, deploy, and support enterprise-scale AI/LLM platforms in production environments. The ideal candidate will have strong experience with Kubernetes/OpenShift, NVIDIA TensorRT-LLM, Triton Inference Server, and scalable AI infrastructure. This role focuses on building reliable, secure, and high-performance inference platforms for mission-critical AI applications.

Key Responsibilities

Deploy, manage, and troubleshoot containerized AI/LLM applications on Kubernetes/OpenShift platforms.
Configure, optimize, and support LLM inference workloads using NVIDIA TensorRT-LLM and Triton Inference Server.
Design and maintain scalable MLOps/LLMOps and container deployment pipelines.
Build CI/CD workflows for AI models, containers, and infrastructure deployments.
Package and deploy AI models across UAT, testing, and production environments.
Monitor platform performance, GPU utilization, availability, and operational health.
Implement logging, alerting, monitoring, and automated operational support processes.
Troubleshoot model deployment, scaling, networking, and load balancing issues.
Support model optimization techniques including quantization, pruning, and performance tuning.
Create operational runbooks, deployment procedures, health checks, and support documentation.
Support backup, restore, disaster recovery, failover, and business continuity planning.
Ensure platform security, RBAC, compliance, and governance standards are maintained.
Collaborate with AI, infrastructure, DevOps, and operations teams to deliver scalable AI solutions.

Required Qualifications

5+ years of experience in Kubernetes/OpenShift administration and containerized environments.
Strong hands-on experience with NVIDIA TensorRT-LLM and Triton Inference Server.
Experience deploying and supporting LLM/AI inference services in production.
Strong knowledge of Docker, microservices, and API-based architectures.
Experience building and supporting MLOps/LLMOps pipelines and CI/CD workflows.
Expertise in monitoring, logging, and troubleshooting distributed systems.
Experience with NVIDIA GPU infrastructure and AI workload optimization.
Understanding of incident management, change management, and operational best practices.
Strong problem-solving, communication, and collaboration skills.

Preferred Qualifications

Experience with OpenShift AI and enterprise AI platforms.
Knowledge of model optimization and inference acceleration techniques.
Experience with cloud platforms such as AWS, Azure, or Google Cloud Platform.
Familiarity with Infrastructure as Code (Terraform, Ansible, Helm, etc.).
Kubernetes/OpenShift or cloud certifications are a plus.

Skills

AI, Ansible, API, AWS, Azure, CI/CD, Docker, GCP, GPU, Helm, Kubernetes, LLM, Microservices, MLOps, NVIDIA TensorRT-LLM, OpenShift, Triton Inference Server, Terraform

