On-prem Platform Engineer

Saransh Inc

Charlotte · On-site · Contract · 2w ago

About the role

Location

Brevard, Charlotte

Technologies

Arize AI, Claude Cowork, GCP, Terraform

Inference Stack

vLLM, TensorRT LLM, Triton Inference Server, SGLang, Inference Optimization, Continuous Batching, Speculative Decoding, KV Cache / Prefix Caching, FP8 / AWQ / GPTQ, Tensor Parallelism

ML Serving & Orchestration

Kubernetes, ML Serving, KServe, OpenShift AI, Helm / Operators, GPU Orchestration, Run:AI

Observability & Performance

Performance Benchmarking, CUDA / NCCL / MIG, Prometheus / Grafana, ML Observability

Load Testing

GuideLLM, Locust

Responsibilities

  • Build, configure, and operate on-prem Kubernetes/OpenShift AI platforms for deploying and serving GenAI models and LLM inference workloads.
  • Design and optimize high-performance inference stacks using vLLM, TensorRT LLM, Triton Inference Server, and SGLang, applying advanced techniques such as continuous batching, speculative decoding, and KV caching.
  • Manage GPU orchestration and capacity using Run:AI, MIG, CUDA/NCCL, and tensor parallelism to maximize utilization and throughput.
  • Deploy and operate Kubernetes ML serving frameworks (KServe, Helm, Operators) for scalable, reliable model serving.
  • Drive inference optimization and benchmarking, using FP8, AWQ, and GPTQ quantization and performance tools such as GuideLLM and Locust.
  • Implement observability and ML monitoring using Prometheus, Grafana, and Arize AI to ensure SLA/SLO compliance for GenAI services.
  • Collaborate with ML and research teams to onboard new models, tune inference performance, and productionize GenAI use cases.

Skills

Arize AI, AWQ, Claude Cowork, Continuous Batching, CUDA, FP8, GCP, GPTQ, GuideLLM, Helm, Inference Optimization, KServe, Kubernetes, KV Cache, Locust, MIG, ML Observability, ML Serving, NCCL, OpenShift, Operators, Performance Benchmarking, Prefix Caching, Prometheus, Run:AI, SGLang, Speculative Decoding, Tensor Parallelism, TensorRT LLM, Terraform, Triton Inference Server, vLLM
