GPU Infrastructure & AI Platform Engineer
Link Consulting Services
Herndon · On-site · Full-time · Mid Level
About the role
We are seeking a hands-on engineer to deliver end-to-end GPU infrastructure and AI/GenAI environment deployments in a lab or data center setting. The role spans hardware installation, platform setup, infrastructure optimization, and monitoring implementation, with the goal of a fully operational, validated environment.
Key Responsibilities:
- Install and rack-mount GPU servers, including cabling, firmware/OS baseline configuration, driver installation, and integration testing
- Set up AI/GenAI environments using container runtimes (Docker/Kubernetes) and deploy inference tooling, delivering at least one validated use case
- Perform rack modernization and infrastructure cleanup, including audit, optimized rack design, equipment reorganization, and structured power/data cable remediation
- Implement monitoring solutions for GPU servers and lab infrastructure, including dashboards, alerts, agent deployment, and documentation handover
Required Skills:
- Experience with GPU servers and data center environments (rack, power, cabling)
- Strong Linux administration and system configuration
- Knowledge of GPU drivers, CUDA, and performance validation
- Experience with Docker and/or Kubernetes
- Familiarity with AI/GenAI inference tools (e.g., Triton, vLLM, Ollama, or similar)
- Experience with monitoring tools (Prometheus/Grafana, Zabbix, or equivalent)
Experience:
- 5+ years in systems, infrastructure, or data center engineering
- Proven experience delivering GPU or AI infrastructure deployments
Skills
AI · CUDA · Docker · GenAI · Grafana · GPU · Kubernetes · Linux · Ollama · Prometheus · Triton · vLLM · Zabbix