GPU Infrastructure & AI Platform Engineer
Link Consulting Services
Herndon · On-site · Full-time · Mid Level
About the role
We are seeking a hands-on engineer to deliver end-to-end GPU infrastructure and AI/GenAI environment deployments in a lab or data center setting. The role spans hardware installation, platform setup, infrastructure optimization, and monitoring implementation, with the goal of a fully operational, validated environment.
Key Responsibilities:
- Install and rack-mount GPU servers, including cabling, firmware/OS baseline configuration, driver installation, and integration testing
- Set up AI/GenAI environments using container runtimes (Docker/Kubernetes) and deploy inference tooling, delivering at least one validated use case
- Perform rack modernization and infrastructure cleanup, including audit, optimized rack design, equipment reorganization, and structured power/data cable remediation
- Implement monitoring solutions for GPU servers and lab infrastructure, including dashboards, alerts, agent deployment, and documentation handover
Required Skills:
- Experience with GPU servers and data center environments (rack, power, cabling)
- Strong Linux administration and system configuration
- Knowledge of GPU drivers, CUDA, and performance validation
- Experience with Docker and/or Kubernetes
- Familiarity with AI/GenAI inference tools (e.g., Triton, vLLM, Ollama, or similar)
- Experience with monitoring tools (Prometheus/Grafana, Zabbix, or equivalent)
Experience:
- 5+ years in systems, infrastructure, or data center engineering
- Proven experience delivering GPU or AI infrastructure deployments
Skills
AI · CUDA · Docker · GenAI · Grafana · GPU · Kubernetes · Linux · Ollama · Prometheus · Triton · vLLM · Zabbix