
GPU Systems Engineer

Base-2 Solutions

Bethesda · On-site · Full-time · Senior · $152k – $172k/yr · Posted 2d ago

About the role

Position Summary

Support enterprise AI mission systems by designing, developing, and optimizing GPU clusters, with a deep focus on operating systems, hardware, GPU platforms, and high-speed networking in a secure customer environment.

Essential Duties And Responsibilities

  • Design, configure, and maintain GPU clusters.
  • Collaborate with a multidisciplinary team to define and optimize architectures for performance, power efficiency, and required features.
  • Work closely with AI/ML engineers to integrate GPUs with Linux-based systems.
  • Optimize GPU drivers for compatibility, reliability, and performance.
  • Analyze GPU performance, identify bottlenecks, and develop strategies to improve efficiency across hardware and software layers.
  • Build and maintain debugging tools, profiling utilities, and performance analysis software for Linux environments.
  • Leverage Bash, Python, Ansible, Puppet, and Salt for tooling and automation.
  • Maintain technical documentation, architectural specifications, and Linux best practices.
  • Support Authority to Operate (ATO) activities and ensure compliance with federal security standards.

Required Qualifications

  • Active TS/SCI with ability to obtain a CI Polygraph.
  • Bachelor's degree with a minimum of six years of experience in a relevant field. Three additional years of experience may be substituted for the bachelor's degree.
  • Experience managing NVIDIA GPU data center platforms, including DGX and HGX systems and H200, H100, and L4 GPUs.
  • Knowledge of enterprise server components, including storage/network controllers, HBAs, and SSDs.
  • Strong expertise with Linux distributions, including RHEL, Ubuntu, Oracle Linux, and Rocky Linux.
  • Excellent problem-solving skills and the ability to collaborate within a team.
  • Meet DoD 8570.01-M IAT Level II certification requirements at a minimum; IAT Level III is also acceptable.
  • U.S. citizenship is required due to the nature of the government contracts supported.

Preferred Qualifications

  • Experience with Kubernetes cluster management and AI/ML workflow orchestration, including Argo, Airflow, and Kubeflow.
  • Familiarity with GPU virtualization and cloud computing.
  • Experience with Prometheus and Grafana for monitoring.
  • Knowledge of distributed resource scheduling systems such as Slurm, LSF, or similar tools.

Required Education and Experience Equivalency

Minimum years of experience by highest education level:

  • High School Diploma/GED: 9 years
  • Associate's Degree: 9 years
  • Bachelor's Degree: 6 years
  • Master's Degree: 6 years
  • PhD: 6 years

Required Certifications

  • DoD 8570.01-M IAT Level II certification: Security+ CE, CCNA Security, GICSP, GSEC, or SSCP.

Required Security Clearance

  • Active TS/SCI with ability to obtain a CI Polygraph.

Pay & Benefit Highlights

Compensation

  • Competitive fixed salary or hourly pay (based on experience, skills, location, and internal equity).
  • Employee referral bonuses up to $10,000 per hired referral.
  • Additional bonus opportunities for exceptional performance and contributions to business development and company growth (role-dependent).

Health

  • 100% company-paid medical premiums for employees and eligible dependents.
  • Choose from multiple plan options with CareFirst, Kaiser, and UnitedHealthcare, including PPO, POS, HMO, and HSA-compatible plans.
  • 100% company-paid dental premiums for employees and eligible dependents.
  • 100% company-paid vision premiums for employees and eligible dependents.

Income Protection

  • 100% company-paid premiums for short-term disability.
  • 100% company-paid premiums for long-term disability.
  • 100% company-paid premiums for accidental death & dismemberment (AD&D).
  • 100% company-paid premiums for life insurance up to $200,000.

Retirement

  • 401(k) with immediate vesting: 4% company match plus a 4% non-elective company contribution (8% total).
  • 401(k) pre-tax and Roth options.

Leave

  • Up to 20 days of flexible paid time off (PTO).
  • 11 paid floating holidays.

Work-Life Balance

  • Flexible work schedules, including flex time and compressed work periods (contract and project-dependent).

Skills

Ansible, Bash, Docker, Grafana, H100, H200, Kubernetes, L4, Linux, NVIDIA, Puppet, Python, Salt, Slurm
