HPC Scientific Software Engineer - Research Computing

Johns Hopkins University

Baltimore · On-site Full-time Senior $100k – $175k/yr 5d ago

About the role

We at IT@JH Research Computing are looking for an HPC Sr. Scientific Software Engineer to join our team and contribute to the design, construction, and support of Johns Hopkins University's high-performance computing and AI research infrastructure. We provide a dynamic environment that brings together systems and software engineering to deliver scalable and reproducible solutions for data-intensive research. Our team thrives on collaboration, continuous learning, and supporting innovative research initiatives. This full-time position offers a starting salary range of $99,800 to $175,000 annually, commensurate with experience, and is based at the Johns Hopkins Bayview campus, operating Monday to Friday from 8:30 AM to 5 PM.

Responsibilities

  • Design and implement strategies for deploying scientific software on HPC and AI systems.
  • Create computational workflows, selecting the most effective software configurations, utilizing tools such as Ansible for automation.
  • Assist teams in tuning and optimizing AI models and gateway applications such as XDMoD, ColdFront, Open OnDemand, CryoSPARC Live, SBGrid, and AI agents.
  • Analyze and enhance the performance of AI models and HPC applications, prioritizing GPU-enabled computing.
  • Establish parallel processing, distributed computing, and resource management methods for efficient job execution.
  • Develop, debug, and maintain software tools, libraries, and frameworks essential for HPC and AI tasks.
  • Collaborate with system teams and software vendors including NVIDIA, Intel, and Matlab to optimize performance.
  • Utilize CUDA, DNN, TensorRT, and Intel Compilers to boost system efficiency.
  • Oversee scientific software deployment across HPC, cloud, and colocation facilities.
  • Manage the installation, configuration, and upkeep of HPC packages using tools like CMake, Make, EasyBuild, Spack, and Lua module files.
  • Engage closely with cross-functional teams, including researchers and software developers, to tackle complex HPC/AI problems.
  • Mentor junior engineers and promote a culture of continuous learning.
  • Resolve technical challenges and conduct root cause analyses for HPC/AI software issues.
  • Implement solutions to enhance system reliability and prevent issues from recurring.
  • Conduct training workshops for researchers and students on troubleshooting, workflow optimization, and utilizing HPC systems effectively.
  • Remain updated on advancements in HPC and AI technologies and methodologies.
  • Integrate new research into current systems to enhance performance and capabilities.
  • Develop and oversee container orchestration strategies ensuring application scalability, reliability, and security.
  • Create thorough documentation for system architectures, performance metrics, and project progress.
  • Ensure adherence to security and regulatory requirements for all HPC and AI platforms.

Requirements

  • PhD in a quantitative discipline.
  • Five years of experience in HPC user support, software deployment, and performance optimization within an academic or research environment.
  • Additional education may substitute for required experience, and additional related experience may substitute for required education beyond a high school diploma, as permitted by the JHU equivalency formula.
  • Eight or more years of professional experience in high-performance computing, large-scale systems, or research software engineering (preferred).
  • Deep proficiency in Linux systems administration, performance tuning, and automation tools such as Ansible, Terraform, Jenkins, or similar (preferred).
  • Experience with cluster management, workload schedulers (e.g., Slurm), and distributed or parallel file systems (e.g., GPFS, Lustre, WekaFS, Ceph) (preferred).
  • Strong programming or scripting skills in languages such as Python, Bash, C/C++, Go, or Rust (preferred).
  • Familiarity with containerization and orchestration technologies used in HPC (e.g., Singularity, Apptainer, Docker, Kubernetes) (preferred).
  • Understanding of high-speed interconnects (InfiniBand, 100/400 Gb Ethernet) and storage/data access patterns for AI and analytics (preferred).
  • Experience in developing or maintaining CI/CD pipelines and module environments (Lmod/Spack) for research software (preferred).
  • Knowledge of GPU computing (CUDA, ROCm), MPI/OpenMP, and AI/ML frameworks (preferred).
  • Demonstrated ability to collaborate with researchers on performance optimization, workflow design, and reproducible computing (preferred).

Tech Stack

  • AI
  • Ansible
  • Bash
  • CI/CD
  • Cloud
  • Ceph
  • CUDA
  • Docker
  • ELK
  • Ethernet
  • Grafana
  • InfiniBand
  • Support
  • Jenkins
  • Kubernetes
  • Linux
  • Matlab
  • Prometheus
  • Python
  • Rust
  • Security
  • Terraform
  • Machine-Learning

Location

Keswick Road 3910, Baltimore, United States

Salary

$99,800 - $175,000 per year

Benefits

  • None explicitly listed.

Skills

AI, Ansible, Bash, C++, Ceph, CI/CD, Cloud, CMake, Containerization, CUDA, Docker, EasyBuild, ELK, Ethernet, Grafana, InfiniBand, Intel Compilers, Jenkins, Kubernetes, Linux, Lmod, Lua, Make, Matlab, MPI, OpenMP, Prometheus, Python, ROCm, Rust, Security, Slurm, Spack, Terraform, TensorRT, XDMoD
