Senior Cloud Engineer

TechVirtue LLC

Warren · On-site · Full-time · Senior · Posted yesterday

About the role

The Senior Cloud Engineer (Onsite Lead / Architect - AWS HPCOps) is responsible for designing, implementing, and managing large-scale high-performance computing (HPC) platforms on AWS for scientific and research-driven workloads. This role provides architecture oversight, technical leadership, and hands‑on engineering support, while ensuring operational excellence and strategic alignment with enterprise goals.

As the onsite lead, you will guide a distributed engineering team, collaborate with scientific and IT stakeholders, and introduce emerging technologies that enhance HPC scalability, reliability, and automation. The role requires deep expertise in AWS HPC architecture, strong leadership capabilities, and the ability to operate autonomously in complex environments.

Requirements

AWS HPC Architecture & Engineering

  • Architect scalable HPC solutions using AWS ParallelCluster, AWS Batch, EC2 Spot, Auto Scaling, and other core AWS services
  • Design computing environments supporting computational chemistry, molecular dynamics, genomics, and high‑throughput scientific workloads
  • Develop and optimize data storage solutions using S3, EFS, FSx for Lustre, and high-performance data access patterns

Engineering Leadership & Delivery Management

  • Lead onsite activities and manage a team of ~10 offshore HPC/cloud engineers
  • Drive the planning, execution, integration, and operationalization of HPC projects
  • Ensure delivery quality, cost efficiency, and alignment with enterprise cloud strategy

Operations & Platform Management

  • Oversee cluster operations, job scheduling (Slurm), compute scaling, patching, and incident response
  • Implement best practices for observability using CloudWatch, Prometheus, Grafana, and logging frameworks
  • Ensure high availability, reliability, and performance of HPC workloads and underlying infrastructure

Automation & CI/CD

  • Build infrastructure using Terraform and CloudFormation with fully automated IaC delivery pipelines
  • Deploy cloud-native automation for cluster lifecycle management, cost optimization, and environment provisioning

Innovation & Emerging Technologies

  • Lead PoCs to evaluate new HPC frameworks, containerization strategies (Docker, Singularity), and workflow engines (Nextflow, Cromwell)
  • Recommend architectural enhancements and modernization approaches

Stakeholder Engagement

  • Communicate architecture, progress, risks, and recommendations to technical and non‑technical stakeholders
  • Collaborate with scientific computing, research, security, and enterprise architecture teams
  • Act as a trusted advisor to business partners on HPC and cloud-enabled computing

Technical Skills & Competencies

Cloud & HPC Expertise

  • Deep experience designing HPC systems on AWS (ParallelCluster, Batch, EC2 Spot, FSx for Lustre)
  • Strong Linux administration and troubleshooting skills
  • Expertise in parallel computing technologies (MPI, OpenMP), job schedulers (Slurm), and distributed systems

Automation & DevOps

  • Strong Terraform and CloudFormation experience
  • Hands-on CI/CD experience (GitHub Actions, GitLab CI, Jenkins)
  • Experience managing multi-account AWS org structures

Performance & Optimization

  • Skilled in tuning compute, storage, and network performance for HPC workloads
  • Strong knowledge of cost optimization strategies for large cluster deployments

Communication & Collaboration

  • Ability to simplify complex HPC/cloud architectures for broader audiences
  • Strong cross-functional influence and stakeholder management skills

Leadership Expectations

  • Proven ability to lead onsite/offshore teams, mentor engineers, and guide architecture decisions
  • Able to work across regions, functions, and cultures
  • Excellent written and verbal communication skills
  • Demonstrates a mindset of diversity, inclusion, and continuous learning
  • Inspires collaboration, accountability, and innovation

Decision-Making & Autonomy

  • Makes high-impact architectural and operational decisions independently
  • Incorporates diverse stakeholder input to develop robust solutions
  • Drives rapid and high-quality implementation of technical strategies
  • Accountable for architecture governance, delivery quality, and risk mitigation

Interaction & Influence

  • Represents the HPC function in customer meetings, architecture committees, and design reviews
  • Builds strong partnerships with internal teams, affiliates, and external vendors
  • Navigates change effectively and supports organizational transformation efforts

Innovation

  • Challenges legacy designs and introduces new technologies for performance, automation, and cost efficiency
  • Identifies emerging trends in cloud HPC and applies them to business needs
  • Continuously seeks opportunities to enhance reliability, scalability, and scientific throughput

Complexity

  • Operates within a high-complexity global environment with diverse scientific and cloud requirements
  • Requires deep subject matter expertise and the ability to consider enterprise‑wide impacts

Education & Qualifications

  • Bachelor's degree in Computer Science, Computational Science, Engineering, or a related field (required)
  • Master's degree (preferred)
  • Preferred certifications:
    • AWS Solutions Architect - Professional
    • AWS Advanced Networking / Data Engineering Specialty
    • Linux or HPC-specific certifications (optional)

Experience Requirements

  • 8-10+ years in cloud engineering, HPC operations, or scientific computing
  • 5+ years architecting HPC workloads on AWS
  • 3+ years leading distributed teams
  • Experience with scientific research environments is a plus (chemistry, biology, genomics, materials science)

Work Model

  • Onsite role with daily collaboration with client researchers and engineering teams
  • Coordinates and leads a 10-person offshore engineering team

Skills

AWS Batch, AWS ParallelCluster, CloudFormation, CloudWatch, Docker, EC2 Spot, EFS, FSx for Lustre, Grafana, GitHub Actions, GitLab CI, HPC, Jenkins, Linux, MPI, Nextflow, OpenMP, Prometheus, S3, Singularity, Slurm, Terraform
