Senior / Staff DevOps & Site Reliability Engineer

Superscale

Baunatal · Hybrid · Senior · 2w ago

About the Role

We're scaling fast — and we want to do it without the chaos that usually comes with it.

As our Senior/Staff DevOps & Site Reliability Engineer, you'll own the infrastructure that powers Superscale's AI platform. But this isn't a traditional "keep the lights on" SRE role. You'll be building an infrastructure layer designed for a new kind of engineering team: one where every developer works alongside multiple AI coding agents, and the infra itself is a force multiplier.

You'll be our first dedicated infrastructure hire, which means you get to set the standard — from observability and incident response to CI/CD pipelines and cloud architecture. You'll make sure we scale smoothly as load, team size, and AI workloads grow, and you'll be the counterpart engineers rely on to ship systems that are resilient from day one.

We believe in hiring for breadth and building leverage through AI tooling. We're not growing the team by stacking people in the same roles — we're hiring unique skill sets and amplifying everyone through best-in-class infrastructure and AI-native workflows. You'll be central to making that philosophy real.

Key Responsibilities

  • Own and evolve our AWS infrastructure: containerized services, networking, security, and cost optimization — building toward a setup that scales with both user load and AI workloads
  • Design and implement state-of-the-art monitoring, alerting, and observability with Datadog (no more "is this broken for everyone?" Slack messages — you'll know before anyone asks)
  • Build proactive systems for incident detection and response — shifting the team from reactive firefighting to confident, data-informed operations
  • Architect and deploy infrastructure for AI-native development: cloud-based coding agent environments where multiple agents per developer can build, test, and deploy in parallel
  • Prepare our infrastructure for AI-specific load patterns: bursty GPU/LLM workloads, intelligent request routing, and cost-efficient scaling strategies
  • Create a developer platform that treats coding agents as first-class citizens — giving them access to the same data, tools, secrets, and deployment pipelines that human engineers use
  • Design CI/CD pipelines and deployment workflows that are fast, reliable, and safe — optimized for high-frequency pushes from both humans and agents
  • Partner with the engineering team to build systems that are scalable and future-proof at the architecture level, not patched after the fact
  • Establish infrastructure-as-code practices, documentation, and runbooks that make the whole team more autonomous

Requirements

  • 5+ years of experience in DevOps, SRE, or platform engineering, with deep hands-on AWS expertise
  • Strong experience with container orchestration (ECS or Kubernetes), infrastructure-as-code (Terraform, Pulumi), and modern CI/CD systems (e.g., GitHub Actions)
  • Proven track record of building observability stacks (Datadog, Grafana, Prometheus, CloudWatch, or similar) that actually prevent incidents, not just log them
  • Experience designing infrastructure for service-oriented architectures with relational databases and modern web frontends (Next.js experience is a plus)
  • You understand load balancing, auto-scaling, and cost optimization at a level where you can make real architectural trade-offs
  • Security-minded: you bake in least-privilege access, secrets management, and network segmentation without making developers hate their lives
  • AI-native working style: you actively use LLMs, coding agents, and automation tools in your own workflow. We're building toward 10x coding agents per developer — you'll be the one making that infrastructure possible
  • Strong communicator who can translate infrastructure decisions into language the product and engineering teams understand

Nice to Have

  • Experience building developer platforms or internal tooling that improved team velocity measurably
  • Background in managing AI/ML infrastructure: GPU scheduling, model serving, LLM gateway/proxy setups
  • Experience at an early-stage startup where you built infra foundations that lasted through 10x growth
  • Contributions to open-source infrastructure or DevOps tooling

What We Offer

  • Competitive salary and equity/stock options in a high-growth AI company
  • Flexible remote or hybrid work arrangement
  • Generous paid time off and company holidays
  • Professional development budget for conferences, courses, and certifications
  • Greenfield opportunity — you're setting the infrastructure standard
  • A team that values horizontal skill over narrow specialization, and is investing in AI tooling and agent infrastructure
  • Direct, visible impact — every engineer and every AI agent on the team will feel the quality of what you build

How to Apply

Please send your application to magnus@superscale.ai with your LinkedIn / GitHub profile and a short note on why this role excites you and what you'd change about our setup on day one.

We are an equal opportunity employer and welcome candidates of all backgrounds.

Benefits

equity/stock options, paid time off, company holidays, professional development budget

Skills

AWS, CloudWatch, Datadog, ECS, GitHub Actions, Grafana, Kubernetes, Next.js, Prometheus, Pulumi, Terraform
