Site Reliability Engineer

Beyond-ED

Remote (Global), Senior, Posted today

About the role

We’re hiring a Senior Site Reliability Engineer to build and scale the reliability backbone of a leading GPU-powered platform.

Job Requirements

  • Degree in Computer Science or a related discipline, or equivalent practical experience / solid proof of expertise.
  • 4+ years of software development experience in one or more languages (Go preferred; Rust or Python also accepted).
  • 4+ years designing, analyzing, and troubleshooting distributed systems and production services.
  • Proficiency in debugging, profiling, and performance tuning of large-scale Linux systems.
  • Experience with Kubernetes (or similar schedulers), containerized services, and IaC (Terraform/Pulumi/CloudFormation).
  • Experience with observability (metrics, logs, traces), progressive delivery (canary/blue-green), and incident management.
  • Track record of OSS contributions.
  • Linux internals, networking, and kernel/perf tooling.
  • Exposure to hypervisors (KVM/) or virtual machine introspection concepts.
  • Knowledge of GPU architectures and CUDA programming.
  • Cybersecurity experience (runtime security, hardening, secrets management).
  • Building distributed systems on Kubernetes and high-throughput data pipelines (e.g., Kafka/Redpanda/Fluent Bit).
  • Experience with multi-cloud operations, cost/perf optimization, and compliance-minded engineering.

Responsibilities

  • Build and maintain systems that keep the platform stable, fast, and always available
  • Automate repetitive operational tasks to reduce manual work and human error
  • Monitor system performance and set clear reliability targets (uptime, response time, etc.)
  • Detect issues early and respond quickly to incidents to minimize downtime
  • Work closely with engineering teams to improve system design, scalability, and efficiency
  • Optimize infrastructure performance and cost across cloud environments
  • Improve deployment processes to make releases safer and smoother
  • Contribute to building internal tools that help teams operate systems more efficiently
  • Continuously enhance system reliability, performance, and security

Preferred

  • Active contributors (developers or volunteers) to open-source libraries for Linux environments

Candidate Background

  • Computing background only

Location

  • Fully Remote

Job Level

  • Senior

Talent Country

  • Egypt

Technologies

  • Python, GoLang, Linux, Terraform, Rust, kernel, Cloud Architecture, DevOps, Backend, Kubernetes, Security, SRE

Skills

  • Cloud Architecture, CUDA, DevOps, Fluent Bit, Go, GoLang, GPU, IaC, Kafka, kernel, Kubernetes, Linux, Networking, Pulumi, Python, Redpanda, Rust, Security, SRE, Terraform
