Site Reliability Engineer

Beyond-ED

Remote (Global) Senior 1mo ago

About the role

We’re hiring a Senior Site Reliability Engineer to build and scale the reliability backbone of a leading GPU-powered platform.

Qualifications

Degree in Computer Science or a related discipline, or equivalent practical experience with solid proof of expertise.
4+ years of software development experience in one or more languages (Go ideal; Rust or Python).
4+ years designing, analyzing, and troubleshooting distributed systems and production services.
Proficiency in debugging, profiling, and performance tuning of large-scale Linux systems.
Experience with Kubernetes (or similar schedulers), containerized services, and IaC (Terraform/Pulumi/CloudFormation).
Experience with observability (metrics, logs, traces), progressive delivery (canary/blue-green), and incident management.
Track record of OSS contributions.
Linux internals, networking, and kernel/perf tooling.
Exposure to hypervisors (KVM/) or virtual machine introspection concepts.
Knowledge of GPU architectures and CUDA programming.
Cybersecurity experience (runtime security, hardening, secrets management).
Building distributed systems on Kubernetes and high-throughput data pipelines (e.g., Kafka/Redpanda/Fluent Bit).
Experience with multi-cloud operations, cost/perf optimization, and compliance-minded engineering.

Responsibilities

Build and maintain systems that keep the platform stable, fast, and always available
Automate repetitive operational tasks to reduce manual work and human error
Monitor system performance and set clear reliability targets (uptime, response time, etc.)
Detect issues early and respond quickly to incidents to minimize downtime
Work closely with engineering teams to improve system design, scalability, and efficiency
Optimize infrastructure performance and cost across cloud environments
Improve deployment processes to make releases safer and smoother
Contribute to building internal tools that help teams operate systems more efficiently
Continuously enhance system reliability, performance, and security

Developers and volunteers contributing to open-source libraries related to Linux environments

Python, GoLang, Linux, Terraform, Rust, kernel, Cloud Architecture, DevOps, Backend, Kubernetes, Security, SRE

Cloud Architecture, CUDA, DevOps, Fluent Bit, Go, GoLang, GPU, IaC, Kafka, kernel, Kubernetes, Linux, Networking, Pulumi, Python, Redpanda, Rust, Security, SRE, Terraform

