Platform / DevOps Engineer (WebRTC, Edge + Cloud)
Blue Machines AI
India · Hybrid · Full-time · Senior · Posted today
About the role
We’re building and operating a LiveKit-like real-time communications (WebRTC) platform that must scale to millions of calls, using edge PoPs for ultra-low latency and multi-region cloud deployments for reliability. This is a hands-on, high-ownership role focused on production systems, performance, and resilience. We’re especially interested in engineers who have operated real-time or streaming infrastructure at scale.
What you’ll do
- Own reliability and performance of signaling, SFU/media nodes, TURN, routing, failover, and capacity planning
- Build and run multi‑region Kubernetes platforms with secure networking and zero‑downtime deployments
- Design edge + cloud architecture: PoPs, global routing, failover, autoscaling, DR
- Implement SLOs/SLIs, incident response, postmortems, and operational excellence
- Create strong observability: metrics, logs, tracing, and real‑time QoE/latency metrics
- Ship Infrastructure‑as‑Code and automation: Terraform, Helm, GitOps, CI/CD
Required skills
- Strong production experience with Kubernetes at scale (multi‑cluster/multi‑region)
- Strong Linux + networking fundamentals (UDP/TCP, NAT, conntrack, DNS, load balancing)
- Experience with IaC + delivery: Terraform, Helm, GitOps (ArgoCD/Flux), CI/CD
- Proven on‑call ownership for high‑availability systems
Nice to have
- WebRTC/RTC operations: ICE, STUN/TURN, SFU scaling, packet loss/jitter tuning
- Edge/PoP and traffic management experience (global routing, Anycast/DNS strategies)
- Cost optimization for bandwidth‑heavy workloads
- Experience operating real-time/streaming systems at very high concurrency
What success looks like
- You can keep a real-time system stable through traffic spikes, packet loss, ISP variability, and zone/region failures
- You think in terms of latency budgets, concurrency, bandwidth, and packets/sec, not just pods and nodes
- You build platforms that are observable, automatable, and easy to operate
Skills
ArgoCD, CI/CD, DNS, Flux, GitOps, Helm, IaC, Kubernetes, Linux, NAT, Networking, Terraform, TCP, UDP, WebRTC