Systems Reliability Engineer

Claryo

New York · On-site · Full-time · Mid Level · $150k – $170k/yr · 1mo ago

About the role

We’re looking for a Systems Reliability Engineer to own the reliability of our system across cloud, edge, and real-world environments. Our platform runs across distributed infrastructure—connecting cloud services, on-site compute, and live video/data pipelines inside warehouses. This role is responsible for making systems observable, diagnosable, and repeatable as we scale across deployments. You’ll work closely with engineering and deployment teams to ensure the system performs reliably in production—not just in ideal conditions.

What You’ll Own

Own reliability of systems across cloud (Kubernetes), edge compute, and on-site deployments
Build and maintain monitoring, alerting, and observability systems
Define and improve incident response, severity levels, and on-call processes
Improve deployment and bring-up workflows across facilities
Diagnose issues across infrastructure, networking, and distributed systems
Partner with engineering to identify root causes and prevent recurring issues
Improve system visibility, debugging, and operational tooling
Help make deployments repeatable and scalable across sites

Required Qualifications

3+ years of experience in SRE, infrastructure, or distributed systems
Strong Linux and networking fundamentals
Experience operating systems in production environments
Experience working with networking in constrained or distributed environments (e.g., VPNs, secure tunnels, on-site networking)
Experience with:
- Kubernetes and containerized systems
- Cloud platforms (GCP, AWS, or Azure)
- Observability tools (Prometheus, Grafana, OpenTelemetry, etc.)
Ability to debug issues across multiple layers of the stack (infra, services, network)
Comfortable working in real-world, imperfect environments (not just clean cloud systems)
Strong ownership and ability to drive issues to resolution

Preferred Qualifications

Experience with multi-site or edge deployments
Experience with event-driven systems (Kafka or similar)
Familiarity with video or streaming systems (RTSP, WebRTC)
Experience working with hardware-integrated systems
Exposure to security/compliance frameworks (SOC2, ISO27001, etc.)
US citizen or permanent resident
Located in the San Francisco Bay Area or New York area

Why This Role Matters

We’re scaling from a small number of deployments to many, and this role is critical to making the following happen:

Systems that work outside ideal environments
Fast, reliable diagnosis and recovery when things break
Repeatable deployments across real-world facilities

Equal Opportunity Statement

We’re an equal opportunity employer that values diversity and inclusion. We welcome teammates of all backgrounds and don’t discriminate based on race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or veteran status.

Benefits

At Claryo, we offer a competitive benefits package that supports your health and well-being, including top-tier medical, dental, and vision coverage, 401(k) with employer matching, equity, parental leave, and unlimited vacation.

Compensation Range

$150K - $170K

Skills

AWS, Azure, GCP, Grafana, Kubernetes, Linux, OpenTelemetry, Prometheus
