System Reliability Engineer

arculus

München · flexible · Full-time · Mid Level · 2mo ago

About us

At arculus, we design, build, and maintain cutting‑edge autonomous mobile robots and the software ecosystem around them. Our Development department brings together software, infrastructure, and product experts in a collaborative, international environment, focused on delivering reliable and high‑quality products that make a real difference in intralogistics.

Your Role

As a System Reliability Engineer, you will be responsible for ensuring the stability, performance, and scalability of our Automation Software platform. Your mission begins with a strong focus on the “Now”: building robust monitoring, automation, and operational practices that keep our systems reliable under real‑world conditions.

Operating at the intersection of software development and operations, you will proactively prevent incidents, optimize system behavior, and enable fast, reliable service delivery. By aligning reliability engineering with product and architectural goals, you will ensure our systems meet critical KPIs such as uptime, latency, and deployment velocity across the entire lifecycle.

Your Tasks & Responsibilities

Design and operate monitoring, alerting, and incident response systems to ensure high availability
Define and manage SLIs, SLOs, and SLAs; proactively mitigate reliability, performance, and capacity risks
Automate deployments, scaling, and operational workflows; implement infrastructure as code and self‑healing patterns
Optimize CI/CD pipelines for faster, safer, and more reliable releases
Lead or support incident response, root cause analysis, and post‑mortems; translate findings into preventive measures
Collaborate with architects, developers, and product teams to ensure scalable, reliable system design
Review system changes for operational, performance, and reliability impact
Support capacity planning, performance benchmarking, and scaling strategies
Contribute to security monitoring and ensure secure system operations
Drive continuous improvement in observability, reliability, and operational efficiency

Your Experience

3+ years in Site Reliability Engineering, DevOps, or similar roles in production environments
Proven experience improving system reliability, reducing downtime, and enhancing deployment processes
Strong expertise in cloud platforms (AWS, GCP, Azure) and Kubernetes
Hands‑on experience with observability tools (Prometheus, Grafana, ELK stack)
Solid scripting and automation skills (e.g., Python, Bash)
Experience operating and scaling distributed systems in large production environments
Familiarity with CI/CD pipelines, infrastructure as code, and modern DevOps practices

Who You Are

Passionate about building reliable, scalable, and observable systems
Strong communicator, able to collaborate effectively across engineering, product, and operations teams
Proactive and solution‑oriented, with a strong sense of ownership and accountability
Analytical and structured thinker with a focus on continuous improvement
Comfortable working in fast‑paced, complex environments with evolving system landscapes
Motivated to ensure technical excellence translates into stable and high‑performing real‑world systems

WHY ARCULUS

We are a diverse, global team of 100+ creative thinkers, algorithmic brains, makers, movers, and shakers.
Our approach comes from a continuous cycle: assemble, weld, code, test, deploy or delete, and repeat. That is how we deliver innovative solutions to tackle the biggest intralogistics challenges.
Our tech space is located in the eastern region of Munich, featuring state‑of‑the‑art meeting rooms, a fully‑equipped electronics lab, and a spacious robotics testing area, plus various social spaces on the modern Neue Balan campus.
We are more than just a workplace: we are a community. Activities include hiking trips, running events, ping‑pong tournaments, and quiz nights.
We offer competitive salaries and benefits such as EGYM Wellpass, language courses, Jobrad, and flexible working hours.
Relocation and visa support are provided for candidates moving to join our team.

ABOUT THE COMPANY

arculus is a part of Jungheinrich and independently develops high‑end mobile robots and software products for intralogistics automation. From mechanics to electronics and code – our engineering powerhouse has it all. We combine the speed and creativity of an agile tech company with the strength of a leading global intralogistics player. Collaboration, innovation, and continuous learning: that is how we achieve an open‑minded and fast‑paced working culture.

COMMITTED TO DIVERSITY AND INCLUSION

We are an equal opportunity employer and highly value diversity and inclusivity, which we see as strengths. While we are making progress, we are not yet where we want to be. Still, we believe in the power of a diverse workforce and welcome applicants of all genders, ethnicities, ages, national origins, sexual orientations, cultures, and educational backgrounds. Our goal is to create a work culture where everyone feels equally heard and included.

Skills

AWS, Azure, Bash, CI/CD, ELK stack, GCP, Grafana, Kubernetes, Prometheus, Python
