
Site Reliability Engineer - Observability & Internal Tools

smartclip

On-site · Entry Level · 6d ago


About the role

Your role in the team

• Remote in our day-to-day work. On-site when it matters.

• We work remotely by default: focused, efficient, and with full ownership. For larger features, architectural decisions, and real brainstorming sessions, we come together in Berlin or Cologne: fast, hands-on, and without unnecessary meeting overhead.

• We use AI to accelerate - not to replace thinking.

• We design the system, steer the output, and take responsibility for what we ship.

• Fast where it makes sense. Careful where it matters.

• Take full ownership of smartclip's internal utility and platform tooling.

• Focus your energy on the intersection of observability, automation, and developer infrastructure.

• Don't just maintain existing systems - evolve them, research cutting-edge open-source alternatives, and implement them.

• Forget expensive enterprise SaaS. Invest in deep in-house expertise.

• Understand our systems end-to-end, maintain total flexibility, and contribute back to the open-source ecosystem we depend on.

• Build & Evolve: Operate and advance our observability stack (including Prometheus and Grafana) and our internal developer tooling, such as the Forgejo Git forge.

• Go Open Source First: Replace 'buy' decisions with robust 'build & maintain' strategies.

• Engineer the Platform: Design observability as a platform capability. Define SLOs and create actionable alerting to stop incidents before they start.

• Secure the Stack: Embed security engineering into the delivery process. Find vulnerabilities before the pen tests do.

• Master the Infrastructure: Navigate Linux systems and distributed tooling. Balance bold exploration with production stability.

What we offer

• Ownership over tickets: You're trusted with real responsibility, not just tasks. No unnecessary bureaucracy, no micromanagement - we rely on you to take things forward.

• Build > Talk: We test what works - not what sounds good. Fail fast, learn faster.

• High standards, low ego: We take our work seriously, but not ourselves. Direct feedback, honest collaboration, no drama.

• Stay sharp: Hackathons, conferences, community - we invest in your growth and keep you at the cutting edge.

• Remote flexibility, in person when it matters: You work remotely with flexibility and stay connected to our Berlin and Cologne locations, home to our TV Labs, where we experiment, build, and learn together.

• And yes, the fundamentals are covered too: 30 days of vacation plus Dec 24 and Dec 31 off, Smart Fridays (4-day week possible), mobility (Germany ticket and JobRad), sports and health offerings, mental health support, corporate benefits, RTL+ access, and more.

Technologies and skills

• Google Cloud Platform

• Linux

• Prometheus

• Grafana

Our expectations:

Qualifications

• Be motivated by systems thinking and deep technical curiosity.

• Stop being a consumer - start being a builder.

• Must-haves: Apply an Observability Mindset: Implement a clear strategy for metrics, logs, and traces. Transform 'noisy alerts' into 'actionable insights.'

• Embrace Ownership: Live the 'you build it, you run it' philosophy. Stop the ticket ping-pong and end the excuses.

• Nice-to-haves: Design and evolve production-grade setups on GCP or AWS.

• Show us your contributions to open-source projects.

• Turn your passion for root-cause analysis into blameless post-mortems.

Benefits


• Fitness Offers

• Fresh Fruit

• Job bike

• Coffee, tea, etc.

