DevOps / Infrastructure Engineer | Berlin, DE | Home-Office
enneo GmbH
About
Enneo is an AI-native customer service platform that automates customer interactions across voice, chat, email, and messaging channels. By combining modern AI models with omnichannel automation, our platform helps companies handle customer requests faster, more efficiently, and with higher service quality.
The company was founded in 2022 by Dr. Kyung-Hun Ha and Dr. Richard Lohwasser, who previously built and exited the neo-utility company Lition Energy and spent 25+ years in total transforming and digitizing customer service operations within large-scale utilities. After experiencing firsthand the operational complexity of customer service and the absence of tailored solutions, they set out to build a platform designed around the possibilities and capabilities of AI for truly intelligent automation.
Today, Enneo serves several of Germany's largest utility companies and partners with leading and emerging business process outsourcing providers and system integrators. Our platform processes millions of customer interactions every day and enables service teams to automate complex routine tasks, reduce response times, and deliver better customer experiences.
Our mission is to become the leading platform for AI-powered agentic customer service across the DACH region, helping companies modernize their operations while improving experiences for both customers and service teams.
What You'll Do
- Keep the Lights On - and Make Them Brighter: You manage and evolve our production infrastructure: Kubernetes orchestration, bare metal servers for customer-facing workloads, cloud servers for internal services. You ensure uptime, performance, and security across the board.
- Build the Deployment Pipeline: You own CI/CD, release processes, and infrastructure-as-code. You make deployments boring - predictable, automated, and rollback-safe. When something goes wrong in production, you're the one who finds it, fixes it, and builds the system to prevent it from happening again.
- Observability & Incident Response: You expand and maintain our observability stack (SigNoz) so the team can debug fast and spot issues before customers do. You orchestrate LLM agents to build dashboards, alerts, and tracing that give everyone confidence in what's running.
- Scale the AI Engine: Our platform routes large volumes of LLM requests (>1 bn tokens per day and growing) and relies on OpenClaw for analytics and development assistance. You ensure this layer performs reliably under load, optimize costs, and help the team iterate on AI infrastructure without friction.
- Automate Everything You Can: We run a lot of our infrastructure management through Python-based automations and tooling. You'll build, maintain, and improve these - and increasingly leverage agentic development patterns to make our operations smarter and more self-healing. We grow fast - and scaling our infrastructure should be as simple as the click of a button.
- Shape the Stack: We're transparent about what we use - check out our full stack at stackshare.io/companies/enneo. You'll have a direct voice in evolving it as we grow.
Requirements
Must-Haves:
- Strong hands-on experience with Kubernetes - not just deploying into it, but managing, troubleshooting, and scaling self-hosted clusters in production.
- Solid Linux systems knowledge and experience managing both cloud and bare metal server environments.
- Proficiency in Python for automation, scripting, and building internal tooling. This is not optional - Python is how we manage our infra.
- Familiarity with agentic development workflows. You understand how AI-assisted tooling and automation can be applied to infrastructure work, and you're already using it.
- Experience building and maintaining CI/CD pipelines, infrastructure-as-code, and automated deployment workflows.
- Experience managing and operating production databases - in particular MySQL in redundant/replicated setups. Familiarity with vector databases (we use Weaviate) is a strong plus.
- Strong understanding of observability: metrics, logging, tracing using the OpenTelemetry standard.
- A true startup mindset: you're comfortable in environments where not everything is documented, things change fast, and your decisions have immediate, real impact. You don't wait for a ticket - you see what needs to happen and you do it.
Nice-to-Haves:
- Backend Software development background
- Experience with LLM infrastructure - request routing, token management, cost optimization (e.g., LiteLLM or similar proxies).
- Experience with incident response processes, on-call rotations, and post-mortems.
- Familiarity with networking, DNS, load balancing, and CDN configuration at a production level.
Application Process
We aim to keep the process efficient:
- Intro call (30 minutes): A short conversation with our team to get to know each other, discuss your background, and answer your first questions about the role and the company.
- Technical discussion (60 minutes): A technical conversation with our engineering team. We will discuss infrastructure topics such as Kubernetes operations, production debugging, CI/CD, and system architecture. The goal is to understand how you think about real systems rather than to run algorithm interviews.
- Founder conversation & live coding (30-45 minutes): A hands-on co-working session and final conversation with one of our founders to discuss the role, expectations, and how you would shape our infrastructure as we scale.