Platform (Site Reliability) Engineering Manager (m/w/d)

Humanoo | eTherapists GmbH

Hybrid Senior 3d ago

About the role

About TELUS Health

TELUS Health is empowering every person to live their healthiest life. Guided by our vision to create a healthier future, we leverage cutting‑edge technology and focus on the uniqueness of each individual to create the future of health. As a leading global health and well‑being provider – encompassing physical, mental and financial health – TELUS Health improves health outcomes for consumers, patients, healthcare professionals, employers and employees.

TELUS Health supports the total health and well‑being of over 35 million lives worldwide with our clinical expertise, global presence and digital well‑being platform offered through our Integrated Health Solutions. We empower healthier, happier, and more productive employees by combining our award‑winning Employee Assistance Program with proactive wellness solutions in a digital ecosystem that helps them prevent and manage issues in family, health, life, money, and work.

We are seeking a Platform (Site Reliability) Engineering Manager (m/w/d) to join our Engineering team in Berlin. This is a hybrid position that requires two days per week in the office (Wednesdays and Fridays).


Mission

You will lead and evolve our Platform and Site Reliability Engineering function, ensuring the reliability, scalability, and security of our global services while building and developing a high‑performing team.


Responsibilities

  • Lead, develop and grow a team of Site Reliability and Platform Engineers, fostering a culture of ownership and continuous improvement
  • Define and drive the reliability strategy across services, including SLIs, SLOs and error budgets
  • Ensure high availability, scalability and performance across multi‑region AWS environments
  • Own and improve incident management processes, on‑call practices and operational excellence
  • Drive automation and reduce operational toil through tooling and standardisation
  • Partner with Security and Compliance teams to ensure adherence to standards such as GDPR, ISO 27001 and SOC 2
  • Provide architectural guidance across infrastructure, networking and platform services
  • Collaborate with engineering, product, data and AI teams to support reliable and scalable systems
  • Communicate risks, performance metrics and priorities to both technical and non‑technical stakeholders

Requirements

  • Strong experience in Site Reliability Engineering, DevOps or Platform Engineering within AWS environments
  • Proven experience leading and developing engineering teams
  • Deep expertise in AWS services (e.g., EC2, S3, RDS, Lambda, VPC, IAM)
  • Strong knowledge of Infrastructure as Code (Terraform or CloudFormation)
  • Experience with container orchestration (ECS or EKS)
  • Solid understanding of distributed systems and reliability engineering principles
  • Experience designing and maintaining CI/CD pipelines
  • Strong understanding of networking, security and observability practices
  • Experience managing incident response and operational processes
  • Excellent stakeholder management and communication skills
  • Fluent English

Nice to Have

  • Experience with globally distributed systems and large‑scale production environments
  • Exposure to security incident response and compliance audits
  • Experience supporting AI/ML infrastructure on AWS
  • Experience mentoring senior engineers or managers
  • Relevant certifications (e.g., AWS, Kubernetes, Terraform)

Success Criteria (first 6 months)

  • You have established clear reliability standards and SLO frameworks
  • Your team operates effectively, with strong ownership and mature on‑call practices
  • Platform reliability, scalability and operational efficiency have measurably improved
  • You have built strong alignment with cross‑functional stakeholders and influenced engineering practices beyond the team

Interview Process

  1. Initial screening with recruiter (up to 45 mins)
  2. Technical and leadership interviews (up to 90 mins)
  3. Final stakeholder interview (up to 60 mins)

What We Offer

Values

  • Customer First – We passionately put our communities and customers first
  • Embrace Change – We embrace change and innovate courageously
  • Grow Together – We grow together through spirited teamwork

Benefits

  • Additional health insurance coverage (€900 per year)
  • 30 days of remote work allowance
  • Up to 33 days annual vacation allowance (28 base days + tenure bonus up to 3 days + 2 days for using the Humanoo app)
  • Subsidy for your UrbanSportsClub membership
  • €70 monthly cashback to cover daily expenses
  • €500 yearly personal learning and development budget
  • Self‑development days allowance (workshops, conferences, training sessions, etc.)
  • Team events and company events
  • Referral bonus for employees
  • Corporate benefits
  • Humanoo for Humanoos
  • Employee Assistance Programme (EAP) for mental well‑being

Please note: our offices close for approximately two weeks over the Christmas and New Year period. We gift 24 and 31 December as extra vacation days; any other days in this period that are not public holidays are taken from your annual leave.


About Us (summary)

As a leading global health and well‑being provider, encompassing physical, mental and financial health, TELUS Health improves health outcomes for consumers, patients, healthcare professionals, employers and employees. We provide comprehensive EAP services, offering emotional and practical support to client companies and their employees. Our counsellors deliver face‑to‑face short‑term counselling services to our clients.

Skills

  • AWS (EC2, S3, RDS, Lambda, VPC, IAM)
  • Terraform or CloudFormation
  • ECS or EKS
  • CI/CD pipelines
  • Networking
  • Security
  • Observability
  • Incident response
  • Stakeholder management
  • English
