Platform (Site Reliability) Engineering Manager (w/m/d)

TELUS Health (previously known as "Humanoo"), trading as eTherapists GmbH

Berlin · Hybrid · Lead

About the role

About TELUS Health

TELUS Health is empowering every person to live their healthiest life. Guided by our vision to create a healthier future, we are leveraging the power of our cutting-edge technology and focusing on the uniqueness of each individual to create the future of health. As a leading global health and well-being provider – encompassing physical, mental and financial health – TELUS Health is improving health outcomes for consumers, patients, healthcare professionals, employers and employees. TELUS Health supports the total health and well-being of over 35 million lives worldwide with our clinical expertise, global presence and digital well-being platform offered through our Integrated Health Solutions. We empower healthier, happier, and more productive employees by combining our award-winning Employee Assistance Program with proactive wellness solutions in a digital ecosystem that helps them prevent and manage issues in family, health, life, money, and work.

We're seeking a Platform (Site Reliability) Engineering Manager (w/m/d) to join our Engineering team in Berlin. This is a hybrid position, requiring 2 days in the office (Wednesdays & Fridays).

Responsibilities

Lead, develop and grow a team of Site Reliability and Platform Engineers, fostering a culture of ownership and continuous improvement
Define and drive the reliability strategy across services, including SLIs, SLOs and error budgets
Ensure high availability, scalability and performance across multi-region AWS environments
Own and improve incident management processes, on-call practices and operational excellence
Drive automation and reduce operational toil through tooling and standardisation
Partner with Security and Compliance teams to ensure adherence to standards such as GDPR, ISO 27001 and SOC 2
Provide architectural guidance across infrastructure, networking and platform services
Collaborate with engineering, product, data and AI teams to support reliable and scalable systems
Communicate risks, performance metrics and priorities to both technical and non-technical stakeholders

Requirements

Strong experience in Site Reliability Engineering, DevOps or Platform Engineering within AWS environments
Proven experience leading and developing engineering teams
Deep expertise in AWS services (e.g. EC2, S3, RDS, Lambda, VPC, IAM)
Strong knowledge of Infrastructure as Code (Terraform or CloudFormation)
Experience with container orchestration (ECS or EKS)
Solid understanding of distributed systems and reliability engineering principles
Experience designing and maintaining CI/CD pipelines
Strong understanding of networking, security and observability practices
Experience managing incident response and operational processes
Excellent stakeholder management and communication skills
Fluent English

Nice to have

Experience with globally distributed systems and large-scale production environments
Exposure to security incident response and compliance audits
Experience supporting AI/ML infrastructure on AWS
Experience mentoring senior engineers or managers
Relevant certifications (e.g. AWS, Kubernetes, Terraform)

Success in Role (after 6 months)

You have established clear reliability standards and SLO frameworks
Your team operates effectively, with strong ownership and mature on-call practices
Platform reliability, scalability and operational efficiency have measurably improved
You have built strong alignment with cross-functional stakeholders and influenced engineering practices beyond your team

Interview Process

If your application progresses positively, you will be invited to participate in our hiring process, which includes the following stages:

Initial screening with recruiter (up to 45 mins)
Technical and leadership interviews (up to 90 mins)
Final stakeholder interview (up to 60 mins)

What We Offer

What we stand for:

Customer First: We passionately put our communities and customers first
Embrace Change: We embrace change and innovate courageously
Grow Together: We grow together through spirited teamwork

A sneak peek into your benefits:

Additional health insurance coverage worth 900 EUR per year
30 days of remote work allowance
Up to 33 days of annual vacation allowance (28 base days, plus a yearly tenure bonus of up to 3 days, plus 2 days for using the Humanoo app)
Subsidy for your UrbanSportsClub membership

Benefits

Health insurance, remote work allowance, annual vacation allowance, UrbanSportsClub membership subsidy

Skills

AWS, AWS CloudFormation, AWS EC2, AWS EKS, AWS ECS, AWS IAM, AWS Lambda, AWS RDS, AWS S3, AWS VPC, CI/CD, DevOps, Docker, Kubernetes, Observability, Site Reliability Engineering, Terraform
