Staff Storage Reliability Engineer (f/m/d)

IONOS SE

Montabaur · Hybrid · Full-time · Lead · Posted 3 months ago

About the role

About Us

IONOS is the leading European digitalization partner for small and medium-sized enterprises (SMEs). IONOS has over six million customers and is active in 18 markets in Europe and North America with a globally available platform. With its Web Presence & Productivity offerings, the company acts as a "one-stop-shop" for all digitalization needs – from domains and web hosting to classic website builders and do-it-yourself solutions, from e-commerce to online marketing tools. In addition, IONOS offers cloud solutions for companies that want to move to the cloud as part of their business development.

Role

We are looking for a Staff Storage Reliability Engineer for our global, public object storage platform based on Ceph. You will develop solutions that scale in production and work with our operations team to build, deploy, and maintain the platforms.

Currently, the platform is in the double-digit petabyte range, distributed across multiple locations. It is growing rapidly and is a critical component of our internal and public infrastructure. You will actively contribute to its further development, improve and maintain our production environments, and ensure that availability, performance, and security are maintained as the platform scales.

Responsibilities

Ceph Deployment and Management:

Deploy, configure, and operate Ceph clusters – including managing storage pools, placement groups, and other core components.

Performance Optimization:

Tune Ceph for optimal performance, eliminate bottlenecks, and ensure efficient resource utilization.

Automation:

Develop and implement automation strategies for Ceph deployments, upgrades, and maintenance tasks.

Troubleshooting and Problem Solving:

Diagnose and resolve complex technical issues related to Ceph storage – often in collaboration with other teams.

Collaboration:

Collaborate closely with development teams, system administrators, and other stakeholders to integrate Ceph into various systems and applications.

Staying Up-to-Date:

Follow the latest Ceph developments, new features, and best practices.
Actively participate in the Ceph community and share knowledge.

Qualifications

5+ years of experience as a Senior Linux Engineer or Site Reliability Engineer; deep and broad understanding of Linux systems and networks.
In-depth knowledge of Ceph architecture and administration.
Experience with cloud storage technologies (file, object, block).
Hands-on experience with automation tools (e.g., Ansible) and monitoring and observability solutions.
Familiarity with cloud platforms and container technologies (e.g., Docker).
Excellent troubleshooting and problem-solving skills, along with strong communication and collaboration skills.

Benefits

Hybrid work model.
Flexible working hours with trust-based working time.
Subsidized canteen and various free drinks at some locations.
Modern office spaces with excellent public transport connections.
Various employee discounts for activities and products.
Employee events such as summer and winter parties, as well as workshops.
Numerous further training and development opportunities.
Various health offerings, such as sports and health courses.

Skills

Ansible, Ceph, Docker, Linux, Object Storage, Site Reliability Engineering
