Site Reliability Engineer
Groupe Talents Handicap
About Eviden
Eviden is the market leader in Europe in the server and supercomputer segments, recognized for its innovations in artificial intelligence, cybersecurity, and quantum computing. Our clients use our high-performance computers (HPC) for crucial projects such as climate change studies, vaccine research, decarbonization, and scientific simulations.
About the Role
The Software Factory team, at the heart of the HPC & AI R&D division, develops and operates (DevOps) a complete continuous integration and continuous delivery (CI/CD) stack for the software development teams in charge of HPC and AI products. Our platform currently handles over 500 builds per week on a hybrid infrastructure that combines public cloud and an internal lab, ensuring fully automated software production. We are looking for a Site Reliability Engineer who will play a key role in managing and optimizing our infrastructure.
Your missions:
- Ensure system monitoring and guarantee the proper functioning of the lab infrastructure and HPC & AI clusters;
- Install, update, and configure software, firmware, and hardware;
- Evolve system and infrastructure architectures to integrate new hardware;
- Maintain a state-of-the-art international development infrastructure;
- Improve the platform's SLA.