Research Engineer In Open-Source Machine Learning
Eindhoven University of Technology (TU/e)
About the role
About OpenML
OpenML is a popular open science platform for sharing interconnected AI artifacts (e.g. datasets, models, and benchmarks) using open standards and structured APIs. Hosted at TU Eindhoven, OpenML serves over 300,000 users, has supported over 1,600 scientific studies, and won the Dutch Data Prize. We are looking for an excellent engineer to significantly redesign it and build the next generation of the OpenML platform.
There are two positions available:
- Position 1: Improve OpenML itself. This position is for 1.5 years, to start as soon as possible.
- Position 2: Integrate OpenML with other data and compute infrastructure. This position is for 3.5 years, to start around June-July 2026.
Position 1: Improve OpenML itself
This project will help strengthen the impact and sustainability of OpenML, by:
- Modernizing Infrastructure to simplify deployment and streamline community contributions.
- Improving Open Data, enriching OpenML metadata with FAIR-aligned elements (e.g. data cards and model cards), supporting diverse data types and modalities, and allowing integration with other open science infrastructures.
- Enhancing User Experience through better interfaces and collaboration features.
- Engaging the Scientific Community, ensuring that the platform meets the evolving needs of the open science community.
You will focus primarily on the first two goals above (modernizing infrastructure and improving open data). Specific responsibilities include:
- Update the technical infrastructure components of OpenML to more modern and contributor-friendly technologies.
- Enrich metadata by building on community standards such as Croissant, including quality metrics, DOI references, data cards, model cards, and responsible AI metadata.
- Extend support for more data modalities (e.g. imagery, time series, text, and multi-modal data), and domain-specific open data formats (e.g., genomic data).
- Streamline data loading into AI tools to facilitate AI experimentation in many scientific workflows.
- Update developer tools and documentation to speed up the onboarding of new community contributors and streamline maintenance.
- Collaborate closely with another hire on the project, based at Leiden University, who focuses on the last two goals (user experience and community engagement).
Position 2: Integrate OpenML with other data and compute infrastructure
For this position, you will help seamlessly integrate OpenML with other Dutch and international infrastructures (e.g., SURF, NLeSC, Hugging Face) to transform it into a next-generation, unified platform for AI:
- Allow people to access datasets and models uniformly from various hosting infrastructures.
- Easily train and benchmark AI models transparently on scalable computational infrastructure.
- Organize all data and results automatically for easier reuse and better reproducibility.
Specific key tasks include:
- Redesign the OpenML platform architecture to streamline federation.
- Update the OpenML backend and APIs to implement integrations with other APIs (e.g. Hugging Face libraries, Harvard Dataverse) and compute infrastructure.
- Improve experiment tracking from AI tools (e.g., PyTorch), using a tracking API or callbacks, with a focus on global community collaboration and better reproducibility.
- Accelerate AI-driven research: support team-based, real-time collaboration, share reproducible results via permalinks in papers, track impact metrics (e.g., dataset reuse), and integrate with code repositories (e.g., GitHub) and publishing platforms (e.g., arXiv).
- Continuously improve the infrastructure by incorporating community feedback, improving scalability, and simplifying maintenance.
About the Team
Both positions are hosted by the research group on Advanced Models by Open Research & Engineering (AMOR/e), providing an ideal environment for advanced AI research and collaboration. You will work directly with AMOR/e’s engineering team, including Pieter Gijsbers, Subhaditya Mukherjee, and Joaquin Vanschoren.
You will also be supported by project partners at Leiden University, the SURF supercomputing center, and the Dutch e-Science Center. Beyond that, you will work with the wider OpenML community, especially its core contributors, as well as the Croissant community (led by Google and the Open Data Institute) and developers from other AI platforms such as Hugging Face.
For more information on the AMOR/e group, visit https://amore-labs.github.io
For more information on OpenML, visit https://www.openml.org
A more complete description of the positions is also available on the AMOR/e website.