OpenML in research and education

Machine learning enables many modern innovations and lies at the heart of many empirical, data-driven sciences. Still, building self-learning systems remains something of an art, from gathering and transforming the right data to selecting and fine-tuning modeling techniques. This makes it harder for students to study and succeed in the field, and causes (data) scientists to spend a lot of time on trial and error, or to settle for suboptimal results.

OpenML is an open science platform for machine learning that allows anyone to easily share data sets, code, and experiments, and to collaborate with people all over the world to build better models. For any known data set, it shows which models perform best, who built them, and how to reproduce and reuse them in different ways. It is integrated into several machine learning environments, so that you can share results with the touch of a button or a line of code. All uploaded results are evaluated online and compared to all other results in timelines and leaderboards. All solutions are also open, so that anyone can study previous solutions and build on them. As such, OpenML enables large-scale, real-time collaboration, allowing anyone to explore, build on, and contribute to the combined knowledge of the field.

Ultimately, this provides a wealth of information for a novel, data-driven approach to machine learning, where we learn from millions of previous experiments to assist people while they analyze data and to automate some processes altogether. As such, we can envision future challenges where humans compete or collaborate with automated processes to build better machine learning models.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Joaquin Vanschoren

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Dr. Ir. Joaquin Vanschoren is an assistant professor of machine learning at the Eindhoven University of Technology (TU/e). His research focuses on the progressive automation of machine learning and networked science. He founded OpenML.org, a platform for networked machine learning research used by researchers all over the world. He has received several demonstration and application awards and has been an invited speaker at ECDA, StatComp, AutoML@ICML, IDA, and several other conferences. He has also co-organized machine learning conferences (e.g. ECMLPKDD 2013, LION 2016) and many workshops.