To what extent is it possible to predict the outcome of a soccer match using machine learning?
This question motivated us to organize the 2017 Soccer Prediction Challenge with researchers from KU Leuven (Belgium), the Helmholtz Center (Germany), and University Evry (France). We invited the machine learning community to develop innovative models for predicting soccer match outcomes. For this competition, we released the Open International Soccer Database. Participating teams had to analyze data on more than 200,000 past soccer matches and predict the outcomes of real, future matches. However, only the most basic match information was provided: the names and leagues of the home and away teams, the season, the match date, and the number of goals scored by each team. How can such data be used to predict, for example, that Arsenal will beat Manchester City on a certain day in the future with a 0.7 probability of "win," 0.2 probability of "draw," and 0.1 probability of "loss" (i.e., a win for Manchester City)?
The main difficulty was how to integrate soccer domain knowledge into the modeling process. The results of the competition were published in a special issue of the journal Machine Learning.
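As a rough illustration of how goals data alone can yield outcome probabilities, the following Python sketch uses an independent Poisson goal model. It is not one of the challenge entries: the toy match records, the function names, and the rate heuristic (averaging a team's scoring rate with its opponent's conceding rate) are all assumptions made for this example.

```python
from collections import defaultdict
from math import exp, factorial

# Hypothetical toy records (home_team, away_team, home_goals, away_goals);
# the scores are invented for illustration, not taken from the database.
MATCHES = [
    ("Arsenal", "Manchester City", 2, 1),
    ("Manchester City", "Arsenal", 1, 1),
    ("Arsenal", "Chelsea", 3, 0),
    ("Chelsea", "Manchester City", 0, 2),
    ("Manchester City", "Chelsea", 1, 0),
    ("Chelsea", "Arsenal", 1, 2),
]


def poisson_pmf(k, lam):
    """Probability of exactly k goals under a Poisson rate lam."""
    return exp(-lam) * lam ** k / factorial(k)


def team_averages(matches):
    """Return {team: (mean goals scored, mean goals conceded)}."""
    scored = defaultdict(int)
    conceded = defaultdict(int)
    played = defaultdict(int)
    for home, away, home_goals, away_goals in matches:
        scored[home] += home_goals
        conceded[home] += away_goals
        scored[away] += away_goals
        conceded[away] += home_goals
        played[home] += 1
        played[away] += 1
    return {t: (scored[t] / played[t], conceded[t] / played[t]) for t in played}


def outcome_probabilities(home, away, matches, max_goals=10):
    """Return (p_win, p_draw, p_loss) from the home team's perspective."""
    avg = team_averages(matches)
    if home not in avg or away not in avg:
        raise ValueError("no past matches for one of the teams")
    # Assumed heuristic: expected goals = mean of own scoring rate
    # and the opponent's conceding rate.
    lam_home = (avg[home][0] + avg[away][1]) / 2
    lam_away = (avg[away][0] + avg[home][1]) / 2
    win = draw = loss = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, lam_home) * poisson_pmf(a, lam_away)
            if h > a:
                win += p
            elif h == a:
                draw += p
            else:
                loss += p
    # Renormalize because the score grid is truncated at max_goals.
    total = win + draw + loss
    return win / total, draw / total, loss / total


p_win, p_draw, p_loss = outcome_probabilities("Arsenal", "Manchester City", MATCHES)
print(f"win={p_win:.2f} draw={p_draw:.2f} loss={p_loss:.2f}")
```

A baseline like this ignores home advantage, league differences, and how recent each match is. Factors of that kind are where domain knowledge would have to enter the model.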
The current version 1.0 of the Open International Soccer Database contains more than 200,000 games from 52 soccer leagues in 35 countries, covering seasons from those starting in 2000 to those ending in 2016. The database is released as an open science project, providing a valuable resource for soccer analysts and a unique benchmark for advanced machine learning methods. Both the Open International Soccer Database and the data related to the 2017 Soccer Prediction Challenge are available at OSF.
References
[1] Berrar, D., Lopes, P., and Dubitzky, W. (2019) Incorporating domain knowledge in machine learning for soccer outcome prediction. Machine Learning 108(1):97-126.
[2] Dubitzky, W., Lopes, P., Davis, J., and Berrar, D. (2019) The Open International Soccer Database for machine learning. Machine Learning 108(1):9-28.