Challenge & Evaluation

The task of the 2023 Soccer Prediction Challenge is to predict the outcomes of soccer league matches that will be played in April 2023. Participants may use any publicly available data set to train their machine learning models, or the training set provided by the organizers, or both.

The Challenge consists of two tasks:

Task 1: Prediction of match results in terms of exact scores.

Task 2: Prediction of match results in terms of probabilities for win, draw, and loss.

Both tasks must be completed for a submission to be valid. Below is an excerpt of the predictions that should be submitted:

Task 1: Prediction of match results in terms of exact scores.

The term "exact score" refers to the outcome of a soccer match in terms of the goals scored by the home team (HS) and the away team (AS). For example, if the home team scores 2 goals and the away team scores 1 goal, then the exact score is 2-1, i.e., HS = 2 and AS = 1. For each match of the prediction set, the exact score is to be predicted. The predicted home and away scores are to be entered into the columns prd_HS and prd_AS, respectively.

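The performance will be evaluated based on the root mean squared error (RMSE) between the real outcomes and the predicted outcomes:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left[\left(\mathrm{HS}_i - \mathrm{prd\_HS}_i\right)^2 + \left(\mathrm{AS}_i - \mathrm{prd\_AS}_i\right)^2\right]}$$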

where n is the number of matches in the prediction set. Note that predicted scores must be non-negative integers (0, 1, 2, …); decimal numbers, such as 1.3 or 2.77, are not permitted. The smaller the RMSE, the better the predictions.
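The Task 1 metric can be sketched in Python as follows. This is a minimal sketch, not the organizers' scoring script: it assumes that the squared errors of the home and away scores are summed per match and averaged over the n matches before the square root is taken. The function name `rmse_exact_scores` and the tuple layout are hypothetical.

```python
import math

def rmse_exact_scores(actual, predicted):
    """RMSE over matches; each item is a (home_goals, away_goals) pair.

    Assumption: squared errors of home and away goals are summed per match
    and averaged over the n matches.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    total = 0.0
    for (hs, as_), (prd_hs, prd_as) in zip(actual, predicted):
        # Predicted scores must be non-negative integers (0, 1, 2, ...).
        for goals in (prd_hs, prd_as):
            if isinstance(goals, bool) or not isinstance(goals, int) or goals < 0:
                raise ValueError(f"invalid predicted score: {goals!r}")
        total += (hs - prd_hs) ** 2 + (as_ - prd_as) ** 2
    return math.sqrt(total / len(actual))

# Two matches: real scores 2-1 and 0-0, predicted scores 1-1 and 0-2.
actual = [(2, 1), (0, 0)]
predicted = [(1, 1), (0, 2)]
print(rmse_exact_scores(actual, predicted))
```

In this example the squared errors are 1 for the first match and 4 for the second, so the result is the square root of 2.5, roughly 1.58. A decimal prediction such as 1.3 is rejected with a `ValueError`.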

Task 2: Prediction of match results in terms of probabilities for win, draw, and loss.

For each match, there are three possible outcomes: home win, draw, and loss. Here, "home win" means that the home team wins, "draw" means that neither the home team nor the away team wins, and "loss" means that the away team wins, which is equivalent to the home team losing. These three outcomes can be represented as vectors (W, D, L): a win is represented as (1, 0, 0), a draw as (0, 1, 0), and a loss (i.e., a win of the away team) as (0, 0, 1).

For each match of the prediction set, you must provide your predicted probabilities (prd_W, prd_D, prd_L). For example, (0.6, 0.3, 0.1) means that the probability of a win is 0.6, the probability of a draw is 0.3, and the probability of a loss is 0.1. These probabilities must add up to 1.

For one match, the ranked probability score (RPS) is calculated as follows [1]:

$$\mathrm{RPS} = \frac{1}{r-1}\sum_{i=1}^{r-1}\left(\sum_{j=1}^{i}\left(p_j - a_j\right)\right)^2$$

where r = 3 is the number of possible outcomes, (p_1, p_2, p_3) = (prd_W, prd_D, prd_L) are the predicted probabilities, and (a_1, a_2, a_3) = (W, D, L) is the vector of the real outcome.
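The RPS compares the cumulative predicted probabilities with the cumulative real outcome over the ordered categories win, draw, loss. The following is a minimal sketch of the single-match score, assuming the standard definition given in [1-3] with r = 3 outcomes; the function name `ranked_probability_score` is hypothetical.

```python
def ranked_probability_score(predicted, actual):
    """RPS for one match.

    predicted: (prd_W, prd_D, prd_L), probabilities in the order win, draw, loss.
    actual:    (W, D, L), the one-hot vector of the real outcome.
    """
    if len(predicted) != len(actual) or len(predicted) < 2:
        raise ValueError("predicted and actual must have the same length (>= 2)")
    if any(p < 0 or p > 1 for p in predicted) or abs(sum(predicted) - 1.0) > 1e-9:
        raise ValueError("probabilities must lie in [0, 1] and add up to 1")
    r = len(predicted)
    cumulative_diff = 0.0
    total = 0.0
    # Accumulate squared differences of the cumulative distributions
    # over the first r - 1 categories (the last difference is always 0).
    for p, a in zip(predicted[:-1], actual[:-1]):
        cumulative_diff += p - a
        total += cumulative_diff ** 2
    return total / (r - 1)

prediction = (0.6, 0.3, 0.1)
print(ranked_probability_score(prediction, (1, 0, 0)))  # real outcome: home win
print(ranked_probability_score(prediction, (0, 1, 0)))  # real outcome: draw
print(ranked_probability_score(prediction, (0, 0, 1)))  # real outcome: loss
```

For the prediction (0.6, 0.3, 0.1), the score is about 0.085 if the home team wins, 0.185 for a draw, and 0.585 if the away team wins. Because the categories are ordered, a confident home-win prediction is penalized more for an away win than for a draw.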

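The performance will be evaluated based on the average ranked probability score (RPSavg) over all n matches of the prediction set:

$$\mathrm{RPS}_{\mathrm{avg}} = \frac{1}{n}\sum_{i=1}^{n}\mathrm{RPS}_i$$

where RPS_i is the ranked probability score of the i-th match.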

The smaller the average RPS, the better the predictions. For more information about the RPS, please refer to [1-3].
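The average RPS over a prediction set can be sketched as below. With three ordered outcomes and probabilities that add up to 1, the second cumulative difference equals L minus prd_L, so the single-match RPS reduces to half the sum of the squared errors on the win and loss probabilities. The sketch uses that closed form; the row layout (dictionaries keyed by prd_W, prd_D, prd_L, W, D, L) and the function name `average_rps` are assumptions made for illustration.

```python
def average_rps(rows):
    """Mean RPS over all matches.

    rows: iterable of dicts holding the predicted probabilities
          (prd_W, prd_D, prd_L) and the real outcome (W, D, L).
    """
    scores = []
    for row in rows:
        if abs(row["prd_W"] + row["prd_D"] + row["prd_L"] - 1.0) > 1e-9:
            raise ValueError("predicted probabilities must add up to 1")
        if row["W"] + row["D"] + row["L"] != 1:
            raise ValueError("exactly one real outcome must be set to 1")
        # Closed form of the RPS for three ordered outcomes.
        rps = 0.5 * ((row["prd_W"] - row["W"]) ** 2
                     + (row["prd_L"] - row["L"]) ** 2)
        scores.append(rps)
    if not scores:
        raise ValueError("the prediction set is empty")
    return sum(scores) / len(scores)

rows = [
    {"prd_W": 0.6, "prd_D": 0.3, "prd_L": 0.1, "W": 1, "D": 0, "L": 0},
    {"prd_W": 0.2, "prd_D": 0.3, "prd_L": 0.5, "W": 0, "D": 1, "L": 0},
]
print(average_rps(rows))
```

The two matches score about 0.085 and 0.145, giving an average RPS of about 0.115.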


[1] Berrar, D., Lopes, P., & Dubitzky, W. (2019). Incorporating domain knowledge in machine learning for soccer outcome prediction. Machine Learning, 108(1), 97–126.

[2] Epstein, E. S. (1969). A scoring system for probability forecasts of ranked categories. Journal of Applied Meteorology, 8(6), 985–987.

[3] Constantinou, A., & Fenton, N. (2012). Solving the problem of inadequate scoring rules for assessing probabilistic football forecast models. Journal of Quantitative Analysis in Sports, 8(1). https://doi.org/10.1515/1559-0410.1418.