As part of a machine learning workflow in the caret package, we fit a linear regression model that included all processed variables in the dataset (see Overview). We then examined the estimated coefficients and retained the variables whose coefficients were significant at P < 0.05. This left 34 variables: 22 indicator variables for game mechanics, 8 indicator variables for game categories, and 4 general characteristics of board games (year published, game difficulty, single-player games, and multiplayer games).
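For reference, the P value attached to each coefficient of an ordinary least-squares fit (for example, the `lm` model that caret trains with `method = "lm"`) comes from a t test of the null hypothesis that the coefficient is zero. With n observations, p estimated coefficients (including the intercept), design matrix X, and residual sum of squares RSS, the standard textbook form is:

$$
\hat\sigma^2 = \frac{\mathrm{RSS}}{n - p}, \qquad
\widehat{\mathrm{SE}}(\hat\beta_j) = \sqrt{\hat\sigma^2 \left[(X^\top X)^{-1}\right]_{jj}}, \qquad
t_j = \frac{\hat\beta_j}{\widehat{\mathrm{SE}}(\hat\beta_j)}, \qquad
P_j = 2\,\Pr\left(T_{n-p} > |t_j|\right)
$$

where $T_{n-p}$ follows a t distribution with $n - p$ degrees of freedom. Under this rule, variable $j$ is retained when $P_j < 0.05$.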
Starting from the 34 variables selected by the linear regression, we performed stepwise selection based on the Akaike information criterion (AIC) to choose the variables for the final model. The final model contains 24 variables: 16 indicator variables for game mechanics, 4 indicator variables for game categories, and the same 4 general characteristics of board games listed above.
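A minimal sketch of AIC-based selection, in pure Python for illustration only. It assumes backward elimination: start from all candidate variables, repeatedly drop the one whose removal lowers AIC the most, and stop when no single removal helps. R's `step()` defaults to `direction = "both"` and can re-add dropped terms, so this is a simplification rather than the exact procedure used here. AIC is computed as n·log(RSS/n) + 2k, the form R's `extractAIC()` uses for linear models. The names `ols_rss`, `aic`, `backward_aic`, and the synthetic data are hypothetical.

```python
import math
import random


def ols_rss(X, y):
    """Fit y ~ X by ordinary least squares and return the residual sum of squares.

    X is a list of rows; each row already includes a leading 1.0 for the intercept.
    """
    p = len(X[0])
    # Build the normal equations (X^T X) b = X^T y.
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    rhs = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    # Solve by Gaussian elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= f * A[col][c]
            rhs[r] -= f * rhs[col]
    # Back-substitution for the coefficient vector.
    coef = [0.0] * p
    for r in range(p - 1, -1, -1):
        coef[r] = (rhs[r] - sum(A[r][c] * coef[c] for c in range(r + 1, p))) / A[r][r]
    return sum((yi - sum(b * x for b, x in zip(coef, row))) ** 2 for row, yi in zip(X, y))


def aic(columns, data, y):
    """AIC of the linear model y ~ 1 + columns, computed as n*log(RSS/n) + 2k."""
    n = len(y)
    X = [[1.0] + [data[c][i] for c in columns] for i in range(n)]
    k = len(columns) + 1  # number of coefficients, including the intercept
    return n * math.log(ols_rss(X, y) / n) + 2 * k


def backward_aic(data, y):
    """Backward elimination: repeatedly drop the variable whose removal lowers AIC most."""
    current = list(data)
    best = aic(current, data, y)
    while current:
        trials = [(aic([c for c in current if c != drop], data, y), drop) for drop in current]
        cand_aic, cand_drop = min(trials)
        if cand_aic >= best:
            break  # no single removal improves AIC
        best, current = cand_aic, [c for c in current if c != cand_drop]
    return current, best


# Hypothetical demo: y depends on x1 and x2 only; x3 is pure noise.
random.seed(0)
n = 200
data = {name: [random.gauss(0, 1) for _ in range(n)] for name in ("x1", "x2", "x3")}
y = [1 + 2 * a - 3 * b + random.gauss(0, 0.5) for a, b in zip(data["x1"], data["x2"])]
kept, best_aic = backward_aic(data, y)
print(kept, round(best_aic, 2))
```

Because AIC charges only 2 units per extra coefficient, a pure-noise variable is usually, but not always, dropped; the strongly predictive variables are retained.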
The final model has a root mean squared error (RMSE) of 0.42, and its predicted ratings are positively correlated with the observed average ratings (correlation coefficient = 0.67).
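The two summary statistics reported for the final model can be computed as in the sketch below. It assumes the correlation is Pearson's r and that both metrics compare predicted ratings with observed average ratings for the same games (the text does not say whether they were computed on held-out data). The function names and toy values are hypothetical, not taken from the dataset.

```python
import math


def rmse(predicted, observed):
    """Root mean squared error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical toy values, not taken from the dataset.
observed = [6.5, 7.2, 5.8, 8.1, 7.0]
predicted = [6.9, 7.0, 6.3, 7.6, 7.1]
print(round(rmse(predicted, observed), 3), round(pearson_r(predicted, observed), 3))
```

The two metrics capture different things: shifting every prediction by a constant changes the RMSE but leaves r unchanged, so a model can correlate well with the observed ratings while still being systematically biased.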