We use board game features to predict a game's geek rating on boardgamegeek.com, treating the geek rating as a measure of how successful a game is. The steps below describe how we arrived at our predictions. We tried four machine learning algorithms and present their results here.
From data exploration and summary statistics of the dataset, we concluded that number of votes, geek_rating, and rank are strong predictors of average rating (adjusted R^2 of 0.85). However, these predictors are not very meaningful: they do not describe the characteristics or nature of successful board games, and so offer no valuable information on what makes a good board game. Instead, we concentrate on the plethora of categorical variables available for game mechanics and game categories.
In total, there are 84 game categories and 52 game mechanics, suggesting great diversity and thus the potential for interesting correlations with average rating. In the machine learning models that follow, we rely on these categorical variables to look for meaningful correlations between the structure and characteristics of a board game and its average rating.
And thus we embark on our machine learning adventure!
We first parsed out all the game mechanics and game categories that each game belonged to. Next, we created dummy variables that indicate the presence or absence of each game category or mechanic. The resulting dataset contains 84 columns for game categories and 52 columns for game mechanics.
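To make the dummy-variable step concrete, here is a minimal sketch in base R. It is not our exact code: the toy data frame, the column name mechanic, the comma separator, and the mechanic_ prefix are all assumptions made for illustration.

```r
# Hypothetical toy data: the column names and the comma-separated
# format are assumptions, not the layout of the real dataset.
games <- data.frame(
  name = c("Game A", "Game B", "Game C"),
  mechanic = c("Dice Rolling, Hand Management",
               "Hand Management",
               "Tile Placement"),
  stringsAsFactors = FALSE
)

# Split each comma-separated string into a character vector of labels.
labels_per_game <- strsplit(games$mechanic, ",\\s*")

# Collect the distinct labels that occur anywhere in the data.
all_labels <- sort(unique(unlist(labels_per_game)))

# Build a 0/1 indicator matrix: one row per game, one column per label.
dummies <- do.call(
  rbind,
  lapply(labels_per_game, function(x) as.integer(all_labels %in% x))
)
colnames(dummies) <- make.names(paste0("mechanic_", all_labels))

# Attach the indicator columns to the game names.
games_encoded <- cbind(games["name"], dummies)
print(games_encoded)
```

The same pattern applied to a category column would give the category indicators. Building the matrix with rbind over a list keeps the result two-dimensional even if only one distinct label exists.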
We used set.seed(1) to make the random split reproducible, then partitioned our dataset into a train set and a test set containing 80% and 20% of our total data, respectively. Our models were trained on the train set, then evaluated on the test set to compare how well each model predicts.
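A reproducible 80/20 split along these lines can be sketched in base R as follows. The stand-in data frame and the use of sample() are assumptions for illustration; only set.seed(1) and the 80/20 proportions come from our description above.

```r
# Fix the random number generator so the same split is drawn every run.
set.seed(1)

# Hypothetical stand-in for the encoded dataset (100 games).
games <- data.frame(id = 1:100, avg_rating = runif(100, 1, 10))

# Draw 80% of the row indices at random, without replacement.
n <- nrow(games)
train_idx <- sample(n, size = round(0.8 * n))

# Rows that were drawn form the train set; the rest form the test set.
train <- games[train_idx, ]
test  <- games[-train_idx, ]

cat("train rows:", nrow(train), " test rows:", nrow(test), "\n")
```

Because the seed is set before any random draw, rerunning the script yields the same train and test rows, which keeps comparisons between models fair.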
We used the following four machine learning algorithms to predict the average ratings.