We dropped irrelevant columns such as the board game url and image_url, and kept only those we considered meaningful for our model. The included features are:
Since the data contain over 100 features, it may not make sense to include all of them in the random forest, especially when doing cross-validation. We therefore first fit a model with all the features and plotted the features by their importance.
We used the randomForest function from the randomForest package to train our models.
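A minimal sketch of this step, assuming the cleaned data frame is called `games` and the response column is `average_rating` (both names are our placeholders, not taken from the original code):

```r
library(randomForest)

# `games` is assumed to be the data frame left after dropping url/image_url etc.
set.seed(42)
rf_full <- randomForest(
  average_rating ~ .,   # regress the rating on every remaining feature
  data       = games,
  ntree      = 100,
  importance = TRUE     # needed for permutation-based importance
)

# RMSE based on the out-of-bag predictions
sqrt(mean((games$average_rating - predict(rf_full))^2))

# Rank and plot features by importance (%IncMSE for regression)
varImpPlot(rf_full, type = 1, main = "Feature Importance")
```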
Feature Importance Plot
RMSE: 0.3815
Zoom in to Top Features
RMSE: 0.4022
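If the zoomed-in plot corresponds to refitting on only the top-ranked features (the report does not show this code), the step might look like the following, continuing from the sketch above; the cutoff of 30 features is purely illustrative:

```r
# Select the top features by permutation importance from the full model
imp <- importance(rf_full, type = 1)                      # %IncMSE column
top_features <- rownames(imp)[order(imp[, 1], decreasing = TRUE)][1:30]

# Refit using only those features
set.seed(42)
rf_top <- randomForest(
  x = games[, top_features],
  y = games$average_rating,
  ntree = 100
)

sqrt(mean((games$average_rating - predict(rf_top))^2))    # OOB RMSE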
Cross Validation to Choose the Best mtry Parameter (ntree = 100)
Best RMSE: 0.3711
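A sketch of how such a cross-validation over mtry could be set up (the fold count and grid below are illustrative assumptions, not the values actually searched):

```r
library(randomForest)

set.seed(42)
k         <- 5
folds     <- sample(rep(1:k, length.out = nrow(games)))
mtry_grid <- c(10, 20, 30, 40, 50)

cv_rmse <- sapply(mtry_grid, function(m) {
  mean(sapply(1:k, function(i) {
    fit  <- randomForest(average_rating ~ ., data = games[folds != i, ],
                         ntree = 100, mtry = m)
    test <- games[folds == i, ]
    sqrt(mean((test$average_rating - predict(fit, test))^2))
  }))
})

data.frame(mtry = mtry_grid, cv_rmse = cv_rmse)
mtry_grid[which.min(cv_rmse)]   # mtry with the lowest cross-validated RMSE
```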
Cross Validation to Choose the Best ntree Parameter (mtry = 30)
One important parameter in a random forest is the number of trees to grow. We can use cross-validation to choose the best ntree.
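The same cross-validation scheme can be reused, now holding mtry at 30 and varying ntree (the grid below is again an illustrative assumption):

```r
library(randomForest)

set.seed(42)
k          <- 5
folds      <- sample(rep(1:k, length.out = nrow(games)))
ntree_grid <- seq(100, 1000, by = 150)

cv_rmse <- sapply(ntree_grid, function(nt) {
  mean(sapply(1:k, function(i) {
    fit  <- randomForest(average_rating ~ ., data = games[folds != i, ],
                         ntree = nt, mtry = 30)
    test <- games[folds == i, ]
    sqrt(mean((test$average_rating - predict(fit, test))^2))
  }))
})

data.frame(ntree = ntree_grid, cv_rmse = cv_rmse)
```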
Best RMSE: 0.3725
Based on the above, we choose ntree = 550 and mtry = 30 for our best model. The predicted values and the true average ratings are as follows:
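A sketch of the final fit and the predicted-vs-true comparison; the 80/20 train/test split is our assumption, since the report does not state how the hold-out predictions were produced:

```r
library(randomForest)

set.seed(42)
idx   <- sample(nrow(games), size = floor(0.8 * nrow(games)))
train <- games[idx, ]
test  <- games[-idx, ]

rf_final <- randomForest(average_rating ~ ., data = train,
                         ntree = 550, mtry = 30)

pred <- predict(rf_final, test)
sqrt(mean((test$average_rating - pred)^2))   # hold-out RMSE

# Predicted vs. true average ratings, with a y = x reference line
plot(test$average_rating, pred,
     xlab = "True average rating", ylab = "Predicted average rating")
abline(0, 1, col = "red")
```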