Decision Tree Classifier
In this section, we start with a decision tree classifier and perform 5-fold cross validation on all predictors to obtain cross validation, prediction, and accuracy scores. We will be using the same function that we used for our original category models.
We performed 5-fold cross validation up to a max tree depth of 20 to find the optimal depth for prediction accuracy. We then plotted the results:
Based on the above plot, we can observe that the test accuracy score is highest (around 66.289%) when the tree depth is 12. Therefore, we will use this tree depth for the other ensemble methods going forward.
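The depth sweep described above could be run along the following lines; this is only a sketch, and X_train and y_train are placeholders for our training predictors and crime-type labels rather than names taken from the original code.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

depths = range(1, 21)
cv_means = []
for depth in depths:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    # 5-fold cross validation accuracy, averaged over the folds
    scores = cross_val_score(tree, X_train, y_train, cv=5)
    cv_means.append(scores.mean())

best_depth = depths[int(np.argmax(cv_means))]
print(f"best depth = {best_depth}, mean CV accuracy = {max(cv_means):.3f}")
```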
Bagging
Definition: Bagging is a combination of bootstrapping and aggregation. In bagging, we use bootstrap re-sampling to create different training data sets, so each training set gives us a different tree.
Since we have many trees that we will average over for prediction, we can choose a large max_depth and not worry about overfitting: we rely on the law of large numbers to shrink the variance of this high-variance, low-bias approach for each individual tree.
We performed bagging on all predictors with the best depth of 9, which we obtained from the decision tree classifier. We use the same bagger function that we defined in our original category model.
We ran the bagger function 55 times to get 55 models, and used the mode to take the most frequently predicted value across those models for each observation.
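The bagger function itself is defined elsewhere in this project; a minimal sketch of what it might look like is given below, assuming 55 bootstrap samples, one tree per sample, and a row-wise mode over the predictions. The names X_train, y_train, and X_test are placeholders.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample

def bagger(X_train, y_train, X_test, n_models=55, max_depth=9):
    """Fit n_models trees on bootstrap samples and return the per-row mode."""
    all_preds = []
    for i in range(n_models):
        # Bootstrap re-sample the training set (sampling with replacement)
        X_boot, y_boot = resample(X_train, y_train, random_state=i)
        tree = DecisionTreeClassifier(max_depth=max_depth, random_state=i)
        tree.fit(X_boot, y_boot)
        all_preds.append(tree.predict(X_test))
    # Most frequently predicted class across the n_models trees for each row
    preds = pd.DataFrame(np.column_stack(all_preds))
    return preds.mode(axis=1)[0].values
```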
We can observe that the test accuracy has increased to 66.324%, improving even further on our multivariate logistic model. Next, we will look at variable importance to see which features the model considered most significant.
Based on the above chart, we can see that property average is the most important predictor of crime type in the bagging model, followed by public school distance and streetlight density.
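One way an importance chart like this can be produced is by averaging the impurity-based importances over the bagged trees. The sketch below assumes the fitted trees from the bagger have been collected in a list called trees and that feature_names holds the predictor column names; both are placeholders.

```python
import numpy as np
import pandas as pd

# Average impurity-based importances across the fitted bagged trees
importances = np.mean([t.feature_importances_ for t in trees], axis=0)
print(pd.Series(importances, index=feature_names).sort_values(ascending=False))
```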
Let us move on to Random Forest and see whether it increases our predictive power or selects different variables as most important.
Random Forest
Definition: In a Random Forest, we build each tree by splitting on a "random" subset of predictors at each split (hence, each is a 'random tree'). This can't be done with just one predictor, but with more predictors we can choose which predictors to split on at random and how many to consider. We then combine many 'random trees' by averaging their predictions, which gives us a forest of random trees: a random forest.
In this section, we use a Random Forest classifier with an estimator count of 55, similar to what we did in bagging. We observe a test accuracy of 67.1%, a small improvement over our bagging model.
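A hedged sketch of this fit with scikit-learn is shown below; X_train, y_train, X_test, y_test, and feature_names are placeholders for our train/test split and predictor names, not names from the original code.

```python
from sklearn.ensemble import RandomForestClassifier

# 55 random trees, each splitting on a random subset of predictors
rf = RandomForestClassifier(n_estimators=55, random_state=0)
rf.fit(X_train, y_train)
print("Test accuracy:", rf.score(X_test, y_test))

# Impurity-based importances behind the chart discussed below
for name, imp in sorted(zip(feature_names, rf.feature_importances_),
                        key=lambda pair: -pair[1]):
    print(f"{name}: {imp:.3f}")
```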
Based on the above chart, we can see that property average is still chosen as the most important predictor of crime type, followed by university distance and public school distance.
Conclusion
Based on the ensemble methods above, we can see some improvement in the predictive power of our models. The best prediction accuracy on the test data set was achieved by the Random Forest classifier, with an accuracy of 67.1%.
Finally, we will build neural networks and see if that helps to improve our predictions.
Models for Original Categories