In this section, we explore neural networks to see whether we can improve prediction accuracy on our new-categories data set. Let us first identify the input, hidden, and output layers, the loss function, and the optimizer we will use in this model.
Input Layer: We start with the input layer, which takes our original data set of 11 predictors.
Hidden Layers: We use four hidden layers, each with the ReLU activation function.
Output Layer: The output layer uses the softmax activation function, since we are predicting nominal data.
Loss Function: We use sparse categorical cross-entropy as our loss function, since we are predicting nominal data and our target values are integer-encoded class labels.
Optimizer: We use Adam, which combines the best properties of the AdaGrad and RMSProp algorithms to yield an optimizer that handles sparse gradients well on noisy problems.
Batch Size: We use a batch size of 128, large enough for the model to converge stably, but not so large that training becomes computationally expensive.
Epochs: We train for 2,000 epochs so the model makes many passes over the training data.
Validation Split: We hold out 30% of the data for validation.
As a first step, we define the model with the input, hidden, and output layers as below.
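A minimal sketch of such a model, assuming the Python Keras API; the hidden-layer width of 64 units and the number of output classes are illustrative assumptions, not the exact values used here:

```python
from tensorflow import keras
from tensorflow.keras import layers

n_classes = 4  # assumption: the number of new crime categories

model = keras.Sequential([
    keras.Input(shape=(11,)),                       # 11 predictors
    layers.Dense(64, activation="relu"),            # four hidden ReLU layers
    layers.Dense(64, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(n_classes, activation="softmax"),  # softmax output for nominal classes
])
```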
Next, we compile the model with our chosen loss function and optimizer: sparse categorical cross-entropy and Adam, as below.
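A sketch of the corresponding `compile` call; the `accuracy` metric is included so it can be tracked during training:

```python
model.compile(
    optimizer="adam",                        # Adam optimizer
    loss="sparse_categorical_crossentropy",  # integer-encoded class labels
    metrics=["accuracy"],                    # track accuracy during training
)
```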
Finally, we fit the model:
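A sketch of the fitting call with the hyperparameters listed above; `X_train` and `y_train` are assumed names for the predictor matrix and the integer-encoded labels:

```python
# X_train: (n_samples, 11) predictor matrix; y_train: integer class labels
history = model.fit(
    X_train, y_train,
    epochs=2000,
    batch_size=128,
    validation_split=0.3,  # hold out 30% of the data for validation
)
```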
We then plot the model loss and accuracy for both the training and validation sets:
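One way to produce these plots from the Keras `history` object, assuming matplotlib:

```python
import matplotlib.pyplot as plt

fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(12, 4))

ax_loss.plot(history.history["loss"], label="train")
ax_loss.plot(history.history["val_loss"], label="validation")
ax_loss.set_title("Model loss")
ax_loss.set_xlabel("epoch")
ax_loss.legend()

ax_acc.plot(history.history["accuracy"], label="train")
ax_acc.plot(history.history["val_accuracy"], label="validation")
ax_acc.set_title("Model accuracy")
ax_acc.set_xlabel("epoch")
ax_acc.legend()

plt.show()
```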
Below we observe the training and validation accuracy for our neural network model. The validation accuracy is 67.313%, just above our Random Forest accuracy.
Regularized Neural Network
Since there was a large gap between the training and test accuracies, we also applied L2 regularization to the model. Our validation accuracy improved slightly, to 67.442%, as seen below:
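A sketch of how L2 penalties can be attached to the hidden layers; the penalty strength of 0.01 is an assumption, not the value tuned here:

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

l2 = regularizers.l2(0.01)  # assumption: penalty strength of 0.01

reg_model = keras.Sequential([
    keras.Input(shape=(11,)),
    layers.Dense(64, activation="relu", kernel_regularizer=l2),
    layers.Dense(64, activation="relu", kernel_regularizer=l2),
    layers.Dense(64, activation="relu", kernel_regularizer=l2),
    layers.Dense(64, activation="relu", kernel_regularizer=l2),
    layers.Dense(n_classes, activation="softmax"),
])
reg_model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
```

The penalty shrinks the hidden-layer weights toward zero, which narrows the gap between training and validation accuracy at the cost of a slightly less flexible fit.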
Now, let us look at the variable importance given by this model and compare it to our Random Forest and Bagging models:
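One common way to estimate variable importance for a neural network is permutation importance, which measures the drop in accuracy when a predictor's values are shuffled. A sketch, assuming NumPy arrays and the fitted Keras model (this may differ from the exact method used here):

```python
import numpy as np

def permutation_importance(model, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when each predictor column is shuffled."""
    rng = np.random.default_rng(seed)
    base_acc = np.mean(np.argmax(model.predict(X, verbose=0), axis=1) == y)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break column j's link to y
            acc = np.mean(np.argmax(model.predict(X_perm, verbose=0), axis=1) == y)
            drops.append(base_acc - acc)
        importances[j] = np.mean(drops)
    return importances
```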
We observe that college/university distance is the most important predictor here, whereas property average was the most important in the Random Forest model. However, the second most important predictor is still property average.
Conclusion
Overall, the models predicting the new categories outperform the naive method of simply choosing the most common crime type. Our best test accuracy came from the regularized neural network model, at 67.44%, an improvement over the 59.06% achieved by the "most common" method.
Models for Original Categories