In this section, we explore neural networks to see whether we can improve prediction accuracy on our original categories data set. Let us first identify the input, hidden, and output layers, the loss function, and the optimizer for this model.
Input Layer: We start with the input layer, which takes the 11 predictors from our original data set.
Hidden Layers: We use four hidden layers, each with the ReLU activation function.
Output Layer: The output layer uses the softmax activation function, since we are predicting nominal data.
Loss Function: We use sparse categorical cross-entropy as our loss function, since we are predicting nominal data and the target classes are encoded as integers.
Optimizer: We use Adam, which combines the best properties of the AdaGrad and RMSProp algorithms to provide an optimizer that can handle sparse gradients on noisy problems.
Batch Size: We use a batch size of 128: large enough for the model to converge, but not so large that training becomes computationally expensive.
Epochs: We train for 2000 epochs so the model makes many passes over the training data.
Validation Split: We hold out 30% of the data for validation.
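To make the loss choice concrete: with a softmax output and integer-encoded labels, sparse categorical cross-entropy is the negative log of the probability the model assigns to the true class. A minimal NumPy sketch (the class count and logit values here are invented purely for illustration):

```python
import numpy as np

def softmax(logits):
    # Subtract the row max for numerical stability before exponentiating.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def sparse_categorical_crossentropy(y_true, logits):
    # y_true holds integer class labels, not one-hot vectors.
    probs = softmax(logits)
    true_class_probs = probs[np.arange(len(y_true)), y_true]
    return -np.log(true_class_probs).mean()

# Two samples, three classes (illustrative values only).
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
y_true = np.array([0, 1])
loss = sparse_categorical_crossentropy(y_true, logits)
```

Because the labels stay as integers, no one-hot encoding of the response is needed, which is exactly why this loss pairs naturally with our integer-coded categories.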
As a first step, we define the model with the input, hidden, and output layers as below:
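The original code chunk is not reproduced here; a sketch of the described architecture in Keras follows. The hidden-layer width (64 units) and the number of output classes (6) are placeholders, since the original values are not shown:

```python
import tensorflow as tf

n_predictors = 11   # the 11 predictors from the data set
n_classes = 6       # placeholder: the actual number of crime categories
hidden_units = 64   # placeholder width: the original layer sizes are not shown

# Four ReLU hidden layers and a softmax output, as described above.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_predictors,)),
    tf.keras.layers.Dense(hidden_units, activation="relu"),
    tf.keras.layers.Dense(hidden_units, activation="relu"),
    tf.keras.layers.Dense(hidden_units, activation="relu"),
    tf.keras.layers.Dense(hidden_units, activation="relu"),
    tf.keras.layers.Dense(n_classes, activation="softmax"),
])
```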
Next, we compile the model with our chosen loss function and optimizer: sparse categorical cross-entropy and Adam, as below:
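Continuing the Keras sketch (the model is re-declared so this snippet runs on its own; the layer widths and class count remain placeholders):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(11,)),
    tf.keras.layers.Dense(64, activation="relu"),   # placeholder width
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(6, activation="softmax"),  # placeholder class count
])

# Sparse categorical cross-entropy pairs with integer labels;
# Adam is used with its default learning rate here.
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
```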
Finally, we fit the model:
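A hedged sketch of the fitting step follows. Synthetic data stands in for the real data set, and only a handful of epochs are run to demonstrate the call (the report itself uses 2000 epochs, batch size 128, and a 30% validation split):

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in for the real data (11 predictors, 6 placeholder classes).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 11)).astype("float32")
y = rng.integers(0, 6, size=500)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(11,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(6, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# The report trains for epochs=2000; 5 epochs suffice to illustrate the call.
history = model.fit(X, y, batch_size=128, epochs=5,
                    validation_split=0.3, verbose=0)
```

The returned `history` object records per-epoch loss and accuracy for both the training and validation portions, which is what the plots in the next step draw on.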
We then plot Model loss and accuracy for both train and validation set:
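The original plotting chunk is not shown; a self-contained matplotlib sketch follows. It takes a Keras-style history dictionary; the `mock` curves below are placeholders standing in for `model.fit(...).history`:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen (no display needed)
import matplotlib.pyplot as plt

def plot_history(history, outfile="nn_history.png"):
    # history: dict with per-epoch "loss"/"val_loss" and "accuracy"/"val_accuracy".
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))
    ax_loss.plot(history["loss"], label="train")
    ax_loss.plot(history["val_loss"], label="validation")
    ax_loss.set_title("Model loss")
    ax_loss.set_xlabel("epoch")
    ax_loss.legend()
    ax_acc.plot(history["accuracy"], label="train")
    ax_acc.plot(history["val_accuracy"], label="validation")
    ax_acc.set_title("Model accuracy")
    ax_acc.set_xlabel("epoch")
    ax_acc.legend()
    fig.savefig(outfile)
    return fig

# Mocked curves standing in for model.fit(...).history:
mock = {"loss": [1.8, 1.5, 1.3], "val_loss": [1.9, 1.7, 1.6],
        "accuracy": [0.30, 0.40, 0.45], "val_accuracy": [0.30, 0.35, 0.40]}
fig = plot_history(mock)
```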
Below we observe the train and validation accuracy for our neural network model. The validation accuracy is 51.143%, which is still not a significant improvement in our prediction accuracy and is similar to what we observed with random forest.
Now, let us look at the variable importance from this model and compare it to our Random Forest and Bagging models:
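The report does not show how importance was computed for the network; one common model-agnostic approach is permutation importance, which shuffles each predictor in turn and measures the resulting drop in accuracy. A NumPy sketch with a toy model (the data and predictor function below are invented for illustration):

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when each column of X is shuffled in turn."""
    rng = np.random.default_rng(seed)
    base = np.mean(predict(X) == y)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])  # destroy column j's information
            drops.append(base - np.mean(predict(Xp) == y))
        importances[j] = np.mean(drops)
    return importances

# Toy model: the class is decided by the sign of the first predictor,
# so only column 0 should register as important.
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 3))
y = (X[:, 0] > 0).astype(int)
predict = lambda X: (X[:, 0] > 0).astype(int)
imp = permutation_importance(predict, X, y)
```

Predictors whose shuffling hurts accuracy the most rank highest, which is how a ranking like "police station distance first" can be read off for the network.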
We can observe that police station distance is the most important predictor in the neural network model as well, consistent with the Random Forest and Bagging models. However, the second and third most important predictors, college university distance and library distance, differ from those models.
Conclusion
Overall, we can say that our prediction accuracy for the original crime categories is very low and not reliable. This was part of our motivation to revisit the crime types we predicted in an attempt to improve prediction accuracy. We present that analysis in the Models for New Categories section.
Models for New Categories