Once the data has been processed in R, a Naive Bayes model is fitted to it. The dataset is divided into two disjoint subsets: 70% for the training set and 30% for the testing set. The chosen parameters for the model include:
laplace: 1, the Laplace smoothing parameter. Laplace smoothing adds a positive constant (here 1) to every feature-value count within each class, so that a value never observed together with a class in the training data does not receive a probability of zero.
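The effect of the smoothing parameter can be illustrated with a minimal sketch. It is written in plain Python rather than the R used for the actual modeling, and the feature values ("rain", "snow", "fog") and the helper name `smoothed_likelihoods` are hypothetical, chosen only for illustration. It assumes the common formulation for categorical features, in which the constant is added to each value's count and the denominator grows by the constant times the number of possible values; the R implementation used here may differ in detail.

```python
from collections import Counter


def smoothed_likelihoods(values, categories, laplace=1):
    """Estimate P(value | class) for one categorical feature within one class.

    values:     feature values observed for training rows of that class
    categories: every value the feature can take
    laplace:    constant added to each count (0 disables smoothing)
    """
    counts = Counter(values)
    denominator = len(values) + laplace * len(categories)
    return {c: (counts[c] + laplace) / denominator for c in categories}


# Hypothetical observations for a single class; "fog" never occurs.
observed = ["rain", "rain", "snow", "rain"]
categories = ["rain", "snow", "fog"]

unsmoothed = smoothed_likelihoods(observed, categories, laplace=0)
smoothed = smoothed_likelihoods(observed, categories, laplace=1)

print("unsmoothed:", unsmoothed)
print("smoothed:  ", smoothed)
```

Without smoothing, the unseen value gets probability zero, which would force the whole product of likelihoods for that class to zero. With laplace set to 1 it gets a small non-zero probability, and the estimates still sum to one.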
The results of the model are shown below.
This snapshot shows the summary of the Naive Bayes model. It reports that Laplace smoothing was applied, the value of the smoothing factor, and the prior probabilities of the classes.
The confusion matrix shows how well the classification model has performed. The columns of plot 1 and the rows of plot 2 represent the actual values of the target variable (total weather delay), while the rows of plot 1 and the columns of plot 2 represent the values predicted by the model.
From the confusion matrix, it is evident that the model has correctly predicted 15801 instances of the extended delay class and has wrongly predicted 6934 instances as belonging to the extended delay class when they belonged to the short delay class. Similarly, the model has correctly predicted 7559 instances of the short delay class and has wrongly predicted 11974 instances as belonging to the short delay class when they belonged to the extended delay class.
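As a quick check on these figures, the four counts can be totaled to get the number of correct and incorrect predictions. The sketch below is in plain Python rather than R, and it assumes the four counts quoted above make up the complete confusion matrix; the variable names are illustrative.

```python
# Counts quoted in the text, keyed as (actual class, predicted class).
counts = {
    ("extended", "extended"): 15801,
    ("short", "extended"): 6934,
    ("short", "short"): 7559,
    ("extended", "short"): 11974,
}

total = sum(counts.values())
# Correct predictions lie on the diagonal, where actual == predicted.
correct = sum(n for (actual, predicted), n in counts.items() if actual == predicted)
misclassified = total - correct
accuracy = correct / total

print(f"total={total} correct={correct} misclassified={misclassified}")
print(f"accuracy={accuracy:.4f}")
```

Under that assumption, 23360 of 42268 instances are classified correctly and 18908 are not, which corresponds to an accuracy of roughly 55%.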
Overall, the model doesn't perform well in identifying total weather delays: of the 42268 instances in the confusion matrix, 18908 (about 45%) are misclassified.
The classification report summarizes how well the model has performed on the classification task. It displays metrics such as accuracy, recall (sensitivity), precision (positive predictive value), and F1 score.
From this report, we can see that the accuracy of the model is just 55%, and the reported F1 score is 0.63, which is moderate at best.
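These metrics follow directly from the confusion matrix counts. The sketch below, again in plain Python rather than R, recomputes precision, recall, and F1 for each class from the counts quoted earlier. The helper name `class_metrics` is hypothetical, and the calculation assumes each class is treated in turn as the positive class.

```python
def class_metrics(tp, fp, fn):
    """Return (precision, recall, f1) for one class treated as positive.

    tp: instances of the class predicted correctly
    fp: instances of the other class predicted as this class
    fn: instances of this class predicted as the other class
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts taken from the confusion matrix discussed in the text.
extended = class_metrics(tp=15801, fp=6934, fn=11974)
short = class_metrics(tp=7559, fp=11974, fn=6934)

for name, (precision, recall, f1) in [("extended", extended), ("short", short)]:
    print(f"{name}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

With these counts, the F1 score comes out to about 0.63 for the extended delay class and about 0.44 for the short delay class, which suggests the reported 0.63 corresponds to the extended delay class.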
It can be concluded that this model doesn't perform well in identifying total weather delays.