(Still work in progress)
A Naive Bayes model applies Bayes' theorem (with a "naive" assumption that the features are conditionally independent given the class) to estimate the probability that a data vector (a collection of input values) belongs to each class in a classification problem. Whichever class receives the highest probability is the model's prediction.
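As a minimal illustration of that idea (this is a toy Python sketch, not the project's actual R code, which is linked below), a Gaussian Naive Bayes model picks the class maximizing P(class) · Π P(x_i | class). The feature values and class names here are made up:

```python
from sklearn.naive_bayes import GaussianNB

# Toy data: two features, two hypothetical classes (not the TAVG data)
X = [[1.0, 2.0], [1.2, 1.8], [3.0, 4.0], [3.2, 3.9]]
y = ["low", "low", "high", "high"]

model = GaussianNB().fit(X, y)

# The new point sits near the "low" cluster, so that class gets the
# higher posterior probability and becomes the prediction
pred = model.predict([[1.1, 2.1]])
print(pred[0])  # → low
```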
(It will likely be easier to understand with a simple example)
I used the 'temp-classification-test2.csv' data that I cleaned up and created on the Classification Data Prep page.
'TAVG_label' served as the target label; everything else (except 'year') was used as the features.
Link to code (done in R): https://github.com/Rokkaan5/5622-PublishedCode/blob/main/Classification/R-NaiveBayes.qmd
This Naive Bayes model (like the Decision Tree model) never predicted the "below" label correctly, even once. (A TAVG label of "below" means the average temperature fell between the 1st quartile and the median of all average temperatures in the given data.) Since this is consistent with the results from the sklearn Decision Tree model, it may be telling me something about the nature of the data or about how I discretized it (granted, it also appears the classes weren't well balanced: there were fewer rows labeled "below" than the other labels). So this is interesting to look into further at some point.
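One quick way to check whether class imbalance explains the missed "below" label is to compare the label counts with per-class recall. A hypothetical Python sketch (the real analysis is in R, and these label vectors are made-up stand-ins, not actual results):

```python
from collections import Counter

# Hypothetical true and predicted labels, standing in for the model output
y_true = ["below", "above", "above", "below", "median", "above"]
y_pred = ["median", "above", "above", "median", "median", "above"]

# Class balance: a small count for "below" would hint at imbalance
print(Counter(y_true))

# Per-class recall for "below": fraction of true "below" rows
# that the model actually predicted as "below"
below_total = sum(1 for t in y_true if t == "below")
below_hit = sum(1 for t, p in zip(y_true, y_pred) if t == p == "below")
print(f"below recall: {below_hit}/{below_total}")
```

If "below" recall is zero while its count is small, rebalancing (or rethinking the discretization cut points) would be a natural next experiment.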