(Still work in progress)
Decision tree models are commonly considered among the easiest models to understand intuitively, since they are based on the familiar concept of a decision tree (which many of us have seen in various situations, especially during COVID, when decision trees were used to decide whether or not to stay home and/or get tested).
In machine learning, decision tree models use the same concept to predict a class or value based on criteria established within the model (those criteria are built during the "training" phase of the machine learning model application).
JK note to self: also could mention the use of Gini/entropy
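As a starting point for that note, here is a minimal sketch of the two impurity measures a decision tree commonly uses to choose its splitting criteria during training. The function names and the toy labels are illustrative assumptions, not taken from the linked DecisionTrees.py.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: -sum(p_k * log2(p_k))."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((count / n) * log2(count / n) for count in Counter(labels).values())


def weighted_impurity(left, right, measure=gini):
    """Impurity of a candidate split, weighted by the size of each side."""
    n = len(left) + len(right)
    return (len(left) / n) * measure(left) + (len(right) / n) * measure(right)


# Toy labels (made up for illustration): a perfectly mixed two-class node.
mixed = ["below", "below", "other", "other"]
print(gini(mixed), entropy(mixed))
# A split that separates the classes completely leaves no impurity.
print(weighted_impurity(["below", "below"], ["other", "other"]))
```

During training, a tree would typically compare candidate splits and keep the one with the lowest weighted impurity; either measure can be passed as `measure`.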
I used the 'temp-classification-test2.csv' data that I cleaned up and created on the Classification Data Prep page.
I used 'TAVG_label' as the target label. Every other column (except 'year') was used as a feature.
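The column selection described above could be sketched roughly as follows, using only the standard library. The helper name `split_features_target`, the 'PRCP' column, and every value in the sample are made up for illustration; the linked script may well do this differently (for example with pandas).

```python
import csv
import io


def split_features_target(rows, target="TAVG_label", drop=("year",)):
    """Separate each row into a feature dict and a target label."""
    features, labels = [], []
    for row in rows:
        labels.append(row[target])
        # Keep every column except the target and the dropped ones.
        features.append(
            {key: value for key, value in row.items()
             if key != target and key not in drop}
        )
    return features, labels


# Tiny in-memory stand-in for the CSV file (hypothetical column and values).
sample = io.StringIO(
    "year,PRCP,TAVG_label\n"
    "2001,1.2,below\n"
    "2002,0.4,other\n"
)
X, y = split_features_target(csv.DictReader(sample))
print(X)
print(y)
```

Note that `csv.DictReader` yields strings, so numeric columns would still need converting before being passed to a model.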
Link to Decision Tree code: https://github.com/Rokkaan5/5622-PublishedCode/blob/main/Classification/DecisionTrees.py
Interestingly, the decision tree model didn't do very well at predicting the "below" label (which is technically between the 1st quartile and the median of the average temperature data). Not only did it never predict "below" correctly, it predicted every label other than "below" when the true value was "below." It's also interesting that the Naive Bayes model did a poor job of predicting that label, so I wonder if that is telling me something about the data or how I discretized the labels.
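One way to dig into this kind of per-label failure is to count (true, predicted) pairs and compute the recall for each class. The sketch below uses made-up labels purely to show the mechanics; they are not my actual results, and the function names are illustrative.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count how often each (true label, predicted label) pair occurs."""
    return Counter(zip(y_true, y_pred))


def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    counts = confusion_counts(y_true, y_pred)
    totals = Counter(y_true)
    return {label: counts[(label, label)] / totals[label] for label in totals}


# Made-up example in which "below" is never predicted correctly.
y_true = ["below", "below", "other", "other", "other"]
y_pred = ["other", "other", "other", "below", "other"]

recall = per_class_recall(y_true, y_pred)
print(recall)
# Class counts are worth checking too, since an under-represented
# label is one possible reason a model rarely predicts it.
print(Counter(y_true))
```

A recall of zero for one class alongside reasonable recall elsewhere would point toward the label boundaries or class balance rather than the model as a whole.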
Personally, I would have preferred to fit a regression model with a RandomForest. From my experience, I acknowledge the power of ensemble models and think they have the potential to be "more accurate." Plus, regression models are capable of producing multiple outputs from a single model, and my original target data was continuous before I discretized it for classification purposes. But, for time's sake, I will leave that as something to come back to and try after I finish this class, and leave this as is to meet the criteria of this course.