CODE & RESULTS

Confusion Matrix

The base Decision Tree correctly identified more instances across both classes, showing better discrimination power.
The tuned Decision Tree (e.g., with max depth or min samples constraints) provided the most balanced predictions and helped reduce overfitting.
Some models still showed a mild bias toward the majority class, which suggests that further data balancing might be beneficial.
Overall, Decision Trees were more flexible and interpretable, and they responded better to complex feature interactions than Naive Bayes models.
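As a minimal sketch of how a confusion matrix like the ones discussed above can be tallied and read, the snippet below counts actual vs. predicted labels and derives per-class recall. The label lists and the helper names `confusion_matrix` and `per_class_recall` are illustrative assumptions, not the project's data or code.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Return a nested dict where matrix[actual][predicted] is a count."""
    counts = Counter(zip(y_true, y_pred))
    return {a: {p: counts[(a, p)] for p in labels} for a in labels}

def per_class_recall(matrix):
    """Recall per class: correct predictions / actual instances of that class."""
    recalls = {}
    for actual, row in matrix.items():
        total = sum(row.values())
        recalls[actual] = row[actual] / total if total else 0.0
    return recalls

# Illustrative labels only (assumed, not taken from the report's dataset).
y_true = ["mild", "mild", "mild", "mild", "severe", "severe", "severe", "severe"]
y_pred = ["mild", "mild", "mild", "severe", "mild", "mild", "severe", "severe"]

matrix = confusion_matrix(y_true, y_pred, labels=["mild", "severe"])
recall = per_class_recall(matrix)
print(matrix)
print(recall)
```

A noticeably lower recall for one class than the other is one way the "mild bias toward the majority class" would show up.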

Heart Rate during an attack was chosen as the root feature, meaning it provided the strongest split for separating mild vs. severe anxiety cases. Lower heart rates (≤ 60.5 bpm) leaned slightly toward severe cases.
Stress Level, Dizziness, and Sweating Level are also influential early splits, which confirms that both emotional and physical symptoms are key indicators of anxiety severity.
Features like Diet Quality, Sleep Hours, and Physical Activity appear deeper in the tree, suggesting their predictive value depends on combinations with other symptoms.
Some leaves, such as those involving Smoking or Age, contain fairly mixed class values, highlighting that these factors alone do not consistently predict anxiety severity.
The overall accuracy of the tree is low (0.5061), only slightly above the 0.5 expected from random guessing on two balanced classes, and the high number of samples per split indicates the model is trying to capture subtle distinctions in a large dataset.
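For readers who want to check which feature a fitted tree picked as its root, the sketch below shows one way to do it, assuming scikit-learn was used. The six-row toy dataset and the feature names are made up for illustration; they are arranged so the root split lands at 60.5 only to mirror the threshold mentioned above.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data (assumed): columns are heart_rate and stress_level.
feature_names = ["heart_rate", "stress_level"]
X = np.array([[55, 3], [58, 7], [60, 5], [61, 6], [70, 4], [80, 8]], dtype=float)
y = np.array([1, 1, 1, 0, 0, 0])  # 1 = severe, 0 = mild (illustrative labels)

clf = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X, y)

# Node 0 is the root; tree_.feature holds the column index used at each node.
root_feature = feature_names[clf.tree_.feature[0]]
root_threshold = clf.tree_.threshold[0]
print(root_feature, root_threshold)

# Text rendering of the whole tree, useful for reading deeper splits.
print(export_text(clf, feature_names=feature_names))
```

The same two attributes can be read for any node index, which helps when describing features that only appear deeper in the tree.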

The tree starts with Dizziness as the root node, meaning it's one of the strongest individual indicators in the dataset for anxiety severity.
The left subtree focuses on Diet Quality, Age, and Caffeine Intake, showing that poor diet and younger age may link to increased anxiety.
The right subtree incorporates Sweating Level, Breathing Rate, and Sleep Hours, reinforcing that physiological symptoms play a major role in prediction.
The use of Occupation and Alcohol Consumption deeper in the tree shows how lifestyle and demographic factors interact with symptoms to influence predictions.
The tree achieves a modest accuracy of 0.5094, a slight improvement over random guessing, which suggests there is still room for feature engineering or class balancing.
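An accuracy near 0.5 is easier to interpret next to a trivial baseline. The sketch below compares a model's accuracy with that of always predicting the most frequent class. The labels are illustrative assumptions, and the helper names are hypothetical.

```python
from collections import Counter

def majority_baseline_accuracy(y_true):
    """Accuracy obtained by always predicting the most frequent class."""
    counts = Counter(y_true)
    return max(counts.values()) / len(y_true)

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Illustrative labels (assumed): 6 mild and 4 severe cases.
y_true = ["mild"] * 6 + ["severe"] * 4
y_pred = ["mild", "mild", "mild", "severe", "severe", "mild",
          "severe", "mild", "severe", "mild"]

baseline = majority_baseline_accuracy(y_true)
model_acc = accuracy(y_true, y_pred)
print(f"baseline={baseline:.2f} model={model_acc:.2f}")
```

In this toy case the model only matches the baseline, which is the kind of result that would point toward feature engineering or rebalancing rather than further tree tuning.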

This tree begins with Diet Quality as the root feature, suggesting that diet quality gives the strongest first split here and that individuals with poorer diets may be more likely to experience severe anxiety.
Low Sleep Hours, High Stress Levels, and High Caffeine Intake appear together in the left branch, painting a clear picture of lifestyle-related contributors to anxiety.
On the right, Physical Activity and Therapy Sessions per month are used to distinguish between mild and severe cases, highlighting the importance of both physical and mental health support.
The inclusion of features like Alcohol Consumption, Age, and Caffeine Intake on both branches emphasizes that multiple habits and conditions intersect to influence mental well-being.
With an accuracy of 0.5072, this model edges out the first tree (0.5061) but falls just short of the second (0.5094). The differences are very small, so diet, sleep, and therapy together do not yet offer a clearly more consistent signal of anxiety severity than the other root features.

Overfitting Risk: Decision Trees are prone to overfitting, especially when trained on datasets with many features or noisy patterns. Without proper pruning or tuning, the model can memorize training data rather than learning generalizable patterns, which limits performance on unseen data.
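The overfitting pattern described above can be sketched as follows, assuming scikit-learn. The data is synthetic with deliberately noisy labels, and the "tuned" settings (`max_depth=4`, `min_samples_leaf=20`) are arbitrary illustrative values, not the ones used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumed); flip_y injects label noise.
X, y = make_classification(n_samples=1000, n_features=15, n_informative=4,
                           flip_y=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

configs = {
    "unconstrained": {},
    "tuned": {"max_depth": 4, "min_samples_leaf": 20},  # illustrative values
}

results = {}
for name, params in configs.items():
    tree = DecisionTreeClassifier(random_state=0, **params).fit(X_train, y_train)
    # Store (train accuracy, test accuracy) to expose any train/test gap.
    results[name] = (tree.score(X_train, y_train), tree.score(X_test, y_test))
    print(name, results[name])
```

A large gap between training and test accuracy for the unconstrained tree is the usual sign of memorization; constraining depth or leaf size typically narrows that gap.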
Sensitivity to Imbalanced Data: If one class dominates the dataset, the tree may become biased toward predicting that class more often. This can reduce its ability to identify minority class examples accurately, a common issue in health data where class distributions are not always balanced.
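One common mitigation is to weight classes inversely to their frequency. The sketch below computes such weights with the formula n_samples / (n_classes * class_count), which is the heuristic behind scikit-learn's `class_weight="balanced"` option. The 80/20 split is an assumed example, not this dataset's actual class ratio.

```python
from collections import Counter

def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}

# Illustrative imbalance (assumed): 80 mild vs. 20 severe cases.
y = ["mild"] * 80 + ["severe"] * 20
weights = balanced_class_weights(y)
print(weights)
```

If scikit-learn is being used, a dictionary like this (or simply the string "balanced") can be passed as `class_weight` to `DecisionTreeClassifier`, so that errors on the minority class count for more during splitting.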
Locally Greedy Splitting: Decision Trees use greedy algorithms that select the best split at each node based only on local criteria (e.g., Gini or entropy), without considering the overall tree structure. As a result, they may make suboptimal splits early on, leading to poor downstream decisions and reduced overall performance.
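The effect of greedy splitting can be seen on a small XOR-style example, sketched below with a hand-written Gini impurity. The data and helper names are illustrative assumptions. Here the label depends on both features jointly, so no single first split reduces impurity even though two levels of splits would separate the classes perfectly.

```python
def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def split_gain(rows, labels, feature, threshold):
    """Impurity decrease from splitting on rows[i][feature] <= threshold."""
    left = [y for x, y in zip(rows, labels) if x[feature] <= threshold]
    right = [y for x, y in zip(rows, labels) if x[feature] > threshold]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted

# XOR-style toy data (assumed): label is 1 only when the features differ.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]

# Gain of the only sensible first split on each feature.
gains = {f: split_gain(rows, labels, f, 0.5) for f in (0, 1)}
print(gains)
```

Because both candidate splits show zero gain, a purely local criterion has no reason to prefer either one, which illustrates how useful feature interactions can be missed or found late.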
