CODE & RESULTS

Confusion Matrix

The confusion matrices reveal that all three models struggled to distinguish between the two classes (0 = low severity, 1 = high severity).

Multinomial NB shows almost equal misclassifications across both classes, indicating poor discrimination power.
Gaussian NB slightly outperformed the others by correctly identifying more samples in both classes.
Bernoulli NB appears biased toward predicting Class 1 more frequently.
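
As a rough illustration of how observations like these can be read off a confusion matrix, the sketch below counts the four cells for a binary problem and computes how often a model predicts Class 1. The labels are made up for the example and the helper names (confusion_matrix, predicted_positive_rate) are hypothetical, not taken from the project's code.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels 0/1."""
    counts = Counter(zip(y_true, y_pred))
    return [
        [counts[(0, 0)], counts[(0, 1)]],
        [counts[(1, 0)], counts[(1, 1)]],
    ]


def predicted_positive_rate(matrix):
    """Fraction of all samples that the model assigned to Class 1."""
    (tn, fp), (fn, tp) = matrix
    return (fp + tp) / (tn + fp + fn + tp)


# ASSUMPTION: toy labels for illustration only, not the project's data.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 1, 1, 1, 1, 1, 0, 1]

cm = confusion_matrix(y_true, y_pred)
print("Confusion matrix [[TN, FP], [FN, TP]]:", cm)
print("Share of predictions that are Class 1:", predicted_positive_rate(cm))
```

A predicted-positive rate well above the true share of Class 1 samples is one simple sign of the kind of bias described for Bernoulli NB.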

Accuracies

The three Naive Bayes variants achieved nearly identical accuracies, all close to 0.5, which is what random guessing would yield if the two classes are roughly balanced. Gaussian Naive Bayes scored highest (≈0.509), followed closely by Bernoulli Naive Bayes (≈0.506) and Multinomial Naive Bayes (≈0.499). The ordering is consistent with each model's assumptions: Gaussian NB models continuous features with normal distributions, Bernoulli NB relies on binary patterns in the data, and Multinomial NB assumes count-based features, which may not align well with the continuous nature of this dataset. However, the gaps between the models are only about 0.01, so they are weak evidence at best that one set of distributional assumptions fits the data better than another. Overall, none of the models reached high predictive accuracy.
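
One way to judge whether accuracies this close to 0.5 differ meaningfully from chance is to compare them with the sampling noise expected from the test-set size. The sketch below uses the normal approximation to the binomial distribution. The test-set size is not stated on this page, so N_TEST is a placeholder assumption, and the 0.5 baseline assumes roughly balanced classes.

```python
import math


def accuracy_z_score(accuracy, n_test, baseline=0.5):
    """z-score of an observed accuracy against a baseline accuracy,
    using the normal approximation to the binomial distribution."""
    standard_error = math.sqrt(baseline * (1 - baseline) / n_test)
    return (accuracy - baseline) / standard_error


# Approximate accuracies reported in this section.
reported = {
    "Gaussian NB": 0.509,
    "Bernoulli NB": 0.506,
    "Multinomial NB": 0.499,
}

# ASSUMPTION: the real test-set size is not given; 2000 is a placeholder.
N_TEST = 2000

for name, acc in reported.items():
    z = accuracy_z_score(acc, N_TEST)
    print(f"{name}: accuracy={acc:.3f}, z={z:+.2f}")
```

Under this placeholder size, all three accuracies fall within one standard error of 0.5. With the actual test-set size substituted in, the same check shows whether any model is distinguishable from guessing.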

Predicted Probability Distributions

The histograms of predicted probabilities show how confident each model was:

Multinomial NB had a broad spread of probabilities from 0.2 to 0.8, indicating a moderate range of confidence.
Gaussian NB predicted mostly around 0.5, suggesting low confidence and high uncertainty.
Bernoulli NB showed a slight peak around 0.5–0.55, indicating limited variation and less confident predictions.
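
The sketch below shows one simple way to summarize such histograms numerically: bin the predicted Class 1 probabilities and measure how far they sit from 0.5 on average. The probability lists are invented for illustration, and the helper names (histogram, mean_confidence) are hypothetical.

```python
def histogram(probabilities, n_bins=10):
    """Count predicted probabilities in equal-width bins over [0, 1]."""
    counts = [0] * n_bins
    for p in probabilities:
        # min() keeps p == 1.0 inside the last bin instead of overflowing.
        index = min(int(p * n_bins), n_bins - 1)
        counts[index] += 1
    return counts


def mean_confidence(probabilities):
    """Average distance from 0.5: 0 is fully uncertain, 0.5 fully confident."""
    return sum(abs(p - 0.5) for p in probabilities) / len(probabilities)


# ASSUMPTION: toy probabilities, not the models' actual outputs.
spread = [0.2, 0.35, 0.5, 0.65, 0.8]      # broad range, like Multinomial NB
narrow = [0.48, 0.49, 0.5, 0.51, 0.52]    # clustered near 0.5, like Gaussian NB

for name, probs in [("spread", spread), ("narrow", narrow)]:
    print(name, histogram(probs), round(mean_confidence(probs), 3))
```

A model whose probabilities cluster near 0.5 gets a mean confidence close to 0, which matches the low-confidence pattern described above.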

Insights about the topic

Top 10 Features Correlated with Anxiety Severity (Binary)

This visualization shows that some features are more strongly associated with anxiety severity than others. Higher stress levels, increased heart rate, and faster breathing rates correlate positively with higher anxiety severity, suggesting that physiological responses play a significant role. Better diet quality, regular physical activity, and sufficient sleep correlate negatively, indicating a potential protective effect against anxiety. These patterns highlight the multifaceted nature of anxiety, in which physical and lifestyle-related variables interact. Although the Naive Bayes models had limited predictive power, this analysis helps identify which features are most informative and could be prioritized in future modeling or intervention efforts.
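
A ranking like this can be produced by computing the Pearson correlation between each feature and the binary severity label, then sorting by absolute value. The sketch below is a minimal version of that idea. The feature names and values are hypothetical toy data, and the exact method used for the original plot is not shown on this page.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rank_features(features, target, top_k=10):
    """Return (name, correlation) pairs sorted by absolute correlation."""
    scores = {name: pearson(values, target) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)[:top_k]


# ASSUMPTION: hypothetical feature names and toy values for illustration.
features = {
    "stress_level": [2, 3, 8, 9, 7, 1],
    "sleep_hours": [8, 7, 5, 4, 5, 9],
    "diet_quality": [5, 6, 5, 6, 5, 6],
}
severity = [0, 0, 1, 1, 1, 0]  # 0 = low severity, 1 = high severity

ranked = rank_features(features, severity)
for name, r in ranked:
    print(f"{name}: r = {r:+.3f}")
```

Sorting by absolute value keeps strongly negative features (such as sleep in this toy data) alongside strongly positive ones, while the sign shows the direction of the association.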

Reasons why Naive Bayes didn't perform well ⚠️

Over-simplistic Assumptions: Naive Bayes assumes that all features are conditionally independent given the target class. This is rarely true, especially in health-related datasets where factors like stress, sleep, and physical activity are often closely related. This assumption limits the model’s ability to capture real-world interactions.
Model Inflexibility: Each variant of Naive Bayes (Multinomial, Bernoulli, Gaussian) relies on a fixed distributional assumption for the input data. If the actual data distribution does not match the model’s assumption (e.g., Gaussian for continuous features), performance will suffer because the model cannot adapt to non-standard patterns.
Simple Decision Boundaries: The decision boundaries Naive Bayes can draw in feature space are simple. Multinomial and Bernoulli NB produce linear boundaries, and Gaussian NB at most quadratic ones. When the classes cannot be separated by such boundaries, as appears to be the case in this dataset, the model struggles to draw meaningful distinctions between them, leading to low accuracy.
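
The effect of the independence assumption can be seen in a small synthetic experiment. The sketch below implements a minimal Gaussian Naive Bayes from scratch and applies it to XOR-style data, where the label depends only on the interaction between two features. Both the data and the implementation are illustrative assumptions, not the project's dataset or code.

```python
import math
import random


def fit_gaussian_nb(X, y):
    """Estimate per-class priors plus per-feature means and variances."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        n_features = len(rows[0])
        means = [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]
        # The small constant avoids division by zero for constant features.
        variances = [
            sum((r[j] - means[j]) ** 2 for r in rows) / len(rows) + 1e-9
            for j in range(n_features)
        ]
        model[label] = (len(rows) / len(X), means, variances)
    return model


def predict(model, x):
    """Pick the class with the highest log-posterior, treating every
    feature as independent given the class (the 'naive' assumption)."""
    def log_posterior(label):
        prior, means, variances = model[label]
        total = math.log(prior)
        for value, mean, var in zip(x, means, variances):
            total += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        return total
    return max(model, key=log_posterior)


def make_xor_data(n_per_corner, rng):
    """ASSUMPTION: synthetic data whose label is the XOR of two features."""
    X, y = [], []
    for cx, cy in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        for _ in range(n_per_corner):
            X.append([cx + rng.gauss(0, 0.1), cy + rng.gauss(0, 0.1)])
            y.append(cx ^ cy)
    return X, y


rng = random.Random(0)
X_train, y_train = make_xor_data(200, rng)
X_test, y_test = make_xor_data(200, rng)

model = fit_gaussian_nb(X_train, y_train)
accuracy = sum(predict(model, x) == t for x, t in zip(X_test, y_test)) / len(y_test)
print(f"Gaussian NB accuracy on XOR-style data: {accuracy:.3f}")

# Predictions at the four cluster centres; the true labels are [0, 1, 1, 0].
corners = [[0, 0], [0, 1], [1, 0], [1, 1]]
corner_predictions = [predict(model, c) for c in corners]
print("Corner predictions:", corner_predictions)
```

Because the Naive Bayes log-odds is a sum of per-feature terms, it cannot label all four corners of this pattern correctly, even though the classes are cleanly separated. Each feature looks nearly identical in both classes when viewed alone, so the predicted probabilities also end up close to 0.5, similar to the behavior reported for Gaussian NB above.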

CODE

Link to the Code

