In this section, we analyse what we can take away from the resulting model performance, highlighting the main predictors and their practical implications.
Age was a top predictor across all models. This is consistent with the well-established rise in stroke risk as people get older. Age alone should prompt closer health monitoring, especially beyond 50.
Glucose Level and Age Group were especially important in XGBoost. This suggests that metabolic health (e.g., blood sugar levels), combined with demographic group, can signal elevated stroke risk. Patients in older age brackets with high glucose may benefit from earlier screening or lifestyle changes.
BMI-related interactions (such as age × BMI or glucose × BMI) were important in Logistic Regression. This suggests that excess weight can amplify the effects of age and blood sugar, raising stroke risk further.
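To make these engineered features concrete, here is a minimal Python sketch of how an age bracket and the two BMI interaction terms could be derived for a single patient record. The column names (`age`, `avg_glucose_level`, `bmi`) and the age bins are assumptions for illustration, not the project's actual preprocessing. The interaction term is what lets a linear model express "amplification": with an `age_x_bmi` feature, the change in log-odds per year of age becomes `b_age + b_age_x_bmi * bmi`, so a positive interaction coefficient means age matters more at higher BMI.

```python
from typing import Dict

# Hypothetical age brackets: the bins behind the "Age Group" feature are not stated in the text.
AGE_BINS = [(0, 40, "under_40"), (40, 60, "40_59"), (60, 80, "60_79"), (80, 200, "80_plus")]


def age_group(age: float) -> str:
    """Map an age in years to a coarse bracket label (lower bound inclusive)."""
    for low, high, label in AGE_BINS:
        if low <= age < high:
            return label
    raise ValueError(f"age out of range: {age}")


def add_engineered_features(record: Dict[str, float]) -> Dict[str, object]:
    """Return a copy of a patient record with an age bracket and pairwise BMI interactions."""
    out: Dict[str, object] = dict(record)
    out["age_group"] = age_group(record["age"])
    # Products allow a linear model's age and glucose effects to vary with BMI.
    out["age_x_bmi"] = record["age"] * record["bmi"]
    out["glucose_x_bmi"] = record["avg_glucose_level"] * record["bmi"]
    return out


# Usage on a single made-up patient record.
patient = {"age": 70.0, "avg_glucose_level": 200.0, "bmi": 30.0}
print(add_engineered_features(patient))
```

Binning age is one common way to give a tree model such as XGBoost an explicit demographic grouping; the raw `age` column is kept alongside it.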
We examined the 10 patients with the highest predicted stroke risk from the Logistic Regression model. Of those, 2 actually had a stroke (a precision of 20% among the top 10), suggesting that the model can surface genuinely high-risk individuals and could support earlier intervention.
Among these patients:
Most were older adults (65+)
Most had high glucose or BMI values
Some showed none of the usual risk signs, such as heart disease or hypertension
This suggests that the model can help flag silent risks that routine health checks might miss, supporting earlier and more targeted care.
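The top-10 check described above is a form of precision at k: rank patients by predicted probability, take the k highest, and count how many are true stroke cases. The sketch below is a minimal stdlib-only version under that reading; the function name is hypothetical. Comparing the resulting 2/10 = 20% with the dataset's overall stroke rate indicates how strongly the ranking concentrates real cases at the top.

```python
from typing import List, Sequence, Tuple


def top_k_precision(probs: Sequence[float], labels: Sequence[int], k: int = 10) -> Tuple[List[int], float]:
    """Rank patients by predicted stroke probability and report the hit rate among the top k.

    Returns the indices of the k highest-scoring patients and the fraction of them
    whose true label is 1 (stroke).
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if not 0 < k <= len(probs):
        raise ValueError("k must be between 1 and the number of patients")
    # Indices sorted from highest to lowest predicted probability.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    top = ranked[:k]
    hits = sum(labels[i] for i in top)
    return top, hits / k


# Toy example: four patients, inspect the two highest-scoring ones.
top, precision = top_k_precision([0.9, 0.1, 0.8, 0.3], [1, 0, 0, 1], k=2)
print(top, precision)
```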
Using a decision threshold of 0.485, patients were stratified into three risk categories:
Low Risk: 461 patients
Medium Risk: 159 patients
High Risk: 632 patients
This stratification enables more targeted interventions: High Risk patients can receive immediate screening, Medium Risk patients can be monitored, and Low Risk patients can be advised on lifestyle improvements.
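A single threshold by itself separates only two groups, so a three-way split needs a second cut-off for the Medium band. The sketch below assumes that patients with a predicted probability of at least 0.485 are High Risk and uses a hypothetical lower bound of 0.30 for Medium Risk; the actual lower cut-off is not stated in the text, so the 0.30 value is purely illustrative.

```python
from collections import Counter
from typing import Iterable

HIGH_THRESHOLD = 0.485  # decision threshold reported in the text
MEDIUM_LOWER = 0.30     # hypothetical: the lower edge of the Medium band is not stated


def risk_band(prob: float, high: float = HIGH_THRESHOLD, medium: float = MEDIUM_LOWER) -> str:
    """Assign a patient to Low / Medium / High Risk from a predicted stroke probability."""
    if not medium < high:
        raise ValueError("the Medium cut-off must be below the High threshold")
    if prob >= high:
        return "High Risk"
    if prob >= medium:
        return "Medium Risk"
    return "Low Risk"


def stratify(probs: Iterable[float]) -> Counter:
    """Count how many patients fall into each risk band."""
    return Counter(risk_band(p) for p in probs)


# Usage on a handful of made-up predicted probabilities.
print(stratify([0.05, 0.35, 0.49, 0.9]))
```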
Analysis of age and glucose levels within the High Risk group further supports the features the model relies on:
Age Distribution peaked in the 50–60 and 75–80 age ranges.
Glucose Levels showed a right-skewed distribution with a long upper tail, including outliers above 200 mg/dL.
These patterns align with clinical expectations and are consistent with the model's sensitivity to relevant health indicators.
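The right skew and the high-glucose outliers can be quantified directly rather than read off a histogram. The following minimal sketch computes the population (Fisher-Pearson) skewness and counts readings above 200 mg/dL; the function name and the idea of passing the High Risk group's glucose values as a plain list are assumptions for illustration. A positive skewness, together with a mean above the median, matches the long right tail described above.

```python
import statistics
from typing import Dict, Sequence


def glucose_summary(glucose: Sequence[float], cutoff: float = 200.0) -> Dict[str, float]:
    """Summarise a glucose distribution (mg/dL): centre, skewness and readings above a cutoff."""
    n = len(glucose)
    if n < 3:
        raise ValueError("need at least three readings")
    mean = statistics.fmean(glucose)
    sd = statistics.pstdev(glucose)
    # Population skewness: mean cubed deviation divided by sd cubed; > 0 means a long right tail.
    skew = 0.0 if sd == 0 else sum((g - mean) ** 3 for g in glucose) / (n * sd ** 3)
    return {
        "mean": mean,
        "median": statistics.median(glucose),
        "skewness": skew,
        "n_above_cutoff": sum(1 for g in glucose if g > cutoff),
    }


# Made-up readings: mostly normal values plus one high outlier.
print(glucose_summary([80, 85, 90, 95, 100, 250]))
```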
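Logistic Regression placed 98.3% of actual stroke cases in the High Risk group, i.e., a recall of 98.3% at this threshold.
Only 1.7% of stroke cases were missed (false negatives), reflecting the model's high recall and low false negative rate. This recall comes with a trade-off: roughly half of all stratified patients (632 of 1,252) were flagged as High Risk, so precision should be reported alongside recall.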
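Recall and miss rate are complements: recall = TP / (TP + FN) and miss rate = 1 - recall, so 98.3% recall and 1.7% missed cases describe the same result. The minimal sketch below computes both from binary labels, along with precision = TP / (TP + FP) to expose the cost of flagging many patients. The function name and toy labels are hypothetical; here a prediction of 1 stands for "placed in the High Risk group".

```python
from typing import Dict, Sequence


def classification_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Recall, miss rate (false negative rate) and precision for binary labels (1 = stroke)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # strokes correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # strokes missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-strokes flagged
    if tp + fn == 0:
        raise ValueError("no positive cases: recall is undefined")
    recall = tp / (tp + fn)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    return {"recall": recall, "miss_rate": 1.0 - recall, "precision": precision}


# Toy labels: 3 strokes, 4 patients flagged as High Risk.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0]
print(classification_rates(y_true, y_pred))
```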