Dataset: Framingham Heart Study
Software: SAS, Python
Dataset description: 8 qualitative variables, 8 quantitative variables, 3658 observations
The target (ten-year CHD) is imbalanced, which affects classification performance.
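A minimal sketch of how the imbalance can be quantified, assuming the target is coded as a 0/1 label (1 = CHD within ten years). The majority-class baseline is the accuracy a model reaches by always predicting the more frequent class, which is a useful reference point for any reported accuracy. The function name `class_balance` is hypothetical, and the labels below are toy values, not study data.

```python
from collections import Counter


def class_balance(labels):
    """Return the share of each class and the majority-class baseline accuracy."""
    counts = Counter(labels)
    total = sum(counts.values())
    shares = {label: count / total for label, count in counts.items()}
    # Always predicting the most frequent class achieves exactly this accuracy.
    baseline = max(shares.values())
    return shares, baseline


# Toy labels for illustration only (1 = CHD within ten years).
shares, baseline = class_balance([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
print(shares, baseline)
```

With a heavily imbalanced target, a classifier's accuracy should be read against this baseline rather than against 50%.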
The heatmap shows a high correlation between systolic and diastolic blood pressure.
The relationship can be confirmed in a scatter plot, so only one blood pressure metric is kept.
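The "keep only one of two highly correlated variables" step can be sketched as follows. This assumes Pearson correlation and a hypothetical cutoff of 0.7; the column names `sysBP`, `diaBP` and `heartRate` and the readings are illustrative assumptions, not taken from the report.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def drop_correlated(columns, threshold=0.7):
    """Keep a column only if it is not highly correlated with an already-kept column."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy readings, illustrative only (hypothetical column names).
columns = {
    "sysBP": [120, 130, 140, 150, 160],
    "diaBP": [80, 84, 90, 95, 100],
    "heartRate": [70, 90, 65, 85, 72],
}
print(drop_correlated(columns))
```

Columns are visited in order, so whichever pressure metric appears first is the one retained; the choice of which to keep is a modeling decision, not something the correlation decides.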
Among smokers, the most common value is 20 cigarettes per day.
A normal total cholesterol level is below 200 mg/dL. Most patients have elevated cholesterol.
The normal BMI range is 18.5 to 24.9, and most patients fall outside it.
Most patients' glucose level is below 100 mg/dL, which appears normal.
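The reference ranges above translate directly into per-patient flags. This is a minimal sketch assuming cholesterol and glucose are recorded in mg/dL; the function names and the toy patients are hypothetical, and the cutoffs are exactly the ones quoted in the text.

```python
def flag_risk_levels(total_chol, bmi, glucose):
    """Compare one patient's readings with the reference ranges quoted in the text."""
    return {
        # Total cholesterol: normal is below 200 mg/dL.
        "high_cholesterol": total_chol >= 200,
        # BMI: normal range is 18.5 to 24.9.
        "abnormal_bmi": not (18.5 <= bmi <= 24.9),
        # Glucose: below 100 mg/dL is treated as normal.
        "high_glucose": glucose >= 100,
    }


def share_flagged(patients, key):
    """Fraction of patients (total_chol, bmi, glucose) for which a flag is set."""
    flags = [flag_risk_levels(*patient)[key] for patient in patients]
    return sum(flags) / len(flags)


# Toy patients, illustrative only: (total cholesterol, BMI, glucose).
patients = [(235, 27.3, 82), (190, 22.0, 95), (260, 30.1, 110), (210, 26.5, 88)]
print(flag_risk_levels(*patients[0]))
print(share_flagged(patients, "high_cholesterol"))
```

Applied to the full dataset, `share_flagged` gives the proportion behind statements such as "most patients have elevated cholesterol".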
Glucose and BMI show no clear difference between patients with and without ten-year CHD.
The death percentage shows no change across education levels or gender.
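Comparing an outcome percentage across categories, as done for education and gender, amounts to a grouped rate. A minimal sketch, assuming the outcome is coded 0/1; the function name and the toy data are hypothetical.

```python
from collections import defaultdict


def rate_by_group(groups, outcomes):
    """Percentage of positive outcomes (coded 1) within each group label."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, outcome in zip(groups, outcomes):
        totals[group] += 1
        positives[group] += outcome
    return {g: 100 * positives[g] / totals[g] for g in totals}


# Toy data, illustrative only.
sex = ["M", "F", "M", "F", "M", "F", "M", "F"]
outcome = [1, 0, 0, 1, 0, 0, 0, 0]
print(rate_by_group(sex, outcome))
```

Roughly equal percentages across groups, as in this toy case, are what "no change" means here; a chi-square test of independence would be the usual way to check whether any remaining gap is statistically meaningful.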
Three principal components are retained for the analysis.
Component Description
Weight: age, systolic blood pressure
Smoking impact: cigs per day, heart rate
Glucose
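A minimal sketch of a three-component PCA of the kind described above, assuming the analysis is run on standardized variables (that is, on the correlation matrix). NumPy is used here as a stand-in for whatever SAS or Python routine produced the report's components; `pca`, `top_variables`, the variable names and the random data are all hypothetical. Naming a component (e.g. "Weight" or "Smoking impact") corresponds to reading off the variables with the largest absolute loadings.

```python
import numpy as np


def pca(data, n_components=3):
    """PCA on standardized columns via eigen-decomposition of the correlation matrix."""
    x = np.asarray(data, dtype=float)
    # Standardize so variables on different scales contribute equally.
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    corr = np.cov(z, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(corr)
    # eigh returns eigenvalues in ascending order; reorder to descending.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    loadings = eigenvectors[:, order[:n_components]]
    explained = eigenvalues / eigenvalues.sum()
    scores = z @ loadings
    return loadings, explained[:n_components], scores


def top_variables(loadings, names, k=2):
    """Names of the k variables with the largest absolute loading on each component."""
    return [
        [names[i] for i in np.argsort(-np.abs(loadings[:, j]))[:k]]
        for j in range(loadings.shape[1])
    ]


# Random data standing in for the real predictors (hypothetical names).
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 6))
names = ["age", "sysBP", "cigsPerDay", "heartRate", "glucose", "BMI"]
loadings, explained, scores = pca(data)
print(explained)
print(top_variables(loadings, names))
```

The `explained` ratios are one common basis for deciding how many components to keep; the report does not state which criterion it used.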
Result
Utility and Validity
The accuracy is about 85%, and the area under the ROC curve (AUC) is about 73.61%. Overall, the model's utility is good.
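The two metrics can be computed from true labels and predicted probabilities as sketched below. Accuracy depends on a classification threshold (0.5 here, an assumption), while AUC is threshold-free: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as one half. The function names and toy values are hypothetical, and the pairwise loop is quadratic, which is fine for illustration.

```python
def accuracy(labels, scores, threshold=0.5):
    """Share of cases whose thresholded prediction matches the true 0/1 label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy labels and predicted probabilities, illustrative only.
labels = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.3, 0.6, 0.2, 0.7, 0.4]
print(accuracy(labels, scores), roc_auc(labels, scores))
```

Because the target is imbalanced, AUC is usually the more informative of the two numbers: accuracy can look high simply because most cases belong to the majority class.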
Model Validity
The residuals show no pattern.
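For a binary outcome, raw residuals take only two values per predicted probability, so a pattern check is usually done on Pearson residuals or on residuals averaged within bins of predicted probability. This sketch assumes the model is a logistic-type classifier producing probabilities strictly between 0 and 1; the function names, the bin count and the toy values are hypothetical.

```python
import math


def pearson_residuals(labels, probabilities):
    """Pearson residuals (y - p) / sqrt(p * (1 - p)) for a binary-outcome model."""
    return [(y - p) / math.sqrt(p * (1 - p)) for y, p in zip(labels, probabilities)]


def binned_residuals(labels, probabilities, n_bins=4):
    """Mean predicted probability and mean raw residual (y - p) within each bin."""
    pairs = sorted(zip(probabilities, labels))
    size = math.ceil(len(pairs) / n_bins)
    bins = []
    for start in range(0, len(pairs), size):
        chunk = pairs[start:start + size]
        mean_p = sum(p for p, _ in chunk) / len(chunk)
        mean_r = sum(y - p for p, y in chunk) / len(chunk)
        bins.append((mean_p, mean_r))
    return bins


# Toy values, illustrative only.
labels = [0, 1, 0, 1, 0, 1, 0, 1]
probabilities = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
print(pearson_residuals([1, 0], [0.8, 0.2]))
print(binned_residuals(labels, probabilities))
```

"No pattern" then means the binned mean residuals scatter around zero with no trend as the predicted probability increases.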
Limitation
Target imbalance
Follow Up Study
Extrapolation
Conclusion
Except for BMI and prevalentHyp, which are not statistically significant, the variables selected in this report act as predictors of whether a person is at risk of CHD within ten years.
PCA identifies three components: weight impact, smoking habits, and blood sugar.
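Whether a coefficient is "statistically significant" is commonly judged with a Wald test. This is a minimal sketch, assuming a logistic regression where each predictor has an estimated coefficient and standard error; the function name and the example numbers are hypothetical. The two-sided p-value 2 * (1 - Phi(|z|)) with z = coefficient / standard error equals erfc(|z| / sqrt(2)), and z squared is the one-degree-of-freedom Wald chi-square, which gives the same p-value.

```python
import math


def wald_p_value(coefficient, standard_error):
    """Two-sided p-value for H0: coefficient = 0 using the Wald z statistic."""
    z = coefficient / standard_error
    # 2 * (1 - Phi(|z|)) written with the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical estimates for illustration only.
print(wald_p_value(0.012, 0.010))  # small z, large p-value: not significant
print(wald_p_value(0.065, 0.006))  # large z, tiny p-value: significant
```

A predictor whose p-value exceeds the chosen level (commonly 0.05) is treated as not significant, which is the kind of result reported for BMI and prevalentHyp.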