Sabejon, Jayson*; Rejas, Jeyhozaphat; Mendez, Edwin; Zarate, Reymund; Tinoy, Marie
Science, Technology, Engineering, and Mathematics Strand - Senior High School Department, St. Rita's College of Balingasag, Inc.
Diabetes is a metabolic disorder caused by either insufficient insulin production by the pancreas or inadequate insulin utilization by the body. It is among the most prevalent diseases without a known cure; however, timely detection can improve survival. The main goal of this study is to analyze the variables that have the greatest impact on diabetes complications. We discuss the use of the extreme gradient boosting (XGBoost) algorithm, a decision tree-based method, in formulating a predictive model. The diabetes dataset obtained from the UCI Machine Learning Repository is used to build a predictive machine learning model that classifies positive and negative diabetes diagnoses. Results show that the XGBoost model with data imputation performed best in classifying positive and negative diabetes cases, with accuracy = 99.03%, kappa statistic = 0.9797, and F-measure = 0.990. These results outperform the methods used in previous research. The feature importance analysis showed that the 'age' variable has the greatest predictive power for diabetes detection. This result is consistent with previous findings that age often influences diabetes, since increased insulin resistance and impaired pancreatic islet function are associated with aging. It also emphasizes the importance of staying active and healthy in older age, as this delays the onset of age-related conditions such as diabetes.
Keywords: diabetes, prediction, XGBoost algorithm, machine learning
Corresponding author's email: jaysonsabejon38@gmail.com
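
The three evaluation measures reported in the abstract (accuracy, kappa statistic, and F-measure) can all be derived from a binary confusion matrix. The following is a minimal sketch, not the authors' code: the function name `binary_metrics` and the counts passed to it are hypothetical and chosen only for illustration, and they do not reproduce the study's results. It assumes every denominator is nonzero.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (accuracy, Cohen's kappa, F-measure) from confusion-matrix counts.

    tp, fp, fn, tn: true positives, false positives, false negatives,
    true negatives. Assumes no denominator below is zero.
    """
    total = tp + fp + fn + tn

    # Accuracy: share of cases classified correctly (observed agreement).
    accuracy = (tp + tn) / total

    # Chance agreement: probability that prediction and truth agree
    # if both were drawn independently from their marginal rates.
    p_pos = ((tp + fp) / total) * ((tp + fn) / total)
    p_neg = ((fn + tn) / total) * ((fp + tn) / total)
    p_chance = p_pos + p_neg

    # Cohen's kappa: agreement beyond chance, scaled to at most 1.
    kappa = (accuracy - p_chance) / (1 - p_chance)

    # F-measure: harmonic mean of precision and recall for the positive class.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)

    return accuracy, kappa, f_measure


# Hypothetical counts for illustration only (not from the study).
acc, kappa, f1 = binary_metrics(tp=45, fp=5, fn=5, tn=45)
print(f"accuracy={acc:.4f} kappa={kappa:.4f} F-measure={f1:.4f}")
```

Reporting kappa alongside accuracy is useful because kappa discounts the agreement a classifier would reach by chance, which matters when the positive and negative classes are imbalanced.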