Diabetes is a chronic illness that affects millions of people worldwide and requires regular monitoring of a patient’s blood glucose level. Currently, blood glucose is monitored by a minimally invasive process in which a small droplet of blood is extracted and passed to a glucometer; however, this process is uncomfortable for the patient. In this paper, a smartphone video-based noninvasive technique is proposed to estimate blood glucose levels. The videos are collected steadily from the tip of the subject’s finger using smartphone cameras and subsequently converted into a photoplethysmography (PPG) signal. A Gaussian filter is applied on top of the asymmetric least squares (ALS) method to remove high-frequency noise, optical noise, and motion interference from the raw PPG signal. The preprocessed signals are then used to extract signal features such as the systolic and diastolic peaks, the time differences between consecutive peaks (DelT), and the first- and second-derivative peaks. Finally, the features are fed into Principal Component Regression (PCR), Partial Least Squares Regression (PLS), Support Vector Regression (SVR), and Random Forest Regression (RFR) models to predict the glucose level. Of the four statistical learning techniques used, the PLS model, when applied to an unbiased dataset, achieved the lowest standard error of prediction (SEP), at 17.02 mg/dL.
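The preprocessing stage described above can be illustrated with a minimal NumPy sketch. It assumes the commonly used iteratively reweighted form of ALS baseline estimation, followed by Gaussian smoothing of the baseline-corrected signal. The function names and the parameters `lam`, `p`, `n_iter`, and `sigma` are illustrative assumptions and are not values reported in this study. The dense linear solve is only practical for short recordings.

```python
import numpy as np


def als_baseline(y, lam=1e4, p=0.01, n_iter=10):
    """Estimate a slowly varying baseline by asymmetric least squares.

    lam (smoothness), p (asymmetry) and n_iter are assumed values.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    # Second-order difference operator, shape (n - 2, n).
    D = np.diff(np.eye(n), n=2, axis=0)
    penalty = lam * (D.T @ D)
    w = np.ones(n)
    z = y.copy()
    for _ in range(n_iter):
        # Weighted smoothing step: solve (W + lam * D^T D) z = W y.
        z = np.linalg.solve(np.diag(w) + penalty, w * y)
        # Samples above the baseline get weight p, the rest 1 - p,
        # so the fit is pulled toward the lower envelope of the signal.
        w = np.where(y > z, p, 1.0 - p)
    return z


def gaussian_smooth(y, sigma=2.0):
    """Low-pass filter with a normalized Gaussian kernel (sigma in samples)."""
    y = np.asarray(y, dtype=float)
    radius = int(4 * sigma)
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (t / sigma) ** 2)
    kernel /= kernel.sum()
    # Reflect-pad so the output has the same length as the input.
    padded = np.pad(y, radius, mode="reflect")
    return np.convolve(padded, kernel, mode="valid")


def preprocess_ppg(raw, lam=1e4, p=0.01, sigma=2.0):
    """Remove baseline drift with ALS, then suppress high-frequency noise."""
    raw = np.asarray(raw, dtype=float)
    baseline = als_baseline(raw, lam=lam, p=p)
    return gaussian_smooth(raw - baseline, sigma=sigma)
```

In practice `lam` and `sigma` would need tuning to the camera frame rate, so that the baseline follows slow drift without absorbing the cardiac pulse itself.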
In this study, PCR- and PLS-based models generally outperformed the models based on SVR and RFR, which could be attributed to the limited dataset used in the study. Zhang, G. et al. used PPG signals acquired from smartphones to classify the glucose level. With an accuracy of over 80%, their model assigns a subject to one of three diabetic statuses: normal, borderline, or warning. However, quantitative information about the actual glucose level is more important than the diabetic status alone. We have demonstrated that our model addresses this need by predicting the actual glucose level with an error of less than 20 mg/dL. First- and second-derivative characteristic points were the dominant features in the machine learning models. SVR-based models performed comparatively well in prediction in both experiments, whereas RFR models performed poorly in both.
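To make the regression and error metric concrete, the following is a minimal NumPy sketch of principal component regression together with one common definition of SEP: the root-mean-square difference between predicted and reference values. That definition, the function names, and the absence of feature scaling are assumptions made for illustration. The study's exact formula and model settings are not restated here.

```python
import numpy as np


def fit_pcr(X, y, n_components):
    """Principal component regression: PCA on centered features, then OLS."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    # Rows of Vt are the principal directions of the centered feature matrix.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T
    scores = Xc @ V
    # Ordinary least squares on the retained component scores.
    gamma, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    # Map the score-space coefficients back to the original feature space.
    coef = V @ gamma
    intercept = y_mean - x_mean @ coef
    return coef, intercept


def predict(X, coef, intercept):
    """Predict glucose values from a feature matrix."""
    return np.asarray(X, dtype=float) @ coef + intercept


def sep(y_true, y_pred):
    """Root-mean-square prediction error (assumed SEP definition)."""
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))
```

With a small dataset, `sep` should be computed on held-out subjects, and `n_components` chosen by cross-validation rather than fixed in advance.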