Data Description
For our project, we selected a dataset that aligns with our title, "Exploratory Data Analysis of Diabetes Risk Factors." The Diabetes Prediction Dataset contains around 100,000 instances of medical and demographic data from patients, along with their diabetes status (positive or negative). Its features are age, gender, body mass index (BMI), hypertension, heart disease, smoking history, HbA1c level, and blood glucose level.
This dataset can be used to build machine learning models that predict diabetes from a patient's medical history and demographic information. Such models can help healthcare professionals identify patients who may be at risk of developing diabetes and create personalized treatment plans. Researchers can also use the dataset to explore the relationships between medical and demographic factors and the likelihood of developing diabetes.
Dataset source: diabetes-prediction-dataset
The dataset has 9 columns, 8 features and the target variable:
Gender: The patient's gender (Male or Female).
Age: Age in years, ranging from 0 to 80 in our dataset.
Hypertension: A condition in which blood pressure in the arteries is persistently elevated. Values are 0 or 1, where 0 indicates the patient does not have hypertension and 1 indicates they do.
Heart disease: Whether the patient has heart disease, a condition associated with an increased risk of developing diabetes. Values are 0 or 1, where 0 indicates the patient does not have heart disease and 1 indicates they do.
Smoking history: The patient's smoking status, a categorical attribute with the following values:
-No info
-Never
-Former
-Current
-Not current
HbA1c_level: HbA1c (hemoglobin A1c) level, a measure of a person's average blood sugar level over the past 2-3 months.
BMI: Body Mass Index, a measure of body fat based on weight and height. BMI ranges from 10.16 to 71.55 in the dataset.
Blood glucose level: The amount of glucose in the bloodstream at a given time.
Diabetes: The target variable being predicted, where 1 indicates the presence of diabetes and 0 indicates its absence.
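The column descriptions above can be turned into a simple sanity check before any analysis. The following is a minimal sketch in plain Python, not part of the dataset or its tooling. The column names (gender, age, hypertension, heart_disease, smoking_history, bmi, HbA1c_level, blood_glucose_level, diabetes) and the exact spelling of the category labels are assumptions and may differ in the actual file. The description gives no range for HbA1c or blood glucose, so the check only assumes they are positive numbers. The example record is made up for illustration.

```python
# Allowed values and ranges taken from the column descriptions.
# Column names and label spellings are assumptions; adjust to the real file.
SMOKING_CATEGORIES = {"No info", "Never", "Former", "Current", "Not current"}


def validate_record(record):
    """Return the names of the columns in one patient record (a dict)
    whose values fall outside the described ranges or categories."""
    problems = []

    if record.get("gender") not in {"Male", "Female"}:
        problems.append("gender")

    age = record.get("age")
    if not isinstance(age, (int, float)) or not 0 <= age <= 80:
        problems.append("age")

    # Binary indicator columns, including the target variable.
    for flag in ("hypertension", "heart_disease", "diabetes"):
        if record.get(flag) not in (0, 1):
            problems.append(flag)

    if record.get("smoking_history") not in SMOKING_CATEGORIES:
        problems.append("smoking_history")

    bmi = record.get("bmi")
    if not isinstance(bmi, (int, float)) or not 10.16 <= bmi <= 71.55:
        problems.append("bmi")

    # No range is documented for these; assume only that they are positive.
    for measure in ("HbA1c_level", "blood_glucose_level"):
        value = record.get(measure)
        if not isinstance(value, (int, float)) or value <= 0:
            problems.append(measure)

    return problems


# Hypothetical record, invented for illustration (not taken from the dataset).
example = {
    "gender": "Female", "age": 54, "hypertension": 0, "heart_disease": 0,
    "smoking_history": "Never", "bmi": 27.3, "HbA1c_level": 6.1,
    "blood_glucose_level": 140, "diabetes": 0,
}
print(validate_record(example))  # prints []
```

Running a check like this over every row is a quick way to spot labels or values that do not match the description, for example a smoking category spelled differently in the file.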