The dataset for this project originally comes from the National Institute of Diabetes and Digestive and Kidney Diseases.
Project Objectives:
The objective is to predict diagnostically whether a patient has diabetes, based on the diagnostic measurements included in the dataset.
Project Tasks:
Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage.
The dataset (.csv file) contains several independent variables (medical predictor variables) and a single dependent target variable (Outcome).
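A minimal pandas sketch of separating the predictors from the target. The toy rows are made up for illustration, and the column names assume the widely used Kaggle release of this dataset; in practice the frame would be loaded from the project's .csv with `pd.read_csv`.

```python
import pandas as pd

# Toy rows for illustration only (not real patient records). Column names
# follow the widely used Kaggle release of this dataset; this is an assumption.
df = pd.DataFrame({
    "Pregnancies": [2, 0, 5],
    "Glucose": [130, 95, 160],
    "BMI": [31.0, 24.5, 35.2],
    "Age": [45, 23, 52],
    "Outcome": [1, 0, 1],
})

# Independent (predictor) variables: every column except the target.
X = df.drop(columns=["Outcome"])
# Dependent (target) variable: 1 = diabetes, 0 = no diabetes.
y = df["Outcome"]

print(list(X.columns))  # ['Pregnancies', 'Glucose', 'BMI', 'Age']
print(y.tolist())       # [1, 0, 1]
```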
Other important tasks in this project are:
Data Cleaning:
Delete Redundant Columns:
I removed redundant columns to streamline the dataset.
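The source does not name the redundant columns, so the sketch below uses two hypothetical ones: an identifier column (`Id`) and an exact copy of another column (`Glucose_copy`). It shows one way to drop a known column by name and to detect value-identical columns automatically with pandas.

```python
import pandas as pd

# Toy frame; "Id" and "Glucose_copy" are hypothetical redundant columns.
df = pd.DataFrame({
    "Id": [1, 2, 3],
    "Glucose": [140, 90, 175],
    "Glucose_copy": [140, 90, 175],
    "Outcome": [1, 0, 1],
})

# 1) Drop an identifier column that carries no predictive information.
df = df.drop(columns=["Id"])

# 2) Drop any column whose values exactly repeat an earlier column.
#    Transposing turns columns into rows so duplicated() can compare them.
df = df.loc[:, ~df.T.duplicated()]

print(list(df.columns))  # ['Glucose', 'Outcome']
```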
Rename Columns:
I renamed columns to clear, descriptive names for better readability.
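Renaming can be done with a single mapping passed to `DataFrame.rename`. The new names below are hypothetical examples, not the ones used in this project; `BloodPressure` is documented for this dataset as diastolic blood pressure in mm Hg, which motivates the illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    "BloodPressure": [72, 66],
    "DiabetesPedigreeFunction": [0.63, 0.35],
})

# Map terse original names to more descriptive ones. The new names are
# hypothetical examples, not necessarily the ones used in this project.
new_names = {
    "BloodPressure": "DiastolicBP_mmHg",
    "DiabetesPedigreeFunction": "PedigreeScore",
}
df = df.rename(columns=new_names)

print(list(df.columns))  # ['DiastolicBP_mmHg', 'PedigreeScore']
```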
Drop Duplicates:
I eliminated duplicate entries to maintain data accuracy.
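A short sketch of removing fully identical rows with `DataFrame.drop_duplicates`, on a made-up frame whose third row repeats the first. Counting rows before and after makes the effect of the step easy to report.

```python
import pandas as pd

# Toy frame whose third row repeats the first row exactly.
df = pd.DataFrame({
    "Glucose": [140, 90, 140],
    "Age": [50, 31, 50],
    "Outcome": [1, 0, 1],
})

before = len(df)
# keep="first" retains the first copy of each fully identical row.
df = df.drop_duplicates(keep="first").reset_index(drop=True)
removed = before - len(df)

print(removed, "duplicate row(s) removed")  # 1 duplicate row(s) removed
```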
Clean Individual Columns:
I performed specific cleaning tasks on columns as needed.
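The source does not say which per-column cleaning was done. In the commonly used version of this dataset, a value of 0 in columns such as Glucose, BloodPressure, SkinThickness, Insulin and BMI is widely treated as a missing measurement, so the sketch below illustrates that one typical step on toy data; it is not necessarily the author's exact procedure.

```python
import pandas as pd

# Toy rows; a 0 in Glucose or BMI stands in for an unrecorded measurement.
df = pd.DataFrame({
    "Pregnancies": [0, 3, 2],
    "Glucose": [120, 0, 95],
    "BMI": [0.0, 31.2, 28.4],
    "Outcome": [0, 1, 0],
})

# A value of 0 is physiologically implausible for these measurements, so mark
# it as missing (NaN). Pregnancies = 0 is a valid value and is left unchanged.
zero_as_missing = ["Glucose", "BMI"]
df[zero_as_missing] = df[zero_as_missing].mask(df[zero_as_missing] == 0)

print(df.isna().sum().to_dict())
```

Marking these zeros as NaN first means the missing-value step that follows treats them like any other gap.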
Remove NaN Values:
I addressed missing data by removing NaN values from the dataset.
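A minimal sketch of dropping rows that contain NaN values with `DataFrame.dropna`, on made-up data. By default any row with at least one missing value is removed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Glucose": [120.0, np.nan, 95.0],
    "BMI": [30.1, 31.2, np.nan],
    "Outcome": [0, 1, 0],
})

# dropna() with the default how="any" drops every row that has at least one NaN.
clean = df.dropna().reset_index(drop=True)

print(len(df), "->", len(clean))  # 3 -> 1
```

Dropping rows is the simplest option but can discard a large share of a small dataset; imputing, for example with each column's median via `fillna`, is a common alternative that keeps every row.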
Additional Transformations:
I conducted further transformations to enhance data usability.
Data Visualization:
Visualizations of Demographic Information:
I created a dashboard displaying the demographic information.
Visualizations of Various Variables:
I created visualizations for Overtime, Marital Status, Job Role, Gender, Education Field, Department, and Business Travel.
Relations Between Variables:
I explored and visualized relationships between Overtime and Age, Total Working Years, Education Level, Number of Companies Worked, and Distance from Home.
Data Quality Enhancement:
I successfully improved data quality by removing redundant columns, renaming columns, dropping duplicates, and handling missing values.
Insightful Visualizations:
I created visually appealing and informative plots to represent correlations and relationships within the HR dataset.
Enhanced Decision-Making:
I provided actionable insights derived from the data, supporting more informed HR-related decision-making.