There are no missing values in the dataset
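A check of this kind can be sketched with pandas as follows. The small DataFrame is a made-up stand-in for the real dataset, which would normally be loaded from file, and the variable names are illustrative.

```python
import pandas as pd

# Hypothetical stand-in rows; the real dataset would be loaded from file instead.
df = pd.DataFrame({
    "Age": [41, 49, 37, 33],
    "MonthlyIncome": [5993, 5130, 2090, 2909],
    "Attrition": ["Yes", "No", "Yes", "No"],
})

# Count missing values per column, then across the whole table.
missing_per_column = df.isnull().sum()
total_missing = int(missing_per_column.sum())

print(missing_per_column)
print("Total missing values:", total_missing)
```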
Exploratory Data Analysis
Histogram Plots
Features such as MonthlyIncome, TotalWorkingYears and YearsAtCompany have heavy-tailed distributions.
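One way to back up a visual impression of a heavy tail is to compute the skewness of each column. The sketch below uses synthetic data (log-normal draws standing in for an income-like column), so the numbers say nothing about the real dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic, illustrative data only: log-normal draws mimic a long right tail,
# normal draws mimic a roughly symmetric column.
df = pd.DataFrame({
    "MonthlyIncome": rng.lognormal(mean=8.5, sigma=0.6, size=1000),
    "Age": rng.normal(loc=37, scale=9, size=1000),
})

# Skewness well above 0 indicates a long right tail.
skewness = df.skew()
print(skewness)

# df.hist(bins=30) would draw the histograms themselves (requires matplotlib).
```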
Statistical Analysis
Statistical Description of employees who left
Statistical Description of employees who stayed
The mean age of employees who stayed is higher than that of those who left
The daily rate of employees who stayed is higher
Employees who stayed live closer to the workplace (lower distance from home)
Employees who stayed are more satisfied with their jobs
Employees who stayed tend to have a higher stock option level
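The comparison above can be sketched by splitting the data on the Attrition column and describing each group. The rows below are invented for illustration and the names left_df and stayed_df are assumptions, not taken from the original notebook.

```python
import pandas as pd

# Made-up rows for illustration; values are not taken from the real dataset.
df = pd.DataFrame({
    "Attrition": ["Yes", "No", "No", "Yes", "No", "No"],
    "Age": [28, 41, 38, 31, 45, 36],
    "DistanceFromHome": [20, 3, 8, 15, 2, 10],
    "JobSatisfaction": [1, 4, 3, 2, 4, 3],
})

# Split into employees who left and employees who stayed.
left_df = df[df["Attrition"] == "Yes"]
stayed_df = df[df["Attrition"] == "No"]

print(left_df.describe())
print(stayed_df.describe())

# Side-by-side group means make the differences easier to read.
numeric_cols = ["Age", "DistanceFromHome", "JobSatisfaction"]
group_means = df.groupby("Attrition")[numeric_cols].mean()
print(group_means)
```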
Correlation Plot
Job level is strongly correlated with total working years
Monthly income is strongly correlated with Job level
Monthly income is strongly correlated with total working years
Age is strongly correlated with monthly income
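Correlations like these are usually read off a pairwise correlation matrix. The sketch below builds synthetic columns in which income is deliberately tied to job level, so the strong correlation is by construction and only demonstrates the mechanics.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 500

# Synthetic data: job level depends on working years, income depends on job level.
total_working_years = rng.integers(0, 40, size=n)
job_level = np.clip(1 + total_working_years // 8 + rng.integers(-1, 2, size=n), 1, 5)
monthly_income = 2000 + 3500 * job_level + rng.normal(0, 800, size=n)

df = pd.DataFrame({
    "TotalWorkingYears": total_working_years,
    "JobLevel": job_level,
    "MonthlyIncome": monthly_income,
})

# Pearson correlation between every pair of numeric columns.
corr = df.corr()
print(corr.round(2))

# seaborn.heatmap(corr, annot=True) is one common way to plot this matrix.
```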
Count plot
Most of the employees leaving the company are in the 26-35 age group.
There are more sales executives and research scientists in the company
Married and divorced employees are less likely to leave than single ones
Employees at the entry job levels (1 and 2), who have less experience, leave more often
Employees with a higher job involvement level are also leaving the company
Less involved employees tend to leave the company
Laboratory technicians are leaving more than other roles
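The numbers behind a count plot can be tabulated with a cross-tabulation of a categorical feature against Attrition. The rows below are invented, so the rates shown illustrate the method rather than the dataset.

```python
import pandas as pd

# Invented rows for illustration only.
df = pd.DataFrame({
    "MaritalStatus": ["Single", "Single", "Married", "Married",
                      "Divorced", "Single", "Married", "Divorced"],
    "Attrition": ["Yes", "Yes", "No", "No", "No", "No", "Yes", "No"],
})

# Raw counts per category (what a count plot with hue="Attrition" would show).
counts = pd.crosstab(df["MaritalStatus"], df["Attrition"])

# Row-normalised shares: the attrition rate within each category.
rates = pd.crosstab(df["MaritalStatus"], df["Attrition"], normalize="index")

print(counts)
print(rates.round(2))
```

Comparing rates rather than raw counts avoids mistaking a large group for a group that leaves more often.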
KDE Plots
Employees who live far away tend to leave
Employees with less experience tend to leave
Employees who had spent less time under their current manager have left
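A kernel density estimate for each group can be computed directly, which is roughly what a KDE plot draws. This is a minimal sketch using scipy's gaussian_kde on synthetic values; the group means are assumptions chosen so that the two curves separate.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)

# Synthetic feature values (assumption): leavers are drawn with a larger mean.
left = rng.normal(loc=15, scale=4, size=200)
stayed = rng.normal(loc=6, scale=4, size=800)

# Evaluate each group's density estimate on a shared grid.
grid = np.linspace(-10, 35, 451)
kde_left = gaussian_kde(left)(grid)
kde_stayed = gaussian_kde(stayed)(grid)

# Location of each curve's peak.
peak_left = grid[np.argmax(kde_left)]
peak_stayed = grid[np.argmax(kde_stayed)]
print("Peak (left):", peak_left, "Peak (stayed):", peak_stayed)
```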
Box Plots
Female employees earn more than their male counterparts
Research Scientists and Laboratory Technicians earn about the same amount monthly
Managers and Directors earn more than other roles
Sales Executives and Healthcare Representatives earn more than Research Scientists and Sales Representatives
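The quantities a box plot displays (quartiles and median) can also be tabulated per group, which makes statements such as "earn about the same" easier to verify. The incomes below are invented for illustration.

```python
import pandas as pd

# Invented incomes for illustration only.
df = pd.DataFrame({
    "JobRole": ["Manager", "Manager",
                "Research Scientist", "Research Scientist",
                "Laboratory Technician", "Laboratory Technician"],
    "MonthlyIncome": [17000, 16500, 3200, 3300, 3100, 3400],
})

# Quartiles per role: the box edges and the median line of a box plot.
summary = (
    df.groupby("JobRole")["MonthlyIncome"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)

# Median income per role, highest first.
medians = df.groupby("JobRole")["MonthlyIncome"].median().sort_values(ascending=False)

print(summary)
print(medians)
```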
Data Pre-Processing
1) Encoding the categorical variables using One-Hot Encoding Technique
2) Scaling the independent variables using Min Max Scaling
3) Splitting the dataset into training and testing sets with a ratio of 0.75
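The three steps above can be sketched with scikit-learn as follows. The toy rows and column choices are assumptions, and the sketch reads the ratio of 0.75 as the training fraction, which the source does not state explicitly.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Toy rows for illustration only.
df = pd.DataFrame({
    "Department": ["Sales", "R&D", "R&D", "HR", "Sales", "R&D", "HR", "Sales"],
    "Age": [41, 49, 37, 33, 27, 32, 59, 30],
    "MonthlyIncome": [5993, 5130, 2090, 2909, 3468, 3068, 2670, 2693],
    "Attrition": [1, 0, 0, 1, 0, 0, 1, 0],
})
y = df["Attrition"].to_numpy()

# 1) One-hot encode the categorical column.
encoder = OneHotEncoder(handle_unknown="ignore")
X_cat = encoder.fit_transform(df[["Department"]]).toarray()

# Combine encoded categories with the numeric columns.
X = np.hstack([X_cat, df[["Age", "MonthlyIncome"]].to_numpy()])

# 2) Scale every feature to the [0, 1] range.
X_scaled = MinMaxScaler().fit_transform(X)

# 3) Split; train_size=0.75 is an assumed reading of "ratio of 0.75".
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, train_size=0.75, random_state=0
)
print(X_train.shape, X_test.shape)
```

The sketch follows the order listed above and scales before splitting. Fitting the scaler on the training split only and reusing it on the test split is a common alternative that keeps test-set ranges out of training.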
Building ML Classifiers
Logistic Regression
Accuracy: 91%
Precision: 90%
Recall: 91%
F1-Score: 90%
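Metrics of this kind can be computed as in the sketch below, which trains a logistic regression on synthetic data from make_classification, so the scores will not match the figures above. Weighted averaging is an assumption: the source does not say how precision, recall and F1 were averaged, although recall equal to accuracy is what weighted averaging produces.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary data standing in for the attrition dataset.
X, y = make_classification(
    n_samples=1000, n_features=20, weights=[0.84, 0.16], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.75, random_state=0
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
# Weighted averaging (assumed): per-class scores weighted by class support.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="weighted", zero_division=0
)
print(f"Accuracy: {accuracy:.2f}  Precision: {precision:.2f}  "
      f"Recall: {recall:.2f}  F1: {f1:.2f}")
```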
Random Forest
Accuracy: 89%
Precision: 88%
Recall: 89%
F1-Score: 86%
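A random forest can be fitted in the same way. The sketch below again uses synthetic data and assumed hyperparameters (100 trees, fixed seed), since the source does not list the settings used; it also prints feature importances, which tree ensembles expose.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary data standing in for the attrition dataset.
X, y = make_classification(
    n_samples=1000, n_features=20, weights=[0.84, 0.16], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.75, random_state=0
)

# Hyperparameters are assumptions, not the source's settings.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
y_pred = forest.predict(X_test)

# Per-class and averaged precision, recall and F1 in one report.
print(classification_report(y_test, y_pred, zero_division=0))

# Relative contribution of each feature to the forest's splits.
importances = forest.feature_importances_
print(importances.round(3))
```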
Artificial Neural Network (ANN)
Accuracy: 86%
Precision: 87%
Recall: 86%
F1-Score: 87%
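The source does not say which framework or architecture the ANN used. As a stand-in, the sketch below uses scikit-learn's MLPClassifier with two assumed hidden layers on synthetic data, preceded by Min Max scaling as in the pre-processing steps.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic, imbalanced binary data standing in for the attrition dataset.
X, y = make_classification(
    n_samples=1000, n_features=20, weights=[0.84, 0.16], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.75, random_state=0
)

# Layer sizes and iteration count are assumptions, not the source's settings.
ann = make_pipeline(
    MinMaxScaler(),
    MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=300, random_state=0),
)
ann.fit(X_train, y_train)
y_pred = ann.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
```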