The HR department wants to improve employee satisfaction at the company. It collected data from employees and wants to know what is likely to make an employee leave.
In this project, I build a model that predicts whether an employee will leave the company and identify the factors that contribute to leaving. Because the data contain potentially informative outliers, some multicollinearity, and nonlinear relationships between the predictor variables and the outcome variable, I chose to build machine learning classification models rather than use logistic regression. I built and compared decision tree, random forest, and XGBoost classifiers. For code and details, see the Python notebook linked below.
Model results
The strongest decision tree model fit the data well, with an F1 score of .945 on the test data. It classifies with 98% accuracy and balances false positives and false negatives well (precision: .972, recall: .912), although false negatives are more common. It performed well on the training, validation, and test data, which suggests that overfitting is not a concern.
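As a reminder of how these scores relate to one another, the sketch below computes precision, recall, F1, and accuracy from confusion-matrix counts. The function name and the counts are hypothetical and chosen only for illustration; they are not the project's actual test-set results, which are in the notebook.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return precision, recall, F1, and accuracy from confusion-matrix counts.

    tp: employees predicted to leave who left
    fp: employees predicted to leave who stayed
    fn: employees predicted to stay who left
    tn: employees predicted to stay who stayed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        # F1 is the harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical counts (NOT the project's results): more false negatives
# than false positives, so recall comes out lower than precision.
metrics = classification_metrics(tp=90, fp=5, fn=10, tn=395)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

When false negatives outnumber false positives, as in this made-up example, recall falls below precision, which is the same pattern the model shows.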
Satisfaction scores were the strongest predictor of whether an employee would leave, followed by number of projects, evaluation scores, tenure, and average monthly hours.
The model is a good predictor of which employees are likely to leave. An employee predicted to leave is very likely to actually leave, although the model misses a small number of employees who go on to leave.
All employees with satisfaction scores near zero left. Employees with low satisfaction scores, high average monthly hours (more than 275), and high evaluation scores were particularly likely to leave. Satisfaction scores dropped sharply, and the likelihood of leaving rose dramatically, for employees with more than five projects.
Supervisors should examine employee workload. Three to five projects per employee appears optimal, and supervisors should monitor and intervene when employees work more than 250 hours per month.
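The workload guidance above could be turned into a simple screening rule. The following is a minimal sketch, not part of the project's notebook: the function name, the dictionary keys ("projects", "monthly_hours"), and the sample records are all hypothetical, and only the thresholds (three to five projects, 250 hours per month) come from the recommendation.

```python
def workload_flags(employee, min_projects=3, max_projects=5, max_monthly_hours=250):
    """Return a list of reasons an employee's workload may need review.

    `employee` is a dict with hypothetical keys "projects" and
    "monthly_hours"; an empty list means no flag was raised.
    """
    reasons = []
    if not (min_projects <= employee["projects"] <= max_projects):
        reasons.append("project count outside recommended range")
    if employee["monthly_hours"] > max_monthly_hours:
        reasons.append("monthly hours above threshold")
    return reasons


# Hypothetical records for illustration only.
staff = [
    {"name": "A", "projects": 4, "monthly_hours": 200},
    {"name": "B", "projects": 6, "monthly_hours": 280},
    {"name": "C", "projects": 5, "monthly_hours": 255},
]

for person in staff:
    flags = workload_flags(person)
    if flags:
        print(person["name"], "->", "; ".join(flags))
```

A rule like this only surfaces candidates for a conversation with a supervisor; it does not replace the model's predictions.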
The satisfaction metric should be examined to see if there is more detailed data from the measurement tool that can identify additional causes of dissatisfaction. If not, such a tool should be located or developed to get more information about potential strategies to improve satisfaction.
Strategies to increase satisfaction might have the most impact during employees' first five years, when they are more likely to leave.