Customer churn is a term used when a customer decides to stop using the services of the business. Businesses do customer churn analysis all the time because it is very helpful for a company if they learn which customers are about to leave.
This aim of this project is to train a machine learning model on the available data to train a machine learning model that will predict with a high accuracy which customers are about to churn, which in turn will help the business owner in making useful marketing decisions.
You can find the original project details on Kaggle: Dataset.
Data Fields are as shown below:
1) State, string - 2 letter code of the US state of customer residence.
2) Account Length, numerical - Number of months the customer has been with the current telco provider.
3) Area_code, string="area_code_AAA" where AAA = 3 digit area code.
4) International_plan, (yes/no) - The customer has international plan.
5) Voice_mail_plan, (yes/no) - The customer has voice mail plan.
6) Number_vmail_messages, numerical - Number of voice-mail messages.
7) Total_day_minutes, numerical - Total minutes of day calls.
8) Total_day_calls, numerical - Total number of day calls.
9) Total_day_charge, numerical - Total charge of day calls.
10) Total_eve_minutes, numerical - Total minutes of evening calls.
11) Total_eve_calls, numerical - Total number of evening calls.
12) Total_eve_charge, numerical - Total charge of evening calls.
13) Total_night_minutes, numerical - Total minutes of night calls.
14) Total_night_calls, numerical - Total number of night calls.
15) Total_night_charge, numerical - Total charge of night calls.
16) Total_intl_minutes, numerical - Total minutes of international calls.
17) Total_intl_calls, numerical - Total number of international calls.
18) Total_intl_charge, numerical - Total charge of international calls
19) Number_customer_service_calls - numerical. Number of calls to customer service
20) Churn, (yes/no). Customer churn - target variable.
in the exploratory data analysis part some of the main findings were :
firstly - target variables were imbalanced so this needed to be taken care of while building the models
Higher number of samples are churning when the day_charge is higher then 40.
Most of the churned_samples can be seen when the number of voice mail messages is higher than 15
as the customers may get annoyed if they were sent a lot of mails
then we found out that some features were closely correlated so we removed it in order to eliminate redundancy