Dataset: https://www.kaggle.com/blastchar/telco-customer-churn
Software: Python, SPSS
Dataset Description: 7043 rows and 21 columns; 3 columns are numeric variables and the rest are categorical variables
Identify the customers with a high probability of churning
Uncover the factors that may lead to customer churn
Provide managerial recommendations to help the company improve performance
Economic: Economic growth supports the development of the telecom industry
Social Culture: Social and cultural needs drive the growth of telecom companies
Providers: The main players in the US telecom industry are Verizon, AT&T, T-Mobile and Sprint
Subscribers: The postpaid segment accounted for more than 60% of all connections, while the prepaid segment accounted for less than 20%
Churn Customers: The annual industry churn rate was 15.9%
Target Distribution :
Most customers did not churn, so the target is imbalanced, which may affect model performance.
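A quick way to see why the imbalance matters is to compute the accuracy of always predicting the majority class. The sketch below uses a hypothetical toy target (not the real Churn column) and a helper name, `majority_baseline`, that is only an illustration.

```python
from collections import Counter

def majority_baseline(labels):
    """Return (majority_class, share_of_majority_class) for a list of labels."""
    counts = Counter(labels)
    majority_class, majority_count = counts.most_common(1)[0]
    return majority_class, majority_count / len(labels)

# Hypothetical toy target: 7 non-churners and 3 churners (assumed values).
toy_churn = ["No"] * 7 + ["Yes"] * 3
baseline_class, baseline_accuracy = majority_baseline(toy_churn)
print(baseline_class, baseline_accuracy)
```

Any model accuracy should be read against this baseline, since a model that never predicts churn already reaches the majority share.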
Categorical Variables:
Based on a chi-square test, Phone Service and Gender do not have much impact on the target.
Customers who churn are mostly not senior citizens, pay by check more often, and are on month-to-month contracts.
For services, customers who churn tend to use Fiber optic Internet Service and single-line phone service.
They are less likely to use other internet services such as online backup, but they do use streaming TV and streaming movies.
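The chi-square test of independence mentioned above can be sketched as follows. This is a plain Pearson statistic without continuity correction, written with the standard library only; the two contingency tables are hypothetical counts, not values from the dataset, and the p-value helper is valid only for 1 degree of freedom (a 2 x 2 table).

```python
import math

def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

def p_value_one_dof(statistic):
    """Upper-tail p-value of a chi-square statistic with exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))

# Hypothetical counts: rows are categories, columns are (no churn, churn).
weak_table = [[50, 50], [52, 48]]    # a variable with little relation to churn
strong_table = [[90, 10], [40, 60]]  # a variable with a clear relation to churn

weak_stat, weak_dof = chi_square_statistic(weak_table)
strong_stat, strong_dof = chi_square_statistic(strong_table)
weak_p = p_value_one_dof(weak_stat)
strong_p = p_value_one_dof(strong_stat)
print(round(weak_p, 3), strong_p < 0.05)
```

A large p-value, as for the first toy table, is the pattern reported for Phone Service and Gender.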
Numerical Variables:
From the boxplots, tenure, monthly charges and total charges all have an impact on churn. But based on domain knowledge, total charges approximately equal monthly charges multiplied by tenure, so total charges should be excluded from the analysis.
Most customers who churn have shorter tenure and higher monthly charges.
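The redundancy between total charges, monthly charges and tenure noted above can be checked row by row. The sketch below uses hypothetical rows and a hypothetical helper, `relative_gap`; it assumes total charges are greater than zero.

```python
def relative_gap(tenure_months, monthly_charge, total_charge):
    """Relative difference between total charges and tenure x monthly charges.

    Assumes total_charge > 0.
    """
    implied_total = tenure_months * monthly_charge
    return abs(total_charge - implied_total) / total_charge

# Hypothetical rows: (tenure in months, monthly charges, total charges).
toy_charge_rows = [(10, 50.0, 510.0), (24, 80.0, 1900.0), (2, 30.0, 61.0)]
gaps = [relative_gap(*row) for row in toy_charge_rows]
print([round(g, 3) for g in gaps])
```

If the gaps stay small across the data, total charges add little beyond the other two variables.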
RFM segmentation uses tenure as frequency and monthly charges as monetary value.
Since the dataset has no information about recency, only frequency and monetary value are used to segment customers initially.
From the barplots, tenure is negatively associated with churn: shorter tenure comes with a higher probability of churn. Higher monthly charges are also associated with a higher probability of churn.
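The frequency and monetary segmentation described above can be sketched by splitting customers on tenure and monthly charges and comparing churn rates per segment. The median cut points, the segment names and the toy customers are all assumptions for illustration; the original analysis may have used different bins.

```python
from statistics import median

# Hypothetical toy customers: (tenure in months, monthly charges, churned?).
toy_customers = [
    (2, 90.0, True), (5, 85.0, True), (3, 30.0, False), (8, 25.0, True),
    (40, 95.0, False), (60, 100.0, True), (55, 20.0, False), (70, 35.0, False),
]

def segment_churn_rates(customers):
    """Split on median tenure and median monthly charges; return churn rate per segment."""
    tenure_cut = median(t for t, _, _ in customers)
    charge_cut = median(m for _, m, _ in customers)
    counts = {}  # segment -> (number of customers, number of churners)
    for tenure, charge, churned in customers:
        segment = (
            "long tenure" if tenure > tenure_cut else "short tenure",
            "high charges" if charge > charge_cut else "low charges",
        )
        n, k = counts.get(segment, (0, 0))
        counts[segment] = (n + 1, k + int(churned))
    return {segment: k / n for segment, (n, k) in counts.items()}

rates = segment_churn_rates(toy_customers)
for segment, rate in sorted(rates.items()):
    print(segment, rate)
```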
Result:
Based on the decision tree, mainly two kinds of customers are more likely to churn:
Month-to-month contract, using Fiber optic, with tenure less than 17.5 months
Month-to-month contract, using DSL or no internet service, with tenure less than 3.5 months and no tech support
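The two segments above can be written as a simple rule for flagging customers. This is only a restatement of the listed splits, not the fitted tree itself; the function name and the category strings ("Month-to-month", "Fiber optic", "Yes") are assumptions about how the columns are encoded.

```python
def is_high_risk(contract, internet_service, tenure_months, tech_support):
    """Return True if a customer falls into one of the two churn-prone segments."""
    if contract != "Month-to-month":
        return False
    if internet_service == "Fiber optic":
        return tenure_months < 17.5
    # Remaining cases: DSL or no internet service.
    return tenure_months < 3.5 and tech_support != "Yes"

print(is_high_risk("Month-to-month", "Fiber optic", 10, "No"))
print(is_high_risk("Month-to-month", "DSL", 2, "Yes"))
```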
Accuracy and Utility:
Accuracy is about 80%, and the model is more accurate in predicting non-churn customers because of the imbalanced target distribution.
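The gap between overall accuracy and accuracy within each class can be made explicit as below. The labels and predictions are hypothetical and chosen only so that overall accuracy comes out at 80%; they are not the model's actual output.

```python
def per_class_accuracy(actual, predicted):
    """Overall accuracy plus the share of correct predictions within each actual class."""
    overall = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
    by_class = {}
    for label in set(actual):
        pairs = [(a, p) for a, p in zip(actual, predicted) if a == label]
        by_class[label] = sum(a == p for a, p in pairs) / len(pairs)
    return overall, by_class

# Hypothetical toy data: 8 non-churners and 2 churners.
toy_actual = ["No"] * 8 + ["Yes"] * 2
toy_predicted = ["No"] * 7 + ["Yes"] + ["Yes"] + ["No"]

overall_accuracy, class_accuracy = per_class_accuracy(toy_actual, toy_predicted)
print(overall_accuracy, class_accuracy)
```

In this toy case the overall figure hides that only half of the churners are identified.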
Overfitting:
The lift chart shows that the trend is nearly the same for the training sample and the test sample, so there is no evidence of overfitting in the decision tree.
Logistic Regression Model Result:
In the logistic regression, the relationship between monthly charges and churn is negative, which does not fit the results from RFM and exploratory data analysis.
But the extreme relationship between internet service and churn is strange. From the boxplot, internet service and monthly charges are highly correlated.
So the relationship between internet service and churn may be absorbing some of the information about the relationship between monthly charges and churn.
We still think monthly charges are positively associated with churn: the more a customer spends per month, the higher the probability that the customer will churn.
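One way to look at the overlap between internet service and monthly charges described above is to compare average monthly charges per service type. When two predictors carry overlapping information, the coefficient of one can change sign once the other is in the model. The sketch below uses hypothetical toy rows and assumed category labels.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy rows: (internet service, monthly charges).
toy_service_rows = [
    ("Fiber optic", 95.0), ("Fiber optic", 85.0),
    ("DSL", 55.0), ("DSL", 60.0),
    ("No", 20.0), ("No", 25.0),
]

def mean_charge_by_service(rows):
    """Average monthly charges for each internet service category."""
    groups = defaultdict(list)
    for service, charge in rows:
        groups[service].append(charge)
    return {service: mean(charges) for service, charges in groups.items()}

charge_means = mean_charge_by_service(toy_service_rows)
print(charge_means)
```

Clearly separated group means, as in this toy data, would indicate that service type largely determines the monthly charge.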
Model Utility:
The differences in accuracy and ROC are small.
No evidence of overfitting in this case.
Banana Graph and Cum Lift:
Compared with the benchmark, the model is only slightly better than a random guess for most customers, based on the banana graph.
The cum lift graph shows that customers with lower churn probabilities are only slightly better than those who are more likely to churn.
This depends on the customer characteristics of the telecommunications industry.
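Assuming the banana graph refers to a cumulative gains chart, both it and the cumulative lift can be computed by ranking customers by predicted churn probability. The scores, labels, bin count and function name below are hypothetical.

```python
def cumulative_gains_and_lift(scores, actual, n_bins=5):
    """Rank customers by score (highest first) and return, per bin, a tuple of
    (share of customers targeted, share of churners captured, cumulative lift)."""
    ranked = [a for _, a in sorted(zip(scores, actual), key=lambda pair: -pair[0])]
    total_churners = sum(ranked)
    n = len(ranked)
    results = []
    for b in range(1, n_bins + 1):
        cutoff = round(n * b / n_bins)
        targeted = cutoff / n
        captured = sum(ranked[:cutoff]) / total_churners
        results.append((targeted, captured, captured / targeted))
    return results

# Hypothetical predicted probabilities and churn labels (1 = churn).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

lift_table = cumulative_gains_and_lift(toy_scores, toy_labels)
for targeted, captured, lift in lift_table:
    print(round(targeted, 2), round(captured, 2), round(lift, 2))
```

A random guess has a lift of 1 at every depth, so the benchmark comparison is how far each row sits above 1.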
What are the demographics of customers who are more likely to churn?
Those who are more likely to churn:
Shorter tenure with higher monthly charges
Senior citizens without a partner or dependents, on a month-to-month contract, with paper billing and paying by check
Using Fiber optic just for streaming TV and movies
Conclusion:
Customers who churn have shorter tenure and higher monthly charges.
Tenure, Monthly Charges and Tech Support are important in predicting churn.
Churn customers pay more for Fiber optic and use fewer services.
Customers are not sensitive to service, so the company can focus on its existing customers.
Limitations:
Imbalanced target
Different accuracy in predicting churn customers and non-churn customers
Time-scale effects, such as seasonality
Future Work:
Resampling or collecting more data
Using a different cutoff
Include the cost of acquiring a new customer versus retaining an existing customer
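The effect of using a different cutoff can be sketched by comparing recall and precision for churners at two thresholds. The predicted probabilities, labels and the cutoff values 0.5 and 0.3 are hypothetical choices for illustration.

```python
def recall_and_precision(scores, actual, cutoff):
    """Flag customers whose predicted churn probability is >= cutoff,
    then return (recall, precision) for the churn class."""
    flagged = [s >= cutoff for s in scores]
    true_pos = sum(1 for f, a in zip(flagged, actual) if f and a)
    recall = true_pos / sum(actual)
    precision = true_pos / sum(flagged) if any(flagged) else 0.0
    return recall, precision

# Hypothetical predicted probabilities and churn labels (1 = churn).
toy_scores = [0.9, 0.7, 0.45, 0.4, 0.35, 0.3, 0.2, 0.15, 0.1, 0.05]
toy_actual = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

default_recall, default_precision = recall_and_precision(toy_scores, toy_actual, 0.5)
lower_recall, lower_precision = recall_and_precision(toy_scores, toy_actual, 0.3)
print(default_recall, default_precision)
print(lower_recall, round(lower_precision, 3))
```

Lowering the cutoff catches more churners at the price of more false alarms, which is where the relative cost of acquiring versus retaining a customer would decide the preferred threshold.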