Customer Churn Prediction Project 📊
Customer churn refers to customers discontinuing their relationship with a company or its services/products, and it is measured by the churn rate.
Businesses use strategies such as improved customer service and loyalty programs to reduce churn, and they use customer feedback to improve the customer experience.
I will use a dataset to develop tree-based models for predicting churn in the telecommunications sector.
Management Goals:
Develop ML models for predicting churn.
Precision of churn predictions should be no less than 50%.
Keep model training efficient for quick adaptation to changes.
Maximize recall (sensitivity) for the Churn = Yes class.
Dataset Description:
Includes whether each customer churned recently, the services they subscribe to, their account information, and demographics.
Services include phone, internet, security, tech support, and streaming TV.
Account info covers customer tenure, contract, billing method, and charges.
Demographic data includes gender, age range, and family status.
pandas: Used for data manipulation and analysis. It provides data structures like DataFrame and Series, which are widely used in data science workflows.
numpy: Used for numerical computing. It provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays.
random: Provides tools for generating random numbers and performing random selections.
seaborn: Built on top of matplotlib, seaborn provides a high-level interface for drawing attractive and informative statistical graphics.
matplotlib.pyplot: A plotting library for creating static, interactive, and animated visualizations in Python.
math: Provides mathematical functions and constants.
sklearn.metrics: Contains functions for evaluating the performance of machine learning models, such as classification reports, confusion matrices, and ROC curves.
sklearn.ensemble: Contains classes for implementing ensemble learning methods such as Bagging, Random Forests, AdaBoost, and Gradient Boosting.
I like to inspect a random sample of rows in transposed form, so that every feature is listed as a row and the variation in its values is easy to see.
First, I split the features in my data into continuous, categorical, and target variable types.
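A minimal sketch of both steps with pandas, assuming Telco-style column names (`customerID`, `tenure`, `MonthlyCharges`, `Churn`, and so on); the toy rows are hypothetical stand-ins for the real dataset, and the dtype-based split is one convenient way to group columns, not necessarily the one used in the notebook.

```python
import pandas as pd

# Hypothetical toy rows; the real project loads the full churn dataset instead.
df = pd.DataFrame({
    "customerID": ["0001", "0002", "0003", "0004", "0005", "0006"],
    "gender": ["Female", "Male", "Male", "Female", "Male", "Female"],
    "Contract": ["Month-to-month", "One year", "Two year",
                 "Month-to-month", "Month-to-month", "One year"],
    "tenure": [1, 34, 45, 2, 8, 22],
    "MonthlyCharges": [29.85, 56.95, 42.30, 70.70, 99.65, 89.10],
    "Churn": ["No", "No", "No", "Yes", "Yes", "No"],
})

# Inspect a random sample transposed: each feature becomes a row,
# so every column stays visible even when there are many of them.
print(df.sample(n=3, random_state=42).T)

# Split columns into target, continuous, and categorical groups by dtype.
target = "Churn"
id_cols = ["customerID"]  # identifiers are not predictors
features = df.drop(columns=[target] + id_cols)
continuous_cols = features.select_dtypes(include="number").columns.tolist()
categorical_cols = features.select_dtypes(exclude="number").columns.tolist()

print("Continuous:", continuous_cols)
print("Categorical:", categorical_cols)
```

Note that a 0/1-coded flag such as a senior-citizen column would land in the continuous group under a purely dtype-based split, which is one reason such a column's type needs correcting.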
I observed inconsistent data types in the 'Total Charges' and 'Senior Citizens' columns and corrected them.
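One way to make these corrections, sketched with pandas under the assumption that the charges column arrives as text (with blanks) and the senior-citizen flag arrives as 0/1 integers. The column names `TotalCharges` and `SeniorCitizen` and the toy values are assumptions, not taken from the notebook.

```python
import pandas as pd

df = pd.DataFrame({
    "SeniorCitizen": [0, 1, 0, 0],
    "TotalCharges": ["29.85", "1889.5", " ", "108.15"],  # stored as text, one blank
})

# Non-numeric entries (e.g. a blank string) are coerced to NaN instead of raising.
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")

# The 0/1 flag is a category, not a quantity: map it to labels and mark it categorical.
df["SeniorCitizen"] = df["SeniorCitizen"].map({0: "No", 1: "Yes"}).astype("category")

print(df.dtypes)
```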
Percentage Missing
This code snippet first converts irregular or non-numeric entries in the dataset to NaN (Not a Number) and then calculates the percentage of missing values after the conversion.
It looks like the dataset has a negligible amount of missing values, and on further investigation I believe I can delete those rows without affecting my prediction model.
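A short pandas sketch of the missing-value check and the row deletion; the toy frame (with one missing charge) is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "tenure": [1, 34, 2, 45, 0],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30, 20.25],
    "TotalCharges": [29.85, 1889.5, 108.15, 1840.75, np.nan],
})

# Percentage of missing values per column.
missing_pct = df.isna().mean().mul(100).round(2)
print(missing_pct)

# Drop the (few) incomplete rows and renumber the index.
rows_before = len(df)
df = df.dropna().reset_index(drop=True)
print(f"Dropped {rows_before - len(df)} of {rows_before} rows")
```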
Phone service => I noticed a minority class in this predictor: customers with no phone service.
How I will address this: Feature engineering: since the Multiple Lines predictor also captures this information, I can safely remove Phone Service from my list of predictors.
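A quick way to confirm the redundancy before dropping the column, assuming (as in the widely used Telco churn data) that `MultipleLines` has a `No phone service` level. The column names and toy rows are assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "PhoneService": ["No", "Yes", "Yes", "No", "Yes"],
    "MultipleLines": ["No phone service", "No", "Yes", "No phone service", "No"],
})

# Cross-tabulate: every PhoneService == "No" row should map to "No phone service".
print(pd.crosstab(df["PhoneService"], df["MultipleLines"]))

# PhoneService is redundant if it can be recovered exactly from MultipleLines.
is_redundant = (
    (df["MultipleLines"] == "No phone service") == (df["PhoneService"] == "No")
).all()

if is_redundant:
    df = df.drop(columns=["PhoneService"])
print(df.columns.tolist())
```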
Senior citizen => I noticed a minority class in this predictor: senior citizens.
How I will address this:
No change: I will not touch this feature, as it is an integral part of the dataset and the model may be able to tell us more about its impact.
If churn differs depending on whether a customer is a senior citizen, I will model data for senior citizens separately to identify trends for that particular cohort.
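A minimal sketch of how the minority share and the per-cohort churn rate could be measured before deciding on a separate senior-citizen model; the column names and toy values are assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "SeniorCitizen": ["No", "No", "Yes", "No", "No", "Yes", "No", "No"],
    "Churn":         ["No", "Yes", "Yes", "No", "No", "No", "No", "Yes"],
})

# Share of each cohort: a small "Yes" share marks the minority class.
cohort_share = df["SeniorCitizen"].value_counts(normalize=True)

# Churn rate within each cohort; a large gap would justify modelling seniors separately.
churn_rate = df["Churn"].eq("Yes").groupby(df["SeniorCitizen"]).mean()

print(cohort_share)
print(churn_rate)
```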
The charts help visualize the distribution of feature values.
Categorical data distribution
Continuous data distribution
Target Class imbalance
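The target class imbalance can also be quantified directly rather than read off a chart. A small sketch, assuming a `Churn` column with Yes/No labels; the 73/27 split below is made-up toy data, not the project's actual distribution.

```python
import pandas as pd

churn = pd.Series(["No"] * 73 + ["Yes"] * 27, name="Churn")  # hypothetical labels

counts = churn.value_counts()
shares = churn.value_counts(normalize=True).mul(100).round(1)
imbalance_ratio = counts["No"] / counts["Yes"]

print(pd.DataFrame({"count": counts, "percent": shares}))
print(f"Majority-to-minority ratio: {imbalance_ratio:.2f}")
```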
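Due to high multicollinearity, I will drop Total Charges from my dataset: it is largely determined by tenure multiplied by monthly charges, so it adds little information beyond those two features.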
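A sketch of how the multicollinearity can be checked with a correlation matrix. The data are synthetic and rest on an assumption: total charges are generated as roughly tenure times monthly charges plus noise, and the column names are assumed.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1_000
tenure = rng.integers(1, 73, size=n)          # months as a customer
monthly = rng.uniform(20, 120, size=n)        # monthly charge
# Synthetic assumption: total is about tenure x monthly, plus small noise.
total = tenure * monthly + rng.normal(0, 50, size=n)

df = pd.DataFrame({"tenure": tenure, "MonthlyCharges": monthly, "TotalCharges": total})

# Pairwise Pearson correlations between the continuous features.
corr = df.corr()
print(corr.round(2))

# Drop the feature that is largely a function of the others.
df = df.drop(columns=["TotalCharges"])
```

A variance inflation factor (VIF) analysis would be a more formal alternative; the correlation matrix is simply the quickest first look.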
Note: I have dropped Phone Service and Total Charges based on the reasoning above.
The process involves several steps:
Label Encoding: Certain categorical columns need to be transformed into numerical values for machine learning algorithms. The LabelEncoder from Scikit-Learn is utilized for this task.
One-Hot Encoding: Another approach for handling categorical variables is one-hot encoding. This code snippet employs OneHotEncoder to convert categorical variables into binary vectors.
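Standard Scaling: Scaling puts numerical features on a comparable scale so that no feature dominates merely because of its units. The StandardScaler standardizes features by removing the mean and scaling to unit variance. (Tree-based models are insensitive to this kind of monotonic rescaling, so the step matters mainly for scale-sensitive algorithms.)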
Data Splitting: The dataset is split into training and testing sets using the train_test_split function from Scikit-Learn.
Once preprocessing is complete, the code prints a message after each step so that the progress of the pipeline is easy to follow.
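The steps above can be sketched with scikit-learn as follows. Which columns get label encoding versus one-hot encoding is an assumption here (binary columns and the target are label-encoded, the multi-level `Contract` column is one-hot encoded), and the toy data are hypothetical. One deliberate choice of this sketch: it splits before fitting the encoder and scaler, so they learn only from training data and no test-set statistics leak into training.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler

# Hypothetical toy data with assumed column names.
df = pd.DataFrame({
    "Partner": ["Yes", "No", "No", "Yes", "No", "Yes", "No", "Yes", "No", "No"],
    "Contract": ["Month-to-month", "One year", "Two year", "Month-to-month", "Month-to-month",
                 "One year", "Two year", "Month-to-month", "One year", "Month-to-month"],
    "tenure": [1, 34, 45, 2, 8, 22, 60, 5, 30, 3],
    "MonthlyCharges": [29.9, 57.0, 42.3, 70.7, 99.7, 89.1, 25.0, 80.0, 55.5, 95.0],
    "Churn": ["No", "No", "No", "Yes", "Yes", "No", "No", "Yes", "No", "Yes"],
})

# Label encoding: binary columns and the target become 0/1 ("No" -> 0, "Yes" -> 1).
for col in ["Partner", "Churn"]:
    df[col] = LabelEncoder().fit_transform(df[col])
print("Label encoding done")

X, y = df.drop(columns=["Churn"]), df["Churn"]

# Stratified split keeps the churn ratio similar in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)
print("Train/test split done")

# One-hot encode the multi-level categorical; standardize the continuous columns.
preprocess = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), ["Contract"]),
     ("scale", StandardScaler(), ["tenure", "MonthlyCharges"])],
    remainder="passthrough",   # keeps the label-encoded Partner column
    sparse_threshold=0.0,      # always return a dense array
)
X_train_prep = preprocess.fit_transform(X_train)
X_test_prep = preprocess.transform(X_test)
print("One-hot encoding and scaling done")
```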
Fit the data to a RandomForestClassifier. 🌳 The RandomForestClassifier is trained with the specified parameters to classify the data.
Features are ranked based on importance. Important features are selected and only these are used further in model development.
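A sketch of importance-based ranking and selection on synthetic data. The notebook does not state its selection rule, so the "keep the top features that together cover about 90% of total importance" cut-off below is a hypothetical choice; scikit-learn's `SelectFromModel` is another common option.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the preprocessed churn features (imbalanced classes).
X, y = make_classification(n_samples=500, n_features=10, n_informative=3,
                           n_redundant=0, weights=[0.73], random_state=0)
feature_names = [f"f{i}" for i in range(X.shape[1])]

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Rank features by impurity-based importance (the importances sum to 1).
importances = pd.Series(rf.feature_importances_, index=feature_names)
importances = importances.sort_values(ascending=False)
print(importances.round(3))

# Hypothetical cut-off: smallest set of top features covering >= 90% of importance.
cumulative = importances.cumsum()
selected = cumulative.index[: (cumulative < 0.90).sum() + 1].tolist()
print("Selected:", selected)

# Keep only the selected columns for further model development.
X_selected = X[:, [feature_names.index(f) for f in selected]]
```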
A precision-recall curve is used to evaluate the model at different classification thresholds.
(I chose the precision-recall curve over the ROC curve because of the class imbalance: with many more non-churners than churners, the false-positive rate on the ROC curve stays low even when many churn predictions are wrong.)
Based on the curve, I select a threshold of 0.3, which keeps the precision of churn predictions above 50% while maximizing recall (sensitivity).
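The threshold rule can be automated with scikit-learn's `precision_recall_curve`: among all thresholds whose precision meets the 50% floor, take the one with the highest recall. The helper name `pick_threshold` and the toy scores are hypothetical; in the project the scores would come from the fitted model's `predict_proba`.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve


def pick_threshold(y_true, y_score, min_precision=0.5):
    """Return (threshold, precision, recall) with max recall subject to precision >= floor."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    # The final (precision=1, recall=0) point has no threshold; drop it to align arrays.
    precision, recall = precision[:-1], recall[:-1]
    candidates = np.flatnonzero(precision >= min_precision)
    if candidates.size == 0:
        raise ValueError("No threshold reaches the requested precision")
    best_recall = recall[candidates].max()
    # Among ties on recall, take the highest threshold: same recall, fewer false positives.
    best = candidates[recall[candidates] == best_recall][-1]
    return thresholds[best], precision[best], recall[best]


# Hypothetical labels and predicted churn probabilities.
y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1, 0, 0])
y_score = np.array([0.10, 0.20, 0.30, 0.35, 0.40, 0.50, 0.60, 0.70, 0.80, 0.05])

thr, p, r = pick_threshold(y_true, y_score)
print(f"threshold={thr:.2f} precision={p:.2f} recall={r:.2f}")
# → threshold=0.30 precision=0.57 recall=1.00

y_pred = (y_score >= thr).astype(int)  # apply the chosen cut-off
```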
Random Forest Classifier
This code segment employs a RandomForestClassifier for classification tasks. It consists of an ensemble of 300 decision trees. Each tree has a maximum depth of 20 and a maximum of 150 leaf nodes. The minimum number of samples required to split a node is set to 6.
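A sketch of this configuration on synthetic data, combined with the 0.3 decision threshold chosen from the precision-recall curve. The synthetic dataset and the fixed `random_state` are assumptions added for reproducibility.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=8, weights=[0.73], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=1)

rf = RandomForestClassifier(
    n_estimators=300,      # 300 trees
    max_depth=20,          # each tree at most 20 levels deep
    max_leaf_nodes=150,    # and at most 150 leaves
    min_samples_split=6,   # a node needs at least 6 samples to be split
    random_state=1,        # assumption: fixed seed for reproducibility
)
rf.fit(X_train, y_train)

# Predict churn when P(churn) >= 0.3 instead of the default majority vote.
churn_proba = rf.predict_proba(X_test)[:, 1]
y_pred = (churn_proba >= 0.3).astype(int)
print(classification_report(y_test, y_pred, digits=3))
```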
Adaboost
This code snippet implements an AdaBoostClassifier for classification tasks. AdaBoost, short for Adaptive Boosting, sequentially trains a series of weak learners (in this case, decision trees with a maximum depth of 1 and a maximum of 150 leaf nodes; at depth 1 each tree has at most two leaves, so the leaf limit never comes into play) and weights them according to their performance. Each new weak learner focuses more on the instances misclassified by the previous ones.
The AdaBoostClassifier is configured with 1500 estimators (decision trees) and a learning rate of 0.3. Class imbalance is handled by setting class_weight='balanced' on the base decision trees (AdaBoostClassifier itself has no class_weight parameter), so the custom classification threshold (0.3) is not applied in this run.
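A sketch of this setup on synthetic data. The data and `random_state` are assumptions. The base tree is passed positionally because its keyword is `estimator` in recent scikit-learn releases and `base_estimator` in older ones.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=8, weights=[0.73], random_state=2)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=2)

# class_weight lives on the base tree; AdaBoostClassifier itself has no such parameter.
stump = DecisionTreeClassifier(max_depth=1, max_leaf_nodes=150, class_weight="balanced")

# Base estimator passed positionally for compatibility across scikit-learn versions.
ada = AdaBoostClassifier(stump, n_estimators=1500, learning_rate=0.3, random_state=2)
ada.fit(X_train, y_train)

# Plain predict(): no custom 0.3 threshold, since class weighting handles the imbalance.
y_pred = ada.predict(X_test)
print(classification_report(y_test, y_pred, digits=3))
```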
Gradient Boost
This code snippet utilizes a Gradient Boosting Classifier (GB) with 1000 decision tree estimators, each limited to a maximum depth of 1. The classifier predicts probabilities and assigns class labels based on a threshold of 0.50. It visualizes the confusion matrix and prints a classification report summarizing key metrics.
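A sketch of the same pipeline on synthetic data (the data and `random_state` are assumptions). To keep it free of plotting dependencies it prints the confusion matrix instead of drawing it; `ConfusionMatrixDisplay` in `sklearn.metrics` can render the same matrix with matplotlib.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=8, weights=[0.73], random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=3)

# 1000 boosting stages of depth-1 trees (stumps).
gb = GradientBoostingClassifier(n_estimators=1000, max_depth=1, random_state=3)
gb.fit(X_train, y_train)

# Convert predicted churn probabilities to labels at the 0.50 threshold.
churn_proba = gb.predict_proba(X_test)[:, 1]
y_pred = (churn_proba >= 0.50).astype(int)

cm = confusion_matrix(y_test, y_pred)  # rows = actual class, columns = predicted class
print(cm)
print(classification_report(y_test, y_pred, digits=3))
```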
🔍 Data Quality Assessment: In the initial stages of the code, I assess the quality of the data, meticulously identifying and rectifying any anomalies or missing values that may compromise the integrity of the dataset.
📊 Exploratory Analysis: Subsequently, I delve into exploratory analysis, leveraging statistical techniques and visualization tools to unearth patterns, trends, and correlations within the data. This phase allows me to gain a deep understanding of the underlying structure and dynamics of the dataset, laying a solid foundation for subsequent analysis.
🔍 Curse of Dimensionality & Feature Selection: Moving forward, I employ feature selection techniques, such as random forest feature importances, to identify the most influential features driving predictive performance. By prioritizing metrics like recall and precision, I strive to strike a balance between model complexity and predictive accuracy for my target variable.
🛠️ Model Development: As I fine-tune model parameters and explore alternative algorithms, I take an iterative, experimental approach, refining model performance based on empirical evidence and domain expertise. Using techniques such as grid search, I systematically explore the hyperparameter space to identify the configuration that maximizes model efficacy.
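A small grid-search sketch with scikit-learn's `GridSearchCV`. The grid values, the reduced tree count (to keep it fast), and the choice of `average_precision` (area under the precision-recall curve) as the scoring metric are assumptions, picked to stay in line with the precision-recall focus of the project; the data are synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

X, y = make_classification(n_samples=600, n_features=8, weights=[0.73], random_state=4)

# Hypothetical grid around the random forest settings described earlier.
param_grid = {
    "max_depth": [10, 20],
    "min_samples_split": [2, 6],
}

search = GridSearchCV(
    RandomForestClassifier(n_estimators=100, random_state=4),
    param_grid,
    scoring="average_precision",  # assumption: PR-AUC as the selection metric
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=4),
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Average precision does not by itself enforce the 50% precision floor; the threshold still has to be chosen afterwards from the precision-recall curve of the selected model.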
🏆 Model Selection: Finally, by evaluating and comparing the model outcomes, I select the most suitable model for deployment. In this case, the AdaBoost model, which accounts for class imbalance, is the preferred choice because it performs best for the task.
Precision: 50% (minimum requirement) 🌟
Sensitivity/recall: 85% (maximized) 🏅