This project predicts customer churn for a fictional bank using machine learning. By identifying customers who are at risk of leaving, the bank can act early to retain them and reduce losses. As the host of the accompanying Kaggle competition, I organized the challenge, provided the dataset, and framed the problem so that the competition reflected a real-world application.
The workflow for this project involved:
Data collection and exploration
Data preprocessing and feature engineering
Model selection and evaluation
Model deployment using Streamlit for real-time predictions
Technologies Used
Programming Languages: Python
Libraries/Frameworks: Pandas, Scikit-learn, XGBoost, SMOTE (Synthetic Minority Over-sampling Technique), Streamlit
Tools: Jupyter Notebooks, GitHub, Kaggle
Data Overview
The dataset contains features that describe customers' demographics, bank interactions, and transaction history. Because it mixes categorical and numeric features and the target classes are imbalanced, the data had to be preprocessed before any machine learning algorithm was applied.
Key features include:
Demographics: Age, Gender, Marital Status, etc.
Bank Interaction: Account Type, Credit Score, etc.
Transaction History: Total Spent, Total Credit, Monthly Charges, etc.
Preprocessing Steps:
Categorical Variable Encoding: Applied One-Hot Encoding for categorical variables such as Gender and Geography.
Data Balancing: Used SMOTE to generate synthetic data points for the minority class of the target variable (Exited), reducing the model's bias toward the majority class.
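The two preprocessing steps above can be sketched as follows. This is a minimal illustration on toy data, not the project's actual pipeline: the `Age` column and all values are assumptions, and `smote_sketch` is a hypothetical helper that only reproduces the core idea of SMOTE (interpolating between a minority sample and one of its nearest minority neighbours). In practice the `SMOTE` class from the imbalanced-learn package (`imblearn.over_sampling`) would normally be used instead.

```python
import numpy as np
import pandas as pd

# Toy data; only Gender, Geography and Exited are named in this README.
df = pd.DataFrame({
    "Age": [42, 35, 51, 29, 60, 44, 38, 55],
    "Gender": ["Female", "Male", "Male", "Female", "Male", "Female", "Male", "Female"],
    "Geography": ["France", "Spain", "Germany", "France", "Germany", "Spain", "France", "Germany"],
    "Exited": [1, 0, 0, 0, 1, 0, 0, 1],
})

# One-hot encode the categorical columns.
encoded = pd.get_dummies(df, columns=["Gender", "Geography"], dtype=int)
X = encoded.drop(columns="Exited").to_numpy(dtype=float)
y = encoded["Exited"].to_numpy()


def smote_sketch(X, y, minority_label=1, k=2, seed=0):
    """Balance the classes by interpolating between minority neighbours.

    Hypothetical helper for illustration; needs at least two minority samples.
    """
    rng = np.random.default_rng(seed)
    X_min = X[y == minority_label]
    n_new = int((y != minority_label).sum() - len(X_min))  # samples needed to balance
    if n_new <= 0:
        return X, y
    # Pairwise distances among minority samples, ignoring self-matches.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(dists, axis=1)[:, :k]
    # Pick a base sample and one of its k nearest neighbours for each new point.
    base = rng.integers(0, len(X_min), size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    synthetic = X_min[base] + gap * (X_min[chosen] - X_min[base])
    X_bal = np.vstack([X, synthetic])
    y_bal = np.concatenate([y, np.full(n_new, minority_label)])
    return X_bal, y_bal


X_bal, y_bal = smote_sketch(X, y)
print(encoded.columns.tolist())
print(np.bincount(y_bal))  # class counts after balancing
```

Oversampling is usually applied to the training split only, so that the test set keeps the original class distribution and the reported scores are not inflated by synthetic samples.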
Several machine learning models were trained and evaluated using ROC-AUC Score as the primary performance metric:
Logistic Regression
Served as the baseline model. It offered interpretability and decent performance.
ROC-AUC Score: 0.8714
Decision Tree Classifier
A simple tree-based model prone to overfitting.
ROC-AUC Score: 0.7797
Random Forest Classifier
Performed exceptionally well by combining multiple trees to reduce variance.
ROC-AUC Score: 0.9215
K-Nearest Neighbors (KNN)
Performance was limited, likely due to the dataset's complexity and KNN's sensitivity to feature scale.
ROC-AUC Score: 0.5704
XGBoost Classifier
Showed high performance and robustness to class imbalance.
ROC-AUC Score: 0.9242
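The comparison above can be reproduced in outline with scikit-learn. The sketch below uses a synthetic, imbalanced dataset from `make_classification` as a stand-in for the bank data, so its scores will not match the ones reported here; the hyperparameters are assumptions, not the project's settings. XGBoost is left out to keep the example dependent on scikit-learn only, but `xgboost.XGBClassifier` follows the same `fit` / `predict_proba` interface and could be added to the dictionary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the churn data (roughly 80% stayed, 20% exited).
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.8, 0.2], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # ROC-AUC is computed from the predicted probability of the positive class.
    churn_proba = model.predict_proba(X_test)[:, 1]
    scores[name] = roc_auc_score(y_test, churn_proba)

# Print the models from best to worst ROC-AUC.
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.4f}")
```

ROC-AUC is threshold-independent, which makes it a reasonable primary metric when the classes are imbalanced and the decision threshold has not been fixed yet.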
Model Selection & Tuning
Random Forest was among the top performers (0.9215, just behind XGBoost at 0.9242) and was selected for further tuning. RandomizedSearchCV was used for hyperparameter tuning.
Tuned Random Forest ROC-AUC Score: 0.9287
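A randomized search of this kind might look like the sketch below. The search space, number of iterations, and fold count are assumptions chosen to keep the example fast; the values actually searched in this project are not listed here. Synthetic data again stands in for the bank dataset.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(
    n_samples=400, n_features=10, weights=[0.8, 0.2], random_state=42
)

# Hypothetical search space: distributions are sampled, lists are chosen from.
param_distributions = {
    "n_estimators": randint(50, 150),
    "max_depth": [None, 5, 10],
    "min_samples_split": randint(2, 10),
    "min_samples_leaf": randint(1, 5),
}

search = RandomizedSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_distributions=param_distributions,
    n_iter=5,           # number of random configurations to try
    scoring="roc_auc",  # same metric used for the model comparison
    cv=3,
    random_state=42,
)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 4))  # mean cross-validated ROC-AUC
```

Compared with an exhaustive grid search, a randomized search evaluates a fixed number of sampled configurations, which keeps the cost predictable when the search space is large.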
I hosted the Bank Customer Churn Prediction competition on Kaggle to engage the data science community in solving real-world problems. The competition provided a challenge for participants to predict customer churn and compete for top rankings based on model performance.
Key features of the competition:
Problem Statement: Predict whether a customer will leave the bank based on their profile and transaction data.
Evaluation Criteria: Leaderboard ranking based on accuracy, precision, recall, and F1-score.
Competition Platform: Kaggle, where participants could submit their models and compete with others.
The competition was designed to help participants practice their data preprocessing, model building, and evaluation skills in a real-world scenario.
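For participants less familiar with the evaluation criteria listed above, the four metrics can be computed directly from the confusion-matrix counts. The sketch below is illustrative only: the labels are made up, `classification_metrics` is a hypothetical helper, and how the competition combined the four metrics into a ranking is not specified here. scikit-learn offers the same calculations as `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` in `sklearn.metrics`.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = churned)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # churners caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # stayers correctly kept
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # churners missed

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels: 4 of 10 customers actually churned.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0, 0, 0]

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
# accuracy: 0.7000, precision: 0.6667, recall: 0.5000, f1: 0.5714
```

In this toy case accuracy looks acceptable at 0.70 even though half of the churners are missed, which is why precision, recall, and F1 are worth reporting alongside accuracy on imbalanced data.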
After training the best model, I deployed it using Streamlit for easy interaction and real-time predictions. Users can input customer details and get an immediate prediction on whether the customer is likely to churn.
The Streamlit app is hosted online and gives non-technical users an easy-to-use interface to the model.
View the live app here
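The prediction step behind an app like this can be sketched as a single function that turns one customer's details into a churn probability. Everything below is an assumption for illustration: the toy training data, the `CreditScore` and `Age` column names, and the `predict_churn` helper are not taken from the deployed app, which would typically load an already trained model from disk rather than fit one on startup.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Toy training data with made-up values.
train = pd.DataFrame({
    "CreditScore": [610, 720, 480, 690, 810, 650, 770, 400],
    "Age": [45, 33, 52, 38, 29, 61, 41, 57],
    "Gender": ["Female", "Male", "Female", "Male", "Female", "Male", "Male", "Female"],
    "Geography": ["France", "Spain", "Germany", "France", "Spain", "Germany", "France", "Germany"],
    "Exited": [1, 0, 1, 0, 0, 1, 0, 1],
})
X_train = pd.get_dummies(
    train.drop(columns="Exited"), columns=["Gender", "Geography"], dtype=int
)
model = RandomForestClassifier(n_estimators=50, random_state=42)
model.fit(X_train, train["Exited"])
feature_columns = X_train.columns.tolist()


def predict_churn(customer: dict) -> float:
    """Return the predicted churn probability for one customer record."""
    row = pd.get_dummies(
        pd.DataFrame([customer]), columns=["Gender", "Geography"], dtype=int
    )
    # A single row only yields dummy columns for its own categories, so
    # re-align to the training columns and fill the missing ones with 0.
    row = row.reindex(columns=feature_columns, fill_value=0)
    return float(model.predict_proba(row)[0, 1])


probability = predict_churn(
    {"CreditScore": 600, "Age": 45, "Gender": "Male", "Geography": "Germany"}
)
print(f"Churn probability: {probability:.2f}")
```

In a Streamlit script, the dictionary values would come from input widgets such as `st.number_input` and `st.selectbox`, and the result could be displayed with `st.write`. Re-aligning the columns with `reindex` matters because the model expects exactly the feature layout it was trained on.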
The final model, the tuned Random Forest, achieved a ROC-AUC score of 0.9287 (about 93%) in predicting customer churn.
This model can be used by banks to improve customer retention strategies, such as personalized marketing campaigns and targeted offers.
The deployed app allows for real-time predictions, enabling bank employees to instantly evaluate customer risk.
Future Improvements
Algorithm Exploration: Experimenting with other machine learning algorithms, such as further gradient boosting variants and neural networks, to improve predictive performance.
UI/UX Improvements: Enhancing the user interface of the Streamlit app to make it more intuitive and visually appealing.
Real-Time Data Integration: Integrating the model with real-time transactional data to provide up-to-date churn predictions.
Repository Links
💬 Want to learn more, give feedback, or collaborate?
Reach out via LinkedIn or GitHub for potential collaboration or questions!