This project analyzes customer churn in an e-commerce business using SQL and Python. The goal was to identify key behavioral drivers of churn and build a predictive model to flag high-risk customers before they leave.
How many customers churn vs stay?
Which customer behaviors increase churn risk?
Can churn be predicted in advance?
What action can reduce churn?
Exploratory and diagnostic analysis was conducted using SQL (SQLite) to uncover churn patterns across tenure, satisfaction, complaints, cashback, and distance from warehouse. Python was then used for visualization and to build a logistic regression model for churn prediction.
Churn Rate by Complaint Status:
Customers who filed complaints churn at a much higher rate, making complaints a stronger churn predictor.
Customer Churn Distribution:
Customers who churned were far lesser than customers who stayed.
Distribution of Predicted Churn Probability:
Most customers have a low probability of churn, indicating a generally stable customer base.
Churn Rate by Satisfaction Score:
The Churn rate increases steadily as the satisfaction score worsens.
Tenure by Churn:
Churned customers are heavily concentrated at lower values, while non-churned customers exhibit a broader distribution with a longer right tail. This indicates that churn primarily occurs early in the customer lifecycle
Confusion Matrix:
The model correctly identifies most customers who stay and a meaningful portion of customers who churn, making it useful for targeted retention strategies.
945 - These are customers the model predicted would not churn, and they indeed did not churn.
36 -These are customers the model predicted would churn, but they actually did not churn.
113: These are customers the model predicted would not churn, but they actually did churn.
89: These are customers the model predicted would churn, and they indeed did churn.
A logistic regression model was training to estimate churn probability for each customer. The output enables customer ranking based on churn risk, supporting targeted retention strategies.
Complaints strongly increase churn probability
Lower satisfaction scores are associated with churn
Customers living farther from warehouses churn more
Churn risk is highest early in the customer lifecycle
Focus retention efforts on early-tenure customers.
Prioritize fast resolution of customers.
Proactively engage customers with high churn probability.
Improve experience for customers far from warehouses.
SQL (SQLite / DB Browser)
Python (Pandas, Matplotlib, Scikit-learn)
Logistic Regression
Customer Analytics
GitHub Repository : https://github.com/tosin19-tech/E-commerce-Customer-Churn-Prediction-Logistic-Regression-
I executed this project independently, working through the full analytics lifecycle- from collecting platform analytics data to transforming it, designing the data model, and building insights-driven dashboards in Power BI. The approach involved analyzing cross-platform trend while also diving into individual channel performance to understand both overall marketing health and platform-specific opportunities.
Although completed as an independent analytics project, it was structured as if supporting a digital marketing or social media team, focusing on practical business questions such as:
Which platform performs best?
What content type drives engagement?
Where should effort and content strategy be focused?
How do audience differ across platforms?