Customer Segmentation Using K-means Clustering
K-means clustering is an unsupervised machine learning algorithm that groups data points by similarity. It usually works well in practice and scales well to large datasets, which is why it is applied here.
The goal is to identify high-value customers by cluster.
Prepared the data using one-hot encoding
Most machine learning algorithms work only on numerical data, so I used one-hot encoding to transform categorical data into numeric data.
Columns such as Education and Marital status were transformed into k_education and k_marital.
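The encoding step could look roughly like the sketch below. It assumes pandas and uses a made-up four-row sample; the column names and category labels are illustrative, not taken from the real dataset. How the dummy columns were combined into the k_education and k_marital columns is not shown in this write-up, so the sketch keeps the pandas default names.

```python
import pandas as pd

# Hypothetical sample rows; the real dataset's columns and categories may differ.
customers = pd.DataFrame({
    "Education": ["Graduate", "Postgraduate", "Undergraduate", "Graduate"],
    "Marital_Status": ["Single", "Partnered", "Single", "Partnered"],
    "Income": [58000, 72000, 31000, 64000],
})

# One-hot encode the categorical columns; numeric columns pass through unchanged.
encoded = pd.get_dummies(
    customers, columns=["Education", "Marital_Status"], dtype=int
)

print(encoded.columns.tolist())
```

Each category becomes its own 0/1 column, so no artificial ordering is imposed on values such as Single and Partnered.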
Normalized numeric features
Selected the columns income, total_amount_spent, k_marital, and engagement_score as the features for K-means clustering.
Normalized these features because K-means relies on distances between points and is therefore sensitive to differences in scale.
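A minimal sketch of scaling followed by clustering, assuming scikit-learn. The data here is randomly generated for illustration only, and StandardScaler is an assumption: the write-up says the features were normalized but does not name the scaler. The choice of four clusters matches the cluster labels 0 to 3 used in the rest of this document.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the customer table (illustrative values only).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "income": rng.normal(50000, 15000, n),
    "total_amount_spent": rng.normal(500, 300, n),
    "k_marital": rng.integers(0, 2, n),
    "engagement_score": rng.normal(12, 6, n),
})

features = ["income", "total_amount_spent", "k_marital", "engagement_score"]

# Put every feature on a comparable scale (mean 0, standard deviation 1).
scaled = StandardScaler().fit_transform(df[features])

# Fit K-means on the scaled features and store each customer's cluster label.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
df["cluster"] = kmeans.fit_predict(scaled)

print(df["cluster"].value_counts().sort_index())
```

Without scaling, income (tens of thousands) would dominate the distance calculation and the other features would barely affect the clusters.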
Principal Component Analysis (PCA)
Conducted PCA to reduce dimensionality and simplify the dataset.
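The PCA step might be sketched as follows, assuming scikit-learn and synthetic data. Projecting to two components is an assumption (it is a common choice when the result is used for a 2D cluster plot); the number of components actually used is not stated here.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic four-feature data with one deliberately correlated pair.
rng = np.random.default_rng(1)
X = rng.normal(size=(150, 4))
X[:, 1] = 0.8 * X[:, 0] + 0.2 * X[:, 1]

# PCA is scale-sensitive, so standardize first.
X_scaled = StandardScaler().fit_transform(X)

# Project onto the two directions of highest variance.
pca = PCA(n_components=2)
components = pca.fit_transform(X_scaled)

# Share of total variance retained by each component.
print(pca.explained_variance_ratio_)
```

The explained variance ratio indicates how much information is kept after the reduction.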
Silhouette score analysis
Conducted Silhouette score analysis to determine the quality of the clusters.
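One way to run a silhouette analysis is to fit K-means for several values of k and compare the scores, as in the sketch below. It assumes scikit-learn and uses synthetic blobs in place of the customer data; the range of k values tried is an arbitrary choice for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the scaled customer features.
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=1.0, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

# Silhouette score ranges from -1 to 1; higher means tighter, better-separated clusters.
scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels)

best_k = max(scores, key=scores.get)
for k, score in scores.items():
    print(f"k={k}: silhouette={score:.3f}")
print("best k:", best_k)
```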
Visualization of Clusters
Clusters 1 & 3: high-income, high-spending, and highly engaged, so they are the high-value customers.
Clusters 0 & 2 are lower-value customers.
Which factors influence customer engagement in marketing campaigns?
Correlation Analysis: Does engagement correlate with income or total amount spent?
Does Marital Status Influence Engagement?
Are High-Value Customers Also Highly Engaged?
Key Observations
Income & Total Amount Spent (0.81): Strong positive correlation. As income increases, the total amount spent tends to increase as well.
Total Amount Spent & Engagement Score (0.70): Fairly strong positive correlation, suggesting that customers who spend more are generally also more engaged.
Income & Engagement Score (0.50): Moderate correlation, implying that while higher income tends to be associated with higher engagement, the relationship is not as strong as with spending.
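Coefficients like these can be obtained from a Pearson correlation matrix. The sketch below assumes pandas and uses synthetic data built so that the three variables are positively related; it will not reproduce the exact values reported above.

```python
import numpy as np
import pandas as pd

# Synthetic data with built-in positive relationships (illustrative only).
rng = np.random.default_rng(2)
n = 500
income = rng.normal(52000, 20000, n)
total_amount_spent = 0.015 * income + rng.normal(0, 150, n)
engagement_score = 0.02 * total_amount_spent + rng.normal(0, 5, n)

df = pd.DataFrame({
    "income": income,
    "total_amount_spent": total_amount_spent,
    "engagement_score": engagement_score,
})

# Pairwise Pearson correlation between the numeric columns.
corr = df.corr(method="pearson")
print(corr.round(2))
```

Correlation describes association only; it does not show that higher income causes higher spending or engagement.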
Does Marital Status Influence Engagement?
Single customers not only engage more but also spend more than partnered customers.
Are High-Value Customers Also Highly Engaged?
The overall trend suggests that as total amount spent increases, engagement score also increases. This aligns with the correlation matrix above (0.70 correlation between spending and engagement).
How does online shopping differ from offline shopping?
What are the characteristics of high-value customers?
Cluster 1 and Cluster 3 are the high-value customers (big spenders).
Graduates dominate both clusters (51-54%).
Postgraduates also have a strong presence (38-40%).
Undergraduates are the minority (7-9%).
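Percentages like the ones above can be computed with a row-normalized cross-tabulation. This is a sketch assuming pandas; the eight rows and the column names cluster and education are made up for illustration.

```python
import pandas as pd

# Tiny hypothetical sample of high-value customers with their education level.
df = pd.DataFrame({
    "cluster": [1, 1, 1, 1, 3, 3, 3, 3],
    "education": [
        "Graduate", "Graduate", "Postgraduate", "Undergraduate",
        "Graduate", "Graduate", "Graduate", "Postgraduate",
    ],
})

# normalize="index" makes each row (cluster) sum to 1; multiply by 100 for percentages.
share = pd.crosstab(df["cluster"], df["education"], normalize="index") * 100
print(share.round(1))
```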
Marketing campaign results vs. high-value customers (Clusters 1 & 3)
Key Findings:
Overall Campaign Acceptance Rates:
Cluster 1: 26.57% of customers accepted at least one campaign.
Cluster 3: 27.40% of customers accepted at least one campaign.
Campaign 2 had the lowest acceptance (1.85-2.05%), meaning it was the least effective.
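Acceptance rates of this kind can be derived from 0/1 campaign flags. The sketch below assumes pandas; the column names AcceptedCmp1 to AcceptedCmp3 and the eight rows are hypothetical and only illustrate the calculation.

```python
import pandas as pd

# Hypothetical 0/1 flags: 1 means the customer accepted that campaign.
df = pd.DataFrame({
    "cluster":      [1, 1, 1, 1, 3, 3, 3, 3],
    "AcceptedCmp1": [1, 0, 0, 0, 0, 0, 1, 0],
    "AcceptedCmp2": [0, 0, 0, 0, 0, 0, 0, 0],
    "AcceptedCmp3": [0, 1, 0, 0, 1, 0, 1, 0],
})
campaign_cols = ["AcceptedCmp1", "AcceptedCmp2", "AcceptedCmp3"]

# Per-campaign acceptance rate (%) within each cluster.
per_campaign = df.groupby("cluster")[campaign_cols].mean() * 100

# Share of customers (%) in each cluster who accepted at least one campaign.
df["accepted_any"] = df[campaign_cols].max(axis=1)
overall = df.groupby("cluster")["accepted_any"].mean() * 100

print(per_campaign)
print(overall)
```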
Campaign results vs. age group (to gain some insight into why Campaign 2 failed).
High-value customers (Clusters 1 and 3) by age group (to identify which age groups make up our customer segment).
Campaign results vs. age group
High-value customers (Clusters 1 and 3) vs. age group
Key observations from the further analysis
Young Adults: Focus on social media and gamified offers.
Seniors: Use traditional methods like direct mail and in-store promotions.
Cluster 0
Moderate to high income range. Low spending behavior. Moderate engagement score. Possible Segment: "High earners but cautious spenders."
Cluster 1
High-income group. Highest total amount spent. High engagement score. Possible Segment: "High-income, high-spending, highly engaged customers."
Cluster 2
Low-income group. Low spending behavior. Lowest engagement score. Possible Segment: "Low-income, low-spending, least engaged customers."
Cluster 3
High-income group. High total amount spent (similar to Cluster 1). High engagement score. Possible Segment: "Loyal and high-spending customers."
Clusters 1 & 3: High-Income, High-Spending, Highly Engaged
Customers in these groups earn the most (66K) and spend significantly more (937–968). They also have the highest engagement scores (22).
Interpretation: These could be premium customers or loyal shoppers who engage frequently and contribute the most revenue.
Clusters 0 & 2: Low-Income, Low-Spending, Low Engagement
Customers here earn significantly less (36K) and spend much less (128–133). Their engagement scores (2) are also lower.
Interpretation: These might be budget-conscious shoppers or less frequent buyers who need re-engagement.
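Cluster profiles like the ones above are typically produced by averaging each feature per cluster. The sketch below assumes pandas and uses four invented rows, so the printed numbers are illustrative and are not the project's results.

```python
import pandas as pd

# Invented rows for two clusters; values are for illustration only.
df = pd.DataFrame({
    "cluster": [0, 0, 1, 1],
    "income": [35000, 37000, 65000, 67000],
    "total_amount_spent": [120, 140, 930, 950],
    "engagement_score": [2, 2, 21, 23],
})

# Mean of each feature per cluster gives a compact profile of each segment.
profile = (
    df.groupby("cluster")[["income", "total_amount_spent", "engagement_score"]]
    .mean()
    .round(1)
)
print(profile)
```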
Engagement score by marital status: single customers are more engaged.
Clusters 1 and 3 are our high-value customers.
High-Value Customers Are Highly Engaged:
The yellow dots (Cluster 3) represent high-spending, highly engaged customers. These customers are likely prime targets for premium campaigns, loyalty programs, or exclusive deals.
Lower Spend, Lower Engagement (Cluster 0): Customers in this cluster are spending less and are also less engaged. They might need more incentives to boost engagement (e.g., personalized offers, targeted ads).
Clusters 1 & 3 have the highest online and offline purchases.
Clusters 0 & 2 have significantly lower purchases.
Education level correlates with spending behavior.
Graduates and postgraduates make up ~90% of the high-value customer base. Undergraduates are a very small segment of these high spenders. Cluster 3 has a slightly higher proportion of graduates (54%) vs. Cluster 1 (51%).
This could suggest that Cluster 3 leans more towards graduates, whereas Cluster 1 has a slightly higher postgraduate representation.
Cluster 3 is slightly more responsive to campaigns than Cluster 1, particularly in Campaign 4 (14%), suggesting that the campaign's content, timing, or product offering was more appealing to this group.
Campaign 2 was ineffective across both clusters. It had the lowest engagement, suggesting it may need redesigning (wrong product, poor targeting, or bad timing).
Clusters 1 & 3 are generally receptive to marketing efforts (~26-27% acceptance overall). This makes them a prime target for future campaigns.
Young Adults & Middle-Aged Adults are the most engaged in marketing campaigns.
Seniors engage selectively, especially with Campaign 4.
Adults have the lowest response rate across all campaigns.
Campaign 3 seems effective across most groups but especially in Young Adults and Middle-Aged Adults.
Campaign 5 was very effective for Young Adults, suggesting a trend that could be leveraged.
Campaign 2 is the least effective of all the campaigns, although it seems to be slightly effective among Young Adults.
Cluster 3 shows a strong presence of Seniors and Adults, which suggests that older customers are more engaged and contribute significantly to spending. This could be because they have more disposable income (possibly due to retirement or established careers) and more time to engage with the brand.
Cluster 1 mirrors this trend, but with slightly fewer Seniors and Adults. These customers are still highly engaged and tend to spend a lot, indicating they may have substantial financial resources and a willingness to engage with high-end products or services.
Young Adults (20-30) are the minority in both clusters, as expected. They are likely still in the earlier stages of their careers, which limits their disposable income and spending potential. Their lower engagement levels also suggest they might interact with the brand less consistently, or might not yet prioritize spending on premium products.
High-Value Customers
For Clusters 1 & 3 (High-value Customers), offer exclusive deals, early access to products, and VIP loyalty programs to retain them. Consider personalized marketing to increase their spending.
For Clusters 0 & 2 (Low-spending, Low-engagement Customers), implement discount-based incentives to encourage spending. Try email marketing campaigns, loyalty rewards, or free shipping offers to boost engagement.
Clusters 1 & 3 have the highest online and offline purchases, which again confirms that they are our high-value customers. Adjust the marketing strategy for customers in Clusters 0 & 2; email marketing campaigns and discounts could be a better approach for them.
Marital Status
Single people in my dataset are more engaged, which implies that marketing campaigns could be tailored to capitalize on this engagement, for example by offering promotions or content that resonates specifically with single customers.
Education Qualification
Target Graduates & Postgraduates in marketing strategies. Since these groups are more likely to be high spenders, focus marketing campaigns on them. Use promotions tailored to professionals, premium product offerings, or loyalty programs for alumni networks.
Age Group
For Young Adults:
Focus on Campaigns 5 & 3 (which had the highest acceptance).
Use social media marketing, influencer promotions, and mobile-friendly campaigns.
Consider gamified offers, discount codes, and exclusive online deals.
Focus on career building for Young Adults. While Young Adults are less engaged and spend less, you could introduce entry-level products or services targeting this group. Marketing could focus on affordable, value-for-money products or discounts to build brand loyalty early on. Introducing career-related content, incentives, or benefits might also catch their attention as they start earning.
For Middle-Aged Adults:
Keep investing in Campaigns 3 & 5, but also refine Campaign 4.
Use email marketing, referral programs, and loyalty points.
Provide premium offers or professional-targeted incentives.
For Seniors:
Campaign 4 worked best, so focus on personalized promotions, phone support, and trust-based marketing.
Use direct emails, loyalty rewards, and in-store promotions.
For Adults:
Since response rates are low, test new campaign approaches (e.g., bundled offers, subscription models, or exclusive benefits). Target them with retargeting ads and personalized content.
Increase marketing targeting for older age groups
Given that Seniors and Adults in Clusters 1 & 3 are the primary contributors to high spending and engagement, you can target these groups with exclusive promotions or luxury product discounts. Offering tailored experiences for older customers, such as loyalty rewards or time-sensitive offers during their leisure hours, might further boost engagement.