Understanding Customers with K-Means Clustering
Understanding customers matters more than ever in today's highly competitive market. One effective way to segment customers is K-Means clustering, an unsupervised machine learning technique that groups customers by their attributes. This project explores how K-Means can be applied to uncover insights about customers and how those insights can be used to improve the customer experience.
You can find the full code on GitHub here.
The Dataset: Mall Customers
For simplicity, the chosen dataset is already cleaned. This lets us focus on the more important steps: exploratory data analysis (EDA), understanding how the K-Means clustering model works, building the model, and analyzing the results to see whether the clusters make business sense.
The Mall Customer Dataset contains information about customers at a mall, including:
Age: Customer’s age
Gender: Male or Female
Annual Income: Income of the customer
Spending Score: A score assigned by the mall based on the customer’s spending habits (1-100)
Customer Segmentation with K-Means
K-Means clustering partitions customers into a pre-specified number of clusters K (in our case, five) based on their similarity in spending, income, and age. It starts by randomly initializing K centroids, then assigns each data point to its nearest centroid. The algorithm then recalculates each centroid as the mean of the points assigned to it, and repeats the assignment and update steps until the centroids no longer change. The result is a set of customer groups that exhibit similar behavior patterns.
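The assign-and-update loop described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the project's actual code: the function names `assign` and `kmeans`, the choice to initialize centroids from randomly picked data points, and the two-blob toy data are all assumptions made for the example.

```python
import numpy as np

def assign(X, centroids):
    # Assignment step: index of the nearest centroid (Euclidean distance) per point.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    """Basic K-Means loop (hypothetical helper). Returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Assumption: initialize by picking k distinct data points at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centroids)
        # Update step: move each centroid to the mean of its assigned points.
        # An empty cluster keeps its previous centroid.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving
        centroids = new_centroids
    return centroids, assign(X, centroids)

# Toy data (made up): two well-separated groups of (income, spending score).
rng = np.random.default_rng(42)
low = rng.normal(loc=[20.0, 20.0], scale=2.0, size=(50, 2))
high = rng.normal(loc=[80.0, 80.0], scale=2.0, size=(50, 2))
X = np.vstack([low, high])

centroids, labels = kmeans(X, k=2)
print(centroids)
```

In practice a library implementation is usually preferable, since it adds smarter initialization and multiple restarts; the sketch is only meant to make the two steps concrete.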
Inertia and Silhouette Scores
Once the K-Means model is built, two popular metrics for evaluating how well the clustering represents the data are inertia and the silhouette score.
Inertia: This metric measures how tightly the points within each cluster are grouped. It is the sum of squared distances between each point and its assigned cluster center. Lower inertia indicates more compact clusters, meaning customers within the same cluster behave more similarly. However, inertia alone cannot determine the optimal number of clusters, because it keeps decreasing as clusters are added and reaches zero when every point is its own cluster.
Silhouette Score: This score ranges from -1 to 1 and measures how similar a point is to its own cluster compared to other clusters. A higher silhouette score indicates that the points are well-matched to their own cluster and poorly matched to neighboring clusters, meaning the clustering is effective.
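To make the two definitions concrete, here is a small from-scratch sketch of both metrics in NumPy. The function names and the four-point toy data are invented for illustration; library implementations should be preferred on real data.

```python
import numpy as np

def inertia(X, labels, centroids):
    # Sum of squared distances from each point to its assigned centroid.
    return float(((X - centroids[labels]) ** 2).sum())

def silhouette(X, labels):
    # Mean over all points of (b - a) / max(a, b), where
    #   a = mean distance to the other points in the same cluster,
    #   b = smallest mean distance to the points of any other cluster.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # convention: a point alone in its cluster scores 0
        a = dists[i, same].sum() / (n_same - 1)  # self-distance is 0
        b = min(dists[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return float(scores.mean())

# Toy data (made up): two tight pairs of points, far apart.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
good = np.array([0, 0, 1, 1])   # pairs grouped together
bad = np.array([0, 1, 0, 1])    # pairs split across clusters
centroids = np.array([[0.0, 0.5], [10.0, 0.5]])

good_inertia = inertia(X, good, centroids)
good_sil = silhouette(X, good)
bad_sil = silhouette(X, bad)
print(good_inertia, round(good_sil, 3), round(bad_sil, 3))
```

The sensible grouping scores close to 1, while the labeling that splits each pair scores below 0, which is the behavior the silhouette definition predicts.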
Optimal Number of Clusters k
To select the optimal number of clusters, I considered both the inertia and the silhouette score. Inertia decreases as more clusters are added, but after K = 5 it decreases more slowly (the "elbow" of the curve), indicating that additional clusters no longer improve compactness by much. Meanwhile, the silhouette score at K = 5 was relatively high, indicating that the clusters were both compact and well separated.
Although five segments might seem like a lot, if the clusters display clear distinctions in their features, the result should still be interpretable. Therefore, 5 clusters will be used for the analysis.
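A scan over candidate values of K might look like the sketch below. It uses scikit-learn's `KMeans` and `silhouette_score`, which is an assumption about tooling rather than a description of the project's code, and it runs on synthetic data generated around five made-up centers instead of the mall dataset.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in data: five groups of (income, spending score).
rng = np.random.default_rng(0)
centers = np.array([[20, 20], [20, 80], [50, 50], [80, 20], [80, 80]], dtype=float)
X = np.vstack([rng.normal(loc=c, scale=4.0, size=(40, 2)) for c in centers])

results = {}
for k in range(2, 11):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # inertia_ is the within-cluster sum of squares; the silhouette needs labels.
    results[k] = (model.inertia_, silhouette_score(X, model.labels_))

for k, (inertia, sil) in results.items():
    print(f"k={k:2d}  inertia={inertia:10.1f}  silhouette={sil:.3f}")

# The k with the highest silhouette score among the candidates.
best_k = max(results, key=lambda k: results[k][1])
print("best k by silhouette:", best_k)
```

Because the toy data was generated around five centers, the silhouette should peak there and the inertia curve should flatten after it. On real data the two signals are often less clear-cut, so the final choice also depends on whether the segments are interpretable.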
Insights from Clustering
With the K-Means clusters in place, I profiled each customer segment by its average spending score, income, and age:
Cluster 0 - Young Affluent Shoppers – High income, high spenders: These are relatively young customers with a high income and a very high spending score. They are likely affluent, frequent shoppers who are willing to spend on premium products. They could be targeted with VIP services, exclusive offers, or gestures that make them feel special, such as a voucher on their birthday.
Cluster 1 - Affluent but Conservative Spenders – High income, low spenders: These are high-income customers who tend to shop selectively and prefer saving over frequent spending. They could be encouraged to increase their basket size with discounts on complementary items when they purchase their usual products. For example, when they buy their usual clothing or accessories, offer special deals on matching shoes or bags. To inspire them to explore new products or higher-end options, send promotions for seasonal or premium items.
Cluster 2 - Middle-Aged Moderate Shoppers – Moderate income and spending: Middle-aged individuals with moderate income and spending patterns. They may be practical shoppers, focused on essential purchases rather than indulgent spending. Discounts on mid-range products could increase engagement.
Cluster 3 - Young Value-Conscious Shoppers – Low income but willing to spend on good deals: These are younger customers with lower income but moderate spending scores. They may be more budget-conscious but still willing to spend on affordable luxuries. Marketing strategies could focus on special deals, discounts, or trendy, low-cost products.
Cluster 4 - Older Practical Shoppers – Moderate income, careful spenders: These customers are older with moderate income and low to moderate spending. They may prefer functional or necessity-driven purchases rather than indulgence. They can be targeted with practical products and promotions that emphasize convenience and utility.
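Segment profiles like the ones above are typically derived by averaging each feature per cluster. The sketch below shows one way to do that with pandas; the column names, the `Cluster` label column, and all values are invented for illustration and are not taken from the dataset.

```python
import pandas as pd

# Made-up customers with a cluster label already attached.
df = pd.DataFrame({
    "Age": [25, 30, 45, 50, 22, 60],
    "Annual Income": [80, 90, 85, 95, 20, 50],
    "Spending Score": [90, 85, 15, 20, 70, 40],
    "Cluster": [0, 0, 1, 1, 3, 4],
})

# One row per cluster: segment size and the mean of each feature.
profile = df.groupby("Cluster").agg(
    customers=("Age", "size"),
    mean_age=("Age", "mean"),
    mean_income=("Annual Income", "mean"),
    mean_score=("Spending Score", "mean"),
)
print(profile)
```

Reading the table row by row (young, high income, high score versus older, high income, low score, and so on) is what turns cluster numbers into named segments.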