These are the K-Means clusters, with the k-value set to 6 as suggested by the Elbow method, the Silhouette score, and the Davies-Bouldin score. From this visualization, the biggest cluster is cluster 5 with 369 members, while the smallest is cluster 3 with only 29 members. From this overview, we can see that for cluster 0 the main factors grouping the respondents are socio-economic status, salary, and level of education, while for cluster 3 they are salary, marital status, and socio-economic status.
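As a rough sketch of how that k-value could be checked in Python (the feature matrix X and the range of candidate k values are assumptions for illustration, not taken from our actual notebook):

    # Fit K-Means for several k and record the three criteria we used.
    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score, davies_bouldin_score

    inertias, silhouettes, db_scores = [], [], []
    for k in range(2, 11):
        km = KMeans(n_clusters=k, random_state=42, n_init=10)
        labels = km.fit_predict(X)
        inertias.append(km.inertia_)                       # Elbow method (within-cluster SSE)
        silhouettes.append(silhouette_score(X, labels))    # higher is better
        db_scores.append(davies_bouldin_score(X, labels))  # lower is better

    # k = 6 is then read off where the elbow flattens, the silhouette
    # peaks, and the Davies-Bouldin score is at its lowest.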
We can also see the number of respondents in each cluster, which adds up to 1214 in total, as well as the Davies-Bouldin score. This score is defined as the average similarity of each cluster with its most similar cluster. Its minimum value is 0, so the closer the score is to 0, the better the clustering. The score we obtained, 0.487, is therefore very good.
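A minimal sketch of how the cluster sizes and the Davies-Bouldin score can be reproduced for the final k = 6 model (again assuming a prepared feature matrix X; the preprocessing steps are not shown here):

    from sklearn.cluster import KMeans
    from sklearn.metrics import davies_bouldin_score
    import numpy as np

    km = KMeans(n_clusters=6, random_state=42, n_init=10).fit(X)
    labels = km.labels_
    print("Cluster sizes:", np.bincount(labels))               # should sum to 1214 respondents
    print("Davies-Bouldin:", davies_bouldin_score(X, labels))  # closer to 0 is better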
Based on this visualization, we can see that there are six clusters altogether. The biggest cluster is the purple one, whose ages range from 22 to 55 and whose salaries range from 0 to 3000; it contains respondents from every age group. The smallest cluster is the red one, with ages ranging from 41 to 55 and salaries from 11000 to 16000.
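The age-versus-salary scatter behind this description could be drawn roughly as follows; the DataFrame df and its 'age' and 'salary' column names are assumptions for illustration, and labels refers to the K-Means assignments above:

    import matplotlib.pyplot as plt

    plt.figure(figsize=(8, 5))
    scatter = plt.scatter(df['age'], df['salary'], c=labels, cmap='tab10', s=15)
    plt.xlabel('Age')
    plt.ylabel('Salary')
    plt.title('K-Means clusters (k = 6) by age and salary')
    plt.legend(*scatter.legend_elements(), title='Cluster')
    plt.show()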
This is the dendrogram for agglomerative clustering (single linkage). Based on this result, the number of clusters is 2427, which is very different from the k-value suggested by the Elbow method, the Silhouette score, and the Davies-Bouldin score.
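A sketch of how this single-linkage dendrogram can be produced with SciPy, assuming the same feature matrix X used for K-Means:

    from scipy.cluster.hierarchy import linkage, dendrogram
    import matplotlib.pyplot as plt

    Z = linkage(X, method='single')              # single-link merge tree
    plt.figure(figsize=(10, 4))
    dendrogram(Z, truncate_mode='lastp', p=30)   # show only the last 30 merges for readability
    plt.title('Agglomerative clustering (single linkage)')
    plt.show()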
This is the dendrogram for agglomerative clustering. We decided to set the k-value to 6, since an appropriate cut is unclear and hard to read from the dendrogram itself. Using the same k-value lets us compare these clusters with the K-Means clusters. As can be seen from the visualizations, the clusters are very different from one another, and the number of members in each cluster is more balanced with K-Means than with agglomerative clustering.
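The size comparison can be made explicit with a sketch like the one below; the linkage method ('ward') is an assumption, and labels refers to the K-Means assignments from earlier:

    from sklearn.cluster import AgglomerativeClustering
    import numpy as np

    agg = AgglomerativeClustering(n_clusters=6, linkage='ward')
    agg_labels = agg.fit_predict(X)
    print("K-Means cluster sizes:      ", np.bincount(labels))
    print("Agglomerative cluster sizes:", np.bincount(agg_labels))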
To sum up, I actually prefer doing clustering with RapidMiner rather than Python, because RapidMiner offers a better overview and visualization than Python.