These are the clusters produced by K-Means; the k-value was set to 6, as suggested by the Elbow method, the Silhouette score, and the Davies-Bouldin score. From this visualization, the largest cluster is cluster 5 with 369 members, while the smallest is cluster 3 with only 29 members. From this overview, we can see that for cluster 0 the main factors distinguishing the respondents are socio-economic status, salary, and health, while for cluster 3 they are salary, marital status, and socio-economic status.
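The k-selection step described above can be sketched as follows. This is a minimal illustration using synthetic data as a stand-in for the survey responses (the real feature set is not shown in the report); it compares inertia (elbow), Silhouette, and Davies-Bouldin scores over a range of candidate k values.

```python
# Sketch: comparing elbow (inertia), Silhouette and Davies-Bouldin scores
# over candidate k values. make_blobs is a hypothetical stand-in for the
# actual respondent data used in the report.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score, davies_bouldin_score

X, _ = make_blobs(n_samples=1214, centers=6, random_state=42)

results = {}
for k in range(2, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    results[k] = {
        "inertia": km.inertia_,                              # elbow: look for the bend
        "silhouette": silhouette_score(X, km.labels_),       # higher is better
        "davies_bouldin": davies_bouldin_score(X, km.labels_),  # lower is better
    }

# Pick the k with the lowest Davies-Bouldin score
best_k = min(results, key=lambda k: results[k]["davies_bouldin"])
print(best_k)
```

In practice the three criteria may disagree; the report treats their joint suggestion of k=6 as the final choice.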
We can also see the number of items, or respondents, in each cluster, which totals 1214 altogether, together with the Davies-Bouldin score. This score is defined as the average similarity of each cluster with its most similar cluster. Its minimum value is 0, so the closer the score is to 0, the better the result. Based on the result we got, 0.487, the score is good, so these are good clusters.
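The member counts and the Davies-Bouldin score for the final k=6 solution can be reproduced with scikit-learn along these lines (again with synthetic stand-in data, so the exact counts and score will differ from the report's):

```python
# Sketch: per-cluster member counts and the Davies-Bouldin score of the
# final k=6 K-Means solution. The data here is synthetic, standing in
# for the 1214 survey respondents.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score

X, _ = make_blobs(n_samples=1214, centers=6, random_state=0)
labels = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(X)

sizes = np.bincount(labels)          # members per cluster; sums to 1214
db = davies_bouldin_score(X, labels)  # lower is better, 0 is the minimum
print(sizes, sizes.sum(), db)
```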
As we can see from the visualization of self-rated health against age, six clusters are visible. The largest cluster is the purple one, and the smallest are the red and light-green clusters, with six members each.
This is the dendrogram for agglomerative clustering (single-link). Based on this result, the number of clusters is 2427, which is very different from the k-value suggested by the Elbow method, Silhouette score, and Davies-Bouldin score.
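A single-link dendrogram of this kind can be built with SciPy roughly as below (synthetic data; the distance threshold `t` is an illustrative value, not the one from the report). Single linkage merges clusters by their closest pair of points, so it tends to chain, and cutting the tree at a small distance can leave a very large number of clusters, which may explain the 2427 figure.

```python
# Sketch: single-linkage hierarchical clustering with SciPy. The resulting
# linkage matrix Z can be drawn with scipy.cluster.hierarchy.dendrogram.
# Data and threshold are hypothetical.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))        # stand-in for the respondent features

Z = linkage(X, method="single")      # single link: distance between closest points
labels = fcluster(Z, t=0.3, criterion="distance")  # cut the tree at distance 0.3
print(len(set(labels)))              # number of flat clusters at that cut
```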
This is the dendrogram for agglomerative clustering. For the k-value, we decided to set it to 6, since the right cut is unclear from the dendrogram itself; using the same k-value also lets us compare these clusters with the K-Means clusters. As we can see from the visualizations, the clusters are very different from one another: the number of respondents in each cluster is more balanced with K-Means than with agglomerative clustering.
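The balance comparison can be sketched by fitting both algorithms with k=6 on the same data and comparing cluster sizes. Synthetic data is used as a stand-in here, and the linkage is scikit-learn's default (Ward), which may differ from the linkage used in the report:

```python
# Sketch: comparing how balanced the cluster sizes are between K-Means
# and agglomerative clustering at k=6, on synthetic stand-in data.
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=1214, centers=6, random_state=1)

km_labels = KMeans(n_clusters=6, n_init=10, random_state=1).fit_predict(X)
ag_labels = AgglomerativeClustering(n_clusters=6).fit_predict(X)  # default: Ward linkage

km_sizes = np.bincount(km_labels)
ag_sizes = np.bincount(ag_labels)
# A simple balance measure: ratio of largest to smallest cluster (closer to 1 = more balanced)
print(km_sizes, km_sizes.max() / km_sizes.min())
print(ag_sizes, ag_sizes.max() / ag_sizes.min())
```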
To sum up, I actually prefer doing clustering in RapidMiner rather than Python, because RapidMiner offers a better overview and visualization than Python does.