FAQ

1) How can the k-means algorithm be made to identify the number of clusters automatically?

--> Use the two basic properties of a good clustering:

i) Data points belonging to the same cluster should be as close to each other as possible (small intracluster distance).

ii) Data points belonging to different clusters should be as far apart as possible (large intercluster distance).

So the aim is to minimize the ratio (intracluster distance) / (intercluster distance). Start by assuming two clusters, run k-means, and record the ratio. Then increment the number of clusters by one, run k-means again, and record the new ratio. Continue until the number of clusters reaches a chosen maximum (say 15). Finally, select the number of clusters for which the recorded ratio is the smallest of all. Fig I shows the graph obtained for determining the number of clusters.

Fig I: Graph for determining the number of clusters
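
The procedure above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation. The text does not fix exact definitions of the two distances, so the sketch assumes one common choice: intracluster distance is the mean squared distance from each point to its own cluster center, and intercluster distance is the smallest squared distance between any two cluster centers. The function names (sq_dist, init_centers, kmeans, validity_ratio, choose_k) and the deterministic farthest-first seeding are also assumptions made for the example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centers(points, k):
    """Farthest-first seeding (an assumption; any k-means seeding would do)."""
    centers = [points[0]]
    while len(centers) < k:
        # Pick the point whose nearest chosen center is farthest away.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def kmeans(points, k, max_iter=100):
    """Plain Lloyd iterations; returns (centers, labels)."""
    points = [tuple(p) for p in points]
    centers = init_centers(points, k)
    for _ in range(max_iter):
        labels = assign(points, centers)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centers[j])  # an empty cluster keeps its center
        if updated == centers:
            break
        centers = updated
    return centers, assign(points, centers)


def validity_ratio(points, centers, labels):
    """(intracluster distance) / (intercluster distance), as assumed above."""
    intra = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels)) / len(points)
    inter = min(sq_dist(centers[i], centers[j])
                for i in range(len(centers))
                for j in range(i + 1, len(centers)))
    # Coinciding centers make the ratio undefined; treat it as the worst case.
    return float("inf") if inter == 0 else intra / inter


def choose_k(points, k_max=15):
    """Try k = 2 .. k_max, record the ratio for each k, return the best k."""
    points = [tuple(p) for p in points]
    ratios = {}
    for k in range(2, k_max + 1):
        centers, labels = kmeans(points, k)
        ratios[k] = validity_ratio(points, centers, labels)
    best_k = min(ratios, key=ratios.get)
    return best_k, ratios
```

Calling choose_k(points, k_max=15) on a list of coordinate tuples returns the selected number of clusters together with the recorded ratio for each k. Plotting those ratios against k gives a graph of the kind shown in Fig I. Because k-means only finds a local optimum, in practice it may help to run several seedings for each k and keep the smallest ratio.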

References

1) Determination of number of clusters in k-means clustering and application in color image segmentation by S. Ray and R. H. Turi.

2) Determination of cluster number in clustering microarray data by J. Shen, I. S. Chang, E. S. Lee, Y. Deng and S. J. Brown.

3) A clustering algorithm for datasets with different densities by X. Yang, L. He and H. Lu.