MST based clustering algorithm

The basic idea of MST based clustering algorithm is as follows.

First construct MST(minimum spanning tree) using Kruskal algorithm and then set a threshold value and step size. We then remove those edges from the MST, whose lengths are greater than the threshold value. We next calculate the ratio between the intra-cluster distance and inter-cluster distance and record the ratio as well as the threshold. We update the threshold value by incrementing the step size. Every time we obtain the new (updated) threshold value, we repeat the above procedure. We stop repeating, when we encounter a situation such that the threshold value is maximum and as such no MST edges can be removed. In such situation, all the data points belong to a single cluster. Finally we obtain the minimum value of the recorded ratio and form the clusters corresponding to the stored threshold value. The above algorithm has two extreme cases:

1) With the zero threshold value, each point remains within a single cluster.

2) With the maximum threshold value all the points lie within a single cluster.

Therefore, the proposed algorithm searches for that optimum value of the threshold for which the Intra-Inter distance ratio is minimum. It needs not to mention that this optimum value of the threshold must lie between these two extreme values of the threshold. However, in order to reduce the number of iteration we never set the initial threshold value to zero.

Advantage

1) Comparatively better performance then k-means algorithm.

Disadvantage

1) Threshold value and step size needs to be defined apriori.

References

1) An Efficient Minimum Spanning Tree based Clustering Algorithm by Prasanta K. Jana and Azad Naik.

2) Minimum Spanning Tree Partitioning Algorithm for Micro aggregation by Michael Laszlo and Sumitra Mukherjee.

3) Clustering gene expression data using a graph-Theriotic approach: An application of minimum spanning trees by Y. Xu, V. Olman and D. Xu.

4) Graph-Theriotical methods for detecting and describing getalt clusters by C. T. Zahn.

5) A mentation using minimum spanning trees by Y. Xu, V. Olman and E. C. Uberbacher.