Quality Threshold (QT) clustering algorithm

This algorithm requires the apriori specification of the threshold distance within the cluster and the minimum number of elements in each cluster. Now from each data point we find all its candidate data points. Candidate data points are those which are within the range of the threshold distance from the given data point. This way we find the candidate data points for all data point and choose the one with large number of candidate data points to form cluster. Now data points which belongs to this cluster is removed and the same procedure is repeated with the reduced set of data points until no more cluster can be formed satisfying the minimum size criteria.

Algorithmic steps for QT clustering

1) Initialize the threshold distance allowed for clusters and the minimum cluster size.

2) Build a candidate cluster for each data point by including the closest point, the next closest, and so on, until the distance of the cluster surpasses the threshold.

3) Save the candidate cluster with the most points as the first true cluster, and remove all points in the cluster from further consideration.

4) Repeat with the reduced set of points until no more cluster can be formed having the minimum cluster size.

Fig I: Result of QT Clustering with threshold distance =10 & minimum number of element required within a cluster = 5

Advantages

1) Quality Guaranteed - Only clusters that pass a user-defined quality threshold will be returned.

2) Number of clusters is not specified apriori.

3) All possible clusters are considered - Candidate cluster is generated with respect to every data points and tested in order of size against quality criteria.

Disadvantages

1) Computationally Intensive and Time Consuming - Increasing the minimum cluster size or increasing the number of data points can greatly increase the computational time.

2) Threshold distance and minimum number of element in the cluster has to be defined apriori.

References

1) http://en.wikipedia.org/wiki/Cluster_analysis.