Fuzzy c-means clustering algorithm

This algorithm works by assigning membership to each data point corresponding to each cluster center on the basis

of distance between the cluster center and the data point. More the data is near to the cluster center more is its

membership towards the particular cluster center. Clearly, summation of membership of each data point should be

equal to one. After each iteration membership and cluster centers are updated according to the formula:

where,

'n' is the number of data points. 'vj' represents the j^th cluster center. 'm' is the fuzziness index m € [1, ∞]. 'c' represents the number of cluster center. 'µij' represents the membership of i^th data to j^th cluster center. 'dij' represents the Euclidean distance between i^th data and j^th cluster center.

Main objective of fuzzy c-means algorithm is to minimize:

where,

'||x_i– v_j||' is the Euclidean distance between i^th data and j^th cluster center.

Algorithmic steps for Fuzzy c-means clustering

Let X = {x₁, x₂, x₃ ..., x_n} be the set of data points and V = {v₁, v₂, v₃ ..., v_c} be the set of centers.

1) Randomly select ‘c’ cluster centers.

2) Calculate the fuzzy membership 'µ_ij' using:

3) Compute the fuzzy centers 'v_j' using:

4) Repeat step 2) and 3) until the minimum 'J' value is achieved or ||U^(k+1)- U^(k)|| < β.

where,

‘k’ is the iteration step.

‘β’ is the termination criterion between [0, 1].

‘U = (µ_ij)_n*c’ is the fuzzy membership matrix.

‘J’ is the objective function.

Fig I: Result of Fuzzy c-means clustering

Advantages

1) Gives best result for overlapped data set and comparatively better then k-means algorithm.

2) Unlike k-means where data point must exclusively belong to one cluster center here data point is assigned

membership to each cluster center as a result of which data point may belong to more then one cluster center.

Disadvantages

1) Apriori specification of the number of clusters.

2) With lower value of β we get the better result but at the expense of more number of iteration.

3) Euclidean distance measures can unequally weight underlying factors.

References

1) Fuzzy c-means by Balaji K and Juby N Zacharias.

2) Fast and Robust Fuzzy C-Means Clustering Algorithms Incorporating Local Information for Image Segmentation by

Weiling Cai, Songcan Chen and Daoqiang Zhang.

3) http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/cmeans.html