Kernel k-means clustering algorithm

This algorithm follows the same procedure as k-means, with one difference: distances are computed with a kernel function in a transformed feature space, instead of with the Euclidean distance in the input space.
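To make the kernel trick concrete, here is a small sketch (a degree-2 polynomial kernel is chosen purely for illustration, since its feature map Ф can be written out by hand). It checks that squared distances in the transformed space can be computed from kernel evaluations alone, without ever forming Ф explicitly:

```python
import numpy as np

def poly_kernel(a, b, c=1.0, d=2):
    """Polynomial kernel K(a, b) = (c + a.b)^d."""
    return (c + np.dot(a, b)) ** d

def phi(a, c=1.0):
    """Explicit feature map of the degree-2 polynomial kernel for 2-D input:
    Phi(x) = (c, sqrt(2c)*x1, sqrt(2c)*x2, x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = a
    return np.array([c, np.sqrt(2 * c) * x1, np.sqrt(2 * c) * x2,
                     x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

a = np.array([1.0, 2.0])
b = np.array([3.0, -1.0])

# Squared distance in the transformed space, computed two ways:
explicit = np.sum((phi(a) - phi(b)) ** 2)                                 # via Phi
via_kernel = poly_kernel(a, a) - 2 * poly_kernel(a, b) + poly_kernel(b, b)  # via K only
assert np.isclose(explicit, via_kernel)
```

The second computation never touches the transformed space, which is the whole point: for kernels like the Gaussian, Ф maps into an infinite-dimensional space and the explicit route is not even available.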

Algorithmic steps for Kernel k-means clustering

Let X = {a1, a2, a3, ..., an} be the set of data points and 'c' be the number of clusters.

1) Randomly initialize 'c' cluster centers.

2) Compute the distance between each data point and each cluster center in the transformed space using:

||Ф(ai) - mc||^2 = Ф(ai).Ф(ai) - (2 / |πc|) Σ_{aj ∈ πc} Ф(ai).Ф(aj) + (1 / |πc|^2) Σ_{aj, al ∈ πc} Ф(aj).Ф(al)

where,

'πc' denotes the cth cluster.

'mc' denotes the mean of cluster πc in the transformed space.

'Ф(ai)' denotes the data point ai in the transformed space.

Ф(ai).Ф(aj) = exp(-q ||ai - aj||^2) for the Gaussian kernel,

= (c + ai.aj)^d for the polynomial kernel.
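In practice the kernel values are precomputed once as an n x n kernel (Gram) matrix, so that step 2) never needs the original features again. A minimal NumPy sketch for the Gaussian kernel above (the function name `gaussian_gram` is illustrative):

```python
import numpy as np

def gaussian_gram(X, q=1.0):
    """n x n kernel matrix with K[i, j] = exp(-q * ||xi - xj||^2).

    Uses the identity ||xi - xj||^2 = ||xi||^2 + ||xj||^2 - 2*xi.xj
    to compute all pairwise squared distances at once.
    """
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-q * np.maximum(d2, 0.0))  # clip tiny negatives from round-off
```

The resulting matrix is symmetric with ones on the diagonal, since every point is at distance zero from itself.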

3) Assign each data point to the cluster whose center is at minimum distance.

4) Repeat from step 2) until no data point changes its cluster.
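The steps above can be sketched as follows, operating on a precomputed kernel matrix so the distance in step 2) comes from kernel values alone. This is an illustrative implementation, not code from the references; it initializes by randomly assigning points to clusters (equivalent to picking random centers in the transformed space) and expands ||Ф(ai) - mc||^2 in terms of K:

```python
import numpy as np

def kernel_kmeans(K, n_clusters, n_iter=100, seed=0):
    """Kernel k-means on a precomputed n x n kernel matrix K.

    Returns an array of n cluster labels. Mirrors the steps in the text:
    random initialization, distances in the transformed space computed
    from K alone, reassignment, repeat until labels stop changing.
    """
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    labels = rng.integers(n_clusters, size=n)   # step 1: random assignment
    for _ in range(n_iter):
        dist = np.zeros((n, n_clusters))
        for c in range(n_clusters):
            members = labels == c
            if not members.any():               # guard: re-seed an empty cluster
                members[rng.integers(n)] = True
            Kc = K[np.ix_(members, members)]
            # step 2: ||Phi(ai) - mc||^2
            #   = K[i,i] - 2*mean_{j in pi_c} K[i,j] + mean_{j,l in pi_c} K[j,l]
            dist[:, c] = (np.diag(K)
                          - 2 * K[:, members].mean(axis=1)
                          + Kc.mean())
        new_labels = dist.argmin(axis=1)        # step 3: nearest center
        if np.array_equal(new_labels, labels):  # step 4: stop when stable
            break
        labels = new_labels
    return labels
```

With a linear kernel (K = X @ X.T) this reduces to ordinary k-means; with the Gaussian kernel it can separate non-linear structures such as concentric rings, which plain k-means cannot.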

Fig I: Result obtained by applying Gaussian kernel k-means with 'q' = 10.

Advantages

1) The algorithm can identify non-linear cluster structures.

2) It is well suited to real-life data sets, which are often not linearly separable.

Disadvantages

1) The number of clusters must be specified in advance.

2) The algorithm is computationally expensive: computing and storing the n x n kernel matrix alone costs O(n^2), which limits scalability to large data sets.
