Gaussian (EM) clustering algorithm

This algorithm assumes a priori that the data were generated by 'c' Gaussians, and it then fits those 'c' Gaussians to the data by alternately estimating the expected class of each data point and maximizing the likelihood of the Gaussian centers given those class estimates.
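As a quick illustration, here is a minimal sketch using scikit-learn's GaussianMixture class, which fits a mixture of Gaussians by EM. The toy data set (N = 60 points, matching the size in Fig I below) and the choice of c = 3 components are assumptions made for this example, not part of the original text.

```python
# A minimal sketch, assuming scikit-learn is available; the toy data
# and the choice of c = 3 components are illustrative assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Toy data: three well-separated 2-D blobs, 60 points in total.
X = np.vstack([
    rng.normal(loc=(0, 0), scale=0.5, size=(20, 2)),
    rng.normal(loc=(5, 5), scale=0.5, size=(20, 2)),
    rng.normal(loc=(0, 5), scale=0.5, size=(20, 2)),
])

gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
labels = gmm.predict(X)   # "expected" class of each data point
print(gmm.means_)         # fitted Gaussian centers
print(labels[:10])        # cluster assignments of the first 10 points
```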

Algorithmic steps for Expectation-Maximization (EM) clustering

Let $X = \{x_1, x_2, x_3, \ldots, x_n\}$ be the set of data points,

$V = \{\mu_1, \mu_2, \mu_3, \ldots, \mu_c\}$ be the set of means of the Gaussians, and

$P = \{p_1, p_2, p_3, \ldots, p_c\}$ be the set of probabilities of occurrence of each Gaussian.

1) On the $i$-th iteration, initialize the parameter estimate:

$\lambda_i = \{\mu_1(i), \mu_2(i), \ldots, \mu_c(i),\; p_1(i), \ldots, p_c(i)\}$

E-step.

2) Compute the "expected" classes of all data points for each class, using Bayes' rule:

$P(\omega_k \mid x_t, \lambda_i) = \dfrac{p(x_t \mid \omega_k, \lambda_i)\, P(\omega_k \mid \lambda_i)}{p(x_t \mid \lambda_i)} = \dfrac{p(x_t \mid \mu_k(i), \sigma_k^2 I)\, p_k(i)}{\sum_{j=1}^{c} p(x_t \mid \mu_j(i), \sigma_j^2 I)\, p_j(i)}$

M-step.

3) Compute the maximum-likelihood means $\mu$, given the class-membership distributions from the E-step, using:

$\mu_k(i+1) = \dfrac{\sum_{t=1}^{R} P(\omega_k \mid x_t, \lambda_i)\, x_t}{\sum_{t=1}^{R} P(\omega_k \mid x_t, \lambda_i)}$

where 'R' is the number of data points. A code sketch of these two steps is given below.

Fig I: Result of the Gaussian (EM) algorithm on a data set of size N = 60.

Advantage

1) Gives useful results for real-world data sets, since many naturally occurring distributions are well modeled by a mixture of Gaussians.

Disadvantage

1) The algorithm is computationally expensive: each iteration evaluates all 'c' Gaussians at all 'R' data points, and many iterations may be needed to converge.
