Review: Unsupervised learning of finite mixture models

Post date: Sep 8, 2011 4:38:13 PM

Overall, the paper gives a very intuitive discussion of several EM algorithms derived from different model-selection criteria, e.g., BIC, MDL, MML, and the proposed criterion. The proposed EM algorithm has good properties:

  • It automatically selects the number of components.
  • It is less sensitive to initialization.
  • It avoids the boundary of the parameter space, i.e., it prevents degenerate solutions by annihilating components whose mixing weights approach zero.
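The component-annihilation idea can be sketched as follows: run EM as usual, but after each M-step remove any component whose mixing weight has fallen below a threshold. This is a minimal 1-D toy of that idea, not the paper's exact MML-driven rule — the fixed `prune_thresh` and the plain EM updates here are my own simplifications:

```python
import numpy as np

def em_gmm_prune(X, k_init=6, prune_thresh=0.02, n_iter=100, seed=0):
    """Toy 1-D Gaussian-mixture EM that annihilates components whose
    mixing weight drops below prune_thresh (illustration only)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    mu = rng.choice(X, size=k_init, replace=False).astype(float)
    var = np.full(k_init, X.var())
    w = np.full(k_init, 1.0 / k_init)
    for _ in range(n_iter):
        # E-step: responsibilities of each surviving component
        d = X[:, None] - mu[None, :]
        logp = -0.5 * (d**2 / var + np.log(2 * np.pi * var)) + np.log(w)
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: standard ML updates
        nk = r.sum(axis=0) + 1e-12
        w = nk / n
        mu = (r * X[:, None]).sum(axis=0) / nk
        var = (r * (X[:, None] - mu)**2).sum(axis=0) / nk + 1e-6
        # annihilate components near the boundary of the parameter space
        keep = w > prune_thresh
        if not keep.all():
            mu, var, w = mu[keep], var[keep], w[keep] / w[keep].sum()
    return w, mu, var
```

Starting from more components than the data need, the surviving mixture is still a proper distribution (weights renormalized to sum to one) with at most `k_init` components.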

The authors make the code available here. There are also some interesting ideas I learned from the paper beyond the proposed algorithm itself:

  • The authors show, in just one paragraph, that maximum likelihood is useless as a criterion for estimating the number of components: the maximized likelihood never decreases as components are added, so it always favors the largest model.
  • There are two types of model selection criteria:
    • Deterministic: BIC, MDL, MML, etc.
    • Stochastic: MCMC, resampling-based schemes, cross-validation; these are too computationally expensive to be practical.
  • ICL and LEC seem to be good criteria for model selection; they reportedly outperform other criteria, including BIC.
  • EM is highly dependent on the initialization.
  • There are several types of EM mentioned in the paper
    • Split-and-merge EM (SMEM), which escapes poor local optima of the likelihood.
    • Deterministic annealing (DA), which introduces a temperature T acting as a multiplicative constant on the identity covariance matrix I; T is then gradually annealed.
    • The self-annealing behavior of EM itself: the authors show that traditional EM self-anneals when started from a non-informative initialization of the membership posteriors, i.e., a uniform posterior over components for each sample.
    • CEM2, which updates the components' parameters sequentially, one at a time, recomputing the membership posteriors after every parameter update. This makes EM more robust when initialized with a large number of components.
  • The proposed EM algorithm is derived from a new (MML-based) criterion that incorporates the mixing coefficients explicitly, which enables the algorithm to annihilate components that approach the boundary of the parameter space. Furthermore, CEM2 is combined with the proposed algorithm, yielding a method that remains robust when started with a large number of components.
  • The paper gives a very intuitive discussion of how to initialize EM and the rationale behind it.
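To make the "deterministic criterion" idea above concrete: fit a mixture for each candidate number of components and keep the one minimizing the criterion. Here is a minimal BIC sketch for 1-D Gaussian mixtures, using my own toy EM (quantile initialization, a variance floor), not the authors' code or their MML criterion:

```python
import numpy as np

def fit_gmm_1d(X, k, n_iter=50):
    """Plain 1-D EM for a k-component Gaussian mixture; returns log-likelihood."""
    n = len(X)
    mu = np.quantile(X, np.linspace(0.1, 0.9, k))   # spread initial means over the data
    var = np.full(k, X.var())
    w = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        d = X[:, None] - mu[None, :]
        logp = -0.5 * (d**2 / var + np.log(2 * np.pi * var)) + np.log(w)
        m = logp.max(axis=1, keepdims=True)          # log-sum-exp stabilizer
        r = np.exp(logp - m)
        rs = r.sum(axis=1, keepdims=True)
        r /= rs
        nk = r.sum(axis=0) + 1e-12
        w = nk / n
        mu = (r * X[:, None]).sum(axis=0) / nk
        var = (r * (X[:, None] - mu)**2).sum(axis=0) / nk + 1e-3  # variance floor
    return (m.squeeze() + np.log(rs.squeeze())).sum()  # total log-likelihood

def select_k_bic(X, k_max=4):
    """Pick the number of components minimizing BIC = -2*ll + p*log(n)."""
    n = len(X)
    best_k, best_bic = None, np.inf
    for k in range(1, k_max + 1):
        ll = fit_gmm_1d(X, k)
        p = 3 * k - 1                 # k means + k variances + (k-1) free weights
        bic = -2 * ll + p * np.log(n)
        if bic < best_bic:
            best_k, best_bic = k, bic
    return best_k
```

Unlike the paper's approach, this requires a full EM run per candidate k; the appeal of the proposed algorithm is that a single run starting from many components does the selection implicitly.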