Let us study unsupervised learning using a normal mixture. A normal mixture is defined as a finite mixture of normal distributions.
An unknown data-generating distribution is estimated by a normal mixture. This method is applied to automatic clustering, density estimation, and the discovery of unknown information structure.
We examine a normal mixture and its prior distribution, defined by the equations on the slide, where {a_k} is a nonnegative sequence whose sum is equal to 1.
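The equations themselves are not reproduced here; a generic K-component form consistent with the stated constraint on {a_k} would be the following sketch, where the Dirichlet-times-normal prior and its hyperparameters are assumptions made only for illustration:

p(x \mid w) = \sum_{k=1}^{K} a_k \, \mathcal{N}(x \mid b_k, \sigma_k^2), \qquad a_k \ge 0, \quad \sum_{k=1}^{K} a_k = 1,

\varphi(w) \propto \mathrm{Dirichlet}(a_1,\dots,a_K \mid \alpha) \prod_{k=1}^{K} \mathcal{N}(b_k \mid 0, \tau^2).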
A simple experiment was conducted. A model and a data-generating distribution were set as in the equations on the left. We studied three cases: (1) the true distribution is realizable by and singular for the model; (2) the true distribution is not realizable by and singular for the model; and (3) the true distribution is realizable by and regular for the model.
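The equations on the left of the slide are not reproduced here. A two-parameter model consistent with the parameter names a and b, the singularity at (0,0), and the true parameter (0.5, 1) appearing in the cases below would be, as an assumption,

p(x \mid a, b) = (1 - a)\,\mathcal{N}(x \mid 0, 1) + a\,\mathcal{N}(x \mid b, 1), \qquad 0 \le a \le 1.

Under this assumed model, Case 1 would take the standard normal as the true distribution, so every parameter with a = 0 or b = 0 realizes it and the model is singular, while Case 3 would take a true mixture such as 0.5\,\mathcal{N}(x \mid 0,1) + 0.5\,\mathcal{N}(x \mid 1,1), whose true parameter (0.5, 1) is unique and regular.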
Case 1: The posterior distributions in the realizable and singular case are displayed. The four posterior distributions correspond to four different samples. The horizontal and vertical axes correspond to a and b, respectively.
The point (0,0) is the singularity of the model.
Case 2: The posterior distributions in the singular and unrealizable case are displayed. In this case, the posterior distributions depend more strongly on the sample.
Case 3: The posterior distributions in the realizable and regular case are displayed. Even in this case, the posterior distributions do not concentrate in a small neighborhood of the unique true parameter (0.5, 1).
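A minimal numerical sketch of such posterior plots, assuming the two-parameter model above and a uniform prior (both assumptions, since the slide's exact code is not given), evaluates the normalized posterior on a grid over (a, b):

import numpy as np
from scipy.stats import norm

# Assumed model (not the slide's exact equations):
#   p(x | a, b) = (1 - a) N(x | 0, 1) + a N(x | b, 1)
def log_likelihood(x, a, b):
    return np.sum(np.log((1.0 - a) * norm.pdf(x, 0.0, 1.0) + a * norm.pdf(x, b, 1.0)))

def posterior_grid(x, a_grid, b_grid):
    # Un-normalized log posterior with a uniform prior on the grid rectangle.
    logp = np.array([[log_likelihood(x, a, b) for b in b_grid] for a in a_grid])
    logp -= logp.max()            # subtract the maximum for numerical stability
    post = np.exp(logp)
    return post / post.sum()      # normalize so the grid values sum to 1

# Case 1: realizable and singular -- data are drawn from the standard normal.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=100)
a_grid = np.linspace(0.0, 1.0, 101)
b_grid = np.linspace(-2.0, 2.0, 101)
post = posterior_grid(x, a_grid, b_grid)   # a heat map of this array gives one panel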
The data-generating distribution is a normal mixture with 3 components, whereas the learning machine is a normal mixture with 2 components.
The sample size is increased gradually, and the generalization error, LOOCV, and WAIC are compared. In this case, the data-generating distribution is not realizable by the statistical model, so the generalization error does not converge to zero.
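As a sketch of how the two criteria can be computed in such an experiment (a generic recipe, not the slide's procedure), both WAIC and importance-sampling LOOCV can be obtained from a matrix of pointwise log-likelihoods evaluated at posterior draws:

import numpy as np
from scipy.special import logsumexp

def waic_and_loocv(logp):
    # logp: array of shape (S, n) holding log p(X_i | w_s) for S posterior draws w_s,
    # e.g. obtained by MCMC; this is a generic sketch, not the slide's exact procedure.
    S, n = logp.shape
    lppd = logsumexp(logp, axis=0) - np.log(S)      # log E_w[p(X_i | w)] per data point
    T_n = -np.mean(lppd)                            # Bayes training loss
    V_n = np.sum(np.var(logp, axis=0, ddof=1))      # functional variance
    waic = T_n + V_n / n                            # WAIC written as a loss per sample
    # Importance-sampling LOOCV: p(X_i | X_{-i}) is approximated by 1 / E_w[1 / p(X_i | w)].
    loocv = np.mean(logsumexp(-logp, axis=0) - np.log(S))
    return waic, loocv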
The data-generating distribution is a normal mixture with 3 components, and the learning machine is also a normal mixture with 3 components.
The sample size is increased gradually, and the generalization error, LOOCV, and WAIC are compared. In this case, the data-generating distribution is realizable by and regular for the statistical model, so the real log canonical threshold is equal to half the dimension of the parameter space.
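The statement about the real log canonical threshold refers to the standard asymptotic form of the Bayesian generalization error (the Kullback-Leibler divergence from the true distribution to the predictive distribution):

\mathbb{E}[G_n] = \frac{\lambda}{n} + o\!\left(\frac{1}{n}\right),

where \lambda is the real log canonical threshold; in a realizable and regular case \lambda = d/2, with d the dimension of the parameter, so the error decreases at the rate d/(2n).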
The data-generating distribution is a normal mixture with 3 components, whereas the learning machine is a normal mixture with 5 components.
The sample size is increased gradually, and the generalization error, LOOCV, and WAIC are compared. In this case, the data-generating distribution is realizable by and singular for the statistical model, and the real log canonical threshold is no longer given by half the dimension of the parameter space and is nontrivial to determine.
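The comparison in all three experiments rests on the definition of WAIC, which can be written with the Bayes training loss T_n and the functional variance V_n as

\mathrm{WAIC}_n = T_n + \frac{V_n}{n}, \qquad T_n = -\frac{1}{n}\sum_{i=1}^{n} \log \mathbb{E}_w\!\left[p(X_i \mid w)\right], \qquad V_n = \sum_{i=1}^{n}\left\{ \mathbb{E}_w\!\left[(\log p(X_i \mid w))^2\right] - \mathbb{E}_w\!\left[\log p(X_i \mid w)\right]^2 \right\},

where \mathbb{E}_w denotes the expectation over the posterior distribution. Watanabe's theorem states that the expectation of WAIC equals the expectation of the Bayes generalization loss up to o(1/n), even for singular models, which is why WAIC tracks the generalization error in all three experiments.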