Let us study unsupervised learning using a normal mixture. A normal mixture is defined as a finite mixture of normal distributions.
An unknown data-generating distribution is estimated by a normal mixture. This method is applied to automatic clustering, density estimation, and the discovery of unknown information structure.
We examine a normal mixture and its prior distribution, defined by the equations on the slide, where {a_k} is a nonnegative sequence whose sum is equal to 1.
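The equations themselves are not reproduced here; a generic K-component form consistent with the stated constraint on {a_k} would be the following sketch, where the Dirichlet-times-normal prior and its hyperparameters are assumptions made only for illustration:

p(x \mid w) = \sum_{k=1}^{K} a_k \, \mathcal{N}(x \mid b_k, \sigma_k^2), \qquad a_k \ge 0, \quad \sum_{k=1}^{K} a_k = 1,

\varphi(w) \propto \mathrm{Dirichlet}(a_1,\dots,a_K \mid \alpha) \prod_{k=1}^{K} \mathcal{N}(b_k \mid 0, \tau^2).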
A simple experiment was conducted. A model and a data-generating distribution were set as in the equations on the left. We studied three cases: (1) the true distribution is realizable by and singular for the model; (2) the true distribution is not realizable by and singular for the model; and (3) the true distribution is realizable by and regular for the model.
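The equations on the left of the slide are not reproduced here. A two-parameter model consistent with the parameter names a and b, the singularity at (0,0), and the true parameter (0.5, 1) appearing in the cases below would be, as an assumption,

p(x \mid a, b) = (1 - a)\,\mathcal{N}(x \mid 0, 1) + a\,\mathcal{N}(x \mid b, 1), \qquad 0 \le a \le 1.

Under this assumed model, Case 1 would take the standard normal as the true distribution, so every parameter with a = 0 or b = 0 realizes it and the model is singular, while Case 3 would take a true mixture such as 0.5\,\mathcal{N}(x \mid 0,1) + 0.5\,\mathcal{N}(x \mid 1,1), whose true parameter (0.5, 1) is unique and regular.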
Case 1: The posterior distributions in the realizable and singular case are displayed. The four posterior distributions correspond to four different samples. The horizontal and vertical axes correspond to a and b, respectively.
The point (0,0) is the singularity of the model.
Case 2: The posterior distributions in the singular and unrealizable case are displayed. In this case, the posterior distributions depend more strongly on the sample.
Case 3: The posterior distributions in the realizable and regular case are displayed. Even in this case, the posterior distributions do not concentrate in a small neighborhood of the unique true parameter (0.5, 1).
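A minimal numerical sketch of such posterior plots, assuming the two-parameter model above and a uniform prior (both assumptions, since the slide's exact code is not given), evaluates the normalized posterior on a grid over (a, b):

import numpy as np
from scipy.stats import norm

# Assumed model (not the slide's exact equations):
#   p(x | a, b) = (1 - a) N(x | 0, 1) + a N(x | b, 1)
def log_likelihood(x, a, b):
    return np.sum(np.log((1.0 - a) * norm.pdf(x, 0.0, 1.0) + a * norm.pdf(x, b, 1.0)))

def posterior_grid(x, a_grid, b_grid):
    # Un-normalized log posterior with a uniform prior on the grid rectangle.
    logp = np.array([[log_likelihood(x, a, b) for b in b_grid] for a in a_grid])
    logp -= logp.max()            # subtract the maximum for numerical stability
    post = np.exp(logp)
    return post / post.sum()      # normalize so the grid values sum to 1

# Case 1: realizable and singular -- data are drawn from the standard normal.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=100)
a_grid = np.linspace(0.0, 1.0, 101)
b_grid = np.linspace(-2.0, 2.0, 101)
post = posterior_grid(x, a_grid, b_grid)   # a heat map of this array gives one panel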
The data-generating distribution is a normal mixture with 3 components, whereas the learning machine is a normal mixture with 2 components.
The sample size is increased gradually, and the generalization error, LOOCV, and WAIC are compared. In this case, the data-generating distribution is not realizable by the statistical model, so the generalization error does not converge to zero.
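As a sketch of how the two criteria can be computed in such an experiment (a generic recipe, not the slide's procedure), both WAIC and importance-sampling LOOCV can be obtained from a matrix of pointwise log-likelihoods evaluated at posterior draws:

import numpy as np
from scipy.special import logsumexp

def waic_and_loocv(logp):
    # logp: array of shape (S, n) holding log p(X_i | w_s) for S posterior draws w_s,
    # e.g. obtained by MCMC; this is a generic sketch, not the slide's exact procedure.
    S, n = logp.shape
    lppd = logsumexp(logp, axis=0) - np.log(S)      # log E_w[p(X_i | w)] per data point
    T_n = -np.mean(lppd)                            # Bayes training loss
    V_n = np.sum(np.var(logp, axis=0, ddof=1))      # functional variance
    waic = T_n + V_n / n                            # WAIC written as a loss per sample
    # Importance-sampling LOOCV: p(X_i | X_{-i}) is approximated by 1 / E_w[1 / p(X_i | w)].
    loocv = np.mean(logsumexp(-logp, axis=0) - np.log(S))
    return waic, loocv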
The data-generating distribution is a normal mixture with 3 components, and the learning machine is also a normal mixture with 3 components.
The sample size is increased gradually, and the generalization error, LOOCV, and WAIC are compared. In this case, the data-generating distribution is realizable by and regular for the statistical model, so the real log canonical threshold is equal to half the dimension of the parameter space.
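The statement about the real log canonical threshold refers to the standard asymptotic form of the Bayesian generalization error (the Kullback-Leibler divergence from the true distribution to the predictive distribution):

\mathbb{E}[G_n] = \frac{\lambda}{n} + o\!\left(\frac{1}{n}\right),

where \lambda is the real log canonical threshold; in a realizable and regular case \lambda = d/2, with d the dimension of the parameter, so the error decreases at the rate d/(2n).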
The data-generating distribution is a normal mixture with 3 components, whereas the learning machine is a normal mixture with 5 components.
The sample size is increased gradually, and the generalization error, LOOCV, and WAIC are compared. In this case, the data-generating distribution is realizable by and singular for the statistical model, and the real log canonical threshold is no longer given by half the dimension of the parameter space and is nontrivial to determine.
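The comparison in all three experiments rests on the definition of WAIC, which can be written with the Bayes training loss T_n and the functional variance V_n as

\mathrm{WAIC}_n = T_n + \frac{V_n}{n}, \qquad T_n = -\frac{1}{n}\sum_{i=1}^{n} \log \mathbb{E}_w\!\left[p(X_i \mid w)\right], \qquad V_n = \sum_{i=1}^{n}\left\{ \mathbb{E}_w\!\left[(\log p(X_i \mid w))^2\right] - \mathbb{E}_w\!\left[\log p(X_i \mid w)\right]^2 \right\},

where \mathbb{E}_w denotes the expectation over the posterior distribution. Watanabe's theorem states that the expectation of WAIC equals the expectation of the Bayes generalization loss up to o(1/n), even for singular models, which is why WAIC tracks the generalization error in all three experiments.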