Second, we take the difficult journey back from algebraic geometry to statistical learning theory; along the way, a new field called singular learning theory is developed. The path passes through the resolution theorem, zeta functions, Schwartz distributions, state density functions, partition functions, and empirical process theory, finally reaching the free energy and the generalization loss.
Remark : You might think that algebraic geometry has no relation to statistics. However, if you walk the road back from algebraic geometry to statistical learning theory, you will discover for yourself a completely new, beautiful, and useful world.
In deep learning, a statistical submodel corresponds to an algebraic variety in the parameter space. The statistical properties of such models are captured by two birational invariants. This is why algebraic geometry is necessary for understanding the process of deep learning.
Note: There are many singular learning machines, for example, neural networks, normal mixtures, Boltzmann machines, reduced rank regressions, latent Dirichlet allocation, and so on.
S. Watanabe, "Almost all learning machines are singular," IEEE Symposium on Foundations of Computational Intelligence, 2007. DOI: 10.1109/FOCI.2007.371500
In classical regular models, the KL divergence can be approximated by a quadratic form; in modern singular models it cannot, because of singularities.
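In symbols (a standard sketch of the regular case, not taken from this page): near a true parameter $w_0$, the KL divergence of a regular model admits a quadratic approximation, which fails exactly when the Fisher information degenerates.

```latex
% Regular case: quadratic approximation near the true parameter w_0
K(w) \;=\; \int q(x)\,\log\frac{q(x)}{p(x\mid w)}\,dx
\;\approx\; \tfrac{1}{2}\,(w-w_0)^{\top} I(w_0)\,(w-w_0),
% valid only when the Fisher information matrix I(w_0) is positive
% definite.  In a singular model I(w_0) is degenerate and K(w) behaves
% like a monomial, e.g. K(a,b) = a^2 b^2, which no quadratic form in
% (a,b) can approximate near the origin.
```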
Although singularities make the generalization error very small, their mathematical properties have been difficult to analyze.
Note: Of course, regular models are special cases of singular models; singular learning theory therefore holds for regular models as well. From the mathematical point of view, singular learning theory is an extension of regular learning theory.
In deep learning, most of the eigenvalues of the Fisher information matrix are zero, so the classical regular theory does not hold: the maximum likelihood estimator often diverges, and the posterior distribution is far from any normal distribution.
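The degeneracy is easy to observe numerically. The sketch below (an illustration of the phenomenon, not code from this page) uses the smallest possible "neural network," the one-hidden-unit regression model y = a·tanh(bx) + noise, and estimates its Fisher information matrix I(w) = E[∇f ∇fᵀ] by Monte Carlo: at a generic parameter both eigenvalues are positive, but on the singular set {a = 0} the matrix is rank-deficient.

```python
import numpy as np

# One-hidden-unit regression model f(x; a, b) = a * tanh(b * x).
# For Gaussian noise, the Fisher information at w = (a, b) is
#   I(w) = E_x[ grad f  grad f^T ],  grad f = (tanh(bx), a x / cosh(bx)^2).
rng = np.random.default_rng(0)
x = rng.normal(size=10_000)  # inputs x ~ N(0, 1)

def fisher(a, b, x):
    """Monte Carlo estimate of the 2x2 Fisher information matrix."""
    g = np.stack([np.tanh(b * x), a * x / np.cosh(b * x) ** 2])
    return g @ g.T / len(x)

# At a generic ("regular") point both eigenvalues are positive:
print(np.linalg.eigvalsh(fisher(1.0, 1.0, x)))

# On the singular set {a = 0}, every b gives the same (zero) function,
# and the Fisher information matrix has a zero eigenvalue:
print(np.linalg.eigvalsh(fisher(0.0, 1.0, x)))
```

In a deep network the same mechanism repeats across many units and layers, which is why most eigenvalues vanish rather than just one.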
The problem in machine learning theory caused by singularities is solved by a basic theorem of algebraic geometry; for the statement and concrete examples, see Hironaka's resolution theorem. An arbitrary singularity can be made normal crossing in each local coordinate chart of an appropriate manifold by a birational transform. This theorem, among the most basic and important in algebraic geometry, was proved in 1964 by Professor Heisuke Hironaka of RIMS, Kyoto University.
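As a minimal illustration (a standard textbook-style example, not specific to this page), consider the nonnegative loss K(x, y) = x⁴ + y⁴, which is singular at the origin. A single blow-up already brings it to normal crossing form:

```latex
% Blow-up of the origin, first chart: x = u,\; y = uv  (Jacobian |u|)
K(x,y) = x^4 + y^4 \;\longmapsto\; K(u,\,uv) = u^4\,(1+v^4).
% Since 1 + v^4 > 0 is a unit, in this chart K is the normal crossing
% monomial u^4 times a nonvanishing factor; the second chart
% x = uv,\; y = v is handled symmetrically.
```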
The learning process is captured by two birational invariants, the real log canonical threshold and the singular fluctuation, both of which can be defined explicitly using the resolution theorem. You will come to see that these two mathematical concepts play the central role in machine learning theory.
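Concretely (a sketch of standard definitions and results from singular learning theory), the real log canonical threshold λ is read off from the zeta function of the model, and it governs the asymptotic free energy and the generalization error:

```latex
% Zeta function of the averaged log loss K(w) with prior \varphi(w):
\zeta(z) \;=\; \int K(w)^{z}\,\varphi(w)\,dw .
% \zeta(z) extends meromorphically; its largest pole lies at z = -\lambda
% with order m, and \lambda is the real log canonical threshold (RLCT),
% computable from the normal crossing form given by the resolution theorem.
% Asymptotics of the Bayes free energy for sample size n
% (n L_n is n times the empirical entropy of the true distribution):
F_n \;=\; n L_n \;+\; \lambda \log n \;-\; (m-1)\log\log n \;+\; O_p(1),
% and the expected Bayes generalization error satisfies
\mathbb{E}[G_n] \;\sim\; \lambda / n .
```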
If you are interested in this page, please see the following book:
S. Watanabe, "Algebraic Geometry and Statistical Learning Theory," Cambridge University Press, 2009.