Algebraic Geometry

An algebraic set V is defined as the set of common zeros of several polynomials. The geometric properties of V can be studied by using the ideal I(V), which consists of all polynomials that vanish on V. This figure illustrates the algebraic set V={(x,y,z); y^2-x^3-x^2z=0}.


If you are interested in this page, the following book may be useful.

S. Watanabe, Algebraic Geometry and Statistical Learning Theory, Cambridge University Press, 2009. 

Let V be an irreducible algebraic set. The set of singularities of V is defined as the set of all points in V at which the rank of the Jacobian matrix formed from the generators of I(V) is not maximal.


Remark.  An algebraic set V is said to be irreducible if, whenever V = V1 U V2 for some algebraic sets V1 and V2, either V=V1 or V=V2.

Let V be the algebraic set x^3+y^3-3xy=0. Then I(V) is generated by x^3+y^3-3xy, so the Jacobian matrix J(x,y)=(3x^2-3y, 3y^2-3x) can be calculated. The rank of J(0,0) is 0, whereas the rank of J(x,y) is 1 at every other point (x,y) of V. Thus the origin is a singularity of V.
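The rank computation above can be checked with exact rational arithmetic. The following is only a minimal sketch: the rational parametrization x=3t/(1+t^3), y=3t^2/(1+t^3) of the curve and all function names are assumptions introduced here for illustration and do not appear in the original page.

```python
from fractions import Fraction

def f(x, y):
    # Defining polynomial of V
    return x**3 + y**3 - 3*x*y

def jacobian(x, y):
    # Row vector (df/dx, df/dy)
    return (3*x**2 - 3*y, 3*y**2 - 3*x)

def rank(row):
    # A 1 x 2 matrix has rank 0 if it is the zero row, otherwise rank 1
    return 0 if all(c == 0 for c in row) else 1

def curve_point(t):
    # Assumed rational parametrization of V (valid for t != -1); t = 0 gives the origin
    t = Fraction(t)
    d = 1 + t**3
    return (3*t / d, 3*t**2 / d)

for t in (0, 1, 2, Fraction(1, 2), -3):
    x, y = curve_point(t)
    print(t, f(x, y) == 0, rank(jacobian(x, y)))
```

The Jacobian also vanishes at (1,1), but f(1,1)=-1, so that point does not lie on V and does not count as a singularity.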

In the same way, the set of singularities can be found. By definition, the set of singularities is a proper subset of the algebraic set. By the inverse function theorem, an algebraic set in a neighborhood of a nonsingular point can be understood as a (nonsingular) manifold.

In the same way, we can find the set of singularities of a general algebraic set. For V={(x,y,z); y^2-x^3-x^2z=0}, the set of singularities is {(x,y,z);x=y=0}.
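As a worked check, assume that the set in question is V={(x,y,z); y^2-x^3-x^2z=0} from the first figure and that I(V) is generated by this single polynomial (both are assumptions made here). The Jacobian matrix is then a single row:

```latex
f(x,y,z) = y^2 - x^3 - x^2 z, \qquad
J(x,y,z) = \left( \frac{\partial f}{\partial x},\ \frac{\partial f}{\partial y},\ \frac{\partial f}{\partial z} \right)
         = \left( -3x^2 - 2xz,\ \ 2y,\ \ -x^2 \right).

% The rank of J drops from 1 to 0 exactly where all three entries vanish:
J(x,y,z) = 0 \iff x = 0 \ \text{and}\ y = 0,
\qquad f(0,0,z) = 0 \ \text{for every } z .
```

Every point of the z-axis therefore lies on V and has rank 0, which gives the set {(x,y,z);x=y=0}.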

Blow-up is one method for finding a resolution of singularities. First, find the set of singularities, sing(W), of an algebraic set W = {y^3-x^2=0}. In this case, the origin V={O} is a subset of sing(W) that is itself nonsingular.

Second, remove the subset V={O} from W={y^3-x^2=0}.

Third, calculate the blow-up in each local coordinate system.

Fourth, find the closure of the resulting set in the Zariski topology. In this case, the closure in the Euclidean topology gives the same set.

Fifth, the total transform is the union of the strict transform and the exceptional set. By gluing the two local coordinate systems, the blow-up of W with center V, BV(W), is obtained.
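The steps above can be sketched in a few lines of code. This is an illustration under stated assumptions, not the page's own construction: the two local coordinate systems are taken to be x=u, y=uv and x=uv, y=v, polynomials are stored as dictionaries from exponent pairs to coefficients, and the function names are hypothetical.

```python
# A polynomial in two variables is stored as {(i, j): c}, meaning the sum of c * a^i * b^j.

def substitute_chart(poly, chart):
    """Pull a polynomial in (x, y) back to one chart of the blow-up at the origin.

    chart 1: x = u,     y = u * v   so x^i y^j becomes u^(i+j) v^j
    chart 2: x = u * v, y = v       so x^i y^j becomes u^i v^(i+j)
    """
    out = {}
    for (i, j), c in poly.items():
        key = (i + j, j) if chart == 1 else (i, i + j)
        out[key] = out.get(key, 0) + c
    return {k: c for k, c in out.items() if c != 0}

def split_exceptional(poly, chart):
    """Factor out the largest power of the exceptional variable.

    The exceptional set is u = 0 in chart 1 and v = 0 in chart 2.
    Returns (multiplicity, polynomial of the strict transform).
    """
    idx = 0 if chart == 1 else 1
    m = min(k[idx] for k in poly)
    strict = {}
    for (i, j), c in poly.items():
        strict[(i - m, j) if idx == 0 else (i, j - m)] = c
    return m, strict

def blow_up_origin(poly):
    """Total transform in both charts, split into exceptional factor and strict transform."""
    return {chart: split_exceptional(substitute_chart(poly, chart), chart)
            for chart in (1, 2)}

W = {(0, 3): 1, (2, 0): -1}   # y^3 - x^2
result = blow_up_origin(W)
```

For W the total transform is u^2(uv^3-1) in the first chart and v^2(v-u^2) in the second. The strict transform is uv^3=1, which does not meet the exceptional set u=0, and the parabola v=u^2, which is nonsingular.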

The same procedure can be illustrated by using projective space. The set BV(W) is a subset of R^2 × P^1.
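In the standard description, which is assumed here because the page does not write it out, the blow-up of the plane at the origin and the set BV(W) for W={y^3-x^2=0} can be written with homogeneous coordinates [a:b] on P^1 as follows:

```latex
B_V(\mathbb{R}^2) = \{ (x, y, [a:b]) \in \mathbb{R}^2 \times \mathbb{P}^1 \ ;\ x b = y a \},

B_V(W) = \overline{ \{ (x, y, [a:b]) \ ;\ y^3 - x^2 = 0,\ x b = y a,\ (x,y) \neq (0,0) \} } .

% Chart a \neq 0, with v = b/a, so that y = x v:
y^3 - x^2 = x^2 ( x v^3 - 1 ),
% Chart b \neq 0, with u = a/b, so that x = u y:
y^3 - x^2 = y^2 ( y - u^2 ).
```

In the second chart the closure is the parabola y=u^2, which meets the exceptional set {O} × P^1 only at the point (0,0,[0:1]).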

The resulting set BV(W) has no singularity, whereas the original set W has one. The general blow-up procedure in the higher-dimensional case can be defined in the same way.

Hironaka's resolution theorem states that an arbitrary algebraic set can be made to have only normal crossing singularities by a finite number of recursive blow-ups. This is one of the most fundamental and important theorems in algebraic geometry; it was proved by Professor Heisuke Hironaka in 1964. Professor Atiyah (1970) and Professor Kashiwara (1976) applied this theorem to Gelfand's conjecture (1954) on singular Schwartz distributions and to the rationality of the b-function, respectively.

Hironaka also proved that a resolution of singularities can be found by a finite number of recursive blow-ups, where the center of each blow-up is a nonsingular subset contained in the set of singularities.

Statistical models and learning machines such as those used in deep learning contain singularities in their parameter spaces. Hironaka's theorem is therefore also the algebro-geometric foundation of deep learning theory. In fact, the Bayesian marginal likelihood and the generalization error are determined by the singularities.

Let f(x,y)=y^3-x^5 be a polynomial. This figure shows the algebraic set defined by f(x,y)=0. This set has a unique tangent line y=0 at the origin (0,0). However, the Jacobian matrix (-5x^4, 3y^2) has rank zero if and only if (x,y)=(0,0). Hence the origin is a singularity.
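The two facts used here, the vanishing of the Jacobian matrix and the tangent line, can be written out as a short worked computation. The parametrization by t is an assumption introduced for illustration.

```latex
f(x,y) = y^3 - x^5, \qquad
\frac{\partial f}{\partial x} = -5x^4, \qquad
\frac{\partial f}{\partial y} = 3y^2,
\qquad\text{so}\qquad
\left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right) = (0,0) \iff (x,y) = (0,0).

% The curve is parametrized by (x,y) = (t^3, t^5), since
f(t^3, t^5) = t^{15} - t^{15} = 0 .

% The lowest-degree part of f at the origin is y^3, so the tangent cone is
y^3 = 0, \quad\text{that is, the single line } y = 0 \text{ counted three times.}
```

A point can thus have a unique tangent line and still be a singularity in the sense of the rank condition.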

This figure shows exp{ -n f(x,y)^2 }, where f(x,y)=y^3-x^5 and n=30. The posterior distribution of a singular learning machine such as a deep neural network has such a singular shape. It should be emphasized that the posterior distribution concentrates on the singularities as the sample size increases.
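The concentration can be examined numerically with a simple grid sum. The sketch below rests on assumptions that are not in the original page: the parameter region is restricted to the square [-1,1]^2, the prior is uniform, and the band width 0.05 and the helper names are chosen only for illustration.

```python
import math

def f(x, y):
    return y**3 - x**5

def unnormalized_posterior(x, y, n):
    # The function shown in the figure: exp{-n f(x,y)^2}
    return math.exp(-n * f(x, y)**2)

def mass_fraction(n, inside, grid=101):
    """Fraction of the grid mass of exp{-n f^2} on [-1,1]^2 that lies where inside(x, y) is true."""
    total = 0.0
    selected = 0.0
    for i in range(grid):
        x = -1.0 + 2.0 * i / (grid - 1)
        for j in range(grid):
            y = -1.0 + 2.0 * j / (grid - 1)
            p = unnormalized_posterior(x, y, n)
            total += p
            if inside(x, y):
                selected += p
    return selected / total

def near_zero_set(x, y, band=0.05):
    # Points close to the algebraic set f = 0, measured by the value of f
    return abs(f(x, y)) < band

for n in (30, 300, 3000):
    print(n, mass_fraction(n, near_zero_set))
```

The fraction of mass near f=0 grows with n. Passing a different predicate, for example a small disk around the origin, gives a rough way to compare the mass near the singular point with the mass along the rest of the curve.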

This figure shows an example of Langevin dynamics, which is steepest descent with added Gaussian noise. It is well known that the stochastic process defined by Langevin dynamics converges to the Boltzmann distribution. If the energy function (Hamiltonian) is equal to minus the log likelihood function, with the log prior included, then the Boltzmann distribution is equal to the Bayesian posterior distribution. Near singularities, the dynamics resembles a random walk.
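A discretized version of this dynamics can be sketched as follows. Everything specific here is an assumption made for illustration and is not taken from the figure: the energy is H(x,y)=n f(x,y)^2 with f(x,y)=y^3-x^5 and n=30, the update is the plain Euler discretization (no Metropolis correction), and each step is clamped to the square [-1,1]^2, which is a crude way to keep the parameter set compact.

```python
import math
import random

N = 30  # assumed value of n, matching the figure of exp{-n f^2}

def f(x, y):
    return y**3 - x**5

def energy(x, y, n=N):
    # Assumed energy function H = n f^2, so that exp(-H) = exp{-n f^2}
    return n * f(x, y)**2

def grad_energy(x, y, n=N):
    # Gradient of n f^2 is 2 n f * (df/dx, df/dy)
    c = 2 * n * f(x, y)
    return (c * (-5 * x**4), c * (3 * y**2))

def clamp(v, lo=-1.0, hi=1.0):
    return max(lo, min(hi, v))

def langevin(x, y, steps=5000, eps=1e-4, seed=0):
    """Steepest descent on the energy plus Gaussian noise of variance 2 * eps per step."""
    rng = random.Random(seed)
    noise = math.sqrt(2 * eps)
    path = [(x, y)]
    for _ in range(steps):
        gx, gy = grad_energy(x, y)
        x = clamp(x - eps * gx + noise * rng.gauss(0.0, 1.0))
        y = clamp(y - eps * gy + noise * rng.gauss(0.0, 1.0))
        path.append((x, y))
    return path

path = langevin(1.0, -1.0)
```

Because the gradient 2nf·(-5x^4, 3y^2) vanishes both on the set f=0 and, to high order, at the origin, the drift term becomes negligible near the singularity and the noise term dominates there.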