Geometry and Machine Learning Seminar
Fall 2021
Speaker Titles and Abstracts
September 2nd, 6PM (KST): Otto van Koert (SNU), Morse Theory and Machine Learning: Some Geometric Components of ML
Abstract: We show that gradient descent with random initialization converges to a local minimizer almost surely. This is proved by applying the Stable Manifold Theorem from dynamical systems theory.
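The mechanism can be illustrated numerically. Below is a minimal sketch (my own illustration, not code from the talk) of gradient descent on f(x, y) = (x^2 - 1)^2 + y^2, which has a strict saddle at the origin and local minima at (±1, 0); a random initialization avoids the saddle's stable manifold (the y-axis) almost surely, so the iterates end up at a minimizer.

```python
# Toy illustration (not from the talk): gradient descent on
# f(x, y) = (x^2 - 1)^2 + y^2, which has a saddle at (0, 0)
# and local minima at (+-1, 0).
import numpy as np

def grad_f(p):
    x, y = p
    return np.array([4.0 * x * (x**2 - 1.0), 2.0 * y])

rng = np.random.default_rng(0)
p = rng.normal(size=2)            # random initialization
for _ in range(1000):
    p = p - 0.01 * grad_f(p)      # fixed-step gradient descent

print(p)   # close to (1, 0) or (-1, 0), i.e. a local minimizer
```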
September 9th, 6:30 PM (KST): Chanwoo Park (SNU), A Convergence Theory for Deep Learning via Over-Parameterization (Paper)
Abstract: Deep neural networks (DNNs) have demonstrated dominating performance in many fields; since AlexNet, networks used in practice have grown wider and deeper. On the theoretical side, a long line of works has focused on why we can train neural networks when there is only one hidden layer; the theory of multi-layer networks remains unsettled. In this work, we prove that simple algorithms such as stochastic gradient descent (SGD) can find global minima of the training objective of DNNs in polynomial time. We make only two assumptions: the inputs do not degenerate, and the network is over-parameterized. The latter means the number of hidden neurons is sufficiently large: polynomial in L, the number of DNN layers, and in n, the number of training samples. As concrete examples, starting from randomly initialized weights, we show that SGD attains 100% training accuracy in classification tasks, or minimizes regression loss at a linear convergence rate ε ∝ e^{-T}, with running time polynomial in n and L. Our theory applies to the widely-used but non-smooth ReLU activation, and to any smooth, possibly non-convex loss function. In terms of network architectures, our theory at least applies to fully-connected neural networks, convolutional neural networks (CNN), and residual neural networks (ResNet).
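The over-parameterized regime is easy to simulate. The following toy sketch (under simplifying assumptions of my own: full-batch gradient descent, a single trained layer, synthetic data; not the paper's experiments) trains a wide two-layer ReLU network on a small random regression task; with sufficient width the training loss decays geometrically, mirroring the e^{-T}-type rate above.

```python
# Toy sketch, not the paper's setup: wide two-layer ReLU net,
# full-batch gradient descent on random unit-norm inputs.
import numpy as np

rng = np.random.default_rng(1)
n, d, m = 20, 5, 2000                             # samples, input dim, width
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)     # non-degenerate unit inputs
y = rng.normal(size=n)

W = rng.normal(size=(m, d)) / np.sqrt(d)          # trained hidden layer
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)  # fixed output layer

lr = 2.0
for t in range(1001):
    H = X @ W.T                            # (n, m) pre-activations
    r = np.maximum(H, 0.0) @ a - y         # residuals of the network output
    if t % 250 == 0:
        print(t, 0.5 * np.mean(r ** 2))    # loss shrinks geometrically
    # dL/dW_{jk} = (1/n) * sum_i r_i * a_j * 1[H_ij > 0] * x_ik
    W -= lr * ((H > 0) * (r[:, None] * a[None, :])).T @ X / n
```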
September 14th, 6PM (KST): Dongho Lee (SNU), Isomorphism of Hochschild Cohomology of Legendrian Cohomology Algebra and Symplectic Cohomology
Abstract: We demonstrate that the Hochschild cohomology of the Legendrian cohomology algebra of ADE singularities is isomorphic to the symplectic cohomology of the Milnor fiber. We carry out the computation in the A_n case for small n, and explain what further discussion is needed. The crucial Legendrian surgery formula of Bourgeois, Ekholm, and Eliashberg is discussed in the last part.
September 16th, 6:30 PM (KST): Chanwoo Park (SNU), Theoretical Analysis of Self-Training with Deep Networks on Unlabeled Data (Paper)
Abstract: Self-training algorithms, which train a model to fit pseudolabels predicted by another previously-learned model, have been very successful for learning with unlabeled data using neural networks. However, the current theoretical understanding of self-training only applies to linear models. This work provides a unified theoretical analysis of self-training with deep networks for semi-supervised learning, unsupervised domain adaptation, and unsupervised learning. At the core of our analysis is a simple but realistic "expansion" assumption, which states that a low probability subset of the data must expand to a neighborhood with large probability relative to the subset. We also assume that neighborhoods of examples in different classes have minimal overlap. We prove that under these assumptions, the minimizers of population objectives based on self-training and input-consistency regularization will achieve high accuracy with respect to ground-truth labels. By using off-the-shelf generalization bounds, we immediately convert this result to sample complexity guarantees for neural nets that are polynomial in the margin and Lipschitzness. Our results help explain the empirical successes of recently proposed self-training algorithms which use input consistency regularization.
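The basic self-training loop analyzed here can be sketched as follows. This is a hypothetical minimal example with linear models, meant only to fix the setting (a teacher pseudolabels, a student refits); it is not the paper's deep-network algorithm, and the input-consistency regularization is omitted.

```python
# Hypothetical minimal self-training loop -- an illustration of the
# setting the paper analyzes, not its algorithm. Input-consistency
# regularization (predictions stable under small input perturbations)
# is omitted for brevity.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
n_lab = 50                                  # few labeled examples
X_lab, y_lab = X[:n_lab], y[:n_lab]
X_unl, y_unl = X[n_lab:], y[n_lab:]         # y_unl used only for evaluation

teacher = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
pseudo = teacher.predict(X_unl)             # pseudolabels on unlabeled data

# Student refits on labeled data plus pseudolabels; under the paper's
# "expansion" assumption, correct pseudolabels propagate through the data.
# (In this linear toy the gain may be modest; the point is the structure.)
student = LogisticRegression(max_iter=1000).fit(
    np.vstack([X_lab, X_unl]), np.concatenate([y_lab, pseudo]))

print("teacher:", teacher.score(X_unl, y_unl))
print("student:", student.score(X_unl, y_unl))
```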
September 21st, 4PM (KST): Dohyeong Kim (SNU), Isomorphism of Legendrian Cohomology DG Algebra and Ginzburg DG Algebra
Abstract: We show that the Legendrian cohomology DG algebra is deformation equivalent to the Ginzburg DG algebra. In particular, for A_n singularities and for D_n singularities with the characteristic of the base field not equal to 2, the corresponding Hochschild cohomology class is zero, and the algebras are isomorphic.
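For orientation, the classical deformation-theoretic fact in the background (standard, due to Gerstenhaber; quoted here as context, not as part of the talk's proof) is that first-order deformations of an associative algebra A are classified by its second Hochschild cohomology:

```latex
% Classical fact (Gerstenhaber), quoted as background only:
% infinitesimal deformations of an associative algebra A,
% up to equivalence, are classified by HH^2(A).
\[
  \{\text{first-order deformations of } A\}/\!\sim \;\;\cong\;\; HH^{2}(A),
\]
% so when the Hochschild class attached to a deformation vanishes,
% the deformation is trivial to first order.
```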
September 23rd, 6:30 PM (KST): Chanwoo Park (SNU), Neural Tangent Kernel: Convergence and Generalization in Neural Networks (Paper)
Abstract: At initialization, artificial neural networks (ANNs) are equivalent to Gaussian processes in the infinite-width limit, thus connecting them to kernel methods. We prove that the evolution of an ANN during training can also be described by a kernel: during gradient descent on the parameters of an ANN, the network function (which maps input vectors to output vectors) follows the so-called kernel gradient associated with a new object, which we call the Neural Tangent Kernel (NTK). This kernel is central to describing the generalization features of ANNs. While the NTK is random at initialization and varies during training, in the infinite-width limit it converges to an explicit limiting kernel and stays constant during training. This makes it possible to study the training of ANNs in function space instead of parameter space. Convergence of the training can then be related to the positive-definiteness of the limiting NTK. We then focus on the setting of least-squares regression and show that in the infinite-width limit, the network function follows a linear differential equation during training. The convergence is fastest along the largest kernel principal components of the input data with respect to the NTK, hence suggesting a theoretical motivation for early stopping.
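To make the object concrete, here is a small numerical sketch (mine, for an assumed two-layer ReLU model in NTK parameterization, f(x) = a^T relu(Wx)/sqrt(m), with gradients taken in both layers): it computes the empirical NTK Gram matrix from explicit parameter gradients. As the width m grows, this matrix concentrates around the deterministic limiting kernel described above.

```python
# Empirical NTK of a finite-width two-layer ReLU network,
# f(x) = (1/sqrt(m)) a^T relu(W x) -- my illustration, not the paper's code.
import numpy as np

def empirical_ntk(X, W, a):
    m = W.shape[0]
    H = X @ W.T                          # (n, m) pre-activations
    S = (H > 0).astype(float)            # ReLU gates
    # grad wrt a_j: relu(h_j)/sqrt(m); wrt w_j: a_j * gate_j * x / sqrt(m)
    phi_a = np.maximum(H, 0.0) / np.sqrt(m)
    K_a = phi_a @ phi_a.T
    K_W = ((S * a) @ (S * a).T) * (X @ X.T) / m
    return K_a + K_W                     # sum over all parameters

rng = np.random.default_rng(0)
n, d, m = 5, 3, 20000
X = rng.normal(size=(n, d))
W = rng.normal(size=(m, d))              # N(0, 1) entries, NTK scaling
a = rng.normal(size=m)
print(np.round(empirical_ntk(X, W, a), 3))   # ~deterministic for large m
```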
September 28th, 4PM (KST): Yonghwan Kim (SNU), Floer Cohomology Algebra from Plumbings of 2-Spheres
Abstract: We show that the Floer cohomology algebra generated from plumbings of 2-spheres is isomorphic to the path algebra of the corresponding Dynkin diagram. We also prove intrinsic formality of the path algebra structure, which allows us to restrict the algebra structure to the 0th degree cohomology.
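As a reminder of the notion involved (my illustration in the smallest case, not the graded algebra arising from the plumbing itself), the path algebra of the A_2 quiver 1 --a--> 2 has basis the two vertex idempotents and the single arrow:

```latex
% Path algebra of the A_2 quiver 1 --a--> 2, paths composed right to left;
% a background example, not the algebra of the talk.
\[
  kQ = k e_1 \oplus k e_2 \oplus k a, \qquad
  e_i e_j = \delta_{ij} e_i, \quad
  e_2\, a = a = a\, e_1, \quad
  a^2 = 0 ,
\]
% isomorphic to the 2x2 lower-triangular matrices over k via
% e_1 -> E_11, e_2 -> E_22, a -> E_21.
```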
October 5th, 4PM (KST): Yonghwan Kim (SNU), Isomorphism of Gerstenhaber Structures between Floer Cohomology Algebra and Ginzburg DG Algebra
Abstract: We show that Koszul duality holds between the path algebra of a Dynkin diagram and the Ginzburg DG algebra of the same diagram. By a theorem of Keller, we conclude that the Hochschild cohomologies of the two algebras are isomorphic as Gerstenhaber algebras.
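For readers meeting Koszul duality for the first time, the classical example of a Koszul dual pair (stated for orientation only; the talk concerns the quiver/Ginzburg case instead) is the symmetric and exterior algebras of a finite-dimensional vector space V over a field k:

```latex
% Classical Koszul dual pair, for orientation only.
\[
  A = S(V), \qquad A^{!} = \Lambda(V^{*}),
\]
\[
  \operatorname{Ext}^{\bullet}_{S(V)}(k, k) \cong \Lambda(V^{*}),
  \qquad
  \operatorname{Ext}^{\bullet}_{\Lambda(V^{*})}(k, k) \cong S(V).
\]
```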
October 7th, 6:30 PM (KST): Chanwoo Park (SNU), Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers (Paper)
Abstract: The fundamental learning theory behind neural networks remains largely open. What classes of functions can neural networks actually learn? Why doesn't the trained network overfit when it is overparameterized? In this work, we prove that overparameterized neural networks can learn some notable concept classes, including two and three-layer networks with fewer parameters and smooth activations. Moreover, the learning can be simply done by SGD (stochastic gradient descent) or its variants in polynomial time using polynomially many samples. The sample complexity can also be almost independent of the number of parameters in the network. On the technique side, our analysis goes beyond the so-called NTK (neural tangent kernel) linearization of neural networks in prior works. We establish a new notion of quadratic approximation of the neural network (that can be viewed as a second-order variant of NTK), and connect it to the SGD theory of escaping saddle points.
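The gap between an NTK-style linearization and a second-order model can be seen already for a single smooth neuron. The toy check below (my own simplification, not the paper's construction) compares the first- and second-order Taylor expansions of f(w) = tanh(wx) in the weight w: once the weight moves noticeably from its initialization, the quadratic approximation tracks f far better, which is the regime a second-order variant of NTK targets.

```python
# Toy check (my simplification, not the paper's construction):
# linear vs. quadratic Taylor expansion of f(w) = tanh(w * x) in w.
import numpy as np

x, w0 = 1.3, 0.4

def f(w):
    return np.tanh(w * x)

t0 = np.tanh(w0 * x)
df = x * (1.0 - t0 ** 2)                      # f'(w0)
d2f = -2.0 * x ** 2 * t0 * (1.0 - t0 ** 2)    # f''(w0)

for dw in (0.1, 0.5, 1.0):                    # how far the weight moves
    lin = f(w0) + df * dw                     # NTK-style linearization
    quad = lin + 0.5 * d2f * dw ** 2          # quadratic approximation
    print(dw, abs(f(w0 + dw) - lin), abs(f(w0 + dw) - quad))
```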
October 8th, 5PM (KST): Yonghwan Kim (SNU), Reeb Flows without Simple Global Surfaces of Section
Abstract: The existence of global surfaces of section has been a long-standing question in dynamics. In this talk, for any given n, I will construct a Reeb flow on the tight three-sphere whose global surface of section, should it exist, must have at least n boundary components. Moreover, this property can be shown to be stable under C^\infty-small perturbations. The talk will demonstrate how the linking numbers of orbits can be used to construct a topological obstruction to global surfaces of section.
October 12th, 4PM (KST): Dongho Lee (SNU), Generators for Path Algebras of Dynkin Diagrams of A_n, D_n Type
Abstract: We compute the generators of the Hochschild cohomology of the path algebra of Dynkin diagrams. Using this result, we are able to compute the generators for the symplectic cohomology of A_n and D_n singularities, when the characteristic of the base field is not 2.
October 14th, 6:30 PM (KST): Chanwoo Park (SNU), Gradient Descent with Early Stopping is Provably Robust to Label Noise for Overparameterized Neural Networks (Paper)
Abstract: Modern neural networks are typically trained in an over-parameterized regime where the parameters of the model far exceed the size of the training data. Such neural networks in principle have the capacity to (over)fit any set of labels, including significantly corrupted ones. Despite this (over)fitting capacity, in this paper we demonstrate that such overparameterized networks have an intriguing robustness capability: they are surprisingly robust to label noise when first-order methods with early stopping are used to train them. This paper also takes a step towards demystifying this phenomenon. Under a rich dataset model, we show that gradient descent is provably robust to noise/corruption on a constant fraction of the labels. In particular, we prove that: (i) in the first few iterations, where the updates are still in the vicinity of the initialization, gradient descent only fits the correct labels, essentially ignoring the noisy labels; (ii) to start to overfit the noisy labels, the network must stray rather far from the initialization, which can only occur after many more iterations. Together, these results show that gradient descent with early stopping is provably robust to label noise, and they shed light on the empirical robustness of deep networks as well as commonly adopted heuristics to prevent overfitting.
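The two-phase picture can be reproduced in a controlled toy model (my construction, not the paper's setup): kernel gradient descent where the clean labels lie mostly in the top eigenspace of the kernel and 30% of the training labels are flipped. Accuracy against the clean labels is high early on and decays toward the 70% noise ceiling as training continues to fit the corrupted labels, which is exactly what early stopping exploits.

```python
# Toy model of early stopping under label noise -- my construction,
# not the paper's setup. The kernel has a few large eigenvalues carrying
# the clean signal and many small ones that slowly absorb the noise.
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 5
Q, _ = np.linalg.qr(rng.normal(size=(n, n)))       # orthonormal eigenbasis
lam = np.concatenate([np.full(k, 50.0), np.full(n - k, 0.2)])
K = (Q * lam) @ Q.T                                # kernel Gram matrix

scores = Q[:, :k] @ rng.normal(size=k)             # signal in top eigenspace
y_clean = np.sign(scores)
flip = rng.random(n) < 0.3                         # corrupt 30% of the labels
y_noisy = np.where(flip, -y_clean, y_clean)

f = np.zeros(n)                                    # predictions on train set
for t in range(1, 3001):
    f += (K @ (y_noisy - f)) / n                   # kernel gradient step
    if t in (20, 300, 3000):
        # accuracy against the CLEAN labels: high early, ~0.7 at the end
        print(t, np.mean(np.sign(f) == y_clean))
```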
October 18th, 4PM (KST): Yonghwan Kim (SNU), Morse-Bott Spectral Sequence and Generators of Hochschild Cohomology
Abstract: We demonstrate a computation of the generators of the Hochschild cohomologies of the path algebras for the A_2, A_3, and D_4 singularities. We compare these results to the "geometric picture" via the Morse-Bott spectral sequence. In particular, we show that it is sometimes possible to identify nontrivial differentials in the Morse-Bott spectral sequence.
November 4th, 6:30 PM (KST): Chanwoo Park (SNU), Relationships between Statistical Physics and High-Dimensional Statistics