Seminar:

Spin Glasses with Applications to Deep Learning

Content

In recent years, deep learning techniques have been applied with great success to a vast number of problems in machine learning, and their solutions have already found their way into modern technology. Image recognition [8] as well as speech recognition and synthesis [7, 11] are just a few instances of this success story. Despite the obvious benefits of these algorithms, the mechanisms behind their surprising effectiveness have remained elusive from a theoretical point of view.

In mathematical terms, deep learning algorithms attempt to minimise a complicated non-convex loss function on a very high-dimensional parameter space, the space of all programs of a specified design. This function effectively measures the difference between reality and prediction, and its minimiser can be viewed as a program that makes optimal predictions. Under certain assumptions, the properties of the loss function resemble those of the energy of spin glasses [6, 12]. Spin glasses are statistical mechanics systems that describe the interaction of a large number of randomly coupled spins, and they are well studied in the mathematical literature [10]. The connection to spin glasses thus promises to narrow the gap between applications and theoretical understanding of deep learning algorithms [1, 2, 3, 4, 13].

On the other hand, many results about spin glasses rely on another highly prosperous area of mathematical research, the theory of random matrices. In particular, the behaviour of eigenvalues at the spectral edge of high-dimensional matrices with random entries plays a fundamental role because it determines the efficiency with which the Hamiltonian of a spin glass can be minimised. More direct connections between random matrices and deep learning have also been found [9, 14].

In this graduate seminar we will explore the connection between spin glasses and deep learning and introduce random matrices as a tool to study both. To this end we will present and discuss the recent articles mentioned above, but also briefly cover well-established toy models such as the Hopfield model [5].
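
The simplest instance of this connection can be made concrete in a few lines. The following is a minimal numerical sketch, not taken from the cited articles (the matrix size and normalisation are illustrative assumptions): for the spherical 2-spin model with Hamiltonian H(sigma) = sigma^T J sigma on the sphere |sigma|^2 = N, minimising H amounts to computing the smallest eigenvalue of the random coupling matrix J, so the spectral edge directly governs the ground-state energy.

    import numpy as np

    rng = np.random.default_rng(0)

    # Spherical 2-spin toy model: H(sigma) = sigma^T J sigma on |sigma|^2 = N,
    # with J a symmetric Gaussian (GOE) random matrix. By the Rayleigh
    # quotient, the minimum of H over the sphere is N times the smallest
    # eigenvalue of J, so the spectral edge controls the landscape depth.
    N = 2000                                  # illustrative system size
    A = rng.normal(size=(N, N)) / np.sqrt(N)
    J = (A + A.T) / np.sqrt(2)                # GOE scaling, spectrum on [-2, 2]

    eigenvalues = np.linalg.eigvalsh(J)       # sorted in ascending order
    print("smallest eigenvalue:", eigenvalues[0])   # near the lower edge -2
    print("largest eigenvalue: ", eigenvalues[-1])  # near the upper edge +2

For large N the extreme eigenvalues concentrate at the edges ±2 of Wigner's semicircle law, which is the spectral-edge phenomenon referred to above.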

For a short heuristic explanation of the relationship between random matrices, spin glasses and deep learning, see the handout from our first meeting.
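
The Hopfield model [5] mentioned above also admits a compact illustration. The sketch below is not taken from the listed literature; the network size and pattern count are arbitrary choices. It stores random ±1 patterns with the Hebbian rule and retrieves one of them from a corrupted initial state by iterating the sign dynamics.

    import numpy as np

    rng = np.random.default_rng(1)

    # Hebbian rule: W = (1/N) sum_mu xi_mu xi_mu^T with zero diagonal.
    N, P = 500, 10                        # spins and stored patterns
    patterns = rng.choice([-1, 1], size=(P, N))
    W = patterns.T @ patterns / N
    np.fill_diagonal(W, 0.0)

    # Corrupt the first pattern by flipping 10% of its spins.
    sigma = patterns[0].copy()
    flip = rng.choice(N, size=N // 10, replace=False)
    sigma[flip] *= -1

    # Iterate sigma <- sign(W sigma); with P much smaller than N the
    # dynamics typically converges back to the stored pattern.
    for _ in range(10):
        sigma = np.where(W @ sigma >= 0, 1, -1)

    print("overlap with stored pattern:", sigma @ patterns[0] / N)  # 1.0 = perfect retrieval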


Instructions

If you want to participate in the seminar, please register via StudOn.


Schedule

The seminar takes place on Mondays at 12:15 in Übung 2 (Cauerstraße 11) and follows the schedule below.

Date      Topic
May 2     First meeting
May 23    The random energy model
June 13   Cancelled
June 27   Connection between spin glasses and deep learning
July 11   Nonlinear random matrix theory for deep learning

Prerequisites

The participants are expected to have a basic knowledge of linear algebra, analysis and probability theory.


Participation

Every participant who needs a certificate is expected to prepare a talk of approximately 90 minutes (projector or blackboard) that reflects a good understanding of the underlying literature, as well as a written summary of the chosen topic that is provided to the audience. Talks can be based on book chapters or research articles and have to be fully prepared and discussed with me one week before the scheduled date. The reference list below provides ample examples, but depending on the preferences of the participants many other topics are possible. Interested graduate students and postdocs are very welcome to join as well.


Literature

[1] M. Advani, S. Lahiri, and S. Ganguli. Statistical mechanics of complex neural systems and high dimensional data. Journal of Statistical Mechanics: Theory and Experiment, 2013(03):P03014, 2013.

[2] G. Ben Arous and A. Jagannath. Spectral gap estimates in mean field spin glasses. arXiv:1705.04243, 2017.

[3] A. Auffinger, G. Ben Arous, and J. Černý. Random matrices and complexity of spin glasses. Communications on Pure and Applied Mathematics, 66(2):165–201, 2013.

[4] A. Auffinger and G. Ben Arous. Complexity of random smooth functions on the high-dimensional sphere. Annals of Probability, 41(6):4214–4247, 2013.

[5] A. Bovier. Statistical Mechanics of Disordered Systems: A Mathematical Perspective. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2006.

[6] A. Choromanska, M. Henaff, M. Mathieu, G. Ben Arous, and Y. LeCun. The loss surfaces of multilayer networks. Journal of Machine Learning Research, 38:192–204, 2015.

[7] G. Hinton, L. Deng, D. Yu, G. Dahl, A.-r. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. Sainath, and B. Kingsbury. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups. IEEE Signal Processing Magazine, 29(6):82–97, Nov. 2012.

[8] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 1097–1105. Curran Associates, Inc., 2012.

[9] J. Pennington and P. Worah. Nonlinear random matrix theory for deep learning. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 2637–2646. Curran Associates, Inc., 2017.

[10] M. Talagrand. Spin Glasses: A Challenge for Mathematicians: Cavity and Mean Field Models. A Series of Modern Surveys in Mathematics. Springer, 2003.

[11] A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. W. Senior, and K. Kavukcuoglu. WaveNet: A generative model for raw audio. In SSW, page 125. ISCA, 2016.

[12] N. Baskerville, J. Keating, F. Mezzadri, and J. Najnudel. The loss surfaces of neural networks with general activation functions. arXiv:2004.03959, 2020.

[13] N. Baskerville, J. Keating, F. Mezzadri, and J. Najnudel. A spin-glass model for the loss surfaces of generative adversarial networks. arXiv:2101.02524, 2021.

[14] N. P. Baskerville, D. Granziol, and J. P. Keating. Applicability of Random Matrix Theory in Deep Learning. arXiv:2102.06740, 2021.