Graduate Seminar:

Random Matrices, Spin Glasses and Deep Learning

Content

In recent years deep learning techniques have been applied with great success to a vast number of problems in machine learning, and their solutions have already found their way into modern technology. Image recognition [8] as well as speech recognition and synthesis [7, 11] are just a few instances of this success story. Despite the obvious benefits of these algorithms, the mechanisms behind their surprising effectiveness have remained elusive from a theoretical point of view.

In mathematical terms, deep learning algorithms attempt to minimise a complicated loss function on a very high-dimensional parameter space, the space of all programs of a specified design. This function effectively measures the difference between reality and prediction, and its minimiser can be viewed as a program that makes optimal predictions. Under certain assumptions the loss function resembles the Hamiltonian of a spin glass [6]. Spin glasses are statistical mechanics systems that describe the interaction of a large number of randomly coupled spins, and they are well studied in the mathematical literature [10]. The connection to spin glasses thus promises to narrow the gap between the applications and the theoretical understanding of deep learning algorithms [1, 2, 3, 4].

On the other hand, many results about spin glasses rely on another highly active area of mathematical research, the theory of random matrices. In particular, the behaviour of eigenvalues at the spectral edge of large random matrices plays a fundamental role, because it determines the efficiency with which the Hamiltonian of a spin glass can be minimised. More direct connections between random matrices and deep learning have also been found [9].

In this graduate seminar we will explore the connections between these three prominent fields of research: random matrices, spin glasses and deep learning. To this end we will study the recent articles mentioned above, but also briefly cover well-established toy models such as the Hopfield model [5].
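The link between spin glasses and random matrices can be sketched numerically in the simplest (2-spin, SK-type) case: the Hamiltonian is a random quadratic form, and its minimum over configurations with squared norm n is controlled by the largest eigenvalue of the random coupling matrix. The following minimal Python illustration is our own sketch; the matrix size and the GOE-type scaling are illustrative choices, not taken from the references.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # number of spins; illustrative size

# Symmetric Gaussian coupling matrix J, scaled so that its eigenvalues
# stay of order one as n grows (the spectral edge is near 2).
J = rng.normal(size=(n, n)) / np.sqrt(n)
J = (J + J.T) / np.sqrt(2)

def hamiltonian(sigma, J):
    """2-spin Hamiltonian H(sigma) = -1/2 * sigma^T J sigma."""
    return -0.5 * sigma @ J @ sigma

# Over all sigma with |sigma|^2 = n, the Rayleigh-quotient bound gives
# H(sigma) >= -(n/2) * lambda_max(J): the ground-state energy of the
# spherical model is determined by the top eigenvalue of J.
lam_max = np.linalg.eigvalsh(J)[-1]
ground_state_energy = -0.5 * n * lam_max

# A random Ising configuration (sigma_i = +/-1, so |sigma|^2 = n)
# can never beat this spectral bound.
sigma = rng.choice([-1.0, 1.0], size=n)
assert hamiltonian(sigma, J) >= ground_state_energy
```

This is precisely the sense in which the spectral edge of a large random matrix governs how low the energy of a 2-spin glass can be pushed; for p-spin models with p > 2 the analysis requires the complexity results of [3, 4].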

For a short heuristic explanation of the relationship between random matrices, spin glasses and deep learning, see the handout from our first meeting.


Instructions

If you want to participate in the seminar, please send an email to torben.krueger(at)uni-bonn.de. If you need a certificate, you also have to register in BASIS.


Schedule

The seminar takes place on Fridays at 12PM (c.t.) in SR N0.007. The following preliminary schedule may change before the beginning of the semester.

Date          Topic

July 26       First Meeting
October 12    Introduction to deep learning
October 19    Applications of deep learning
October 26    Nonlinear random matrix theory for deep learning
November 02   The random energy model
November 09   Spin glasses: The SK model (1)
November 16   Spin glasses: The SK model (2)
November 23   Critical temperature of the 2-spin SK model
November 30   Hopfield models (1)
December 07   Hopfield models (2)
December 14   No seminar
December 21   No seminar
January 11    Random matrices and complexity of spin glasses (1)
January 18    Random matrices and complexity of spin glasses (2)
January 25    The loss function of multilayer networks

Prerequisites

The participants are expected to have a basic knowledge of linear algebra, analysis and probability theory.


Participation

Every participant who needs a certificate is expected to prepare a talk of approximately 90 minutes (projector or blackboard) that reflects a good understanding of the underlying literature, as well as a written summary of the chosen topic that is to be provided to the audience. The talks can be based on chapters of books or on research articles, and they have to be fully prepared and discussed with me one week before the scheduled date. The reference list below provides ample examples, but depending on the preferences of the participants many other topics are possible. Interested graduate students and postdocs are very welcome to join as well.


Literature

[1] M. Advani, S. Lahiri, and S. Ganguli. Statistical mechanics of complex neural systems and high dimensional data. Journal of Statistical Mechanics: Theory and Experiment, 2013(03):P03014, 2013.

[2] G. B. Arous and A. Jagannath. Spectral gap estimates in mean field spin glasses. arXiv:1705.04243.

[3] A. Auffinger, G. B. Arous, and J. Černý. Random matrices and complexity of spin glasses. Communications on Pure and Applied Mathematics, 66(2):165–201, 2013.

[4] A. Auffinger and G. Ben Arous. Complexity of random smooth functions on the high-dimensional sphere. Ann. Probab., 41(6):4214–4247, 11 2013.

[5] A. Bovier. Statistical Mechanics of Disordered Systems: A Mathematical Perspective. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2006.

[6] A. Choromanska, M. Henaff, M. Mathieu, G. Arous, and Y. LeCun. The loss surfaces of multilayer networks. Journal of Machine Learning Research, 38:192–204, 2015.

[7] G. Hinton, L. Deng, D. Yu, G. Dahl, A.-r. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. Sainath, and B. Kingsbury. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups. IEEE Signal Processing Magazine, 29(6):82–97, Nov. 2012.

[8] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 1097–1105. Curran Associates, Inc., 2012.

[9] J. Pennington and P. Worah. Nonlinear random matrix theory for deep learning. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 2637–2646. Curran Associates, Inc., 2017.

[10] M. Talagrand. Spin Glasses: A Challenge for Mathematicians: Cavity and Mean Field Models. A Series of Modern Surveys in Mathematics. Springer, 2003.

[11] A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. W. Senior, and K. Kavukcuoglu. Wavenet: A generative model for raw audio. In SSW, page 125. ISCA, 2016.