In recent years, deep learning techniques have been applied with great success to a vast number of problems in machine learning, and the resulting solutions have already found their way into modern technology. Image recognition [8] as well as speech recognition and synthesis [7, 11] are just a few instances of this success story. Despite the obvious benefits of these algorithms, the mechanisms behind their surprising effectiveness have remained elusive from a theoretical point of view.

In mathematical terms, deep learning algorithms attempt to minimise a complicated loss function on a very high-dimensional parameter space, the space of all programs of a specified design. This function effectively measures the difference between reality and prediction, and its minimiser can be viewed as a program that makes optimal predictions. Under certain assumptions the loss function resembles the Hamiltonian of a spin glass [6]. Spin glasses are statistical mechanics systems that describe the interaction of a large number of randomly coupled spins, and they are well studied in the mathematical literature [10]. The connection to spin glasses thus promises to narrow the gap between applications and the theoretical understanding of deep learning algorithms [1, 2, 3, 4].

Many results about spin glasses rely, in turn, on another very active area of mathematical research, the theory of random matrices. In particular, the behaviour of eigenvalues at the spectral edge of large matrices with random entries plays a fundamental role, because it determines the efficiency with which the Hamiltonian of a spin glass can be minimised. There are also more direct connections between random matrices and deep learning [9]. In this graduate seminar we will explore the connections between these three prominent fields of research: random matrices, spin glasses and deep learning. To achieve this goal we will study the recent articles mentioned above, and also briefly cover well-established toy models such as the Hopfield model [5].
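
As a rough numerical illustration of the role played by the spectral edge, consider the simplest case, a spherical spin glass with pairwise couplings. There, minimising the Hamiltonian over the sphere amounts to finding the largest eigenvalue of the coupling matrix. The Python sketch below is not taken from the cited articles; the normalisation of the matrix, the matrix size and all function names are assumptions chosen for illustration only.

```python
import numpy as np


def sample_goe(n, rng):
    """Symmetric Gaussian matrix whose off-diagonal entries have variance 1/n.

    With this (assumed) normalisation the spectrum concentrates on [-2, 2]
    as n grows.
    """
    a = rng.standard_normal((n, n))
    return (a + a.T) / np.sqrt(2.0 * n)


def largest_eigenvalue(w):
    """Eigenvalue at the upper spectral edge of a symmetric matrix w."""
    # eigvalsh returns the eigenvalues in ascending order.
    return np.linalg.eigvalsh(w)[-1]


def ground_state_energy_per_spin(w):
    """Minimum of H(s) = -(1/2) s^T w s over the sphere |s|^2 = n, divided by n.

    The minimum is attained at the top eigenvector, so it equals -lambda_max / 2.
    """
    return -0.5 * largest_eigenvalue(w)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = sample_goe(500, rng)
    # For large n the largest eigenvalue should be close to the edge value 2.
    print("largest eigenvalue:", largest_eigenvalue(w))
    print("ground state energy per spin:", ground_state_energy_per_spin(w))
```

In this toy case the minimisation is a plain eigenvalue problem; the models treated in the seminar literature are more involved, and the sketch is only meant to make the link between the spectral edge and the energy minimum concrete.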

For a short heuristic explanation of the relationship between random matrices, spin glasses and deep learning, see the handout from our first meeting.


The participants are expected to have a basic knowledge of linear algebra, analysis and probability theory.


Every participant who needs a certificate is expected to prepare a talk of approximately 90 minutes (projector or blackboard) that reflects a good understanding of the underlying literature source, together with a written summary of the chosen topic to be provided to the audience. The talks can be based on chapters of books or research articles and have to be fully prepared and discussed with me one week before the scheduled date. The reference list below provides ample examples, but depending on the preferences of the participants many other topics are possible. Interested graduate students and postdocs are very welcome to join as well.


