Programme

LMS invited lectures by Gitta Kutyniok

Lecture 1: Introduction to Deep Neural Networks

Deep neural networks are today the workhorse of artificial intelligence. However, despite the outstanding success of deep neural networks in real-world applications, most of the related research is empirically driven, and a comprehensive mathematical foundation is still missing. In this lecture, we will start by reviewing the basic principles of statistical learning theory, which underpin several results on the mathematics of deep learning. This is then followed by an introduction to deep neural networks, covering in particular different network architectures, training algorithms, and application settings. This will also serve as an introduction to the exciting area of the mathematics of deep learning and as preparation for the subsequent lectures.
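In one common formulation (the notation here is generic and not necessarily that of the lecture), statistical learning seeks a hypothesis f minimizing the risk, while in practice only the empirical risk over m training samples is accessible:

    \mathcal{R}(f) = \mathbb{E}_{(x,y)\sim\mathcal{D}}\,\ell(f(x),y), \qquad \widehat{\mathcal{R}}_m(f) = \frac{1}{m}\sum_{i=1}^m \ell(f(x_i),y_i),

where \mathcal{D} is the unknown data distribution and \ell a loss function. Deep learning instantiates this template with f drawn from a class of neural networks.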

Lecture 2: Deep Neural Networks: From Approximation to Expressivity

Regarding deep learning as a statistical learning problem, the core theory can be divided into the research directions of expressivity, learning, and generalization. Recently, the new directions of explainability, robustness, and fairness have become important as well. In this lecture, we will first focus on the expressivity of deep neural networks, which aims to provide a deep understanding of the impact of the architecture of a neural network on its performance. We will discuss the amazing universality properties of deep neural networks, which perform at least as well as virtually any classical approximation scheme. We will then focus on lower bounds on the complexity of approximating neural networks, which lead to optimal approximation results, for instance in the sparsely connected regime.
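A prototypical expressivity statement is the universal approximation theorem, stated here in one standard form (the precise versions covered in the lecture may differ): for every continuous function f on a compact set K \subset \mathbb{R}^d, every continuous non-polynomial activation \varrho, and every \varepsilon > 0, there exists a single-hidden-layer network

    \Phi(x) = \sum_{k=1}^{N} a_k\,\varrho(\langle w_k, x\rangle + b_k) \quad \text{with} \quad \sup_{x\in K} |f(x) - \Phi(x)| \le \varepsilon.

Quantitative versions ask how the required number of neurons N scales with \varepsilon, the dimension d, and the regularity of f.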

Lecture 3: Deep Neural Networks: Analyzing the Training Algorithm

Deep neural networks are traditionally trained using stochastic gradient descent. Although no comprehensive mathematical understanding of the energy landscape and the convergence properties exists to date, we already witness intriguing results which shed a first light on these questions. These range from approaches which aim to stabilize training by using architectures inspired by partial differential equations and optimal control, through the mathematical analysis of training tricks such as dropout, to the analysis of phenomena such as neural collapse. In this lecture, we provide an introduction to selected theoretical results.
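As a minimal illustration of what such a training algorithm looks like in practice, the following sketch trains a small one-hidden-layer ReLU network on a toy regression problem by plain stochastic gradient descent; the architecture, data, and hyperparameters are arbitrary choices for this example and are not taken from the lecture.

    import numpy as np

    # Minimal SGD sketch: a one-hidden-layer ReLU network on a toy
    # regression task; all hyperparameters are illustrative choices.
    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, size=(256, 1))   # toy inputs
    y = np.sin(3.0 * X)                         # toy targets

    W1 = rng.normal(0.0, 1.0, size=(1, 16)); b1 = np.zeros(16)
    W2 = rng.normal(0.0, 0.5, size=(16, 1)); b2 = np.zeros(1)
    lr, batch = 1e-2, 32

    for step in range(5000):
        idx = rng.choice(len(X), batch, replace=False)   # draw a mini-batch
        xb, yb = X[idx], y[idx]
        h = np.maximum(xb @ W1 + b1, 0.0)                # forward pass (ReLU)
        pred = h @ W2 + b2
        g_out = 2.0 * (pred - yb) / batch                # gradient of the MSE loss
        gW2 = h.T @ g_out
        gb2 = g_out.sum(0)
        g_h = (g_out @ W2.T) * (h > 0)                   # backpropagate through ReLU
        gW1 = xb.T @ g_h
        gb1 = g_h.sum(0)
        W1 -= lr * gW1; b1 -= lr * gb1                   # SGD parameter update
        W2 -= lr * gW2; b2 -= lr * gb2

The theoretical questions of the lecture concern precisely such iterations: the landscape of the loss being descended and the convergence behavior of the resulting parameter sequence.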

Lecture 4: Deep Neural Networks: The Mystery of Generalization

The amazing ability of deep neural networks to perform well on unseen data is still one of the big mysteries of deep learning. In fact, classical statistical learning theory explains the training and test risk only in the low-parameter regime, whereas the behavior in the high-parameter regime is not understood at all. In this lecture, we will discuss exciting first results such as the line of research introducing and using the concept of the neural tangent kernel. In addition, in some situations such as graph CNNs, certain instances of generalization such as transferability are already completely understood, which we will cover as well. We will end the lecture with a discussion of possible research directions to unravel this mystery.
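For orientation, the neural tangent kernel associated with a network f_\theta at initialization \theta_0 is, in one standard definition,

    K(x, x') = \big\langle \nabla_\theta f_{\theta_0}(x),\, \nabla_\theta f_{\theta_0}(x') \big\rangle,

and in a suitable infinite-width limit, training by gradient descent behaves like kernel regression with K, which makes the generalization behavior of such networks amenable to classical kernel analysis.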

Lecture 5: Deep Neural Networks: Opening the Black Box via Explainability Methods

This lecture focusses on the area of explainability, which considers ready-to-use neural networks, aiming to "break open the black box" they typically appear as. More specifically, one intends to identify those features of the input which are most crucial for the observed output. In a real-world setting, a vision would be to receive an explanation of a network decision which is indistinguishable from a human explanation. We start by discussing classical approaches such as sensitivity analysis, and then turn to more recent work, covering in particular the Deep Taylor approach, surrogate-model-based methods such as LIME, and game-theoretic methods such as SHAP. Finally, we present a profound, mathematically founded approach based on rate-distortion theory, in which the task of explaining a network decision is formulated as an optimization problem applicable to different data situations.
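In one typical rate-distortion-type formulation (a generic sketch; the precise setup presented in the lecture may differ), an explanation is a small set of input features such that randomizing the remaining features barely changes the network output:

    \min_{s \in \{0,1\}^d,\ \|s\|_0 \le k}\ \mathbb{E}_{n}\, d\big(\Phi(x),\, \Phi(s \odot x + (1-s) \odot n)\big),

where \Phi is the network, s a binary mask selecting the retained features, n a random perturbation of the masked features, d a distortion measure, and k the admissible "rate". Relaxations of this combinatorial problem yield practically computable relevance maps.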

Lecture 6: Deep Neural Networks: Towards Robustness

It is by now well known that situations in which deep learning-based techniques fail dramatically under small perturbations, such as adversarial examples in image classification, can easily be constructed. We will start this lecture by reviewing such approaches. This will lead us to the question of whether robust neural networks exist at all. In fact, many mathematical results at present point towards the non-existence of entirely robust neural networks. We will discuss these first results from a mathematical perspective, asking how robust neural networks can nevertheless be generated, and to which extent the mathematical model that deep neural networks represent needs to be generalized. This will also lead us to the question of complexity limits of deep learning.
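As a brief illustration, an adversarial example for a classifier \Phi with loss \ell at a correctly classified input (x, y) is a small perturbation solving

    \max_{\|\delta\|_\infty \le \varepsilon}\ \ell\big(\Phi(x+\delta),\, y\big),

and a standard one-step heuristic, the fast gradient sign method, takes \delta = \varepsilon\,\mathrm{sign}\big(\nabla_x \ell(\Phi(x), y)\big); even visually imperceptible values of \varepsilon often suffice to flip the predicted label.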

Lecture 7: Limitations of Deep Neural Networks

Deep neural networks are at present considered the "miracle cure" for almost all problems. And, indeed, it seems that their applicability is endless. However, one can rigorously prove that neural networks do have limitations, sometimes even of a severe nature. One of the problems we will address is the fact that training is currently performed on standard digital hardware. This will lead us into computability theory and the question of whether, and to which extent, the training of deep neural networks can be performed to any given accuracy.

Lecture 8: Inverse Problems meet Deep Learning: Optimal Hybrid Methods

Inverse problems in imaging, such as denoising, recovery of missing data, or the inverse scattering problem, appear in numerous applications. However, due to the increasing complexity of these problems, model-based methods alone are today often no longer sufficient. At the same time, deep learning methods are currently sweeping the area, often quickly leading to state-of-the-art approaches. This lecture is devoted to first reviewing the main classical model-based approaches for solving ill-posed inverse problems, followed by an introduction to deep learning-based methodologies. A particular focus will be on hybrid methods which (optimally) combine model- and learning-based methods so as not to neglect known and valuable information from the modeling world. As an illustrative numerical example, we will discuss the application of limited-angle computed tomography.
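The classical model-based template is variational regularization: given a forward operator A and noisy data y \approx Ax, one recovers

    \hat{x} \in \arg\min_{x}\ \|Ax - y\|^2 + \lambda\,\mathcal{R}(x),

with a hand-crafted regularizer \mathcal{R} (for instance, sparsity with respect to a wavelet or shearlet system) and a parameter \lambda > 0. Hybrid approaches of the kind discussed here keep this structure but learn parts of it, for example by replacing \mathcal{R}, the proximal steps of an iterative solver, or a post-processing step by a trained network; the specific combination presented in the lecture may of course differ from this generic description.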

Lecture 9: Partial Differential Equations meet Deep Learning: Beating the Curse of Dimensionality

While the area of inverse problems, predominantly from imaging, was very quick to embrace deep learning methods with tremendous success, the area of numerical analysis of partial differential equations was much slower. Lately, however, impressive successes have been reported for very high-dimensional partial differential equations, together with a precise theoretical analysis showing that such approaches can beat the curse of dimensionality. This lecture shall serve as an introduction to this exciting research direction. We will present both theoretical results which prove that deep neural networks are able to beat the curse of dimensionality in different problem settings, and numerical results indicating the universality of neural networks for solving partial differential equations, also in the parametric regime.
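One widely used way to cast a PDE as a learning problem (a generic sketch, not necessarily the specific methods of the lecture) is to minimize the residual of the equation over a neural network ansatz: for a PDE \mathcal{L}u = f on a domain \Omega with boundary condition u = g on \partial\Omega, one trains u_\theta by

    \min_{\theta}\ \frac{1}{N}\sum_{i=1}^{N}\big|\mathcal{L}u_\theta(x_i) - f(x_i)\big|^2 + \frac{\mu}{M}\sum_{j=1}^{M}\big|u_\theta(z_j) - g(z_j)\big|^2,

with collocation points x_i \in \Omega and z_j \in \partial\Omega. The theoretical results alluded to above show that, for suitable classes of high-dimensional PDEs, the network size needed for a given accuracy grows only polynomially in the dimension.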

Lecture 10: Mathematical Foundations of Deep Learning: Potential, Limitations, and Future Directions

The final lecture will provide a review of the previously presented mathematical results in the area of deep learning. We will discuss the potential of deep neural networks and artificial intelligence in general from a mathematical perspective. In this realm, we also have to debate the limitations of these methods and how they might be overcome. This will lead us to intriguing open problems and future research directions.


Accompanying lectures

Peter Bartlett

Title: Benign overfitting in linear and nonlinear settings

Abstract: Benign overfitting is a surprising phenomenon revealed by deep learning practice: even without any explicit effort to control model complexity, deep learning methods find prediction rules that give a near-perfect fit to noisy training data and yet exhibit excellent prediction performance in practice. This talk surveys results on methods that predict accurately in probabilistic settings despite fitting the training data too well. We give a characterization of this phenomenon in linear regression and in ridge regression, and we present a fundamentally nonlinear setting where benign overfitting occurs: a two-layer neural network trained using gradient descent on a classification problem.
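A canonical setting in which benign overfitting can be analyzed precisely (sketched here only for orientation) is the minimum-norm interpolant in overparameterized linear regression: given data (x_i, y_i) with more parameters than samples, one studies

    \hat{\theta} = \arg\min\big\{\|\theta\|_2 \;:\; \langle x_i, \theta\rangle = y_i \text{ for all } i\big\},

which fits the noisy training data exactly, and asks under which conditions on the covariance of the x_i its excess risk nevertheless remains small.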

Weinan E

Title: Mathematics of Deep Learning: Some Open Problems

Abstract: There are two important directions of research in the mathematics of deep learning. The first is to understand the mathematical principles behind deep learning. The second is to develop new formulations of deep learning. Regarding the first direction, the two most important issues are why deep learning performs so well for such high-dimensional problems, and why its performance is so sensitive to the choice of hyperparameters. Regarding the second direction, one idea that has been hotly pursued in the last several years is to use ODEs and PDEs to help develop new formulations of deep learning. A lot of progress has been made in both directions, but many fundamental issues remain open. In this talk, we will discuss some of these open questions.
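One instance of the ODE viewpoint mentioned above is the observation that a residual block z_{k+1} = z_k + h\,f(z_k, \theta_k) can be read as an explicit Euler step of the continuous-time dynamics

    \dot{z}(t) = f\big(z(t), \theta(t)\big), \qquad z(0) = x,

so that training a deep residual network becomes an optimal control problem over \theta(\cdot); this is one of the continuous formulations alluded to in the abstract, described here only in generic terms.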

Klaus-Robert Müller

Title: Machine Learning for the Sciences: towards understanding

Abstract: In recent years, machine learning (ML) and artificial intelligence (AI) methods have begun to play a more and more enabling role in the sciences and in industry. In particular, the advent of large and/or complex data corpora has given rise to new technological challenges and possibilities. In his talk, Müller will touch upon the topic of ML applications in the sciences, in particular in neuroscience, medicine and physics. He will also discuss possibilities for extracting information from machine learning models to further our understanding by explaining nonlinear ML models. For example, machine learning models for quantum chemistry can, through interpretable ML, contribute to furthering chemical understanding. Finally, Müller will briefly outline perspectives and limitations.

Rebecca Willett

Title: The Role of Linear Layers in Nonlinear Interpolating Networks

Abstract: An outstanding problem in understanding the performance of overparameterized neural networks is to characterize which functions are best represented by neural networks of varying architectures. Past work explored the notion of representation costs — i.e., how much does it “cost” for a neural network to represent some function? For instance, given a set of training samples, consider finding the interpolating function that minimizes the representation cost; how is that interpolant different for a network with three layers instead of two layers? Both functions have the same values on the training samples, but they may have very different behaviors elsewhere in the domain. In this talk, I will describe the representation cost of a family of networks in which the first layers all have linear activations and the final layer has a ReLU activation. Our results show that the linear layers in this network yield a representation cost that reflects a complex interplay between the alignment and sparsity of ReLU units. For example, using a neural network to fit training data with minimum representation cost yields an interpolating function that is constant in directions perpendicular to a low-dimensional subspace on which a parsimonious interpolant exists. We will explore these effects and their implications for future work on generalization. This is joint work with Greg Ongie.
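In this style of analysis (a generic description, not a statement of the talk's exact results), the representation cost of a function f for a given architecture is

    R(f) = \inf\big\{\|\theta\|_2^2 \;:\; f_\theta = f\big\},

the minimal sum of squared weights over all parameter settings realizing f, and the learned interpolant is modeled as the minimizer of R among all functions fitting the training data.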