Data Geometry and Deep Learning

This is a postgraduate course aimed at Mathematics students interested in the theory and mathematics of Deep Learning.

Abstract: The idea is to give the basic ingredients for understanding what Deep Learning theory is, and what its underlying mathematical/theoretical structures are. Hopefully, by the end of the course, the mathematics students who follow it will be able to read Deep Learning research papers without getting lost, and will have the basic tools to start working on some important maths-rich topics. Lectures 7-12 below each cover an important research direction in which some first mathematical steps have been taken, but which leaves many subtopics and extensions open. Particular emphasis is given to the view that the geometry of datapoints and of learning algorithms has important practical consequences, some of which have started to emerge in the last 3-4 years.


Dates: November 15th 2022 - December 20th 2022

Timetable: Tuesdays and Thursdays, 3-5 PM

Place: Math Department "Guido Castelnuovo" of "La Sapienza" University, Room B (first floor).


Links to slides will be posted on this page. Please fill in this short form (2 min) if you are interested in being added to the course mailing list!


The plan is to spend one lecture on each topic (the plan below may change as the course progresses).

Note that the slides are intended as study material: clickable links in the slides will send you to more detailed references.

Part I: Introduction to Deep Learning

  1. Introduction and a brief history of Neural Networks. Overview of the course. (Slides) (Video recordings (in Italian): part 1 - 6 min, part 2 - 80 min)

  2. Backpropagation, Stochastic Gradient Descent, Convergence. (Slides) (Video recording (in Italian) 100 min)

  3. Some very common Neural Network architectures and their motivations. (Slides) (Video (Ita) 120 min)

  4. Neuromorphic Neural Networks (skipped)

Part II: Staples of classical Deep Learning theory

  1. Generalization: some mathematical interpretations (Slides) (Video (Ita) 110 min)

  2. PAC learning, VC dimension, and expressivity tests for DNNs (Slides) (Video part 2 (Ita) 50 min)

  3. Introduction to Information Theory, and the Information Bottleneck Principle (Slides) (Video part 1 (45 min), part 2 (35 min))

No class on Dec. 1st.

Part III: Selected topics of research

  1. Network pruning: the "Lottery ticket hypothesis" and sparsity (Slides) (Video in Italian (90 min))

No class on Dec. 8th.

  2. Hyperbolic Neural Networks (Slides)

  3. Equivariant Neural Networks (Slides)

  4. Persistence diagrams and Topological Data Analysis (guest lecture by Sara Scaramuccia) (Slides)

Final presentations by students:

  • Luca Falorsi: Renormalization Group and Restricted Boltzmann Machines

  • Alessio Oliviero: Physics Informed Neural Networks for Optimal Control (Slides)

  • Jacopo Ulivelli: Geometric interpretation of GANs and link with Optimal Transport (Link to presented paper)

  • Andrea Pizzi: Graph Neural Networks generalized to simplicial complexes (Slides)

  • Lorenzo D'Arca: Approximation theorems in function spaces (Slides) (Slides in English, paper1, paper2)



Some bibliography, given before the start of the course, for the first 6 lectures (see precise references within each lecture's slides):