Theory and Practice of Deep Learning [3CFU]

Prof. Fabrizio Silvestri, Prof. Michael Bronstein

June 2022

Graph Neural Networks: Geometric, Structural and Algorithmic Perspectives

13 June 2022, 09:00 (Aula Magna)
DIAG department, Sapienza University of Rome

Prof. Petar Veličković (DeepMind)

Recording (link), passcode: P%zHx4=A
Recording (link), passcode: TtZ1V+TV

Abstract & Schedule
Recent years have seen a surge in research on graph representation learning, including techniques for deep graph embeddings, generalizations of CNNs to graph-structured data, and neural message-passing approaches. These advances in graph neural networks (GNNs) and related techniques have led to new state-of-the-art results in numerous domains: chemical synthesis, vehicle routing, 3D-vision, recommender systems, question answering, continuous control, self-driving and social network analysis. Accordingly, GNNs regularly top the charts on fastest-growing trends and workshops at virtually all top machine learning conferences.

In this series of lectures, I will attempt to provide several "bird’s eye" views on GNNs. Following a quick motivation on the utility of graph representation learning, I will derive GNNs from first principles of permutation invariance and equivariance. We will discuss how we can build GNNs that are not strictly reliant on the input graph structure, and how we can categorise their expressive power using graph isomorphism testing. Finally, we will explore an emerging connection between GNNs and classical algorithms, and demonstrate how we successfully used this connection to power mathematical discovery (a milestone which recently graced the cover of Nature).
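As a concrete companion to the invariance/equivariance part of the lectures, here is a minimal sketch in plain NumPy (all names and dimensions are illustrative, not taken from the lecture material) of a single sum-aggregation message-passing layer, together with a numerical check that relabelling the nodes permutes the output in the same way:

```python
import numpy as np

def message_passing_layer(X, A, W_self, W_neigh):
    """One toy GNN layer: every node sums its neighbours' features
    (a permutation-invariant aggregation) and applies a shared update."""
    messages = A @ X                      # (n, d): sum of neighbour features per node
    H = X @ W_self + messages @ W_neigh   # shared weights => equivariant to node relabelling
    return np.maximum(H, 0.0)             # ReLU nonlinearity

# Sanity check of permutation equivariance: relabelling the nodes with a
# permutation matrix P permutes the output rows in exactly the same way.
rng = np.random.default_rng(0)
n, d, d_out = 5, 4, 3
X = rng.normal(size=(n, d))
A = (rng.random((n, n)) < 0.4).astype(float)
W_self, W_neigh = rng.normal(size=(d, d_out)), rng.normal(size=(d, d_out))
P = np.eye(n)[rng.permutation(n)]
assert np.allclose(
    message_passing_layer(P @ X, P @ A @ P.T, W_self, W_neigh),
    P @ message_passing_layer(X, A, W_self, W_neigh),
)
```

The sum over neighbours is what makes the layer independent of node ordering; any other permutation-invariant aggregator (mean, max) would preserve the same property.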

The talk will be geared towards a generic computer science audience, though some basic knowledge of machine learning with neural networks will be a useful prerequisite.

The content is inspired by my ongoing work on the categorisation of geometric deep learning, alongside Joan Bruna, Michael Bronstein and Taco Cohen.

Schedule: (start 5 minutes after the hour)

  • 10 min: Introduction: Why study data on graphs?

  • 20 min: Permutation Invariance and Equivariance: Neural Networks on Sets

  • 20 min: Graph Neural Networks

  • (10 min break/discussion)

  • 25 min: Latent Graph Inference: How to run GNNs when there is no graph?

  • 25 min: Expressive power of GNNs: The Weisfeiler-Lehman Hierarchy

  • (10 min break/discussion)

  • 25 min: Neural Algorithmic Reasoning

  • 25 min: Case study: GNNs power mathematical discovery

  • (5 min break before Q&A)

Energy-Based Models

14 June 2022, 09:00 (Aula Magna)
DIAG department, Sapienza University of Rome

Prof. Alfredo Canziani (NYU Courant Institute of Mathematical Sciences)

Abstract, Schedule, and Bio
Energy-Based Models (EBMs) provide a common theoretical framework for many learning models, including traditional discriminative and generative approaches, as well as graph-transformer networks, several manifold learning methods, and joint embedding methods. EBMs capture dependencies between variables by associating a scalar energy to each configuration of the input variables. Inference consists in clamping the value of observed variables and finding configurations of the remaining variables that minimise the energy. Learning consists in finding an energy function in which observed configurations of the variables are given lower energies than unobserved ones.
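A minimal numerical sketch of these two operations, assuming a toy quadratic energy with purely illustrative names (not the tutorial's actual notation): inference clamps the observed x and descends the energy over the free variable y; learning nudges the parameters so that an observed pair receives lower energy than a contrastive one.

```python
def energy(x, y, w):
    """Toy scalar energy E(x, y; w): low when y is close to w * x."""
    return 0.5 * (y - w * x) ** 2

def infer_y(x, w, steps=200, lr=0.1):
    """Inference: clamp the observed x and minimise E over the free variable y."""
    y = 0.0
    for _ in range(steps):
        y -= lr * (y - w * x)                  # gradient of E with respect to y
    return y                                   # converges to w * x, the minimum-energy y

def contrastive_update(w, x, y_observed, y_contrastive, lr=0.01):
    """Learning: push the energy of (x, y_observed) down and of (x, y_contrastive) up."""
    grad_obs = -(y_observed - w * x) * x       # dE(x, y_observed)/dw
    grad_neg = -(y_contrastive - w * x) * x    # dE(x, y_contrastive)/dw
    return w - lr * (grad_obs - grad_neg)
```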

We will start this tutorial by introducing the EBM terminology by revisiting the training of a simple classifier in terms of shaping its energy function. We will then introduce latent variables (LVs) for modelling the unpredictable component of a given phenomenon and learning one-to-many relationships. Finally, we’ll cover classical examples of architectural, regularised, and contrastive training techniques for EBMs and LV-EBMs.
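And a correspondingly tiny sketch of the latent-variable idea (again with illustrative names only): a discrete latent z selects between two modes, so a single input can have several low-energy outputs instead of being forced onto one averaged prediction.

```python
def lv_energy(x, y, z, w):
    """Toy LV-EBM: the latent z picks which of two linear modes explains y given x."""
    prediction = w[0] * x if z == 0 else w[1] * x
    return 0.5 * (y - prediction) ** 2

def infer_latent(x, y, w):
    """Inference over the latent: minimise the energy over z (by enumeration here)."""
    energies = {z: lv_energy(x, y, z, w) for z in (0, 1)}
    z_star = min(energies, key=energies.get)
    return z_star, energies[z_star]

w = (0.5, 2.0)                      # two modes: y ≈ 0.5 * x or y ≈ 2.0 * x
print(infer_latent(1.0, 2.0, w))    # -> (1, 0.0): the second mode explains y = 2.0 exactly
```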

Schedule

  • Lesson 1 (2 hours)

    • Classification through the lens of energy-based models (EBMs).

    • Learning EBMs vocabulary and concepts through a classification example.

  • Lesson 2 (2 hours)

    • Inference and training of latent-variable (LV) conditional and unconditional EBMs.

    • How to handle stochasticity.

  • Lesson 3 (2 hours)

    • Generative models: amortised inference for LV-EBMs.

    • From LV-EBMs to target prop to autoencoders (architectural, contrastive, and regularised) to adversarial nets.

Bio: Alfredo Canziani is a Post-Doctoral Deep Learning Research Scientist and Lecturer at the NYU Courant Institute of Mathematical Sciences, working under the supervision of Professors KyungHyun Cho and Yann LeCun. His research focuses on Machine Learning for self-driving vehicles, and he has also delved into the study of uncertainty-estimation networks. Passionate about new teaching methods, he believes that online teaching provides the means to reach a wider audience and make a difference in the lives of many.

Knowledge-intensive NLP and retrieval augmentation

15 June 2022, 09:00 (Aula Magna)
DIAG department, Sapienza University of Rome

Prof. Aleksandra Piktus (HuggingFace)

Recording (link), passcode: zwDDv.0V

Abstract & Bio
With the advent of large-scale, transformer-based language models such as BERT and GPT, we have witnessed unprecedented progress on many NLP tasks. Common benchmarks testing natural language inference, paraphrase detection or closed-book question answering saw submissions approaching or exceeding human performance. Yet NLP is far from solved, and the ability to reliably access and utilise knowledge, be it common sense or factual, remains a consistent challenge.

In this session, we will take a closer look at knowledge-intensive NLP (KI-NLP) tasks. First, we will go over examples of KI-NLP datasets and analyse their challenges and limitations. We will then provide an overview of common approaches to modelling such tasks, contextualising them with respect to both classical information retrieval and state-of-the-art, billion-parameter-scale language modelling. Next, we will use the RAG model as an example to guide us through the process of building a typical retriever-reader architecture. Finally, we will glimpse at an exciting new line of research exploring generative retrieval for KI-NLP.
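To fix ideas ahead of the RAG walkthrough, here is a deliberately toy retriever-reader pipeline in plain NumPy; the passages, embeddings, and placeholder "reader" are hypothetical, not RAG's actual components. A dense retriever scores passages by inner product with the question embedding, and a reader then conditions on the question together with the retrieved text.

```python
import numpy as np

def retrieve(question_vec, passage_vecs, passages, k=2):
    """Toy dense retriever: rank passages by inner product and keep the top-k."""
    scores = passage_vecs @ question_vec
    top = np.argsort(-scores)[:k]
    return [passages[i] for i in top]

def read(question, retrieved_passages):
    """Stand-in for the reader: in RAG this is a seq2seq generator conditioned on
    the question concatenated with each retrieved passage."""
    context = " ".join(retrieved_passages)
    return f"generate_answer({question!r} | {context!r})"   # placeholder, no real generation

passages = ["Rome is the capital of Italy.", "Paris is the capital of France."]
passage_vecs = np.array([[1.0, 0.0], [0.0, 1.0]])   # pretend passage embeddings
question_vec = np.array([0.9, 0.1])                 # pretend embedding of the question below
print(read("What is the capital of Italy?",
           retrieve(question_vec, passage_vecs, passages, k=1)))
```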

Bio: Aleksandra is a research engineer specializing in Natural Language Processing and Information Retrieval. After a couple of exciting years working on knowledge-intensive NLP at the Facebook AI Research lab in London, she has recently joined the Science team at HuggingFace. Before that, she worked on reducing the spread of misinformation on the Facebook platform and on Facebook Search.

Rethinking "optimization" in deep learning

16 June 2022, 15:00 (Aula Magna)
DIAG department, Sapienza University of Rome

Prof. Sanjeev Arora (Princeton University)

Recording (link), passcode: $6ngBpyh

Abstract & Bio

The talk will focus on recent works showing that traditional optimization analyses are a poor match for deep learning phenomena, for two reasons. (a) Traditional analyses of gradient descent rely upon an inequality by which the learning rate is set using the smoothness of the loss; this inequality is violated by deep learning losses (Li et al. 2020, Cohen et al. 2021). (b) Traditional analyses treat the cost/loss function as a black box and set the goal as finding any solution of low cost. It is increasingly clear that the cost of a solution does not fully capture its quality, because two solutions of the same cost can perform very differently on held-out data. Instead, the exact trajectory taken by gradient-based optimization has a large effect on the quality of the solution.

The talk will introduce these surprising phenomena and describe how new theory has been developed over the past few years to understand them.
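To make point (a) concrete: the classical descent lemma guarantees that gradient descent on an L-smooth loss decreases the loss only when the learning rate stays below 2/L. The toy quadratic below (a hypothetical example, not taken from the talk) shows that threshold numerically; Cohen et al. (2021) observe that deep networks are routinely trained right at the edge of this regime, where the classical guarantee no longer applies.

```python
def gd_on_quadratic(L, lr, steps=50, x0=1.0):
    """Gradient descent on f(x) = 0.5 * L * x**2, whose smoothness constant is L."""
    x = x0
    for _ in range(steps):
        x -= lr * L * x               # gradient of f is L * x
    return abs(x)

L = 10.0
for lr in (0.05, 0.19, 0.21):         # the classical threshold is 2 / L = 0.2
    print(f"lr={lr}: |x_50| = {gd_on_quadratic(L, lr):.3g}")
# lr < 2/L: the iterate shrinks towards the minimiser; lr > 2/L: it diverges.
```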

Bio: Prof. Arora is a member of the Theoretical Computer Science and Theoretical Machine Learning groups at Princeton. In the past, he has worked on Computational Complexity, Probabilistically Checkable Proofs (PCPs), computing approximate solutions to NP-hard problems, and related issues. For several years now, Sanjeev has been most interested in developing a new theory for Machine Learning (including deep learning).