Deep Learning through Information Geometry

Workshop at NeurIPS 2020

What is the workshop about?

It has proven difficult to build a theoretical foundation that can sufficiently explain the strong empirical performance of deep networks and push practice to the next level. Three distinct but interdependent aspects lie at the heart of this challenge: (i) neural architectures are large and complex, (ii) training these non-convex models is difficult, and (iii) the generalization performance of trained models is difficult to guarantee. Attempts at understanding deep learning have focused on attacking each of these aspects independently, e.g., approximation theory to characterize the hypothesis class, stochastic optimization theory tailored to deep networks, and statistical learning theory adapted to handle over-parametrized models. While substantial progress has been made on each individual aspect, a coherent understanding of deep learning remains elusive.

Our first goal is to bring together individual threads of theoretical work that address the three aspects of deep learning. This involves combining ideas from the diverse fields that have fueled these efforts, namely statistical physics, applied mathematics, optimization, and information theory, along with statistical learning theory. We hope these communities will come to appreciate each other's results, learn each other's language, and compare and contrast their findings. For instance, statistical physics typically studies generalization in a teacher-student setting, which is aligned with the Bayesian setting, yet the resulting predictions are quite different from uniform convergence, max-margin, or stability bounds; optimization theory shows a similar dissonance. Information Geometry has strong overlaps with both of these directions: it allows an explicit characterization of both the geometry and the information-theoretic properties of the hypothesis class, and it leads to results that qualitatively match empirical observations in the modern literature. We would like to engage these communities through a unifying platform and search for a direction forward.

Our second observation is that while large real-world datasets have been instrumental in propelling empirical progress, current theories do not sufficiently exploit properties of the data. This is a large intellectual gap, and potentially the key to developing a holistic understanding of deep learning. Theoretical attempts to model the data have been quite successful. A deep empirical understanding of data has also been developed in the vision and NLP communities under the umbrella of transfer learning. Bringing these insights forward and cataloging them systematically vis-à-vis existing theory will improve our understanding of deep learning, and of machine learning in general.

The workshop will discuss the following questions, and we encourage submissions in these areas.

  1. Statistical learning theory bounds the generalization gap through the complexity of the hypothesis class, but this approach does not seem to work for deep networks. Nevertheless, neural networks often generalize well in practice.

      • How can we adapt existing theory to exploit the geometry of the hypothesis class?

      • The learning algorithm can be viewed as an information processing procedure. What information-theoretic properties of this channel lead to good generalization?

  2. How should we build an understanding of data in machine learning? Specifically, how does the dataset (task) in deep learning affect optimization and generalization? How can we adapt learning theory to understand the low-data regime?

Key Dates

Submission Deadline: October 16, 2020 (23:59 Anywhere on Earth)
Acceptance Notification: November 6, 2020
Camera Ready Submission: November 20, 2020
Workshop Date: December 12, 2020

Videos of each talk are hyperlinked.

09.20 - 09.30  Opening Remarks
09.30 - 10.15  Keynote 1: Ke Sun: Information Geometry for Data Geometry through Pullbacks
10.15 - 10.30  Contributed Talk 1: The Volume of Non-Restricted Boltzmann Machines and Their Double Descent Model Complexity
10.30 - 10.45  Contributed Talk 2: From em-Projections to Variational Auto-Encoder
10.45 - 11.30  Keynote 2: Marco Gori: Developmental Learning the Natural Manifold of Time
11.30 - 12.30  Lunch Break
12.30 - 13.15  Keynote 3: Shun-ichi Amari: Deep Random Networks and Fisher Information
13.15 - 14.00  Keynote 4: Alexander Rakhlin (Live)
14.15 - 14.30  Contributed Talk 3: An Information-Geometric Distance on the Space of Tasks
14.30 - 15.15  Keynote 5: Gintare Karolina Dziugaite: Information-theoretic generalization bounds for noisy, iterative learning algorithms
15.15 - 16.00  Keynote 6: Guido Montufar (Live)
16.00 - 16.15  Contributed Talk 4: Annealed Importance Sampling with q-Paths
16.30 - 17.00  Panel Discussion and Closing Remarks
17.00 - 18.30  Poster Session

The workshop will be held remotely. The program will run on Pacific Time (PST). Zoom will be used for keynotes and contributed talks; the Zoom session will be live-streamed and recorded to YouTube. The poster session will take place in Gather Town, where authors of accepted submissions will present their work; the Gather Town space (see the NeurIPS workshop website for the link) will remain open throughout the day.


Senior Advisor,
RIKEN Brain Science Institute, Japan

Fundamental Research Scientist,
Element AI, Canada

Professor of Computer Science,
University of Siena, Italy

Associate Professor,
Massachusetts Institute of Technology, USA

Data61, CSIRO, Australia

Assistant Professor,
University of California, Los Angeles, USA


Alexander Alemi
Google Inc

Pratik Chaudhari
University of Pennsylvania

Varun Jog
University of Wisconsin-Madison

Dhagash Mehta
The Vanguard Group

Frank Nielsen
Sony Computer Science Laboratories Inc

Stefano Soatto
Amazon Web Services, University of California, Los Angeles

Greg Ver Steeg
University of Southern California

Formatting instructions

You must format your submission using the NeurIPS 2020 LaTeX style file. All submissions should be at most 4 pages, excluding references. If you wish, you can append a longer version of your manuscript as an appendix; there will be no separate supplementary material. Submissions are double-blind, so please anonymize your manuscript. Submissions that violate the NeurIPS style (margins, font sizes, etc.) will be rejected without review.

Submission instructions

Submissions will be handled through OpenReview. Submissions and their reviews (both anonymous) will be public on OpenReview. Work that is accepted for publication at the main conference track of NeurIPS 2020 is also welcome at the workshop; if you are submitting such work, please copy the reviews and the rebuttal as a private note to the workshop organizers (Program Chairs) on OpenReview. We also welcome work that is currently under review at other venues or posted on arXiv.

Special Issue in the Springer Journal on Information Geometry

Papers accepted at the workshop may be submitted to the special issue "Information Geometry for Deep Learning: Part II" of the Springer Journal on Information Geometry (Editor: Frank Nielsen). The deadline for submission to the journal is February 1, 2021. A short description of the special issue follows.

Deep neural networks (DNNs) are artificial neural network architectures with huge parameter spaces that can be interpreted geometrically as neuromanifolds (with singularities), and learning algorithms can be visualized as trajectories or gradient flows on these neuromanifolds. This special issue aims to collect original theoretical and/or experimental research articles that address recent developments and research efforts on information-geometric methods in deep learning. Topics include, but are not limited to:

  • Properties and complexity of neural networks/neuromanifolds

  • Geometric dynamic learning with singularities

  • Optimization with natural gradient methods, proximal methods, and other alternative methods

  • Information geometry of generative models (f-GANs, VAEs, etc.)

  • Wasserstein/Fisher-Rao metric spectral properties

  • Information bottleneck of neural networks

  • Neural network simplification and quantization

  • Geometric characterization of robustness and adversarial attacks

Please select "S.I.: Information Geometry for Deep Learning" in the submission form on the journal homepage.

Accepted Submissions

We are glad to have Amazon Web Services (AWS) sponsor two Best Paper awards for this workshop. Winners will be announced on the day of the workshop (Dec 12) and will receive AWS Cloud credits.

Best Paper Awards

Oral presentations

Poster presentations