Deep Learning through Information Geometry
Workshop at NeurIPS 2020
What is the workshop about?
It has proven difficult to build a theoretical foundation that can sufficiently explain the strong empirical performance of deep networks and push practice to the next level. Three distinct but interdependent aspects lie at the heart of this challenge: (i) neural architectures are large and complex, (ii) training these non-convex models is difficult, and (iii) generalization performance of the trained models is difficult to guarantee. Attempts at understanding deep learning have focused on attacking each of these aspects independently, e.g., using approximation theory to characterize the hypothesis class, building stochastic optimization theory tailored to deep networks, and adapting statistical learning theory to handle over-parametrized models. While substantial progress has been made on each of the individual aspects, a coherent understanding of deep learning remains elusive.
Our first goal is to bring together individual threads of theoretical work that address the three aspects of deep learning. This involves combining ideas from the diverse fields that have fueled these efforts, namely statistical physics, applied mathematics, optimization, and information theory, along with statistical learning theory. We aim for these different communities to gain an appreciation of each other’s results, learn each other’s language, and compare and contrast their findings. For instance, generalization is typically studied in a teacher-student setting in statistical physics, which is aligned with the Bayesian setting, but the results are quite different from uniform convergence, max-margin, or stability bounds; optimization theory presents a similarly dissonant discourse. Information Geometry has strong overlaps with both these directions; it allows an explicit characterization of both the geometry and information-theoretic properties of the hypothesis class and leads to results that qualitatively fit empirical results in the modern literature. We would like to engage these communities via a unifying platform and search for a direction forward.
Our second observation is that while large real-world datasets have been instrumental in propelling empirical progress, current theories do not sufficiently exploit properties of the data. This is a large intellectual gap, and potentially the key to developing a holistic way to understand deep learning. Theoretical attempts to model the data have been quite successful. A deep, empirical understanding of data has also been developed in the vision and NLP communities under the umbrella of transfer learning. Bringing forth these insights and cataloging them systematically, vis-a-vis existing theory, will improve our understanding of deep learning, and machine learning in general.
The workshop will discuss the following questions and encourages submissions in these areas.
Statistical learning theory focuses on the complexity of the hypothesis class to bound the generalization gap, but it is clear that this approach won’t work for deep networks. Nevertheless, empirically neural networks often exhibit good generalization properties.
How can we adapt existing theory to exploit the geometry of the hypothesis class?
The learning algorithm can be viewed as an information processing procedure. What information-theoretic properties of this channel lead to good generalization?
How should we build an understanding of data in machine learning? Specifically, how does the dataset (task) in deep learning affect optimization and generalization? How can we adapt learning theory to understand the low-data regime?
Key Dates
Submission Deadline October 16, 2020 (23:59 Anywhere On Earth)
Acceptance Notification November 6, 2020
Camera Ready Submission November 20, 2020
Workshop Date December 12, 2020
Main NeurIPS workshop website
https://neurips.cc/virtual/2020/public/workshop_16112.html
Gather.Town workspace
https://neurips.gather.town/app/vPYEDmTHeUbkACgf/dl-info-neurips2020
Schedule
Videos of each talk are hyperlinked.
09.20 - 09.30 Opening Remarks
09.30 - 10.15 Keynote 1: Ke Sun: Information Geometry for Data Geometry through Pullbacks
10.15 - 10.30 Contributed Talk 1: The Volume of Non-Restricted Boltzmann Machines and Their Double Descent Model Complexity
10.30 - 10.45 Contributed Talk 2: From em-Projections to Variational Auto-Encoder
10.45 - 11.30 Keynote 2: Marco Gori: Developmental Learning the Natural Manifold of Time
Lunch Break
12.30 - 13.15 Keynote 3: Shun-ichi Amari: Deep Random Networks and Fisher Information
13.15 - 14.00 Keynote 4: Alexander Rakhlin (Live)
Break
14.15 - 14.30 Contributed Talk 3: An Information-Geometric Distance on the Space of Tasks
14.30 - 15.15 Keynote 5: Gintare Karolina Dziugaite: Information-theoretic generalization bounds for noisy, iterative learning algorithms
15.15 - 16.00 Keynote 6: Guido Montufar (Live)
16.00 - 16.15 Contributed Talk 4: Annealed Importance Sampling with q-Paths
Break
16.30 - 17.00 Panel Discussion and Closing Remarks
17.00 - 18.30 Poster Session (gather.town)
The workshop will be held remotely. The program will run in the PDT time zone. Zoom will be used for keynotes and contributed talks; the Zoom session will be live-streamed and recorded to YouTube. The "poster" session will consist of a Gather Town session hosted by the authors of accepted submissions, together with a rocket.chat channel (see the NeurIPS workshop website for the link) that will run throughout the day.
Speakers
Organizers
Alexander Alemi
Google Inc
Pratik Chaudhari
University of Pennsylvania
Varun Jog
University of Wisconsin-Madison
Dhagash Mehta
The Vanguard Group
Frank Nielsen
Sony Computer Science Laboratories Inc
Stefano Soatto
Amazon Web Services, University of California Los Angeles
Greg Ver Steeg
University of Southern California
Formatting instructions
You must format your submission using the NeurIPS 2020 LaTeX style file. All submissions should be at most 4 pages, excluding references. If you wish, you can append a longer version of your manuscript as the Appendix; there will be no separate supplementary material. Submissions should be double blind, so you should anonymize the preprint. Submissions that violate the NeurIPS style (margins, font sizes, etc.) will be rejected without review.
Submission instructions
Submissions will be handled through OpenReview. Submissions and their reviews (both anonymous) will be public on OpenReview. Work that is accepted for publication at the main conference track of NeurIPS 2020 is also welcome at the workshop; if you are submitting such work, please copy the reviews and the rebuttal as a private note to the workshop organizers (Program Chairs) on OpenReview. We welcome work that is currently under review at other venues or on ArXiv.
Special Issue in the Springer Journal on Information Geometry
Accepted papers at the workshop may contribute to the special issue on "Information Geometry for Deep Learning: Part II" of the Springer Journal on Information Geometry (Editor: Frank Nielsen). The deadline for submission to this journal is 1st February, 2021. A short description of the special issue follows.
Deep neural networks (DNNs) are artificial neural network architectures with huge parameter spaces geometrically interpreted as neuromanifolds (with singularities), and learning algorithms are visualized as trajectories or gradient flows on neuromanifolds. The aim of this special issue is to collect original theoretical and/or experimental research articles which address the recent developments and research efforts on information-geometric methods in deep learning. The topics include but are not limited to:
Properties and complexity of neural networks/neuromanifolds
Geometric dynamic learning with singularities
Optimization with natural gradient methods, proximal methods, and other alternative methods
Information geometry of generative models (f-GANs, VAEs, etc)
Wasserstein/Fisher-Rao metric spectral properties
Information bottleneck of neural networks
Neural network simplification and quantization
Geometric characterization of robustness and adversarial attacks
Please select "S.I.: Information Geometry for Deep Learning" in the submission form at the journal homepage.
Accepted Submissions
We are glad to have Amazon Web Services (AWS) sponsor two Best Paper awards for this workshop. Winners will be announced on the day of the workshop (Dec 12) and will receive AWS Cloud credits.
Best Paper Awards
Oral presentations
ID: 3 The Volume of Non-Restricted Boltzmann Machines and Their Double Descent Model Complexity
ID: 15 An Information-Geometric Distance on the Space of Tasks
Poster presentations
ID: 1 DIME: An Information-Theoretic Difficulty Measure for AI Datasets
ID: 3 The Volume of Non-Restricted Boltzmann Machines and Their Double Descent Model Complexity
ID: 5 Learning Joint Intensity in a Multivariate Poisson Process on Statistical Manifolds
ID: 10 AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients
ID: 11 Visualizing High-Dimensional Trajectories on the Loss-Landscape of ANNs
ID: 12 Revisiting "Qualitatively Characterizing Neural Network Optimization Problems"
ID: 13 Comparing Fisher Information Regularization with Distillation for DNN Quantization
ID: 14 Deep Learning Generalization and the Convex Hull of Training Sets
ID: 16 Estimating Total Correlation with Mutual Information Bounds
ID: 17 Noisy Neural Network Compression for Analog Storage Devices