Summer 2021

Friday, July 16th, 2021

Yi Ma

Recording: here


White-Box Deep (Convolutional) Networks from First Principles

In this talk, we offer an entirely “white box” interpretation of deep (convolutional) networks from the perspective of data compression (and group invariance). In particular, we show how modern deep layered architectures, linear (convolution) operators and nonlinear activations, and even all parameters can be derived from the principle of maximizing rate reduction (with group invariance). All layers, operators, and parameters of the network are explicitly constructed via forward propagation, instead of learned via back propagation. All components of the so-obtained network, called ReduNet, have precise optimization, geometric, and statistical interpretations. There are also several nice surprises from this principled approach: it reveals a fundamental tradeoff between invariance and sparsity for class separability; it reveals a fundamental connection between deep networks and the Fourier transform for group invariance, namely a computational advantage in the spectral domain (why spiking neurons?); and it clarifies the mathematical roles of forward propagation (optimization) and backward propagation (variation). In particular, the so-obtained ReduNet is amenable to fine-tuning via both forward and backward (stochastic) propagation, with both optimizing the same objective.
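For readers who want to see the objective concretely: the coding-rate quantities behind ReduNet can be written in a few lines. The sketch below follows the rate-reduction formulation in the arXiv paper linked in the note below, but the variable names, the distortion value eps, and the exact normalization conventions are our own illustrative choices, not the authors' code.

```python
import numpy as np

def coding_rate(Z, eps=0.5):
    """Coding rate R(Z) = 1/2 * logdet(I + d/(m * eps^2) * Z Z^T)
    for a matrix Z whose m columns are feature vectors in R^d."""
    d, m = Z.shape
    return 0.5 * np.linalg.slogdet(np.eye(d) + (d / (m * eps**2)) * Z @ Z.T)[1]

def rate_reduction(Z, labels, eps=0.5):
    """Delta R: the rate of the whole feature set minus the class-weighted
    rates of each class -- expand everything, compress within classes."""
    m = Z.shape[1]
    whole = coding_rate(Z, eps)
    per_class = sum(
        (np.sum(labels == c) / m) * coding_rate(Z[:, labels == c], eps)
        for c in np.unique(labels)
    )
    return whole - per_class
```

Roughly, each ReduNet layer can then be read as one gradient-ascent step on this objective, which is what makes every operator and parameter explicitly constructible rather than learned.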


This is joint work with students Yaodong Yu, Ryan Chan, and Haozhi Qi of Berkeley; Dr. Chong You, now at Google Research; and Professor John Wright of Columbia University. A related paper can be found at: https://arxiv.org/abs/2105.10446

Friday, July 9th, 2021

Freyr Sverrisson

Recording: here


End-to-end learning of interaction fingerprints from protein molecular surfaces

Proteins’ biological functions are defined by the geometric and chemical structure of their 3D molecular surfaces. In our recent work, we showed that geometric deep learning can be used on mesh-based representations of proteins to identify potential functional sites, such as binding targets for potential drugs. Unfortunately, though, the use of meshes as the underlying representation of protein structure has multiple drawbacks, including the need to pre-compute input features and mesh connectivity. This becomes a bottleneck for many important tasks in protein science.


In order to overcome these limitations, we developed a new framework for deep learning on protein structures. Among the key advantages of our method are the computation and sampling of the molecular surface on the fly from the underlying atomic point cloud, and a novel, efficient geometric convolutional layer. As a result, we were able to process large collections of proteins in an end-to-end fashion, taking as the sole input the raw 3D coordinates and chemical types of their atoms and eliminating the need for any hand-crafted, pre-computed features.
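As a toy illustration of the on-the-fly surface idea: one standard way to turn an atomic point cloud into an implicit surface is a smoothed (soft-min) distance field whose level set can be sampled directly. The sketch below is in that spirit only; the sigma, the level value, and the crude finite-difference sampler are illustrative stand-ins, not the authors' formulation.

```python
import numpy as np

def smooth_distance(x, atoms, sigma=1.0):
    """Soft-min distance from query points x (n x 3) to atom centers (k x 3);
    as sigma -> 0 this approaches the exact distance to the nearest atom."""
    d = np.linalg.norm(x[:, None, :] - atoms[None, :, :], axis=-1)   # (n, k)
    dmin = d.min(axis=1, keepdims=True)
    # numerically stable form of -sigma * log(sum(exp(-d / sigma)))
    return (dmin - sigma * np.log(np.exp(-(d - dmin) / sigma)
                                  .sum(axis=1, keepdims=True)))[:, 0]

def sample_surface(atoms, level=1.05, n=1024, steps=60, lr=0.5, seed=0):
    """Scatter points near the atoms, then relax them onto the iso-surface
    {x : smooth_distance(x) = level} by gradient descent on (f - level)^2."""
    rng = np.random.default_rng(seed)
    x = atoms[rng.integers(len(atoms), size=n)] + rng.normal(scale=2.0, size=(n, 3))
    eps = 1e-3
    for _ in range(steps):
        f = smooth_distance(x, atoms)
        grad = np.stack([
            (smooth_distance(x + eps * np.eye(3)[j], atoms) - f) / eps
            for j in range(3)
        ], axis=1)                                # finite-difference gradient
        x -= lr * (f - level)[:, None] * grad     # step toward the level set
    return x
```

Because the surface is defined implicitly by the atom coordinates, no mesh or precomputed connectivity is ever stored, which is the property such a framework exploits.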


We showcased the performance of our approach by testing it on two tasks in the field of protein structural bioinformatics: the identification of interaction sites and the prediction of protein-protein interactions. On both tasks, we achieved state-of-the-art performance with much faster run times and fewer parameters than previous models. These results will considerably ease the deployment of deep learning methods in protein science and open the door for end-to-end differentiable approaches in protein modeling tasks such as function prediction and design.

Friday, July 2nd, 2021

Yasaman Bahri

Recording: here


Explaining Neural Scaling Laws

The test loss of neural networks has empirically been found to follow power laws as a function of basic variables such as model size and training set size. I will discuss a theory that explains and connects these scaling laws. We propose “variance-limited” and “resolution-limited” scaling behavior for both dataset and model size, for a total of four scaling regimes. The variance-limited scaling follows simply from the existence of a well-behaved infinite-data or infinite-width limit, while the resolution-limited regime can be explained by positing that models are effectively resolving a smooth data manifold. In the large-width limit, this can equivalently be obtained from the spectrum of certain kernels, and we present evidence that large-width and large-dataset resolution-limited scaling exponents are related by a duality. We also discuss several empirical relationships between task properties and scaling exponents.
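Concretely, the empirical object under study is a power law such as L(D) ≈ c·D^(−α_D) + L_∞ in the dataset size D (and analogously in width). Here is a minimal sketch of how such an exponent is fit in practice; the measurements below are made-up illustrative numbers, not data from the talk:

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(D, c, alpha, L_inf):
    # test loss vs. dataset size: power-law decay toward an irreducible floor
    return c * D**(-alpha) + L_inf

D = np.array([1e3, 3e3, 1e4, 3e4, 1e5, 3e5])        # dataset sizes (illustrative)
L = np.array([1.20, 0.95, 0.74, 0.60, 0.49, 0.41])  # test losses (illustrative)

(c, alpha, L_inf), _ = curve_fit(power_law, D, L, p0=[10.0, 0.3, 0.05])
print(f"exponent alpha_D ~ {alpha:.2f}, irreducible loss ~ {L_inf:.3f}")
```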


Friday, June 25th, 2021

Maciej Koch-Janusz

Recording: here


How to construct a physical theory

A key stage in developing a physical theory of a complex system is identifying its “relevant” degrees of freedom. “Relevance” is surprisingly well defined in physics and forms the cornerstone of the celebrated renormalisation group (RG) program. Its practical execution in unfamiliar systems is, however, difficult. Machine learning approaches, on the other hand, though promising, often lack formal interpretability: it is unclear what relation, if any, the architecture- and training-dependent learned “relevant features” bear to standard objects of physical theory.


I will introduce these concepts and discuss how the above gap can be bridged, paving a path to automated discovery of mathematically formal and interpretable physical theories from raw data. The key insight is that the field-theoretic “relevance” of physics is in fact equivalent to the notion of “relevant information” defined in the Information Bottleneck (IB) formalism of compression theory, a fact we prove. Employing recent tools for ML-based estimation of information-theoretic quantities, we then construct an unsupervised algorithm whose inputs are raw configurations of a statistical system and whose outputs are neural nets parametrising formal objects such as “order parameters” or, more generally, “scaling operators” characterising the system.
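For readers new to the IB side of this equivalence: given a joint distribution p(x, y) and a stochastic encoder p(t|x), the IB objective trades the compression cost I(X;T) against the retained predictive information I(T;Y). A small discrete sketch follows; the distribution tables and β are toy placeholders, and it merely evaluates the objective rather than optimizing it as the algorithm in the talk does.

```python
import numpy as np

def mutual_info(p_ab):
    """I(A;B) in nats from a joint probability table p_ab[a, b]."""
    pa = p_ab.sum(axis=1, keepdims=True)
    pb = p_ab.sum(axis=0, keepdims=True)
    mask = p_ab > 0
    return float((p_ab[mask] * np.log(p_ab[mask] / (pa @ pb)[mask])).sum())

def ib_objective(p_xy, enc, beta):
    """IB Lagrangian I(X;T) - beta * I(T;Y), where enc[t, x] = p(t | x)
    and T depends on Y only through X (the usual IB Markov chain)."""
    px = p_xy.sum(axis=1)          # p(x)
    p_tx = enc * px[None, :]       # joint p(t, x)
    p_ty = enc @ p_xy              # joint p(t, y)
    return mutual_info(p_tx) - beta * mutual_info(p_ty)
```

Sweeping β traces out the compression-prediction trade-off; the talk's claim is that field-theoretic relevance is exactly this notion of relevant information.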

Towards the end, I will mention open questions and possible future research directions at the interface of formal physical theory building and ML.


Friday, June 18th, 2021

Jessica Hamrick

Recording: here


Mental Simulation, Imagination, and Model-Based Deep RL

Mental simulation—the capacity to imagine what will or what could be—is a salient feature of human cognition, playing a key role in a wide range of cognitive abilities. In artificial intelligence, the last few years have seen the development of model-based deep reinforcement learning methods, which seemingly share many similarities with mental simulation. In this talk, I will discuss how closely such methods actually capture the qualitative characteristics exhibited by human mental simulation, with a particular focus on: (1) the extent to which the performance of such agents is driven by model-based reasoning and planning, and (2) how effectively such agents can leverage planning for generalization. While a number of challenges remain in matching the capacity of human mental simulation, I will highlight some recent progress on developing more compositional model-based algorithms through the use of graph neural networks and tree search.
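To fix intuition about what "leveraging a model for planning" means at its simplest, here is a depth-limited lookahead sketch; the model, value function, and action set are generic stand-ins, not the agents analyzed in the talk (which use learned models and Monte Carlo tree search):

```python
def plan(state, model, value_fn, actions, depth=3, gamma=0.99):
    """Exhaustive depth-limited lookahead: unroll the model for every action
    sequence and back up values. Tree search (e.g. MCTS) replaces the
    exhaustive loop with guided sampling, but the model plays the same role."""
    if depth == 0:
        return value_fn(state), None
    best_value, best_action = float("-inf"), None
    for a in actions:
        next_state, reward = model(state, a)     # imagined transition
        future, _ = plan(next_state, model, value_fn, actions, depth - 1, gamma)
        v = reward + gamma * future
        if v > best_value:
            best_value, best_action = v, a
    return best_value, best_action
```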

Friday, June 11th, 2021

Josh Tenenbaum

Recording: here


Scaling AI the Human Way: Reverse-engineering the origins of human common sense with probabilistic programs, game-engine style simulators, and program synthesis

What distinguishes today's AI technologies from true artificial intelligence as the field's founders understood it? Why don't we have machines with the flexible, general purpose, common sense of even a young child, and the ability to conceive of and accomplish an endless range of new goals without having to be specially built and trained to solve each task from scratch? I will discuss our efforts towards the long-term -- and still far off -- goal of reverse-engineering how common sense arises in the human mind and brain, and using what we learn to build more human-like forms of intelligence in machines. I will present models of both what we think is built in -- the cognitive start-up software or "core knowledge" that is also shared in some form with other species and implemented in functionally specific brain networks -- as well as steps towards general-purpose learning mechanisms that can go beyond this core knowledge. I will introduce key concepts that underlie these models, drawing from probabilistic programming, game-engine simulators, and program induction and synthesis methods. These tools complement and extend what we can do with better known modeling toolkits from Bayesian modeling and neural networks in cognitive science and AI. I will show examples of how we have used these tools along with behavioral experiments to study cognition in core domains such as intuitive physics and intuitive psychology (or theory of mind), and how we are starting to model the learning of new conceptual systems or “languages of thought” that go beyond core knowledge domains.