Fall 2022

Friday, 25 November, 2022

Yoshua Bengio

Recording : here

Amortized Inference for Learning Bayesian Causal Models


Being Bayesian is the rational way to learn, make safe decisions, and explore efficiently (to maximize information gain), but it requires computing intractable quantities or running MCMC for potentially exponentially long times. Because of computational limitations, Bayesian approaches have up to now mostly relied on excessive distributional assumptions or insufficiently long MCMC runs, which may explain why these methods have not become mainstream in machine learning. I will argue, and show with examples, that large neural networks such as GFlowNets (which extend variational inference, RL, and generative neural net approaches) can be used to approximate the rich required distributions (such as the posterior over causal graphs) and intractable marginalized quantities (such as the conditional mutual information needed to estimate the expected information gain) in order to perform the required inferences. This also makes it possible to avoid the very computationally expensive Monte Carlo averages that are normally required at run time and instead represent highly multimodal distributions in an implicit way. The fundamental reason why such amortized predictors can work in spite of having to approximate such rich distributions (with potentially an exponential number of modes) is that in many cases of interest generalization within the distribution is possible: there is structure that makes it possible to guess most of the modes from a comparably small number of them.
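
To make the amortization idea concrete, here is a minimal sketch (mine, not from the talk, and using an ordinary least-squares predictor rather than a GFlowNet): a predictor is trained once on simulated (parameter, data) pairs, after which posterior inference on any new dataset is a single forward pass instead of a per-dataset MCMC run. A conjugate Gaussian toy is used so the exact posterior is known for comparison.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5          # observations per dataset
m = 20000      # simulated training datasets

# Simulate (parameter, data) pairs from the generative model:
# mu ~ N(0, 1), x_i ~ N(mu, 1).
mu = rng.normal(0.0, 1.0, size=m)
x = rng.normal(mu[:, None], 1.0, size=(m, n))
xbar = x.mean(axis=1)

# Amortized predictor: fit w in  E[mu | data] ~ w * xbar
# by least squares over the simulated pairs.
w = (xbar @ mu) / (xbar @ xbar)

# The exact conjugate posterior mean is (n / (n + 1)) * xbar,
# so the learned coefficient should recover n/(n+1) = 5/6.
print(w)
print(n / (n + 1))

# At run time, inference on a new dataset is one cheap
# prediction rather than a fresh MCMC chain:
x_new = rng.normal(1.0, 1.0, size=n)
posterior_mean_estimate = w * x_new.mean()
```

The same training-on-simulations pattern is what lets a rich amortized model generalize across datasets: structure shared across the simulated pairs is learned once and reused for every new inference query.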

Friday, 11 November, 2022

Nazim Bouatta

Recording : here

What every computer scientist should know about AlphaFold2


AlphaFold2, DeepMind’s machine-learning algorithm, represents a stunning advance on one of biology’s grand challenges: predicting the 3D structure of a protein from knowledge of its primary structure—its sequence of amino acids. AlphaFold2’s performance demonstrates the remarkable power of deep learning in molecular problems when co-evolutionary information is available in the form of multiple sequence alignments (MSAs). AlphaFold2 and the transformative developments it has already prompted may carry the seeds of a novel understanding of biology: a structural systems biology in which biological phenomena across the varied scales of life are studied through a structural and mechanistic prism.


I will discuss how AlphaFold2 works by describing its key features, including its use of

  • Attention mechanisms and Transformers to capture long-range dependencies,

  • Symmetry principles to facilitate reasoning over protein structures in three dimensions,

  • End-to-end differentiability as a unifying framework that makes the entire approach work in a self-consistent and data-efficient manner.
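
As background for the first bullet, here is a minimal sketch of generic scaled dot-product self-attention (my illustration; AlphaFold2 itself uses more specialized variants such as attention over MSA rows/columns and triangle attention): every position attends to every other, so dependencies are captured regardless of how far apart two residues sit in the sequence.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence.

    X: (L, d) toy residue embeddings. Pairwise scores couple
    all positions, capturing long-range dependencies.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])          # (L, L) pairwise scores
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)   # softmax over positions
    return weights @ V                              # (L, d) attended output

rng = np.random.default_rng(0)
L, d = 8, 16   # toy sequence length and embedding size
X = rng.normal(size=(L, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)   # (8, 16)
```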

Friday, 4 November, 2022

Jason Hartford

Recording : here

Some steps towards causal representation learning


High-dimensional unstructured data such as images or sensor data can often be collected cheaply in experiments, but is challenging to use in a causal inference pipeline without extensive engineering or labelling to extract the underlying latent factors. The goal of causal representation learning is to find appropriate assumptions and methods to disentangle latent variables and learn the causal mechanisms that explain a system's behaviour. In this talk, I'll present results from a series of recent papers that describe how we can leverage assumptions about a system's causal mechanisms to disentangle latent variables with identifiability guarantees. I will also talk about the importance of considering object-centric learning for identifying latents, and the limitations of a commonly used injectivity assumption. Finally, I'll discuss a hierarchy of disentanglement settings that do not require injectivity, but are important to solve if we want to build systems that can discover the underlying dynamics of complex systems from high-dimensional observations.

Friday, 21 October, 2022

Alison Gopnik

Recording : here

Imitation vs Innovation in Children and Large Language and Image Models


It’s natural to ask whether large language models like LaMDA or GPT-3 are intelligent agents. But I argue that this is the wrong question: intelligence and agency are the wrong categories for understanding them. Instead, these AI systems are what we might call cultural technologies, like writing, print, libraries, internet search engines, or even language itself. They are new techniques for passing on information from one group of people to another. Cultural technologies aren’t like intelligent humans, but they are essential for human intelligence. Many animals can transmit some information from one individual or one generation to another, but no animal does it as much as we do or accumulates as much information over time. New technologies that make cultural transmission easier and more effective have been among the greatest engines of human progress, but they have also led to negative as well as positive social consequences. Moreover, while cultural technologies allow the transmission of existing information, cultural evolution also depends on innovation, exploration, and causal learning. I will present results from a novel environment showing that young children outperform even SOTA LLMs and RL agents in an exploratory causal learning task. Similarly, I will show results that children can rapidly infer new uses and causal affordances for objects, e.g. using a fork rather than a mirror as a comb; LLMs and DALL-E can infer the associated use but not the novel one.

Friday, 14 October, 2022

Karolina Dziugaite

Recording : here

Deep learning through the lens of data

Deep learning comes with excessive demands for data. In this talk, I will present my recent work showing that not all data is necessary for training an accurate predictor. In particular, one can drop "easy-to-learn" examples and do just as well as when learning on all of the data. Given this disparate “importance” of training examples for generalization, I will present an empirical analysis of the loss landscape derived from different subsets of the training examples. I will then look into how the training dynamics are influenced by easy versus hard data.
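
A minimal sketch of the example-dropping idea (my toy illustration with logistic regression, not the talk's method or scoring rule): score each training example by its loss under a trained model, treat low-loss examples as "easy-to-learn", drop them, and retrain on the remainder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification data with a linear ground truth
# plus a little label noise.
n, d = 2000, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true + 0.5 * rng.normal(size=n) > 0).astype(float)

def train(X, y, steps=300, lr=0.5):
    """Plain gradient descent on the logistic loss."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def accuracy(w, X, y):
    return (((X @ w) > 0).astype(float) == y).mean()

# Score each example by its loss under a trained model:
# low loss = easy-to-learn.
w = train(X, y)
p = 1 / (1 + np.exp(-X @ w))
loss = -(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

# Drop the easiest half and retrain on the hardest half only.
keep = np.argsort(loss)[n // 2:]
w_pruned = train(X[keep], y[keep])

# Compare on fresh noiseless test data.
X_test = rng.normal(size=(5000, d))
y_test = (X_test @ w_true > 0).astype(float)
acc_full = accuracy(w, X_test, y_test)
acc_pruned = accuracy(w_pruned, X_test, y_test)
print(acc_full, acc_pruned)
```

In this toy setting the model retrained on only the hard half remains an accurate predictor, mirroring the claim that the easy examples contribute little to generalization.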

Friday, 16 September, 2022

Mila Profs

Recording : here

Profs talk

Talk 1 : Unsupervised inference of a conserved neural substrate for behavior through brain-wide compositional modes


Behaving animals continually reconcile the internal states of their neural circuits brain-wide with incoming sensory and environmental evidence to evaluate when and how to act. The brains of animals, including humans, exploit many evolutionary innovations, chiefly modularity—observable at the level of anatomically defined brain regions, cortical layers, and cell types, among others—that can be repurposed in a compositional manner to endow the animal with a highly flexible behavioral repertoire with minimum energy expenditure and learning time. On the surface, it seems that similar modularity could be a powerful approach to endow artificial brains with the generalizability and efficiency of biological brains. Indeed, behavioral output shows its own modularity, yet these behavioral modules seldom correspond directly to traditional notions of modularity in the brain. It remains unclear how to link neural and behavioral modularity in a compositional manner. Here, we propose that, rather than serve as a basis set of low-level functions to be combined towards higher-level functions, compositional modularity emerges from evolutionary constraints imposed on parts of the greater system to support specific behavioral functions that span more readily apparent submodules such as brain regions. We introduce a comprehensive framework—compositional modes—which directly links the behavioral repertoire with distributed patterns of population activity brain-wide at multiple concurrent spatial and temporal scales. Using whole-brain recordings of larval zebrafish, we introduce an unsupervised pipeline based on neural network models to reveal highly conserved compositional modes across individuals despite the spontaneous nature of the behavior. These modes provide a scaffolding for other modes that account for the idiosyncratic behavior of each fish.
Our results demonstrate that even spontaneous behavior in different individuals can be decomposed and understood using a relatively small number of neurobehavioral modules—the compositional modes—and elucidate a compositional neural basis of behavior.


Talk 2 : Stochastic Algorithms in the Large


In this talk, I will present a framework, inspired by random matrix theory, for analyzing the dynamics of stochastic optimization algorithms (e.g., stochastic gradient descent (SGD) and SGD with momentum (SGD+M)) when both the number of samples and the number of dimensions are large. Using this new framework, we show that the dynamics of optimization algorithms on a least-squares problem with random data become deterministic in the large-sample, large-dimension limit. In particular, the limiting dynamics of the stochastic algorithms are governed by a Volterra equation. From this model, we identify a stability measurement, the implicit conditioning ratio (ICR), which determines when SGD+M can accelerate over plain SGD. When the batch size exceeds the ICR, SGD+M converges linearly at a rate of $O(1/\sqrt{\kappa})$, matching optimal full-batch momentum (in particular, performing as well as full-batch momentum but with a fraction of the batch size). For batch sizes smaller than the ICR, in contrast, SGD+M's rate scales like a multiple of the single-batch SGD rate. We give explicit choices for the learning rate and momentum parameter, in terms of the Hessian spectrum, that achieve this performance. Finally, we show that this model matches performance on real data sets.
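
A minimal sketch of the setting (my illustration; the hyperparameters below are conservative ad hoc choices, not the talk's optimal learning-rate/momentum schedule, and no Volterra equation is computed): minibatch SGD with heavy-ball momentum on a random least-squares problem with exact interpolation, where gradient noise vanishes at the optimum, so a linear rate is visible even with small batches.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random least-squares problem  min_w ||A w - b||^2 / (2n)
# with exact interpolation (b = A w*), so stochastic gradient
# noise vanishes at the optimum.
n, d = 1000, 200
A = rng.normal(size=(n, d))
w_star = rng.normal(size=d)
b = A @ w_star

H = A.T @ A / n                         # population Hessian
L = np.linalg.eigvalsh(H).max()         # largest eigenvalue

# Minibatch SGD with heavy-ball momentum; batch is a sizeable
# fraction of n, the regime where momentum can pay off.
batch, lr, beta = 200, 0.5 / L, 0.8
w, v = np.zeros(d), np.zeros(d)
for _ in range(2000):
    idx = rng.choice(n, size=batch, replace=False)
    grad = A[idx].T @ (A[idx] @ w - b[idx]) / batch
    v = beta * v + grad                 # momentum buffer
    w = w - lr * v

rel_err = np.linalg.norm(w - w_star) / np.linalg.norm(w_star)
print(rel_err)   # far below 1: linear convergence from minibatches alone
```

Despite never seeing a full-batch gradient, the iterate converges to machine-level accuracy at a linear rate; the talk's analysis characterizes exactly when such minibatch momentum matches full-batch momentum.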


Talk 3 : Advancing multimodal vision-language learning


Over the last decade, multimodal vision-language (VL) research has seen impressive progress. We can now automatically caption images in natural language, answer natural language questions about images, retrieve images using complex natural language queries, and even generate images given natural language descriptions. However, current VL systems lack several skills that prevent them from being practically usable: out-of-distribution generalization, compositional reasoning, common sense and factual knowledge reasoning, data-efficient adaptation to new tasks, interpretability and explainability, overcoming spurious correlations and biases in data, etc. In this talk, I will present our work studying two of these challenges in VL research: out-of-distribution generalization in visual question answering, and data-efficient adaptation of VL models to new VL tasks.