Friday, May 3, 2019
ICLR Lightning Talks
Recording: https://bluejeans.com/s/jXmi_
Slides: Not available
Featuring
Note: Audio starts at minute 10:35
Friday, April 19, 2019
Dhruva Raman (University of Cambridge)
Recording: https://bluejeans.com/s/9nfIc/
Slides: Not available
Biological neural circuits learn in spite of imperfect information on task performance and noisy biological components. How can these problems be mitigated? We use optimization theory to show how adding apparently redundant neurons and connections to a network can improve learning performance in the face of imperfect learning rules and corrupted error signals. The theory shows how large neural circuits can exploit additional connectivity to achieve faster and more precise learning. However, there is a limit to the benefit of adding connections. Biologically, synapses (connection strengths) are intrinsically unreliable, and we show that in excessively large networks this unreliability eventually outweighs the benefits to learning performance. Consequently, there is an optimal network size for a given task, which we can calculate in specific cases.
Whereas machine learning theory has focused on generalization to examples from the same distribution as the training data, a better understanding of transfer scenarios, where the observed distribution changes often over the lifetime of the learning agent, is important both for robust deployment and for achieving the more powerful form of generalization that humans seem to enjoy and that learning agents seem to need. Whereas most machine learning algorithms and architectures can be traced back to assumptions about the training distribution, we also need to explore assumptions about how the observed distribution changes. We propose that sparsity of change in distribution, when knowledge is represented appropriately, is a good assumption for this purpose. If that assumption holds and knowledge is represented appropriately, it leads to fast adaptation to changes in distribution; consequently, the speed of adaptation to changes in distribution can be used as a meta-objective that drives the discovery of knowledge representations compatible with that assumption. We illustrate these ideas with causal discovery: is one variable a direct cause of another, and how can raw data be mapped to a representation space whose dimensions correspond to causal variables between which clear causal relationships exist? We propose a large research program in which this non-stationarity assumption and meta-transfer objective are combined with other closely related assumptions about the world embodied in a world model, such as the consciousness prior (the causal graph is captured by a sparse factor graph) and the assumption that the causal variables are often those an agent can act upon (the independently controllable factors prior), both of which should be useful for agents that plan, imagine, and try to find explanations for what they observe.
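As a rough illustration of the adaptation-speed idea (a toy sketch, not code from the talk), the snippet below compares two factorizations of a bivariate categorical model, p(A)p(B|A) versus p(B)p(A|B), after an intervention that changes only the marginal of the cause A. The factorization aligned with the true causal direction typically accumulates a lower online negative log-likelihood while adapting, which is the kind of signal proposed as a meta-objective. The number of categories, batch size, and learning rate are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10                                    # categories per variable (illustrative)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def sample(n, pi, cond):
    """Ground truth is A -> B: A ~ pi, then B | A=a ~ cond[a]."""
    a = rng.choice(N, size=n, p=pi)
    u = rng.random(n)
    b = np.minimum((cond[a].cumsum(axis=1) < u[:, None]).sum(axis=1), N - 1)
    return a, b

def online_nll(a_stream, b_stream, order, s_marg, s_cond, lr=0.5, batch=50):
    """Adapt one factorization by SGD on the online NLL; return the accumulated NLL.
    order='a->b' models p(A) p(B|A); order='b->a' models p(B) p(A|B)."""
    total = 0.0
    for t in range(0, len(a_stream), batch):
        a, b = a_stream[t:t + batch], b_stream[t:t + batch]
        x, y = (a, b) if order == 'a->b' else (b, a)
        p_marg, p_cond = softmax(s_marg), softmax(s_cond, axis=1)
        total += -np.mean(np.log(p_marg[x]) + np.log(p_cond[x, y]))
        # Analytic gradients of the mean batch NLL for softmax-parameterized categoricals.
        g_marg = (len(x) * p_marg - np.bincount(x, minlength=N)) / len(x)
        g_cond = np.zeros_like(s_cond)
        for xi, yi in zip(x, y):
            g_cond[xi] += p_cond[xi]
            g_cond[xi, yi] -= 1
        s_marg -= lr * g_marg
        s_cond -= lr * g_cond / len(x)
    return total

# Fixed causal mechanism p(B|A); only the cause's marginal changes at the intervention.
cond = rng.dirichlet(np.ones(N), size=N)
pi_train, pi_shift = rng.dirichlet(np.ones(N)), rng.dirichlet(np.ones(N))
a_tr, b_tr = sample(200_000, pi_train, cond)
a_sh, b_sh = sample(5_000, pi_shift, cond)

scores = {}
for order in ('a->b', 'b->a'):
    x, y = (a_tr, b_tr) if order == 'a->b' else (b_tr, a_tr)
    # Pre-train each factorization by (smoothed) exact MLE on the training distribution.
    marg = (np.bincount(x, minlength=N) + 0.5) / (len(x) + 0.5 * N)
    joint = np.zeros((N, N))
    np.add.at(joint, (x, y), 1.0)
    cond_hat = (joint + 0.5) / (joint + 0.5).sum(axis=1, keepdims=True)
    scores[order] = online_nll(a_sh, b_sh, order, np.log(marg), np.log(cond_hat))

print(scores)  # the causal factorization (a->b) typically adapts with lower online NLL
```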
Friday, April 5, 2019
Anirudh Goyal (Mila)
Recording: https://bluejeans.com/s/2Q@yO/
Slides: Not yet available
The biggest challenge right now in RL specifically, and in AI more generally, is to devise methods for learning complex behaviours that generalize meaningfully. In supervised learning (vision, speech, NLP), deep models seem able to achieve very complex generalization, but this is not really the case in RL. In this talk, I'm going to first argue that this gap in generalization has much to do with current practices for evaluating generalization in multitask, transfer and meta-learning setups, and then I'm going to talk about an elegant solution for learning inductive biases that exploit the structure of the task, enabling the learning agent to incorporate prior knowledge into the learning system and to reuse structure across the task space.
Friday, March 29, 2019
Jeffrey Pennington (Google Brain NYC)
Recording: https://bluejeans.com/s/UJY3k
Slides: Not yet available
Neural networks define a rich and expressive class of functions whose properties and behaviors are very hard to describe from a theoretical perspective. Nevertheless, when these functions become highly overparameterized, a surprisingly simple characterization emerges. In this talk, I will discuss several perspectives on this characterization: 1) I will examine the prior over functions induced by common weight initialization schemes and show that it corresponds to a Gaussian process with a well-defined compositional kernel; 2) I will show that by tuning initialization hyperparameters, this kernel can be optimized for signal propagation, yielding networks that are trainable to enormous depths (10k+ layers); and 3) I will demonstrate that the learning dynamics of such overparameterized neural networks are governed by a linear model obtained from the first-order Taylor expansion of the network around its initial parameters.
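As a rough illustration of point (1) above (a sketch, not the speaker's code), the snippet below compares the analytic compositional (NNGP) kernel of a deep ReLU network against a Monte-Carlo estimate of the output covariance over randomly initialized, finite-width networks. The widths, depth, number of networks, and He-style weight variance are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu_kernel_recursion(X, depth, sw2=2.0):
    """Compositional (NNGP) kernel of an infinitely wide deep ReLU network,
    using the arc-cosine formula for E[relu(u) relu(v)] under a bivariate Gaussian."""
    K = sw2 * X @ X.T / X.shape[1]
    for _ in range(depth):
        d = np.sqrt(np.diag(K))
        cos = np.clip(K / np.outer(d, d), -1.0, 1.0)
        theta = np.arccos(cos)
        K = sw2 * np.outer(d, d) * (np.sin(theta) + (np.pi - theta) * cos) / (2 * np.pi)
    return K

def mc_kernel(X, depth, width=512, n_nets=200, sw2=2.0):
    """Monte-Carlo estimate of the same kernel: output covariance of random finite-width nets."""
    outs = []
    for _ in range(n_nets):
        h = X
        for _ in range(depth):
            W = rng.normal(scale=np.sqrt(sw2 / h.shape[1]), size=(h.shape[1], width))
            h = np.maximum(h @ W, 0.0)
        w_out = rng.normal(scale=np.sqrt(sw2 / width), size=width)
        outs.append(h @ w_out)
    outs = np.array(outs)                       # shape (n_nets, n_points)
    return outs.T @ outs / n_nets

X = rng.normal(size=(4, 8))                     # 4 input points of dimension 8
print(np.round(relu_kernel_recursion(X, depth=3), 3))
print(np.round(mc_kernel(X, depth=3), 3))       # should roughly agree with the recursion
```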
Friday, March 8, 2019
David Blei (Columbia University)
Recording: https://bluejeans.com/s/KgY4m
Slides: Here
Causal inference from observational data is a vital problem, but it comes with strong assumptions. Most methods require that we observe all confounders, variables correlated with both the causal variables (the treatment) and their effect (how well the treatment works). But whether we have observed all confounders is a famously untestable assumption. We describe the deconfounder, a way to do causal inference from observational data with weaker assumptions than the classical methods require. How does the deconfounder work? While traditional causal methods measure the effect of a single cause on an outcome, many modern scientific studies involve multiple causes, different variables whose effects are simultaneously of interest. The deconfounder uses the multiple causes as a signal for unobserved confounders, combining unsupervised machine learning and predictive model checking to perform causal inference. We describe the theoretical requirements for the deconfounder to provide unbiased causal estimates, and show that it requires weaker assumptions than classical causal inference. We analyze the deconfounder's performance in three types of studies: semi-simulated data around smoking and lung cancer, semi-simulated data around genomewide association studies, and a real dataset about actors and movie revenue. The deconfounder provides a checkable approach to estimating close-to-truth causal effects. This is joint work with Yixin Wang. [*] https://arxiv.org/abs/1805.06826
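A minimal sketch of the deconfounder recipe on simulated data (illustrative, not the authors' code): fit a factor model over the multiple causes, treat its inferred factor as a substitute confounder, and include it in the outcome model. Here the "causes", effect sizes, and the one-component factor analysis are all illustrative choices, and in this toy only the first cause has a true effect.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Simulated multi-cause data: an unobserved confounder z drives all causes and the outcome.
n, n_causes = 20000, 30
z = rng.normal(size=(n, 1))                       # unobserved confounder
A = z + rng.normal(size=(n, n_causes))            # observed causes, all confounded by z
beta = np.zeros(n_causes); beta[0] = 1.0          # only cause 0 truly affects the outcome
y = A @ beta + 2.0 * z[:, 0] + rng.normal(size=n)

# Step 1: factor model over the causes; its inferred factor is the substitute confounder.
fa = FactorAnalysis(n_components=1).fit(A)
z_hat = fa.transform(A)
# (In practice, the factor model should pass posterior predictive checks before being used.)

# Step 2: outcome models for the effect of cause 0.
naive = LinearRegression().fit(A[:, [0]], y)                         # ignores confounding
adjusted = LinearRegression().fit(np.hstack([A[:, [0]], z_hat]), y)  # adjusts for z_hat

print("true effect   : 1.0")
print("naive estimate:", round(naive.coef_[0], 3))       # biased upward by the confounder
print("deconfounded  :", round(adjusted.coef_[0], 3))    # much closer to the truth
```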
Friday, February 1, 2019
Isabela Albuquerque (INRS -Université du Québec)
Recording: https://bluejeans.com/s/h69jV
Slides: Not yet available
Recent literature has demonstrated promising results for training Generative Adversarial Networks by employing a set of discriminators, in contrast to the traditional game involving one generator against a single adversary. Such methods perform single-objective optimization on some simple consolidation of the losses, e.g. an average. In this work, we revisit the multiple-discriminator setting by framing the simultaneous minimization of losses provided by different models as a multi-objective optimization problem. Specifically, we evaluate the performance of multiple gradient descent and the hypervolume maximization algorithm on a number of different datasets. Moreover, we argue that the previously proposed methods and hypervolume maximization can all be seen as variations of multiple gradient descent in which the update direction can be computed efficiently. Our results indicate that hypervolume maximization presents a better compromise between sample quality and computational cost than previous methods.
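A minimal sketch of the loss consolidation under hypervolume maximization, as I understand it (illustrative, not the paper's code): each discriminator's generator loss is weighted by the inverse of its slack to a nadir point set just above the worst current loss, so the discriminators the generator is doing worst against dominate the update direction. The slack value and the toy scalar losses below are illustrative choices.

```python
import torch

def generator_loss_hypervolume(disc_losses, slack=0.1):
    """Consolidate per-discriminator generator losses via hypervolume maximization.

    Each loss l_k contributes -log(eta - l_k), so its gradient is weighted by
    1 / (eta - l_k), where the nadir point eta sits slightly above the worst loss."""
    losses = torch.stack(disc_losses)
    eta = losses.max().detach() + slack          # nadir point (no gradient through it)
    return -torch.log(eta - losses).sum()

def generator_loss_average(disc_losses):
    """Baseline consolidation: plain average of the per-discriminator losses."""
    return torch.stack(disc_losses).mean()

# Toy usage: three scalar "losses" standing in for the generator's loss against each discriminator.
losses = [torch.tensor(0.7, requires_grad=True),
          torch.tensor(1.2, requires_grad=True),
          torch.tensor(2.5, requires_grad=True)]
generator_loss_hypervolume(losses).backward()
print([l.grad.item() for l in losses])   # the hardest discriminator gets the largest weight
```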
Friday, January 25, 2019
Hugo Larochelle (Mila + Brain)
Recording: https://bluejeans.com/s/z1I9S/
Slides: Here
A lot of the recent progress on many AI tasks was enabled in part by the availability of large quantities of labeled data. Yet, humans are able to learn concepts from as few as a handful of examples. Meta-learning is a very promising framework for addressing the problem of generalizing from small amounts of data, known as few-shot learning. In meta-learning, our model is itself a learning algorithm: it takes as input a training set and outputs a classifier. For few-shot learning, it is (meta-)trained directly to produce classifiers with good generalization performance for problems with very little labeled data. In this talk, I'll present an overview of the recent research that has made exciting progress on this topic (including my own) and will discuss the challenges as well as research opportunities that remain.
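As a rough illustration of "a model that takes a training set as input and outputs a classifier" (a prototypical-network-style sketch, not the speaker's code), the snippet below runs one meta-training step on a synthetic few-shot episode: the support set is mapped to class prototypes, queries are classified by nearest prototype, and the embedding is updated on the query loss. The architecture, input size, and episode shape are illustrative choices.

```python
import torch
import torch.nn.functional as F

# Illustrative embedding network and optimizer (sizes are arbitrary).
embed = torch.nn.Sequential(torch.nn.Linear(784, 128), torch.nn.ReLU(), torch.nn.Linear(128, 64))
opt = torch.optim.Adam(embed.parameters(), lr=1e-3)

def episode_loss(support_x, support_y, query_x, query_y, n_way):
    """The 'learner' maps a small labeled support set to a classifier (class prototypes)
    and is meta-trained on that classifier's generalization loss over the query set."""
    z_s, z_q = embed(support_x), embed(query_x)
    prototypes = torch.stack([z_s[support_y == c].mean(0) for c in range(n_way)])
    logits = -torch.cdist(z_q, prototypes)        # nearest-prototype classifier
    return F.cross_entropy(logits, query_y)

# One meta-training step on a synthetic 5-way, 1-shot episode with 15 queries per class.
n_way, k_shot, n_query = 5, 1, 15
support_x = torch.randn(n_way * k_shot, 784)
support_y = torch.arange(n_way).repeat_interleave(k_shot)
query_x = torch.randn(n_way * n_query, 784)
query_y = torch.arange(n_way).repeat_interleave(n_query)

loss = episode_loss(support_x, support_y, query_x, query_y, n_way)
opt.zero_grad(); loss.backward(); opt.step()
```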
This talk will present BabyAI, a research project based at Mila with the long-term goal of creating agents that we can communicate with and teach new concepts to using natural language. We will begin by discussing why language learning is a hard problem, one for which deep learning doesn't yet have a satisfying answer. We will then introduce the BabyAI platform, created to study the sample efficiency of grounded language learning in the context of embodied agents and instruction following (see the paper accepted at ICLR19, https://openreview.net/forum?id=rJeXCo0cYX). Lastly, we will discuss multiple promising research directions we have identified with the goal of improving the sample efficiency of grounded language learning.