Friday, December 18, 2020

Stephan Zheng/Richard Socher

Recording: Here




The AI Economist: Improving Equality and Productivity with AI-Driven Tax Policies

Tackling real-world socio-economic challenges requires designing and testing economic policies. However, this is hard in practice, due to a lack of appropriate (micro-level) economic data and limited opportunity to experiment. In Zheng et al., 2020, we propose a two-level deep reinforcement learning approach to learn dynamic tax policies, based on principled economic simulations in which both agents and a social planner (government) learn and adapt. AI social planners can discover tax policies that improve the trade-off between equality and productivity by at least 16%, compared to the prominent Saez tax model, the US federal tax schedule, and the free market. The learned tax policies are qualitatively different from the baselines, and certain model instances are effective in human studies as well.

This talk will present three topics: 1) economic policy design in the context of multi-agent RL, 2) our two-level RL approach to economic policy design, and 3) open research problems towards an AI Economist for the real world. These include key methodological challenges in two-level RL and data-driven economic modeling, multi-agent RL, mechanism design, convergence guarantees, robustness, explainability, and others.
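
As a rough illustration of the two-level structure, the toy Python sketch below has inner-loop agents best-respond to a bracketed tax schedule while an outer-loop planner searches over marginal rates to improve an equality-times-productivity objective. This is not the implementation from Zheng et al., 2020; the agent model, brackets, and random-search update are hypothetical simplifications.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_brackets = 5, 3
skill = rng.uniform(0.5, 2.0, n_agents)     # heterogeneous agent skills (toy)
brackets = np.array([0.0, 1.0, 2.0])        # income-bracket cutoffs (toy units)
tax_rates = np.full(n_brackets, 0.3)        # planner's policy: marginal rates

def post_tax(income, rates):
    # piecewise-linear bracketed tax schedule
    widths = np.append(np.diff(brackets), np.inf)
    in_bracket = np.clip(income[:, None] - brackets, 0.0, widths)
    return income - (in_bracket * rates).sum(axis=1)

def equilibrium_incomes(rates):
    # inner level: each agent picks labour to maximize post-tax income minus effort cost
    labour = np.linspace(0.0, 3.0, 61)
    pre_tax = skill[:, None] * labour[None, :]
    utility = post_tax(pre_tax.ravel(), rates).reshape(n_agents, -1) - 0.5 * labour ** 2
    return skill * labour[utility.argmax(axis=1)]

def social_welfare(rates):
    # outer-level objective: equality (1 - Gini) times total productivity
    income = equilibrium_incomes(rates)
    gini = np.abs(income[:, None] - income[None, :]).mean() / (2 * income.mean() + 1e-8)
    return (1.0 - gini) * income.sum()

# outer level: crude random search over the tax schedule
for _ in range(200):
    proposal = np.clip(tax_rates + rng.normal(0.0, 0.05, n_brackets), 0.0, 1.0)
    if social_welfare(proposal) > social_welfare(tax_rates):
        tax_rates = proposal

print("learned marginal rates:", np.round(tax_rates, 2))
```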


Friday, November 27, 2020

Siva Reddy

Recording: Here




Pathological Behaviours of Large Neural Models of Language

Large neural models of language have achieved state-of-the-art results on many NLP tasks like Question Answering. Despite this progress, they are known to be surprisingly brittle to variations in their input. In this talk, we will systematically study the pathological behaviours of neural models in three scenarios: generalization to natural language sentence structure, generalization in reasoning strategies, and latent capture of societal biases.


Friday, November 20, 2020

Lucas Lehnert

Recording: Here




Encoding reusable knowledge in state representations

A key question in further scaling artificial intelligence is how reinforcement learning systems can re-use previously learned knowledge. While recent advances in deep reinforcement learning research have demonstrated how to build algorithms that maximize rewards in complex tasks and even outperform humans, these algorithms are not as adept as humans at flexibly transferring knowledge between different tasks. How knowledge reuse can be incorporated into reinforcement-learning algorithms is a central yet not well-understood problem. By viewing knowledge representations through the lens of representation learning, I will present an approach to address the question of which models allow an agent to re-use knowledge. Through a sequence of theoretical and empirical results, I will discuss different state representations and present connections to model-based reinforcement learning, model-free reinforcement learning, and successor features. Lastly, I will present different transfer learning simulations, demonstrating that representations that are predictive of future reward outcomes generalize across different tasks. These results suggest that the reinforcement learning objective changes in the context of transferring knowledge: instead of focusing only on maximizing reward in one task, learning a model detailed enough to predict future reward outcomes leads to encoding reusable task knowledge that allows accelerated learning on previously unseen tasks.
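
For readers unfamiliar with successor features, the short numpy sketch below (illustrative only, with made-up dynamics and features rather than anything from the talk) shows the property the abstract leans on: once the successor features of a policy are computed, values for any new reward expressible in the shared features can be read off without re-solving the task.

```python
import numpy as np

rng = np.random.default_rng(1)
n_states, n_features, gamma = 6, 4, 0.9

P_pi = rng.dirichlet(np.ones(n_states), size=n_states)   # policy's state-transition matrix
Phi = rng.normal(size=(n_states, n_features))            # state features phi(s)

# successor features satisfy Psi = Phi + gamma * P_pi @ Psi, solved here in closed form
Psi = np.linalg.solve(np.eye(n_states) - gamma * P_pi, Phi)

# transfer: two tasks share the features but differ in their reward weights w
for w in [rng.normal(size=n_features), rng.normal(size=n_features)]:
    V_direct = np.linalg.solve(np.eye(n_states) - gamma * P_pi, Phi @ w)
    V_from_sf = Psi @ w        # values on the new task, re-using Psi without re-planning
    assert np.allclose(V_direct, V_from_sf)

print("successor features recover values on new tasks without re-solving them")
```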


Friday, November 13, 2020

Guillaume Rabusseau

Recording: Here




Tensor networks for machine learning

In this talk, I will give an introduction to tensor networks and a very brief overview of three recent contributions from my group that aim to go beyond classical tensor decomposition models using the tensor network formalism.

Tensors are high-order generalizations of vectors and matrices. Similar to matrix factorization techniques, one of the goals of tensor decomposition techniques is to express a tensor as a product of small factors, thus reducing the number of parameters and potentially regularizing machine learning models. While linear algebra is ubiquitous and taught in most undergraduate curricula, tensor and multilinear algebra can be daunting. In the first part of the talk, I will try to give an easy and accessible introduction to tensor methods using the tensor network formalism. Tensor networks are an intuitive diagrammatic notation that allows one to easily reason about complex operations on high-order tensors.
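
To make the "product of small factors" idea concrete, here is a minimal numpy sketch (with illustrative shapes and ranks, not taken from the talk) of a tensor-train / matrix-product-state representation, contracted back into the full tensor with einsum:

```python
import numpy as np

d, r = 10, 3                                  # mode dimension and TT-rank (toy values)
cores = [np.random.randn(1, d, r),            # core 1: (left rank, dim, right rank)
         np.random.randn(r, d, r),            # core 2
         np.random.randn(r, d, r),            # core 3
         np.random.randn(r, d, 1)]            # core 4

# contract the chain of cores back into the full order-4 tensor
T = np.einsum('aib,bjc,ckd,dle->ijkl', *cores)

print("full tensor entries:", d ** 4)                       # 10,000 parameters
print("tensor-train entries:", sum(g.size for g in cores))  # 240 parameters
```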

In the second part of the talk, I will very briefly give an overview of three recent works from my group, ranging from tensorizing random projections to studying the VC dimension of tensor network models.

Friday, November 6, 2020

Irina Higgins

Recording: Here




Unsupervised deep learning identifies semantic disentanglement in single inferotemporal face patch neurons

What objective drives learning in the ventral visual stream? To answer this, we turn to face perception, often considered the "microcosm of object recognition". We model neural responses to faces in the macaque inferotemporal (IT) cortex with a deep self-supervised generative model, beta-VAE, which disentangles sensory data into interpretable latent factors, such as gender or age. Our results demonstrate a remarkable correspondence between the generative factors discovered by beta-VAE and those coded by single IT neurons, far beyond that found for the baselines, including the handcrafted "gold standard" model of face perception and deep classifiers. Moreover, beta-VAE was able to reconstruct novel face images using signals from just a handful of cells. Together, our results imply that optimising the disentangling objective yields representations that closely resemble those in IT at the single-unit level, suggesting disentangling as a plausible learning objective for the visual brain.
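
For context, the beta-VAE objective at the heart of this work is a standard VAE loss with the KL term up-weighted by beta > 1. A minimal PyTorch sketch of that loss (toy tensors standing in for the encoder and decoder, not the model used in the paper) looks like this:

```python
import torch
import torch.nn.functional as F

def beta_vae_loss(x, x_recon, mu, logvar, beta=4.0):
    # reconstruction term (batch mean); the Gaussian/Bernoulli choice is data-specific
    recon = F.mse_loss(x_recon, x, reduction="sum") / x.shape[0]
    # closed-form KL(q(z|x) || N(0, I)) for a diagonal Gaussian encoder
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / x.shape[0]
    # beta > 1 up-weights the KL term, encouraging disentangled latents
    return recon + beta * kl

# toy usage with random tensors standing in for encoder/decoder outputs
x = torch.rand(8, 3, 64, 64)
x_recon = torch.rand_like(x)
mu, logvar = torch.zeros(8, 10), torch.zeros(8, 10)
print(float(beta_vae_loss(x, x_recon, mu, logvar)))
```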

Friday, October 30, 2020

Siamak Ravanbakhsh

Recording: Not available




Compositionality of Symmetry in Deep Learning

A principled approach to modeling structured data is to consider all transformations that maintain structural relations. Using this perspective in deep learning leads to the design of models (such as ConvNet) that are invariant or equivariant to the symmetry transformations of the data. While equivariant deep learning has dealt with a range of simple structures so far, we have not explored the notion of compositionality in this symmetry-based approach. In this talk, I plan to explore various types of compositionality in recent works on symmetry-based model design and identify opportunities for compositional generalization.
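
As one concrete instance of this design principle, the small numpy sketch below builds a layer that is equivariant to permutations of set elements using only operations that commute with the symmetry. The construction is a standard Deep Sets-style layer, chosen here for illustration rather than taken from the talk.

```python
import numpy as np

def perm_equivariant_layer(X, W1, W2):
    # X: (n_elements, d_in); each element gets its own transform plus a shared
    # pooled term, so the output permutes exactly as the input does
    return X @ W1 + X.mean(axis=0, keepdims=True) @ W2

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
W1, W2 = rng.normal(size=(3, 4)), rng.normal(size=(3, 4))

perm = rng.permutation(5)
assert np.allclose(perm_equivariant_layer(X, W1, W2)[perm],
                   perm_equivariant_layer(X[perm], W1, W2))   # f(PX) == P f(X)
print("layer commutes with permutations of the set elements")
```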

Friday, October 16, 2020

Gal Chechik

Recording: here




A causal view of compositional zero-shot visual recognition

People easily recognize new visual categories that are new combinations of known components. This compositional generalization capacity is critical for learning in real-world domains like vision and language because the long tail of new combinations dominates the distribution. Unfortunately, learning systems struggle with compositional generalization because they often build on features that are correlated with class labels even if they are not "essential" for the class. This leads to consistent misclassification of samples from a new distribution, like new combinations of known components.

In this talk, I will describe our recent work on compositional generalization that builds on causal ideas. First, we describe compositional zero-shot learning from a causal perspective and propose to view zero-shot inference as answering "which intervention caused the image?". Second, we present a causally inspired embedding model that learns disentangled representations of the elementary components of visual objects from correlated (confounded) training data. We evaluate this approach on two datasets for predicting new combinations of attribute-object pairs: a well-controlled dataset of synthesized images and a real-world dataset of fine-grained types of shoes. We show improvements compared to strong baselines.
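
To illustrate the setup (not the authors' causal embedding model), a toy Python sketch of compositional zero-shot scoring might separately embed attributes and objects, compose them, and score an image feature against every attribute-object pair, including pairs never seen during training:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64
attributes = {"red": rng.normal(size=d), "sliced": rng.normal(size=d)}
objects = {"apple": rng.normal(size=d), "tomato": rng.normal(size=d)}

def compose(attr, obj):
    # simplest possible composition: sum of separately learned embeddings
    return attributes[attr] + objects[obj]

def predict(image_feature):
    # score the image against every attribute-object composition
    pairs = [(a, o) for a in attributes for o in objects]
    scores = {pair: float(image_feature @ compose(*pair)) for pair in pairs}
    return max(scores, key=scores.get)

# an unseen combination ("sliced tomato") can still be recognized if the
# image feature aligns with the composed embedding
image_feature = compose("sliced", "tomato") + 0.1 * rng.normal(size=d)
print(predict(image_feature))   # expected: ('sliced', 'tomato')
```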

Friday, October 9, 2020

Colin Raffel

Recording: here




Transfer Learning for NLP: T5 and Beyond

Transfer learning, where a model is pre-trained on a data-rich task before being fine-tuned on a downstream task of interest, has emerged as the dominant framework for tackling natural language processing (NLP) problems. In this talk, I'll give an introduction to transfer learning for NLP through the lens of our recent large-scale empirical study. To carry out this study, we introduced the "Text-to-Text Transfer Transformer" (T5), a pre-trained language model that casts every NLP problem as a text-to-text problem. After figuring out what works best, we "explored the limits" by scaling up our models to achieve state-of-the-art results on many standard NLP benchmarks. I will then present two follow-up works that provide more insight into what these models are capable of. In the first, we evaluate whether giant language models can answer open-domain questions without accessing an external knowledge source. To perform well on this task, a model must squirrel away vast amounts of knowledge in its parameters during pre-training. In the second, we test whether these models can generate plausible-sounding explanations of their predictions, which provides a crude form of interpretability. I'll provide pointers to our pre-trained models and code to facilitate future work.
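
To make the text-to-text framing concrete, here is a brief usage sketch assuming the Hugging Face `transformers` library and the publicly released "t5-small" checkpoint (exact model names and generation defaults are version-dependent): every task is expressed as text in, text out, selected by a task prefix in the input string.

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

prompts = [
    "translate English to German: The house is wonderful.",
    "summarize: Transfer learning has emerged as the dominant framework for NLP...",
]
for prompt in prompts:
    # the task prefix tells the same model which text-to-text task to perform
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(inputs.input_ids, max_length=40)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```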


Friday, September 18, 2020

Bharath Ramsundar

Recording: here




Open Sourcing Medicine Discovery with DeepChem

Traditionally, the process of discovering new medicines has been driven by proprietary techniques and algorithms. The advent of deep-learning-driven drug discovery over the last several years has started to change this state of affairs, with increasingly powerful suites of open datasets and algorithms becoming available to researchers. In this talk, I'll introduce the DeepChem project (https://deepchem.readthedocs.io/en/latest/), which seeks to create a powerful open suite of algorithms to enable scientists working on medicine discovery and, more broadly, on other scientific problems. I'll also review some of the core algorithmic techniques underlying molecular machine learning and other related areas of scientific deep learning, and say a bit about our efforts to build a diverse, decentralized open research community around DeepChem.
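
As a taste of the kind of workflow DeepChem aims to enable, here is a short sketch based on its documented MoleculeNet loaders and graph-convolution models; exact function names and defaults may differ between library versions.

```python
import deepchem as dc

# load a standard aqueous-solubility benchmark with graph featurization
tasks, datasets, transformers = dc.molnet.load_delaney(featurizer="GraphConv")
train, valid, test = datasets

# train a graph-convolutional regressor on the molecular graphs
model = dc.models.GraphConvModel(n_tasks=len(tasks), mode="regression")
model.fit(train, nb_epoch=10)

# evaluate with a standard regression metric
metric = dc.metrics.Metric(dc.metrics.pearson_r2_score)
print(model.evaluate(test, [metric], transformers))
```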