Invited Speakers


Afra Alishahi
Tilburg University


Title: Grounded language learning, from sounds and images to meaning

Abstract

Humans learn to understand speech from weak and noisy supervision: they manage to extract structure and meaning from speech simply by being exposed to utterances situated and grounded in their daily sensory experience. Emulating this remarkable skill has been the goal of numerous studies; however, researchers have often used severely simplified settings where either the language input or the extralinguistic sensory input, or both, are small-scale and symbolically represented. I present a series of studies on modelling visually grounded language understanding. Using variations of recurrent neural networks to model the temporal nature of spoken language, we examine how form- and meaning-based linguistic knowledge emerges from the input signal.
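To make the setup concrete, the sketch below shows one common form of visually grounded learning in PyTorch: a recurrent encoder maps the speech signal (here, MFCC frames) and a linear layer maps precomputed image features into a shared embedding space, trained with a margin-based contrastive loss so that matching utterance/image pairs outscore mismatched ones. This is an illustrative reconstruction under assumed names and dimensions (SpeechEncoder, ImageEncoder, pooled CNN features), not the speaker's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpeechEncoder(nn.Module):
    """Encodes an utterance given as a sequence of MFCC frames."""
    def __init__(self, n_mfcc=13, hidden=512, embed=512):
        super().__init__()
        self.rnn = nn.GRU(n_mfcc, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, embed)

    def forward(self, mfcc):                  # mfcc: (batch, frames, n_mfcc)
        _, h = self.rnn(mfcc)                 # final hidden state summarizes the utterance
        return F.normalize(self.proj(h[-1]), dim=-1)

class ImageEncoder(nn.Module):
    """Projects precomputed (e.g. pooled CNN) image features."""
    def __init__(self, feat=2048, embed=512):
        super().__init__()
        self.proj = nn.Linear(feat, embed)

    def forward(self, feats):                 # feats: (batch, feat)
        return F.normalize(self.proj(feats), dim=-1)

def contrastive_loss(s, v, margin=0.2):
    """Matching pairs must outscore all mismatched pairs in the batch by a margin."""
    sims = s @ v.t()                          # cosine similarities, (batch, batch)
    pos = sims.diag().unsqueeze(1)            # matching pairs sit on the diagonal
    off_diag = 1 - torch.eye(len(s))
    cost_s = (margin + sims - pos).clamp(min=0) * off_diag      # utterance anchors
    cost_v = (margin + sims - pos.t()).clamp(min=0) * off_diag  # image anchors
    return (cost_s + cost_v).mean()

# Toy usage with random stand-in data for 8 utterance/image pairs:
speech_enc, image_enc = SpeechEncoder(), ImageEncoder()
utterances = torch.randn(8, 200, 13)          # 200 MFCC frames per utterance
images = torch.randn(8, 2048)                 # pooled image feature vectors
loss = contrastive_loss(speech_enc(utterances), image_enc(images))
loss.backward()
```

In a model of this kind, the shared space supports retrieving images from spoken descriptions (and vice versa), and the recurrent encoder's intermediate representations can then be probed for the emergent form and meaning distinctions the abstract describes.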

Bio

Afra Alishahi is an Associate Professor of Cognitive Science and Artificial Intelligence at Tilburg University, the Netherlands. Her main research interests are computational modeling of human language acquisition, studying the emergence of linguistic form and function in grounded models of language learning, and developing tools and techniques for analyzing linguistic representations in neural models of language. She has received a number of research grants, including an NWO Aspasia grant, an NWO Natural Artificial Intelligence grant, and an e-Science Center/NWO grant. She has also received several best paper awards at Computational Linguistics and Cognitive Science venues.


Luke Zettlemoyer
University of Washington and Facebook


Title: De-noising Sequence-to-Sequence Pre-training

Abstract

De-noising auto-encoders can be pre-trained at a very large scale by noising and then reconstructing any input text. Existing methods, based on variations of masked language models, have transformed the field and now provide the de facto initialization to be tuned for nearly every task. In this talk, I will present our work on sequence-to-sequence pre-training that introduces and carefully measures the impact of two new types of noising strategies. I will first describe an approach that allows arbitrary noising, by learning to translate any corrupted text back to the original with standard Transformer-based neural machine translation architectures. I will show that the resulting monolingual (BART) and multilingual (mBART) models provide effective initialization for learning a wide range of discrimination and generation tasks, including question answering, summarization, and machine translation. I will also present our recently introduced MARGE model, where we self-supervise the reconstruction of target text by retrieving a set of related texts (in many languages) and conditioning on them to maximize the likelihood of generating the original. The objective noisily captures aspects of paraphrase, translation, multi-document summarization, and information retrieval, allowing for strong zero-shot performance with no fine-tuning, as well as consistent performance gains when fine-tuned for individual tasks. Together, these techniques provide the most comprehensive set of pre-training methods to date, as well as the first viable alternative to the dominant masked language modeling pre-training paradigm.

This joint work was primarily led by Mike Lewis, Yinhan Liu, and Jiatao Gu.
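As a rough illustration of the denoising objective described above, here is a minimal PyTorch sketch of BART-style text infilling: a random span of tokens is replaced by a single mask token, and a small encoder-decoder Transformer is trained with teacher forcing to reconstruct the original sequence. All names, sizes, and the noising scheme are toy assumptions for exposition, not the released implementation.

```python
import random
import torch
import torch.nn as nn

VOCAB, PAD, BOS, MASK = 1000, 0, 1, 2        # toy vocabulary and special-token ids

def infill(tokens, max_span=5):
    """BART-style text infilling: replace one random span with a single MASK."""
    span = random.randint(1, min(max_span, len(tokens)))
    start = random.randint(0, len(tokens) - span)
    return tokens[:start] + [MASK] + tokens[start + span:]

def pad(seq, length):
    return seq + [PAD] * (length - len(seq))

class DenoisingSeq2Seq(nn.Module):
    """A tiny encoder-decoder Transformer; positional encodings omitted for brevity."""
    def __init__(self, d=128):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, d)
        self.transformer = nn.Transformer(d_model=d, nhead=4,
                                          num_encoder_layers=2,
                                          num_decoder_layers=2,
                                          batch_first=True)
        self.out = nn.Linear(d, VOCAB)

    def forward(self, src, tgt_in):
        # Causal mask keeps the decoder autoregressive during teacher forcing.
        causal = self.transformer.generate_square_subsequent_mask(tgt_in.size(1))
        h = self.transformer(self.embed(src), self.embed(tgt_in), tgt_mask=causal)
        return self.out(h)

# One toy training step: reconstruct clean sequences from their corruptions.
clean = [torch.randint(3, VOCAB, (12,)).tolist() for _ in range(4)]
src = torch.tensor([pad(infill(s), 12) for s in clean])      # corrupted inputs
tgt_in = torch.tensor([[BOS] + s[:-1] for s in clean])       # shifted-right decoder input
labels = torch.tensor(clean)

model = DenoisingSeq2Seq()
logits = model(src, tgt_in)                                  # (batch, seq, vocab)
loss = nn.functional.cross_entropy(logits.reshape(-1, VOCAB), labels.reshape(-1))
loss.backward()
```

In these terms, mBART applies the same reconstruction objective across multilingual text, while MARGE replaces the synthetic corruption with retrieved related documents that the model conditions on to regenerate the original.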

Bio

Luke Zettlemoyer is a Professor in the Paul G. Allen School of Computer Science & Engineering at the University of Washington, and a Research Scientist at Facebook. His research focuses on empirical methods for natural language semantics, and involves designing machine learning algorithms, introducing new tasks and datasets, and, most recently, studying how to best develop self-supervision signals for pre-training. Honors include multiple paper awards, a PECASE award, and an Allen Distinguished Investigator Award. Luke received his PhD from MIT and was a postdoc at the University of Edinburgh.