Towards Commonsense Reasoning in Natural Language Generation
Speaker: Lianhui Qin
Abstract: Today's NLP systems have achieved remarkable advances in understanding text passages and generating plausible language, as exemplified by massive pre-trained language models such as GPT-3. However, those models still fall short of contextual reasoning and generation that robustly obey commonsense. Such reasoning capability is crucial for common human cognitive activities, such as explaining situations from incomplete observations (abduction), imagining the causal change in future events given a change in the current situation (counterfactual reasoning), and reasoning over rich temporal concepts in complex contexts. In this talk, I'll present our work on understanding and enhancing these reasoning and generation capabilities of pretrained LMs. I'll first formulate the counterfactual reasoning problem in language generation and build the first large-scale testbed for measuring LMs on this problem. I'll then present new language decoding methods that tackle the limitations imposed by the left-to-right generation nature of pretrained LMs, allowing them to perform the rich reasoning activities above by incorporating arbitrary constraints.
Bio: Lianhui Qin is a Ph.D. student at the University of Washington working with Yejin Choi. Her research interests lie in natural language processing and machine learning, especially commonsense reasoning in text and conversation generation. She is a recipient of the 2021 Microsoft Research PhD Fellowship.
PAST SEMINARS
Summer 2021
June 1, 2021
Interpretability in NLP with Differentiable Masking
Speaker: Nicola De Cao
Abstract: Attribution methods assess the contribution of inputs to the model prediction. One way to do so is erasure: a subset of inputs is considered irrelevant if it can be removed without affecting the prediction. Though conceptually simple, erasure's objective is intractable, and approximate search remains expensive with modern deep NLP models. Erasure is also susceptible to hindsight bias: the fact that an input can be dropped does not mean that the model "knows" it can be dropped. The resulting pruning is over-aggressive and does not reflect how the model arrives at the prediction. To deal with these challenges, we introduce Differentiable Masking (DiffMask). DiffMask learns to mask out subsets of the input while maintaining differentiability.
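For readers who want a concrete picture of the idea, here is a minimal Python sketch of learning such a differentiable mask. It is an illustration rather than the authors' implementation: the probe architecture, the plain sigmoid gates (the paper uses stochastic Hard Concrete gates), the loss weighting, and the assumed HuggingFace-style model interface (inputs_embeds, .logits) are all simplifying assumptions.

    # Minimal sketch of a DiffMask-style differentiable input mask (illustrative only).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MaskProbe(nn.Module):
        """Predicts a per-token gate in [0, 1] from the token embeddings."""
        def __init__(self, hidden_size):
            super().__init__()
            self.scorer = nn.Linear(hidden_size, 1)

        def forward(self, embeddings):
            # Sigmoid gates are a simplification of the Hard Concrete gates used in the paper.
            return torch.sigmoid(self.scorer(embeddings)).squeeze(-1)

    def diffmask_loss(model, embeddings, probe, sparsity_weight=1.0):
        # Assumes a HuggingFace-style model that accepts inputs_embeds and returns .logits.
        with torch.no_grad():
            original_logits = model(inputs_embeds=embeddings).logits
        gates = probe(embeddings)                      # (batch, seq_len)
        masked = embeddings * gates.unsqueeze(-1)      # zero out "irrelevant" tokens
        masked_logits = model(inputs_embeds=masked).logits
        # Keep the prediction unchanged while pushing as many gates as possible towards zero.
        faithfulness = F.kl_div(F.log_softmax(masked_logits, -1),
                                F.softmax(original_logits, -1), reduction="batchmean")
        sparsity = gates.mean()
        return faithfulness + sparsity_weight * sparsity

Only the small probe is trained; the underlying model stays frozen, which is what distinguishes this attribution setup from ordinary pruning.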
Bio: Nicola De Cao is a third-year Ph.D. candidate at the Institute for Logic, Language and Computation (ILLC) at the University of Amsterdam and a permanent visitor at the School of Informatics at the University of Edinburgh. He is part of the EdinburghNLP group. He previously interned at Facebook AI Research (FAIR) in London and at Amazon Research in Berlin. Under the supervision of Prof. Ivan Titov, his work focuses on Machine Reading Comprehension, also known as Question Answering. More generally, he is interested in applying (semi-)supervised and unsupervised deep neural networks in combination with reasoning and reinforcement learning methods to approach Natural Language Understanding.
June 15, 2021
Advances in Text Generation and the Perils of its Automatic Evaluation
Speaker: Kalpesh Krishna
Abstract: Recent advances in large-scale language modeling have significantly improved the capability of natural language generation (NLG) systems, opening up several new applications. Unfortunately, evaluating NLG systems remains challenging, making it hard to measure meaningful progress. In this talk I will present our recent efforts in building and evaluating NLG systems for (1) unsupervised sentence-level style transfer and (2) paragraph-length abstractive question answering with the ELI5 dataset. We build NLG systems (using large language models with paraphrase generation and retrieval, respectively) that significantly outperform the prior state of the art under "standard" automatic metrics. Unfortunately, we discover several issues with the current evaluation setups, including trivial baselines (like input copying) that can game these standard metrics, even outperforming real systems. Along the way I will discuss our efforts towards rectifying these issues, and conclude with a brief mention of other projects working towards more robust NLG evaluation.
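As a toy illustration of the metric-gaming issue mentioned above, the sketch below scores an input-copying "system" with a simple lexical-overlap metric (a unigram F1 written out for self-containedness, not the specific metrics studied in the talk); the copied question can score competitively despite answering nothing.

    # Illustrative sketch (not from the talk) of how a trivial input-copying baseline
    # can score well under a lexical-overlap metric.
    from collections import Counter

    def unigram_f1(hypothesis, reference):
        hyp, ref = hypothesis.lower().split(), reference.lower().split()
        overlap = sum((Counter(hyp) & Counter(ref)).values())
        if overlap == 0:
            return 0.0
        precision, recall = overlap / len(hyp), overlap / len(ref)
        return 2 * precision * recall / (precision + recall)

    question = "why does the sky appear blue during the day"
    reference = "the sky appears blue because air molecules scatter blue light more than red light"
    copy_baseline = question            # "generate" an answer by copying the input
    model_answer = "shorter wavelengths are scattered more strongly by the atmosphere"

    print(unigram_f1(copy_baseline, reference))   # sizeable overlap despite answering nothing
    print(unigram_f1(model_answer, reference))    # a reasonable answer may overlap less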
Bio: Kalpesh Krishna is a third-year PhD student at UMass Amherst, advised by Prof. Mohit Iyyer. He is primarily interested in natural language generation and the security of NLP systems. Before coming to UMass, he completed a bachelor's degree at IIT Bombay, advised by Prof. Preethi Jyothi. He has also spent time interning at Google, TTI-Chicago, and Mozilla. His research is supported by a Google PhD Fellowship awarded in 2021.
June 29, 2021
What Can We Learn From Vulnerabilities of NLP Models?
Speaker: Eric Wallace
Abstract: Today’s neural NLP models achieve high accuracy on in-distribution data and are being widely deployed in production systems. This talk will discuss attacks on such models that not only expose worrisome security & privacy vulnerabilities, but also provide new perspectives into how and why neural models work. Concretely, I will show how realistic adversaries can extract secret training data, steal model weights, and manipulate test predictions, all using black-box access to models at either training- or test-time. These attacks will reveal different insights, including how NLP models rely on dataset biases and spurious correlations, and how their training dynamics impact memorization of examples. Finally, I will discuss defenses against these vulnerabilities and suggest practical takeaways for developing secure NLP systems.
Bio: Eric Wallace is a 2nd year PhD student at UC Berkeley advised by Dan Klein and Dawn Song. His research interests are in making NLP models more secure, private, and robust. Eric's work received the best demo award at EMNLP 2019.
July 13, 2021
Robust Reading Comprehension
Speaker: Dheeru Dua
Abstract: With large-scale reading comprehension datasets and complex model architectures, question answering seems like a solved task. However, recent works have shown the brittleness of these models to even very conservative changes outside the provided data distribution, due to spurious correlations in the data that prevent the models from characterizing the task well. In this talk, we will discuss three ways to alleviate the bias that models learn from confounding artifacts in the data and, hence, to learn more robust question answering models. First, we show how intermediate annotations can help reduce bias in compositional reading comprehension. Second, we will discuss how discriminative QA models exploit spurious correlations in the data, whereas generative models are more robust as they simulate the data generation process in its entirety. Finally, we will talk about the problems with the IID (independently and identically distributed) assumption when training with data containing strong language priors, and how to train models under a non-IID setup.
Bio: Dheeru Dua is a PhD candidate at the University of California, Irvine. Her research focuses on making reading comprehension models more robust to confounding artifacts in question answering datasets. She has been a Hasso Plattner Institut fellow since April 2020. Prior to her PhD, she received her Master's degree from the Language Technologies Institute at Carnegie Mellon University. Dheeru has held research positions at IBM, Amazon, and Facebook.
July 27, 2021
Towards Robust and Interpretable Machine Reasoning over Text
Speaker: Mor Geva
Abstract: Large pre-trained language models (PTLMs) are at the core of natural language understanding systems, but they fail to learn many reasoning skills when trained with standard language modeling objectives. Numerous benchmarks have been created to teach and evaluate target reasoning abilities. However, collecting data is challenging and expensive, and fine-tuning on existing datasets often leads to unexpected model behaviour and poor generalization beyond the training distribution. Moreover, because these models are usually trained in an end-to-end fashion, there is little interpretability of their inner workings, making it challenging to understand their predictions. In this talk, I will show how structured representations and multi-task training can be used to tackle these fundamental challenges. First, I will describe a method that uses synthetic data generated from a grammar to endow PTLMs with numerical reasoning skills. Next, I will show how question decomposition representations can be used to automatically transform the semantics of questions, and demonstrate the utility of this method for assessing the reasoning abilities of reading comprehension models. Last, I will discuss an emergent-behaviour phenomenon in multi-task PTLMs, and show how it can be harnessed for interpretability and generalization of reading comprehension skills.
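To make the first idea concrete, here is a toy sketch of generating synthetic numerical-reasoning data from a small set of templates; the templates, number ranges, and training recipe are hypothetical stand-ins, not the grammar used in the actual work.

    # Toy sketch of template-based synthetic data for numerical reasoning (hypothetical).
    import random

    TEMPLATES = [
        ("How much is {a} plus {b}?", lambda a, b: a + b),
        ("How much is {a} minus {b}?", lambda a, b: a - b),
        ("What is the larger number, {a} or {b}?", lambda a, b: max(a, b)),
    ]

    def sample_example(rng):
        template, solve = rng.choice(TEMPLATES)
        a, b = rng.randint(0, 999), rng.randint(0, 999)
        return {"question": template.format(a=a, b=b), "answer": str(solve(a, b))}

    rng = random.Random(0)
    synthetic_data = [sample_example(rng) for _ in range(100_000)]
    # Such question-answer pairs could then be used for an intermediate training stage
    # before fine-tuning the PTLM on the target reading-comprehension dataset.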
Bio: Mor Geva is a PhD candidate (direct track) at Tel Aviv University and a research intern at the Allen Institute for AI, advised by Prof. Jonathan Berant. Her research focuses on developing systems that can reason over text in a robust and interpretable manner. During her PhD, Mor interned at Google AI and Microsoft Media AI. She was recently awarded the Dan David prize for graduate students in the field of AI and the Deutsch Prize for excellence in PhD studies.
August 10, 2021
Language Model Evaluation Beyond Perplexity
Speaker: Clara Meister
Abstract: In this talk, I will present our recent work that proposes an alternate approach to quantifying how well language models learn natural language. Specifically, we ask how well language models learn the statistical tendencies of natural language. To answer this question, we analyze whether text generated from language models exhibits the statistical tendencies present in the human-generated text on which they were trained. We find that neural language models appear to learn only a subset of the tendencies considered, but align much more closely with empirical trends than with proposed theoretical distributions (when present). Further, the fit to different distributions is highly dependent on both model architecture and generation strategy. As concrete examples, text generated under the nucleus sampling scheme adheres more closely to the type–token relationship of natural language than text produced using standard ancestral sampling, and text from LSTMs reflects the natural-language distributions over length, stopwords, and symbols surprisingly well.
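For readers unfamiliar with the two generation strategies compared above, here is a minimal sketch of ancestral sampling versus nucleus (top-p) sampling for a single next-token distribution; the vocabulary size, probabilities, and threshold are toy values, not those from the study.

    # Minimal sketch contrasting ancestral sampling with nucleus (top-p) sampling.
    import numpy as np

    def ancestral_sample(probs, rng):
        # Sample directly from the model's full next-token distribution.
        return rng.choice(len(probs), p=probs)

    def nucleus_sample(probs, p, rng):
        # Keep the smallest set of tokens whose cumulative probability reaches p,
        # renormalise, and sample only from that "nucleus".
        order = np.argsort(probs)[::-1]
        cumulative = np.cumsum(probs[order])
        cutoff = np.searchsorted(cumulative, p) + 1
        nucleus = order[:cutoff]
        renormalised = probs[nucleus] / probs[nucleus].sum()
        return rng.choice(nucleus, p=renormalised)

    rng = np.random.default_rng(0)
    probs = np.array([0.40, 0.25, 0.15, 0.10, 0.05, 0.03, 0.02])
    print(ancestral_sample(probs, rng))        # any of the 7 tokens can appear
    print(nucleus_sample(probs, 0.9, rng))     # the low-probability tail is truncated

Because nucleus sampling discards the low-probability tail at each step, the texts it generates tend to have different type–token statistics than texts produced by sampling from the full distribution, which is one of the contrasts the talk examines.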
Bio: Clara is a second-year PhD student in Computer Science working with Professor Ryan Cotterell at ETH Zürich. She received her Master's and Bachelor's degrees in Computational and Mathematical Engineering from Stanford University. Her research interests include decoding methods for language generators, analysis techniques for language models, and the general application of statistical methods to NLP.