GT NLP Seminar


GT NLP Seminar is an interactive talk series held bi-weekly on Fridays from 12:00 pm to 1:00 pm, where Georgia Tech students, faculty, and staff interested in Natural Language Processing meet, have lunch, and listen to talks about recent NLP research across a wide range of topics. Our speakers come from both inside and outside Georgia Tech and usually give a 45-minute talk, followed by a 15-minute Q&A/discussion session. Currently, GT NLP Seminar is held remotely through BlueJeans.

All are welcome!

If you are interested in joining the GT NLP Seminar, please be sure to subscribe to the mailing list. Future emails about the seminars go to the mailing list only. To subscribe, you can email Amal Alabdulkarim or William Held.

Subscribe to our Fall 2021 Calendar!

Schedule for Fall 2021

The early vision of AI included the goal of endowing intelligent systems with human-like language processing capabilities. This proved harder than expected, leading the vast majority of natural language processing practitioners to pursue less ambitious, shorter-term goals. Whereas the utility of human-like language processing is unquestionable, its feasibility is quite justifiably questioned. In this talk, we will not only argue that some approximation of human-like language processing is possible, but also present a program of R&D that is working on making it a reality. This vision, as well as progress to date, is described in the book Linguistics for the Age of AI (MIT Press, 2021).

  • 09/17/2021 Micha Elsner In Search of Abstract Morphological Structure

In many languages, grammatical distinctions are signaled by different morphological inflections (for example, the plural of cat is cats). Computational linguists have become adept at predicting the inflectional forms of unseen words using sequence-to-sequence models. When data is plentiful, models presumably do this by generalizing across similar words (from training sample dog ~ dogs to test sample cat ~ ?). But models remain relatively successful when there is not much data (McCarthy et al., 2019), which suggests that generalization can also occur across dissimilar words, across non-target word forms, and even across languages. The predictive relationships that allow such generalization to succeed are called abstract morphological structures. These have been a frequent subject of linguistic inquiry, but it is not yet clear how they can be automatically described and quantified, or how to encourage computer models to use them more effectively to make predictions. I will discuss several related efforts to discover and use abstract morphological structures within a computational framework. I will begin with the discovery problem of clustering related word forms without any gold-standard grammatical labels at all, and the pitfalls inherent in this still-unsolved problem. I will then discuss a case study of the cross-cutting generalizations inherent in the Spanish verb system, and why such structures make the problem difficult. Finally, I will show how neural network models that solve word-to-word analogy problems may offer some advantages over the standard sequence-to-sequence framework. (These papers were co-authored with Alex Erdmann, Grace LeFevre, Andrea Sims, and others.)

  • 10/01/2021 Yejin Choi Knowledge is Power: Symbolic Knowledge Distillation, Commonsense Morality, and Multimodal Script Knowledge

Scale appears to be the winning recipe on today's leaderboards. And yet, extreme-scale neural models are still brittle and make errors that are often nonsensical or even counterintuitive. In this talk, I will argue for the importance of knowledge, and demonstrate how smaller models developed in academia can still have an edge over larger industry-scale models if powered with knowledge.

First, I will introduce "symbolic knowledge distillation", a new framework to distill larger neural language models into smaller commonsense models, which leads to a machine-authored KB that, for the first time, surpasses a human-authored KB on all criteria: scale, accuracy, and diversity. Next, I will introduce a new conceptual framework for language-based commonsense moral reasoning, and discuss how we can teach neural language models about complex social norms and human values, so that the machine can reason that “helping a friend” is generally a good thing to do, but “helping a friend spread fake news” is not. Finally, I will discuss an approach to multimodal script knowledge, which leads to new state-of-the-art performance on a dozen leaderboards that require grounded commonsense reasoning.

Neural network language models have had great success in learning language processing solutions by encoding language statistics. These solutions have been shown to produce good approximations of human behavior in many situations (e.g., predicting that a particular construction should be considered less acceptable than another). However, these solutions are also very sample-inefficient, and they are brittle outside their training domains. This talk will highlight a number of aspects of human language processing that are unlikely to be learnable from language modeling statistics, precisely because the domains of language to which we have access during training are distinct from the domains in which we would like NLP models to operate. I will provide some background from psycholinguistics to discuss different ways in which language models are likely to be inherently inadequate for modeling human language processing. This framing may be helpful when analyzing, designing, and fine-tuning models in order to achieve human-like language processing.

  • 10/22/2021 Sam Bowman When Combating Hype, Proceed with Caution

In an effort to avoid reinforcing widespread hype about the capabilities of state-of-the-art language technology, researchers have developed practices in framing and citation that serve to de-emphasize the field’s successes. Though well-meaning, these practices often yield misleading or even false claims about the limits of our best technology. This is a problem, and it may be more serious than it looks: it limits our ability to mitigate short-term harms from NLP deployments, and it limits our ability to prepare for the potentially enormous impacts of more distant future advances. This talk urges researchers to be careful about these claims and suggests some research directions and communication strategies that will make it easier to avoid or rebut them.

Schedule for Spring 2021

  • 1/29/2021 Gabriel Stanovsky NLP in the Wild: From Ancient Akkadian to Biochemistry Protocols

I’ll present two recent projects showing the range of domains I’d like to tackle in my work to help experts with diverse real-world research questions. First, I’ll present a model capable of filling in missing parts of ancient cuneiform tablets written thousands of years ago in now-extinct languages (Akkadian and Sumerian). Due to deterioration over time, these excavated tablets are often broken, faded, or cracked, making it hard for historians and archaeologists to read and interpret them. We show that by leveraging large-scale language models pretrained on modern texts, we achieve good results in restoring missing parts across various domains and time periods, in both automatic evaluation and human analysis. Second, I will discuss a novel document-level representation of wet-lab biochemistry protocols geared towards experiment automation and reproducibility, addressing challenges such as cross-sentence relations, long-range coreference, grounding, and implicit arguments. I’ll show examples from a manually annotated corpus of complex lab protocols, and present graph-prediction models that form the first step towards fully executable lab protocols.

Artificial Intelligence has made unprecedented progress in the past decade. However, a large gap still remains between the decision-making capabilities of humans and machines. In this talk, I will investigate two factors that help explain why. First, I will discuss the presence of undesirable biases in datasets, which ultimately hurt generalization. I will then present bias mitigation algorithms that boost the ability of AI models to generalize to unseen data. Second, I will explore task-specific prior knowledge, which aids robust generalization but is often ignored when training modern AI architectures. Throughout this discussion, I will focus on language applications, and will show how certain underlying structures can provide useful biases for inferring meaning in natural language. I will conclude with a discussion of how the broader framework of dataset and model biases will play a critical role in the societal impact of AI going forward.

  • 2/26/2021 Mohit Iyyer Challenges in evaluating natural language generation systems

Recent advances in neural language modeling have opened up a variety of exciting new text generation applications. However, evaluating systems built for these tasks remains difficult. Most prior work relies on a combination of automatic metrics such as BLEU (which are often uninformative) and crowdsourced human evaluation (which is also usually uninformative, especially when conducted without careful task design). In this talk, I focus on two specific applications: (1) unsupervised sentence-level style transfer and (2) long-form question answering. I will go over our recent work on building models for these tasks and then describe the ensuing struggles to properly compare them to baselines. In both cases, we identify (and propose solutions for) issues with existing evaluations, including improper aggregation of multiple metrics, missing control experiments with simple baselines, and high cognitive load placed on human evaluators. I'll conclude by briefly discussing our work on machine-in-the-loop text generation systems, in which both humans and machines participate in the generation process, and where reliable human evaluation becomes much more feasible.

  • 3/12/2021 Sameer Singh Evaluating and Testing Natural Language Processing Models

Current evaluation of natural language processing (NLP) systems, and of much of machine learning, primarily consists of measuring accuracy on held-out instances of the dataset. Since the held-out instances are often gathered using an annotation process similar to that of the training data, they include the same biases that act as shortcuts for machine learning models, allowing them to achieve accurate results without requiring actual natural language understanding. Thus, held-out accuracy is often a poor proxy for measuring generalization, and, further, aggregate metrics have little to say about where the problem may lie.

In this talk, I will introduce a number of approaches we are investigating to perform a more thorough evaluation of NLP systems. I will first provide an overview of automated techniques for perturbing instances in the dataset that identify loopholes and shortcuts in NLP models, including semantic adversaries and universal triggers. I will then describe recent work in creating comprehensive and thorough tests and evaluation benchmarks for NLP that aim to directly evaluate comprehension and understanding capabilities. The talk will cover a number of NLP tasks, including sentiment analysis, textual entailment, paraphrase detection, and question answering.

With large-scale pre-trained models, natural language processing as a field has made giant leaps in a wide range of tasks. But how are we doing on tasks that require a deeper understanding of discourse pragmatics, the tasks we humans use language to accomplish on a daily basis? We discuss a case study of advice giving in online forums, and reveal rich discourse strategies in the language of advice. Understanding advice would equip systems with a better grasp of language pragmatics, yet we show that advice identification is challenging for modern NLP models. So how do people comprehend text at the discourse level? We tackle this via a novel question generation paradigm, capturing questions elicited from readers as they read through a text sentence by sentence. Because these questions are generated while the readers are processing the information, they are naturally inquisitive, with a variety of types such as causal, elaboration, and background. Finally, we briefly showcase a new task that requires high-level inferences when the target audience of a document changes: providing elaborations and explanations during text simplification.

  • 4/9/2021 Yulia Tsvetkov Proactive NLP: How to Prevent Social and Ethical Problems in NLP Systems?

Much NLP literature has examined social biases in datasets, algorithms, or model performance, and the negative pipeline between them: models absorb and amplify data biases, which causes representational harms and impacts performance. In this talk, I will present studies that look further up the pipeline and rely on the assumption that biases in data originate in human cognition. I will discuss several lightly supervised, interpretable approaches, grounded in social psychology and causal reasoning, to detect implicit social bias in written discourse and narrative text. Together, these approaches aim to provide people-centered text analytics that proactively pinpoint and explain potentially biased framings across languages, data domains, and social contexts, before these biased framings propagate into downstream AI systems.

  • 4/16/2021 Lu Wang Building Controllable and Efficient Natural Language Generation Systems

Large pre-trained language models have enabled rapid progress in natural language generation (NLG). However, existing NLG systems still largely lack control over the content to be generated, and thus suffer from incoherence and unfaithfulness. In this talk, I will first introduce a neural generation framework, built upon large models, that separately tackles the challenges of content planning and surface realization. Experimental results show that the model is more effective across various tasks: constructing persuasive arguments, writing opinion articles, and generating news stories. It alleviates the tendency of existing models to produce bland and incorrect text, which results from a lack of global planning. I then discuss how to extend the model to conduct dynamic content planning with mixed language models. Finally, I present our recent work on long document summarization, where efficient attention mechanisms are designed to handle more than 10k tokens, whereas prior work could only process hundreds of words.

Schedule for Fall 2020

  • 9/11/2020 Lara Martin Dungeons and Discourse: Using Computational Storytelling to Look at Natural Language Use

Although we are currently riding a technological wave of personal assistants, many of these agents still struggle to communicate appropriately. Humans are natural storytellers, so it would be fitting if artificial intelligence could tell stories as well. Automated story generation is an area of AI research that aims to create agents that tell “good” stories. Previous story generation systems use planning and discrete symbols to create new stories, but these systems require a vast amount of knowledge engineering. The stories created by these systems are coherent, but only a finite set of stories can be generated. In contrast, very large neural language models, such as transformers, have made headlines in the natural language processing community. Though impressive on the surface, these models begin to lose coherence over time. My research looks at various techniques for automated story generation, culminating in a blend of symbolic and neural approaches. In this talk, I show how a neuro-symbolic model can produce more interesting and coherent stories than solely neural or symbolic systems.

  • 9/25/2020 Ian Stewart Through the looking glass: what NLP can reveal about sociolinguistic variation

People adapt their language use in different social contexts to meet communicative needs: a person may use the word <going> with colleagues and <goin'> with their close friends. Sociolinguistics researchers investigate the systematic variation in language use across different contexts to determine the social meaning of variation, such as how people change their word choices for different audiences. While traditional sociolinguistics investigates variation in spoken language, computational sociolinguistics relies on natural language processing and statistical methods to investigate written language in online discussions. This talk will explore how NLP can help isolate sociolinguistic phenomena that would otherwise go understudied in spoken contexts, and more broadly how NLP can help social science research.

  • 10/9/2020 Maarten Sap Reasoning about Social Dynamics and Social Bias in Language

Humans easily make inferences to reason about the social and power dynamics of situations (e.g., stories about everyday interactions), but such reasoning is still a challenge for modern NLP systems. In this talk, I will address how we can make machines reason about social commonsense and social biases in text, and how this reasoning could be applied in downstream applications. In the first part, I will discuss PowerTransformer, our new unsupervised model for controllable debiasing of text through the lens of connotation frames of power and agency. Trained using a combined reconstruction and paraphrasing objective, this model can rewrite story sentences such that their characters are portrayed with more agency and decisiveness. After establishing its performance through automatic and human evaluations, we show how PowerTransformer can be used to mitigate gender bias in portrayals of movie characters. Then, I will introduce Social Bias Frames, a conceptual formalism that models the pragmatic frames in which people project social biases and stereotypes onto others, in order to reason about biased or harmful implications in language. Using a new corpus of 150k structured annotations, we show that models can learn to reason about the high-level offensiveness of statements, but struggle to explain why a statement might be harmful. I will conclude with future directions for better reasoning about social dynamics and social biases.

  • 10/23/2020 Greg Durrett Addressing the Paradox of Flexible but Reliable Text Generation

Text generation is a paradox. We want our generation models to imitate patterns in training data, but also to have the flexibility to work in new settings and behave in new ways. We want our models to say creative things, but also to be reliable and factual with respect to their inputs. How can we achieve these dual goals with a single system? Our work focuses on generation systems that are controlled and assessed in fine-grained ways: control mechanisms can help enumerate diverse inputs, which are then assessed according to our desired criteria. I will describe work in paraphrasing and summarization where intermediate syntactic control mechanisms can make our models more expressive. I will then describe how to assess these models' outputs from the standpoint of factuality and grammaticality in a fine-grained way, localizing errors to individual words and dependency arcs. By achieving diversity and then enforcing quality, we can build systems that are simultaneously flexible and reliable enough to handle a range of generation settings.

  • 11/6/2020 William Wang Self-Supervised Language-and-Vision Reasoning

A key challenge for Artificial Intelligence research is to go beyond static observational data and consider more challenging settings that involve dynamic actions and incremental decision-making. In this talk, I will introduce our recent work on visually grounded language reasoning through studies of vision-and-language navigation. In particular, I will emphasize three benefits of self-supervised learning: it (1) improves generalization in unseen environments, (2) creates counterfactuals to augment observational data, and (3) enables transfer learning for challenging settings. I will conclude by briefly introducing other reasoning problems that my group has been working on recently.

  • 11/20/2020 Junjie Hu Cross-lingual Generalization, Alignment and Applications

While text on the web is an invaluable information source, this text is not available in large quantities for most languages in the world. For most languages, it is even difficult to find native speakers to annotate text for training individual machine learning models. With recent advances in multilingual machine learning models, we are able to transfer knowledge across languages in one single model, and apply the model to text written in more than 100 languages. However, a benchmark that enables the comprehensive evaluation of such models on a diverse range of languages and tasks is still missing. In this talk, I will focus on analyzing cross-lingual generalization effects in these models, and propose methods to improve performance in real applications. Specifically, I will start by introducing the Cross-lingual TRansfer Evaluation of Multilingual Encoders (XTREME) benchmark, a multi-task benchmark for evaluating the cross-lingual generalization capabilities of multilingual models across 40 languages and 9 tasks. Second, I will show that a compact multilingual model trained on parallel translation text can align multilingual representations, performing on par with or even better than much larger models on NLP tasks such as sentence classification and retrieval. Finally, I will present our recent translation initiative for COVID-19, a multilingual translation benchmark in 35 different languages, intended to foster the development of tools and resources for improving access to information about COVID-19 in these languages.


The GT NLP Seminar is organized by Amal Alabdulkarim, William Held, and Dr. Diyi Yang.