GT NLP Seminar


GT NLP Seminar is an interactive talk series held bi-weekly, on Fridays from 12:00 pm to 1:00 pm, where students, faculty, and staff at Georgia Tech with an interest in Natural Language Processing meet, have lunch, and listen to talks about recent NLP research on a wide range of topics. Our speakers come from both inside and outside Georgia Tech, and will usually give a 45-minute talk, followed by a 15-minute Q&A/discussion session. Currently, GT NLP Seminar is held remotely through BlueJeans.

All are welcome!

If you are interested in joining the GT NLP Seminar, please be sure to subscribe to the mailing list. Future emails about the seminars go to the mailing list only. You can email Amal Alabdulkarim or William Held to subscribe to the mailing list.

Subscribe to our Spring 2022 Calendar!

Schedule for Spring 2022

  • 02/11/2022 Colin Raffel A call to build models like we build open-source software
    Large pre-trained models have become a cornerstone of modern ML pipelines thanks to the fact that they facilitate improved performance with less labeled data on downstream tasks. However, these models are typically created by a resource-rich research group that unilaterally decides how a given model should be built, trained, and released, after which point it is left as-is until a better pre-trained model comes along to completely supplant it. In contrast, open-source development has proven that it is possible for a distributed community of contributors to work together to iteratively build complex and widely-used software. This kind of large-scale distributed collaboration is made possible through a mature set of tools including version control, continuous integration, merging, and more. In this talk, I will present a vision for building machine learning models in the way that open-source software is developed, including preliminary work from my lab on "merging" and "patching" models. I will also give some insight into the future work required to make this vision a reality.

  • 02/18/2022 Chenguang Zhu How We Achieved Human Parity in CommonsenseQA -- Fusing Knowledge into Language Models
    Large-scale language models (LMs) have achieved great results in many NLP applications. However, there is still a non-negligible gap compared with human capability. One of the key reasons is the lack of external knowledge integration. We argue that language models should be equipped with knowledge to better understand world common sense and relations. In this talk, I will introduce how to represent and fuse knowledge into language models, which includes three steps: 1) Ground language into related knowledge, 2) Represent knowledge, and 3) Fuse knowledge representation into language model. We demonstrate our proposed knowledge-boosted LM in the following work: i) achieving human parity in Commonsense Q&A, ii) Dictionary-boosted Language Model, and iii) Knowledge-text Co-pretraining.

  • 03/11/2022 Divyansh Kaushik Robustifying NLP with Humans in the Loop
    Most machine learning methods address prediction problems under restrictive assumptions but are then applied to drive decisions in environments where those assumptions are violated. This disconnect between what the methodological framework offers and the desired applications has caused confusion among researchers (who often lack the right formalism to tackle these problems coherently), practitioners (who have developed a folk tradition of ad hoc practices for deploying and monitoring systems), and regulators (who have applied frameworks designed for biomedical ethics to machine learning). In this talk I'll discuss some of these issues affecting the application of machine learning and our fledgling efforts to bridge some of these gaps by injecting causal knowledge via humans in the loop, along with some critical disconnects between how humans are employed in ML research to perform various tasks and the regulatory framework around research ethics, and its implications.

  • 03/18/2022 Yonatan Bisk Following Instructions and Asking Questions
    As we move towards the creation of embodied agents that understand natural language, several new challenges and complexities arise for grounding (e.g. complex state-spaces), planning (e.g. long horizons), and social interaction (e.g. asking for help or clarifications). In this talk, I'll discuss several recent results both on improvements to embodied instruction following within ALFRED and initial steps towards building agents that ask questions or model theory-of-mind.

  • 04/01/2022 Iz Beltagy Efficient Scaling of Language Model Pretraining
    As language models get larger, they get more expressive and they start to be competent at new tasks like generation and zero-shot prediction. However, scaling language models to larger sizes is getting more expensive and more challenging. In this talk, I will talk about two projects that focus on language model scaling.
    The first is a Staged Training method for efficient scaling. Our pretraining method starts by training a small model then incrementally grows the model size until it reaches the target size. We develop growth operators that are "loss-preserving" and "training-dynamics-preserving", and develop an optimal training schedule. We show that our staged training method can save up to 22% of the pretraining cost.
    The second project addresses the question: given that different architectures are better for different applications, which transformer architecture is best to train if you have the budget to train only one large model? We compare various architectures (encoder-decoder vs. decoder-only) and pretraining objectives (autoregressive vs. span corruption), and show that decoder-only autoregressive has the best zero-shot performance while encoder-decoder span-corruption has the best zero-shot performance after supervised multi-task pretraining. Then we propose a pretraining regime that achieves the best of both worlds without the need to train two separate models. The recommendations from this work are being applied to the training of the first publicly available ~200 billion parameter language model as part of the BigScience project.

  • 04/08/2022 CANCELLED

  • 04/22/2022 Yoon Kim TBD

Schedule for Fall 2021

  • 09/03/2021 Marjorie McShane and Sergei Nirenburg Toward Broad and Deep Language Understanding for Intelligent Systems
    The early vision of AI included the goal of endowing intelligent systems with human-like language processing capabilities. This proved harder than expected, leading the vast majority of natural language processing practitioners to pursue less ambitious, shorter-term goals. Whereas the utility of human-like language processing is unquestionable, its feasibility is quite justifiably questioned. In this talk, we will not only argue that some approximation of human-like language processing is possible, we will present a program of R&D that is working on making it a reality. This vision, as well as progress to date, is described in the book Linguistics for the Age of AI (MIT Press, 2021).

  • 09/17/2021 Micha Elsner In Search of Abstract Morphological Structure
    In many languages, grammatical distinctions are signaled by different morphological inflections (for example, the plural of cat is cats). Computational linguists have become adept at predicting the inflectional forms of unseen words using sequence-to-sequence models. When data is plentiful, models presumably do this by generalizing across similar words (from training sample dog ~ dogs to test sample cat ~ ?). But models remain relatively successful when there is not much data (McCarthy et al 2019), which suggests that generalization can also occur across dissimilar words, across non-target word forms and even across languages. The predictive relationships which allow such generalization to succeed are called abstract morphological structures. These have been a frequent subject of linguistic inquiries, but it is not yet clear how they can be automatically described and quantified, or how to encourage computer models to use them more effectively to make predictions. I will discuss several related efforts to discover and use abstract morphological structures within a computational framework. I will begin with the discovery problem of clustering together related word forms without any gold standard grammatical labels at all, and the pitfalls inherent in this still not-yet-solved problem. I will then discuss a case study of the cross-cutting generalizations inherent in the Spanish verb system, and why such structures make the problem difficult. Finally, I will show how neural network models which solve word-to-word analogy problems may offer some advantages over the standard sequence-to-sequence framework. (These papers were co-authored with Alex Erdmann, Grace LeFevre, Andrea Sims and others.)

  • 10/01/2021 Yejin Choi Knowledge is Power: Symbolic Knowledge Distillation, Commonsense Morality, and Multimodal Script Knowledge
    Scale appears to be the winning recipe in today's leaderboards. And yet, extreme-scale neural models are still brittle and make errors that are often nonsensical and even counterintuitive. In this talk, I will argue for the importance of knowledge, and demonstrate how smaller models developed in academia can still have an edge over larger industry-scale models, if powered with knowledge.
    First, I will introduce "symbolic knowledge distillation", a new framework to distill larger neural language models into smaller commonsense models, which leads to a machine-authored KB that wins, for the first time, over a human-authored KB in all criteria: scale, accuracy, and diversity. Next, I will introduce a new conceptual framework for language-based commonsense moral reasoning, and discuss how we can teach neural language models about complex social norms and human values, so that the machine can reason that “helping a friend” is generally a good thing to do, but “helping a friend spread fake news” is not. Finally, I will discuss an approach to multimodal script knowledge, which leads to new SOTA performances on a dozen leaderboards that require grounded commonsense reasoning.

  • 10/15/2021 Marten van Schijndel Language statistics won't solve language processing
    Neural network language models have had great success in learning language processing solutions by encoding language statistics. These solutions have been shown to produce good approximations of human behavior in many situations (e.g., predicting that a particular construction should be considered less acceptable than another). However, these solutions are also very sample inefficient and they are brittle outside their training domains. This talk will highlight a number of aspects of human language processing that are unlikely to be learnable from language modeling statistics precisely because the domains of language to which we have access during training are distinct from the domains in which we would like NLP models to operate. I will provide some background from psycholinguistics to discuss different ways language models are likely to be inherently inadequate to model human language processing. This framing may be helpful when analyzing, designing, and fine-tuning models in order to achieve human-like language processing.

  • 10/22/2021 Sam Bowman When Combating Hype, Proceed with Caution
    In an effort to avoid reinforcing widespread hype about the capabilities of state-of-the-art language technology, researchers have developed practices in framing and citation that serve to deemphasize the field’s successes. Though well-meaning, these practices often yield misleading or even false claims about the limits of our best technology. This is a problem, and it may be more serious than it looks: It limits our ability to mitigate short-term harms from NLP deployments and it limits our ability to prepare for the potentially enormous impacts of more distant future advances. This paper urges researchers to be careful about these claims and suggests some research directions and communication strategies that will make it easier to avoid or rebut them.

  • 11/12/2021 Ana Marasovic Explanation Selection Through The Lens of Free-Text and Contrastive Explanations
    One approach to realizing some of the trustworthy AI goals is through local explanations: justifications of models' individual predictions. A dominant approach to producing local explanations in NLP is to identify *all* factors, usually among input tokens, that cause model predictions. However, people rarely expect an explanation that consists of the actual and complete causal chain that led to an event. Instead, people expect an explanation that selects one or two causes from the causal chain. Explanation selection is done (i) to simplify explanations and consequently reduce the cognitive load of understanding them, and (ii) to meet human expectations in order to truly give people agency. In my talk, I will focus on free-text and contrastive explanations of NLP models to illustrate how explanation selection can be done today.

  • 12/03/2021 Dongyeop Kang Fixing the NLP Pipeline with Humans and Data
    NLP systems trained with standard machine learning pipelines (annotation, learning, and evaluation) suffer from various problems. For instance, datasets collected from crowd workers often contain annotation artifacts or repeating patterns, reducing the systems' generalizability and robustness; and once the systems are deployed to real-world users, they are not easily controlled, interpreted, or interacted with. In order to address these problems caused by the ML pipeline, I will present recent work on human-centric and data-centric approaches by the Minnesota NLP group. For the human-centric aspect, we collect human perceptions of linguistic styles and then make a model that mimics the way humans perceive styles. We then develop interactive NLP systems that assist scholars in reading and writing academic papers. In the data-centric NLP, we model data informativeness based on various training dynamics and then use them to find new important data points for data augmentation and annotation. With more human involvement and consideration of data dynamics, we believe the traditional ML-driven NLP pipeline becomes more robust, interactive, and informative-effective.

Schedule for Spring 2021

  • 1/29/2021 Gabriel Stanovsky NLP in the Wild: From Ancient Akkadian to Biochemistry Protocols

I’ll present two recent projects showing the range of domains I’d like to tackle in my work to help experts with diverse real-world research questions. First, I’ll present a model capable of filling in missing parts in ancient cuneiform tablets written thousands of years ago in now-extinct languages (Akkadian and Sumerian). Due to deterioration over time, these excavated tablets are often broken, faded, or cracked, making it hard for historians and archaeologists to read and interpret them. We show that by leveraging large-scale language models pretrained on modern texts we achieve good results in restoring missing parts in various domains and time periods, in the automatic evaluation as well as human analysis. Second, I will discuss a novel document-level representation of wet lab biochemistry protocols geared towards experiment automation and reproducibility, addressing challenges such as cross-sentence relations, long-range coreference, grounding, and implicit arguments. I’ll show examples from a manually-annotated corpus of complex lab protocols, and present graph-prediction models that form the first step towards fully executable lab protocols.

Artificial Intelligence has made unprecedented progress in the past decade. However, there still remains a large gap between the decision-making capabilities of humans and machines. In this talk, I will investigate two factors to explain why. First, I will discuss the presence of undesirable biases in datasets, which ultimately hurt generalization. I will then present bias mitigation algorithms that boost the ability of AI models to generalize to unseen data. Second, I will explore task-specific prior knowledge which aids robust generalization, but is often ignored when training modern AI architectures. Throughout this discussion, I will focus my attention on language applications, and will show how certain underlying structures can provide useful biases for inferring meaning in natural language. I will conclude with a discussion of how the broader framework of dataset and model biases will play a critical role in the societal impact of AI, going forward.

  • 2/26/2021 Mohit Iyyer Challenges in evaluating natural language generation systems

Recent advances in neural language modeling have opened up a variety of exciting new text generation applications. However, evaluating systems built for these tasks remains difficult. Most prior work relies on a combination of automatic metrics such as BLEU (which are often uninformative) and crowdsourced human evaluation (which is also usually uninformative, especially when conducted without careful task design). In this talk, I focus on two specific applications: (1) unsupervised sentence-level style transfer and (2) long-form question answering. I will go over our recent work on building models for these systems and then describe the ensuing struggles to properly compare them to baselines. In both cases, we identify (and propose solutions for) issues with existing evaluations, including improper aggregation of multiple metrics, missing control experiments with simple baselines, and high cognitive load placed on human evaluators. I'll conclude by briefly discussing our work on machine-in-the-loop text generation systems, in which both humans and machines participate in the generation process, where reliable human evaluation becomes much more feasible.

  • 3/12/2021 Sameer Singh Evaluating and Testing Natural Language Processing Models

Current evaluation of natural language processing (NLP) systems, and much of machine learning, primarily consists of measuring the accuracy on held-out instances of the dataset. Since the held-out instances are often gathered using a similar annotation process to the training data, they include the same biases that act as shortcuts for machine learning models, allowing them to achieve accurate results without requiring actual natural language understanding. Thus held-out accuracy is often a poor proxy for measuring generalization, and further, aggregate metrics have little to say about where the problem may lie.

In this talk, I will introduce a number of approaches we are investigating to perform a more thorough evaluation of NLP systems. I will first provide an overview of automated techniques for perturbing instances in the dataset that identify loopholes and shortcuts in NLP models, including semantic adversaries and universal triggers. I will then describe recent work in creating comprehensive and thorough tests and evaluation benchmarks for NLP that aim to directly evaluate comprehension and understanding capabilities. The talk will cover a number of NLP tasks, including sentiment analysis, textual entailment, paraphrase detection, and question answering.

With large-scale pre-trained models, natural language processing as a field has made giant leaps in a wide range of tasks. But how are we doing on those that require a deeper understanding of discourse pragmatics, tasks that we humans use language to accomplish on a daily basis? We discuss a case study of advice giving in online forums, and reveal rich discourse strategies in the language of advice. Understanding advice would equip systems with a better grasp of language pragmatics, yet we show that advice identification is challenging for modern NLP models. So then --- how do people comprehend at the discourse level? We tackle this via a novel question generation paradigm, by capturing questions elicited from readers as they read through a text sentence by sentence. Because these questions are generated while the readers are processing the information, they are naturally inquisitive, with a variety of types such as causal, elaboration, and background. Finally, we briefly showcase a new task that requires high level inferences when the target audience of a document changes: providing elaborations and explanations during text simplification.

  • 4/9/2021 Yulia Tsvetkov Proactive NLP: How to Prevent Social and Ethical Problems in NLP Systems?

Much NLP literature has examined social biases in datasets, algorithms, or model performance, and the negative pipeline between them: models absorb and amplify data biases, which causes representational harms and impacts performance. In this talk, I will present studies that look further up the pipeline and rely on the assumption that biases in data originate in human cognition. I will discuss several lightly supervised, interpretable approaches—grounded in social psychology and causal reasoning—to detect implicit social bias in written discourse and narrative text. Together, these approaches aim at providing people-centered text analytics, to proactively pinpoint and explain potentially biased framings—across languages, data domains, and social contexts—before these biased framings propagate into downstream AI systems.

  • 4/16/2021 Lu Wang Building Controllable and Efficient Natural Language Generation Systems

Large pre-trained language models have enabled rapid progress in natural language generation (NLG). However, existing NLG systems still largely lack control over the content to be generated, and thus suffer from incoherence and unfaithfulness. In this talk, I will first introduce a neural generation framework that separately tackles the challenges of content planning and surface realization, built upon large models. Experimental results show that the model is more effective in various tasks: constructing persuasive arguments, writing opinion articles, and generating news stories. It alleviates existing models' issue of producing bland and incorrect text, a result of lacking global planning. I then discuss how to extend the model to conduct dynamic content planning with mixed language models. Finally, I present our recent long document summarization work where efficient attentions are designed to handle more than 10k tokens while prior work can only process hundreds of words.

Schedule for Fall 2020

  • 9/11/2020 Lara Martin Dungeons and Discourse: Using Computational Storytelling to Look at Natural Language Use

Although we are currently riding a technological wave of personal assistants, many of these agents still struggle to communicate appropriately. Humans are natural storytellers, so it would be fitting if artificial intelligence could tell stories as well. Automated story generation is an area of AI research that aims to create agents that tell “good” stories. Previous story generation systems use planning and discrete symbols to create new stories, but these systems require a vast amount of knowledge engineering. The stories created by these systems are coherent, but only a finite set of stories can be generated. In contrast, very large neural language models, such as transformers, have made the headlines in the natural language processing community. Though impressive on the surface, these models begin to lose coherence over time. My research looks at various techniques of automated story generation, culminating in the blend of symbolic and neural approaches. In this talk, I show how a neuro-symbolic model can provide more interesting and coherent stories than those from solely neural or symbolic systems.

  • 9/25/2020 Ian Stewart Through the looking glass: what NLP can reveal about sociolinguistic variation

People adapt their language use in different social contexts to meet communicative needs: a person may use the word <going> with colleagues and <goin'> with their close friends. Sociolinguistics researchers investigate the systematic variation in language use across different contexts to determine the social meaning of variation, such as how people change their word choices for different audiences. While traditional sociolinguistics investigates variation in spoken language, computational sociolinguistics relies on natural language processing and statistical methods to investigate written language in online discussions. This talk will explore how NLP can help isolate sociolinguistic phenomena that would otherwise go understudied in spoken contexts, and more broadly how NLP can help social science research.

  • 10/9/2020 Maarten Sap Reasoning about Social Dynamics and Social Bias in Language

Humans easily make inferences to reason about the social and power dynamics of situations (e.g., stories about everyday interactions), but such reasoning is still a challenge for modern NLP systems. In this talk, I will address how we can make machines reason about social commonsense and social biases in text, and how this reasoning could be applied in downstream applications. In the first part, I will discuss PowerTransformer, our new unsupervised model for controllable debiasing of text through the lens of connotation frames of power and agency. Trained using a combined reconstruction and paraphrasing objective, this model can rewrite story sentences such that its characters are portrayed with more agency and decisiveness. After establishing its performance through automatic and human evaluations, we show how PowerTransformer can be used to mitigate gender bias in portrayals of movie characters. Then, I will introduce Social Bias Frames, a conceptual formalism that models the pragmatic frames in which people project social biases and stereotypes onto others to reason about biased or harmful implications in language. Using a new corpus of 150k structured annotations, we show that models can learn to reason about high-level offensiveness of statements, but struggle to explain why a statement might be harmful. I will conclude with future directions for better reasoning about social dynamics and social biases.

  • 10/23/2020 Greg Durrett Addressing the Paradox of Flexible but Reliable Text Generation

Text generation is a paradox. We want our generation models to imitate patterns in training data, but also have the flexibility to work in new settings and behave in new ways. We want our models to say creative things, but also be reliable and factual with respect to their inputs. How can we achieve these dual goals with a single system? Our work focuses on generation systems that are controlled and assessed in fine-grained ways: control mechanisms can help enumerate diverse inputs, which are then assessed according to our desired criteria. I will describe work in paraphrasing and summarization where intermediate syntactic control mechanisms can make our models more expressive. I will then describe how to assess these models' outputs from the standpoint of factuality and grammaticality in a fine-grained way, localizing errors to individual words and dependency arcs. By achieving diversity and then enforcing quality, we can build systems that are simultaneously flexible and reliable enough to handle a range of generation settings.

  • 11/6/2020 William Wang Self-Supervised Language-and-Vision Reasoning

A key challenge for Artificial Intelligence research is to go beyond static observational data, and consider more challenging settings that involve dynamic actions and incremental decision-making. In this talk, I will introduce our recent work on visually-grounded language reasoning via the studies of vision-and-language navigation. In particular, I will emphasize three benefits of self-supervised learning: it (1) improves generalization in unseen environments; (2) creates counterfactuals to augment observational data; and (3) enables transfer learning for challenging settings. I will conclude by briefly introducing other reasoning problems that my group has been working on recently.

  • 11/20/2020 Junjie Hu Cross-lingual Generalization, Alignment and Applications

While text on the web is an invaluable information source, this text is not available in large quantities for most languages in the world. It is even difficult to ask native speakers to annotate text in most languages for training individual machine learning models. With recent advances in multilingual machine learning models, we are able to transfer knowledge across languages in one single model, and apply the model to deal with text written in more than 100 languages. However, a benchmark that enables the comprehensive evaluation of such models on a diverse range of languages and tasks is still missing. In this talk, I will focus on analyzing cross-lingual generalization effects in these models, and propose methods to improve the performance in real applications. Specifically, I will start by introducing the Cross-lingual TRansfer Evaluation of Multilingual Encoders (XTREME) benchmark, a multi-task benchmark for evaluating the cross-lingual generalization capabilities of multilingual models across 40 languages and 9 tasks. Secondly, I will show that a compact multilingual model trained on parallel translation text can align multilingual representations, performing on a par with or even better than much larger models on NLP tasks such as sentence classification and retrieval. Finally, I will present our recent translation initiative for COVID-19, a multilingual translation benchmark in 35 different languages, in order to foster the development of tools and resources for improving access to information about COVID-19 in these languages.


The GT NLP Seminar is organized by Amal Alabdulkarim, William Held, and Dr. Diyi Yang.