Invited Speakers


Junyi Jessy Li
(University of Texas at Austin)

Modeling Discourse as Questions and Answers

Discourse structures characterize inter-sentential relationships and textual organization, enhancing high-level text comprehension. However, deriving these structures relies on annotated data that are linguistically sophisticated and thus challenging to obtain. In the wake of large language models, we introduce a paradigm shift that views discourse structure through the lens of free-form question answering, aligning with the linguistic framework of Questions Under Discussion (QUD). By considering each sentence as the answer to an implicit question elicited from context, we construct two tasks under QUD: (1) discourse comprehension, i.e., given QUD questions that are typically curiosity-driven and open-ended, locate their answer sentences in a document; and (2) QUD dependency parsing, i.e., given an answer sentence, generate QUD questions and identify where in prior context these questions are grounded. We conclude with potential new applications enabled by the QUD framework that highlight its versatility and its natural fit with LLMs and human interactions alike.

Bio: Junyi Jessy Li is an assistant professor in the Linguistics Department at The University of Texas at Austin, where she works on computational linguistics and natural language processing. Her work focuses on discourse processing, text generation, and language pragmatics in social contexts. She received her Ph.D. in Computer and Information Science from the University of Pennsylvania. She is a recipient of the NSF CAREER Award, an ACL Outstanding Paper Award (2022), an ACM SIGSOFT Distinguished Paper Award (2019), an Area Chair Favorite honor at COLING (2018), and a Best Paper nomination at SIGDIAL (2016).


Hinrich Schütze
(University of Munich)

Glot500: Creating and Evaluating a Language Model for 500 Languages

Most work on large language models (LLMs) has focused on what we call "vertical" scaling: making LLMs even better for a relatively small number of high-resource languages. We address "horizontal" scaling instead: extending LLMs to a large subset of the world's languages, focusing on low-resource languages. Our Glot500-m model is trained on more than 500 languages, many of which are not covered by any other language model. But how do we know that the model has actually learned these 500 languages? Broad low-resource evaluation turns out to be a difficult problem in itself, and we tried to innovate in several ways. One issue we were not able to solve is that parts of our evaluation standard cannot be distributed due to copyright restrictions. We also find that attributing good or bad performance to the so-called curse of multilinguality is naive, and that there is in fact also a "boon of multilinguality." We have released Glot500-m and are in the process of making our training corpus Glot500-c publicly available.

Bio: Hinrich Schütze is Chair of Computational Linguistics and co-director of the Center for Language and Information Processing at LMU Munich. Ever since he started his PhD in the early 1990s, Hinrich's research interests have been at the interface of linguistics, cognitive science, neural networks, and computer science. Recent examples include learning with natural language instructions, multilingual representation learning for low-resource languages, computational morphology, and neurosymbolic approaches. Hinrich is co-author of two well-known textbooks (Foundations of Statistical Natural Language Processing and Introduction to Information Retrieval) and a fellow of HessianAI, ELLIS (the European Laboratory for Learning and Intelligent Systems), and ACL.


Danushka Bollegala
(University of Liverpool)

Time Travel with Large Language Models

The meaning associated with a word is a dynamic phenomenon that varies with time. New meanings are constantly assigned to existing words, while new words are coined to describe novel concepts. Despite this dynamic nature of lexical semantics, most NLP systems remain agnostic to the temporal effects of meaning change. For example, Large Language Models (LLMs), which act as the backbone of modern-day NLP systems, are often trained once, using a fixed snapshot of a corpus collected at a specific point in time. It is both costly and time-consuming to retrain LLMs from scratch on recent data. On the other hand, if we can somehow predict which words have had their meanings altered over time, we could perform on-demand fine-tuning of LLMs to reflect those changes in a timely manner. In this talk, I will first review various techniques that have been proposed in NLP research to predict the semantic change of words over time. I will then describe a lightweight prompt-based approach for the temporal adaptation of LLMs.

Bio: Danushka Bollegala is a Professor in the Department of Computer Science at the University of Liverpool and a Scholar at Amazon Search. He obtained his PhD in 2009 from the University of Tokyo, where he subsequently worked as a Lecturer before moving to the UK. He has worked on a range of NLP topics, such as summarisation, information extraction, meta-embedding learning, and social bias mitigation.