Keynote Talk

Graham Neubig: How Can We Know What and When Language Models Know?

Abstract: One remarkable recent finding in natural language processing is that by training a model to simply predict words in a sentence, language models can learn a significant amount of world knowledge that would traditionally be expressed by symbolic knowledge bases. In this presentation, I will present research regarding two questions. First: how can we most effectively elicit this knowledge from language models by designing textual prompts that allow the model to predict particular facts? Second: how can we best know when these predictions are accurate, and when they are no better than a random guess? I will also discuss the potential of this rapidly growing research paradigm and point to some open research questions for the future.
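The first question above, eliciting facts via textual prompts, can be sketched in miniature. In the toy example below, a hand-made lookup table stands in for a real language model's output distribution, and the prompt templates and relation are illustrative inventions (not from the talk); the point is only the selection step: among paraphrased cloze prompts for the same fact, pick the one under which the model assigns the highest probability to the correct answer.

```python
# Toy sketch of prompt-based knowledge elicitation. TOY_LM_SCORES stands in
# for a real LM's probability of an object word given a filled-in prompt;
# the templates and scores are made up for illustration.

TOY_LM_SCORES = {
    ("[X] was born in [Y].", "Paris"): 0.42,
    ("The birthplace of [X] is [Y].", "Paris"): 0.61,
    ("[X] is a native of [Y].", "Paris"): 0.18,
}

def score(template: str, obj: str) -> float:
    """Probability the toy LM assigns to `obj` filling [Y] in `template`."""
    return TOY_LM_SCORES.get((template, obj), 0.0)

def best_prompt(templates, obj):
    """Return the template that elicits the target fact most confidently."""
    return max(templates, key=lambda t: score(t, obj))

templates = [
    "[X] was born in [Y].",
    "The birthplace of [X] is [Y].",
    "[X] is a native of [Y].",
]
print(best_prompt(templates, "Paris"))  # the highest-scoring paraphrase
```

With a real model, `score` would query the LM for the probability of the object token(s) in the filled-in template; the selection logic stays the same.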


Bio: Graham Neubig is an associate professor at the Language Technologies Institute of Carnegie Mellon University. His work focuses on natural language processing, specifically multi-lingual models that work in many different languages, and natural language interfaces that allow humans to communicate with computers in their own language. Much of this work relies on machine learning, and he is also active in developing methods and algorithms for machine learning over natural language data. He publishes regularly in the top venues in natural language processing, machine learning, and speech, and his work has won awards at EMNLP 2016, EACL 2017, and NAACL 2019.


Link: TBA

Slides: TBA

Invited Talks

Raquel Fernandez: Human production strategies for neural language generation

Abstract: Progress on language generation has experienced a huge boost with the advent of large models trained on huge amounts of text. However, this kind of language modelling will only take us so far. Most natural language use is driven by communicative goals and is often grounded both in the conversational context and in extralinguistic information. Can we take inspiration from human production strategies in situated environments to drive forward natural language generation models? I will argue that yes, we can, and present a few examples of recent and ongoing research carried out within my group that follows this research programme.


Bio: Raquel Fernández is Associate Professor at the University of Amsterdam, where she leads the Dialogue Modelling Group. Her interests include computational semantics and pragmatics, dialogue interaction, and visually-grounded language processing. She received her PhD in Computational Linguistics from King's College London and has held research positions at the University of Potsdam and at Stanford University. Over her career, she has been awarded several personal fellowships and is currently the recipient of an ERC Consolidator Grant. She is an active member of the NLP community, having been part of the editorial board of the Computational Linguistics journal, the standing reviewer team of TACL, as well as chair of SIGdial 2016 and CoNLL 2020, among others.


Link: TBA

Slides: TBA

Barbara Plank: What to do about Human Disagreement in Natural Language Processing?

Abstract: Disagreement between human annotators is common in natural language interpretation tasks. However, the current state of the art largely neglects this disagreement: NLP models are inferred from datasets aggregated to a single "ground truth" interpretation. Prior work in both Natural Language Processing and Computer Vision has shown that not all disagreement is "noise"; disagreement can carry valuable information. This talk embraces disagreement in three acts: data, modelling, and evaluation, and will discuss some early and recent advances on human disagreement in NLP.
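The contrast between aggregation and embracing disagreement can be made concrete. The sketch below (a minimal illustration, not code from the talk; the labels and counts are invented) shows the two options for the same annotated item: collapsing to a single majority-vote label versus keeping the full annotator label distribution as a "soft" target.

```python
from collections import Counter

def majority_label(annotations):
    """Conventional aggregation: a single 'ground truth' label."""
    return Counter(annotations).most_common(1)[0][0]

def soft_label(annotations):
    """Keep the normalized label distribution, preserving disagreement."""
    counts = Counter(annotations)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

# Five hypothetical annotators judging one item:
item = ["sarcastic", "sarcastic", "literal", "sarcastic", "literal"]
print(majority_label(item))  # -> 'sarcastic' (the 2/5 minority vanishes)
print(soft_label(item))      # -> {'sarcastic': 0.6, 'literal': 0.4}
```

A model trained against the soft distribution (e.g. with a cross-entropy loss over it) retains the signal that two in five annotators read the item differently, which the majority label discards.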


Bio: Barbara Plank is Professor in the Computer Science Department at ITU (IT University of Copenhagen). She is also the Head of the MSc Program in Data Science. She received her PhD in Computational Linguistics from the University of Groningen. Her research interests focus on Natural Language Processing, in particular transfer learning and adaptation, learning from beyond the text, and in general learning under limited supervision and fortuitous data sources. She has (co-)organised several workshops and international conferences, including the PEOPLES workshop (since 2016) and the first European NLP Summit (EurNLP 2019). Barbara was general chair of the 22nd Nordic Conference on Computational Linguistics (NoDaLiDa 2019) and workshop chair for ACL in 2019. Barbara is a member of the advisory board of the European Association for Computational Linguistics (EACL) and vice-president of the Northern European Association for Language Technology (NEALT).


Link: TBA

Slides: TBA

Stefan Riezler: Validity, Reliability, and Significance: A Model-Based Approach to Empirical Methods for NLP

Abstract: Validity, reliability, and significance are methodological pillars of empirical science; however, they are easily neglected in the race for improved state-of-the-art results on benchmark data. We discuss exemplary violations of these principles, and present model-based statistical tests to assess the validity, reliability, and significance of machine learning predictions in NLP. Our tests are based on the parameters of interpretable and identifiable generalized additive models (GAMs) and linear mixed effects models (LMEMs) that are trained on the predictions of machine learning models. We present a validity test based on GAMs that allows the detection of circular features that circumvent learning. Furthermore, we present reliability coefficients using variance decomposition based on the random effect parameters of LMEMs. Lastly, we present significance tests that compare the likelihood ratios of nested LMEMs trained on the performance evaluation data of two machine learning models.
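The significance-testing step described above can be sketched with stdlib Python. This is a hedged illustration, not the talk's implementation: the two log-likelihoods are made-up numbers standing in for fitted nested mixed models (e.g. an LMEM with and without a fixed effect distinguishing the two compared systems), and the test assumes the standard one-extra-parameter case, where twice the log-likelihood difference is compared against a chi-square distribution with 1 degree of freedom.

```python
import math

def chi2_sf_1df(x: float) -> float:
    """Survival function of a chi-square with 1 degree of freedom.
    For X = Z**2 with Z standard normal, P(X > x) = erfc(sqrt(x/2))."""
    return math.erfc(math.sqrt(x / 2.0))

def likelihood_ratio_test(loglik_null: float, loglik_full: float):
    """LR statistic and p-value for one extra parameter in the full model."""
    lr = 2.0 * (loglik_full - loglik_null)
    return lr, chi2_sf_1df(lr)

# Hypothetical fitted log-likelihoods: the null model omits the system effect.
lr, p = likelihood_ratio_test(loglik_null=-1052.3, loglik_full=-1047.9)
print(f"LR = {lr:.2f}, p = {p:.4f}")
```

In practice the log-likelihoods would come from fitting the nested LMEMs to per-item evaluation scores (with random effects for items, for instance), and the degrees of freedom would equal the number of parameters by which the models differ.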



Bio: Stefan Riezler has been a full professor in the Department of Computational Linguistics at Heidelberg University, Germany, since 2010, and is also a co-opted member in Informatics at the Department of Mathematics and Computer Science.

He received his PhD (with distinction) in Computational Linguistics from the University of Tübingen in 1998, conducted post-doctoral work at Brown University in 1999, and spent a decade in industry research (Xerox PARC, Google Research). His research focus is on interactive machine learning for natural language processing problems, especially for the application areas of cross-lingual information retrieval and statistical machine translation. He serves on the editorial boards of the main journals of the field --- Computational Linguistics and Transactions of the Association for Computational Linguistics --- and is a regular member of the program committee of various natural language processing and machine learning conferences. He has published more than 100 journal and conference papers in these areas. He also conducts interdisciplinary research as a member of the Interdisciplinary Center for Scientific Computing (IWR), for example, on the topic of early prediction of sepsis using machine learning and natural language processing techniques.



Link: TBA

Slides: TBA

Thomas Wolf: BigScience: Building a Large-Hadron-Collider in AI and NLP

Abstract: The acceleration in Artificial Intelligence (AI) and Natural Language Processing (NLP) will have a fundamental impact on society, as these technologies are at the core of the tools we use on a daily basis. In NLP, a considerable part of this effort currently consists of training increasingly large language models on increasingly large quantities of text.

Unfortunately, the resources necessary to create the best-performing models are found mainly in industry rather than academia. This imbalance in access to a transformative technology poses problems from research, environmental, ethical, and societal perspectives. The BigScience project aims to demonstrate another way of creating, studying, and sharing large language models, and large research artifacts in general, within the AI/NLP research communities. BigScience takes inspiration from scientific creation schemes existing in other scientific fields, such as CERN and the LHC in particle physics, in which open scientific collaborations facilitate the creation of large-scale artifacts useful for the entire research community.

Gathering a much larger research community around the creation of these artifacts makes it possible to consider in advance the many research questions surrounding large language models (capabilities, limitations, potential improvements, bias, ethics, environmental impact, the general AI/cognitive research landscape) that will be interesting to answer with the created artifacts, and to prepare in advance the tools needed to answer as many of these questions as possible.

The BigScience project is conceived as a proposal for an alternative, more international and inclusive way to conduct large-scale science projects. Beyond the research artifacts created and shared, the project's success will ultimately be measured by its long-term impact on the field: whether it establishes another model for large-scale collaborations, inspired by the successes of fields like particle physics.


Bio: Thomas Wolf is co-founder and Chief Science Officer of HuggingFace. His team is on a mission to catalyze and democratize NLP research. Prior to HuggingFace, Thomas earned a Ph.D. in physics and later a law degree, and worked as a physics researcher and a European Patent Attorney.


Link: TBA

Slides: TBA