CRAC 2022

Fifth Workshop on Computational Models of Reference, Anaphora and Coreference

CRAC 2022, the Fifth Workshop on Computational Models of Reference, Anaphora and Coreference, was held at COLING 2022 (in hybrid mode) on October 16–17, in Gyeongju, Republic of Korea and online.

About the workshop series

Background: The end of the Discourse Anaphora and Anaphor Resolution Colloquium series in 2011 scattered research papers on coreference resolution across different fora. A common event in Computational Linguistics entirely dedicated to this area was revived in 2016 with the Coreference Beyond OntoNotes (CORBON) workshop, co-located with NAACL in 2016 and with EACL in 2017. In 2018 its focus was broadened to cover all cases of computational modelling of reference, anaphora, and coreference, and the CRAC workshop was born. It was subsequently held at NAACL 2018 and 2019, COLING 2020 and EMNLP 2021.

Objectives: The aim of the workshop is to provide a forum for presenting work on all aspects of computational anaphora resolution and annotation, including both coreference and other types of anaphora such as bridging reference resolution and discourse deixis.

Topics

The workshop welcomes submissions describing theoretical and applied computational work on anaphora/coreference resolution. Topics of interest include but are not limited to:

  • coreference resolution for less-researched languages

  • annotation and interpretation of anaphoric relations, including relations other than identity coreference (e.g., bridging references)

  • investigation of difficult cases of anaphora and their resolution

  • coreference resolution in noisy data (e.g., in social media)

  • new applications of coreference resolution

  • Universal Anaphora.

Important dates

  • Workshop papers due: Aug 1, 2022

  • Notification of acceptance: Sep 1, 2022

  • Camera-ready papers due: Sep 12, 2022

  • Workshop date: Oct 16–17, 2022

Accepted papers

Long papers

Short papers

CRAC shared task papers

CODI-CRAC shared task papers

Invited talks

Sharid Loáiciga: Bringing together Anaphora Resolution and Linguistic Theory

Early work on anaphora resolution was intrinsically connected to linguistic theories of discourse interpretation. In later years, with the adoption of machine learning methods, great progress has been achieved in anaphora resolution as an independent task. This success has been even greater with current deep neural network methods. However, the focus has been much more on solving the task than on acquiring new linguistic insights concerning anaphora resolution. In this talk, I present two ways in which we can gain and also utilize linguistic insights for anaphora resolution. First, I present experiments combining psycholinguistics with large-scale NLP tools. These show some of the complexities of hypothesis testing with corpus data. Second, I present the annotation of a multimodal corpus with anaphora information. The combination of images and text presents a unique opportunity to test our annotation schemes (i.e., our current linguistic knowledge) and to explore new ways to annotate what is unaccounted for in the same annotation schemes (i.e., new linguistic insights).

Sharid Loáiciga is a Researcher in the Department of Philosophy, Linguistics and Theory of Science at the University of Gothenburg, Sweden. She is also the Associate Director of CLASP (Centre for Linguistic Theory and Studies in Probability) in the same department. Her research is focused on discourse, and in particular on understanding human and machine interpretation of referring expressions. In recent work, she developed techniques for combining psycholinguistic methods with large-scale resources, and studied the discourse knowledge of pre-trained language models.

Massimo Poesio, Lori Levin: Annotating anaphoric reference in dialogue: the CODI/CRAC 2022 Shared Task corpus

(joint work with Maris Camilleri, Paloma Carretero Garcia, Taiqi He, Mark-Christoph Mueller, Carolyn Rosé, Michael Strube, Juntao Yu and Katherine Zhang)

Most current research on anaphoric reference focuses on news text, in particular written, and on identity anaphora (coreference). This is largely due to the lack of annotated datasets of a sufficient size to train and evaluate models for other genres, and other types of anaphoric reference. Arguably the most important among the understudied genres is conversational language in dialogue. Anaphora resolution in dialogue requires systems to handle grammatically incorrect language suffering from disfluencies and mentions jointly created across utterances (Poesio & Rieser, 2010) or whose function is to establish common ground rather than refer (Clark & Brennan, 1990; Heeman & Hirst, 1995). Dialogue involves much more deictic reference, vaguer anaphoric and discourse deictic reference, speaker grounding of pronouns and long-distance conversation structure. These complexities are normally absent from news or Wikipedia articles, which constitute the bulk of current datasets for coreference resolution (Poesio et al., to appear).

The series of CODI/CRAC Shared Tasks in Anaphora Resolution in Dialogue (Khosla et al., 2021; Yu et al., 2022) was organized to address this issue by creating datasets that our community could use to study anaphoric reference in different types of conversational setups, and to tackle less studied forms of anaphoric reference such as bridging reference or discourse deixis. The annotated corpus created for the CODI/CRAC series consists of conversations from four well-known conversational datasets: the AMI corpus (Carletta, 2006), the LIGHT corpus (Urbanek et al., 2019), the PERSUASION corpus (Wang et al., 2019) and SWITCHBOARD (Godfrey et al., 1992). These documents were annotated according to the annotation scheme for the ARRAU 3 corpus, which includes guidelines for identifying discontinuous markables and annotating split antecedent plurals, bridging reference, and discourse deixis. For this second edition, we created new test sets, but also systematically checked the data annotated for the first edition. As this annotation effort included annotators who had not previously taken part in the ARRAU 3 annotation, the work also required extensive discussions about the scheme; new reliability tests of the annotation scheme were carried out, and the annotation guidelines were substantially revised.

Massimo Poesio is a full professor in Computational Linguistics at the School of Electronic Engineering and Computer Science, Queen Mary University of London, and a member of the University's Cognitive Science and Games and AI research groups. He is also a Fellow of the Turing Institute, a supervisor in the IGGI Doctoral training centre in Intelligent Games and Game Intelligence and the Wellcome Trust's PhD programme in Health Data in Practice. He is co-founder and has been Associate Editor of Dialogue and Discourse since its foundation, and he recently became co-editor of the Computational and Mathematical section of Language and Linguistics Compass.

Lori Levin has a Ph.D. in linguistics and has been working in the fields of computational linguistics and natural language processing since the 1980s, where she uses her expertise in linguistics in the annotation of corpora and the design of meaning representations. She specializes in NLP for low-resource and endangered languages. She is the co-founder and co-chair of the North American Computational Linguistics Open competition.

Juntao Yu, Michal Novák: The recent developments in Universal Anaphora Scorer

The Universal Anaphora initiative aims to push forward the state of the art in anaphora and anaphora resolution by expanding the aspects of anaphoric interpretation which are or can be reliably annotated in anaphoric corpora, producing unified standards to annotate and encode these annotations, delivering datasets encoded according to these standards, and developing methods for evaluating models carrying out this type of interpretation. Such expansion of the scope of anaphora resolution requires a comparable expansion of the scope of the scorers used to evaluate this work. Last year, we introduced an extended version of the Reference Coreference Scorer (Pradhan et al., 2014) that can be used to evaluate identity anaphora resolution (including singletons and split antecedents), bridging reference resolution, non-referring expressions and discourse deixis. The scorer has been used in the two recent CODI-CRAC Shared Tasks on Anaphora Resolution in Dialogues. Recently, an extension of the UA scorer that also supports discontinuous markables was used by Novák et al. (2022) in the CRAC 2022 Shared Task on Multilingual Coreference Resolution. In this talk, we will describe how the scorer evaluates the different aspects of anaphora resolution and how it has been used in recent shared tasks. In addition, we will discuss work in progress on the scorer, such as the mention overlap ratio, the anaphor-decomposable score and the adaptation for the CRAFT shared task.

Juntao Yu is a Lecturer at the School of Computer Science and Electronic Engineering, University of Essex. Before joining Essex, he was a post-doctoral researcher at Queen Mary University of London, working with Professor Massimo Poesio on his five-year DALI project (Disagreements and Language Interpretation, ERC-2015-AdG). He did his PhD at the University of Birmingham, working on out-of-domain dependency parsing supervised by Dr Bernd Bohnet. His research interests include Deep Learning for NLP, Information Extraction, Coreference Resolution, Conversational AI, Dependency Parsing, Domain Adaptation, Semi-supervised Learning, and Multi-task Learning.

Michal Novák is a researcher at the Institute of Formal and Applied Linguistics at the Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic. He received his PhD from the same university, exploring coreference and its resolution methods from a cross-lingual perspective. Recently, he has co-authored the CorefUD dataset, which in its latest release harmonizes coreference of 17 corpora in 11 languages under the same annotation scheme. Besides coreference, his research also focuses on machine translation. He has contributed to the Czech-Ukrainian translation system within the Charles Translator project, which aims to narrow the communication gap between Ukrainian refugees and other people in the Czech Republic.

Workshop schedule

Day 1: October 16

Opening remarks

Invited talk

Paper session 1

Coffee break

Paper session 2

Closing remarks

  • 18:00 – 18:10: Closing remarks (Vincent Ng, Maciej Ogrodniczuk and Sameer Pradhan)

Day 2: October 17

CRAC shared task session

Coffee break

CRAC shared task invited talk

Panel discussion

  • 12:00 – 12:30: Universal Anaphora (Sameer Pradhan)

Lunch break

CODI-CRAC joint shared task session

CODI-CRAC joint shared task invited talk

CODI-CRAC joint shared task discussion and closing

  • 16:45 – 17:45: Open discussion

  • 17:45 – 18:00: Closing remarks

Program Committee

  • Antonio Branco (University of Lisbon)

  • Arie Cattan (Bar-Ilan University)

  • Haixia Chai (Heidelberg University)

  • Stephanie Dipper (Ruhr-University Bochum)

  • Yansong Feng (Peking University)

  • Yulia Grishina (Amazon)

  • Christian Hardmeier (IT University of Copenhagen)

  • Lars Hellan (Norwegian University of Science and Technology)

  • Veronique Hoste (Ghent University)

  • Ruihong Huang (Texas A&M University)

  • Sobha Lalitha Devi (AU-KBC Research Center, Anna University of Chennai)

  • Loic De Langhe (Ghent University)

  • Ekaterina Lapshinova-Koltunski (Saarland University)

  • Sharid Loáiciga (University of Gothenburg)

  • Costanza Navaretta (University of Copenhagen)

  • Anna Nedoluzhko (Charles University in Prague)

  • Michal Novák (Charles University in Prague)

  • Massimo Poesio (Queen Mary University of London)

  • Marta Recasens (Google)

  • Carolyn Rosé (Carnegie Mellon University)

  • Nobuhiro Ueda (Kyoto University)

  • Bonnie Webber (University of Edinburgh)

  • Yaqin Yang (Brandeis University)

  • Juntao Yu (University of Essex)

  • Yilun Zhu (Georgetown University)

  • Heike Zinsmeister (University of Hamburg)

Organizing Committee

  • Maciej Ogrodniczuk (Institute of Computer Science, Polish Academy of Sciences)

  • Sameer Pradhan (University of Pennsylvania and cemantix)

  • Anna Nedoluzhko (Charles University in Prague)

  • Vincent Ng (University of Texas at Dallas)

  • Massimo Poesio (Queen Mary University of London)