Fourth Workshop on Computational Models of Reference, Anaphora and Coreference

CRAC 2021

CRAC 2021, the Fourth Workshop on Computational Models of Reference, Anaphora and Coreference, took place at EMNLP 2021 on November 11, in Punta Cana, Dominican Republic, in hybrid mode. Thank you for joining us!

About the workshop

Background: After the last edition of the Discourse Anaphora and Anaphor Resolution Colloquium series in 2011, research papers on anaphora/coreference resolution were scattered among very different fora, until a Computational Linguistics event entirely dedicated to this area was revived in 2016 with the Coreference Beyond OntoNotes (CORBON 2016) workshop. After the second edition of CORBON in 2017, the focus of the workshop was broadened to cover all cases of computational modelling of reference, anaphora, and coreference – this is how CRAC 2018 was born. CRAC 2019 and CRAC 2020 followed the recent advances in the application of word embeddings and deep neural networks to various NLP tasks. We believe that the task of cross-lingual coreference resolution can still benefit from this new perspective – and here we are!

Objectives: The aim of the workshop is to provide a forum for presenting work on all aspects of computational anaphora resolution and annotation, including both coreference and other types of anaphora such as bridging reference resolution and discourse deixis.

Topics

The workshop welcomed submissions describing theoretical and applied computational work on anaphora/coreference resolution. Topics of interest included but were not limited to:

  • coreference resolution for less-researched languages

  • annotation and interpretation of anaphoric relations, including relations other than identity coreference (e.g., bridging references)

  • investigation of difficult cases of anaphora and their resolution

  • coreference resolution in noisy data (e.g., in social media)

  • new applications of coreference resolution

  • Universal Anaphora.

Special Theme

A special theme of the 2021 edition of the workshop was the Universal Anaphora (UA) framework – a unified markup scheme applicable to multiple languages, reflecting a common cross-linguistic understanding of reference-related phenomena. This theme was motivated by the recent successes in the development of Universal Dependencies. As with Universal Dependencies, the UA framework aims to facilitate referential analysis of similarities and idiosyncrasies among typologically different languages, support comparative evaluation of anaphora resolution systems and enable comparative linguistic studies. In addition, the workshop included a panel discussion on possible improvements to the initial framework based on the outcomes of the shared task.

Shared Task associated with the workshop

The joint CODI-CRAC shared task provided:

  • 3 tracks: resolution of anaphoric identity, resolution of bridging references, resolution of discourse deixis/abstract anaphora

  • New paradigm: two-stage shared task to facilitate community-wide visioning

  • Emphasis on less-studied forms of anaphora: Abstract and Bridging

  • New genre: Conversation

  • New computational techniques: transfer of learned representations across genres

  • New opportunities for interaction between communities: Discourse and Dialogue

  • New data set

Important dates

  • Workshop papers due: Aug 12, 2021

  • Notification of acceptance: Sep 5, 2021

  • Camera-ready papers due: Sep 19, 2021

  • Workshop date: Nov 11, 2021

ACCEPTED PAPERS

Long research papers:

Long survey papers:

Short research papers:

Findings Papers

5 papers accepted to Findings of EMNLP were presented at CRAC 2021:

Shared Task Papers

8 papers from the shared task were included in the special CODI-CRAC proceedings and presented at CODI 2021:

PARTICIPATION

EMNLP 2021 adopted the hybrid mode, i.e. both online and on-site participation was possible.

Invited talk by Ido Dagan

ABSTRACT

Day-to-day and professional information needs are most often met only by combining information scattered across multiple texts. Accordingly, many NLP and text processing applications confront multi-text information, including multi-document summarization, multi-hop and conversational QA, knowledge base extraction, text mining and more. Properly performing these tasks requires identifying a range of informational links and relations across texts. While identifying such relations is a common infrastructure ingredient, there is very little application-independent foundational research in this area, where even the basic task of cross-document coreference resolution is heavily underexplored. In this talk I will present our line of work on modeling multi-text information. I will first describe infrastructure contributions to cross-document coreference resolution: a more realistic evaluation protocol, a corpus-wide (rather than topic-based) event coreference dataset, and an efficient crowdsourcing annotation tool. Next, I will describe a novel Cross-document Language Model (CDLM), which is geared to model cross-text information and hence better supports multi-text tasks. Lastly, I will present new tasks that extend the scope of cross-text informational links to be modeled. These include hierarchical cross-document coreference for scientific concepts, aligning matching proposition spans, and aligning propositional predicate-argument relations which are represented by QA-SRL question-answer pairs. I will conclude by discussing prospects for the evolution of this research line, while suggesting the added value of the multi-text setting as a foundational text understanding touchstone.

BIOGRAPHY OF THE INVITED SPEAKER

Ido Dagan is a Professor at the Department of Computer Science at Bar-Ilan University, Israel, the founder of the Natural Language Processing (NLP) Lab at Bar-Ilan, the founder and head of the nationally funded Bar-Ilan University Data Science Institute, and a Fellow of the Association for Computational Linguistics (ACL). His interests are in applied semantic processing, focusing on textual inference, natural open semantic representations, consolidation and summarization of multi-text information, and interactive text summarization and exploration. Dagan and colleagues initiated and promoted textual entailment recognition (RTE, later also known as NLI) as a generic empirical task. He was the President of the ACL in 2010 and served on its Executive Committee during 2008-2011. In that capacity, he led the establishment of the journal Transactions of the Association for Computational Linguistics, which became one of two premier journals in NLP. Dagan received his B.A. summa cum laude and his Ph.D. (1992) in Computer Science from the Technion. He was a research fellow at the IBM Haifa Scientific Center (1991) and a Member of Technical Staff at AT&T Bell Laboratories (1992-1994). During 1998-2003 he was co-founder and CTO of FocusEngine and VP of Technology of LingoMotors, and he has regularly consulted for industry. His academic research has involved extensive industrial collaboration, including funds from IBM, Google, Thomson-Reuters, Bloomberg, Intel and Facebook, as well as collaboration with local companies under funded projects of the Israel Innovation Authority.

Workshop schedule

In 2021 the workshop ran for 1.5 days, starting on November 10 with the CODI-CRAC 2021 Shared-Task: Anaphora Resolution in Dialogues and continuing on November 11 with the CRAC programme. Below you can find PDF presentations of all papers. The papers are available in the ACL Anthology in separate CODI-CRAC and CRAC proceedings. You can find links to individual papers in the Accepted papers, Findings papers and Shared task papers sections above.

Day 1: CODI-CRAC 2021 Shared-Task: Anaphora Resolution in Dialogues (at CODI 2021)


Session 1: Welcome

Session 2: System Talks (part 1)

10:30 – 11:00: Coffee Break

Session 3: System Talks (part 2)

Session 4: Analysis Papers

Session 5: Visioning and Discussion

Day 2: CRAC 2021

NOTE: Findings of EMNLP papers presented at CRAC 2021 are marked with an asterisk.

Opening Remarks

Paper Session 1

10:30 – 11:00: Coffee break

Invited Talk

12:00 – 13:00: Lunch break

Paper Session 2

14:30 – 14:45: Mini break

Universal Anaphora panel

Best Papers Session

16:15 – 16:45: Coffee break

Paper Session 3

Closing Remarks

  • 17:50 – 18:00: Closing remarks (Vincent Ng, Maciej Ogrodniczuk and Sameer Pradhan)

Program Committee

  • Antonio Branco, University of Lisbon

  • Arie Cattan, Bar-Ilan University

  • Jackie Chi Kit Cheung, McGill University

  • Dan Cristea, Alexandru Ioan Cuza University of Iasi

  • Stephanie Dipper, Ruhr-University Bochum

  • Elisa Ferracane, Abridge

  • Yulia Grishina, Amazon

  • Christian Hardmeier, IT University of Copenhagen

  • Lars Hellan, Norwegian University of Science and Technology

  • Veronique Hoste, Ghent University

  • Yufang Hou, IBM

  • Mohit Iyyer, University of Massachusetts Amherst

  • Sobha Lalitha Devi, AU-KBC Research Center, Anna University of Chennai

  • Ekaterina Lapshinova-Koltunski, Saarland University

  • Sharid Loáiciga, University of Potsdam

  • Costanza Navaretta, University of Copenhagen

  • Anna Nedoluzhko, Charles University in Prague

  • Michal Novák, Charles University in Prague

  • Maciej Ogrodniczuk, Institute of Computer Science, Polish Academy of Sciences

  • Constantin Orasan, University of Surrey

  • Sameer Pradhan, University of Pennsylvania and cemantix

  • Marta Recasens, Google

  • Manfred Stede, University of Potsdam

  • Don Tuggener, Zurich University of Applied Sciences

  • Yannick Versley, Amazon

  • Bonnie Webber, University of Edinburgh

  • Juntao Yu, Queen Mary University of London

  • Yilun Zhu, Georgetown University

  • Heike Zinsmeister, University of Hamburg

Organizing Committee

  • Maciej Ogrodniczuk, Institute of Computer Science, Polish Academy of Sciences

  • Sameer Pradhan, University of Pennsylvania and cemantix

  • Yulia Grishina, Amazon

  • Vincent Ng, University of Texas at Dallas

  • Massimo Poesio, Queen Mary University of London