CRAC 2021, the Fourth Workshop on Computational Models of Reference, Anaphora and Coreference, took place at EMNLP 2021 on November 11, in Punta Cana, Dominican Republic, in hybrid mode.

Background: After the last edition of the Discourse Anaphora and Anaphor Resolution Colloquium series in 2011, research papers on anaphora/coreference resolution were scattered among very different fora until a common event in Computational Linguistics entirely dedicated to this area was revived in 2016 with the Coreference Beyond OntoNotes (CORBON 2016) workshop. After the second edition of CORBON in 2017, the focus of the workshop was broadened to cover all cases of computational modelling of reference, anaphora, and coreference – this is how CRAC 2018 was born. CRAC 2019 and CRAC 2020 followed the recent advances in the application of word embeddings and deep neural networks to various NLP tasks. We believe that the task of cross-lingual coreference resolution can still benefit from this new perspective – and here we are!

Objectives: The aim of the workshop is to provide a forum for presenting computational work on all aspects of anaphora resolution and annotation, including both coreference and other types of anaphora such as bridging reference resolution and discourse deixis.


The workshop welcomed submissions describing theoretical and applied computational work on anaphora/coreference resolution. Topics of interest included but were not limited to:

  • coreference resolution for less-researched languages

  • annotation and interpretation of anaphoric relations, including relations other than identity coreference (e.g., bridging references)

  • investigation of difficult cases of anaphora and their resolution

  • coreference resolution in noisy data (e.g. in social media)

  • new applications of coreference resolution

  • Universal Anaphora.

A special theme of the 2021 edition of the workshop was the Universal Anaphora (UA) framework – a unified markup scheme applicable to multiple languages, reflecting common cross-linguistic understanding of reference-related phenomena. This theme was motivated by the recent successes in the development of Universal Dependencies. As with Universal Dependencies, the UA framework aims to facilitate referential analysis of similarities and idiosyncrasies among typologically different languages, support comparative evaluation of anaphora resolution systems and enable comparative linguistic studies. In addition, the workshop included a panel discussion on possible improvements of the initial framework based on the outcomes of the shared task.

The joint CODI-CRAC shared task provided:

  • 3 tracks: resolution of anaphoric identity, resolution of bridging references, resolution of discourse deixis/abstract anaphora

  • New paradigm: two-stage shared task to facilitate community-wide visioning

  • Emphasis on less-studied forms of anaphora: Abstract and Bridging

  • New genre: Conversation

  • New computational techniques: transfer of learned representations across genres

  • New opportunities for interaction between communities: Discourse and Dialogue

  • New data set

EMNLP 2021 adopted the hybrid mode, i.e. both online and on-site participation were possible.

Invited talk by Ido Dagan


Day-to-day and professional information needs are most often met only by combining information scattered across multiple texts. Accordingly, many NLP and text processing applications confront multi-text information, including multi-document summarization, multi-hop and conversational QA, knowledge base extraction, text mining and more. Properly performing these tasks requires identifying a range of informational links and relations across texts. While identifying such relations is a common infrastructure ingredient, there is very little application-independent foundational research in this area, where even the basic task of cross-document coreference resolution is heavily underexplored. In this talk I will present our line of work on modeling multi-text information. I will first describe infrastructure contributions to cross-document coreference resolution: a more realistic evaluation protocol, a corpus-wide (rather than topic-based) event coreference dataset, and an efficient crowdsourcing annotation tool. Next, I will describe a novel Cross-document Language Model (CDLM), which is geared to model cross-text information and hence better supports multi-text tasks. Lastly, I will present new tasks that extend the scope of cross-text informational links to be modeled. These include hierarchical cross-document coreference for scientific concepts, aligning matching proposition spans, and aligning propositional predicate-argument relations which are represented by QA-SRL question-answer pairs. I will conclude by discussing prospects for the evolution of this research line, while suggesting the added value of the multi-text setting as a foundational text understanding touchstone.


Ido Dagan is a Professor at the Department of Computer Science at Bar-Ilan University, Israel, the founder of the Natural Language Processing (NLP) Lab at Bar-Ilan, the founder and head of the nationally funded Bar-Ilan University Data Science Institute, and a Fellow of the Association for Computational Linguistics (ACL). His interests are in applied semantic processing, focusing on textual inference, natural open semantic representations, consolidation and summarization of multi-text information, and interactive text summarization and exploration. Dagan and colleagues initiated and promoted textual entailment recognition (RTE, later aka NLI) as a generic empirical task. He was the President of the ACL in 2010 and served on its Executive Committee during 2008-2011. In that capacity, he led the establishment of the journal Transactions of the Association for Computational Linguistics, which became one of two premier journals in NLP. Dagan received his B.A. summa cum laude and his Ph.D. (1992) in Computer Science from the Technion. He was a research fellow at the IBM Haifa Scientific Center (1991) and a Member of Technical Staff at AT&T Bell Laboratories (1992-1994). During 1998-2003 he was co-founder and CTO of FocusEngine and VP of Technology of LingoMotors, and has been regularly consulting in the industry. His academic research has involved extensive industrial collaboration, including funds from IBM, Google, Thomson-Reuters, Bloomberg, Intel and Facebook, as well as collaboration with local companies under funded projects of the Israel Innovation Authority.

In 2021 the workshop ran for 1.5 days, starting on November 10 with the CODI-CRAC 2021 Shared Task: Anaphora Resolution in Dialogues and continuing on November 11 with the CRAC programme. Below you can find PDF presentations of all papers. The papers are available in the ACL Anthology in separate CODI-CRAC and CRAC proceedings. You can find links to individual papers in the Accepted papers, Findings papers and Shared task papers sections above.

