Fourth Workshop on Computational Models of Reference, Anaphora and Coreference
CRAC 2021
CRAC 2021, the Fourth Workshop on Computational Models of Reference, Anaphora and Coreference, took place at EMNLP 2021 on November 11, in Punta Cana, Dominican Republic, in hybrid mode. Thank you for joining us!
About the workshop
Background: After the last edition of the Discourse Anaphora and Anaphor Resolution Colloquium (DAARC) series in 2011, research papers on anaphora/coreference resolution were scattered across very different fora, until a common event in Computational Linguistics entirely dedicated to this area was revived in 2016 with the Coreference Beyond OntoNotes (CORBON 2016) workshop. After the second edition of CORBON in 2017, the focus of the workshop was broadened to cover all aspects of computational modelling of reference, anaphora, and coreference – this is how CRAC 2018 was born. CRAC 2019 and CRAC 2020 followed the recent advances in the application of word embeddings and deep neural networks to various NLP tasks. We believe that tasks such as cross-lingual coreference resolution can still benefit from this new perspective – and here we are!
Objectives: The aim of the workshop is to provide a forum for work on all aspects of computational anaphora resolution and annotation, covering identity coreference as well as other types of anaphora such as bridging references and discourse deixis.
Topics
The workshop welcomed submissions describing theoretical and applied computational work on anaphora/coreference resolution. Topics of interest included but were not limited to:
coreference resolution for less-researched languages
annotation and interpretation of anaphoric relations, including relations other than identity coreference (e.g., bridging references)
investigation of difficult cases of anaphora and their resolution
coreference resolution in noisy data (e.g. in social media)
new applications of coreference resolution
Universal Anaphora
Special Theme
A special theme of the 2021 edition of the workshop was the Universal Anaphora (UA) framework – a unified markup scheme applicable to multiple languages and reflecting a common cross-linguistic understanding of reference-related phenomena. This theme was motivated by the recent success of Universal Dependencies. As with Universal Dependencies, the UA framework aims to facilitate the analysis of referential similarities and idiosyncrasies among typologically different languages, support comparative evaluation of anaphora resolution systems and enable comparative linguistic studies. In addition, the workshop included a panel discussion on possible improvements of the initial framework based on the outcomes of the shared task.
Shared Task associated with the workshop
The joint CODI-CRAC shared task provided:
3 tracks: resolution of anaphoric identity, resolution of bridging references, and resolution of discourse deixis/abstract anaphora
New paradigm: a two-stage shared task to facilitate community-wide visioning
Emphasis on less-studied forms of anaphora: abstract anaphora and bridging references
New genre: conversation
New computational techniques: transfer of learned representations across genres
New opportunities for interaction between communities: Discourse and Dialogue
New data set
Important dates
Workshop papers due: Aug 12, 2021
Notification of acceptance: Sep 5, 2021
Camera-ready papers due: Sep 19, 2021
Workshop date: Nov 11, 2021
ACCEPTED PAPERS
Long research papers:
CoreLM: Coreference-aware Language Model Fine-Tuning (Nikolaos Stylianou and Ioannis Vlahavas)
Improving Span Representation for Domain-adapted Coreference Resolution (Nupoor Gandhi, Anjalie Field and Yulia Tsvetkov)
Data Augmentation Methods for Anaphoric Zero Pronouns (Abdulrahman Aloraini and Massimo Poesio)
Coreference by Appearance: Visually Grounded Event Coreference Resolution (Liming Wang, Shengyu Feng, Xudong Lin, Manling Li, Heng Ji and Shih-Fu Chang)
DramaCoref: A Hybrid Coreference Resolution System for German Theater Plays (Janis Pagel and Nils Reiter)
Event and Entity Coreference using Trees to Encode Uncertainty in Joint Decisions (Nishant Yadav, Nicholas Monath, Rico Angell and Andrew McCallum)
FantasyCoref: Coreference Resolution on Fantasy Literature Through Omniscient Writer’s Point of View (Sooyoun Han, Sumin Seo, Minji Kang, Jongin Kim, Nayoung Choi, Min Song and Jinho D. Choi)
A Hybrid Rule-Based and Neural Coreference Resolution System with an Evaluation on Dutch Literature (Andreas van Cranenburgh, Esther Ploeger, Frank van den Berg and Remi Thüss)
Anatomy of OntoGUM—Adapting GUM to the OntoNotes Scheme to Evaluate Robustness of SOTA Coreference Algorithms (Yilun Zhu, Sameer Pradhan and Amir Zeldes)
Long survey papers:
A Brief Survey and Comparative Study of Recent Development of Pronoun Coreference Resolution in English (Hongming Zhang, Xinran Zhao and Yangqiu Song)
Coreference Resolution for the Biomedical Domain: A Survey (Pengcheng Lu and Massimo Poesio)
Short research papers:
Understanding Mention Detector-Linker Interaction in Neural Coreference Resolution (Zhaofeng Wu and Matt Gardner)
On Generalization in Coreference Resolution (Shubham Toshniwal, Patrick Xia, Sam Wiseman, Karen Livescu and Kevin Gimpel)
Lazy Low-Resource Coreference Resolution: a Study on Leveraging Black-Box Translation Tools (Semere Kiros Bitew, Johannes Deleu, Chris Develder and Thomas Demeester)
Resources and Evaluations for Danish Entity Resolution (Maria Barrett, Hieu Trong Lam, Martin Wu, Ophélie Lacroix, Barbara Plank and Anders Søgaard)
Exploring Pre-Trained Transformers and Bilingual Transfer Learning for Arabic Coreference Resolution (Bonan Min)
Findings Papers
5 papers accepted to Findings of EMNLP were presented at CRAC 2021:
End-to-end Neural Information Status Classification (Yufang Hou)
CDLM: Cross-Document Language Modeling (Avi Caciularu, Arman Cohan, Iz Beltagy, Matthew Peters, Arie Cattan and Ido Dagan)
Coreference-aware Surprisal Predicts Brain Response (Evan Jaffe, Byung-Doh Oh and William Schuler)
Do UD Trees Match Mention Spans in Coreference Annotations? (Martin Popel, Zdeněk Žabokrtský, Anna Nedoluzhko, Michal Novák and Daniel Zeman)
Collecting a Large-Scale Gender Bias Dataset for Coreference Resolution and Machine Translation (Shahar Levy, Koren Lazar and Gabriel Stanovsky)
Shared Task Papers
8 papers from the shared task were included in the special CODI-CRAC proceedings and presented at CODI 2021:
The CODI-CRAC 2021 Shared Task on Anaphora, Bridging, and Discourse Deixis in Dialogue (Sopan Khosla, Juntao Yu, Ramesh Manuvinakurike, Vincent Ng, Massimo Poesio, Michael Strube and Carolyn Rosé)
Neural Anaphora Resolution in Dialogue (Hideo Kobayashi, Shengjie Li and Vincent Ng)
Anaphora Resolution in Dialogue: Description of the DFKI-TalkingRobots System for the CODI-CRAC 2021 Shared-Task (Tatiana Anikina, Cennet Oguz, Natalia Skachkova, Siyu Tao, Sharmila Upadhyaya, Ivana Kruijff-Korbayova)
The Pipeline Model for Resolution of Anaphoric Reference and Resolution of Entity Reference (Hongjin Kim, Damrin Kim and Harksoo Kim)
An End-to-End Approach for Full Bridging Resolution (Joseph Renner, Priyansh Trivedi, Gaurav Maheshwari, Rémi Gilleron and Pascal Denis)
Adapted End-to-End Coreference Resolution System for Anaphoric Identities in Dialogues (Liyan Xu and Jinho D. Choi)
Anaphora Resolution in Dialogue: Cross-Team Analysis of the DFKI-TalkingRobots Team Submissions for the CODI-CRAC 2021 Shared-Task (Natalia Skachkova, Cennet Oguz, Tatiana Anikina, Siyu Tao, Sharmila Upadhyaya and Ivana Kruijff-Korbayova)
The CODI-CRAC 2021 Shared Task on Anaphora, Bridging, and Discourse Deixis Resolution in Dialogue: A Cross-Team Analysis (Shengjie Li, Hideo Kobayashi and Vincent Ng)
PARTICIPATION
EMNLP 2021 adopted a hybrid mode, i.e., both online and on-site participation were possible.
Invited talk by Ido Dagan
ABSTRACT
Day-to-day and professional information needs are most often met only by combining information scattered across multiple texts. Accordingly, many NLP and text processing applications confront multi-text information, including multi-document summarization, multi-hop and conversational QA, knowledge base extraction, text mining and more. Properly performing these tasks requires identifying a range of informational links and relations across texts. While identifying such relations is a common infrastructure ingredient, there is very little application-independent foundational research in this area, where even the basic task of cross-document coreference resolution is heavily underexplored. In this talk I will present our line of work on modeling multi-text information. I will first describe infrastructure contributions to cross-document coreference resolution: a more realistic evaluation protocol, a corpus-wide (rather than topic-based) event coreference dataset, and an efficient crowdsourcing annotation tool. Next, I will describe a novel Cross-document Language Model (CDLM), which is geared to model cross-text information and hence better supports multi-text tasks. Lastly, I will present new tasks that extend the scope of cross-text informational links to be modeled. These include hierarchical cross-document coreference for scientific concepts, aligning matching proposition spans, and aligning propositional predicate-argument relations which are represented by QA-SRL question-answer pairs. I will conclude by discussing prospects for the evolution of this research line, while suggesting the added value of the multi-text setting as a foundational text understanding touchstone.
BIOGRAPHY OF THE INVITED SPEAKER
Ido Dagan is a Professor at the Department of Computer Science at Bar-Ilan University, Israel, the founder of the Natural Language Processing (NLP) Lab at Bar-Ilan, the founder and head of the nationally funded Bar-Ilan University Data Science Institute, and a Fellow of the Association for Computational Linguistics (ACL). His interests are in applied semantic processing, focusing on textual inference, natural open semantic representations, consolidation and summarization of multi-text information, and interactive text summarization and exploration. Dagan and colleagues initiated and promoted textual entailment recognition (RTE, later known as NLI) as a generic empirical task. He was the President of the ACL in 2010 and served on its Executive Committee during 2008-2011. In that capacity, he led the establishment of the journal Transactions of the Association for Computational Linguistics, which became one of the two premier journals in NLP. Dagan received his B.A. summa cum laude and his Ph.D. (1992) in Computer Science from the Technion. He was a research fellow at the IBM Haifa Scientific Center (1991) and a Member of Technical Staff at AT&T Bell Laboratories (1992-1994). During 1998-2003 he was co-founder and CTO of FocusEngine and VP of Technology of LingoMotors, and has been regularly consulting in the industry. His academic research has involved extensive industrial collaboration, including funds from IBM, Google, Thomson-Reuters, Bloomberg, Intel and Facebook, as well as collaboration with local companies under funded projects of the Israel Innovation Authority.
Workshop schedule
In 2021 the workshop ran for a day and a half, starting on November 10 with the CODI-CRAC 2021 Shared Task on Anaphora Resolution in Dialogues and continuing on November 11 with the CRAC programme. PDF presentations of all papers can be found below. The papers themselves are available in the ACL Anthology in separate CODI-CRAC and CRAC proceedings; links to individual papers can be found in the Accepted papers, Findings papers and Shared task papers sections above.
Day 1: CODI-CRAC 2021 Shared Task
Session 1: Welcome
9:05 – 9:30: The CODI-CRAC 2021 Shared Task on Anaphora, Bridging, and Discourse Deixis in Dialogue (Sopan Khosla, Juntao Yu, Ramesh Manuvinakurike, Vincent Ng, Massimo Poesio, Michael Strube and Carolyn Rosé)
Session 2: System Talks (part 1)
9:30 – 9:45: Neural Anaphora Resolution in Dialogue (Hideo Kobayashi, Shengjie Li and Vincent Ng)
9:45 – 10:00: Anaphora Resolution in Dialogue: Description of the DFKI-TalkingRobots System for the CODI-CRAC 2021 Shared Task (Tatiana Anikina, Cennet Oguz, Natalia Skachkova, Siyu Tao, Sharmila Upadhyaya and Ivana Kruijff-Korbayova)
10:00 – 10:15: The Pipeline Model for Resolution of Anaphoric Reference and Resolution of Entity Reference (Hongjin Kim, Damrin Kim and Harksoo Kim)
10:15 – 10:30: An End-to-End Approach for Full Bridging Resolution (Joseph Renner, Priyansh Trivedi, Gaurav Maheshwari, Rémi Gilleron and Pascal Denis)
10:30 – 11:00: Coffee Break
Session 3: System Talks (part 2)
11:00 – 11:15: Adapted End-to-End Coreference Resolution System for Anaphoric Identities in Dialogues (Liyan Xu and Jinho D. Choi)
Session 4: Analysis Papers
11:15 – 11:30: Anaphora Resolution in Dialogue: Cross-Team Analysis of the DFKI-TalkingRobots Team Submissions for the CODI-CRAC 2021 Shared Task (Natalia Skachkova, Cennet Oguz, Tatiana Anikina, Siyu Tao, Sharmila Upadhyaya and Ivana Kruijff-Korbayova)
11:30 – 11:45: The CODI-CRAC 2021 Shared Task on Anaphora, Bridging, and Discourse Deixis Resolution in Dialogue: A Cross-Team Analysis (Shengjie Li, Hideo Kobayashi and Vincent Ng)
Session 5: Visioning and Discussion
11:45 – 12:00: Visioning, Discussion and Next Steps (Sopan Khosla, Ramesh Manuvinakurike, Vincent Ng, Massimo Poesio, Michael Strube and Carolyn Rosé)
Day 2: CRAC 2021
NOTE: Findings of EMNLP papers presented at CRAC 2021 are marked with an asterisk.
Opening Remarks
9:00 – 9:05: Opening and welcome (Vincent Ng, Maciej Ogrodniczuk and Sameer Pradhan)
Paper Session 1
9:05 – 9:20: A Brief Survey and Comparative Study of Recent Development of Pronoun Coreference Resolution in English (Hongming Zhang, Xinran Zhao and Yangqiu Song)
9:20 – 9:35: Coreference Resolution for the Biomedical Domain: A Survey (Pengcheng Lu and Massimo Poesio)
9:35 – 9:50: FantasyCoref: Coreference Resolution on Fantasy Literature Through Omniscient Writer’s Point of View (Sooyoun Han, Sumin Seo, Minji Kang, Jongin Kim, Nayoung Choi, Min Song and Jinho D. Choi)
9:50 – 10:00: CDLM: Cross-Document Language Modeling* (Avi Caciularu, Arman Cohan, Iz Beltagy, Matthew Peters, Arie Cattan and Ido Dagan)
10:00 – 10:15: DramaCoref: A Hybrid Coreference Resolution System for German Theater Plays (Janis Pagel and Nils Reiter)
10:15 – 10:30: A Hybrid Rule-Based and Neural Coreference Resolution System with an Evaluation on Dutch Literature (Andreas van Cranenburgh, Esther Ploeger, Frank van den Berg and Remi Thüss)
10:30 – 11:00: Coffee break
Invited Talk
11:00 – 12:00: Modeling informational relations across multiple texts (Ido Dagan) – see the abstract and bio of the invited speaker above
12:00 – 13:00: Lunch break
Paper Session 2
13:00 – 13:10: Lazy Low-Resource Coreference Resolution: a Study on Leveraging Black-Box Translation Tools (Semere Kiros Bitew, Johannes Deleu, Chris Develder and Thomas Demeester)
13:10 – 13:20: Collecting a Large-Scale Gender Bias Dataset for Coreference Resolution and Machine Translation* (Shahar Levy, Koren Lazar and Gabriel Stanovsky)
13:20 – 13:30: Do UD Trees Match Mention Spans in Coreference Annotations?* (Martin Popel, Zdeněk Žabokrtský, Anna Nedoluzhko, Michal Novák and Daniel Zeman)
13:30 – 13:40: End-to-end Neural Information Status Classification* (Yufang Hou)
13:40 – 13:50: Resources and Evaluations for Danish Entity Resolution (Maria Barrett, Hieu Trong Lam, Martin Wu, Ophélie Lacroix, Barbara Plank and Anders Søgaard)
13:50 – 14:05: CoreLM: Coreference-aware Language Model Fine-Tuning (Nikolaos Stylianou and Ioannis Vlahavas)
14:05 – 14:20: Data Augmentation Methods for Anaphoric Zero Pronouns (Abdulrahman Aloraini and Massimo Poesio)
14:20 – 14:30: Exploring Pre-Trained Transformers and Bilingual Transfer Learning for Arabic Coreference Resolution (Bonan Min)
14:30 – 14:45: Mini break
Universal Anaphora panel
14:45 – 14:55: Introduction (Massimo Poesio)
14:55 – 15:05: The Universal Anaphora Extension of the CoNLL-U Markup Scheme (Anna Nedoluzhko, Michal Novák, Martin Popel, Zdeněk Žabokrtský, Amir Zeldes and Dan Zeman)
15:05 – 15:15: The Universal Anaphora Scorer (Juntao Yu)
15:15 – 15:25: The CODI/CRAC Shared Task on Anaphora Resolution in Dialogue (Sopan Khosla, Juntao Yu, Ramesh Manuvinakurike, Vincent Ng, Massimo Poesio, Michael Strube and Carolyn Rosé)
15:25 – 15:45: Discussion
Best Papers Session
15:45 – 16:00: Event and Entity Coreference using Trees to Encode Uncertainty in Joint Decisions (Nishant Yadav, Nicholas Monath, Rico Angell and Andrew McCallum)
16:00 – 16:15: On Generalization in Coreference Resolution (Shubham Toshniwal, Patrick Xia, Sam Wiseman, Karen Livescu and Kevin Gimpel)
16:15 – 16:45: Coffee break
Paper Session 3
16:45 – 17:00: Improving Span Representation for Domain-adapted Coreference Resolution (Nupoor Gandhi, Anjalie Field and Yulia Tsvetkov)
17:00 – 17:15: Coreference by Appearance: Visually Grounded Event Coreference Resolution (Liming Wang, Shengyu Feng, Xudong Lin, Manling Li, Heng Ji and Shih-Fu Chang)
17:15 – 17:30: Anatomy of OntoGUM—Adapting GUM to the OntoNotes Scheme to Evaluate Robustness of SOTA Coreference Algorithms (Yilun Zhu, Sameer Pradhan and Amir Zeldes)
17:30 – 17:40: Understanding Mention Detector-Linker Interaction in Neural Coreference Resolution (Zhaofeng Wu and Matt Gardner)
17:40 – 17:50: Coreference-aware Surprisal Predicts Brain Response* (Evan Jaffe, Byung-Doh Oh and William Schuler)
Closing Remarks
17:50 – 18:00: Closing remarks (Vincent Ng, Maciej Ogrodniczuk and Sameer Pradhan)
Program Committee
Antonio Branco, University of Lisbon
Arie Cattan, Bar-Ilan University
Jackie Chi Kit Cheung, McGill University
Dan Cristea, Alexandru Ioan Cuza University of Iasi
Stephanie Dipper, Ruhr-University Bochum
Elisa Ferracane, Abridge
Yulia Grishina, Amazon
Christian Hardmeier, IT University of Copenhagen
Lars Hellan, Norwegian University of Science and Technology
Veronique Hoste, Ghent University
Yufang Hou, IBM
Mohit Iyyer, University of Massachusetts Amherst
Sobha Lalitha Devi, AU-KBC Research Center, Anna University of Chennai
Ekaterina Lapshinova-Koltunski, Saarland University
Sharid Loáiciga, University of Potsdam
Costanza Navaretta, University of Copenhagen
Anna Nedoluzhko, Charles University in Prague
Michal Novák, Charles University in Prague
Maciej Ogrodniczuk, Institute of Computer Science, Polish Academy of Sciences
Constantin Orasan, University of Surrey
Sameer Pradhan, University of Pennsylvania and cemantix
Marta Recasens, Google
Manfred Stede, University of Potsdam
Don Tuggener, Zurich University of Applied Sciences
Yannick Versley, Amazon
Bonnie Webber, University of Edinburgh
Juntao Yu, Queen Mary University of London
Yilun Zhu, Georgetown University
Heike Zinsmeister, University of Hamburg
Organizing Committee
Maciej Ogrodniczuk, Institute of Computer Science, Polish Academy of Sciences
Sameer Pradhan, University of Pennsylvania and cemantix
Yulia Grishina, Amazon
Vincent Ng, University of Texas at Dallas
Massimo Poesio, Queen Mary University of London