Submission

Shared Task - Discourse Relation Parsing and Treebanking (DISRPT 2021)

In conjunction with: EMNLP 2021 and CODI 2021

Discourse Across Formalisms

This year we will hold the second iteration of a cross-formalism shared task covering three tasks: discourse unit segmentation and discourse connective identification (both repeated from 2019 with updated data), and a new task on discourse relation classification.

Since all major discourse parsing frameworks imply a segmentation of texts into segments, learning segmentations for and from diverse resources is a promising area for converging methods and insights. We will provide training, development and test datasets from all available languages in Rhetorical Structure Theory (RST, Mann & Thompson 1988), Penn Discourse Treebank (PDTB, Miltsakaki et al. 2004), and Segmented Discourse Representation Theory (SDRT, Asher & Lascarides 2003), using a uniform format. Because different corpora, languages, and frameworks use different guidelines, the shared task will promote the design of flexible methods for dealing with various guidelines, and will help to push forward the discussion of converging standards for discourse units. For datasets with syntactic treebanks, we will evaluate in two different scenarios: with and without gold syntax.

We also propose the first iteration of a cross-formalism shared task on relation classification. Data is again converted from three distinct, but overlapping frameworks: RST, PDTB, and SDRT. The goal of the shared task is to bring together diverse strands of research on discourse relation identification, which are sometimes siloed separately due to differences between underlying data structures and assumptions of different frameworks. In order to enable approaches that benefit from multiple datasets created from distinct points of view, the task aims to find a common denominator for representing all available datasets in a uniform way, for the widest possible range of languages.

System and Paper Submission


We believe that evaluation and analysis are very important, and therefore we require all systems to be accompanied by a paper using the EMNLP template. Participants may submit full long papers (max. 8 pages + 1 after acceptance) until 5 August 2021. Submissions should be anonymised for double-blind reviewing. All papers will be published in the Shared Task section of the proceedings of the CODI workshop. Workshop proceedings will be available online via the ACL Anthology.


Submission website: https://www.softconf.com/emnlp2021/DISRPT/


In the interest of transparency, papers whose authors include shared task organizers and/or annotators of one of the included datasets must disclose this in a footnote in the final version of the paper.


During paper submission, authors will be asked to provide a link to their system, including all necessary resources that are not trivially available (there is no need, for example, to provide pre-trained models available from Hugging Face). All systems must include code to retrain the system from scratch, so that evaluators can test aspects of the system's performance and reproduce reported scores, as well as a detailed README file explaining how to train the system. Please note that system training and testing must be callable from the command line (so, e.g., Jupyter notebooks do not meet this criterion). Systems which cannot be run in the evaluation phase will not be accepted.
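As a minimal sketch of what a command-line-callable entry point might look like, here is a hypothetical interface using Python's standard argparse module. The subcommand and flag names (train, test, --corpus, --model, --seed) are illustrative only; the task does not prescribe a particular interface, as long as training and testing can be invoked from the shell.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI for a discourse parsing system (names are illustrative)."""
    parser = argparse.ArgumentParser(description="Train or test a discourse parser.")
    sub = parser.add_subparsers(dest="mode", required=True)

    train = sub.add_parser("train", help="retrain the system from scratch")
    train.add_argument("--corpus", required=True, help="path to training data")
    train.add_argument("--seed", type=int, default=42, help="random seed for reproducibility")

    test = sub.add_parser("test", help="run a trained model on held-out data")
    test.add_argument("--model", required=True, help="path to a trained model file")

    return parser

if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"mode={args.mode}")
```

A system structured this way can be invoked as, e.g., `python system.py train --corpus path/to/train --seed 42`, which evaluators can run directly following the README.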


Please make sure to use seeds to keep performance as reproducible as possible!
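A minimal sketch of seeding, using only the Python standard library (the helper name seed_everything is illustrative; seed any additional frameworks your system actually uses):

```python
import random

def seed_everything(seed: int) -> None:
    """Seed all sources of randomness used by the system (illustrative helper)."""
    random.seed(seed)
    # If your system uses other frameworks, seed them here as well, e.g.:
    # numpy.random.seed(seed)
    # torch.manual_seed(seed)

SEED = 42  # any fixed value; report the seed(s) used in your paper

seed_everything(SEED)
run_a = [random.random() for _ in range(5)]
seed_everything(SEED)
run_b = [random.random() for _ in range(5)]
assert run_a == run_b  # same seed, same draws
```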