Discourse Unit Segmentation across Formalisms
The DISRPT 2019 workshop introduces the first iteration of a cross-formalism shared task on discourse unit segmentation. Since all major discourse parsing frameworks presuppose a segmentation of texts into discourse units, learning segmentations for and from diverse resources is a promising area for converging methods and insights. We provide training, development and test datasets from all available languages and treebanks in the RST, SDRT and PDTB formalisms, using a uniform format. Because different corpora, languages and frameworks use different guidelines for segmentation, the shared task is meant to promote the design of flexible methods for dealing with various guidelines, and to help push forward the discussion of standards for discourse units. For datasets which have treebanks, we will evaluate in two different scenarios: with and without gold syntax, or otherwise using provided automatic parses for comparison.
Shared Task Data and Formats
Data for the shared task is released via GitHub together with format documentation and tools.
- Fri, December 28, 2018 - shared task sample data release
- Mon, January 21, 2019 - training data release
- Fri, February 15, 2019 - test data release
- Thu, February 28, 2019 - papers due (shared task & regular workshop papers)
- Thu, March 7 (extended), 2019 - papers due (shared task & regular workshop papers)
- Wed, March 27, 2019 - notification of acceptance
- Fri, April 5, 2019 - camera-ready papers due
- June 6, 2019 - workshop
Results for the DISRPT 2019 shared task
Ranks on each task are determined by macro-averaged f-score on all datasets. Individual dataset scores are micro-averaged over discourse units/connectives.
- For teams that submitted multiple systems, the best-scoring system by macro-averaged f-score on all datasets was selected to represent the team.
- Scores for systems that were not deterministically seeded were collected by averaging 5 randomly initialized runs on each dataset. Macro-averages are computed from the set of 5 run averages: each run yields one mean f-score over all datasets, and the final f-score is the mean of these 5 means.
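The scoring scheme described above can be illustrated with a minimal sketch. It is not the official evaluation script: it assumes that "f-score" means balanced F1 computed from true-positive, false-positive and false-negative counts pooled over a dataset, and the function names and dataset names (`corpus_a`, `corpus_b`) are hypothetical.

```python
from statistics import mean


def f_score(tp, fp, fn):
    """Micro-averaged F1 for one dataset, from counts pooled over
    all of its discourse units or connectives (assumed: balanced F1)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f(dataset_scores):
    """Macro-average: unweighted mean of the per-dataset f-scores,
    so small and large datasets count equally."""
    return mean(dataset_scores.values())


def final_score(runs):
    """For non-deterministic systems: one macro-average per run,
    then the mean of those per-run means."""
    return mean(macro_f(run) for run in runs)


# Hypothetical per-dataset scores for two runs (the task used 5 runs).
runs = [
    {"corpus_a": 0.9, "corpus_b": 0.7},
    {"corpus_a": 0.8, "corpus_b": 0.6},
]
print(final_score(runs))
```

When every run covers the same datasets, the mean of per-run macro-averages equals the macro-average of per-dataset run means, so the order of averaging does not change the final number.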