Shared task

Discourse Unit Segmentation across Formalisms

The DISRPT 2019 workshop introduces the first iteration of a cross-formalism shared task on discourse unit segmentation. Since all major discourse parsing frameworks presuppose a division of texts into discourse units, learning segmentations for and from diverse resources is a promising area for converging methods and insights. We provide training, development and test datasets from all available languages and treebanks in the RST, SDRT and PDTB formalisms, using a uniform format. Because different corpora, languages and frameworks follow different segmentation guidelines, the shared task is meant to promote the design of flexible methods that can handle varying guidelines, and to help advance the discussion of standards for discourse units. For datasets which have treebanks, we will evaluate in two different scenarios: with and without gold syntax, or otherwise using provided automatic parses for comparison.

Shared Task Data and Formats

Data for the shared task is released via GitHub, together with format documentation and tools.

Important dates

  • Fri, December 28, 2018 - shared task sample data release
  • Mon, January 21, 2019 - training data release
  • Fri, February 15, 2019 - test data release
  • Thu, February 28, 2019 - papers due (shared task & regular workshop papers)
  • Wed, March 27, 2019 - notification of acceptance
  • Fri, April 5, 2019 - camera-ready papers due
  • Thu, June 6, 2019 - workshop