In conjunction with EMNLP 2025 and the CODI-CRAC 2025 workshop
This year, we're organizing DISRPT 2025 as a shared task on discourse processing across formalisms, for a variety of languages and genres. It is the fourth iteration of a cross-formalism shared task on discourse analysis, with three subtasks this year:
Task 1: Discourse segmentation
Task 2: Connective identification
Task 3: Relation classification
We will provide training, development and test datasets for all available languages in the RST, SDRT, PDTB, ISO and DEP frameworks, using a uniform format. Because different corpora, languages and frameworks follow different guidelines, the shared task will promote the design of flexible methods that can handle this variation, and will help push forward the discussion of converging standards for discourse units. This year, we will use a unified set of labels for Task 3 (relation classification). For datasets which have treebanks, we will evaluate segmentation in two scenarios: with and without gold syntax. An automatically parsed version is provided for all corpora without a gold parse.
Data for the shared task is released via GitHub together with format documentation and tools:
https://github.com/disrpt/sharedtask2025
More information about the previous 2019, 2021 and 2023 shared tasks is available at:
https://sites.google.com/view/disrpt2023/
May 16 2025 – Sample data release
June 17 2025 – Training / dev data release
July 16 2025 – Test data release
August 4 2025 – System and paper submissions due
September 12 2025 – Notification of acceptance
September 19 2025 – Camera ready paper due
November 9 2025 – CODI-CRAC workshop at EMNLP
Submission link: https://softconf.com/emnlp2025/disrpt2025/
Contact the organizers: disrpt_chairs@googlegroups.com
Official website: https://sites.google.com/georgetown.edu/disrpt2025
Google group for participants (please join): disrpt2025_participants@googlegroups.com
Discord group for participants (please join): https://discord.gg/3f7JuTYs