Results

Results for the DISRPT 2021 Shared Task

Ranks on each task are determined by the macro-averaged score across all datasets. Individual dataset scores are micro-averaged over discourse units, connectives, or discourse relations, depending on the task.
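To make the two averaging levels concrete, here is a minimal Python sketch: micro-averaging pools true/false positive counts over all units within one dataset, while the task-level rank takes an unweighted (macro) mean of the per-dataset scores. The dataset names and counts below are illustrative only, not actual Shared Task figures.

```python
# A minimal sketch of the two averaging levels; counts are illustrative.

def micro_f1(tp: int, fp: int, fn: int) -> float:
    """Micro-averaged F1: pool counts over all units within one dataset."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def macro_average(per_dataset: dict) -> float:
    """Macro average across datasets: an unweighted mean, so small corpora
    weigh as much as large ones in the final ranking."""
    return sum(per_dataset.values()) / len(per_dataset)

per_dataset_f1 = {
    "eng.rst.gum": micro_f1(tp=900, fp=50, fn=40),
    "deu.rst.pcc": micro_f1(tp=300, fp=30, fn=25),
}
print(f"Task score (macro-averaged): {macro_average(per_dataset_f1):.4f}")
```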

Main Results

Notes:

  1. For teams that submitted multiple systems, the best-scoring system by macro-averaged F-score across all datasets was selected to represent the team.

  2. Scores for systems that were not deterministically seeded were reproduced by averaging 5 randomly initialized runs on each dataset, and macro-averages are computed from the same set of 5 runs (see the sketch below these notes).

* Full disclosure: systems marked with an asterisk were submitted by teams that included Shared Task co-organizers and annotators of some of the original datasets included in the task. All systems, including those marked with an asterisk, were re-run and reproduced by different teams, and all datasets were originally published independently of the Shared Task.
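The following is a minimal sketch of the reproduction procedure described in note 2. The function `train_and_score(dataset, seed)` is a hypothetical stand-in for training a system with a fixed random seed and returning its micro-averaged F-score on one dataset; the actual reproduction harness is not part of this page.

```python
# A minimal sketch of note 2 above; `train_and_score` is hypothetical.
import statistics

SEEDS = [1, 2, 3, 4, 5]  # five randomly initialized runs per system

def reproduce_scores(datasets, train_and_score):
    # Per-dataset score: mean micro-F over the five runs.
    per_dataset = {
        ds: statistics.mean(train_and_score(ds, seed) for seed in SEEDS)
        for ds in datasets
    }
    # Task-level score: macro (unweighted) average over datasets. With equal
    # run counts this equals macro-averaging each run, then averaging runs.
    return per_dataset, statistics.mean(per_dataset.values())
```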

We support transparent Shared Task implementation along the lines proposed by Escartín et al. (2021).

Results by Dataset

EDU Segmentation (treebanked)

EDU Segmentation (plain)

Connective Detection

Treebanked (top) and plain (bottom)

Relation Classification