Task
Inspired by previous literature on coherence modeling, the DisCoTEX task is conceived to address two distinct scenarios. The first, more traditional one evaluates whether models are able to distinguish well-organized documents from corrupted ones, where the latter are typically created by shuffling the sentence order of the original document, or by replacing specific linguistic elements that convey coherence within and across sentences (such as personal pronouns or discourse connectives). The second, less explored one assesses the models' performance on texts evaluated for coherence by human raters.
According to this distinction, we propose the following (possibly independent) sub-tasks:
Sub-task 1 - Last sentence classification: this is conceived as a binary classification task. Specifically, given a short paragraph (approximately three consecutive sentences), hereafter referred to as the prompt, and an individual sentence (the target), participants will be asked to classify whether the target follows the prompt, i.e. whether joining the two yields a coherent or an incoherent text. The negative target can be either a sentence randomly chosen from a different document or a sentence extracted from the same document as the prompt, in order to introduce increasing degrees of difficulty in the resolution of the task;
Sub-task 2 - Human score prediction: this is conceived as a regression task in which participants will be asked to predict the average coherence score assigned by human raters to short paragraphs, each evaluated both in its original and in an artificially modified version. Judgments will be collected through crowdsourcing on an ordinal scale (i.e. a 5-point Likert scale), on the assumption that coherence is a gradual notion. As shown by previous work on the automatic assessment of subjective phenomena such as sentence complexity and acceptability (e.g. Brunato et al., 2020), we expect this to be the more challenging scenario, as it aims at modeling the human perception of coherence, which may be affected by both linguistic and extra-linguistic variables, as observed in previous studies (Lai and Tetreault, 2018).
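As a rough illustration of how instances for Sub-task 1 could be assembled, the sketch below builds (prompt, target, label) triples from sentence-segmented documents, with one positive target (the true next sentence) and the two kinds of negative targets described above. The function name, document format, and sampling strategy are illustrative assumptions, not the official data-creation procedure.

```python
import random

def build_instances(documents, prompt_len=3, seed=0):
    """Build (prompt, target, label) triples from sentence-segmented documents.

    `documents` is a list of documents, each a list of sentences.
    The positive target is the sentence that actually follows the prompt;
    negatives come either from a different document ("easier") or from
    elsewhere in the same document ("harder").
    """
    rng = random.Random(seed)
    instances = []
    for i, doc in enumerate(documents):
        if len(doc) < prompt_len + 2:
            continue  # too short to yield a prompt, a positive, and a hard negative
        start = rng.randrange(len(doc) - prompt_len - 1)
        prompt = " ".join(doc[start:start + prompt_len])
        # positive example: the sentence that really follows the prompt
        instances.append((prompt, doc[start + prompt_len], 1))
        # easier negative: a sentence drawn from a different document
        other = documents[(i + 1) % len(documents)]
        instances.append((prompt, rng.choice(other), 0))
        # harder negative: a non-following sentence from the same document
        candidates = [s for j, s in enumerate(doc)
                      if not (start <= j <= start + prompt_len)]
        if candidates:
            instances.append((prompt, rng.choice(candidates), 0))
    return instances
```

The split into "easier" and "harder" negatives mirrors the incremental difficulty mentioned above: a sentence from an unrelated document is usually easy to reject on topical grounds alone, while a sentence from the same document forces a model to attend to local discourse cues.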
For both sub-tasks, datasets were extracted from two corpora representative of two distinct domains. Please refer to the Data page for their description.
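For Sub-task 2, the regression target is the mean of the crowdsourced 1-5 Likert ratings for each paragraph. The snippet below sketches that averaging step together with two standard ways of comparing system predictions against the gold means; the official evaluation metric is not specified here, so mean squared error and Pearson correlation are used purely for illustration.

```python
import math

def mean_rating(ratings):
    """Average the 1-5 Likert ratings assigned by multiple annotators."""
    return sum(ratings) / len(ratings)

def mse(gold, pred):
    """Mean squared error between gold mean ratings and predictions."""
    return sum((g - p) ** 2 for g, p in zip(gold, pred)) / len(gold)

def pearson(gold, pred):
    """Pearson correlation between gold mean ratings and predictions."""
    mg, mp = mean_rating(gold), mean_rating(pred)
    cov = sum((g - mg) * (p - mp) for g, p in zip(gold, pred))
    var_g = sum((g - mg) ** 2 for g in gold)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov / math.sqrt(var_g * var_p)
```

An error metric such as MSE rewards predictions close to the gold scores in absolute terms, while a correlation metric rewards ranking paragraphs in the same order as human raters; shared tasks on graded judgments often report both.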