
Inspired by previous literature on coherence modeling, the DisCoTEX task addresses two distinct scenarios. The first, more traditional one evaluates whether models can distinguish well-organized documents from corrupted ones, where the latter are typically created by shuffling the sentence order of the original document or by replacing specific linguistic elements that convey coherence within and across sentences (such as personal pronouns or discourse connectives). The second, less explored one assesses models' performance on texts that human raters have evaluated for coherence.
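The sentence-shuffling corruption mentioned above can be illustrated with a minimal Python sketch. It is not the procedure used to build the task data: the function name `shuffle_sentences`, its `seed` and `max_tries` parameters, and the assumption that the document is already split into sentences are all hypothetical choices made for illustration.

```python
import random


def shuffle_sentences(sentences, seed=0, max_tries=100):
    """Return the sentences in a different order (hypothetical helper).

    Assumes `sentences` is a list of already-segmented sentences.
    If no different ordering exists (fewer than two distinct
    sentences), an unchanged copy is returned.
    """
    original = list(sentences)
    if len(set(original)) < 2:
        return original

    rng = random.Random(seed)  # local generator, reproducible per seed
    shuffled = list(original)
    for _ in range(max_tries):
        rng.shuffle(shuffled)
        if shuffled != original:
            return shuffled

    # Fallback: rotating by one always changes the order when at
    # least two distinct sentences are present.
    return original[1:] + original[:1]


document = [
    "Maria bought a new bike.",
    "She rides it to work every day.",
    "However, it was stolen last week.",
    "Now she takes the bus.",
]
corrupted = shuffle_sentences(document, seed=1)
```

A pair such as `(document, corrupted)` could then serve as one coherent and one corrupted example. The check that the shuffled order differs from the original matters because a random permutation can occasionally reproduce the input, which would mislabel a coherent text as corrupted.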

According to this distinction, we propose the following (possibly independent) sub-tasks:

For both sub-tasks, the datasets were extracted from two corpora representing two distinct domains. Please refer to the Data page for their description.