DisCoTex

Assessing DIScourse COherence in Italian TEXts

Shared Task at EVALITA 2023





Overview and Motivations 

DisCoTex is the first shared task focused on modelling discourse coherence for Italian real-world texts.

Coherence is a key property of any well-organized text and plays a crucial role in human discourse processing. As individuals process unfolding text, they must assemble information from single sentences and draw inferences between and among them in order to create a meaningful mental representation of the whole text. According to an influential theory of text representation, the tripartite model developed by van Dijk and Kintsch (1983), this is the outcome of a three-step process in which readers construct multilevel memory representations of a text, encoding different and progressively more abstract information at each level. From this perspective, coherence is an inherently psychological construct and thus very hard to model. However, it also has a counterpart at the level of linguistic content and structure, generally addressed in terms of ‘cohesion’: a property of a text conveyed by explicit linguistic devices such as reference, ellipsis, discourse connectives and argument overlap, which help readers make explicit the logical links between different units of a text.

As regards computational modelling, coherence has been widely investigated in the NLP community, particularly in the "pre-deep-learning" era. For instance, inspired by the Centering Theory framework (Grosz et al., 1995), many studies have introduced different versions of the entity-grid approach, which focuses on local coherence, i.e. coherence that can be assessed locally in terms of transitions between adjacent sentences (see, among others, Barzilay and Lapata, 2008; Elsner and Charniak, 2011). More recently, neural models have also been applied to both structured representations of text and unstructured text, taking advantage of their ability to learn useful representations for the task (e.g. Nguyen and Joty, 2017; Li and Jurafsky, 2017).

Modelling coherence in natural language is of pivotal importance in a variety of downstream applications. In automatic essay scoring for language learning scenarios (Lai and Tetreault, 2018; Mesgar and Strube, 2018), it can provide writing feedback such as detecting poorly organized paragraphs or abrupt transitions between topics. In automatic language assessment in clinical settings (Elvevåg et al., 2007; Iter et al., 2018), it is relevant because speech irregularities, generally perceived as a lack of coherence, are a recognized marker of several mental disorders such as schizophrenia. Coherence has also been introduced as an intrinsic evaluation metric for assessing the quality of texts automatically produced by Natural Language Generation systems. A further emerging scenario of great interest is research on the interpretability of modern deep neural networks. While existing work on probing pre-trained language models has largely focused on sentence-level properties, the ability of these models to encode discourse and pragmatic phenomena is still unclear (Shen et al., 2021; Chen et al., 2019; Farag et al., 2020).

Organizers




Contact the organizers: discotex.evalita2023@gmail.com

Timeline