DisCoTEX
Assessing DIScourse COherence in Italian TEXts
Shared Task at EVALITA 2023
Overview and Motivations
DisCoTEX is the first shared task focused on modelling discourse coherence in real-world Italian texts.
Coherence is a key property of any well-organized text and plays a crucial role in human discourse processing. As individuals process an unfolding text, they must assemble information from single sentences and draw inferences between and among them in order to build a meaningful mental representation of the whole text. According to an influential theory of text representation, the tripartite model developed by van Dijk and Kintsch (1983), this is the outcome of a three-step process in which readers construct multilevel memory representations of a text, encoding different, and progressively more abstract, information at each level. From this perspective, coherence is an inherently psychological construct and is thus very hard to model. However, it also has a counterpart at the level of linguistic content and structure, generally addressed in terms of ‘cohesion’: a property of a text conveyed by linguistic signalling devices such as reference, ellipsis, discourse connectives and argument overlap, which make the logical links between different units of a text explicit to readers.
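To make ‘argument overlap’ more concrete, here is a minimal sketch of a crude lexical proxy for it: the fraction of adjacent sentence pairs that share at least one content word. This is an illustrative assumption, not the method used in the task. Proper argument-overlap measures look at nouns and pronouns, usually after lemmatization, whereas here the tiny stoplist `STOPWORDS` and the helpers `content_words` and `adjacent_overlap` are hypothetical names introduced only for this example.

```python
import re

# Hypothetical, deliberately tiny Italian stoplist (illustration only).
STOPWORDS = {"il", "lo", "la", "i", "gli", "le", "un", "uno", "una",
             "di", "a", "da", "in", "con", "su", "per", "e", "è", "che"}


def content_words(sentence):
    """Lowercased word tokens of a sentence, minus stopwords."""
    tokens = re.findall(r"\w+", sentence.lower())
    return {t for t in tokens if t not in STOPWORDS}


def adjacent_overlap(sentences):
    """Fraction of adjacent sentence pairs sharing at least one content word."""
    if len(sentences) < 2:
        return 0.0
    shared = sum(
        1 for a, b in zip(sentences, sentences[1:])
        if content_words(a) & content_words(b)
    )
    return shared / (len(sentences) - 1)


if __name__ == "__main__":
    text = ["La ragazza legge un libro.", "Il libro è nuovo.", "Domani piove."]
    print(adjacent_overlap(text))  # 0.5: only the first pair shares "libro"
```

Without lemmatization, inflected forms such as *libro* and *libri* would not match, which is one reason why surface lexical overlap only approximates cohesion.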
As regards computational modelling, coherence has been widely investigated in the NLP community, particularly in the "pre-deep-learning" era. For instance, inspired by the Centering Theory framework (Grosz et al., 1995), many studies have introduced different versions of the entity-grid approach, which focuses on local coherence, i.e. coherence that can be assessed in terms of transitions between adjacent sentences (see, among others, Barzilay and Lapata, 2008; Elsner and Charniak, 2011). More recently, neural models have also been applied to both structured representations of text and unstructured text, taking advantage of their ability to learn useful representations for the task (e.g. Nguyen and Joty, 2017; Li and Jurafsky, 2017).
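The entity-grid idea can be sketched as follows, under strong simplifying assumptions: each sentence is given as a dictionary mapping already-identified entities to a grammatical role (`S` subject, `O` object, `X` other, `-` absent), so the parsing and coreference resolution a real system needs are skipped. The hypothetical helpers `build_grid` and `transition_probs` build the entity-by-sentence grid and turn its columns into a distribution over role transitions between adjacent sentences, the kind of local-coherence feature vector used in the entity-grid literature.

```python
from collections import Counter
from itertools import product

# Grammatical roles in the grid: subject, object, other, absent.
ROLES = ("S", "O", "X", "-")


def build_grid(sentences):
    """Map each entity to its column of roles, one role per sentence.

    `sentences` is a list of dicts {entity: role}; entities missing from a
    sentence get the "absent" role "-".
    """
    entities = sorted({e for sent in sentences for e in sent})
    return {e: [sent.get(e, "-") for sent in sentences] for e in entities}


def transition_probs(grid, n=2):
    """Relative frequency of every length-n role transition in the grid."""
    counts = Counter()
    for column in grid.values():
        for i in range(len(column) - n + 1):
            counts[tuple(column[i:i + n])] += 1
    total = sum(counts.values())
    # One feature per possible transition (4**n), zero if never observed.
    return {t: counts[t] / total if total else 0.0
            for t in product(ROLES, repeat=n)}


if __name__ == "__main__":
    text = [
        {"Maria": "S", "libro": "O"},       # Maria legge un libro.
        {"Maria": "S", "biblioteca": "X"},  # Maria va in biblioteca.
        {"libro": "S"},                     # Il libro è nuovo.
    ]
    grid = build_grid(text)
    print(grid["libro"])  # ['O', '-', 'S']
    probs = transition_probs(grid)
    print(round(probs[("S", "S")], 3))  # 0.167 (1 of 6 transitions)
```

This covers only feature extraction: in Barzilay and Lapata (2008), such vectors are fed to a ranking model trained to prefer a text's original sentence order over permuted versions of it.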
Modelling coherence in natural language is of pivotal importance for a variety of downstream applications. These range from automatic essay scoring in language learning scenarios (Lai and Tetreault, 2018; Mesgar and Strube, 2018), where it can provide writing feedback such as detecting poorly organized paragraphs or abrupt transitions between topics, to automatic language assessment in clinical settings (Elvevåg et al., 2007; Iter et al., 2018), since speech irregularities, generally perceived as a lack of coherence, are a recognized marker of several mental disorders such as schizophrenia. Coherence has also been introduced as an intrinsic evaluation metric for assessing the quality of texts automatically produced by Natural Language Generation systems. A further emerging scenario of great interest concerns the interpretability of modern deep neural networks: while existing work on probing pre-trained language models has largely focused on sentence-level properties, the extent to which these models encode discourse and pragmatic phenomena is still unclear (Shen et al., 2021; Chen et al., 2019; Farag et al., 2020).
Organizers
Dominique Brunato, ItaliaNLP Lab, Istituto di Linguistica Computazionale "Antonio Zampolli" (ILC-CNR), Pisa
Davide Colla, Dipartimento di Informatica, Università degli Studi di Torino
Felice Dell'Orletta, ItaliaNLP Lab, Istituto di Linguistica Computazionale "Antonio Zampolli" (ILC-CNR), Pisa
Irene Dini, ItaliaNLP Lab, Istituto di Linguistica Computazionale "Antonio Zampolli" (ILC-CNR), Pisa
Daniele Paolo Radicioni, Dipartimento di Informatica, Università degli Studi di Torino
Andrea Amelio Ravelli, Università di Bologna, Bologna
Contact the organizers: discotex.evalita2023@gmail.com
Timeline
7th February 2023: development data released to participants (now available)
30th April 2023: registration closes
2nd-9th May 2023: evaluation window (test data and submission form now available)
30th May 2023: assessment returned to participants
14th June 2023: final reports (from participants) due to task organizers
28th June 2023: final reports (from task organizers) due to EVALITA chairs
10th July 2023: review deadline
25th July 2023: camera ready version deadline
7th-8th September 2023: final workshop in Parma