AcCompl-It

Acceptability & Complexity evaluation task for Italian at EVALITA 2020


Overview

AcCompl-It is a task aimed at developing and evaluating methods to classify Italian sentences according to Acceptability and Complexity, which can be viewed as two “simple” numeric measures associated with linguistic productions. The task is expected to attract communities that study language from different perspectives, and its outcome is expected to have an impact on research and applications in NLP, psycholinguistics and theoretical linguistics.

From the NLP perspective, the renewed interest in automatic generation tasks (e.g. Machine Translation, Text Simplification, Summarization) has recently been prompted by increasingly accurate systems, mostly based on deep neural network algorithms (Gatt and Krahmer, 2018). In this scenario, the availability of annotated resources and of systems that predict the level of grammatical acceptability or linguistic complexity of a sentence (see, among others, Warstadt et al., 2019 and Brunato et al., 2018) is becoming increasingly relevant, either to evaluate automatically generated sentences or to investigate the ability of artificial neural networks to encode linguistic phenomena related to both notions in their representations.

From the theoretical linguistics perspective, controlled datasets containing acceptability judgments and analyzed with machine learning techniques can be useful to test the extent to which syntactic and semantic deviance can be induced from corpus data alone, especially for low-frequency phenomena (Chowdhury and Zamparelli, 2018; Gulordava et al., 2018; Wilcox et al., 2018). The same data, seen from a psycholinguistic angle, can shed light on the relation between complexity and acceptability (Chesi and Canal, 2019), and on the extent to which measures of on-line perplexity in artificial language models can track human parsing preferences (Demberg and Keller, 2008; Hale, 2001).

Definitions

  • Acceptability is native speakers’ judgment of the well-formedness of a sentence. It is the core dimension for evaluating both formal and psycholinguistic theories, as well as generation, summarization and machine translation tasks;

  • Complexity is a measure of native speakers’ effort in processing a sentence. It is required both for comparing neurolinguistic and psycholinguistic theories and for rating the accessibility of a text. Unlike more conventional studies on human sentence processing carried out in experimental settings, this task treats the measure as a judgment of perceived complexity that humans assign to a sentence. The measure is also relevant for developing text generation systems that address a specific target user in terms of linguistic competence.

In this task, the Acceptability and Complexity metrics are each expressed as a score given to a sentence on a 7-point Likert scale (1 = lowest, 7 = highest).
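As a rough illustration of the scoring scheme, the sketch below shows one way a rated sentence could be represented and range-checked in Python. The names RatedSentence and mean_rating are hypothetical and not part of any official task material. Averaging several individual judgments into one score is an assumption made here for illustration only.

```python
from dataclasses import dataclass

# Scale bounds as stated in the task description (1 = lowest, 7 = highest).
LIKERT_MIN, LIKERT_MAX = 1, 7


@dataclass(frozen=True)
class RatedSentence:
    """Hypothetical record: a sentence with one score per measure."""
    text: str
    acceptability: float
    complexity: float

    def __post_init__(self):
        # Reject scores that fall outside the 7-point scale.
        for name in ("acceptability", "complexity"):
            value = getattr(self, name)
            if not LIKERT_MIN <= value <= LIKERT_MAX:
                raise ValueError(f"{name}={value} is outside the 1-7 scale")


def mean_rating(judgments):
    """Average individual 1-7 judgments (aggregation by mean is an assumption)."""
    if not judgments:
        raise ValueError("at least one judgment is required")
    for j in judgments:
        if not LIKERT_MIN <= j <= LIKERT_MAX:
            raise ValueError(f"judgment {j} is outside the 1-7 scale")
    return sum(judgments) / len(judgments)


# Illustrative values only, not taken from the task data.
example = RatedSentence(
    text="Il gatto dorme sul divano.",
    acceptability=mean_rating([7, 6, 7]),
    complexity=mean_rating([1, 2, 1]),
)
```

Keeping the two measures as separate fields reflects that each sentence receives an independent score for Acceptability and for Complexity.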