AcCompl-It is divided into the following subtasks:
ACCEPT Subtask: providing an acceptability score on a 1-7 Likert scale for each sentence in the test set, along with an estimate of its standard error;
COMPL Subtask: providing a complexity score on a 1-7 Likert scale for each sentence in the test set, along with an estimate of its standard error;
OPEN Subtask: modeling linguistic phenomena correlated with the human ratings of sentence acceptability and/or complexity in the datasets provided.
The three subtasks are independent. Participants may take part in just one of them, though we encourage participation in multiple subtasks, since the complexity metrics might be influenced by the grammatical status of an expression, and vice versa.
In the ACCEPT and COMPL subtasks, the reference metrics will be 7-point Likert scale scores (1 = lowest, 7 = highest).
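To make the pairing of a score with its standard error concrete, here is a minimal sketch. It assumes that a sentence's score is the mean of several individual 1-7 ratings and that the standard error is the standard error of that mean; the task description does not state how the organizers compute these values, and the function name and example ratings are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def score_and_standard_error(ratings):
    """Return (mean rating, standard error of the mean) for one sentence.

    Assumption: `ratings` holds at least two individual 1-7 Likert
    judgments for the same sentence.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate a standard error")
    if any(r < 1 or r > 7 for r in ratings):
        raise ValueError("ratings must lie on the 1-7 scale")
    # Sample standard deviation divided by the square root of the sample size.
    return mean(ratings), stdev(ratings) / sqrt(len(ratings))


# Hypothetical ratings from four annotators for a single sentence.
score, std_err = score_and_standard_error([6, 7, 5, 6])
```

A larger standard error signals more disagreement among raters, which is why it is reported alongside the score.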
The OPEN subtask is an atypical shared task, conceived under the motto “NOT FIRST, BUT FURTHER”. Here, the aim of a participant is not to be ranked first or in a top position, but rather to go further in explaining a given phenomenon (freely chosen by each participant) by building a model in line with the human scores associated with the sentences in the provided datasets. To assess the validity of the models of selected phenomena developed on the basis of the training datasets, participants will receive a non-blind test set from the organizers. Participants in the OPEN Subtask are expected to submit a final report describing the target phenomena and motivating their selection, the models developed, the resources used, and the results achieved on the test set.
In all subtasks, participants are free to use external resources, but every resource used has to be described in detail in the final report.
Evaluation
For the ACCEPT and COMPL subtasks, the evaluation metric will be based on Spearman's rank correlation coefficient between participants’ scores and test set scores. For each subtask, two separate rankings will be produced: one for the predicted scores and one for the predicted standard errors.
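As an illustration of the metric named above, the following sketch computes Spearman's rank correlation as the Pearson correlation of the rank-transformed values, with tied values sharing their average rank. It is not the organizers' evaluation script; the function names and the example scores are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(predicted, gold):
    """Spearman's rho between system scores and test set scores."""
    if len(predicted) != len(gold):
        raise ValueError("score lists must have the same length")
    return pearson(average_ranks(predicted), average_ranks(gold))


# Hypothetical system predictions and gold scores for five sentences.
rho = spearman([2.1, 6.5, 4.0, 3.3, 5.8], [1.9, 6.8, 4.4, 4.1, 5.0])
```

Because only the ordering of the scores matters, a system can obtain a high correlation even if its predictions are systematically shifted with respect to the gold scale.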
For each subtask a different baseline is defined:
ACCEPT-baseline: it corresponds to the score assigned by an SVM linear regression using word unigrams and bigrams as features;
COMPL-baseline: it corresponds to the score assigned by an SVM linear regression using sentence length as its only feature.
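The feature sets behind the two baselines can be sketched as follows. This is only an illustration of the features, not of the SVM regressor, which is not shown. It assumes lowercased whitespace tokenization and sentence length measured in tokens; neither detail is specified in the task description, and all function names are hypothetical.

```python
from collections import Counter


def tokenize(sentence):
    # Assumption: lowercase whitespace tokenization; the organizers'
    # actual preprocessing is not described.
    return sentence.lower().split()


def ngram_features(sentence):
    """Counts of word unigrams and bigrams (ACCEPT-baseline style features)."""
    tokens = tokenize(sentence)
    features = Counter(tokens)
    # Bigrams are stored as space-joined strings, which cannot collide
    # with unigrams because tokens never contain whitespace.
    features.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return features


def length_feature(sentence):
    """Sentence length in tokens (COMPL-baseline style feature)."""
    return len(tokenize(sentence))


example = "Il gatto dorme sul divano"
ngrams = ngram_features(example)
length = length_feature(example)
```

These feature dictionaries and lengths could then be vectorized and passed to any linear support vector regressor.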