Evaluation

Metrics

Two different evaluation metrics will be defined according to the task setting:

For Sub-task 1: the evaluation metric will be accuracy (the ratio of hits to all processed records) obtained by each system on the test set. A second metric will also be made available to grade the errors with respect to the gold results.
For Sub-task 2: the evaluation metric will be a standard correlation coefficient (Pearson and/or Spearman) between the participants' scores and the test set scores.
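
As an illustration only, the two metrics can be sketched in plain Python as below. The function names (`accuracy`, `pearson`, `ranks`, `spearman`) are hypothetical and this is not the official scorer; in particular, the handling of ties in the Spearman ranks (average ranks) is an assumption.

```python
from math import sqrt


def accuracy(predictions, gold):
    """Ratio of hits (prediction equals gold label) to all processed records."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    hits = sum(p == g for p, g in zip(predictions, gold))
    return hits / len(gold)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions (assumption)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))
```

For example, `accuracy([1, 0, 1, 1], [1, 1, 1, 0])` counts two hits out of four records.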

Baseline

The baseline for both sub-tasks will be computed using a one-hot vector representation:

For Sub-task 1: a vector will be extracted from each sentence s_i in the input prompt P = {s_1, s_2, ..., s_n}, and another vector will be created for the target sentence t. The distance between P and the target sentence t, D(P, t), will be computed as the average of the distances between each s_i and t, based on a distance metric Dist (e.g., Hamming distance, Jaccard distance, or an edit distance):

$$D(P, t) = \frac{1}{n} \sum_{i=1}^{n} \mathrm{Dist}(s_i, t)$$

To decide whether the target sentence t is coherent with the paragraph P, we will first compute the median value of D(P, t) across the whole training dataset and then use it as a threshold: all instances with a value above the median will be considered coherent, and all others incoherent.
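
The procedure for Sub-task 1 can be sketched as follows. This is a minimal illustration, not the published baseline code: the helper names are hypothetical, whitespace tokenization and lowercasing are assumptions, and Jaccard is picked as just one of the distance metrics listed above.

```python
from statistics import median


def build_vocabulary(sentences):
    """Sorted list of all whitespace-separated, lowercased tokens (assumed tokenization)."""
    return sorted({token for s in sentences for token in s.lower().split()})


def one_hot(sentence, vocabulary):
    """Binary vector: 1 where the vocabulary word occurs in the sentence, else 0."""
    tokens = set(sentence.lower().split())
    return [1 if word in tokens else 0 for word in vocabulary]


def jaccard_distance(u, v):
    """Jaccard distance between two binary vectors of equal length."""
    intersection = sum(1 for a, b in zip(u, v) if a and b)
    union = sum(1 for a, b in zip(u, v) if a or b)
    return 1.0 - intersection / union if union else 0.0


def prompt_target_distance(prompt, target, vocabulary):
    """D(P, t): average distance between each prompt sentence s_i and the target t."""
    target_vector = one_hot(target, vocabulary)
    distances = [
        jaccard_distance(one_hot(sentence, vocabulary), target_vector)
        for sentence in prompt
    ]
    return sum(distances) / len(distances)


def is_coherent(distance, threshold):
    """Thresholding rule exactly as worded in the text: above the median means coherent."""
    return distance > threshold


# Toy data, invented purely for illustration.
prompt = ["the cat sleeps", "the dog barks"]
target = "the cat purrs"
vocabulary = build_vocabulary(prompt + [target])
d = prompt_target_distance(prompt, target, vocabulary)

# In practice the threshold would be the median of D(P, t) over the training set.
threshold = median([0.2, d, 0.9])
```

Here the two pairwise Jaccard distances are 0.5 and 0.8, so `d` is their mean.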

For Sub-task 2: a vector will be extracted from each sentence s_i in the input prompt P = {s_1, s_2, ..., s_n}; that is, the following set of vectors will be computed:

$$V = \{v_1, v_2, \ldots, v_n\}$$

The proximity between every two adjacent vectors ⟨v_x, v_x+1⟩ ∈ V will then be computed through a distance metric Dist(v_x, v_x+1) (e.g., Jaccard), resulting in (n − 1) distance scores that capture the degree of semantic overlap between neighbouring sentences. To compute the coherence score of the paragraph, score(P), we will average the scores of all pairs of adjacent sentences. This value will then be compared with the human rating through a correlation index:

$$\mathrm{corr}\big(\mathrm{score}(P),\ \text{human rating of } P\big)$$

where corr indicates the Pearson or Spearman correlation index.
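
The scoring step for Sub-task 2 (before the correlation with human ratings) can be sketched as below. This is an illustrative sketch under stated assumptions rather than the published baseline: the function names are hypothetical, tokenization is a simple lowercase whitespace split, and token sets stand in for one-hot vectors, which gives the same Jaccard distance.

```python
def tokens(sentence):
    """Token set of a sentence; equivalent to its one-hot vector for Jaccard purposes."""
    return set(sentence.lower().split())


def jaccard_distance(a, b):
    """Jaccard distance between two token sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def paragraph_score(sentences):
    """score(P): mean of the (n - 1) distances between adjacent sentences."""
    if len(sentences) < 2:
        raise ValueError("at least two sentences are needed")
    token_sets = [tokens(s) for s in sentences]
    distances = [
        jaccard_distance(left, right)
        for left, right in zip(token_sets, token_sets[1:])
    ]
    return sum(distances) / len(distances)


# Toy paragraph, invented purely for illustration.
example = ["the cat sleeps", "the cat purrs", "a dog barks"]
score = paragraph_score(example)
```

The list of such scores, one per paragraph, is what would be correlated with the human ratings.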

The code to run the baseline has been published on DisCoTex's GitHub repository.

Upload submission

Submissions can be uploaded at the following URL: https://forms.gle/dsWGuLEJdGPykfvx7