
THE TASK: The task can be seen as a “headline translation” problem. Given a collection of headlines from two newspapers at opposite ends of the political spectrum, namely Il Giornale (G) and La Repubblica (R), change all G-headlines to headlines in style R, and all R-headlines to headlines in style G.

A crucial aspect of this task is designing the evaluation settings, not only because evaluating generated text is difficult in itself, but also because it has been shown that humans are not particularly good at the style detection task [De Mattei et al., to appear in Proceedings of LREC 2020]. Evaluation for this task is therefore performed entirely automatically, through the use of three classifiers. One is the main classifier, the model that will assign final scores to the submitted systems. The other two are sanity checks that ensure that (i) the two headlines (original and transformed) are still compatible (HH-compatibility classifier); and (ii) the transformed headline is still compatible with the original article (AH-compatibility classifier).
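
To make the three-classifier setup more concrete, the sketch below shows one possible way the checks could be wired together. It is an illustration only: the text above does not specify how the classifier outputs are combined into a final score, so the rule used here (a transformed headline counts as a success only if the main classifier assigns it the target style and both sanity checks pass) is an assumption. The names Example, evaluate, style_clf, hh_clf and ah_clf are hypothetical, and the classifiers are passed in as plain functions standing in for the real models.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Example:
    article: str       # text of the original article
    original: str      # original headline
    transformed: str   # headline produced by a submitted system
    target_style: str  # "G" or "R", the style the system should produce


def evaluate(
    examples: List[Example],
    style_clf: Callable[[str], str],      # main classifier: headline -> "G" or "R"
    hh_clf: Callable[[str, str], bool],   # HH check: (original, transformed) -> compatible?
    ah_clf: Callable[[str, str], bool],   # AH check: (transformed, article) -> compatible?
) -> float:
    """Return the fraction of examples that pass all three checks.

    Assumption: this combination rule is illustrative, not the official metric.
    """
    if not examples:
        return 0.0
    passed = 0
    for ex in examples:
        style_ok = style_clf(ex.transformed) == ex.target_style
        hh_ok = hh_clf(ex.original, ex.transformed)
        ah_ok = ah_clf(ex.transformed, ex.article)
        if style_ok and hh_ok and ah_ok:
            passed += 1
    return passed / len(examples)
```

In this sketch a system that changes the style but drifts away from the original headline or article would be penalised by the sanity checks, which is the role the two compatibility classifiers are described as playing.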