TWITTIRÒ-UD is a collection of ironic Italian tweets annotated in Universal Dependencies. The treebank can be exploited for the training of NLP systems to enhance their performance on social media texts, and in particular, for irony detection purposes.
TWITTIRÒ-UD has been created by enriching a resource originally developed for training and testing irony detection systems, also exploited as a benchmark for the Italian irony detection task held in EVALITA 2018 (Cignarella et al., 2018). The treebank comprises both the fine-grained annotation for irony applied in Karoui et al. (2017), and the morphological and syntactic information encoded by the UD format.
The original corpus consists of 1,424 tweets (28,387 tokens). The syntactic annotation process was carried out through alternating steps of automatic scripting and manual revision, and finally with some out-of-domain parsing experiments. Parsing results also underwent a manual revision by two independent annotators.
Inflected lexicon of Italian verbs, listing the content of the 53 non-periphtastic cells of the 2,053 lexemes of GLAFF-IT that have frequency of 10 or more in COLFIS, given in IPA transcription.