One of the outputs of the C4E project is the UNIOR Eye corpus, a collection of 228,412 tweets in Italian on environmental issues, amounting to about 22 million words, posted on Twitter from 1 January 2013 to 6 August 2020. Tweets were downloaded through the Twitter APIs on the basis of their thematic relevance, using the entries of the glossary C4E – Environmental Crimes Glossary of Terms as search keywords.
The UNIOR Eye corpus is divided by topic into four sub-corpora:
waste and Terra dei Fuochi (The Land of Fires)
water-related crimes
hazardous substances and materials
environmental fires
UNIOR Eye corpus details:
Language: Italian
Tokens: 22,780,746
Types: 569,905
Type/token ratio: 0.025
Number of tweets: 228,412
Timeframe: 1 January 2013 – 6 August 2020
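The type/token ratio reported above is the number of distinct word forms (types) divided by the total number of running words (tokens). As a minimal sketch, the figure can be checked from the published counts; the function name type_token_ratio is a hypothetical helper introduced here for illustration and is not part of any tool released with the corpus.

```python
def type_token_ratio(types: int, tokens: int) -> float:
    """Return the type/token ratio (distinct word forms / running words)."""
    if tokens <= 0:
        raise ValueError("tokens must be a positive integer")
    return types / tokens


# Counts taken from the corpus details listed above.
TOKENS = 22_780_746
TYPES = 569_905

ttr = type_token_ratio(TYPES, TOKENS)
print(f"Type/token ratio: {ttr:.3f}")  # prints "Type/token ratio: 0.025"
```

A ratio this low is typical of large corpora, since the number of types grows much more slowly than the number of tokens as a corpus gets bigger, so the value should only be compared across corpora of similar size.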
The data statements for the UNIOR Eye corpus can be downloaded at this link.
The UNIOR Eye corpus is released under the Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0 International) license.
To obtain free access to the corpus, please fill in this form.
If you use the UNIOR Eye corpus in your work, please cite the following papers:
Pascucci, A., Punzi Zarino, W., Manna, R., Simoniello, V., Magliacane, A., and Monti, J. (2021), Collective intelligence as a resource to monitor environmental crimes. How to recognize an alert tweet. Second International Conference of the European Association for Digital Humanities (EADH 2021), Siberian Federal University, Krasnoyarsk (Russia), September 2021.
Manna, R., Magliacane, A., Pascucci, A., Punzi Zarino, W., and Simoniello, V. (2021), Geoparsing with NooJ Italian toponym resolution for environmental crimes. In Magali Bigey, Annabel Richeton, Max Silberztein, Izabella Thomas (eds.), Book of abstracts of the 15th International NooJ 2021 Conference, Université de Franche-Comté, Besançon (France), 9-11 June 2021.
Manna, R., Pascucci, A., Punzi Zarino, W., Simoniello, V., and Monti, J. (2021), Hashtags as an information source. Analyzing tweets to map La Terra dei Fuochi. In Federico Boschetti, Angelo Mario Del Grosso, Enrica Salvatori (eds.), AIUCD 2021 - DHs for society: e-quality, participation, rights and values in the Digital Age. Book of extended abstracts of the 10th national conference, ISBN: 9788894253559.
Manna, R., Pascucci, A., Punzi Zarino, W., Simoniello, V., and Monti, J. (2020), Monitoring Social Media to Identify Environmental Crimes through NLP. A Preliminary Study. In Proceedings of the Seventh Italian Conference on Computational Linguistics (CLiC-it), CEUR workshop proceedings, ISSN: 1613-0073, ISBN: 9791280136282.