The first results of the C4E (Crowd for the Environment) project were presented in January 2021 at the AIUCD 2021 conference (DHs for society: e-quality, participation, rights and values in the Digital Age). The core of this research is the extraction of information from tweets according to their hashtags and the frequency of those hashtags. The tweets are grouped with the K-means clustering algorithm, and the lexical analysis of each cluster allows us to distinguish alert tweets, political background tweets and personal opinion tweets. Once the alert tweets about La Terra dei Fuochi are identified (36,207 in total), we first extract the exact location where the crime took place, which is preceded by a hashtag, and then establish the type of crime. This information is displayed on a map built with Carto.
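The hashtag-based steps described above can be illustrated with a minimal sketch. It is not the project's code: the sample tweets, the `known_places` set and the function names are all invented for illustration, and the K-means clustering and Carto mapping steps are left out.

```python
import re
from collections import Counter

# Matches a hashtag: '#' followed by letters, digits or underscores.
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(tweet):
    """Return the lowercased hashtags found in a tweet, without the '#'."""
    return [tag.lower() for tag in HASHTAG_RE.findall(tweet)]


def hashtag_frequencies(tweets):
    """Count how often each hashtag occurs across a collection of tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(extract_hashtags(tweet))
    return counts


def extract_locations(tweet, known_places):
    """Return the hashtags of a tweet that match a known place name."""
    return [tag for tag in extract_hashtags(tweet) if tag in known_places]


# Invented sample tweets, not taken from the project's data.
sample_tweets = [
    "Rogo di rifiuti a #Giugliano, aria irrespirabile #terradeifuochi",
    "Ancora fumo nero a #Acerra #terradeifuochi",
    "La politica ignora la #terradeifuochi",
]
# Hypothetical gazetteer of place names (an assumption of this sketch).
known_places = {"giugliano", "acerra"}

frequencies = hashtag_frequencies(sample_tweets)
locations = [extract_locations(t, known_places) for t in sample_tweets]
```

Here `frequencies` maps each hashtag to its count and `locations` holds, for each tweet, the hashtags recognized as places. Matching against a fixed list of place names is only one possible way to decide which hashtag is a location.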
Our research team also presented further results of the project in March 2021 at the Seventh Italian Conference on Computational Linguistics (CLiC-iT 2020). This study was carried out on the UNIOR Eye corpus, a dataset of tweets related to environmental crimes, organized into four subsections, each concerning a given human-related environmental disaster. To build a model able to detect which class a tweet belongs to, we extracted the waste crimes subsection (made up of 86,206 tweets) and annotated its tweets with one of two labels, alert and no alert.
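To make the two-label setup concrete, here is a minimal sketch of a binary tweet classifier. The text above does not say which model was used, so the multinomial Naive Bayes classifier below is an assumption chosen for brevity, and the training tweets are invented rather than taken from the UNIOR Eye corpus.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a tweet and split it on whitespace."""
    return text.lower().split()


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocabulary = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, float("-inf")
        for label, doc_count in self.label_counts.items():
            # Start from the log prior of the class.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokenize(text):
                if token not in self.vocabulary:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training examples; the labels mirror the two used in the study.
train_texts = [
    "rogo di rifiuti in corso a giugliano",
    "incendio di rifiuti tossici vicino acerra",
    "fumo nero da un rogo di pneumatici",
    "il governo deve fare di più per l'ambiente",
    "secondo me la politica ha fallito",
    "convegno sui reati ambientali domani",
]
train_labels = ["alert"] * 3 + ["no alert"] * 3

classifier = NaiveBayesClassifier().fit(train_texts, train_labels)
prediction = classifier.predict("rogo di rifiuti vicino casa")
```

With this toy data the unseen tweet is assigned the label `alert`, because its words occur mostly in the alert examples. A real experiment would also need a held-out test set and proper tokenization of Italian text.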
Our team presented additional results in June 2021 at the International NooJ 2021 Conference. This research exploits NooJ's functionalities and grammars to build a location extraction system that analyzes Italian textual mentions of crimes, locations and periods of time, and that detects and extracts the non-individual toponyms and fuzzy location patterns found in the UNIOR Eye corpus. The goal is to offer responders a more accurate geographical position for the places where environmental crimes and human-related disasters occur.
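As a rough illustration of what a fuzzy location pattern can look like, the sketch below uses a plain regular expression as a stand-in. It does not reproduce the NooJ grammars; the list of Italian cue phrases and the rule that a place name is a run of capitalized words are assumptions made for this example.

```python
import re

# Hypothetical cue phrases that introduce a vague location in Italian.
FUZZY_CUES = ["nei pressi di", "vicino a", "nella zona di", "alle porte di"]

# A cue phrase followed by one or more capitalized words (the place name).
FUZZY_LOCATION_RE = re.compile(
    r"\b(" + "|".join(re.escape(cue) for cue in FUZZY_CUES) + r")\s+"
    r"([A-Z][\w']*(?:\s+[A-Z][\w']*)*)"
)


def extract_fuzzy_locations(text):
    """Return (cue, place) pairs for every fuzzy location found in text."""
    return [(m.group(1), m.group(2)) for m in FUZZY_LOCATION_RE.finditer(text)]


# Invented example sentence.
matches = extract_fuzzy_locations(
    "Fumo nero nella zona di Castel Volturno, vicino a Caivano"
)
```

The capitalized-word rule misses place names that contain lowercase particles, which is one reason a grammar-based approach is better suited to this task than a single regular expression.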
Future research will extend the annotation to the entire corpus and carry out further experiments with NLP techniques on texts and themes related to environmental crimes, with the aim of providing effective support for the protection of the environment through social media monitoring.