Social networks represent a significant threat to users, who are exposed to many risks and potential attacks. One such threat is aggressive comments, which can cause long-term harm to victims and, in the most severe cases, can lead to suicide. This track focuses on the detection of aggressive comments on Twitter, a topic that has not been widely studied in the community. Participants must develop methods to determine whether a tweet is aggressive or not. The task is further complicated by the fact that the tweets come from Mexican users with a variety of backgrounds, making it a challenging (yet realistic and high-impact) problem.
To build the corpus, we collected tweets for three months. We used rude words and controversial hashtags to narrow the search. To select the set of terms that served as seeds for extracting the tweets, we used the words classified as vulgar & non-colloquial in the Diccionario de Mexicanismos de la Academia Mexicana de la Lengua, as well as words and hashtags identified by the Instituto Nacional de las Mujeres as related to violence and sexual harassment against women on Twitter. To ensure the tweets' Mexican origin, they were collected taking their geolocation into account: we considered Mexico City as the center and extracted all tweets that fell within a radius of 500 km.
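As an illustration of this geolocation filter, the sketch below checks whether a tweet's coordinates lie within 500 km of Mexico City using the haversine formula. The reference coordinates, the helper function names, and the example tweet locations are assumptions for the example only, not part of the official collection pipeline.

```python
import math

# Approximate coordinates of Mexico City (assumed reference point for the filter).
MEXICO_CITY = (19.4326, -99.1332)
RADIUS_KM = 500.0
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_collection_area(tweet_lat, tweet_lon):
    """True if a tweet's geolocation falls within 500 km of Mexico City."""
    lat0, lon0 = MEXICO_CITY
    return haversine_km(lat0, lon0, tweet_lat, tweet_lon) <= RADIUS_KM


# Example: a tweet geolocated in Guadalajara (~460 km away) would be kept,
# while one geolocated in Cancún (well over 500 km away) would be discarded.
print(within_collection_area(20.6597, -103.3496))  # True
print(within_collection_area(21.1619, -86.8515))   # False
```

In practice the same radius constraint can be expressed directly in a Twitter search query (e.g., a "latitude,longitude,radius" geocode restriction), but the distance check above makes the filtering criterion explicit.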
After reviewing the literature on the subject and analyzing the definitions of related linguistic manifestations such as hate speech, cyberbullying, and racism, we arrived at the following typology of offensive, aggressive, and vulgar language: