Aggressiveness Identification

Social networks pose a significant threat to users, who are exposed to many risks and potential attacks. One such threat is aggressive comments, which can cause long-term harm to victims and, in the most extreme cases, can lead to suicide. This track focuses on the detection of aggressive comments on Twitter, a topic that has not been widely studied in the community. Participants will have to develop methods to determine whether a tweet is aggressive or not. The task is further complicated by the fact that the tweets come from Mexican users with a variety of backgrounds, making it a challenging (yet realistic and high-impact) problem.

To build the corpus, we collected tweets for three months, using rude words and controversial hashtags to narrow the search. As seed terms for extracting the tweets, we used the words classified as vulgar and non-colloquial in the Diccionario de Mexicanismos de la Academia Mexicana de la Lengua, as well as words and hashtags identified by the Instituto Nacional de las Mujeres as related to violence and sexual harassment against women on Twitter. To ensure their origin, the tweets were filtered by geolocation: we took Mexico City as the center and extracted all tweets within a radius of 500 km.
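The two collection criteria described above (a seed term in the text and a location within 500 km of Mexico City) can be illustrated with a minimal Python sketch. It is not the organizers' actual collection code: the coordinates used for Mexico City are approximate, the names `haversine_km` and `keep_tweet` are hypothetical, the seed term in the usage example is a placeholder, and the whitespace tokenization is a simplification that ignores punctuation attached to words.

```python
import math

# Assumed values for illustration; only the 500 km radius comes from the text.
MEXICO_CITY = (19.4326, -99.1332)  # approximate latitude, longitude in degrees
RADIUS_KM = 500.0
EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def keep_tweet(text, lat, lon, seed_terms):
    """Return True if the tweet contains a seed term and lies inside the radius."""
    in_radius = haversine_km(MEXICO_CITY[0], MEXICO_CITY[1], lat, lon) <= RADIUS_KM
    tokens = text.lower().split()  # naive whitespace tokenization
    has_seed = any(term in tokens for term in seed_terms)
    return in_radius and has_seed


if __name__ == "__main__":
    seeds = {"#hashtag_semilla"}  # placeholder, not a term from the real seed list
    # A tweet geolocated in Puebla (inside the radius) that contains the seed term.
    print(keep_tweet("ejemplo con #hashtag_semilla", 19.0414, -98.2063, seeds))
```

In practice the geographic restriction might instead be applied at query time through the collection API; the post-hoc filter here only makes the distance criterion explicit.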

After reviewing the literature on the subject and analyzing the definitions of related linguistic phenomena such as hate speech, cyberbullying, and racism, we arrived at the following typology of offensive, aggressive, and vulgar language:

  • Offensive language: aims at insulting or humiliating a group or individual, usually using derogatory terms. An example from the corpus is: No es que estés gorda, lo gordo se quita. Es tu cara de caballo. This tweet humiliates a woman, makes fun of her body, and compares her to an animal.
  • Aggressive language: seeks to harm or hurt a group or individual by referring to or inciting violence. An example from the corpus is: pero estas gorda... aprovecha tu fin pendeja que el lunes te violo. This tweet involves insults and a rape threat.
  • Vulgar language: involves profanity, often with sexual connotations and sometimes double entendre, and may or may not refer to an individual or group. An example from the corpus is: Martes con de M de Mamando onvre se arreglan las cosas... creo... eso dicen. This tweet uses obscene vocabulary and is sexually explicit.