MEX-A3T: Authorship and aggressiveness analysis in Twitter
case study in Mexican Spanish
2019
E-communication in general, and social networks in particular, play an increasingly crucial role in everyday life. As a result, the analysis of textual information from social networks has become a popular research topic in the computational linguistics community. Very effective methods have been developed for this purpose, leading to a better understanding of how to deal with problems inherent to this domain, such as shortness, slang, non-thematic nature, multilingualism, and multimodality, among others. This progress can largely be attributed to academic competitions or dedicated tasks that seek to advance the state of the art in a particular research topic of practical relevance (see, e.g., the series of events organized by IberEval, TASS and PAN).
Despite such progress, there are still open issues that deserve further research, either to solve them or at least to understand them better. Accordingly, last year we organized a shared task at IberEval 2018 aimed at advancing the state of the art in the non-thematic analysis of short texts written in Mexican Spanish. In particular, the 2018 edition of MEX-A3T comprised two main tracks: a track on author profiling, whose aim was to develop methods for profiling users according to non-standard dimensions (gender, occupation and place of residence), and a track on aggressiveness detection in tweets.
The goal of the second edition of MEX-A3T is to further advance research on these two important NLP tasks and to continue promoting the computational treatment of Mexican Spanish. MEX-A3T@IberLEF2019 has the following two tracks:
AUTHOR PROFILING TRACK: It consists of determining the gender, occupation and place of residence of users from their tweets. The track focuses on analyzing tweets generated by Mexican users, which poses additional challenges related to the treatment of a variety of Spanish with many cultural particularities. As a novelty in this edition, the track considers both text and images as information sources. The purpose is to explore and study the relevance and complementarity of multimodal information for profiling social media users.
AGGRESSIVENESS DETECTION TRACK: This track follows up on last year's evaluation task; it focuses on the detection of aggressive tweets in Mexican Spanish. Because most participants in the previous edition of MEX-A3T reported low performance, we have decided to run exactly the same task and use the same data as in the previous year.