Author profiling is the task of extracting as much information as possible about authors from their texts in order to build a descriptive profile of them. Previous campaigns on author profiling, e.g., those from the PAN labs, have focused on profiling authors by gender, age, and even personality traits. This track goes several steps further by focusing on profiling dimensions that have not received enough attention from the community. Specifically, it consists of determining the gender, occupation, and place of residence of users from their tweets. This is a much more challenging problem that will require adapting existing methodologies or proposing new ones for the analysis of tweets. Additionally, the track focuses on tweets generated by Mexican users, which poses additional challenges related to the treatment of a variety of Spanish with many cultural particularities.
Also, this year we have introduced the multimodal MEX-A3T collection for author profiling. In addition to text, we have included ten random images taken from each author's profile, besides the profile image itself. With this, the community faces the challenge of taking advantage of both textual and visual information for author profiling, specifically for predicting the place of residence and occupation of Mexican users.
The data set for this track was collected between June and November 2016 according to the following methodology. First, two human taggers selected a set of Twitter accounts representative of different regions of Mexico; for example, they chose accounts of politicians and famous places, as well as universities and city councils. Then, they searched the followers of these accounts for users whose gender, occupation, and place of residence were available, as provided by the users themselves in one of their social networks. The categories for each of the three profiling dimensions, together with the distribution of samples available in the corpus, are described in Tables 1-3.
Table 1. Occupation distribution
Table 2. Location distribution
Table 3. Gender distribution
To provide more elements for the participants' experiments, we make additional resources available to all teams.