Data and Evaluation

For both tracks, we split the data into training and test partitions. Participants develop their methods on the training partition; the test partition is then used to evaluate the submitted methods and to determine the winner of the challenge. To rank participants, we use the F1 measure: the macro-averaged F1 for the author profiling track, and the F1 on the aggressive class for the aggressiveness identification track.

Training corpus

The training data file is password-protected; to obtain the password, you must first register as a participant.

Additional resources

To support the participants' experiments, we make the following resources available to all teams:

FastText Word Embeddings for Spanish Language Variations: we trained the model on external Mexican tweets (Download).
- 1,247.3M tokens
- 100 dimensions per vector
Word2vec MEX-A3t model, trained on the MEX-A3t corpus (Download).
- 500 thousand tokens
- 200 dimensions per vector
ImageNet vectors: we extracted the outputs of the last two layers of a VGG16 model trained on ImageNet for each image in the author profiling collection (Download last layer, download penultimate layer).
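The embedding files can be read with a few lines of Python. The sketch below is a minimal loader that assumes the downloads use the common word2vec/fastText text format (a header line with the vocabulary size and the dimension, then one word and its values per line); we have not confirmed this layout for the MEX-A3T files. If a file turns out to be in the binary word2vec format, gensim's `KeyedVectors.load_word2vec_format(path, binary=True)` is an alternative.

```python
import io
from typing import Dict, List, TextIO


def load_text_vectors(stream: TextIO) -> Dict[str, List[float]]:
    """Read word vectors in the word2vec/fastText *text* format.

    ASSUMPTION: the first line is "<vocab_size> <dimension>" and every other
    line is "<word> <v1> ... <v_dimension>". This layout is assumed, not
    confirmed, for the MEX-A3T downloads.
    """
    header = stream.readline().split()
    dim = int(header[1])
    vectors: Dict[str, List[float]] = {}
    for line in stream:
        # fastText .vec files often end each line with a trailing space.
        parts = line.rstrip().split(" ")
        if len(parts) < dim + 1:
            continue  # skip blank or truncated lines
        word = " ".join(parts[:-dim])
        vectors[word] = [float(x) for x in parts[-dim:]]
    return vectors


# Usage with a tiny in-memory sample; a real file would be opened with
# open(path, encoding="utf-8"), where `path` is wherever you saved the download.
sample = io.StringIO("2 3\nhola 0.1 0.2 0.3\nwey 0.4 0.5 0.6 \n")
vectors = load_text_vectors(sample)
print(len(vectors), vectors["wey"])  # 2 [0.4, 0.5, 0.6]
```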

Evaluation rules

The performance of your author profiling solution will be ranked by the average of the macro-averaged F1 measures for the gender, location (residence), and occupation dimensions.
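The author profiling score can be sketched as follows: compute the macro-averaged F1 separately for each trait, then take the plain mean of the three values. This is our own minimal implementation for checking results locally, not the organizers' evaluation script; in particular, taking the label set as the union of gold and predicted labels is an assumption (it mirrors scikit-learn's default for `f1_score(..., average="macro")`).

```python
from typing import Dict, List, Sequence, Tuple


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over every label seen in gold or predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    scores: List[float] = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


def profiling_score(results: Dict[str, Tuple[Sequence[str], Sequence[str]]]) -> float:
    """Average the macro F1 of the gender, location and occupation traits.

    `results` maps each trait name to a (gold labels, predicted labels) pair.
    """
    traits = ("gender", "location", "occupation")
    return sum(macro_f1(*results[t]) for t in traits) / len(traits)


example = {
    "gender": (["male", "female", "female", "male"], ["male", "female", "male", "male"]),
    "location": (["center", "north"], ["center", "north"]),
    "occupation": (["arts", "health"], ["health", "arts"]),
}
print(round(profiling_score(example), 4))  # 0.5778
```

Here gender scores 11/15, location 1.0 and occupation 0.0, so the averaged score is 26/45.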

The performance of your aggressiveness detection solution will be ranked by the F1 measure on the aggressive class.
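Since only the aggressive class counts, this is the ordinary binary F1 computed for one positive class. A minimal sketch, not the official scorer, assuming the label "1" marks an aggressive tweet:

```python
from typing import Sequence


def f1_aggressive(y_true: Sequence[str], y_pred: Sequence[str], positive: str = "1") -> float:
    """F1 for the positive (aggressive) class only; true negatives do not affect it."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined), so F1 is 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


gold = ["1", "0", "0", "1", "0"]
pred = ["1", "0", "1", "0", "0"]
print(f1_aggressive(gold, pred))  # 0.5
```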

Runs for Track 1 will be received from 15th April 00:01 until 6th May, 23:59 (UTC-06:00).

Runs for Track 2 will be received from 15th April 00:01 until 6th May, 23:59 (UTC-06:00).
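The submission window is stated with a fixed offset six hours behind UTC. The sketch below converts it to UTC and checks whether a timestamp falls inside it; the year 2019 is our assumption, based on the IberLEF 2019 proceedings this call refers to, and both bounds are treated as inclusive.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset taken from the call; not a named time zone,
# so no daylight-saving rules are applied.
UTC_MINUS_6 = timezone(timedelta(hours=-6))

# ASSUMPTION: the year is 2019 (IberLEF 2019).
OPENS = datetime(2019, 4, 15, 0, 1, tzinfo=UTC_MINUS_6)
CLOSES = datetime(2019, 5, 6, 23, 59, tzinfo=UTC_MINUS_6)


def within_window(ts: datetime) -> bool:
    """True if the timezone-aware timestamp `ts` is inside the window (inclusive)."""
    if ts.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    return OPENS <= ts <= CLOSES


print(CLOSES.astimezone(timezone.utc).isoformat())  # 2019-05-07T05:59:00+00:00
```

In other words, under this reading the last accepted minute is 05:59 UTC on 7th May.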

Participants may submit up to two runs per track, one primary and one secondary, and must clearly indicate which run is which.

Output submission

Submissions must be formatted as described below and sent by email to mex.a3t@gmail.com.

For each task, your software must output a corresponding .txt file containing one line per classified instance. Each line looks like this:

"TaskName"\t"IdentifierOfAnInstance"\t"Class"\n

It is important to respect the format exactly: the " characters, the \t (tab) separators, and the \n (Linux newline) line endings. The naming of the output files is up to you; we recommend using the author and a run identifier as the file name, with "txt" as the extension.
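A small helper that produces one line in exactly this form is sketched below; the function name is our own. When writing the file, opening it with `newline="\n"` keeps Python from translating line endings to `\r\n` on Windows, which would break the required \n terminator.

```python
def format_line(task: str, instance_id: str, label: str) -> str:
    """Build one output line: each field in double quotes, tab-separated,
    terminated by a single \\n."""
    for field in (task, instance_id, label):
        # A quote, tab or newline inside a field would make the line ambiguous.
        if any(ch in field for ch in '"\t\n'):
            raise ValueError(f"field contains a reserved character: {field!r}")
    return f'"{task}"\t"{instance_id}"\t"{label}"\n'


line = format_line("gender", "36ef4c1a63d30b5563502e305303ddcd.txt", "male")
print(repr(line))  # '"gender"\t"36ef4c1a63d30b5563502e305303ddcd.txt"\t"male"\n'

# To write it: open(path, "w", encoding="utf-8", newline="\n"), where `path`
# is your chosen output file name.
```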

For the aggressiveness track, the fields and their possible values are:

TaskName: aggressiveness
IdentifierOfAnInstance: tweet-NumberOfLine
- where NumberOfLine is the line number of each tweet in the test file.
Class: {0, 1}
Output example:

"aggressiveness"	"tweet-1"	"1"
"aggressiveness"	"tweet-2"	"0"
"aggressiveness"	"tweet-3"	"0"
"aggressiveness"	"tweet-4"	"1"
"aggressiveness"	"tweet-5"	"0"
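Because the identifier is derived from the tweet's position in the test file, the aggressiveness output can be generated directly from a list of predictions kept in test-file order. This sketch assumes numbering starts at 1, as in the example, and that predictions are the integers 0 or 1; the function names are ours.

```python
from typing import Iterable, List


def aggressiveness_lines(predictions: Iterable[int]) -> List[str]:
    """One output line per test tweet; predictions must be in test-file order."""
    lines: List[str] = []
    for line_number, label in enumerate(predictions, start=1):
        # Reject bools explicitly, since True == 1 would otherwise pass.
        if isinstance(label, bool) or label not in (0, 1):
            raise ValueError(f"tweet-{line_number}: label must be 0 or 1, got {label!r}")
        lines.append(f'"aggressiveness"\t"tweet-{line_number}"\t"{label}"\n')
    return lines


def write_aggressiveness(path: str, predictions: Iterable[int]) -> None:
    # newline="\n" keeps Linux line endings on every platform.
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.writelines(aggressiveness_lines(predictions))


print("".join(aggressiveness_lines([1, 0, 0])), end="")
```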

For the author profiling track, three separate files are required; the fields and their possible values are:

1. For the gender identification task:

TaskName: gender
IdentifierOfAnInstance: Name of the classified file (including the extension)
Class: {male, female}
Output example:

"gender"	"36ef4c1a63d30b5563502e305303ddcd.txt"	"male"
"gender"	"c99417659c53cf9274a67b63e232a300.txt"	"female"
"gender"	"42e78514662b69b682828ea292e937c1.txt"	"female"
"gender"	"4afece969c3d4db458a1b07502c98c09.txt"	"male"
"gender"	"d3ce2c105b723a76bc16d7fc2220c3ea.txt"	"male"

2. For the location identification task:

TaskName: location
IdentifierOfAnInstance: Name of the classified file (including the extension)
Class: {northwest, north, northeast, west, center, southeast}
Output example:

"location"	"36ef4c1a63d30b5563502e305303ddcd.txt"	"center"
"location"	"c99417659c53cf9274a67b63e232a300.txt"	"center"
"location"	"42e78514662b69b682828ea292e937c1.txt"	"northeast"
"location"	"4afece969c3d4db458a1b07502c98c09.txt"	"center"
"location"	"d3ce2c105b723a76bc16d7fc2220c3ea.txt"	"southeast"

3. For the occupation identification task:

TaskName: occupation
IdentifierOfAnInstance: Name of the classified file (including the extension)
Class: {administrative, arts, health, sciences, others, social, sports, student}
Output example:

"occupation"	"0c387347349d18ecedfe438d0bfb50b1.txt"	"social"
"occupation"	"3715c379cdbb936b42531c47b8b72d30.txt"	"arts"
"occupation"	"8c3f774931fb10ad8eca0c9d241136a2.txt"	"administrative"
"occupation"	"573325dfb5f374b27379262d017e66aa.txt"	"arts"
"occupation"	"8d36d5b62c3492a130e10cf267396e20.txt"	"health"
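The three author profiling files share one layout and differ only in the task name and the allowed labels, so a single helper can produce all of them and reject any label outside the sets listed for each task. This is a sketch with our own naming; sorting by file name is only for reproducible output, since no particular line order is specified.

```python
from typing import Dict, List

# Label sets copied from the task descriptions.
ALLOWED_LABELS = {
    "gender": {"male", "female"},
    "location": {"northwest", "north", "northeast", "west", "center", "southeast"},
    "occupation": {"administrative", "arts", "health", "sciences",
                   "others", "social", "sports", "student"},
}


def profiling_lines(task: str, predictions: Dict[str, str]) -> List[str]:
    """`predictions` maps each classified file name (with extension) to a label."""
    allowed = ALLOWED_LABELS[task]
    lines: List[str] = []
    for file_name in sorted(predictions):
        label = predictions[file_name]
        if label not in allowed:
            raise ValueError(f"{file_name}: {label!r} is not a valid {task} label")
        lines.append(f'"{task}"\t"{file_name}"\t"{label}"\n')
    return lines


preds = {"c99417659c53cf9274a67b63e232a300.txt": "female",
         "36ef4c1a63d30b5563502e305303ddcd.txt": "male"}
print("".join(profiling_lines("gender", preds)), end="")
```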

4. Output file names:

Each file name must consist of the team name, the target trait, and the modality used for the output, separated by "_".
Format: TeamName_Trait_Source.txt
Where
- TeamName: It is the registration team name
- Trait: {Gender, Location, Occupation}
- Source: {Text, Image, Text-Image}
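A helper that assembles and validates the file name is sketched below. Rejecting underscores inside the team name is our own precaution, so that the three parts can be split back unambiguously; the rules above do not say whether such team names are allowed. "MyTeam" is a placeholder.

```python
TRAITS = {"Gender", "Location", "Occupation"}
SOURCES = {"Text", "Image", "Text-Image"}


def output_file_name(team: str, trait: str, source: str) -> str:
    """Return TeamName_Trait_Source.txt after validating each part."""
    if trait not in TRAITS:
        raise ValueError(f"trait must be one of {sorted(TRAITS)}, got {trait!r}")
    if source not in SOURCES:
        raise ValueError(f"source must be one of {sorted(SOURCES)}, got {source!r}")
    if not team or "_" in team:
        # Our own precaution: "_" is the separator, so keep it out of the team name.
        raise ValueError("team name must be non-empty and contain no '_'")
    return f"{team}_{trait}_{source}.txt"


print(output_file_name("MyTeam", "Location", "Text-Image"))  # MyTeam_Location_Text-Image.txt
```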

A submission that fails the format check will be considered invalid.
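Before sending a run, a local check along these lines can catch most formatting slips. It is our own sketch, not the organizers' format checker, whose exact rules are not published here; the duplicate-identifier test is an extra sanity check beyond the stated requirements.

```python
import re
from typing import List, Set

# Three double-quoted fields separated by tabs, ending in a single \n.
LINE_RE = re.compile(r'"([^"\t\r\n]+)"\t"([^"\t\r\n]+)"\t"([^"\t\r\n]+)"\n')


def check_output(text: str, task: str, allowed: Set[str]) -> List[str]:
    """Return a list of problems; an empty list means none were detected."""
    if not text:
        return ["file is empty"]
    problems: List[str] = []
    if "\r" in text:
        problems.append("file contains carriage returns; use \\n line endings only")
    seen: Set[str] = set()
    for number, line in enumerate(text.splitlines(keepends=True), start=1):
        match = LINE_RE.fullmatch(line)
        if match is None:
            problems.append(f"line {number}: does not match the required format")
            continue
        name, instance, label = match.groups()
        if name != task:
            problems.append(f"line {number}: task {name!r} should be {task!r}")
        if label not in allowed:
            problems.append(f"line {number}: label {label!r} not in {sorted(allowed)}")
        if instance in seen:
            problems.append(f"line {number}: duplicate identifier {instance!r}")
        seen.add(instance)
    return problems


good = '"aggressiveness"\t"tweet-1"\t"1"\n"aggressiveness"\t"tweet-2"\t"0"\n'
print(check_output(good, "aggressiveness", {"0", "1"}))  # []
```

A line that is missing its final \n, uses spaces instead of tabs, or ends in \r\n is reported as not matching the format.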

Paper submission

Participants will be given the opportunity to write a paper describing their system, the resources used, their results, and an analysis; these papers will be part of the official IberLEF-2019 proceedings. The paper must be four pages long, plus at most two pages for references, and must be formatted in the Springer LNCS format (see http://www.springer.de/comp/lncs/authors.html).

Papers must be written in English.
