Task description

GeoLingIt comprises two tracks, each with two subtasks.

We devise a standard track and a special track. For the standard track, participants will be provided with data at the country level, whereas for the special track, participants will be provided with data restricted to an area of choice (subject to data availability and representativeness).

In both tracks, two subtasks are envisioned: coarse-grained geolocation (subtask A) and fine-grained geolocation (subtask B). Subtask A is the simpler one from a technical point of view (predict a region for each post among the possible ones). Subtask B is more challenging (predict longitude and latitude coordinates); nevertheless, it has the potential to uncover fine-grained linguistic variation and to overcome the simplification of subtask A (language use lies on a continuum and may cross administrative borders).

Participants can take part in one or more tracks and subtasks, and can specify their preference after registration.

Standard track

Subtask A: Coarse-grained geolocation

Given the text of a tweet exhibiting non-standard Italian language, predict its region of provenance. This is a classification task, i.e., a region of Italy needs to be predicted.

Evaluation
Systems will be evaluated using macro-averaged Precision, Recall, and F1 score on a subset of the regions of Italy (13 regions known during development, plus k unseen regions, 1 <= k <= 7), and ranked by macro F1 score (the higher, the better).
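
For reference, these metrics can be computed with scikit-learn; the following is a minimal sketch under the assumption that gold and predicted labels are plain region-name strings (the official scorer is the one in the GeoLingIt repository):

    from sklearn.metrics import precision_recall_fscore_support

    def score_subtask_a(gold_regions, pred_regions):
        """Macro-averaged Precision, Recall, and F1 over region labels."""
        p, r, f1, _ = precision_recall_fscore_support(
            gold_regions, pred_regions, average="macro", zero_division=0
        )
        return {"macro_precision": p, "macro_recall": r, "macro_f1": f1}

    # Toy example with made-up labels:
    print(score_subtask_a(["Lazio", "Sicilia", "Veneto"],
                          ["Lazio", "Veneto", "Veneto"]))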


Subtask B: Fine-grained geolocation

Given the text of a tweet exhibiting non-standard Italian language, predict its location in terms of longitude and latitude coordinates. This is a (double) regression task, i.e., a pair of real-valued numbers needs to be predicted.

Evaluation
Systems will be evaluated using the mean distance in km between predicted and actual coordinates (the lower, the better) on a subset of the regions of Italy (13 regions known during development, plus k unseen regions, 1 <= k <= 7).
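
For reference, a common way to measure the distance in km between two coordinate pairs is the haversine (great-circle) distance; the sketch below assumes decimal-degree coordinates and a mean Earth radius of 6371 km, and is not necessarily identical to the official scorer in the GeoLingIt repository:

    from math import asin, cos, radians, sin, sqrt

    EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)

    def haversine_km(lat1, lon1, lat2, lon2):
        """Great-circle distance in km between two (lat, lon) points in degrees."""
        lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
        a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
        return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

    def mean_distance_km(gold, pred):
        """Mean haversine distance between gold and predicted (lat, lon) pairs."""
        dists = [haversine_km(glat, glon, plat, plon)
                 for (glat, glon), (plat, plon) in zip(gold, pred)]
        return sum(dists) / len(dists)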

Special track

The special track consists of the same subtasks and evaluation protocol as the standard track, but the focus will be on a subset of the data representing an area chosen by the participants (subject to data availability and representativeness). This means that the training, development, and test sets will all represent that particular area, and that proposed solutions will be ranked separately for each area.

An area can be a single region (e.g., Campania) or a set of regions (i.e., areas that are relevant in terms of linguistic variation). In the case of a single region, only subtask B will be possible, since subtask A would be trivial (all posts would share the same region label).

This track allows interested participants to make use of local knowledge of variants, dialectal terms, and regional forms to study the geolocation of linguistic variation and ultimately uncover little-known linguistic patterns within specific areas.

Baseline methods

Baseline methods have been provided to participants along with the training and development data. You can find the evaluation scorer and the baselines' scores in the GeoLingIt repository on GitHub.
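
As an illustration only (the official baselines are those released in the repository), a very simple baseline for subtask A could pair character n-gram TF-IDF features, which tend to capture dialectal spelling variation, with a linear classifier; the data below are toy stand-ins for the real training set:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Toy stand-ins for the real training data (tweet text + region label).
    texts = ["daje tutta", "minchia che caldo", "ghe sboro", "aho nun se po'"]
    regions = ["Lazio", "Sicilia", "Veneto", "Lazio"]

    # Character n-grams capture sub-word spelling variation across dialects.
    baseline = make_pipeline(
        TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
        LogisticRegression(max_iter=1000),
    )
    baseline.fit(texts, regions)
    print(baseline.predict(["che caldo oggi"]))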

Additional Information

Is there a limit to the number of runs (i.e., predictions on test data) that can be submitted?
In the evaluation phase, we will accept up to 3 runs for each track and subtask from each participating team. Different runs can reflect, e.g., different solutions or different configurations of the same system. The format of the prediction file to submit is described in the GeoLingIt repository on GitHub.
For example, if you want to participate only in the standard track, but in both subtasks A and B, you will be able to submit up to 6 runs: up to 3 for subtask A and up to 3 for subtask B. If you want to participate in both tracks and in all subtasks, you will be able to submit up to 12 runs in total: up to 3 for each track-subtask combination.

Can I use other resources in addition to the training set provided by the task organizers?
Sure, and we encourage you to do so! Participants are allowed to use external resources in addition to (or in place of) the data provided by the organizers to train their models. Examples of allowed external resources are pre-trained models, dictionaries and lexicons, existing datasets, and newly annotated data (see a list of potentially useful resources and links). The only external source that you cannot use is Twitter, since some tweets may be part of our test set. In case of doubt, you can write to us through the Google Group.