Task constraints

The dataset has been compiled from publicly available corpora. It is therefore necessary to impose certain restrictions on participants’ solutions so that they do not use data from the test partitions as part of their training:

1) Publicly available pretrained models from the literature can be used. However, you may use only text derived from the training data. That is, data augmentation, further self-supervised pre-training, and any other technique that involves additional text must be done only with text derived from the training data.

2) The usage of knowledge bases, lexicons and other structured data resources is also allowed.

3) Usage of data from one subtask in the other subtask is not allowed.

Contestants can participate in any set of tasks for both languages.

Three submissions per task and language are allowed.
