The dataset has been compiled from publicly available corpora. It is therefore necessary to impose certain restrictions on participants’ solutions so that they do not use data from the test partitions as part of their training:
1) Publicly available pretrained models from the literature can be used. However, participants are only allowed to use images and texts derived from the training data. That is, data augmentation, further self-supervised pre-training, or any other technique that involves additional texts and images must be done only with texts and images derived from the training data.
2) The usage of knowledge bases, lexicons and other structured data resources is also allowed.
3) Usage of data from one subtask in the other subtask is not allowed.
Contestants can participate in any subtask, and they are allowed to submit at most three runs (3️⃣) per subtask.