ArEnMulti30K

at WAT2021, an ACL-IJCNLP 2021 Workshop

Task Description

The goal of the task is to improve translation performance by using another modality (images) alongside the input sentences:

  • Task 1-(a): Multimodal English to Arabic translation (Constrained).

  • Task 1-(b): Multimodal English to Arabic translation (Unconstrained).

  • Task 2-(a): Multimodal Arabic to English translation (Constrained).

  • Task 2-(b): Multimodal Arabic to English translation (Unconstrained).

In the constrained versions (a), external resources, such as additional datasets and models pre-trained on external data, may not be used except for preprocessing and visualization purposes.

Dataset

We use the Multi30K dataset for this task. It is an extension of the Flickr30K dataset, which provides 31,014 images along with five English descriptions for each image. One of the five English descriptions was professionally translated into German, French, and Czech. For this task, we have extended Multi30K with an additional translation into Arabic. The task focuses on translation between English and Arabic; however, participating teams are allowed and encouraged to benefit from the multimodal and multilingual nature of the dataset, even in the constrained versions.

Fig. 1 - Multilingual examples in the Multi30K dataset.

Training and Validation

We will provide the training and validation splits of the Multi30K dataset, which include the raw text in English, German, French, Czech, and Arabic, along with the images and their IDs.

To download the images, visit the Flickr30K page. Images and raw English captions are also available on Kaggle. To obtain the full raw training and validation datasets, which contain the raw text in all five languages along with the image IDs, visit our Zenodo page.
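Once the raw text files are downloaded, the captions in each language are line-aligned with each other and with the image ID list. The sketch below shows one way to pair them in Python; the file names (train.en, train.ar, train_images.txt) and the "train" prefix are assumptions for illustration, not the actual names in the release.

from pathlib import Path

def load_lines(path):
    """Read one caption (or image ID) per line, stripping the trailing newline."""
    return Path(path).read_text(encoding="utf-8").splitlines()

def load_parallel_split(prefix, src_lang="en", tgt_lang="ar"):
    """Pair source/target captions with their image IDs by line index."""
    src = load_lines(f"{prefix}.{src_lang}")
    tgt = load_lines(f"{prefix}.{tgt_lang}")
    image_ids = load_lines(f"{prefix}_images.txt")
    assert len(src) == len(tgt) == len(image_ids), "files must be line-aligned"
    return list(zip(image_ids, src, tgt))

if __name__ == "__main__":
    train = load_parallel_split("train")  # hypothetical file prefix
    image_id, en_caption, ar_caption = train[0]
    print(image_id, en_caption, ar_caption)

Each tuple then links a caption pair to its image file, so the same index can be used to fetch the corresponding Flickr30K image for multimodal models.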

Schedule, Submission, and Evaluation

Please follow the instructions on the WAT2021 webpage.