MADAR Shared Task

Arabic Fine-Grained Dialect Identification

WANLP 2019: The Fourth Workshop for Arabic Natural Language Processing

Introduction

Arabic dialect identification is the task of automatically labeling a segment of speech or text with the dialect it comes from. Most previous work and shared tasks on dialect identification have focused on regional-level dialect labeling, as in the efforts by Zaidan and Callison-Burch (2013), Elfardy and Diab (2013), and the VarDial ADI evaluation campaign. This shared task will be the first to target a large set of dialect labels at the city and country levels. The data for the shared task was created or collected under the Multi-Arabic Dialect Applications and Resources (MADAR) project.

Shared Task

There are two subtasks in this shared task.

Subtask 1: MADAR Travel Domain Dialect Identification. The data for this subtask is the same data reported on in the following papers.

    • Bouamor, H., Habash, N., Salameh, M., Zaghouani, W., Rambow, O., et al. (2018). The MADAR Arabic Dialect Corpus and Lexicon. In Proceedings of the 11th International Conference on Language Resources and Evaluation. (PDF: http://www.lrec-conf.org/proceedings/lrec2018/pdf/351.pdf)
    • Salameh, M., Bouamor, H. & Habash, N. (2018). Fine-Grained Arabic Dialect Identification. In Proceedings of the 27th International Conference on Computational Linguistics. (PDF: http://aclweb.org/anthology/C18-1113)

Subtask 2: MADAR Twitter User Dialect Identification. This is a new data set created for this shared task.

Metrics: The evaluation metrics will include precision, recall, F-score, and accuracy, in addition to a new hierarchical evaluation metric designed for Arabic dialects. The macro-averaged F-score will be the official metric.
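For teams that want a quick sanity check before the evaluation script is released, the following is a minimal sketch of a macro-averaged F-score: the F-score is computed per dialect label and then averaged with equal weight per label. It is not the official scoring script. The function name `macro_f1`, the choice to average over the labels that occur in the gold data, and the placeholder labels `A`, `B`, and `C` are all assumptions made for illustration.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Macro-averaged F1 over the labels that occur in the gold data.

    Assumption: labels absent from the gold data are not averaged in;
    the official script may handle this differently.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1   # correct prediction for label g
        else:
            fp[p] += 1   # p was predicted but is wrong
            fn[g] += 1   # g was missed

    scores = []
    for label in sorted(set(gold)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)

    # Every label contributes equally, regardless of how frequent it is.
    return sum(scores) / len(scores)


# Placeholder labels, not the shared task's actual label set.
gold = ["A", "A", "B", "C"]
pred = ["A", "B", "B", "C"]
print(round(macro_f1(gold, pred), 4))
```

Because each label carries the same weight, a system that does well only on frequent dialects scores lower under this metric than it would under plain accuracy.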

Participants need to register below. All participating teams will be provided with a common training data set and a common development set. No external manually labelled data sets are allowed. A blind test data set will be used to evaluate the output of the participating teams. An evaluation script will also be provided to all teams. All teams are required to report results on both the development and test sets in their writeups.

The shared task will be hosted through CODALAB.

CODALAB link for MADAR Shared Task Subtask 1: https://competitions.codalab.org/competitions/22476

CODALAB link for MADAR Shared Task Subtask 2: https://competitions.codalab.org/competitions/22475

Important dates

December 10, 2018: First announcement of the shared task

January 7, 2019: Announcement of shared task website and beginning of registration

January 28, 2019: Release of initial training data and scoring script

March 18, 2019: Final training data release

April 29, 2019: Registration deadline

May 6, 2019: Test set made available

May 17, 2019: Codalab shared task submission deadline

May 17, 2019: Required task description submission deadline

May 27, 2019: Shared task system paper submissions due

May 30, 2019: Notification of acceptance

June 5, 2019: Camera-ready version of shared task system papers due

August 1, 2019: ACL 2019 Workshop in Florence

Shared Task Paper submission

Please check our notes on how to write a shared task system description paper.

Contact

For any questions related to this task, please post to this Google group, or contact the organizers directly at the following email address: madar.shared.task@gmail.com