MADAR Shared Task
Arabic Fine-Grained Dialect Identification
WANLP 2019: The Fourth Workshop for Arabic Natural Language Processing
Arabic dialect identification is the task of automatically labeling a segment of speech or text with the dialect it comes from. Most previous work and shared tasks on dialect identification have focused on regional-level dialect labeling, as in the efforts by Zaidan and Callison-Burch (2013), Elfardy and Diab (2013), and the VarDial ADI evaluation campaign. This shared task will be the first to target a large set of dialect labels at the city and country levels. The data for the shared task was created or collected under the Multi-Arabic Dialect Applications and Resources (MADAR) project.
There are two subtasks in this shared task.
Subtask 1: MADAR Travel Domain Dialect Identification. The data for this subtask is the same as that reported on in the following papers.
Subtask 2: MADAR Twitter User Dialect Identification. This is a new data set created for this shared task.
Metrics: The evaluation metrics will include precision, recall, F-score, and accuracy, in addition to a new hierarchical evaluation metric designed for Arabic dialects. Macro-averaged F-score will be the official metric.
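As an illustration of the standard metrics, the following is a minimal Python sketch of how accuracy and macro-averaged precision, recall, and F-score can be computed from gold and predicted labels. It is not the official scoring script and does not implement the hierarchical metric. The function name `evaluate`, the example label codes, and the choice of taking the label set as the union of gold and predicted labels are assumptions made for illustration; the official script may instead use the fixed list of city or country labels.

```python
from collections import Counter


def evaluate(gold, pred):
    """Hypothetical helper: accuracy plus macro-averaged P/R/F.

    Assumption: the label set is the union of gold and predicted labels,
    and macro F-score is the unweighted mean of per-label F-scores.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("no examples to evaluate")

    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1  # correct prediction for label g
        else:
            fp[p] += 1  # p was predicted but is wrong
            fn[g] += 1  # g was missed

    precisions, recalls, fscores = [], [], []
    for label in labels:
        p_denom = tp[label] + fp[label]
        r_denom = tp[label] + fn[label]
        prec = tp[label] / p_denom if p_denom else 0.0
        rec = tp[label] / r_denom if r_denom else 0.0
        f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        fscores.append(f)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(gold),
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f1": sum(fscores) / n,
    }


if __name__ == "__main__":
    # Illustrative city-style label codes, not real task data.
    gold = ["CAI", "CAI", "RAB", "BEI"]
    pred = ["CAI", "RAB", "RAB", "BEI"]
    print(evaluate(gold, pred))
```

Because macro averaging gives every dialect label equal weight regardless of how often it occurs, a system cannot score well on this metric by doing well only on the most frequent dialects.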
Participants need to register below. All participating teams will be provided with a common training set and a common development set. No external manually labeled data sets are allowed. A blind test set will be used to evaluate the output of the participating teams. An evaluation script will also be provided to all teams. All teams are required to report results on both the development and test sets in their write-ups.
The shared task will be hosted through CODALAB.
CODALAB link for MADAR Shared Task Subtask 1: https://competitions.codalab.org/competitions/22476
CODALAB link for MADAR Shared Task Subtask 2: https://competitions.codalab.org/competitions/22475
December 10, 2018: First announcement of the shared task
January 7, 2019: Announcement of shared task website and beginning of registration
January 28, 2019: Release of initial training data and scoring script
March 18, 2019: Final training data release
April 29, 2019: Registration deadline
May 6, 2019: Test set made available
May 17, 2019: Codalab shared task submission deadline
May 17, 2019: Required task description submission deadline
May 27, 2019: Shared task system paper submissions due
May 30, 2019: Notification of acceptance
June 5, 2019: Camera-ready version of shared task system papers due
August 1, 2019: ACL 2019 Workshop in Florence
Please check our notes on how to write a shared task system description paper.
For any questions related to this task, please post to this Google group, or contact the organizers directly at the following email address: madar.shared.task@gmail.com