The AMIYA shared task will offer a chance for researchers to demonstrate innovations and improvements in language modeling of dialectal Arabic.
❗Don't forget to register your team using the registration form. ❕
Arabic is centrally relevant to any discussion of dialects and related language varieties. Some regard Arabic as a single language with a vast diversity of dialects, while others regard it as a clade of distinct but related languages. Either way, Arabic language varieties are spoken by over 400 million people.
Standard LLMs are typically much more proficient in Modern Standard Arabic (MSA) than in Dialectal or Colloquial Arabic (DA). Because Colloquial Arabic varieties have far fewer computational resources than MSA and other high-resource languages, building LLMs that support DA has become a recent focus of the research community. We present the first shared task for Dialectal Arabic Language Modeling: Arabic Modeling In Your Accent (AMIYA).
In the task, we will ask participants to contribute LLMs trained or adapted for DA. These will be evaluated using the AL-QASIDA benchmark (Robinson et al., 2025), an evaluation suite that comprehensively measures an LLM's dialectal fidelity, comprehension, generation quality, and handling of MSA-DA diglossia.
We are accepting submissions in three tracks: (1) closed data, (2) closed models, and (3) open. We are officially accepting submissions for Arabic varieties from the following countries:
Morocco
Egypt
Palestine
Syria
Saudi Arabia
Teams who wish to create systems for additional varieties may seek approval by contacting the task organizers. Below we detail the different submission tracks.
In the closed-data track, teams may use any fully open-source LLM as the basis for their Dialectal Arabic LLMs. However, they may only fine-tune these LLMs on the training data that we will provide (by the end of November).
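As a rough illustration (not an official recipe), closed-data fine-tuning might look like the sketch below. The file name amiya_train.txt, the base model, and the hyperparameters are placeholders; confirm the eligibility of any base model with the organizers.

```python
# Minimal closed-data fine-tuning sketch (illustrative only).
# Assumes the official training data arrives as a plain-text file of
# dialectal sentences, one example per line ("amiya_train.txt" is a placeholder).
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "Qwen/Qwen2.5-0.5B"  # example open model; check eligibility with the organizers
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # causal LMs often lack a pad token
model = AutoModelForCausalLM.from_pretrained(base)

# Treat each line of the provided file as one training example.
data = load_dataset("text", data_files={"train": "amiya_train.txt"})
tokenized = data["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="amiya-ft", num_train_epochs=1,
                           per_device_train_batch_size=4, learning_rate=2e-5),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("amiya-ft")
```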
In the closed-model track, teams cannot use pre-trained LLMs and must train their LLMs from scratch. (Using the model config from an existing model with a random initialization is acceptable; see the sketch below.) However, they may use any data sources for training, in addition to the data we provide, with the exception of the off-limits datasets listed below.
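For instance, the permitted "existing config, random initialization" option might look like the following sketch; the config source is just an example, and teams should confirm with the organizers whether reusing a pre-trained tokenizer is allowed.

```python
# Sketch of the permitted "existing config, random initialization" option.
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

config = AutoConfig.from_pretrained("meta-llama/Llama-3.1-8B")  # architecture hyperparameters only
model = AutoModelForCausalLM.from_config(config)                # weights are randomly initialized
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")  # tokenizer reuse: check with organizers

print(f"{sum(p.numel() for p in model.parameters()):,} randomly initialized parameters")
```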
In the open track, teams may use any pre-trained models and data sources (again, with the exception of the off-limits datasets) to develop their DA LLMs.
Use of the following datasets for system training or development is not permitted, as they may be included in our evaluation:
Any eval data used for the PalmX shared task
Any FLORES devtest data
Any MADAR-26 data that is part of the corpus-6-test-corpus-26-test split
Please do not use the following datasets without first checking with the task organizers:
⚠️ Additionally, please use the datasets already included in AL-QASIDA only for dev / tuning by default, and NOT for training. Note that the AL-QASIDA repo has been updated for dev use (i.e., to avoid the off-limits datasets listed above).
To submit systems for evaluation, teams will be required to upload a model to HuggingFace and send the HuggingFace link to the task organizers.
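For example, pushing a checkpoint to the Hub might look like the sketch below; the repo id is a placeholder each team chooses, and teams should check with the organizers whether gated or private repos are acceptable.

```python
# Illustrative submission upload; run `huggingface-cli login` (or set HF_TOKEN) first.
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "amiya-ft"  # path to your final checkpoint (placeholder)
model = AutoModelForCausalLM.from_pretrained(ckpt)
tokenizer = AutoTokenizer.from_pretrained(ckpt)

repo_id = "your-team/amiya-egyptian-arabic"  # placeholder repo id
model.push_to_hub(repo_id)
tokenizer.push_to_hub(repo_id)
# Then send https://huggingface.co/<repo_id> to the task organizers.
```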
We will evaluate submissions using the AL-QASIDA benchmark and will recognize teams that can maximize any of the following metrics:
ADI2 dialect fidelity score, on monolingual, cross-lingual, and translation prompts (see Robinson et al., 2025)
chrF++ translation score, on DA-to-English, English-to-DA, DA-to-MSA, and MSA-to-DA translation (see the chrF++ sketch below)
Human scores for fluency and adherence to DA instructions
The baseline model for comparison will be Llama-3.1 (8B).
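As a rough illustration of the chrF++ metric (not the official scoring script), sacrebleu's chrF implementation with word_order=2 corresponds to chrF++; the strings below are placeholders.

```python
# Hypothetical chrF++ computation with sacrebleu (word_order=2 gives chrF++).
import sacrebleu

hypotheses = ["the weather in cairo is hot today"]    # system outputs (placeholder)
references = [["the weather is hot in cairo today"]]  # one reference stream (placeholder)

score = sacrebleu.corpus_chrf(hypotheses, references, word_order=2)
print(score.score)
```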
To register as AMIYA shared task participants, please submit this form.
30 November, 2025: Release of official training data and scaffold code for a minimal submission
15 December, 2025: Registration deadline, eval data finalized
10 January, 2026: System submission deadline
20 January, 2026: System description paper submission deadline
TBD: Camera-ready paper due
24-29 March, 2026: VarDial workshop at EACL in Rabat, Morocco
Nathaniel R. Robinson (Johns Hopkins University)
Shahd Abdelmoneim (Cohere Labs Community)
Kelly Marchisio (Cohere)
Anjali Kantharuban (Carnegie Mellon University)
Kenton Murray (Johns Hopkins University)
Teams are encouraged to use whatever means necessary to improve model performance, including any of the following:
Prompt engineering
Fine-tuning
Training from scratch
Because prompt engineering is the lowest-cost strategy, we will provide scaffold code that, once tailored to a specific DA variety, constitutes a minimally acceptable submission; it will focus on prompt-engineering techniques for improving dialectal modeling. We expect, however, that model training or fine-tuning will lead to more competitive submissions.
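As a rough sketch of that prompt-engineering approach (not the official scaffold; the model, wording, and target dialect below are placeholders), one could steer an instruction-tuned model toward a target dialect with a system prompt. Recent versions of transformers accept chat-format input directly in the text-generation pipeline.

```python
# Hypothetical prompt-engineering baseline: a dialect-targeting system prompt.
from transformers import pipeline

chat = pipeline("text-generation", model="meta-llama/Llama-3.1-8B-Instruct")

messages = [
    {"role": "system",
     "content": ("You are a helpful assistant. Always answer in Egyptian Colloquial "
                 "Arabic (Masri), never in Modern Standard Arabic.")},
    {"role": "user", "content": "How do I get from Ramses Station to Tahrir Square?"},
]
out = chat(messages, max_new_tokens=200)
print(out[0]["generated_text"][-1]["content"])  # the assistant's reply
```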
[Coming soon: scaffold code, tutorial resources]