Arabic Politeness Detection Shared Task
AdabEval 2026
Overview
The task investigates politeness in Arabic social‑media posts, a pragmatic phenomenon that shapes how people perceive and respond to messages. Politeness plays a central role in conversation: languages provide diverse ways to encode respect and reduce imposition, and these markers are intimately tied to social power and interpersonal harmony. In Arabic the concept is deeply rooted in cultural norms; speakers employ honorifics, plural forms of address, and kinship terms to show respect. Most computational work on politeness has focused on English, with only limited studies on other languages. For Arabic, no extensive public dataset is available, and researchers often have to rely on small datasets of limited scope.
Participants will be given an annotated corpus of Arabic social‑media posts. Each record contains an identifier, the raw text, a primary label (Polite, Neutral, or Impolite), and up to three category–keyword pairs. Categories capture nine pragmatic functions: Criticism, Insult, Disparagement, Prayers, Greetings, Admiration, Respect, Felicitation, and Hospitality & generosity. Keywords highlight the words or phrases that motivated the annotation.
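To illustrate the record structure, a single instance might look like the following Python dict; the field names and example text are our assumptions for illustration, not the official schema:

    # Hypothetical record layout; field names are illustrative, not the official schema.
    record = {
        "id": "001",                        # unique post identifier
        "text": "السلام عليكم ورحمة الله",  # raw Arabic post
        "label": "Polite",                  # one of: Polite, Neutral, Impolite
        "pairs": [                          # up to three category-keyword pairs
            {"category": "Greetings", "keyword": "السلام عليكم"},
            {"category": "Prayers", "keyword": "ورحمة الله"},
        ],
    }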
Politeness Classification (Subtask A).
This subtask focuses on building and evaluating models that automatically assess the politeness level of Arabic text. Given a text, participants must classify it into one of three categories: Polite, Neutral, or Impolite. Systems will be compared using accuracy as well as macro-averaged precision, recall, and F1-score on the test data.
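The official evaluation scripts will be released with the data; purely as an illustration, the Subtask A metrics can be computed with scikit-learn roughly as follows (function and variable names are ours):

    from sklearn.metrics import accuracy_score, precision_recall_fscore_support

    LABELS = ["Polite", "Neutral", "Impolite"]

    def evaluate_subtask_a(gold, pred):
        """Compute accuracy plus macro-averaged precision, recall, and F1."""
        acc = accuracy_score(gold, pred)
        p, r, f1, _ = precision_recall_fscore_support(
            gold, pred, labels=LABELS, average="macro", zero_division=0
        )
        return {"accuracy": acc, "macro_p": p, "macro_r": r, "macro_f1": f1}

    # Example usage with toy gold and predicted labels:
    print(evaluate_subtask_a(["Polite", "Impolite"], ["Polite", "Neutral"]))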
Category Prediction (Subtask B).
This subtask evaluates the ability of systems to identify multiple pragmatic functions in Arabic social-media posts. Each text may express one or more categories from nine culturally grounded functions such as criticism, insult, respect, prayer, greeting, or hospitality. The task is framed as multi-label classification, and systems must predict all applicable categories for each instance. The official evaluation metric is macro-averaged F1-score across the nine categories.
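Again for illustration only (the released evaluation script is authoritative), macro-averaged F1 over the nine categories can be computed along these lines, assuming each instance's labels are given as a set of category names:

    from sklearn.metrics import f1_score
    from sklearn.preprocessing import MultiLabelBinarizer

    CATEGORIES = [
        "Criticism", "Insult", "Disparagement", "Prayers", "Greetings",
        "Admiration", "Respect", "Felicitation", "Hospitality & generosity",
    ]

    def macro_f1_subtask_b(gold_sets, pred_sets):
        """gold_sets / pred_sets: one iterable of category names per instance."""
        mlb = MultiLabelBinarizer(classes=CATEGORIES)
        y_true = mlb.fit_transform(gold_sets)   # binary indicator matrix
        y_pred = mlb.transform(pred_sets)
        # Macro-average: unweighted mean of the per-category F1 scores.
        return f1_score(y_true, y_pred, average="macro", zero_division=0)

    print(macro_f1_subtask_b([{"Greetings", "Prayers"}], [{"Greetings"}]))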
The dataset will be divided into training, development and test splits. Participants may use the training data to train models and the development set to tune hyper‑parameters. The test set will be used for final evaluation via a leaderboard. System descriptions should report the methods used, including pre‑processing, features, architectures and external resources.
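As one possible starting point, a minimal Subtask A baseline could combine character n-gram TF-IDF features with logistic regression; this is a sketch under assumed file and column names, not an official baseline:

    # Minimal Subtask A baseline sketch; file and column names are assumptions.
    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    train = pd.read_csv("train.csv")  # hypothetical path
    dev = pd.read_csv("dev.csv")      # hypothetical path

    model = make_pipeline(
        # Character n-grams are a common choice for morphologically rich Arabic text.
        TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
        LogisticRegression(max_iter=1000),
    )
    model.fit(train["text"], train["label"])
    dev_pred = model.predict(dev["text"])

Hyper-parameters would then be tuned against the development set before the final test-set submission.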
Important Dates
December 15, 2025: Release of training, dev and dev-test data, and evaluation scripts
January 21, 2026: Registration deadline and release of test data
January 28, 2026: End of evaluation cycle (test set submission closes)
February 04, 2026: Final results released
February 18, 2026: System description paper submissions due
March 12, 2026: Notification of acceptance
March 30, 2026: Camera-ready versions due
Organizers
Hend Al-Khalifa, King Saud University
Noof Alfear, King Saud University
Reem Alqifari, King Saud University
Nadia Ghezaiel, University of Ha'il
Ameera Almasoud, King Saud University
Sharefah Al-Ghamdi, King Saud University
Hend Hamed, Saudi Center for Philosophy and Ethics
Maria Bounnit, Cadi Ayyad University