SHARED TASK

Data:

In this shared task you will use the data from:

  • Abu Farha, I. & Magdy, W. (2020). From Arabic Sentiment Analysis to Sarcasm Detection: The ArSarcasm Dataset. OSACT4 - LREC 2020

  • Abbes, I., Zaghouani, W., El-Hardlo, O. & Ashour, F. (2020). DAICT: A Dialectal Arabic Irony Corpus Extracted from Twitter. LREC 2020

The training data will be available through CODALAB on Jan 1, 2020.

For initial experimentation, participants can use the ArSarcasm dataset, which is publicly available here. Please be aware that this is not the final training data and extra data could be added later.

Tasks:

There are two subtasks in this shared task:

  • Subtask 1 (Sarcasm Detection): Identifying whether a tweet is sarcastic or not. This is a binary classification task.

  • Subtask 2 (Sentiment Analysis): Identifying the sentiment of a tweet and assigning one of three labels (Positive, Negative, Neutral). This is a multiclass classification task.

Metrics:

  • Subtask 1: The evaluation metrics will include precision, recall, F-score, and accuracy. The F-score of the sarcastic class will be the official metric.

  • Subtask 2: The evaluation metrics will include precision, recall, F-score, and accuracy. F-PN (the macro average of the F-scores of the positive and negative classes) will be the official metric.
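
For participants who want to check their scores locally, the two official metrics can be sketched as follows. This is only an illustration and not the official scorer: the function names (f1_for_class, f_pn) and the label strings ("POS", "NEG", "NEU") are assumptions made for this example, and the labels used in the released data may differ. The F-score here is the usual F1 (harmonic mean of precision and recall), which is also an assumption about the intended weighting.

```python
from typing import Hashable, Sequence


def f1_for_class(gold: Sequence[Hashable], pred: Sequence[Hashable],
                 label: Hashable) -> float:
    """F1 of a single class; returns 0.0 when there are no true positives."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        # Precision or recall is zero (or undefined), so F1 is taken as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f_pn(gold: Sequence[Hashable], pred: Sequence[Hashable],
         positive: Hashable = "POS", negative: Hashable = "NEG") -> float:
    """F-PN: mean of the positive-class and negative-class F1 scores.

    The neutral class is ignored in the average, although neutral
    predictions still count as errors for the other two classes.
    The default label names are hypothetical.
    """
    return (f1_for_class(gold, pred, positive)
            + f1_for_class(gold, pred, negative)) / 2
```

For Subtask 1, f1_for_class(gold, pred, True) would give the F-score of the sarcastic class if sarcasm is encoded as a boolean; for Subtask 2, f_pn(gold, pred) gives F-PN under the assumed label names.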

Participants need to register below. All participating teams will be provided with a common training data set. A blind test data set will be used to evaluate the output of the participating teams.

The shared task will be hosted through CODALAB.

CODALAB registration link for Subtask 1: Subtask 1

CODALAB registration link for Subtask 2: Subtask 2

Resources:

The following resources might be helpful for participants:

  1. AraBERT

  2. Mazajak Word Embeddings