ArMI 2021

a subtrack of HASOC @FIRE2021

The first Arabic Misogyny Identification shared task.

Data Description

The participants will be provided with a Twitter dataset of 7,866 tweets written in Modern Standard Arabic (MSA) and several Arabic dialects, including Gulf, Egyptian, and Levantine. The Levantine tweets were derived from the Let-Mi dataset (Mulki and Ghanem, 2021), the first Arabic dataset for misogynistic language. The remaining multi-dialectal tweets were collected using anti-women hashtags and queries, as well as the timelines of misogynistic users on Arabic Twitter. All tweets were collected between January 2019 and January 2021 and manually annotated by native Arabic speakers.

Evaluation Metrics

For the Misogyny Content Identification task, submitted approaches will be evaluated by accuracy. For the Misogyny Behavior Identification task, submitted runs will be evaluated using macro-averaged precision, recall, and F1-score, and the final ranking of systems will be based on macro F1-score.
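To make the two evaluation schemes concrete, here is a minimal Python sketch of accuracy, macro-averaged precision/recall/F1, and ranking by macro F1. It is not the official ArMI scorer. The function names are hypothetical, the label `none` in the example is made up for illustration, and the sketch assumes macro F1 is the unweighted mean of per-class F1 scores over every class appearing in the gold or predicted labels. Other scorers may average only over gold classes, or compute F1 from macro precision and macro recall.

```python
from collections import Counter
from typing import Dict, List, Sequence


def accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Fraction of tweets whose predicted label matches the gold label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    correct = sum(g == p for g, p in zip(gold, pred))
    return correct / len(gold)


def macro_prf(gold: Sequence[str], pred: Sequence[str]) -> Dict[str, float]:
    """Per-class precision, recall and F1, then an unweighted mean over classes.

    Assumption: the class set is every label seen in gold or pred.
    """
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[g] += 1  # missed the gold class g
    precisions, recalls, f1s = [], [], []
    for label in labels:
        p = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        r = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    n = len(labels)
    return {
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


def rank_systems(runs: Dict[str, Dict[str, float]]) -> List[str]:
    """Order run names by macro F1, best first."""
    return sorted(runs, key=lambda name: runs[name]["f1"], reverse=True)


# Toy example; "none" is a hypothetical label used only for illustration.
gold = ["discredit", "none", "discredit", "none"]
pred = ["discredit", "none", "none", "none"]
print(accuracy(gold, pred))            # 0.75
print(macro_prf(gold, pred)["f1"])     # about 0.733
```

Macro averaging gives every class equal weight, so a system that ignores a rare behavior category is penalized even if its accuracy looks high. This is presumably why the fine-grained task is ranked by macro F1 rather than accuracy.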

Data

The provided data will be in the following format:

tweet_id text misogyny category

7859 مستخدم@ حيوانة كيف وصلتي لمذيعة اخبار بغير العهر misogyny discredit
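The format above can be loaded with Python's standard `csv` module. This is a minimal sketch under assumptions not stated in the task description: the file is tab-separated, UTF-8 encoded, and its first line is the header `tweet_id`, `text`, `misogyny`, `category`. The `Tweet` class and `read_armi` function are hypothetical names, and the sample row uses a placeholder instead of real tweet text.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, TextIO


@dataclass
class Tweet:
    tweet_id: str
    text: str
    misogyny: str   # binary label column, e.g. "misogyny"
    category: str   # fine-grained category column, e.g. "discredit"


EXPECTED_HEADER = ["tweet_id", "text", "misogyny", "category"]


def read_armi(handle: TextIO) -> List[Tweet]:
    """Read tweets from a tab-separated file (assumed layout, see lead-in)."""
    # QUOTE_NONE: treat quote characters inside tweets as ordinary text.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    if reader.fieldnames != EXPECTED_HEADER:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    return [
        Tweet(row["tweet_id"], row["text"], row["misogyny"], row["category"])
        for row in reader
    ]


sample = "tweet_id\ttext\tmisogyny\tcategory\n7859\t<tweet text>\tmisogyny\tdiscredit\n"
tweets = read_armi(io.StringIO(sample))
print(tweets[0].category)  # discredit
```

For a real file, pass a handle opened with `open(path, encoding="utf-8", newline="")`, where `path` is whatever file name the organizers distribute.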


Registered participants will receive a copy of the data by email.

Edit:
The shared task has been released at: https://github.com/bilalghanem/armi