Arabic Hate Speech 2022 Shared Task!

Fine-Grained Hate Speech Detection on Arabic Twitter



@ OSACT 2022 Workshop, LREC 2022

Marseille, France, 20th June 2022

Task Overview

Fine-Grained Hate Speech Detection on Arabic Twitter

Disclaimer: Some examples contain offensive language and hate speech!


Detecting offensive language and hate speech has gained increasing interest from researchers in the NLP and computational social science communities in the past few years. For example, at the ACL 2021 main conference, there were 3 papers about offensive language and 10 papers about hate speech (https://aclanthology.org/events/acl-2021/). Additionally, there was a dedicated workshop on online abuse and harm with a shared task on hateful memes (https://www.workshopononlineabuse.com/home#h.czplpodomjyq).


Detecting offensive language and hate speech is very important for online safety, content moderation, etc. Studies show that the presence of hate speech may be connected to hate crimes (Hate Speech Watch, 2014).


Given the success of the shared task on Arabic offensive language detection at OSACT 2020 (https://edinburghnlp.inf.ed.ac.uk/workshops/OSACT4/), we decided to continue our effort to enhance the detection of offensive language and hate speech on Arabic Twitter. We share with the research community the largest annotated corpus of Arabic tweets that is not biased towards specific topics, genres, or dialects. Each tweet was judged for offensiveness by three annotators using crowdsourcing, and offensive tweets were then classified into one of the hate speech types: Race, Religion, Ideology, Disability, Social Class, and Gender. Annotators also judged whether a tweet contains vulgar language or violence.


Hate speech is defined as any kind of offensive language (insults, slurs, threats, encouraging violence, impolite language, etc.) that targets a person or a group of people based on common characteristics such as race/ethnicity/nationality, religion/belief, ideology, disability/disease, social class, gender, etc.


Hate Speech types in our dataset are:

HS1 (race/ethnicity/nationality). خطاب كراهية ضد العِرْق أو دولة معينة أو جنسية الدولة، مثلا: يا زنجي، في بلدك النساء عاهرات (Hate speech against a race, a specific country, or a nationality, e.g.: "you negro", "in your country women are whores")

HS2 (religion/belief). خطاب كراهية ضد الدين والمذهب، مثلا: دينك القذر (Hate speech against a religion or sect, e.g.: "your filthy religion")

HS3 (ideology). خطاب كراهية ضد الانتماء الحزبي أو الفكري أو الرياضي، مثلا: أبوك شيوعي، ناديك الحقير (Hate speech against a political, intellectual, or sports affiliation, e.g.: "your father is a communist", "your despicable club")

HS4 (disability/disease). خطاب كراهية ضد الإعاقة الجسدية، مثلا: يا معوق، يا قزم (Hate speech against physical disability, e.g.: "you cripple", "you dwarf")

HS5 (social class). خطاب كراهية ضد وظيفة أو جزء من المجتمع، مثلا: يا بواب، البدو متخلفين، أصله فلاح لا يفهم (Hate speech against a profession or a segment of society, e.g.: "you doorman", "Bedouins are backward", "he is just a peasant who doesn't understand")

HS6 (gender). خطاب كراهية ضد النوع (ذكر/أنثى)، مثلا: الله يأخذ الرجال، النساء صاروا بلا حياء (Hate speech against gender (male/female), e.g.: "may God take the men away", "women have become shameless")


The corpus contains ~13K tweets in total: 35% are offensive and 11% are hate speech. Vulgar and violent tweets represent 1.5% and 0.7% of the whole corpus, respectively.


The percentages of offensive language and hate speech in this corpus are the highest among comparable corpora that do not use pre-specified keywords or restrict collection to a specific domain.


After the competition ended, details about the dataset were published at: https://arxiv.org/pdf/2201.06723.pdf


We will have 3 shared subtasks:


  • Subtask A: Detect whether a tweet is offensive or not.

Labels for this task are: OFF (Offensive) or NOT_OFF (Not Offensive)

Example: الله يلعنه على هالسؤال (May God curse him for this question!)


  • Subtask B: Detect whether a tweet has hate speech or not.

Labels are: HS (Hate Speech) or NOT_HS (Not Hate Speech).

Subtask B is more challenging than Subtask A, as only 11% of the tweets are labeled as hate speech.

Example: أنتم شعب متخلف (You are a retarded people)


  • Subtask C: Detect the fine-grained type of hate speech.

Labels are: HS1 (Race), HS2 (Religion), HS3 (Ideology), HS4 (Disability), HS5 (Social Class), and HS6 (Gender).

A tweet takes only one hate speech type label, based on majority voting among the three annotators. When there was no majority label, the final label was determined by a domain expert.
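As an illustration of this labeling scheme, the sketch below shows majority voting over three annotator labels, with ties deferred to an expert. It is only a minimal Python example with hypothetical names, not the organizers' actual aggregation code:

from collections import Counter

# The six fine-grained hate speech types used in Subtask C.
HS_TYPES = {
    "HS1": "race/ethnicity/nationality",
    "HS2": "religion/belief",
    "HS3": "ideology",
    "HS4": "disability/disease",
    "HS5": "social class",
    "HS6": "gender",
}

def aggregate_labels(annotations):
    """Return the majority label among 3 annotators, or None when an expert must decide."""
    label, votes = Counter(annotations).most_common(1)[0]
    return label if votes >= 2 else None

print(aggregate_labels(["HS1", "HS1", "HS3"]))  # HS1
print(aggregate_labels(["HS1", "HS2", "HS6"]))  # None -> defer to a domain expert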


The same evaluation platform (Codalab) used in the OSACT 2020 shared task will be used for all three subtasks.


Data will be split into 70% for training, 10% for development, and 20% for testing.


We encourage participants to use this data and/or any other external data (previous datasets, lexicons, in-house data, etc.), and to try to explain model behavior and study model generalization.


License:

The data is made public for research purposes only (non-commercial use).

Important Dates

All times are Anywhere On Earth (AOE).

  • 6 February 2022: Train/dev set release

  • 26-29 March 2022: Run submission (test set available)

  • 31 March 2022: Announcement of run results

  • 13 April 2022: Shared-task paper submission deadline (extended from 10 April 2022)

  • 1 May 2022: Notification of acceptance

  • 25 May 2022: Camera-ready submission of manuscripts

  • 20 June 2022: Workshop in Marseille, France.

Dataset

Note: User mentions are replaced with @USER, URLs are replaced with URL, and empty lines in original tweets are replaced with <LF>.

Multiple user mentions (ex: @abc @xyz) are replaced with a single @USER.
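For participants who want to apply similar normalization to external data, the following is a rough Python sketch of equivalent substitutions. The regular expressions are our own assumptions, not the organizers' preprocessing script:

import re

def normalize_tweet(text: str) -> str:
    """Approximate the normalization described in the note above."""
    text = re.sub(r"https?://\S+", "URL", text)           # URLs -> URL
    text = re.sub(r"@\w+", "@USER", text)                 # user mentions -> @USER
    text = re.sub(r"(?:@USER\s*){2,}", "@USER ", text)    # collapse consecutive mentions
    text = text.replace("\n", " <LF> ")                   # line breaks -> <LF>
    return re.sub(r"[ \t]+", " ", text).strip()

# Example: normalize_tweet("@abc @xyz check https://example.com") -> "@USER check URL"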

Format:

The data is retrieved from Twitter and distributed in a tab-separated format as follows:

id \t tweet_text \t OFF_label (OFF/NOT_OFF) \t HS_label (NOT_HS or HS1..HS6) \t Vulgar_label (VLG/NOT_VLG) \t Violence_label (VIO/NOT_VIO) \n

Ex: 1234 \t @USER من زمان ونحن ندعس بلدكم دعس \t OFF \t HS1 \t NOT_VLG \t NOT_VIO \n (Tweet translation: "We have been trampling all over your country for a long time")
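A minimal way to load a split in Python, assuming the tab-separated layout above and that the released files have no header row (the column names below are our own):

import csv

COLUMNS = ["id", "text", "offensive", "hate_speech", "vulgar", "violence"]

def load_split(path):
    """Read one tab-separated data file into a list of dictionaries."""
    with open(path, encoding="utf-8") as f:
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        return [dict(zip(COLUMNS, row)) for row in reader if row]

# Example (after downloading the training file below):
# train = load_split("OSACT2022-sharedTask-train.txt")
# print(train[0]["text"], train[0]["offensive"], train[0]["hate_speech"])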

Download training data from: https://alt.qcri.org/resources/OSACT2022/OSACT2022-sharedTask-train.txt

Download development data from: https://alt.qcri.org/resources/OSACT2022/OSACT2022-sharedTask-dev.txt



Download test data from: https://alt.qcri.org/resources/OSACT2022/OSACT2022-sharedTask-test-tweets.txt


Now that the competition has ended, you can download the test gold labels from the following links:

Subtask A: https://alt.qcri.org/resources/OSACT2022/OSACT2022-sharedTask-test-taskA-gold-labels.txt

Subtask B: https://alt.qcri.org/resources/OSACT2022/OSACT2022-sharedTask-test-taskB-gold-labels.txt

Subtask C: https://alt.qcri.org/resources/OSACT2022/OSACT2022-sharedTask-test-taskC-gold-labels.txt



Subtask A - CODALAB link: https://codalab.lisn.upsaclay.fr/competitions/2324

Subtask B - CODALAB link: https://codalab.lisn.upsaclay.fr/competitions/2332

Subtask C - CODALAB link: https://codalab.lisn.upsaclay.fr/competitions/2334

Submission of System Results:

Evaluation Criteria:

Classification systems will be evaluated using the macro-averaged F1-score for all subtasks.
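For reference, the metric can be computed with scikit-learn as in the sketch below, assuming gold and predicted labels are aligned lists of strings (variable names are placeholders):

from sklearn.metrics import f1_score

def macro_f1(gold_labels, predicted_labels):
    """Macro-averaged F1 over the label set of a subtask (e.g. OFF/NOT_OFF for Subtask A)."""
    return f1_score(gold_labels, predicted_labels, average="macro")

# Example:
# gold = ["OFF", "NOT_OFF", "OFF", "NOT_OFF"]
# pred = ["OFF", "OFF", "OFF", "NOT_OFF"]
# print(macro_f1(gold, pred))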


Submission Format:

Classifications of the test and dev datasets (labels only) should be submitted as separate files in the following format, with one label per corresponding tweet (i.e., the label on line x of the submission file corresponds to the tweet on line x of the test file):

For Subtask A:

OFF (or NOT_OFF)\n

For Subtask B:

HS (or NOT_HS)\n

For Subtask C:

<Hate Speech Type, ex: HS1>\n # one of the HS types: HS1 (Race), HS2 (Religion), HS3 (Ideology), HS4 (Disability), HS5 (Social Class), or HS6 (Gender)
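As a concrete illustration of this format, the sketch below writes one predicted label per line, in the same order as the tweets in the test file; the predictions list and output file name are placeholders:

# Subtask A example: one label per line, aligned with the test tweets.
predictions = ["OFF", "NOT_OFF", "OFF"]

with open("subtaskA_test_predictions.txt", "w", encoding="utf-8") as f:
    for label in predictions:
        f.write(label + "\n")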


Participants can submit up to two system results (a primary submission for the best result and a secondary submission for the second-best result).

Official results will consider primary submissions for ranking different teams, and results of secondary submissions will be reported for guidance. All participants are required to report on the development and test sets in their papers.


Submission filename should be in the following format:

ParticipantName_Subtask<A/B/C>_<test/dev>_<1/2>.zip (a plain .txt file inside each .zip file)

Ex: QCRI_SubtaskA_test_1.zip (the best results for Subtask A on the test set from the QCRI team)

Ex: KSU_SubtaskB_dev_2.zip (the second-best results for Subtask B on the dev set from the KSU team)

Ex: AUB_SubtaskC_test_1.zip (the best results for Subtask C on the test set from the AUB team)
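A minimal way to package such a file in Python (the team name and file names below are only examples):

import zipfile

# Put the plain .txt predictions file inside the required .zip name.
with zipfile.ZipFile("QCRI_SubtaskA_test_1.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    zf.write("subtaskA_test_predictions.txt")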

Paper Submission

Please submit your paper using the START system

Content Guidelines

  • Along with the paper, it is mandatory to submit the code that generated the submitted runs. Instructions on how to submit the code will be communicated later.

  • The paper title should follow this template: <TeamID> at Arabic Hate Speech 2022: <Title>. An example title is "QCRI at Arabic Hate Speech 2022: Detect Hate Speech Using SVM and Transformers".

  • The paper should cover (among other standard sections) the following:

    • Approach: system overview, models, training, external resources, etc. Please make sure that you give enough details about your approach.

    • Experimental Evaluation: results on the dev set, official results on the test set, analysis/discussion of the results, etc. You can also report and analyze the results of other runs that you didn’t officially submit.

  • To ease reproducibility of your approach (and help others learn from it faster), you are strongly encouraged to release your code and make it publicly available. If you do so, please indicate that in your paper and provide a public link.

Formatting Guidelines

  • Papers must use the LREC template and comply with the LREC stylesheet.

  • Papers must be written in English.

  • Papers must be 4 to 8 pages long, excluding references and appendices (though the number of appendix pages should remain reasonable).

  • All papers will be peer reviewed by at least two independent reviewers. Reviewing is single-blind, so submissions are NOT anonymous, i.e., author names must appear on the first page of the submitted paper.

  • Papers must be submitted electronically in PDF format.

  • When submitting a paper at the START page, authors will be asked to provide essential information about resources (in a broad sense, i.e. also technologies, standards, evaluation kits, etc.) that have been used for the work described in the paper.

  • All authors are encouraged to share the described LRs (i.e., language resources, such as data, tools, services, etc.) to enable their reuse and replicability of experiments (including evaluation ones).

Organizers

Questions?

Contact hmubarak@hbku.edu.qa for more information about the task.