Detecting hateful content with AI is difficult -- and it’s even more difficult when the content is multimodal, such as a meme. Memes can be understood by humans because we do not think about the words and photos independently but, instead, combine the two together. In contrast, most AI systems analyze text and image separately and do not learn a joint representation. This is both inefficient and flawed, and such systems are likely to fail when a non-hateful image is combined with non-hateful text to produce content that is nonetheless still hateful. For AI to detect this sort of hate it must learn to understand content the way that people do: holistically.
To accelerate research on multimodal understanding and detection of hate speech, Facebook AI created the Hateful Memes Challenge in 2020 and released a dataset of more than 10,000 annotated memes. We now present this dataset for the WOAH 5 shared task with newly created fine-grained labels for the protected category that is attacked (e.g., women, Black people, immigrants) as well as the type of attack (e.g., inciting violence, dehumanizing, mocking the group).
Task A (multi-label): For each meme, detect the protected category. Protected categories are: race, disability, religion, nationality, sex. If the meme is not_hateful, the protected category is pc_empty.
Task B (multi-label): For each meme, detect the attack type. Attack types are: contempt, mocking, inferiority, slurs, exclusion, dehumanizing, inciting_violence. If the meme is not_hateful, the attack type is attack_empty.
Tasks A and B are multi-label because memes can contain attacks against multiple protected categories and can involve multiple attack types.
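Because both tasks are multi-label, the targets are naturally represented as multi-hot vectors over the class sets above. The sketch below is illustrative only (the label lists are hypothetical), using scikit-learn's MultiLabelBinarizer:

```python
# Illustrative sketch: hypothetical labels showing how a single meme can carry
# several protected categories and attack types at once.
from sklearn.preprocessing import MultiLabelBinarizer

PC_CLASSES = ["race", "disability", "religion", "nationality", "sex", "pc_empty"]

# A hateful meme attacking two protected categories, and a non-hateful meme
# mapped to the empty class.
pc_labels = [["religion", "nationality"], ["pc_empty"]]

pc_binarizer = MultiLabelBinarizer(classes=PC_CLASSES)
print(pc_binarizer.fit_transform(pc_labels))
# [[0 0 1 1 0 0]
#  [0 0 0 0 0 1]]
```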
March 19th: Shared task data is available. Go to the competition page on DrivenData for the memes dataset (see detailed instructions below) and our GitHub page for the fine-grained annotations.
March 25th: MMF setup for getting started, with initial baselines and pre-trained models released
May 28th 23:59 (AOE): Predictions due
May 31st, 23:59 (AOE): Shared task paper submissions due
May 8th: Notifications
June 21st, 23:59 (AOE): Camera-ready papers due
August 5th - 6th: Workshop day!
Information about each meme is provided in JSON. Each record contains the following fields (a hypothetical example record is shown after this list):
img: Relative path of the raw image (.png file)
text: Extracted text from the meme
set_name: Data partition, indicating training and development splits
pc: Protected category annotations, with annotations from up to 3 annotators
gold_pc: Gold standard labels for the protected categories used in Task A, based on majority voting
attacks: Attack type annotations, with annotations from up to 3 annotators
gold_attack: Gold standard labels for the attack types used in Task B, based on majority voting
gold_hate: Gold standard labels for whether the meme is hateful or not, used for the hate classification task
id: Unique identifier for each entry
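For illustration, a hypothetical record with the fields above might look as follows; the values (and the exact structure of the per-annotator fields) are assumptions, not taken from the dataset:

```python
import json

# Hypothetical record illustrating the fields above; all values are invented.
example = {
    "id": "42953",
    "img": "img/42953.png",
    "text": "example meme text extracted from the image",
    "set_name": "dev",
    "pc": [["religion"], ["religion", "nationality"], ["religion"]],  # up to 3 annotators
    "gold_pc": ["religion"],
    "attacks": [["mocking"], ["mocking"], ["exclusion"]],             # up to 3 annotators
    "gold_attack": ["mocking"],
    "gold_hate": ["hateful"],
}
print(json.dumps(example, indent=2))
```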
The output file should have the following record structure, which adds prediction fields to each record (a hypothetical example is shown after this list).
pred_hate: Dictionary of {label:score} for the hate classification task
pred_attack: Dictionary of {label:score} for the attack category task
pred_pc: Dictionary of {label:score} for the protected category task
set_name: The partition for which the predictions are being computed
id: Unique identifier of the record
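As a sketch, a prediction record could then look like the following; the score values are illustrative assumptions rather than the official submission specification:

```python
import json

# Hypothetical prediction record; scores are illustrative only.
prediction = {
    "id": "42953",
    "set_name": "dev",
    "pred_hate": {"hateful": 0.91, "not_hateful": 0.09},
    "pred_pc": {"race": 0.05, "disability": 0.02, "religion": 0.88,
                "nationality": 0.31, "sex": 0.03, "pc_empty": 0.04},
    "pred_attack": {"contempt": 0.10, "mocking": 0.84, "inferiority": 0.12,
                    "slurs": 0.05, "exclusion": 0.22, "dehumanizing": 0.07,
                    "inciting_violence": 0.02, "attack_empty": 0.03},
}
print(json.dumps(prediction, indent=2))
```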
Entries for all three tasks are evaluated using AUROC, implemented with the standard roc_auc_score metric provided by the scikit-learn library. The evaluation scripts, example predictions, and instructions for using the scoring script will be made available shortly.
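As a rough illustration of the metric (not the official scoring script), multi-label AUROC can be computed with scikit-learn's roc_auc_score on multi-hot gold labels and per-class scores:

```python
# Minimal sketch of multi-label AUROC with scikit-learn; the official scoring
# script may differ in aggregation and input format.
import numpy as np
from sklearn.metrics import roc_auc_score

# Multi-hot gold labels and predicted scores for 3 memes over 6 protected categories.
y_true = np.array([[0, 0, 1, 1, 0, 0],
                   [0, 1, 0, 0, 0, 1],
                   [1, 0, 0, 0, 1, 0]])
y_score = np.array([[0.1, 0.2, 0.9, 0.7, 0.1, 0.1],
                    [0.2, 0.6, 0.1, 0.3, 0.2, 0.8],
                    [0.7, 0.1, 0.1, 0.1, 0.9, 0.2]])

print(roc_auc_score(y_true, y_score, average="macro"))
```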
Join woah2021task@googlegroups.com. Your request should include the first name, last name, and affiliation of all team members.
Get the original Hateful Memes dataset from DrivenData. To access the data:
Register for DrivenData
Find the competition: https://www.drivendata.org/competitions/64/hateful-memes/
Join the competition and e-sign the ‘Data Access Agreement’
Then download the data from the ‘Data Download’ option
You can access the fine-grained annotations from GitHub.
You must submit your code with your predictions and make it available as open source.
You cannot hand label any of the entries or manually assign them scores.
You should treat the test set examples as independent.
Your system should predict the protected category and attack type for every meme in the dataset; for non-hateful memes, the model should predict pc_empty and attack_empty.
If you do not adhere to the spirit of the competition rules then your entry will be rejected.
Within this shared task, hate speech is defined as a direct attack against people on the basis of protected characteristics, such as race, ethnicity, national origin, disability, religious affiliation, caste, sexual orientation, sex, gender identity, socio-economic status and serious disease. It includes violent or dehumanizing speech, harmful stereotypes, statements of inferiority, expressions of contempt, disgust or dismissal, cursing, and calls for exclusion or segregation.
Shaoliang Nie, Facebook AI
Aida Davani, University of Southern California
Lambert Mathias, Facebook
Douwe Kiela, Facebook
Zeerak Waseem, University of Sheffield
Bertie Vidgen, The Alan Turing Institute
Vinodkumar Prabhakaran, Google Research
To help clarify the categories for Task A and Task B we provide definitions and examples of all classes. The examples are synthetic.
In all cases, we have used a * to replace the first character of hateful terms (e.g., slurs). Readers are likely to find the statements offensive.