LREC 2022 Tutorial
on
Building Reliable Datasets for Aggressive and Hateful Language Identification: Theory, Taxonomies and Approaches
June 20, 2022
Marseille, France
Tutorial Outline
The tutorial is divided into two parts of 90 minutes each, and each part contains three modules of roughly 30 minutes. The topics to be covered are outlined below:
PART 1 [90 minutes]
Introduction - An overview of im/politeness and its definition
Sociopragmatic Models of im/politeness
Pragmalinguistic research involving im/politeness and aggression: Ritual Frame Indicating Expressions Theory
    Comparative Analysis of English and German
    Comparative Analysis of English and Chinese
    Comparative Analysis of English, Hindi and Bangla
PART 2 [90 minutes]
Major annotation taxonomies in NLP
    Offensive Language
    Abusive Language
    Hate Speech
    Aggressive Language
Annotating datasets for abusive language identification
    Building datasets from scratch
    From analysis to datasets and from datasets to analysis
Sociopragmatic models and Mapping Multiple Datasets
    Mapping Aggressive and Offensive Language Datasets
    Mapping Hate Speech and Abusive Language Datasets
In the second part, participants will be given a small dataset to analyse and are also encouraged to bring their own small dataset(s) for analysis.