LREC 2022 Tutorial


Building Reliable Datasets for Aggressive and Hateful Language Identification: Theory, Taxonomies and Approaches

June 20, 2022

Marseille, France

Tutorial Outline

The tutorial is divided into two parts of 90 minutes each, and each part contains three modules of roughly 30 minutes. An outline of the topics to be covered is given below:

PART 1 [90 minutes]

  1. Introduction - An overview of im/politeness and its definition

  2. Sociopragmatic Models of im/politeness

  3. Pragmalinguistic research involving (im)politeness and aggression: Ritual Frame Indicating Expressions Theory

    • Comparative Analysis of English and German

    • Comparative Analysis of English and Chinese

    • Comparative Analysis of English, Hindi and Bangla

PART 2 [90 minutes]

  1. Major annotation taxonomies in NLP

    • Offensive Language

    • Abusive Language

    • Hate Speech

    • Aggressive Language

  2. Annotating datasets for abusive language identification

    • Building datasets from scratch

    • From analysis to datasets and from datasets to analysis

  3. Sociopragmatic models and Mapping Multiple Datasets

    • Mapping Aggressive and Offensive Language Datasets

    • Mapping Hate Speech and Abusive Language Datasets

For the second part, participants will be given a small dataset to analyse and are also encouraged to bring their own small dataset(s) for analysis.