OLID

The Offensive Language Identification Dataset (OLID) contains 14,200 annotated English tweets, labeled with an annotation model that encompasses the following three levels:
 
  • A: Offensive Language Detection
  • B: Categorization of Offensive Language
  • C: Offensive Language Target Identification
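
The three levels form a hierarchy, which can be sketched as a small data structure. The example below is only an illustration: the label strings (OFF/NOT, TIN/UNT, IND/GRP/OTH) and the rule that levels B and C apply only to offensive and targeted tweets respectively are assumptions based on our reading of the NAACL 2019 paper, not something stated on this page, and the class name OlidAnnotation is hypothetical. Please check the released files before relying on it.

```python
from dataclasses import dataclass
from typing import Optional

# Label sets assumed from the NAACL 2019 paper; verify against the released data.
LEVEL_A = {"OFF", "NOT"}         # A: offensive / not offensive
LEVEL_B = {"TIN", "UNT"}         # B: targeted insult or threat / untargeted
LEVEL_C = {"IND", "GRP", "OTH"}  # C: individual / group / other target


@dataclass(frozen=True)
class OlidAnnotation:
    """Hypothetical container for one tweet's three-level annotation."""
    level_a: str
    level_b: Optional[str] = None
    level_c: Optional[str] = None

    def __post_init__(self):
        if self.level_a not in LEVEL_A:
            raise ValueError(f"unknown level A label: {self.level_a!r}")
        if self.level_a == "NOT":
            # Non-offensive tweets carry no level B or level C label.
            if self.level_b is not None or self.level_c is not None:
                raise ValueError("levels B and C apply only to offensive tweets")
            return
        if self.level_b not in LEVEL_B:
            raise ValueError(f"unknown level B label: {self.level_b!r}")
        if self.level_b == "UNT":
            # Untargeted offensive tweets carry no level C label.
            if self.level_c is not None:
                raise ValueError("level C applies only to targeted tweets")
            return
        if self.level_c not in LEVEL_C:
            raise ValueError(f"unknown level C label: {self.level_c!r}")


# Usage: an offensive, targeted tweet aimed at a group.
example = OlidAnnotation("OFF", "TIN", "GRP")
```

Validating at construction time keeps inconsistent combinations, such as a target label on a non-offensive tweet, from entering downstream code.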
 

OLID has been used in student projects at several universities. To the best of our knowledge, it has so far been used by students at The University of Arizona (USA), Imperial College London (UK), and the University of Leeds (UK). Some of the student system papers are available here.
 
Download OLID

The complete OLID v1.0 dataset (train, test, and gold labels) is available at the link below.
 
More information about OLID can be found in the NAACL 2019 paper.
 
If you use OLID, please cite this paper:
 
@inproceedings{zampierietal2019, 
    title={{Predicting the Type and Target of Offensive Posts in Social Media}}, 
    author={Zampieri, Marcos and Malmasi, Shervin and Nakov, Preslav and Rosenthal, Sara and Farra, Noura and Kumar, Ritesh}, 
    booktitle={Proceedings of NAACL}, 
    year={2019}
}

OLIDv1.0.zip (854k), uploaded by Marcos Zampieri, 4 Aug 2019, 04:40