The Semi-Supervised Offensive Language Identification Dataset (SOLID) contains over 9,000,000 tweets annotated following OLID's three-level taxonomy:

- A: Offensive Language Detection
- B: Categorization of Offensive Language
- C: Offensive Language Target Identification

SOLID was the official English dataset used in the OffensEval 2020 shared task.

If you use SOLID, please cite this paper:

@article{rosenthal2020large,
  title={A Large-Scale Semi-Supervised Dataset for Offensive Language Identification},
  author={Rosenthal, Sara and Atanasova, Pepa and Karadzhov, Georgi and Zampieri, Marcos and Nakov, Preslav},
  journal={arXiv preprint arXiv:2004.14454},
  year={2020}
}