Toxic Spans Detection

Toxic Spans Detection is a task at SemEval 2021 (Task 5).

Task description

The Toxic Spans Detection task concerns the evaluation of systems that detect the spans that make a text toxic, when detecting such spans is possible. Moderation is crucial to promoting healthy online discussions. Although several toxicity (a.k.a. abusive language) detection datasets (Wulczyn et al., 2017; Borkan et al., 2019) and models (Schmidt and Wiegand, 2017; Pavlopoulos et al., 2017b; Zampieri et al., 2019) have been released, most of them classify whole comments or documents and do not identify the spans that make a text toxic. Highlighting such toxic spans can assist human moderators (e.g., news portal moderators), who often deal with lengthy comments and who prefer attribution to an unexplained, system-generated toxicity score per post. Evaluating systems that can accurately locate toxic spans within a text is thus a crucial step towards successful semi-automated moderation.
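To make the idea of a "toxic span" concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that a system's output is a set of character offsets into the comment; the description above does not fix a representation. The helper names (`offsets_to_spans`, `highlight`, `char_f1`) and the example sentence are hypothetical, and the character-level F1 shown is only one plausible way to compare predicted and annotated offsets, not necessarily the task's official scorer.

```python
from typing import List, Set, Tuple


def offsets_to_spans(offsets: Set[int]) -> List[Tuple[int, int]]:
    """Group character offsets into contiguous (start, end) spans, end exclusive."""
    spans: List[Tuple[int, int]] = []
    for pos in sorted(offsets):
        if spans and spans[-1][1] == pos:
            # Offset directly continues the previous span: extend it.
            spans[-1] = (spans[-1][0], pos + 1)
        else:
            spans.append((pos, pos + 1))
    return spans


def highlight(text: str, offsets: Set[int]) -> str:
    """Wrap each toxic span in square brackets, as a moderator-facing view."""
    pieces: List[str] = []
    cursor = 0
    for start, end in offsets_to_spans(offsets):
        pieces.append(text[cursor:start])
        pieces.append("[" + text[start:end] + "]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


def char_f1(predicted: Set[int], gold: Set[int]) -> float:
    """Character-offset F1 (assumed metric, for illustration only)."""
    if not predicted and not gold:
        return 1.0  # nothing to find and nothing predicted
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: two annotated spans, one of which the system finds.
text = "You are a stupid person and a fool."
gold = set(range(10, 16)) | set(range(30, 34))
predicted = set(range(10, 16))

print(highlight(text, gold))
print(char_f1(predicted, gold))
```

Offsets are used here instead of token labels so that the sketch does not depend on any particular tokenizer; a token-level formulation would work equally well under different assumptions.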


More details about the data and the important dates are available on the task's Codalab page.

Important Dates

  • Trial data ready: July 31, 2020

  • Training data ready: October 1, 2020 [extended]

  • Test data ready: December 3, 2020

  • Evaluation start (test data released): January 10, 2021 (AoE)

  • Evaluation end: January 31, 2021 (13h00 CET)

  • Paper submission due: February 23, 2021

  • Notification to authors: March 29, 2021

  • Camera ready due: April 5, 2021

  • SemEval workshop: Summer 2021


Organizers

John Pavlopoulos, Stockholm University, Sweden.

Ion Androutsopoulos, Athens University of Economics and Business, Greece.

Jeffrey Sorensen, Google, USA.

Léo Laugier, Institut Polytechnique de Paris, France.


References

D. Borkan, L. Dixon, J. Sorensen, N. Thain, and L. Vasserman. 2019. Nuanced Metrics for Measuring Unintended Bias with Real Data for Text Classification. In WWW, pages 491–500, San Francisco, USA.

S. Feng, E. Wallace, A. Grissom II, M. Iyyer, P. Rodriguez, and J. Boyd-Graber. 2018. Pathologies of Neural Models make Interpretations Difficult. In EMNLP, pages 3719–3728, Brussels, Belgium.

J. Li, W. Monroe, and D. Jurafsky. 2016. Understanding Neural Networks Through Representation Erasure. arXiv preprint arXiv:1612.08220.

J. Pavlopoulos, P. Malakasiotis, and I. Androutsopoulos. 2017a. Deep learning for user comment moderation. In ALW, pages 25–35, Vancouver, Canada.

J. Pavlopoulos, P. Malakasiotis, and I. Androutsopoulos. 2017b. Deeper Attention to Abusive User Content Moderation. In EMNLP, pages 1125–1135, Copenhagen, Denmark.

J. Pavlopoulos, N. Thain, L. Dixon, and I. Androutsopoulos. 2019. ConvAI at SemEval-2019 Task 6: Offensive Language Identification and Categorization with Perspective and BERT. In SemEval, Minneapolis, USA.

M. T. Ribeiro, S. Singh, and C. Guestrin. 2016. “Why should I trust you?” Explaining the Predictions of any Classifier. In SIGKDD, pages 1135–1144, San Francisco, USA.

A. Schmidt and M. Wiegand. 2017. A Survey on Hate Speech Detection using Natural Language Processing. In the Workshop on Natural Language Processing for Social Media, pages 1–10, Valencia, Spain.

E. Wulczyn, N. Thain, and L. Dixon. 2017. Ex Machina: Personal Attacks Seen at Scale. In WWW, pages 1391–1399, Perth, Australia.

M. Zampieri, S. Malmasi, P. Nakov, S. Rosenthal, N. Farra, and R. Kumar. 2019. SemEval-2019 Task 6: Identifying and Categorizing Offensive Language in Social Media (OffensEval). In SemEval, Minneapolis, USA.