OffensEval 2020

This is the website of OffensEval 2020: Multilingual Offensive Language Identification in Social Media, the second edition of OffensEval, organized as Task 12 at SemEval 2020.

The competition is now finished. For more information, please consult the OffensEval 2020 report. When referring to OffensEval 2020, please use the BibTeX entry below.

@inproceedings{zampieri-etal-2020-semeval,

    title = {{SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020)}},

    author = {Zampieri, Marcos and Nakov, Preslav and Rosenthal, Sara and Atanasova, Pepa and Karadzhov, Georgi and Mubarak, Hamdy and Derczynski, Leon and Pitenis, Zeses and \c{C}\"{o}ltekin, \c{C}a\u{g}r{\i}},

    booktitle = {Proceedings of SemEval},

    year = {2020}

}

Description 

Offensive language is pervasive in social media. Individuals frequently take advantage of the perceived anonymity of computer-mediated communication to engage in behavior that many of them would not consider in real life. Online communities, social media platforms, and technology companies have been investing heavily in ways to cope with offensive language and prevent abusive behavior in social media. One of the most effective strategies for tackling this problem is to use computational methods to identify offense, aggression, and hate speech in user-generated content (e.g. posts, comments, microblogs).
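To make the idea of such a computational method concrete, here is a minimal illustrative sketch (not one of the systems submitted to the shared task): a TF-IDF plus linear-SVM pipeline for binary offensive-language identification. The toy texts and labels are invented for illustration; real systems are trained on annotated corpora such as OLID.

```python
# Illustrative sketch only: a simple TF-IDF + linear SVM pipeline for
# binary offensive-language identification. The toy texts and labels
# below are invented; real systems train on annotated data such as OLID.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = [
    "have a wonderful day everyone",
    "you are a complete idiot",
    "thanks for the helpful answer",
    "nobody wants you here, loser",
]
labels = ["NOT", "OFF", "NOT", "OFF"]  # OLID-style top-level labels

# Word unigrams and bigrams, weighted by TF-IDF, fed to a linear classifier.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(texts, labels)

print(clf.predict(["what a great idea"]))
```

With only four training examples this is of course a toy; it only illustrates the supervised text-classification setup that most participating systems build on.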

This topic has attracted significant attention in recent years, as evidenced in recent publications (Waseem et al., 2017; Davidson et al., 2017; Malmasi and Zampieri, 2018; Kumar et al., 2018), workshops such as AWL and TRAC, and competitions such as HatEval 2019 (Basile et al., 2019), HASOC 2019, and OffensEval 2019 (Zampieri et al., 2019).

The first OffensEval was organized at SemEval 2019. OffensEval 2019 used the Offensive Language Identification Dataset (OLID), a dataset of English tweets annotated using a hierarchical three-level annotation model described in this paper. Nearly 800 teams signed up to participate in OffensEval 2019, and the competition received more than 100 submissions across three sub-tasks. The findings are described in the OffensEval 2019 report. The response received in 2019 far exceeded our expectations and motivated us to organize OffensEval 2020.

Data

OffensEval 2020 features a multilingual dataset with five languages. The languages included in OffensEval 2020 are:

Arabic

Danish

English

Greek

Turkish

The annotation follows the hierarchical tagset proposed in the Offensive Language Identification Dataset (OLID) and used in OffensEval 2019. This taxonomy breaks down offensive content by the type and target of the offense, yielding three sub-tasks:

Sub-task A: Offensive language identification

Sub-task B: Automatic categorization of offense types

Sub-task C: Offense target identification
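The three levels of the OLID taxonomy (level A: OFF/NOT, level B: TIN/UNT, level C: IND/GRP/OTH) form a strict hierarchy, which can be sketched as a small data structure. This is an illustrative rendering of the scheme, not code from the task; the helper function and label glosses are ours.

```python
# The three-level OLID annotation scheme, written out as nested labels.
# Level B applies only to offensive (OFF) tweets, and level C only to
# targeted (TIN) ones.
OLID_TAXONOMY = {
    "A": {  # Sub-task A: offensive language identification
        "OFF": "offensive",
        "NOT": "not offensive",
    },
    "B": {  # Sub-task B: categorization of offense types
        "TIN": "targeted insult or threat",
        "UNT": "untargeted profanity",
    },
    "C": {  # Sub-task C: offense target identification
        "IND": "individual",
        "GRP": "group",
        "OTH": "other",
    },
}

def valid_annotation(a, b=None, c=None):
    """Check that an (A, B, C) label triple respects the hierarchy."""
    if a not in OLID_TAXONOMY["A"]:
        return False
    if a == "NOT":          # non-offensive tweets get no B or C label
        return b is None and c is None
    if b not in OLID_TAXONOMY["B"]:
        return False
    if b == "UNT":          # untargeted offense has no target label
        return c is None
    return c in OLID_TAXONOMY["C"]

print(valid_annotation("OFF", "TIN", "GRP"))  # True
print(valid_annotation("NOT", "TIN"))         # False
```

The hierarchy means that a system's sub-task B prediction is only meaningful for tweets it labels OFF, and its sub-task C prediction only for tweets it labels TIN.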

Important Dates

Results

The test phase of OffensEval 2020 has come to an end. Thank you very much for participating!

The rankings are in the spreadsheets attached to this page. Usernames are listed as they appear on CodaLab, together with the F1 score of each team's last submission. Please see the OffensEval 2020 report for more information.
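As in OffensEval 2019, the ranking metric is a macro-averaged F1 score, which weights each class equally regardless of how imbalanced the data is. A minimal sketch of how such a score can be computed (the gold and predicted labels below are invented):

```python
# Sketch of a macro-averaged F1 computation; gold/pred labels are invented.
from sklearn.metrics import f1_score

gold = ["OFF", "NOT", "NOT", "OFF", "NOT", "OFF"]
pred = ["OFF", "NOT", "OFF", "OFF", "NOT", "NOT"]

# Macro averaging computes F1 per class (OFF and NOT) and takes the
# unweighted mean, so the rarer class counts as much as the common one.
macro_f1 = f1_score(gold, pred, average="macro")
print(f"macro-F1 = {macro_f1:.4f}")  # 0.6667
```

Here both classes happen to have precision and recall of 2/3, so the per-class and macro F1 scores all equal 2/3.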

Organizers

Marcos Zampieri - Rochester Institute of Technology, USA

Preslav Nakov - Qatar Computing Research Institute, Qatar

Sara Rosenthal - IBM Research, USA

Pepa Atanasova - University of Copenhagen, Denmark

Georgi Karadzhov - University of Cambridge, UK

Hamdy Mubarak - Qatar Computing Research Institute, Qatar

Leon Derczynski - IT University of Copenhagen, Denmark

Zeses Pitenis - University of Wolverhampton, UK

Çağrı Çöltekin - University of Tübingen, Germany

References

Basile, V., Bosco, C., Fersini, E., Nozza, D., Patti, V., Pardo, F.M.R., Rosso, P. and Sanguinetti, M. (2019) SemEval-2019 Task 5: Multilingual Detection of Hate Speech Against Immigrants and Women in Twitter. In Proceedings of the 13th International Workshop on Semantic Evaluation. pp. 54-63.

Davidson, T., Warmsley, D., Macy, M. and Weber, I. (2017) Automated Hate Speech Detection and the Problem of Offensive Language. Proceedings of ICWSM.

Kumar, R., Ojha, A.K., Malmasi, S. and Zampieri, M. (2018) Benchmarking Aggression Identification in Social Media. In Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying (TRAC). pp. 1-11.

Malmasi, S., Zampieri, M. (2018) Challenges in Discriminating Profanity from Hate Speech. Journal of Experimental & Theoretical Artificial Intelligence. Volume 30, Issue 2, pp. 187-202. Taylor & Francis. 

Waseem, Z., Davidson, T., Warmsley, D. and Weber, I. (2017) Understanding Abuse: A Typology of Abusive Language Detection Subtasks. Proceedings of the Abusive Language Online Workshop.

Previous OffensEval

REPORT

Zampieri, M., Malmasi, S., Nakov, P., Rosenthal, S., Farra, N. and Kumar, R. (2019) SemEval-2019 Task 6: Identifying and Categorizing Offensive Language in Social Media (OffensEval). In Proceedings of the 13th International Workshop on Semantic Evaluation. pp. 75-86.

DATASET

Zampieri, M., Malmasi, S., Nakov, P., Rosenthal, S., Farra, N. and Kumar, R. (2019) Predicting the Type and Target of Offensive Posts in Social Media. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). pp. 1415-1420.