Learning with Disagreements


This is a SemEval-2021 shared task on learning to classify from datasets that contain disagreements. Its aim is to provide a unified testing framework for learning from disagreements, using the best-known datasets that record annotator disagreement in language interpretation and image classification.

Modern research in Cognitive Science and Artificial Intelligence (AI) is driven by the availability of large datasets annotated with human judgements. Most annotation projects assume that a single preferred interpretation exists for each item, but this assumption has been shown to be an idealisation at best, both in computational linguistics and in computer vision. Virtually all annotation projects for tasks such as anaphora resolution (Poesio et al., 2005, 2006), word sense disambiguation (Passonneau et al., 2006), POS tagging (Plank et al., 2014), sentiment analysis, image classification, natural language inference, and others encounter numerous cases on which humans disagree.

Participants are invited to train models for several tasks by harnessing labels from a crowd of people. The Shared Task is hosted on CodaLab.
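As a rough illustration of what harnessing crowd labels can mean in practice, the sketch below turns one item's crowd votes into a soft label (a probability distribution over classes) and compares it with a majority-vote label. This is only one of several possible approaches and is not the task's official baseline or scorer. The function names (`soft_label`, `majority_label`, `cross_entropy`) and the toy POS votes are hypothetical and chosen for illustration only.

```python
from collections import Counter
import math


def soft_label(annotations, classes):
    """Normalise one item's crowd votes into a distribution over `classes`."""
    counts = Counter(annotations)
    total = sum(counts[c] for c in classes)
    if total == 0:
        raise ValueError("no annotation matches a known class")
    return [counts[c] / total for c in classes]


def majority_label(annotations):
    """Return the most frequent vote (ties go to the label seen first)."""
    return Counter(annotations).most_common(1)[0][0]


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a soft target and a predicted distribution."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, predicted))


# Hypothetical example: five annotators label the same token.
classes = ["NOUN", "VERB", "ADJ"]
votes = ["NOUN", "NOUN", "VERB", "NOUN", "ADJ"]

target = soft_label(votes, classes)   # keeps the disagreement
hard = majority_label(votes)          # discards the disagreement

# A model that is fully confident in the majority class is penalised
# more under the soft target than one that mirrors the crowd distribution.
confident = [0.98, 0.01, 0.01]
loss_confident = cross_entropy(target, confident)
loss_matching = cross_entropy(target, target)
```

Training or evaluating against `target` rather than `hard` keeps the information about how strongly annotators disagreed, which a majority vote throws away.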

Important Dates

  • Trial data ready: July 31, 2020

  • Training data ready: September 4, 2020

  • Test data ready: December 3, 2020

  • Evaluation start: January 10, 2021

  • Evaluation end: January 31, 2021

  • Paper submission due: February 23, 2021

  • Notification to authors: March 29, 2021

  • Camera ready due: April 5, 2021

  • SemEval workshop: Summer 2021


Organisers

Alexandra Uma, Queen Mary University of London, United Kingdom

Anca Dumitrache, Talpa Network, Netherlands

Tommaso Fornaciari, Bocconi University, Italy

Tristan Miller, Austrian Research Institute for Artificial Intelligence

Edwin Simpson, University of Bristol, United Kingdom

Jon Chamberlain, University of Essex, Colchester, United Kingdom

Silviu Paun, Queen Mary University of London, United Kingdom

Barbara Plank, IT University of Copenhagen, Denmark

Massimo Poesio, Queen Mary University of London, United Kingdom

Join Our Google Group

Please consider joining our Google Group.


Acknowledgements

Alexandra Uma, Silviu Paun, Jon Chamberlain, and Massimo Poesio were supported by the DALI project, ERC Advanced Grant 2015-Adv-G to Massimo Poesio.


References

  • Anca Dumitrache, Lora Aroyo, and Chris Welty. 2019. A crowdsourced frame disambiguation corpus with ambiguity. In Proc. of NAACL.

  • Kevin Gimpel, Nathan Schneider, Brendan O’Connor, Dipanjan Das, Daniel Mills, Jacob Eisenstein, Michael Heilman, Dani Yogatama, Jeffrey Flanigan, and Noah A. Smith. 2011. Part-of-speech tagging for Twitter: Annotation, features, and experiments. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 42–47, Portland, Oregon, USA. Association for Computational Linguistics.

  • Joshua C. Peterson, Ruairidh M. Battleday, Thomas L. Griffiths, and Olga Russakovsky. 2019. Human uncertainty makes classification more robust. In 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pages 9616–9625.

  • Slav Petrov, Dipanjan Das, and Ryan McDonald. 2011. A universal part-of-speech tagset. Computing Research Repository - CORR.

  • Barbara Plank, Dirk Hovy, and Anders Søgaard. 2014a. Learning part-of-speech taggers with inter-annotator agreement loss. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, pages 742–751, Gothenburg, Sweden. Association for Computational Linguistics.

  • Massimo Poesio, Uwe Reyle, and Rosemary Stevenson. 2007. Justified sloppiness in anaphoric reference. In H. Bunt and R. Muskens, editors, Computing Meaning, volume 3, pages 11–34. Kluwer.

  • Massimo Poesio, Jon Chamberlain, and Udo Kruschwitz. 2017. Crowdsourcing. In N. Ide and J. Pustejovsky, editors, The Handbook of Linguistic Annotation, pages 277–295. Springer.

  • Filipe Rodrigues, Mariana Lourenco, Bernardete Ribeiro, and Francisco Pereira. 2017. Learning supervised topic models for classification and regression from crowds. IEEE Transactions on Pattern Analysis and Machine Intelligence, PP:1–1.

  • Bryan Russell, Antonio Torralba, Kevin Murphy, and William Freeman. 2008. LabelMe: A database and web-based tool for image annotation. International Journal of Computer Vision, 77.