Crowdsourcing and Games with a Purpose
Crowdsourcing - using workers recruited via the Web to label data - has become the de facto standard for small- and medium-scale annotation in CL ever since the paper by Snow et al. (2008) (Poesio et al., 2017). We, too, have used crowdsourcing systematically in our research, in particular for summarization (first to prepare the 2009 Arabic Summarization data for MULTILING, more recently in the SENSEI project) and for text classification (in our KTP with Minority Rights Group). But we have also developed crowdsourcing technology, in two areas in particular: collecting data using Games-With-A-Purpose (Poesio et al., 2013), and analysing crowdsourced data using Bayesian models of annotation (Paun et al., 2018a, 2018b).
Phrase Detectives
Phrase Detectives (Poesio et al., 2008; Chamberlain et al., 2008; Poesio et al., 2013; Poesio et al., 2017; Poesio et al., 2019) is a Game-With-A-Purpose developed to annotate anaphoric information. It is one of the most successful GWAPs in Computational Linguistics, having collected more than 4 million judgments over the years. The second release of the dataset, consisting of 542 completely annotated English documents, half from Wikipedia and half fiction from Project Gutenberg, for a total of slightly over 408,000 tokens and 2.5 million judgments, was released in 2019 (Poesio et al., 2019).
Other GWAPs
Three other GWAPs have been developed so far as part of the DALI project:
- TileAttack!, a GWAP for mention identification (Madge et al., 2017, 2019a, 2019d);
- WordClicker, a GWAP for POS tagging (Madge et al., 2019b, 2019c);
- Wormingo, a new GWAP for anaphoric annotation (Kicikoglu et al., 2019).
Analysing and aggregating crowdsourced data
The data collected using crowdsourcing tend to be very noisy; some method is required to identify unreliable workers and to estimate the reliability of the labels they produce. Bayesian models of annotation (Dawid and Skene, 1979; Carpenter, 2008; Hovy et al., 2013; Passonneau and Carpenter, 2014; Paun et al., 2018a) have proven much more effective than majority voting at assessing label reliability, and are becoming the new standard. In our research, we have used such methods to assess the reliability of labels obtained with a variety of methods and to identify the most reliable workers. One of the main contributions of the DALI project is the development of a Bayesian annotation model for anaphoric information, Mention Pair Annotation (MPA) (Paun et al., 2018b). We also organized an EMNLP 2019 workshop on aggregating non-standard labels. The sketch below illustrates the simplest model in this family.
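As a concrete illustration, here is a minimal sketch of the classic Dawid and Skene (1979) model cited above, estimated with EM. This is a toy Python example, not the MPA model or the code behind Paun et al. (2018a); the function name, variable names and toy data are all invented for the illustration.

```python
# Toy EM estimation of the Dawid & Skene (1979) annotation model.
# Illustrative sketch only; names and data are invented for this example.
import numpy as np

def dawid_skene(labels, n_classes, n_iter=50):
    """labels: (n_annotations, 3) array of (item, annotator, label) triples."""
    n_items = labels[:, 0].max() + 1
    n_annotators = labels[:, 1].max() + 1

    # Initialise the posterior over each item's true class from vote shares.
    T = np.zeros((n_items, n_classes))
    for i, a, l in labels:
        T[i, l] += 1
    T /= T.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # M-step: class prior, plus one confusion matrix per annotator:
        # theta[a, c, l] = P(annotator a answers l | true class is c).
        pi = T.sum(axis=0) / n_items
        theta = np.full((n_annotators, n_classes, n_classes), 1e-6)  # smoothed
        for i, a, l in labels:
            theta[a, :, l] += T[i]
        theta /= theta.sum(axis=2, keepdims=True)

        # E-step: recompute the posterior over each item's true class.
        log_T = np.tile(np.log(pi), (n_items, 1))
        for i, a, l in labels:
            log_T[i] += np.log(theta[a, :, l])
        T = np.exp(log_T - log_T.max(axis=1, keepdims=True))
        T /= T.sum(axis=1, keepdims=True)

    return T, pi, theta

# Toy data: 3 annotators give binary labels to 4 items. Annotator 2
# disagrees with the other two on items 0 and 1, so the model learns a
# noisier confusion matrix for it and weights its labels down accordingly.
data = np.array([
    [0, 0, 1], [0, 1, 1], [0, 2, 0],
    [1, 0, 0], [1, 1, 0], [1, 2, 1],
    [2, 0, 1], [2, 1, 1], [2, 2, 1],
    [3, 0, 0], [3, 1, 0], [3, 2, 0],
])
posteriors, prior, confusion = dawid_skene(data, n_classes=2)
print(posteriors.argmax(axis=1))  # aggregated label for each item
```

Unlike majority voting, the posteriors come with a measure of confidence in each aggregated label, and the per-annotator confusion matrices provide exactly the worker-reliability estimates discussed above; MPA adapts this kind of model to anaphoric judgments.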
Using crowdsourcing for summary evaluation
As part of the SENSEI project, we organized the Online Forums Summarization Task of MULTILING-2015, in which system summaries were evaluated using crowdsourcing (Kabadjov et al., submitted).
Teaching
- Paris, 2014: Crowdsourcing (Anaphoric) Annotation
- Gunne, 2011: Human Computation
- LREC 2010: Assessing Reliability in Annotation (with Bob Carpenter)
Projects (in reverse chronological order)
- The DALI project, funded by the ERC (2016-2021), is concerned with using the data collected through Games-With-A-Purpose to study anaphora and disagreements on anaphoric interpretation.
- SENSEI, funded by the EU (2013-2016), was concerned with the use of discourse to summarize spoken and online conversations such as those in online forums.
- AnaWiki, funded by EPSRC (2007-2009), was the project in which Phrase Detectives was developed.
Workshops (in reverse chronological order)
- The AnnoNLP workshop at EMNLP 2019.
Main publications (in reverse chronological order)
- Madge, Christopher, Juntao Yu, Jon Chamberlain, Udo Kruschwitz, Silviu Paun and Massimo Poesio, 2019d. Progression in a language annotation game with a purpose. Proc. of HCOMP.
- Madge, Christopher, Richard Bartle, Jon Chamberlain, Udo Kruschwitz, and Massimo Poesio, 2019c. Incremental game mechanics applied to text annotation. Proc. of CHIPLAY.
- Kicikoglu, Doruk, Richard Bartle, Jon Chamberlain, and Massimo Poesio, 2019. Wormingo: A ‘True Gamification’ Approach to Anaphoric Annotation. Proc. of GAMNLP.
- Madge, Christopher, Richard Bartle, Jon Chamberlain, Udo Kruschwitz, and Massimo Poesio, 2019b. Making Text Annotation Fun with a Clicker Game. Proc. of GAMNLP.
- Madge, Christopher, Juntao Yu, Jon Chamberlain, Udo Kruschwitz, Silviu Paun and Massimo Poesio, 2019a. Crowdsourcing and Aggregating Nested Markable Annotations. Proc. of ACL.
- Poesio, Massimo, Jon Chamberlain, Silviu Paun, Juntao Yu, Alexandra Uma and Udo Kruschwitz, 2019. A Crowdsourced Corpus of Multiple Judgments and Disagreement on Anaphoric Interpretation. Proc. of NAACL.
- Paun, Silviu, Jon Chamberlain, Udo Kruschwitz, Juntao Yu and Massimo Poesio, 2018b. A probabilistic annotation model for crowdsourcing coreference. Proc. of EMNLP.
- Paun, Silviu, Bob Carpenter, Jon Chamberlain, Dirk Hovy, Udo Kruschwitz, and Massimo Poesio, 2018a. Comparing Bayesian Models of Annotation. Transactions of the ACL, v. 6.
- Madge, Christopher, Jon Chamberlain, Udo Kruschwitz and Massimo Poesio, 2017. Experiment-driven development of a GWAP for marking segments in text. Proc. of CHIPLAY. DOI: 10.1145/3130859.3131332
- Poesio, Massimo, Jon Chamberlain, and Udo Kruschwitz, 2017. Case Study: Phrase Detectives. In N. Ide and J. Pustejovsky (eds.), Handbook of Linguistic Annotation. Springer.
- Poesio, Massimo, Jon Chamberlain, and Udo Kruschwitz, 2017. Crowdsourcing. In N. Ide and J. Pustejovsky (eds.), Handbook of Linguistic Annotation. Springer.
- Poesio, Massimo, Jon Chamberlain, Udo Kruschwitz, Livio Robaldo and Luca Ducceschi, 2013. Phrase Detectives: Utilizing Collective Intelligence for Internet-Scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems, 3(1).
- Chamberlain, Jon, Karen Fort, Udo Kruschwitz, Mathieu Lafourcade and Massimo Poesio, 2013. Using games to create linguistic resources. In I. Gurevych et al. (eds.), The People's Web Meets NLP. Springer.
- Chamberlain, Jon, Udo Kruschwitz and Massimo Poesio, 2013. Methods for Engaging and Evaluating Users of Human Computation Systems. In P. Michelucci (ed.), Handbook of Human Computation. Springer.
- Poesio, Massimo, Nils Diewald, Maik Stuehrenberg, Jon Chamberlain, Daniel Jettka, Daniela Goecke and Udo Kruschwitz, 2011. Markup infrastructure for the Anaphoric Bank, part I: Supporting web collaboration. In A. Mehler, K.-U. Kuehnberger, H. Lobin, H. Luengen, A. Storrer and A. Witt (eds.), Modelling, Learning and Processing of Text Technological Data Structures. Dordrecht: Springer.
- Chamberlain, Jon, Udo Kruschwitz and Massimo Poesio, 2009. Constructing an anaphorically annotated corpus with non-experts: assessing the quality of collaborative annotations. Proc. of the ACL Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources, Singapore.
- Chamberlain, Jon, Massimo Poesio and Udo Kruschwitz, 2009. A new life for a dead parrot: incentive structure in the Phrase Detectives game. Proc. of Webcentives09, Madrid.
- Kruschwitz, Udo, Jon Chamberlain, and Massimo Poesio, 2009. (Linguistic) Science Through Web Collaboration in the ANAWIKI Project. Proc. of Web Science, Athens.
- Chamberlain, Jon, Massimo Poesio and Udo Kruschwitz, 2008. Phrase Detectives - A Web-based Collaborative Annotation Game. Proc. of I-Semantics, Graz.
- Poesio, Massimo, Udo Kruschwitz and Jon Chamberlain, 2008. ANAWIKI: Creating Anaphorically Annotated Resources Through Web Collaboration. Proc. of LREC, Marrakech.