
Until recently, one of the main stumbling blocks for research in semantics and discourse was the lack of semantically annotated resources of a size and quality comparable to that of, say, the Penn Treebank. As a result, a by-product of my research on anaphora has been my involvement since the mid-90s in a number of projects concerned with creating such resources: developing appropriate annotation schemes for several types of semantic information (particularly anaphoric information), developing tools for annotation, and building the semantically annotated resources themselves. We created several anaphorically annotated corpora over the years, first in collaboration with Renata Vieira, then in the GNOME project and in the ARRAU project, and more recently using Games-With-A-Purpose such as Phrase Detectives, Tile Attack!, Wormingo, and WordClicker, in particular in the DALI project.

I've also been involved in the annotation of corpora of Italian, including VENEX with the University of Venice (Poesio et al, 2004), and more recently the LiveMemories corpus of anaphora (Uryupina & Poesio, 2011). In all of these projects the focus has been on creating linguistically motivated annotations, and on 'pushing the envelope' by attempting to annotate cases of anaphora beyond simple coreference among concrete entities. In particular, we devoted much attention to annotating bridging references (Poesio and Vieira 1998; Poesio, Teufel and Vieira 1997; Poesio et al 2002) and reference to abstract objects (Poesio and Modjeska 2003; Artstein and Poesio 2006).


The first of these projects was the EU-funded MATE project, concerned with the development of customizable tools for dialogue annotation based on XML standoff technology. I was responsible for the design of the coreference annotation guidelines in the project (see also Poesio et al, 1999).


An important aspect of GNOME was the creation of a corpus annotated for syntactic, semantic and discourse information, such as anaphoric reference. The annotation scheme used in GNOME was based on the MATE guidelines, augmented with instructions for other types of semantic annotation. Creating the GNOME corpus involved quite a lot of work on agreement on semantic judgments, continuing the research initiated as part of my collaboration with Renata Vieira (see below); in particular, we found good agreement among our subjects on animacy, weaker agreement on countability, and little on genericity, the assignment of thematic roles, and topichood (Poesio, 2004). The GNOME corpus was used to develop statistical models of NP realization and to study Centering theory. Papers describing the corpus and its annotation scheme are available from the corpus web page.


The largest annotation project I've been involved in has been the creation of the ARRAU corpus (Poesio and Artstein, 2008; Uryupina et al, 2016, 2019). ARRAU is a multi-genre corpus of anaphoric information created over 10 years to provide data for the next generation of coreference/anaphora resolution systems. The distinguishing features of ARRAU include treating all NPs as markables, including non-referring NPs, and annotating their (non-) referentiality status; distinguishing between several categories of non-referentiality and annotating non-anaphoric mentions; thorough annotation of markable boundaries (minimal/maximal spans, discontinuous markables); annotating a variety of mention attributes, ranging from morphosyntactic parameters to semantic category; annotating the genericity status of mentions; annotating a wide range of anaphoric relations, including bridging relations and discourse deixis; and, finally, annotating anaphoric ambiguity. The latest release, release 2, contains 350K tokens, and was used for the CRAC 2018 Shared Task. It is publicly available from LDC. A new release of the corpus is planned for 2020.

Corpora for Italian

I have been involved in two projects leading to the creation of anaphorically annotated corpora of Italian: VENEX, a collaborative project with the Università di Venezia (Poesio et al, 2004), which created a corpus covering both written text and spoken dialogue; and LiveMemories, which resulted in the LiveMemories corpus used for the Anaphora Resolution Task at EVALITA 2011 (Uryupina and Poesio, 2011).

Corpora created using Games-With-A-Purpose

Since 2007 I have been actively involved in the creation of corpora through Games-With-A-Purpose. The Phrase Detectives GWAP for annotating anaphoric information (Chamberlain et al, 2008; Poesio et al, 2013) has been active for over ten years, producing in this period two releases: Phrase Detectives 1.0 (Chamberlain et al, 2016) and Phrase Detectives 2 (Poesio et al, 2019). The latter release is the largest corpus with multiple judgments, containing over 2.2 million judgments about 108,000 markables, for an average of 20 judgments per item.

A new corpus is under construction as part of the DALI project. The DALI corpus is being annotated by a pipeline of GWAPs consisting at the moment of three games - WordClicker for POS annotation, Tile Attack! for mention annotation, and Wormingo for anaphoric annotation - and is meant to cover about 10 million words.

Agreement (and Disagreement) in Interpretation

A problem that has particularly interested me over the years is that of disagreements in interpretation. There are many anaphoric expressions on whose interpretation people disagree without finding them ambiguous. One example is the pronoun IT in the following request:

Can you kindly hook up engine E3 to the boxcar at Elmira and send IT to Corning as soon as possible please?

We carried out a number of studies of disagreements in interpretation, in particular in the interpretation of anaphoric expressions (Poesio and Vieira, 1998; Poesio 2004; Poesio and Artstein, 2005). This work also led us to write a review of the literature on coefficients of agreement, such as K or Alpha, used in CL to test agreement (Artstein and Poesio, 2008). This work is now continued as part of the DALI project, discussed in more detail in the Ambiguity page.
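As a rough illustration of what a chance-corrected agreement coefficient measures, the sketch below computes Cohen's kappa for two annotators. Cohen's kappa is only one member of this family and is not necessarily the coefficient used in the studies cited above; the function name and the toy judgments (two hypothetical annotators choosing an antecedent for IT) are invented for the example.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement: chance of a match given each annotator's own label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical judgments (not real data): antecedent chosen for IT on six items.
ann_a = ["engine", "engine", "boxcar", "engine", "boxcar", "engine"]
ann_b = ["engine", "boxcar", "boxcar", "engine", "engine", "engine"]
kappa = cohen_kappa(ann_a, ann_b)
print(round(kappa, 2))
```

Here the two annotators agree on four of six items, but because both favour the same label, much of that agreement is expected by chance, and the coefficient comes out well below the raw agreement rate.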

Projects (in inverse chronological order)

  • The DALI project, funded by ERC (2016-2021), is concerned with using the data collected through Games-With-A-Purpose to study anaphora and disagreements in anaphora.
  • The AnaWiki project, funded by EPSRC (2007-2009), was concerned with developing games with a purpose to collect data about anaphoric reference. The main outcome of the project is the Phrase Detectives game-with-a-purpose.
  • The ARRAU project, funded by EPSRC (2004-2007), was concerned with studying 'difficult' cases of anaphoric reference. One of its outcomes was the ARRAU corpus; another was the study by Artstein and Poesio.
  • The GNOME project, funded by EPSRC (1998-2000), was concerned with the generation of anaphoric expressions. One of its outcomes was the GNOME corpus.

Main publications