The GREC Corpus (version 2.0) consists of about 2,000 introductory sections in Wikipedia articles. In each text, three broad categories of Main Subject Reference (MSR) have been annotated (13,000 referring expressions, or REs, in total). The GREC-MSR shared task version of the corpus was randomly divided into 90% training data (of which 10% were randomly selected as development data) and 10% test data.
Figure 1: Example text from the GREC-MSR training/development set
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE TEXT SYSTEM "reg08-grec.dtd">
<TEXT ID="967">
<TITLE>Jean Baudrillard</TITLE>
<PARAGRAPH>
<REF ID="967.1" SEMCAT="person" SYNCAT="np-subj">
<REFEX REG08-TYPE="name" EMPHATIC="no" HEAD="nominal" CASE="plain">Jean Baudrillard</REFEX>
<ALT-REFEX>
<REFEX REG08-TYPE="name" EMPHATIC="no" HEAD="nominal" CASE="plain">Jean Baudrillard</REFEX>
<REFEX REG08-TYPE="name" EMPHATIC="yes" HEAD="nominal" CASE="plain">Jean Baudrillard himself</REFEX>
<REFEX REG08-TYPE="empty">_</REFEX>
<REFEX REG08-TYPE="pronoun" EMPHATIC="no" HEAD="pronoun" CASE="nominative">he</REFEX>
<REFEX REG08-TYPE="pronoun" EMPHATIC="yes" HEAD="pronoun" CASE="nominative">he himself</REFEX>
<REFEX REG08-TYPE="pronoun" EMPHATIC="no" HEAD="rel-pron" CASE="nominative">who</REFEX>
<REFEX REG08-TYPE="pronoun" EMPHATIC="yes" HEAD="rel-pron" CASE="nominative">who himself</REFEX>
</ALT-REFEX>
</REF>
(born June 20, 1929) is a cultural theorist, philosopher, political commentator, sociologist, and photographer.
<REF ID="967.2" SEMCAT="person" SYNCAT="subj-det">
<REFEX REG08-TYPE="pronoun" EMPHATIC="no" HEAD="pronoun" CASE="genitive">His</REFEX>
<ALT-REFEX>
<REFEX REG08-TYPE="name" EMPHATIC="no" HEAD="nominal" CASE="genitive">Jean Baudrillard's</REFEX>
<REFEX REG08-TYPE="pronoun" EMPHATIC="no" HEAD="pronoun" CASE="genitive">his</REFEX>
<REFEX REG08-TYPE="pronoun" EMPHATIC="no" HEAD="rel-pron" CASE="genitive">whose</REFEX>
</ALT-REFEX>
</REF>
work is frequently associated with postmodernism and post-structuralism.</PARAGRAPH>
</TEXT>
Figure 1 shows one of the texts in the GREC-MSR training/development data set. Each REF element marks one instance of reference; the REFEX that is its direct child is the selected RE, and ALT-REFEX is a list of alternative REs for the referent. The ALT-REFEX lists were generated for each text by an automatic method that collects all the (manually annotated) REs for the referent in the text and adds several defaults: pronouns and reflexive pronouns in all subdomains, and category nouns (e.g. the river) in all subdomains except people. Outputs generated by GREC-MSR systems are in the same format as the inputs, except that there are no ALT-REFEX lists and there is exactly one REFEX for each REF.
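The REF / REFEX / ALT-REFEX layout described above can be read with a standard XML parser. The following is a minimal sketch using Python's standard library, not part of the participants' pack. The function name `read_refs` and the shortened sample text are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the Figure 1 text (XML declaration and DOCTYPE omitted for brevity).
SAMPLE = """<TEXT ID="967">
<TITLE>Jean Baudrillard</TITLE>
<PARAGRAPH>
<REF ID="967.1" SEMCAT="person" SYNCAT="np-subj">
<REFEX REG08-TYPE="name" EMPHATIC="no" HEAD="nominal" CASE="plain">Jean Baudrillard</REFEX>
<ALT-REFEX>
<REFEX REG08-TYPE="name" EMPHATIC="no" HEAD="nominal" CASE="plain">Jean Baudrillard</REFEX>
<REFEX REG08-TYPE="pronoun" EMPHATIC="no" HEAD="pronoun" CASE="nominative">he</REFEX>
</ALT-REFEX>
</REF>
(born June 20, 1929) is a cultural theorist.
</PARAGRAPH>
</TEXT>"""


def read_refs(xml_text):
    """Return one dict per REF: its ID, the selected REFEX, and the ALT-REFEX candidates.

    Hypothetical helper for illustration; each REFEX is reduced to a
    (word string, REG08-TYPE) pair.
    """
    root = ET.fromstring(xml_text)
    refs = []
    for ref in root.iter("REF"):
        selected = ref.find("REFEX")   # direct child only, so ALT-REFEX entries are skipped
        alt = ref.find("ALT-REFEX")    # absent in system outputs
        candidates = [] if alt is None else list(alt)
        refs.append({
            "id": ref.get("ID"),
            "selected": None if selected is None
            else (selected.text, selected.get("REG08-TYPE")),
            "candidates": [(c.text, c.get("REG08-TYPE")) for c in candidates],
        })
    return refs


refs = read_refs(SAMPLE)
```

Because `selected` may be `None` and `alt` may be missing, the same helper can read training texts, test inputs (no selected REFEX) and system outputs (no ALT-REFEX list).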
[GREC-MSR'09 Participants' Pack] (including GREC-MSR'09 training/development data)
The GREC-MSR Task is to develop a method for selecting one of the REFEXs in the ALT-REFEX list, for each REF in each TEXT in the test sets. The test data inputs have the same format as the training/development data, except that REF elements contain only an ALT-REFEX list, not the preceding ‘selected’ REFEX. The main objective in the 2009 GREC-MSR Task was to get the word strings contained in REFEXs right (whereas in REG’08 the objective was to get the REG08-TYPE attributes right).
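To make the input/output contract concrete, here is a hedged sketch of a trivial selector. It reads a test-style input (REFs with only an ALT-REFEX list) and returns a tree in which each REF holds exactly one REFEX and no ALT-REFEX. The rule it applies (a name for the first REF, a plain pronoun afterwards) is an illustrative assumption, not a method used by any participating system, and the names `pick` and `select_refexs` are hypothetical. The rule also ignores SYNCAT and CASE, which a real system would need to consider.

```python
import copy
import xml.etree.ElementTree as ET

# Test-style input: each REF contains only an ALT-REFEX list (shortened sample).
TEST_INPUT = """<TEXT ID="967">
<TITLE>Jean Baudrillard</TITLE>
<PARAGRAPH>
<REF ID="967.1" SEMCAT="person" SYNCAT="np-subj">
<ALT-REFEX>
<REFEX REG08-TYPE="name" EMPHATIC="no" HEAD="nominal" CASE="plain">Jean Baudrillard</REFEX>
<REFEX REG08-TYPE="pronoun" EMPHATIC="no" HEAD="pronoun" CASE="nominative">he</REFEX>
</ALT-REFEX>
</REF>
is a cultural theorist.
<REF ID="967.2" SEMCAT="person" SYNCAT="subj-det">
<ALT-REFEX>
<REFEX REG08-TYPE="name" EMPHATIC="no" HEAD="nominal" CASE="genitive">Jean Baudrillard's</REFEX>
<REFEX REG08-TYPE="pronoun" EMPHATIC="no" HEAD="pronoun" CASE="genitive">his</REFEX>
</ALT-REFEX>
</REF>
work is associated with postmodernism.</PARAGRAPH>
</TEXT>"""


def pick(candidates, wanted_type):
    """Return the first non-emphatic, non-relative candidate of the wanted REG08-TYPE."""
    for c in candidates:
        if (c.get("REG08-TYPE") == wanted_type
                and c.get("EMPHATIC") == "no"
                and c.get("HEAD") != "rel-pron"):
            return c
    return candidates[0]  # fall back to the first alternative


def select_refexs(xml_text):
    """Replace every ALT-REFEX list with a single selected REFEX (toy rule)."""
    root = ET.fromstring(xml_text)
    # Materialise the list first: the tree is modified inside the loop.
    for index, ref in enumerate(list(root.iter("REF"))):
        alt = ref.find("ALT-REFEX")
        wanted = "name" if index == 0 else "pronoun"
        chosen = copy.deepcopy(pick(list(alt), wanted))
        chosen.tail = alt.tail
        ref.remove(alt)
        ref.insert(0, chosen)
    return root


output_root = select_refexs(TEST_INPUT)
output_xml = ET.tostring(output_root, encoding="unicode")
```

On the sample above, the sketch selects "Jean Baudrillard" for the first REF and "his" for the second.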
For the GREC-MSR Shared Tasks we created an evaluation tool which computes the following metrics: (i) Accuracy of REFEX word strings, i.e. the proportion of REFEX word strings selected by a participating system that are identical to the one in the corpus; (ii) Accuracy of REG08-Type, i.e. the proportion of REFEXs selected by a participating system that have a REG08-TYPE value identical to the one in the corpus; (iii) String-edit distance; (iv) BLEU-3; and (v) NIST-5. For the last three (string-comparison) metrics, we assessed just the REs selected by peer systems, leaving out the surrounding text. In the human evaluations, we assessed Fluency, Clarity and Coherence of REs within the textual context, as described in Belz et al. (2009).
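The two accuracy metrics and the string-edit distance can be sketched as follows. This is not the geval tool. It assumes exact, case-sensitive string matching and a word-level Levenshtein distance averaged over REFs, and the official tool may tokenise or normalise differently. BLEU-3 and NIST-5 are left out.

```python
def accuracy(system, gold):
    """Proportion of REFs where the system value equals the corpus value."""
    if len(system) != len(gold):
        raise ValueError("system and gold must have one entry per REF")
    return sum(s == g for s, g in zip(system, gold)) / len(gold)


def edit_distance(a, b):
    """Levenshtein distance between two sequences (unit cost per edit)."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


# One (word string, REG08-TYPE) pair per REF, in document order.
# The gold values follow Figure 1; the system values are made up for illustration.
gold = [("Jean Baudrillard", "name"), ("His", "pronoun")]
system = [("Jean Baudrillard", "name"), ("Jean Baudrillard's", "name")]

string_acc = accuracy([s for s, _ in system], [g for g, _ in gold])
type_acc = accuracy([t for _, t in system], [t for _, t in gold])
mean_edit = sum(
    edit_distance(s.split(), g.split()) for (s, _), (g, _) in zip(system, gold)
) / len(gold)
```

Here the system matches the corpus on the first REF only, so both accuracies are 0.5. The second REF needs two word-level edits, which gives a mean edit distance of 1.0.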
[geval package] which computes the metrics above
Detailed documentation for the GREC-MSR shared task can be found in the GREC-MSR'09 participants' pack: [GREC-MSR'09 Participants' Pack].
W08-1127: Anja Belz; Eric Kow; Jette Viethen; Albert Gatt
The GREC Challenge 2008: Overview and Evaluation Results
W08-1128: Bernd Bohnet
IS-G: The Comparison of Different Learning Techniques for the Selection of the Main Subject References
W08-1129: Iris Hendrickx; Walter Daelemans; Kim Luyckx; Roser Morante; Vincent Van Asch
CNTS: Memory-Based Learning of Generating Repeated References
W08-1130: Emily Jamison; Dennis Mehay
OSU-2: Generating Referring Expressions with a Maximum Entropy Classifier
W09-2816: Anja Belz; Eric Kow; Jette Viethen; Albert Gatt
The GREC Main Subject Reference Generation Challenge 2009: Overview and Evaluation Results
W09-2818: Benoit Favre; Bernd Bohnet
ICSI-CRF: The Generation of References to the Main Subject and Named Entities Using Conditional Random Fields
W09-2819: Charles Greenbacker; Kathleen McCoy
UDel: Generating Referring Expressions Guided by Psycholinguistic Findings
W09-2820: Samir Gupta; Sivaji Bandopadhyay
JUNLG-MSR: A Machine Learning Approach of Main Subject Reference Selection with Rule Based Improvement