GREC-MSR

Input and output data

The GREC Corpus (version 2.0) consists of about 2,000 introductory sections from Wikipedia articles. In each text, three broad categories of Main Subject Reference (MSR) have been annotated, giving about 13,000 referring expressions (REs) in total. The GREC-MSR shared task version of the corpus was randomly divided into 90% training data (of which 10% were randomly selected as development data) and 10% test data.

Figure 1: GREC-MSR training/dev set example

<?xml version="1.0" encoding="utf-8"?>

<!DOCTYPE TEXT SYSTEM "reg08-grec.dtd">

<TITLE>Jean Baudrillard</TITLE>

<REFEX REG08-TYPE="name" EMPHATIC="no" HEAD="nominal" CASE="plain">Jean Baudrillard</REFEX>

<ALT-REFEX>

<REFEX REG08-TYPE="name" EMPHATIC="no" HEAD="nominal" CASE="plain">Jean Baudrillard</REFEX>

<REFEX REG08-TYPE="name" EMPHATIC="yes" HEAD="nominal" CASE="plain">Jean Baudrillard himself</REFEX>

<REFEX REG08-TYPE="pronoun" EMPHATIC="yes" HEAD="pronoun" CASE="nominative">he himself</REFEX>

<REFEX REG08-TYPE="pronoun" EMPHATIC="yes" HEAD="rel-pron" CASE="nominative">who himself</REFEX>

</ALT-REFEX>

</REF>

(born June 20, 1929) is a cultural theorist, philosopher, political commentator, sociologist, and photographer.

<ALT-REFEX>

<REFEX REG08-TYPE="name" EMPHATIC="no" HEAD="nominal" CASE="genitive">Jean Baudrillard's</REFEX>

<REFEX REG08-TYPE="pronoun" EMPHATIC="no" HEAD="rel-pron" CASE="genitive">whose</REFEX>

</ALT-REFEX>

</REF>

work is frequently associated with postmodernism and post-structuralism.</PARAGRAPH>

</TEXT>

Figure 1 shows one of the texts in the GREC-MSR training/development data set. Each REF element marks one instance of reference to the main subject; the REFEX directly inside it is the RE selected in the corpus text, and the ALT-REFEX element lists alternative REs for the same referent. ALT-REFEX lists were generated for each text by an automatic method which collects all the (manually annotated) REs for the referent in the text and adds several defaults: pronouns and reflexive pronouns in all subdomains, and category nouns (e.g. the river) in all subdomains except people. Outputs generated by GREC-MSR systems are in the same format as the inputs, except that there are no ALT-REFEX lists and there is exactly one REFEX for each REF.
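The REF/REFEX/ALT-REFEX layout described above can be read with a standard XML parser. The following is a minimal sketch using Python's xml.etree.ElementTree, not part of the official tools. The SAMPLE string is a shortened, hand-written fragment based on Figure 1; its TEXT, PARAGRAPH and REF opening tags are written without attributes as an assumption, and the function name read_refs is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened fragment modelled on Figure 1. The TEXT, PARAGRAPH and REF
# opening tags carry no attributes here (an assumption for illustration).
SAMPLE = """<TEXT>
<TITLE>Jean Baudrillard</TITLE>
<PARAGRAPH>
<REF>
<REFEX REG08-TYPE="name" EMPHATIC="no" HEAD="nominal" CASE="plain">Jean Baudrillard</REFEX>
<ALT-REFEX>
<REFEX REG08-TYPE="name" EMPHATIC="no" HEAD="nominal" CASE="plain">Jean Baudrillard</REFEX>
<REFEX REG08-TYPE="pronoun" EMPHATIC="yes" HEAD="pronoun" CASE="nominative">he himself</REFEX>
</ALT-REFEX>
</REF>
(born June 20, 1929) is a cultural theorist.
</PARAGRAPH>
</TEXT>"""


def read_refs(xml_text):
    """Return, for each REF, the selected RE and the list of alternatives."""
    root = ET.fromstring(xml_text)
    refs = []
    for ref in root.iter("REF"):
        # find() only looks at direct children, so this is the selected
        # REFEX, not one of the REFEXs nested inside ALT-REFEX.
        selected = ref.find("REFEX")
        alternatives = ref.findall("ALT-REFEX/REFEX")
        refs.append({
            "selected": None if selected is None else selected.text,
            "alternatives": [(a.get("REG08-TYPE"), a.text) for a in alternatives],
        })
    return refs


if __name__ == "__main__":
    for entry in read_refs(SAMPLE):
        print(entry["selected"], entry["alternatives"])
```

For test inputs, where a REF holds only an ALT-REFEX list, the "selected" field in this sketch is simply None.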

[GREC-MSR'09 Participants' Pack] (including GREC-MSR'09 training/development data)
[GREC-MSR'09 test data]

Task definition

The GREC-MSR Task is to develop a method for selecting one of the REFEXs in the ALT-REFEX list, for each REF in each TEXT in the test sets. The test data inputs are identical to the training/development data, except that REF elements contain only an ALT-REFEX list, not the preceding ‘selected’ REFEX. The main objective in the 2009 GREC-MSR Task was to get the word strings contained in REFEXs right (whereas in REG’08 the objective was to get the REG08-TYPE attributes right).
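To make the selection task concrete, here is a deliberately naive baseline sketch: choose a non-emphatic name for the first mention of the main subject and a non-emphatic personal pronoun for later mentions, falling back to the first candidate if nothing matches. This heuristic, the function name select_refex, and the dictionary representation of candidates are all assumptions for illustration; it is not the method of any participating system.

```python
def select_refex(candidates, is_first_mention):
    """Pick one candidate RE from an ALT-REFEX list.

    candidates: list of dicts with keys "type", "emphatic", "head", "text",
                mirroring the REG08-TYPE, EMPHATIC and HEAD attributes
                and the word string of each REFEX.
    is_first_mention: True for the first REF in a text.
    """
    wanted = "name" if is_first_mention else "pronoun"
    for cand in candidates:
        if (cand["type"] == wanted
                and cand["emphatic"] == "no"
                and cand["head"] != "rel-pron"):  # skip relative pronouns
            return cand
    # No candidate of the wanted kind: fall back to the first one listed.
    return candidates[0]


if __name__ == "__main__":
    # Candidates taken from the second ALT-REFEX list in Figure 1.
    second_list = [
        {"type": "name", "emphatic": "no", "head": "nominal",
         "text": "Jean Baudrillard's"},
        {"type": "pronoun", "emphatic": "no", "head": "rel-pron",
         "text": "whose"},
    ]
    print(select_refex(second_list, is_first_mention=False)["text"])
```

A realistic system would also need the syntactic context of each REF, for example to choose the correct case, which this sketch ignores.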

Evaluation

For the GREC-MSR Shared Tasks we created an evaluation tool which computes the following metrics: (i) Accuracy of REFEX word strings, i.e. the proportion of REFEX word strings selected by a participating system that are identical to those in the corpus; (ii) Accuracy of REG08-TYPE, i.e. the proportion of REFEXs selected by a participating system that have a REG08-TYPE value identical to the one in the corpus; (iii) String-edit distance; (iv) BLEU-3; and (v) NIST-5. For the last three (string-comparison) metrics, we assessed just the REs selected by peer systems, leaving out the surrounding text. In the human evaluations, we assessed Fluency, Clarity and Coherence of REs within the textual context, as described in Belz et al. (2009).

1. [geval package] which computes the metrics above
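As a rough illustration of metrics (i) to (iii), the sketch below computes accuracy over aligned lists of values and a plain Levenshtein distance over tokens. It is not the geval implementation: the function names are hypothetical, and the unit costs for insertion, deletion and substitution are an assumption that may differ from the costs the official tool uses.

```python
def accuracy(system, reference):
    """Proportion of positions where the system value equals the corpus value.

    Works for REFEX word strings (metric i) and REG08-TYPE values (metric ii).
    """
    if len(system) != len(reference):
        raise ValueError("system and reference must have the same length")
    if not reference:
        return 0.0
    matches = sum(s == r for s, r in zip(system, reference))
    return matches / len(reference)


def edit_distance(a, b):
    """Levenshtein distance between two sequences, with unit costs (assumed)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, free if tokens match
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Hypothetical system outputs compared against corpus REs.
    system_strings = ["Jean Baudrillard", "whose"]
    corpus_strings = ["Jean Baudrillard", "Jean Baudrillard's"]
    print(accuracy(system_strings, corpus_strings))
    print(edit_distance("he himself".split(), "he".split()))
```

The same accuracy function applies to both accuracy metrics; only the compared values change (word strings versus REG08-TYPE attributes).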

Documentation

Detailed documentation for the GREC-MSR shared task can be found in the GREC-MSR'09 participants' pack: [GREC-MSR'09 Participants' Pack].

Previous results

W08-1127: Anja Belz; Eric Kow; Jette Viethen; Albert Gatt

The GREC Challenge 2008: Overview and Evaluation Results

W08-1128: Bernd Bohnet

IS-G: The Comparison of Different Learning Techniques for the Selection of the Main Subject References

W08-1129: Iris Hendrickx; Walter Daelemans; Kim Luyckx; Roser Morante; Vincent Van Asch

CNTS: Memory-Based Learning of Generating Repeated References

W08-1130: Emily Jamison; Dennis Mehay

OSU-2: Generating Referring Expressions with a Maximum Entropy Classifier

W09-2816: Anja Belz; Eric Kow; Jette Viethen; Albert Gatt

The GREC Main Subject Reference Generation Challenge 2009: Overview and Evaluation Results

W09-2818: Benoit Favre; Bernd Bohnet

ICSI-CRF: The Generation of References to the Main Subject and Named Entities Using Conditional Random Fields

W09-2819: Charles Greenbacker; Kathleen McCoy

UDel: Generating Referring Expressions Guided by Psycholinguistic Findings

W09-2820: Samir Gupta; Sivaji Bandyopadhyay

JUNLG-MSR: A Machine Learning Approach of Main Subject Reference Selection with Rule Based Improvement
