This page gives more details about the evaluation of the analysis rules than were reported in the paper. It describes how the test set was created and provides links to the test set and the annotations.
100 sentences that are representative of the challenges of the corpus were selected. In total, these sentences contained 311 propositions.
Propositions are composed of an actor, a predicate, and a negotiation point that is related to the actor via the predicate.
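As an illustration, a proposition can be represented as a simple record; the field names below are our own and not necessarily those used in the system's data model:

```python
from dataclasses import dataclass

@dataclass
class Proposition:
    """One proposition: an actor related to a negotiation point via a predicate."""
    actor: str              # e.g. a DBpedia entity such as "European_Union"
    predicate: str          # the relation expressed in the sentence
    negotiation_point: str  # the issue the actor takes a position on
```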
The sentences in the reference set contain structures like:
As can be expected, the corpus contains sentences whose agent is a generic noun phrase such as the delegates or most delegates, or a moderator role such as the Chair. Although the system does output propositions with such actors, we did not annotate them for the evaluation, and the system was configured not to output propositions for those actors (based on an attribute in the data model that encodes the actor type).
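A minimal sketch of such a filter, assuming a hypothetical actor_type attribute with values like "named", "generic", and "moderator" (the actual attribute and its values in the data model may differ):

```python
# Hypothetical actor-type values used only for illustration.
EXCLUDED_ACTOR_TYPES = {"generic", "moderator"}

def keep_for_evaluation(prop: dict) -> bool:
    """Drop propositions whose actor is a generic NP or a moderator role."""
    return prop.get("actor_type") not in EXCLUDED_ACTOR_TYPES

system_output = [
    {"actor": "European_Union", "actor_type": "named"},
    {"actor": "the delegates", "actor_type": "generic"},
]
evaluated = [p for p in system_output if keep_for_evaluation(p)]
# keeps only the European_Union proposition
```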
Actor mentions were normalized to the DBpedia entity representing them; e.g., a mention like The EU appears as European_Union in the reference set.
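The following sketch shows this normalization with a hand-written mention-to-entity table; the table entries are examples only, and the actual linking to DBpedia entities may be done differently:

```python
# Illustrative mention-to-entity mapping; entries are examples only.
MENTION_TO_DBPEDIA = {
    "the eu": "European_Union",
    "the european union": "European_Union",
}

def normalize_actor(mention: str) -> str:
    """Map a surface mention to its DBpedia entity name, if known."""
    return MENTION_TO_DBPEDIA.get(mention.lower().strip(), mention)

print(normalize_actor("The EU"))  # -> European_Union
```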
As described in the paper, an output was considered correct if all of its components matched the reference exactly.
To compare the reference and system results, all characters were lowercased, and leading and trailing whitespace and punctuation were stripped from all values. These modifications are immaterial to the task, but they avoid counting an error just because, for example, the manual annotation contained a trailing space.
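A sketch of this comparison, assuming propositions are represented as (actor, predicate, negotiation point) triples of strings; the example values are illustrative:

```python
import string

def normalize(value: str) -> str:
    """Lowercase and strip leading/trailing whitespace and punctuation."""
    return value.lower().strip(string.whitespace + string.punctuation)

def matches(system_prop, reference_prop) -> bool:
    """A system proposition is correct only if every component matches the reference exactly."""
    return all(normalize(s) == normalize(r)
               for s, r in zip(system_prop, reference_prop))

system = ("European_Union", "supports", "some negotiation point.")
reference = ("European_Union", "supports", "some negotiation point")
print(matches(system, reference))  # -> True: the trailing period is immaterial
```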