Post-modifier Generation
The goal is to develop systems that can insert useful content into an existing sentence. We scope this task to generating post-modifiers for entities.
Task Definition
The input is a sentence mentioning a news event about an entity and a list of knowledge base entries involving the entity. The entity and the slot where the post-modifier is needed are annotated as part of the input.
The output is an appropriate post-modifier phrase that can be inserted into the sentence (immediately after the entity).
Example 1
Input:
- Sentence: Barack Obama hailed the MeToo movement as a critical grassroots effort that needs the support of the society at large.
- Entity: Barack Obama
- KBID: FB123412
Output:
a father of two girls
The output will be inserted as a clause right after the entity mention in the input sentence. So the above output is to be interpreted as below:
Barack Obama, a father of two girls, hailed the MeToo movement as a critical grassroots effort that needs the support of the society at large.
Example 2
Input:
- Sentence: Barack Obama criticized the 5-4 supreme court decision and said that this unravels the Title IX protections in an unprecedented judicial overreach.
- Entity: Barack Obama
- KBID: FB123412
Output:
the 44th president of the US
The output will be inserted as a clause right after the entity mention in the input sentence. So the above output is to be interpreted as below:
Barack Obama, the 44th president of the US, criticized the 5-4 supreme court decision and said that this unravels the Title IX protections in an unprecedented judicial overreach.
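The insertion step described above can be sketched as a small helper that places the generated phrase as an appositive clause immediately after the first entity mention (the function name and comma-based formatting are illustrative assumptions, not part of any released tooling):

```python
def insert_post_modifier(sentence: str, entity: str, post_modifier: str) -> str:
    """Insert a post-modifier as an appositive clause immediately
    after the first mention of the entity in the sentence."""
    idx = sentence.find(entity)
    if idx == -1:
        raise ValueError(f"entity {entity!r} not found in sentence")
    end = idx + len(entity)
    # Wrap the post-modifier in commas and splice it back in.
    return f"{sentence[:end]}, {post_modifier},{sentence[end:]}"
```

For instance, `insert_post_modifier("Barack Obama hailed the movement.", "Barack Obama", "a father of two girls")` yields the comma-separated appositive shown in Example 1.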
Evaluation
The evaluation measures how well the generated post-modifier matches the original post-modifier and whether the generated text fits with the rest of the sentence. We will use the following measures:
- KB coverage
- General: # of KB claims covered by the generated post-modifier (PM)
- Gold PM: # of relevant KB claims covered by the PM (where relevant = set of claims that have overlap with gold PM)
- Exact match
- BLEU + METEOR
- Word embedding based similarity measure (cosine of averaged word vectors)
- Coherence of the modifier in the local context (language modeling and/or similarity with the surrounding context; details TBD)
[We will release a script soon.]
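Ahead of the script release, the word-embedding similarity measure can be sketched as follows: average the word vectors of each phrase and take the cosine of the two averages. The toy embedding table in the usage note is a stand-in; in practice pretrained vectors (e.g., GloVe) would be used.

```python
import numpy as np

def avg_vector(tokens, embeddings):
    """Average the word vectors of the tokens found in the embedding table."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return None
    return np.mean(vecs, axis=0)

def embedding_similarity(generated, reference, embeddings):
    """Cosine similarity between the averaged word vectors of two phrases."""
    g = avg_vector(generated.lower().split(), embeddings)
    r = avg_vector(reference.lower().split(), embeddings)
    if g is None or r is None:
        return 0.0  # no in-vocabulary tokens on one side
    return float(np.dot(g, r) / (np.linalg.norm(g) * np.linalg.norm(r)))
```

With a toy table such as `{"president": np.array([1.0, 0.0]), "us": np.array([0.0, 1.0])}`, identical phrases score 1.0 and phrases with orthogonal averaged vectors score 0.0.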
Dataset:
The dataset consists of 30,691 sentences extracted from the Gigaword and CNN-DailyMail collections. Additional details below.
[TODO filter out clearly unnecessary fields]
Baseline Models:
We have built a baseline: a seq2seq model with attention, implemented with OpenNMT-py (https://github.com/OpenNMT/OpenNMT-py).
Model details:
- Single 2-layer biLSTM that encodes the sentence + claims.
- How is the input encoded?
- The sentence comes first, followed by the claims with special tokens marking each relation and its values.
- <rel> ... </rel> wraps a relation, <value> ... </value> wraps its values, and <and> separates multiple values.
- Example: Last night German magazine Der Spiegel quoted aides to Herman Van Rompuy . <rel> member of political party </rel> <value> Christian Democratic and Flemish <and> European People's Party </value> <rel> occupation </rel> <value> politician <and> economist </value>
- What is the attention doing?
- Standard multiplicative attention that produces weights over the encoder hidden states (http://opennmt.net/OpenNMT-py/onmt.modules.html#attention).
- The decoder generates only the post-modifier, using beam search (beam size = 5).
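The input linearization above (sentence first, then special-token-marked claims) can be sketched as a small formatting function; the function name and the `(relation, values)` tuple layout are assumptions for illustration:

```python
def linearize(sentence, claims):
    """Concatenate the sentence with its KB claims, marking each claim
    with <rel> ... </rel> and <value> ... </value>, and joining multiple
    values of one relation with <and>."""
    parts = [sentence]
    for relation, values in claims:
        parts.append(f"<rel> {relation} </rel>")
        parts.append(f"<value> {' <and> '.join(values)} </value>")
    return " ".join(parts)
```

Applied to the Van Rompuy example with claims `[("member of political party", [...]), ("occupation", ["politician", "economist"])]`, this reproduces the encoded string shown above.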
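The multiplicative attention used by the baseline can be sketched in NumPy as Luong-style "general" attention: score each encoder state against the current decoder state through a learned matrix, softmax the scores, and take the weighted sum as the context vector. This is a minimal sketch of the mechanism, not the OpenNMT-py implementation itself:

```python
import numpy as np

def multiplicative_attention(decoder_state, encoder_states, W):
    """Luong-style multiplicative attention.
    decoder_state: (h_dec,), encoder_states: (T, h_enc), W: (h_enc, h_dec).
    score_i = h_i^T W s; weights = softmax(scores); context = weights @ H."""
    scores = encoder_states @ (W @ decoder_state)    # (T,)
    scores -= scores.max()                           # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over time steps
    context = weights @ encoder_states               # (h_enc,)
    return weights, context
```

The weights sum to one over the encoder time steps, and the context vector is fed to the decoder when predicting the next post-modifier token.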