Collaborative Sentence Generation

Motivation

The goal is to develop systems that can collaborate with a human writer to generate sentences. We scope this to a missing-sentence generation task and use a deliberately simple (admittedly crude) simulation of a human writer. In particular, we remove a sentence at random from a short story and ask the system to generate the missing sentence. A simulated writer provides feedback to the system until it generates the missing sentence exactly (or until a fixed number of iterations is reached). The writer provides two forms of feedback (a sketch of the full loop follows the two items below):

(1) Keywords -- At each step, the simulated writer samples one word at random from the missing sentence (without replacement) and adds it to the set of keywords that the sentence generator should use in its output.

(2) Accepts -- The writer accepts words in the generated sentence that appear at the correct position in the missing sentence. In subsequent iterations, the generator must keep the words at the accepted positions unchanged.
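A minimal sketch of this simulation loop, in Python, is given below. The generate(keywords, accepted) callable is a hypothetical stand-in for the sentence generator; it is not specified by the task.

    import random

    def update_accepts(generated, target, accepted):
        """Accept every token that appears at the correct position in the target."""
        for i, (gen_tok, tgt_tok) in enumerate(zip(generated, target)):
            if gen_tok == tgt_tok:
                accepted[i] = tgt_tok
        return accepted

    def simulate_writer(target, generate, max_iterations=10):
        """Run the keyword/accept feedback loop for one missing sentence.

        target is the missing sentence as a list of tokens; generate(keywords,
        accepted) stands in for the sentence generator.
        """
        keywords, accepted = set(), {}   # accepted: position -> token
        pool = list(target)              # keywords are sampled without replacement
        generated = []
        for _ in range(max_iterations):
            generated = generate(keywords, accepted)
            accepted = update_accepts(generated, target, accepted)
            if list(generated) == list(target):
                break                    # exact match: the writer is satisfied
            if pool:
                keywords.add(pool.pop(random.randrange(len(pool))))
        return generated, keywords, accepted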

Example: Consider the following story from the ROCStory collection where we removed the second sentence:

Story with missing sentence

  • Morgan and her family lived in Florida.
  • [token 1] [token 2] [token 3] [token 4] [token 5] [token 6] .
  • They decided to evacuate to a relative's house.
  • They arrived and learned from the news that it was a terrible storm.
  • They felt lucky they had evacuated when they did.

Missing sentence:

They heard a hurricane was coming.

Shuffled Keywords:

{heard, hurricane, coming, they, a, was}

System Iterations:

Iteration 1:

Keywords = {}, Accepts = {}
Generated sentence = There was a thunderstorm in town.

Iteration 2:

Keywords = {heard}, Accepts = {a}
Generated Sentence = They heard a thunderstorm in town.

Iteration 3:

Keywords = {heard, hurricane}, Accepts = {They, heard, a}
Generated Sentence = They heard a hurricane flooded town.

Iteration 4:

Keywords = {heard, hurricane, coming}, Accepts = {They, heard, a, hurricane}
Generated Sentence = They heard a hurricane is coming.

Iteration 5:

Keywords = {heard, hurricane, coming, was}, Accepts = {They, heard, a, hurricane, coming}
Generated Sentence = They heard a hurricane was coming.


Task Definition:

We propose a simplified task that captures the core of an iterative language generation system. The task is to generate the missing sentence when some of its words have already been filled in (simulating the accepts; see Accepted Words below) and some keywords are provided (simulating guidance from the writer).

Here is an example input/output.

INPUT:

Story

Morgan and her family lived in Florida.  [token 1] [token 2] [token 3] [token 4] [token 5] [token 6]. They decided to evacuate to a relative's house.  They arrived and learned from the news that it was a terrible storm. They felt lucky they had evacuated when they did.

Keywords

hurricane, a

Accepted Words

[They] [heard] [token 3] [token 4] [was] [token 6] 

OUTPUT:

They heard a hurricane was coming.
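As an illustration only, one way such an example might be serialized into a single source string for a sequence-to-sequence model is sketched below; the [KEYWORDS]/[STORY] markers and the encode_example helper are hypothetical, not part of the task definition. The target side is simply the missing sentence itself.

    def encode_example(story_sentences, missing_index, accepted, keywords, slot_count):
        """Flatten one example into a single source string.

        accepted maps 0-based slot positions to accepted tokens; unfilled slots
        are rendered as [token N] placeholders, mirroring the example above.
        """
        slots = [accepted.get(i, "[token %d]" % (i + 1)) for i in range(slot_count)]
        masked = (story_sentences[:missing_index]
                  + [" ".join(slots) + " ."]
                  + story_sentences[missing_index + 1:])
        return "[KEYWORDS] " + ", ".join(keywords) + " [STORY] " + " ".join(masked)

    story = [
        "Morgan and her family lived in Florida.",
        None,  # the missing second sentence
        "They decided to evacuate to a relative's house.",
        "They arrived and learned from the news that it was a terrible storm.",
        "They felt lucky they had evacuated when they did.",
    ]
    accepted = {0: "They", 1: "heard", 4: "was"}
    print(encode_example(story, 1, accepted, ["hurricane", "a"], slot_count=6))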


Dataset:

The dataset consists of 52,665 stories obtained from the ROCStory collection. More details and a link are provided below.

[dataset]

Evaluation:

  • Position accuracy: (# newly correct positions) / (# available positions)
  • BLEU gain: BLEU of the output minus BLEU of the input (where unaccepted input slots are marked UNK)
  • Results are grouped by the number of accepts and the number of keywords in the input.
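A rough sketch of the first two metrics is given below, assuming tokenized sentences. NLTK's sentence-level BLEU is used purely for illustration; any BLEU implementation would do. For the example above, input_tokens would be ['They', 'heard', 'UNK', 'UNK', 'was', 'UNK'] and accepted would be the position set {0, 1, 4}.

    from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

    def position_accuracy(output_tokens, target_tokens, accepted):
        """(# newly correct positions) / (# available positions).

        accepted is the set of slot indices already filled in the input; those
        positions do not count as available.
        """
        available = [i for i in range(len(target_tokens)) if i not in accepted]
        if not available:
            return 1.0
        newly_correct = sum(1 for i in available
                            if i < len(output_tokens)
                            and output_tokens[i] == target_tokens[i])
        return newly_correct / len(available)

    def bleu_gain(input_tokens, output_tokens, target_tokens):
        """BLEU of the output minus BLEU of the input, where unaccepted input
        slots have been replaced with the literal token UNK."""
        smooth = SmoothingFunction().method1
        bleu_in = sentence_bleu([target_tokens], input_tokens, smoothing_function=smooth)
        bleu_out = sentence_bleu([target_tokens], output_tokens, smoothing_function=smooth)
        return bleu_out - bleu_in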


Baseline Models:

We provide a seq2seq baseline whose architecture is shown below:

  • [TBD]