Collaborative & Knowledge-backed Language Generation

Toyota Technological Institute at Chicago

July 23rd-July 27th, 2018

We are excited to announce the TTIC Workshop on Collaborative and Knowledge-Backed Language Generation!

The goal of this workshop is to promote research in collaborative and knowledge-backed language generation. Neural approaches have led to improvements in a range of generation tasks including translation, summarization, and poetry/story generation. These advances have led to systems that can generate locally coherent sentences. Neural approaches also allow us to easily introduce additional information to enable knowledge-backed generation. We now have the building blocks to start investigating collaborative and knowledge-backed writing systems.

Collaborative writing systems seek to combine the complementary strengths of human and automatic language generation systems. Human writers also draw upon many forms of knowledge including factual knowledge about the topic, prototypical knowledge such as how typical events unfold (e.g., a scientist making a discovery is likely to publish it and perhaps get recognition for it later), and other forms of commonsense knowledge (e.g., a man who bought flowers and a ring is going to be sad if he loses the ring). A current challenge is to build generation models that can make use of these additional forms of knowledge. An example use case is collaborative news writing, in which a journalist has information about a specific news event and the generation model can consult large knowledge bases automatically to insert relevant knowledge about the entities involved into the text, and even prompt the writer to fill in additional information that is relevant in the discourse.
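As a rough sketch of this loop (the function names and the toy, in-memory knowledge base below are hypothetical illustrations, not an existing system), the model might surface knowledge-base facts for entities the journalist mentions and prompt the writer wherever the knowledge base comes up empty:

```python
def lookup_entity_facts(entity: str, kb: dict) -> list:
    """Consult a (here: toy, in-memory) knowledge base for facts about an entity."""
    return kb.get(entity, [])

def suggest_insertions(entities: list, kb: dict) -> list:
    """For each entity in the journalist's draft, surface KB facts the model
    could weave into the text, or prompt the writer for missing background."""
    suggestions = []
    for entity in entities:
        facts = lookup_entity_facts(entity, kb)
        if facts:
            suggestions.extend(facts)  # candidate insertions for the draft
        else:
            suggestions.append(f"[writer: add background on {entity}]")
    return suggestions

toy_kb = {"TTIC": ["TTIC is a computer science research institute in Chicago."]}
print(suggest_insertions(["TTIC", "ACL"], toy_kb))
```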

Broad Research Challenges: Collaborative writing introduces a new set of research questions for language generation. Here is a sampling of important questions.

  1. How can the user actively control the system’s output? Controlling or guiding generated language to satisfy multiple objectives is difficult and requires new models.
  2. This setting also asks us to go beyond sequential models of text generation. Can we build dynamic architectures in which composition grows around anchor words or phrases that the user prefers to keep in certain positions? Decoding and recurrent state updates may need to be sampled to allow for exploration (see the sketch after this list).
  3. How can we keep parts of the generated text stable? The text should not change too much between interactions, to avoid imposing extra load on the user.
  4. How can we generate long texts that integrate well with existing text? Generating long texts is still challenging, and strong results are generally limited to short texts: with longer sentences, fluency fades and grammaticality falters; across multiple sentences, semantic coherence nearly disappears.
  5. Text generation is often modeled as a single sequential pass (modulo beam decoding), a rigid approach in which choices made early in generation have a disproportionate impact on the final text. We need better decoding strategies.
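As a concrete, deliberately simplified illustration of question 2, the sketch below pins user-chosen anchor tokens at fixed positions and samples the remaining slots. The `model_step` function and the toy vocabulary are hypothetical stand-ins for a real neural language model, not an existing API.

```python
import random

VOCAB = ["the", "scientist", "published", "a", "discovery", "today", "."]

def model_step(prefix):
    """Hypothetical LM step: return a next-token distribution given the
    prefix. A real system would call a neural language model here; this
    stand-in is uniform over a toy vocabulary."""
    return {tok: 1.0 / len(VOCAB) for tok in VOCAB}

def decode_with_anchors(length, anchors, temperature=1.0):
    """Generate `length` tokens left to right, forcing anchors[pos] at
    each pinned position and sampling everywhere else."""
    tokens = []
    for pos in range(length):
        if pos in anchors:
            tokens.append(anchors[pos])  # keep the user's word in place
            continue
        probs = model_step(tokens)
        # Temperature-scaled sampling provides the stochastic exploration
        # that question 2 calls for.
        weights = [probs[tok] ** (1.0 / temperature) for tok in VOCAB]
        tokens.append(random.choices(VOCAB, weights=weights)[0])
    return tokens

# e.g., keep "scientist" at position 1 and "discovery" at position 4
print(" ".join(decode_with_anchors(7, {1: "scientist", 4: "discovery"})))
```

A real collaborative system would replace the uniform `model_step` with an actual language model and could combine anchoring with constrained beam search, but the control flow would be the same.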

Workshop Format

The workshop has three components.

(i) Three-day Hackathon: The goal is to produce sentence-generation systems that could serve as components of a collaborative writing system. We will release datasets for two tasks.

(ii) Shared Task Design: We will form a working group that will work towards the design of a shared task focused on evaluation and datasets.

(iii) Invited talks: Leading NLP researchers and industry practitioners interested in this area will present their current work, propose research directions, discuss the challenges of evaluating such systems, and consider how to attract more research attention to this area, possibly working towards a future shared task and workshop associated with the *CL conferences.