QBLink

A Dataset for Sequential Open-Domain Question Answering

QBLink is a dataset for sequential question answering in which the questioner asks multiple related questions about the same concept one by one. After each question, the answerer provides an answer before the next question is asked. The dataset is designed to evaluate the ability of question-answering systems to leverage additional context in the form of a history of previously asked questions and answers.

The dataset consists of 18,644 sequences (56,000 question-answer pairs). Each sequence starts with a lead-in that defines the topic of the questions, followed by three questions and their answers. Two examples are provided below.

Example 1

Lead-in: Only twenty-one million units in this system will ever be created.

Question 1: Name this digital payment system whose transactions are recorded on a “block chain”.

Answer: Bitcoin

Question 2: Bitcoin was invented by this person, who, according to a dubious Newsweek cover story, is a 64-year-old Japanese-American man who lives in California.

Answer: Satoshi Nakamoto

Question 3: This online drugs marketplace, Chris Borglum’s one-time favorite, used bitcoins to conduct all of its transactions. It was started in 2011 by Ross Ulbricht using the pseudonym Dread Pirate Roberts.

Answer: Silk Road

Example 2

Lead-in: He signed the Goldwater-Nichols Act after the Packard Commission investigated the Department of Defense.

Question 1: Name this Republican president who firmly advocated supply-side economics. He was an actor and a governor of California before becoming president.

Answer: Ronald Wilson Reagan

Question 2: In March of 1981, this man shot President Reagan in order to impress the actress Jodie Foster. His acquittal due to insanity angered many Americans and led to the Insanity Defense Reform Act of 1984.

Answer: John Warnock Hinckley, Jr.

Question 3: Codenamed Operation Urgent Fury, the invasion of this Caribbean island was launched by Reagan in order to protect American students and topple a Communist government set up after the deposition of Maurice Bishop.

Answer: Invasion of Grenada

Dataset Download

The dataset is pre-partitioned into training, development and testing subsets.

Each file is an array of sequences. The fields of each sequence are shown in the following JSON representation of an example sequence:

{
  "id": 1,
  "tournament": "2014 PACE NSC",
  "lead_in": "The speaker of this poem declares \"I miss Europe with its ancient parapets!\" before describing \"sidereal archipelagos\" and \"islands whose delirious skies are open to the sea-wanderer\". For 10 points each:",
  "category": "Literature",
  "sub_category": "Literature European",
  "q1": {
    "quetsion_text": "Name this poem which opens with its title object relating the murder of its crew, after which it runs into the \"furious lashing of the tides\".",
    "raw_answer": "The Drunken Boat [or Le Bateau Ivre]",
    "wiki_page": "Random_walk"
  },
  "q2": {
    "quetsion_text": "The Drunken Boat was written by this French poet of A Season in Hell. He engaged in a torrid affair with Paul Verlaine and abandoned poetry by age 20.",
    "raw_answer": "Arthur Rimbaud [or Jean Nicolas Arthur Rimbaud]",
    "wiki_page": "Arthur_Rimbaud"
  },
  "q3": {
    "quetsion_text": "Rimbaud wrote of a \"sublime Trumpet full of strange piercing sounds\" and the \"divine shudderings of viridian seas\" in a poem assigning colors to these entities. Georges Perec's novel A Void only includes four of them.",
    "raw_answer": "the vowels [accept A, E, I, O, and U in any order]",
    "wiki_page": "Vowel"
  }
}
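
As a rough illustration of how the question history could be assembled from a sequence, here is a minimal Python sketch. It assumes the field names exactly as they appear in the example above (including the "quetsion_text" spelling) and that each partition file is a single JSON array; the helper names load_sequences and iter_questions_with_history are hypothetical and not part of any released code, and the inline sample uses shortened placeholder text.

```python
import json

# Field names are copied from the example sequence above, including the
# "quetsion_text" spelling; adjust them if the downloaded files differ.
QUESTION_KEYS = ("q1", "q2", "q3")


def iter_questions_with_history(sequence):
    """Yield (history, question, answer) for each question in one sequence.

    history is a list of strings: the lead-in, followed by every earlier
    question and its answer, in the order they were asked.
    """
    history = [sequence["lead_in"]]
    for key in QUESTION_KEYS:
        question = sequence[key]["quetsion_text"]
        answer = sequence[key]["raw_answer"]
        yield list(history), question, answer  # copy so later steps don't mutate it
        history.extend([question, answer])


def load_sequences(path):
    """Load one partition file, assumed to be a JSON array of sequences."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Tiny inline sample with placeholder text, standing in for a real file.
sample = json.loads("""
[{"id": 1, "tournament": "2014 PACE NSC", "lead_in": "lead-in text",
  "category": "Literature", "sub_category": "Literature European",
  "q1": {"quetsion_text": "question 1", "raw_answer": "answer 1", "wiki_page": "Page_1"},
  "q2": {"quetsion_text": "question 2", "raw_answer": "answer 2", "wiki_page": "Page_2"},
  "q3": {"quetsion_text": "question 3", "raw_answer": "answer 3", "wiki_page": "Page_3"}}]
""")

steps = list(iter_questions_with_history(sample[0]))
```

Each element of steps pairs a question with the context available when it is asked, so a system that ignores history can be compared against one that uses the lead-in and the earlier question-answer pairs.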

Contact us with any questions:

Ahmed Elgohary <elgohary@cs.umd.edu>, Chen Zhao <chenz@cs.umd.edu>, and Jordan Boyd-Graber <jbg@cs.umd.edu>

EMNLP'18 Paper Bibtex:

@inproceedings{Elgohary:Zhao:Boyd-Graber-2018,
  Title = {Dataset and Baselines for Sequential Open-Domain Question Answering},
  Author = {Ahmed Elgohary and Chen Zhao and Jordan Boyd-Graber},
  Booktitle = {Empirical Methods in Natural Language Processing},
  Year = {2018},
  Location = {Brussels, Belgium}
}