Shared Task

Task

The basic flow of interactions is:

1. (After September 1) Get a user ID and API key here: http://qb.entilzha.io/register

2. Client connects to the server and obtains a list of question IDs

3. For each question, client requests as many words as it wants (by word id) and then submits an answer
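The three steps above can be sketched as a simple client loop. The real endpoints and payloads are defined in the server documentation; the `FakeServer` class and the `guesser` interface below are stand-ins invented for illustration so the control flow is runnable.

```python
class FakeServer:
    """Stand-in for the evaluation server (method names are illustrative only)."""

    def __init__(self, questions):
        # questions: dict mapping question_id -> list of words
        self.questions = questions

    def question_ids(self):
        return list(self.questions)

    def num_words(self, question_id):
        return len(self.questions[question_id])

    def get_word(self, question_id, word_id):
        return self.questions[question_id][word_id]


def answer_all(server, guesser):
    """Request words one at a time; buzz as soon as the guesser is
    confident, otherwise submit its final guess at the end."""
    answers = {}
    for qid in server.question_ids():
        seen = []
        guess = None
        for word_id in range(server.num_words(qid)):
            seen.append(server.get_word(qid, word_id))
            guess, confident = guesser(seen)
            if confident:  # buzz early, stop requesting words
                break
        answers[qid] = guess
    return answers
```

A guesser here is any function taking the words seen so far and returning a `(guess, confident)` pair; requesting fewer words before a correct buzz is what earns more points under the evaluation below.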

Evaluation Server

We have a server to accept your answers to a set of questions. Before starting, read the documentation for the server and client code. Next, see this simple demo system for an example of how to receive question text from the server and submit answers. The documentation also contains information on how to run the server locally, as well as a list of all API calls that can be made from the client.

Currently, the server is loaded with a validation set (the questions that we used in our matchup with Ken Jennings), and users may submit an unlimited number of answers per question. However, users are permitted only a single submission per question on the test set. Check out the leaderboard to see how well you're doing on the validation set!

Training Data / Examples

Download quiz bowl question data to train and validate your system. This data also comes with preprocessed text versions of the Wikipedia pages associated with each answer in the training set. We encourage the use of external data in addition to what we have provided.

Test Set

The test set will have possible answers from any Wikipedia page. However, many of the answers will likely be in the train set (the same things get asked about again and again). You should expect around 80% of test questions to be about answers in the train set; an example test set can be found here. The questions will be written by quiz bowl writers based on the standard high school distribution.

IMPORTANT: The sample code provided answers all available questions in the 'dev' fold. You can answer 'dev' questions as many times as you like. However, 'test' questions can only be answered once. So be very careful when querying the text of test questions and providing your answers.
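Because a test question can only be answered once, it may be worth guarding your client so it never touches the 'test' fold by accident. This is a hypothetical defensive wrapper, assuming each question record carries a fold label as in the sample code; the function name and data shape are illustrative, not part of the provided API.

```python
def safe_question_ids(folds_by_id, allow_test=False):
    """Filter question ids by fold BEFORE requesting any question text.

    folds_by_id: dict mapping question_id -> fold name ('dev' or 'test').
    Test questions are excluded unless allow_test is explicitly set.
    """
    allowed = {"dev"} | ({"test"} if allow_test else set())
    return [qid for qid, fold in folds_by_id.items() if fold in allowed]
```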

Evaluation

We will evaluate systems (and humans) in pairwise competition. The system that gives a correct answer first (i.e., after requesting the fewest words) gets 10 or 15 points (15 points are available for early, correct buzzes). A system that gives an incorrect answer first will lose 5 points. There is no penalty for guessing at the end of a question. The system with the higher overall score wins. We reserve the right to combine systems for our exhibition match against a human team.
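The scoring rule can be sketched as follows for a single question. This models only the first buzz of the two systems; the exact cutoff for an "early" buzz (`EARLY_CUTOFF` below) is an assumption for illustration, and the real threshold is set by the organizers.

```python
EARLY_CUTOFF = 0.5  # assumed: a buzz in the first half of the question is "early"


def score_question(a, b, question_length):
    """Score one question between systems A and B.

    Each of a, b is a (words_heard, correct) pair describing that
    system's buzz. Returns (points_a, points_b): the first buzzer
    earns 10 (or 15 if early) when correct, loses 5 when incorrect,
    and incurs no penalty for a wrong guess at the end of the question.
    """
    first_is_a = a[0] <= b[0]
    words_heard, correct = a if first_is_a else b
    if correct:
        points = 15 if words_heard / question_length <= EARLY_CUTOFF else 10
    else:
        # no penalty for guessing once the whole question has been heard
        points = 0 if words_heard >= question_length else -5
    return (points, 0) if first_is_a else (0, points)
```

For example, a correct buzz after 3 of 10 words earns 15, a correct buzz after 8 of 10 earns 10, and an incorrect buzz after 2 of 10 costs 5.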

Problems

If you have trouble with the code, please file an issue. For more general questions, join our mailing list and ask them there.