Call for Participation: TREC 2016 LiveQA Track
Real-Time Answers to Real User Questions
David Carmel, Yahoo Labs, Haifa
Eugene Agichtein, Emory University
Donna Harman, NIST, donna.harman@nist.gov
Dan Pelleg and Yuval Pinter, Yahoo Labs, Haifa
OVERVIEW
We invite participation in the 2016 edition of the LiveQA track, which focuses on "live" question answering for real user questions. Questions will be sampled from the stream of the most recent questions submitted to the Yahoo Answers (YA) site that have not yet been answered by humans, and sent to the participating systems, which must provide an answer in real time.
This is the second year of the track; the first year ran successfully in 2015, with 22 participating systems developed by 14 institutions from around the globe (Australia, China, the Middle East, Europe, and North America). The questions were collected from a few predefined categories of Yahoo Answers (e.g., Health, Sports, Computers & Internet, Pets). We plan to continue using questions from Yahoo Answers, with question categories similar to last year's.
TASKS
Track coordinators will provide a test system that polls each participant system with one thousand real user questions newly submitted to the YA site. Track participants will develop a web service that receives a YA question object as input and responds within one minute with an answer of up to 1,000 characters, including the source(s) of the answer. This is the main, required task. This year we are also introducing new optional pilot subtasks on question understanding, in which systems categorize the posted question and identify its "essential" parts.
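For illustration, the question object and answer payload might be structured as in the sketch below. All field names here (qid, category, title, body, answer, resources) are assumptions made for the example; the authoritative field set will be fixed by the API definition released by the track (see INFRASTRUCTURE).

    // Hedged sketch of the task's data objects; all field names are
    // illustrative assumptions, not the official API. In practice each
    // class would live in its own Java file.
    class YAQuestion {
        String qid;      // unique Yahoo Answers question identifier
        String category; // YA category, e.g. "Health"
        String title;    // short question title
        String body;     // optional longer question description
    }

    class SystemAnswer {
        String answer;    // free-text answer, at most 1,000 characters
        String resources; // source(s) supporting the answer, e.g. URLs
    }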
INFRASTRUCTURE
This year we will again provide an initial implementation of the server, based on NanoHttpd, including a definition of the API to be implemented by participants.
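As a starting point, a minimal answering service built on NanoHttpd could look like the sketch below. It assumes question fields arrive as HTTP parameters and that a plain-text answer is returned; the actual request and response formats will be those in the released API definition, and the placeholder "answer" stands in for a real answering pipeline.

    import java.io.IOException;
    import java.util.HashMap;
    import fi.iki.elonen.NanoHTTPD;

    // Minimal sketch of an answering service on NanoHttpd 2.x. The parameter
    // name ("title") is an assumption; consult the official API definition.
    public class EchoAnswerServer extends NanoHTTPD {

        public EchoAnswerServer(int port) {
            super(port);
        }

        @Override
        public Response serve(IHTTPSession session) {
            try {
                // POST bodies must be parsed before parameters are visible.
                if (Method.POST.equals(session.getMethod())) {
                    session.parseBody(new HashMap<String, String>());
                }
            } catch (IOException | ResponseException e) {
                return newFixedLengthResponse(Response.Status.INTERNAL_ERROR,
                        MIME_PLAINTEXT, "error: " + e.getMessage());
            }
            String title = session.getParms().get("title");

            // Placeholder logic: a real system would retrieve and rank
            // candidate answers here, within the one-minute budget.
            String answer = "No answer yet for: " + title;
            if (answer.length() > 1000) {
                answer = answer.substring(0, 1000); // enforce the length limit
            }
            return newFixedLengthResponse(Response.Status.OK, MIME_PLAINTEXT, answer);
        }

        public static void main(String[] args) throws IOException {
            new EchoAnswerServer(8080).start(NanoHTTPD.SOCKET_READ_TIMEOUT, false);
            System.out.println("Answering service listening on port 8080");
        }
    }

Keeping the service stateless, as above, makes it straightforward to validate against the test client during the dry runs.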
*Resources and Data:
* YA data is publicly available, and many datasets have already been downloaded by the research community and can be reused for this task. Among them is a large collection of 4,483,032 question-and-answer pairs provided by Yahoo Labs on the Webscope site (http://webscope.sandbox.yahoo.com/catalog.php?datatype=l); the specific dataset is L6. Participants are welcome to download this dataset and use it for training; a parsing sketch appears at the end of this list.
* QRELS for the system responses from LiveQA 2015 will be made available to active participants, along with (potentially) a baseline system implementation.
* An API for Yahoo Answers, and a list of resolved questions representing the categories that we will use during the online testing phase. Participants can download the suggested questions, with all their metadata (answers, user feedback, etc.), using the provided APIs, and use them for training.
*Support for registered participants to experiment and technically validate their answering services against a test client. We will publish the full details of the validation server and the schedule of “dry runs” in the winter of 2016.
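For the Webscope L6 collection mentioned above, a single streaming pass over the XML dump is usually enough to extract training pairs. The element names in this sketch (<subject> for the question title, <bestanswer> for the selected answer, <document> per record) are assumptions based on common descriptions of the dataset; the README shipped with the download is authoritative.

    import java.io.FileInputStream;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamConstants;
    import javax.xml.stream.XMLStreamReader;

    // Sketch: stream the (large) L6 XML dump and print tab-separated
    // question/answer training pairs. Element names are assumptions.
    public class L6Reader {
        public static void main(String[] args) throws Exception {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new FileInputStream(args[0]));
            String element = null, question = null, answer = null;
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        element = reader.getLocalName();
                        break;
                    case XMLStreamConstants.CHARACTERS:
                        // Note: long text nodes may arrive in several events;
                        // a robust reader would append rather than overwrite.
                        if ("subject".equals(element)) question = reader.getText();
                        if ("bestanswer".equals(element)) answer = reader.getText();
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        if ("document".equals(reader.getLocalName()) && question != null) {
                            System.out.println(question.trim() + "\t"
                                    + (answer == null ? "" : answer.trim()));
                            question = answer = null;
                        }
                        element = null;
                        break;
                }
            }
            reader.close();
        }
    }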
EVALUATION
*Runs: During the main evaluation phase, all participant systems should be ready to be invoked by the testing service. At a pre-arranged time, the testing system will start calling the registered systems with newly posted, unresolved questions, and will collect the participants' answers to construct the per-question answer pools. Participant systems are allowed to ignore some of the questions (returning an empty element instead of the answer string).
* Metrics: Each question, with its corresponding pool of answers, will be examined and evaluated by TREC editors. The answers will be judged on a 5-level Likert scale (1 = non-relevant and non-useful, 5 = perfect answer) based on their relevance and responsiveness to the question.
The main metrics are the same as last year's: average answer score, precision (fraction of answered questions with a score above a threshold), and coverage (fraction of all questions answered). We will also experiment with new (pilot) metrics on a subset of questions (a computation sketch follows the list below):
- Readability: grammaticality, politeness.
- Conciseness: penalized by length relative to human answers.
- Question annotation: question type classification (accuracy) and detection of the essential question subset.
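The sketch below shows how the three main metrics could be computed from editor judgments, under two illustrative assumptions: unanswered questions count as score 0 in the average, and the precision threshold is 3. The official definitions will be those published with the evaluation results.

    import java.util.Arrays;
    import java.util.List;

    // Sketch of the main LiveQA metrics. Assumptions: unanswered questions
    // score 0 in the average; the precision threshold is 3.
    public class LiveQAMetrics {
        public static void main(String[] args) {
            // scores.get(i) = editor judgment (1..5) for question i;
            // null = the system returned no answer.
            List<Integer> scores = Arrays.asList(5, 3, null, 1, 4, null, 2);
            int threshold = 3; // illustrative assumption

            long answered = scores.stream().filter(s -> s != null).count();
            double avgScore = scores.stream()
                    .mapToInt(s -> s == null ? 0 : s).average().orElse(0);
            double precision = answered == 0 ? 0
                    : (double) scores.stream()
                          .filter(s -> s != null && s >= threshold).count() / answered;
            double coverage = (double) answered / scores.size();

            System.out.printf("avg=%.2f prec=%.2f cov=%.2f%n",
                    avgScore, precision, coverage);
        }
    }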
2016 TIMELINE
- February: Announcement of the list of question categories and filtering criteria to be used in the challenge. YA API to be ready for participants.
- February: Skeleton server and answering service API definition released.
- March: Testing system is ready. Participants can experiment with the answering service against the testing system.
- May 31st: LiveQA challenge: participant answering services go live and must be ready to answer questions for a period of 24 hours.
REGISTRATION
The TREC registration deadline is in February 2016. Register through the TREC website (http://trec.nist.gov/pubs/call2016.html). Official registration is required to participate.
Additionally, to join the informal mailing list, express interest, share resources, etc., use the web form below to add your contact information: http://bit.ly/1RKivn9