Answer Verification

We conducted the question answering (QA) subtask at NTCIR-16 QA Lab-PoliInfo-3 to help users understand assembly arguments on topics that interest them. The short answers generated automatically in QA captured many facts from the arguments, but some of them contained mistakes. A short answer that contains an incorrect part is regarded as a fake answer even if the rest is correct, so the generated answers need to be fact-checked. Because the participants evaluated the truth of the answers submitted in QA, we could use the QA questions and answers as training data to build a simple binary classifier that returns true or false when given a question and its answer. However, the training data set was too small to build a robust and reliable classifier. Therefore, we will conduct the answer verification (AV) subtask to expand the training data set and improve the fact-checking classifier.
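To make the classifier's interface concrete, here is a minimal sketch of a binary classifier that takes a question and an answer and returns true or false. The page does not say how the actual classifier was built, so the bag-of-words naive Bayes model, the class name NaiveBayesVerifier, and the four training examples are all assumptions invented for illustration; they are not taken from the QA data.

```python
import math
import re
from collections import Counter

# Hypothetical (question, answer, is_true) triples, invented for this sketch.
TRAIN = [
    ("When will the bridge open?", "It will open in 2025.", True),
    ("When will the bridge open?", "It will never open.", False),
    ("Who proposed the plan?", "The governor proposed it.", True),
    ("Who proposed the plan?", "Nobody proposed it.", False),
]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


class NaiveBayesVerifier:
    """Toy stand-in for a fact-checking classifier (assumption, not the real one)."""

    def fit(self, examples):
        self.counts = {True: Counter(), False: Counter()}
        self.docs = Counter()
        for question, answer, label in examples:
            self.docs[label] += 1
            self.counts[label].update(tokenize(question + " " + answer))
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        return self

    def predict(self, question, answer):
        """Return True if the answer is judged true, otherwise False."""
        tokens = tokenize(question + " " + answer)
        total_docs = sum(self.docs.values())
        scores = {}
        for label in (True, False):
            total = sum(self.counts[label].values())
            # Log prior and log likelihood, both with add-one smoothing.
            score = math.log((self.docs[label] + 1) / (total_docs + 2))
            for token in tokens:
                score += math.log(
                    (self.counts[label][token] + 1) / (total + len(self.vocab))
                )
            scores[label] = score
        return scores[True] >= scores[False]


model = NaiveBayesVerifier().fit(TRAIN)
```

A model like this only picks up surface word patterns, and with so few examples it generalizes poorly, which is the data-size problem the AV subtask is meant to address.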

  AV consists of two iterative stages. The first stage expands the training data by generating fake answers, which include both answers that look true at first glance but are false and answers that look false at first glance but are true. Participants create fake answers that a baseline classifier will misjudge. We also allow participants to create fake answers themselves. At the end of the first stage, we create a test data set for the second stage by combining the training data set with the fake answers that participants submitted. In the second stage, participants build classifiers that can judge the test data set correctly. With these classifiers as the new baselines, participants return to the first stage. In other words, in AV participants improve their own classifiers while creating fake answers that other participants’ classifiers will misjudge. Through this game-like competition among participants, we plan to expand the training data set and improve the fact-checking classifier.
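  One round of the two stages can be sketched as follows. This is an illustration under stated assumptions, not the organizers' procedure: the names misjudged, accuracy, run_round, and memorizing_builder are hypothetical, the stage 2 classifier is assumed to be fit on the current training data, and the test data set is assumed to become the training data of the next round.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str, bool]          # (question, answer, is_true)
Classifier = Callable[[str, str], bool]  # True means "judged true"


def misjudged(clf: Classifier, examples: List[Example]) -> List[Example]:
    """Return the examples whose gold label differs from the classifier output."""
    return [ex for ex in examples if clf(ex[0], ex[1]) != ex[2]]


def accuracy(clf: Classifier, examples: List[Example]) -> float:
    """Fraction of examples that the classifier labels correctly."""
    if not examples:
        return 0.0
    return 1.0 - len(misjudged(clf, examples)) / len(examples)


def run_round(
    train: List[Example],
    candidates: List[Example],
    baseline: Classifier,
    build: Callable[[List[Example]], Classifier],
):
    # Stage 1: keep the candidate answers that the baseline misjudges.
    fakes = misjudged(baseline, candidates)
    # The test data set combines the training data with the submitted fakes.
    test_set = train + fakes
    # Stage 2 (assumption): fit on the current training data, score on the test set.
    new_clf = build(train)
    return test_set, new_clf, accuracy(new_clf, test_set)


def memorizing_builder(examples: List[Example]) -> Classifier:
    """Hypothetical builder: memorize labels and answer True for unseen pairs."""
    table = {(q, a): label for q, a, label in examples}
    return lambda q, a: table.get((q, a), True)


# Placeholder data: the baseline calls everything true, so only the false
# candidate fools it.
train = [("q1", "a1", True)]
candidates = [("q1", "a2", False), ("q2", "a3", True)]
test_set, clf, score = run_round(train, candidates, lambda q, a: True, memorizing_builder)
```

  In a later round, the returned classifier would serve as the new baseline and test_set would serve as the expanded training data.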

  AV is closely related to the second question answering (QA2) subtask at NTCIR-17 QA Lab-PoliInfo-4. Answers submitted in QA2 are regarded as tentatively correct and may be used for training-data expansion. We also use the human evaluation results from QA2 to evaluate AV.


Stage 1 (fake-answer generation)

Input

Output

Evaluation


Stage 2 (fact-checking classification)

Input

Output

Evaluation


20221112_NTCIR17第1回説明会資料AnswerVerification.pdf