Fact Validation

Abstract

Given a document set and a sentence (t2), your system identifies whether the document set entails t2.

Normally, a text (t1) and a hypothesis (t2) are given in a recognizing textual entailment task.

In the Fact Validation subtask, however, t1 is not given.

Your system needs to search the given document set for a text passage corresponding to t1.

Based on the search result, the system must then determine whether the document set entails t2.

If t2 is entailed by some sentences in the document set, the statement described in t2 can be judged as "fact."

See also "Entrance Exam" page.

Test Collection

The following test collections are provided for participants.

The data written in red are newly created for RITE-VAL.

The data written in black are the same as the data used for RITE-1 or RITE-2.

See also "NTCIR-11 Test Collections: data sets for NTCIR-11 Workshop Participants" page at NTCIR-11 website.

Data Format

The file format for the Fact Validation subtask is the same as for the NTCIR-10 RITE-2 Exam Search subtask.

<?xml version="1.0" encoding="UTF-8"?>

<dataset>

<pair id="1" label="Y">

<t2>パルテノン神殿の建つ丘は,アクロポリスと呼ばれている。</t2>

</pair>

<pair id="2" label="N">

<t2>パルテノン神殿は,ヘレニズム文化の影響下で建設された。</t2>

</pair>

:

</dataset>

In contrast with the System Validation subtask, a <pair> element includes no <t1> element.

A <pair> element in task data for system training purposes has a @label attribute, while one in task data for the formal run does not.
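The dataset format above can be read with the standard library. The following is a minimal sketch using `xml.etree.ElementTree`; the `load_pairs` helper and the embedded sample data are illustrative, not part of the official tooling.

```python
import xml.etree.ElementTree as ET

# A tiny stand-in for a task data file (the real files use Japanese/English t2 text).
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<dataset>
  <pair id="1" label="Y">
    <t2>example sentence one</t2>
  </pair>
  <pair id="2" label="N">
    <t2>example sentence two</t2>
  </pair>
</dataset>"""

def load_pairs(xml_text):
    """Return a list of (pair_id, label_or_None, t2_text) tuples.

    In formal-run data the @label attribute is absent, so label is None there.
    """
    root = ET.fromstring(xml_text)
    pairs = []
    for pair in root.iter("pair"):
        pairs.append((pair.get("id"), pair.get("label"), pair.findtext("t2")))
    return pairs

pairs = load_pairs(SAMPLE)
print(pairs[0])  # ('1', 'Y', 'example sentence one')
```

Because `pair.get("label")` simply returns `None` when the attribute is missing, the same loader works for both training and formal-run files.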

Evaluation Method

Binary Classification (for EN and JA)

Your system answers "Y" or "N" for a given sentence (t2).

    • If a human reading a document set would infer that the set entails t2, answer "Y."
    • Otherwise, answer "N."

System performance is evaluated by macro F1 over the two labels.

The evaluation tool distributed at the RITE-VAL and RITE-2 websites is used for this evaluation.

This is a mandatory task for the Fact Validation EN and JA subtasks.
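The official scores come from the rite2eval tool, but macro F1 itself is simple to sketch: compute precision, recall, and F1 per label, then average the per-label F1 scores. This hypothetical helper illustrates the metric for the binary case:

```python
def macro_f1(gold, pred, labels=("Y", "N")):
    """Average of per-label F1 over the given label set (macro F1)."""
    scores = []
    for lab in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == lab and p == lab)
        fp = sum(1 for g, p in zip(gold, pred) if g != lab and p == lab)
        fn = sum(1 for g, p in zip(gold, pred) if g == lab and p != lab)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)

# Toy gold/system labels for four t2 sentences (not real task data).
gold = ["Y", "Y", "N", "N"]
pred = ["Y", "N", "N", "N"]
score = macro_f1(gold, pred)  # (2/3 + 4/5) / 2 = 11/15
```

Passing `labels=("E", "C", "U")` gives the three-label macro F1 used for CS and CT. Use the distributed rite2eval tool for any official numbers.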

Result submission format:

"t2 ID" [SPACE] "Label" [SPACE] "Confidence"

"t2 ID" [SPACE] "Label" [SPACE] "Confidence"

"t2 ID" [SPACE] "Label" [SPACE] "Confidence"

:

Example file obeying the format:

1 Y 0.852

2 N 0.994

3 Y 0.789

4 Y 1.000

:
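A run file in the format above is just one space-separated line per t2. A minimal sketch of a writer (the `format_run` helper is hypothetical, not part of the official tooling):

```python
def format_run(rows):
    """Format (t2_id, label, confidence) rows as submission lines.

    Each line is: "t2 ID" [SPACE] "Label" [SPACE] "Confidence".
    """
    return "\n".join(f"{t2_id} {label} {conf:.3f}" for t2_id, label, conf in rows)

text = format_run([(1, "Y", 0.852), (2, "N", 0.994)])
print(text)
# 1 Y 0.852
# 2 N 0.994
```

The same writer works for the CS/CT runs by passing "E"/"C"/"U" labels instead of "Y"/"N".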

Multi-Classification (for CS and CT)

Your system answers "E", "C", or "U" for a given sentence (t2).

    • If a human reading a document set would infer that the set entails t2, answer "E."
    • If a human reading a document set would infer that the set contradicts t2, answer "C."
    • Otherwise, answer "U."

System performance is evaluated by macro F1 over the three labels.

The evaluation tool distributed at the RITE-VAL and RITE-2 websites is used for this evaluation.

Result submission format:

"t2 ID" [SPACE] "Label" [SPACE] "Confidence"

"t2 ID" [SPACE] "Label" [SPACE] "Confidence"

"t2 ID" [SPACE] "Label" [SPACE] "Confidence"

:

Example file obeying the format:

1 E 0.852

2 C 0.994

3 E 0.789

4 E 1.000

:

Search for a t1 Text (optional task for EN and JA)

Your system lists candidate documents corresponding to t1 for a given sentence (t2).

The documents are retrieved from the given document set.

For each t2, your system can submit a list of up to five document IDs.

Task organizers judge the validity of the candidate documents for each t2 for which the system answered "Y."

Whether the candidate documents include a valid text for t2 is examined.

Precision and recall are used for evaluation.

This is an optional task for the Fact Validation EN and JA subtasks.
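The organizers' judgments are manual, so the exact scoring is not reproducible offline; still, assuming a per-t2 gold set of valid documents, set-based precision and recall over a candidate list can be sketched as:

```python
def precision_recall(candidates, gold):
    """Set-based precision and recall of retrieved document IDs.

    candidates: document IDs the system submitted (at most five per t2).
    gold: a hypothetical set of documents judged valid for this t2.
    """
    cand, gold = set(candidates), set(gold)
    hits = len(cand & gold)
    prec = hits / len(cand) if cand else 0.0
    rec = hits / len(gold) if gold else 0.0
    return prec, rec

# Candidate list from the example file; the gold set here is invented.
p, r = precision_recall([45, 224, 334, 1040], [45, 709])  # (0.25, 0.5)
```

This is only a sketch of the metric under that assumption; the official evaluation is whatever the task organizers' manual judging produces.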

Result submission format:

"t2 ID" [SPACE] "Document ID" [SPACE] "Document ID" [SPACE] … "Document ID"

"t2 ID" [SPACE] "Document ID" [SPACE] "Document ID" [SPACE] … "Document ID"

"t2 ID" [SPACE] "Document ID" [SPACE] "Document ID" [SPACE] … "Document ID"

:

Example file obeying the format:

1 45 224 334 1040

2 3 1482

3 30 781 315 709 33

4 11 33 1204 1132 553

:

Evaluation Tool

    • rite2eval version 3.0 (Java)