Linguistic Phenomena

Each sentence pair (t1 and t2) in part of the System Validation subtask dataset has a category label that identifies a linguistic phenomenon.

The two tables on this page list the category labels for the Japanese subtask. (See also: category labels for the Chinese subtask.)

Only the linguistic phenomenon indicated by the category label is involved in deciding whether t1 entails t2 in the pair.

A single linguistic phenomenon involved in the entailment decision can affect more than one part of t1.

Insertion of a comma (,) is currently not regarded as a linguistic phenomenon.

Therefore, the entailment decision for a pair can involve comma insertion in addition to the labeled linguistic phenomenon.
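
Because comma insertion is not counted as a phenomenon, it may help to set commas aside when checking what actually differs between t1 and t2. The following is a minimal sketch, not part of the dataset tooling: the function names and the set of comma characters are assumptions made for illustration.

```python
# Hypothetical helpers; the comma set below is an assumption,
# not something the dataset specification defines.
COMMAS = ",，、"  # ASCII comma, fullwidth comma, ideographic comma


def strip_commas(text: str) -> str:
    """Return text with every comma character removed."""
    return text.translate({ord(c): None for c in COMMAS})


def differs_only_by_commas(t1: str, t2: str) -> bool:
    """True when t1 and t2 differ, but only in their commas."""
    return t1 != t2 and strip_commas(t1) == strip_commas(t2)
```

A pair for which `differs_only_by_commas` returns true involves no labeled phenomenon at all under this reading; for other pairs, comparing the comma-stripped strings isolates the differences that the category label is meant to describe.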

We compiled the list of category labels on the basis of those proposed in the following related work.

Bentivogli et al. (2010) Building Textual Entailment Specialized Data Sets: a Methodology for Isolating Linguistic Phenomena Relevant to Inference.
Sammons et al. (2010) “Ask not what Textual Entailment can do for You…”

Category labels of the form "*:phrase", such as "entailment:phrase" and "disagree:phrase", cover miscellaneous linguistic phenomena.

These are catch-all ("the others") labels: a sentence pair in which expressions in t1 correspond to expressions in t2 through a complicated alignment is annotated with one of them.

These labels may be subdivided into several sub-categories in the future.
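
When processing the dataset, the catch-all labels can be separated from the more specific ones by looking at the part after the colon. The sketch below assumes that every category label follows the same "prefix:suffix" pattern as "entailment:phrase"; the class and function names are hypothetical and are not part of any official tool.

```python
from typing import NamedTuple


class CategoryLabel(NamedTuple):
    prefix: str      # part before the colon, e.g. "entailment" or "disagree"
    phenomenon: str  # part after the colon, e.g. "phrase"


def parse_label(label: str) -> CategoryLabel:
    """Split a label at its first colon (assumed 'prefix:phenomenon' form)."""
    prefix, sep, phenomenon = label.partition(":")
    if not sep or not prefix or not phenomenon:
        raise ValueError(f"expected 'prefix:phenomenon', got {label!r}")
    return CategoryLabel(prefix, phenomenon)


def is_catch_all(label: str) -> bool:
    """True for '*:phrase' labels, which cover miscellaneous phenomena."""
    return parse_label(label).phenomenon == "phrase"
```

Keeping the test for "phrase" in one function means that a later subdivision of these labels into sub-categories would only require a change in `is_catch_all`.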