Multi3-NLU++: Dataset from "MULTI3NLU++: A Multilingual, Multi-Intent, Multi-Domain Dataset for Natural Language Understanding in Task-Oriented Dialogue"
ACES: Dataset and code from "ACES: Translation Accuracy Challenge Sets for Evaluating Machine Translation Metrics"
Ant: Evaluation dataset for the open domain
Sports Entailment Evaluation: Evaluation scripts used in "Blindness to Modality Helps Entailment Graph Mining"
Temporal Entailment Sports Dataset: Evaluation dataset used in "Incorporating Temporal Information in Entailment Graph Mining"
MoNTEE: Modality-aware event extraction system used in "Modality and Negation in Event Extraction"
ParCor 1.0: Parallel Pronoun-Coreference Corpus (English and German). From the paper "ParCor 1.0: A Parallel Pronoun-Coreference Corpus to Support Statistical MT"
PROTEST: A test suite to support the evaluation of pronoun translation. From the paper "PROTEST: A Test Suite for Evaluating Pronouns in Machine Translation"
WMT 2016 cross-lingual pronoun prediction: Resources from the WMT 2016 shared task, including the training, development, and test data and the evaluation script
WMT 2018 test suite for evaluating pronoun translation for English-German MT systems: Resources from the evaluation of WMT 2018 systems