Abstract
How can human-provided explanations be exploited to study and improve the consistency of NLP methods? In this project, you will develop a system that explains natural language inference predictions using free-form textual explanations, and you will analyze its failure cases.
Description
Natural language inference (NLI) is the task of determining whether a “hypothesis” is true (entailment), false (contradiction), or undetermined (neutral) given a “premise” (e.g. the premise “The dog’s barking woke up the cat” entails, i.e. supports, the hypothesis “the feline was sleeping”). In recent years, the field of NLP interpretability has started exploring the use of natural language explanations, notably for the NLI task with the e-SNLI dataset, to force models to provide users with interpretable insights into their predictions. In this project, you will explore when explanation generation fails and what can be done to make explanation generation systems more robust.
We provide you with the e-SNLI dataset, in which each (premise, hypothesis, label) triplet is accompanied by one human-provided explanation in the training set and by three human-provided explanations in the validation and test sets.
Your main goal for this project is to develop or fine-tune a system that generates natural language explanations, evaluate its performance on the test set with at least one text-based and one neural metric, and perform an error analysis on low-scoring examples. Since multiple explanations are available for each test pair, a typical evaluation protocol is to take, for each example, the top score across all available references. Commonly used text-based metrics include BLEU and the ROUGE variants (ROUGE-1, ROUGE-2, ROUGE-L); commonly used neural metrics for NLG evaluation include BLEURT, BARTScore, and COMET.
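As an illustration, the sketch below implements the max-over-references protocol with ROUGE-L F1 using the rouge_score package; the toy prediction and references are placeholders, and in practice you would iterate over the full test set (and repeat the same pattern for your other metrics).

```python
# Minimal sketch: per-example max over the available human references, ROUGE-L F1.
# The toy prediction/references below are placeholders.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)

def best_rouge_l(prediction: str, references: list[str]) -> float:
    """Score a prediction against every reference and keep the best F1."""
    return max(scorer.score(ref, prediction)["rougeL"].fmeasure for ref in references)

predictions = ["a dog is an animal"]
references = [[
    "A dog is a type of animal.",
    "Dogs are animals.",
    "All dogs are animals.",
]]

per_example = [best_rouge_l(p, refs) for p, refs in zip(predictions, references)]
print(sum(per_example) / len(per_example))  # average of per-example maxima
```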
Ideas for research directions:
Jointly predicting NLI labels and generating explanations: Instead of training the model to generate only the explanation, make it also predict the NLI label (e.g. “premise: The dog is running in the park. hypothesis: An animal is in the park” can produce something like “entailment. a dog is an animal”; see the formatting sketch after this list). Evaluate the accuracy of the model in predicting the correct NLI label. Are the generated explanations always aligned with the predicted NLI label? Perform an analysis.
Create templated augmentations: Camburu et al. (2020) observe that most explanations in the available data follow some label-dependent template (see their Appendix A for examples; e.g. starting from the explanation “All dogs are animals” for the original label “entailment”, templates and regexes can produce “Dogs belong to the category of animals” or “There are no dogs that are not animals”; see the regex sketch after this list). Produce templated augmentations of the examples for which explanation generation performs poorly and add them to the training set. Does the performance of the model improve on the selected examples? Does it degrade in other contexts?
[Challenge 🏆] Create paraphrased augmentations: Instead of using templates to generate augmented explanations as in the previous point, use a paraphrasing model that takes an explanation as input and generates related explanations. Evaluate how well an out-of-the-box paraphrase generation model produces explanations matching the templates (e.g. any model trained on PAWS can be a good bet), and how effective these augmentations are at improving the performance of the explanation generation model. Then, fine-tune the paraphrase generation model on a set of explanation pairs (you decide how to define and create this set) and repeat the evaluations. Is fine-tuning the paraphraser effective in this setting?
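For the joint label-and-explanation direction, a minimal preprocessing sketch for a sequence-to-sequence model could look like the following. The field names (premise, hypothesis, label, explanation_1) and the 0/1/2 → entailment/neutral/contradiction convention are assumptions based on the usual e-SNLI layout; verify them against the dataset card before training.

```python
# Sketch of the joint input/output format for a T5-style seq2seq model.
# Field names and the label convention are assumptions; verify them on the dataset card.
LABEL_NAMES = {0: "entailment", 1: "neutral", 2: "contradiction"}

def to_joint_example(example: dict) -> dict:
    """Map one e-SNLI record to a (source, target) pair for joint training."""
    source = f"premise: {example['premise']} hypothesis: {example['hypothesis']}"
    target = f"{LABEL_NAMES[example['label']]}. {example['explanation_1']}"
    return {"source": source, "target": target}

def parse_prediction(generated: str) -> tuple[str, str]:
    """Split a generated string back into a predicted label and an explanation."""
    label, _, explanation = generated.partition(". ")
    return label.strip().lower(), explanation.strip()
```

With this split you can score the predicted label with plain accuracy and the explanation with the same text-based and neural metrics as before, which also makes it straightforward to check whether the two are aligned.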
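For the templated-augmentation direction, the sketch below illustrates the idea on the single “All X are Y” entailment template from the example above; covering the templates in Appendix A of Camburu et al. (2020) would require one pattern and one set of rewrites per template, and the exact wording of the rewrites here is only illustrative.

```python
# Toy sketch of regex-based templated augmentation for entailment explanations.
# Only the "All X are Y" template is handled; the rewrites are illustrative.
import re

ALL_X_ARE_Y = re.compile(r"^All (?P<x>\w+) are (?P<y>\w+)\.?$", re.IGNORECASE)

def augment_entailment_explanation(explanation: str) -> list[str]:
    """Return template-based paraphrases of an explanation, if a template matches."""
    match = ALL_X_ARE_Y.match(explanation.strip())
    if not match:
        return []
    x, y = match.group("x"), match.group("y")
    return [
        f"{x.capitalize()} belong to the category of {y}.",
        f"There are no {x} that are not {y}.",
    ]

print(augment_entailment_explanation("All dogs are animals"))
# ['Dogs belong to the category of animals.', 'There are no dogs that are not animals.']
```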
Materials
A HuggingFace dataset associated with the data is available on the Dataset Hub.
Refer to the dataset card on the Dataset Hub for all information on features.
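A minimal loading sketch with the datasets library is shown below; “esnli” is the Hub identifier assumed here, so double-check it (along with the split and feature names) against the dataset card.

```python
# Minimal loading sketch; the "esnli" identifier is an assumption to verify on the Hub.
from datasets import load_dataset

esnli = load_dataset("esnli")
print(esnli)              # expected splits: train / validation / test
print(esnli["test"][0])   # expected features: premise, hypothesis, label, explanation_1..3
```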
References
Potts, Christopher. “Natural Language Inference.” Stanford University (2019).
Camburu, Oana-Maria et al. “e-SNLI: Natural Language Inference with Natural Language Explanations.” NeurIPS (2018).
Liu, Hui et al. “Towards Explainable NLP: A Generative Explanation Framework for Text Classification.” ACL (2019).
Camburu, Oana-Maria et al. “Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations.” ACL (2020).
Sundararajan, Mukund et al. “Axiomatic Attribution for Deep Networks.” ArXiv abs/1703.01365 (2017).
Zhang, Yuan et al. “PAWS: Paraphrase Adversaries from Word Scrambling.” ArXiv abs/1904.01130 (2019).