Evaluating Human Explanations in Natural Language Inference