Below are dozens of examples from the Stanford Sentiment Treebank (SST) test set comparing our method (DkNN Leave-One-Out) to traditional leave-one-out (Softmax Leave-One-Out) and traditional gradient-based saliency maps (Vanilla Gradient).
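The leave-one-out interpretations above can be sketched as follows. This is a minimal illustration, not our actual implementation: `confidence_fn` is a hypothetical callable that maps a token list to the model's confidence in the originally predicted label (a softmax probability for Softmax Leave-One-Out, or a DkNN conformity score for DkNN Leave-One-Out).

```python
def leave_one_out_importance(tokens, confidence_fn):
    """Score each token by the drop in model confidence when it is removed.

    confidence_fn: maps a list of tokens to the model's confidence in the
    original prediction (softmax probability or DkNN conformity).
    Larger scores indicate more important tokens.
    """
    base = confidence_fn(tokens)
    scores = []
    for i in range(len(tokens)):
        ablated = tokens[:i] + tokens[i + 1:]  # remove the i-th token
        scores.append(base - confidence_fn(ablated))
    return scores
```

The same ablation loop serves both methods; only the confidence function changes, which is what lets us isolate the effect of replacing softmax confidence with DkNN conformity.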
Below are natural language inference examples from SNLI using our DkNN Leave-One-Out approach. Some were selected to show particular behaviors; for example, the two examples of "wearing" a shirt of a certain color show an incorrect prediction based on an apparent dataset artifact. The others are the first examples in the SNLI validation set.
Calibrated Interpretation Results: For clarity and space considerations, we omitted the results for the calibrated leave-one-out interpretation method from the paper. The figure below shows the examples from the paper compared across all interpretation methods.
Character Level DkNN: We also conducted preliminary experiments with character level DkNN, which achieved accuracy comparable to standard inference procedures. For example, a character level BiLSTM on SST went from 82.3% accuracy to 82.2% when using DkNN.
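For concreteness, DkNN-style inference can be sketched as below. This is a simplification of the actual procedure (one representation layer, Euclidean distance, no locality-sensitive hashing); the function and variable names are ours, not from any library.

```python
import numpy as np

def dknn_predict(query_rep, train_reps, train_labels, k=5):
    """Classify a query by the labels of its k nearest training representations.

    query_rep:    hidden representation of the test example, shape (d,)
    train_reps:   hidden representations of training examples, shape (n, d)
    train_labels: integer labels, shape (n,)
    Returns the majority label and the per-class neighbor fraction, which
    serves as a conformity-style confidence score.
    """
    dists = np.linalg.norm(train_reps - query_rep, axis=1)
    neighbor_labels = train_labels[np.argsort(dists)[:k]]
    classes, counts = np.unique(neighbor_labels, return_counts=True)
    conformity = counts / k
    return classes[np.argmax(counts)], dict(zip(classes, conformity))
```

In the full method, neighbors are retrieved at multiple layers and aggregated, but the single-layer sketch conveys why predictions come with interpretable nearest-neighbor evidence.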
All of the interpretation techniques produced less interpretable results overall for the character level models (unsurprising, as humans do not read at the character level). The importance value is often distributed roughly uniformly over the input characters because no single character is significantly influential on its own. Nonetheless, certain characters showed consistent, interpretable sentiment values across the evaluation data. For instance, the question mark ``?'' carries strong negative sentiment. Upon inspection, that character is highly correlated with negative reviews (e.g., ``couldn't someone take rob schneider and have him switch bodies with a funny person?''). Three character level interpretations are shown in the figure below, though we note these are cherry-picked, unlike the word level examples for SST and SNLI presented above.