Character-Level DkNN: We also conducted preliminary experiments with character-level DkNN, which achieved accuracy comparable to standard inference. For example, a character-level BiLSTM on SST went from 82.3% accuracy to 82.2% accuracy when using DkNN.
All of the interpretation techniques produced less interpretable results for the character-level models, which is unsurprising because humans do not operate at the character level. The importance values are often distributed roughly uniformly over the input characters because no single character is significantly influential on its own. Nonetheless, certain characters showed consistent, interpretable sentiment values across the evaluation data. For instance, the question mark (?) carries strongly negative sentiment. Upon inspection, that character is highly correlated with negative reviews (e.g., ``couldn't someone take rob schneider and have him switch bodies with a funny person?''). Three character-level interpretations are shown in the figure below, though we note these are cherry-picked, unlike the word-level examples for SST and SNLI presented above.
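One way to surface characters with consistent sentiment values is to average signed per-character importance over the evaluation data. The following is a minimal sketch under stated assumptions, not the procedure used in our experiments: the function name `mean_char_importance`, the `min_count` threshold, the sign convention (negative values push toward negative sentiment), and the toy scores are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def mean_char_importance(examples, min_count=1):
    """Average signed importance per character across examples.

    `examples` is an iterable of (text, scores) pairs, where scores[i] is
    the (assumed) signed importance assigned to text[i] by some
    interpretation method. Characters seen fewer than `min_count` times
    are dropped, since their averages are unreliable.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for text, scores in examples:
        if len(text) != len(scores):
            raise ValueError("each character needs exactly one score")
        for ch, score in zip(text, scores):
            totals[ch] += score
            counts[ch] += 1
    return {
        ch: totals[ch] / counts[ch]
        for ch in totals
        if counts[ch] >= min_count
    }


# Toy, made-up scores (not results from the paper).
toy_examples = [
    ("bad?", [-0.1, -0.1, -0.1, -0.6]),
    ("why?", [0.0, 0.1, -0.1, -0.4]),
    ("good", [0.2, 0.1, 0.1, 0.2]),
]

averages = mean_char_importance(toy_examples, min_count=2)
most_negative = min(averages, key=averages.get)
```

In this toy input the question mark has the lowest average, mirroring the pattern described above. On real data, a frequency threshold such as `min_count` keeps rare characters from dominating the ranking.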