Explainability Methods for Neural Networks (WS 2020/2021)
Lecturer: Daria Pylypenko
Time: Thursday 14:15-15:45
First session: 05.11.2020
Location: MS Teams
Deep Neural Networks (DNNs) are very powerful. However, they are often considered "opaque", in the sense that it is not always easy to see how they arrive at a particular decision. Mathematically, of course, DNNs are fully transparent: they are just a mix of matrix algebra operations and non-linearities. But they are not always easy to interpret in terms of concepts that are broadly accessible to humans. The seminar will focus on methods that can help explain how neural networks "reason" while performing certain tasks, what they learn, and which information they use for making predictions. The aim is to examine general methods for interpreting neural-network-based models, with a focus on methods that can be applied to NLP tasks.
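As a toy illustration of the "which information" question (not part of the seminar materials): for a one-layer model, input features can be ranked by the gradient magnitude of the output with respect to each input, which is the core idea behind the sensitivity-analysis papers in the schedule below. All weights and inputs here are made up for illustration; a sketch only, not a definitive implementation.

```python
import numpy as np

# Tiny logistic-regression "network": y = sigmoid(w . x + b)
# Weights and input are hypothetical, chosen only for illustration.
w = np.array([2.0, -1.0, 0.5])
b = 0.0
x = np.array([1.0, 1.0, 1.0])

z = w @ x + b
y = 1.0 / (1.0 + np.exp(-z))  # model prediction

# Gradient saliency: |dy/dx_i| = |w_i| * sigmoid'(z),
# i.e. how sensitive the output is to each input feature.
saliency = np.abs(w) * y * (1.0 - y)
ranking = np.argsort(-saliency)  # features ordered by influence
```

For this toy model the ranking simply follows the weight magnitudes; for a deep network the same gradient would be obtained via backpropagation, and the later sessions (Simonyan et al., Li et al.) discuss how to interpret such saliency maps.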
Students will make presentations about research papers.
Prerequisites: familiarity with neural networks (feedforward, recurrent, convolutional), backpropagation, calculus, and linear algebra.
Registration
Prior registration is required. Maximum number of participants: 20 (10 from CS, 10 from CoLi).
CS students:
Please register here: https://seminars.cs.uni-saarland.de/seminars2021.
CoLi students (Language Science and Technology, Computerlinguistik, LCT):
Please send an email to: daria dot pylypenko at uni-saarland dot de
Deadline: October 26th, 23:59 CET (extended to November 4th, 23:59 CET). The seminar is now fully booked.
Schedule
Attention
Effective Approaches to Attention-based Neural Machine Translation (Luong et al., 2015)
Vilém Zouhar - 19.11.2020
Probing
What you can cram into a single $&!#* vector: Probing sentence embeddings for linguistic properties (Conneau et al., 2018)
Christian Cayralat - 19.11.2020
LIME
“Why Should I Trust You?” Explaining the Predictions of Any Classifier (Ribeiro et al., 2016)
Jannis Morsch - 26.11.2020
SHAP
A Unified Approach to Interpreting Model Predictions (Lundberg and Lee, 2017)
Sharmila Upadhyaya - 26.11.2020
Perturbations for NLP (Omission)
Representation of Linguistic Form and Function in Recurrent Neural Networks (Kádár et al., 2017)
Annegret Janzso - 03.12.2020
Meaningful Perturbation
Interpretable Explanations of Black Boxes by Meaningful Perturbation (Fong and Vedaldi, 2017)
---
Sensitivity Analysis and Activation Maximization
Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps (Simonyan et al., 2014)
Yogesh Kumar Baljeet Singh - 10.12.2020
Sensitivity analysis: Application to NLP
Visualizing and Understanding Neural Models in NLP (Li et al., 2016)
Sohaib Arshid - 10.12.2020
Deconvolution and Perturbations (Occlusion)
Visualizing and Understanding Convolutional Networks (Zeiler and Fergus, 2013)
Sangeet Sagar - 17.12.2020
Deconvolution for text
Textual Deconvolution Saliency (TDS): a deep tool box for linguistic analysis (Vanni et al., 2018)
Priyanka Das - 17.12.2020
LRP: Theory
Layer-Wise Relevance Propagation: An Overview (Montavon et al., 2019)
Chapter from Explainable AI: Interpreting, Explaining and Visualizing Deep Learning (see Additional Literature below)
Leonie Lapp - 07.01.2021
DeepLIFT
Learning Important Features Through Propagating Activation Differences (Shrikumar et al., 2017)
---
Integrated gradients
Axiomatic Attribution for Deep Networks (Sundararajan et al., 2017)
Pin-Jie Lin - 14.01.2021
Activation maximization with GANs
Synthesizing the preferred inputs for neurons in neural networks via deep generator networks (Nguyen et al., 2016)
Enea Duka - 21.01.2021
Activation maximization: Application to NLP
Interpretable Textual Neuron Representations for NLP (Poerner et al., 2018)
Daniel Biondi - 21.01.2021
Generating textual explanations: for image classification
Generating Visual Explanations (Hendricks et al., 2016)
Anar Amirli - 28.01.2021
Generating textual explanations: for text classification
Towards Explainable NLP: A Generative Explanation Framework for Text Classification (Liu et al., 2019)
Janaki Viswanathan - 28.01.2021
Influence functions
Understanding black-box predictions via influence functions (Koh and Liang, 2017)
Rricha Jalota - 04.02.2021
Influence functions for NLP
Explaining Black Box Predictions and Unveiling Data Artifacts through Influence Functions (Han et al., 2020)
Joanna Dietinger - 04.02.2021
LRP: Application to NLP
“What is Relevant in a Text Document?”: An Interpretable Machine Learning Approach (Arras et al., 2016)
Hafeez Ullah - 04.02.2021
Additional literature
Explainable AI: Interpreting, Explaining and Visualizing Deep Learning
Samek, Wojciech; Montavon, Grégoire; Vedaldi, Andrea; Hansen, Lars Kai and Müller, Klaus-Robert
Available for download here from within the MPI or UdS IP range. A paper copy can be obtained from the semester reserve at the Campus-Bibliothek für Informatik und Mathematik.
Interpretable Machine Learning: A Guide for Making Black Box Models Explainable
Christoph Molnar