Explainability Methods for Neural Networks (WS 2020/2021)

Lecturer: Daria Pylypenko

Time: Thursday 14:15-15:45

First session: 05.11.2020

Location: MS Teams

Deep Neural Networks (DNNs) are very powerful. However, they are often considered "opaque", in the sense that it is not always easy to see how they arrive at a particular decision. Mathematically, of course, DNNs are fully transparent: they are just a composition of matrix operations and non-linearities. But they are not always easy to interpret in terms of concepts that are broadly accessible to humans. The seminar focuses on methods that help explain how neural networks "reason" while performing certain tasks, what they learn, and which information they use to make predictions. The aim is to examine general methods for interpreting neural-network-based models, with a focus on methods that can be applied to NLP tasks.

Students will give presentations on research papers.

Prerequisites: familiarity with neural networks (feedforward, recurrent, convolutional), backpropagation, calculus, and linear algebra.

Prior registration

Maximum number of participants: 20 (10 from CS, 10 from CoLi).

CS students:

Please register here: https://seminars.cs.uni-saarland.de/seminars2021.

CoLi students (Language Science and Technology, Computerlinguistik, LCT):

Please send an email to: daria dot pylypenko at uni-saarland dot de

Deadline: October 26th, 23:59 CET. Extended: November 4th, 23:59 CET. Registration is now closed (fully booked).

Schedule

  1. Attention

Effective Approaches to Attention-based Neural Machine Translation (Luong et al., 2015)

Vilém Zouhar - 19.11.2020
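
For orientation: the core of Luong et al.'s global attention is a score between the current decoder state and each encoder state, a softmax over those scores, and a weighted sum as context vector. Below is a minimal NumPy sketch of the "dot" scoring variant; names and dimensions are illustrative, not taken from the paper's code.

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def luong_dot_attention(decoder_state, encoder_states):
    # decoder_state: (d,), encoder_states: (T, d)
    scores = encoder_states @ decoder_state      # score(h_t, h_s) = h_t . h_s
    weights = softmax(scores)                    # alignment weights over source positions
    context = weights @ encoder_states           # context vector: weighted sum of encoder states
    return context, weights

rng = np.random.default_rng(0)
h_t = rng.normal(size=4)                         # current decoder hidden state
H_s = rng.normal(size=(6, 4))                    # encoder states for 6 source tokens
context, weights = luong_dot_attention(h_t, H_s)
print(weights.sum())                             # attention weights sum to 1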


  2. Probing

What you can cram into a single $&!#* vector: Probing sentence embeddings for linguistic properties (Conneau et al., 2018)

Christian Cayralat - 19.11.2020
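
For orientation: a probing classifier trains a lightweight model on frozen sentence embeddings to predict a linguistic property; high accuracy suggests the property is decodable from the representation. Below is a minimal scikit-learn sketch, with random vectors standing in for real embeddings and a made-up binary property.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 256))        # frozen sentence embeddings (stand-in)
labels = (embeddings[:, 0] > 0).astype(int)      # toy "linguistic property" to probe for

X_tr, X_te, y_tr, y_te = train_test_split(embeddings, labels, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
# Probing accuracy tells us the property is (linearly) decodable from the
# representation; it does not show that the downstream model actually uses it.
print("probing accuracy:", probe.score(X_te, y_te))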


  3. LIME

“Why Should I Trust You?” Explaining the Predictions of Any Classifier (Ribeiro et al., 2016)

Jannis Morsch - 26.11.2020
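
For orientation: LIME explains one prediction by sampling perturbed versions of the input, querying the black-box model on them, and fitting a locally weighted linear surrogate whose coefficients serve as feature importances. Below is a from-scratch sketch for a text input (not the official lime package); the stand-in classifier and kernel width are illustrative.

import numpy as np
from sklearn.linear_model import Ridge

def black_box_positive_prob(texts):
    # Stand-in for any classifier: "probability" is high if "good" appears.
    return np.array([0.9 if "good" in t.split() else 0.2 for t in texts])

def lime_explain(text, n_samples=500, kernel_width=0.5, seed=0):
    words = text.split()
    rng = np.random.default_rng(seed)
    masks = rng.integers(0, 2, size=(n_samples, len(words)))   # 1 = keep the word
    masks[0] = 1                                               # keep the original input
    texts = [" ".join(w for w, m in zip(words, mask) if m) for mask in masks]
    preds = black_box_positive_prob(texts)
    similarity = masks.mean(axis=1)                            # fraction of words kept
    weights = np.exp(-((1 - similarity) ** 2) / kernel_width ** 2)
    surrogate = Ridge(alpha=1.0).fit(masks, preds, sample_weight=weights)
    return dict(zip(words, surrogate.coef_))                   # per-word importances

print(lime_explain("the movie was really good"))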


  4. SHAP

A Unified Approach to Interpreting Model Predictions (Lundberg and Lee, 2017)

Sharmila Upadhyaya - 26.11.2020
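
For orientation: SHAP assigns each feature its Shapley value, i.e. its average marginal contribution over all feature coalitions; exact computation is exponential in the number of features, which is what Kernel SHAP and related estimators approximate. Below is a brute-force sketch for a tiny stand-in model (not the shap library), with missing features replaced by a baseline.

import numpy as np
from itertools import combinations
from math import factorial

def model(x):
    return 2 * x[0] + x[1] * x[2]                # stand-in model with an interaction term

def shapley_values(model, x, baseline):
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_i, without_i = baseline.copy(), baseline.copy()
                for j in subset:                 # features present in the coalition
                    with_i[j] = x[j]
                    without_i[j] = x[j]
                with_i[i] = x[i]                 # add feature i on top of the coalition
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi                                   # sums to model(x) - model(baseline)

x, baseline = np.array([1.0, 2.0, 3.0]), np.zeros(3)
print(shapley_values(model, x, baseline))        # [2., 3., 3.] for this toy model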


  5. Perturbations for NLP (Omission)

Representation of Linguistic Form and Function in Recurrent Neural Networks (Kádár et al., 2017)

Annegret Janzso - 03.12.2020
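
For orientation: the omission score of a word is the change in the model's prediction when that word is removed from the input. A minimal sketch with a stand-in scoring function:

def model_score(words):
    # Stand-in for the real model's output (e.g. a class probability).
    return 0.8 * ("excellent" in words) + 0.02 * len(words)

def omission_scores(words):
    full = model_score(words)
    return [(w, full - model_score(words[:i] + words[i + 1:]))
            for i, w in enumerate(words)]

print(omission_scores("an excellent little film".split()))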


  6. Meaningful Perturbation

Interpretable Explanations of Black Boxes by Meaningful Perturbation (Fong and Vedaldi, 2018)

---


  7. Sensitivity Analysis and Activation Maximization

Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps (Simonyan et al., 2014)

Yogesh Kumar Baljeet Singh - 10.12.2020
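
For orientation: Simonyan et al.'s saliency map is the gradient of a class score with respect to the input pixels; activation maximization uses the same gradient, but applies repeated ascent steps to the input instead. Below is a minimal PyTorch sketch with an untrained stand-in classifier.

import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))   # stand-in classifier
image = torch.rand(1, 1, 28, 28, requires_grad=True)
target_class = 3

score = model(image)[0, target_class]    # unnormalized class score
score.backward()                         # d(score) / d(input pixels)
saliency = image.grad.abs().squeeze()    # (28, 28) saliency map
print(saliency.shape)

# Activation maximization: start from noise and repeatedly take gradient
# *ascent* steps on the input so as to maximize the same class score.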


  8. Sensitivity analysis: Application to NLP

Visualizing and Understanding Neural Models in NLP (Li et al., 2016)

Sohaib Arshid - 10.12.2020


  9. Deconvolution and Perturbations (Occlusion)

Visualizing and Understanding Convolutional Networks (Zeiler and Fergus, 2013)

Sangeet Sagar - 17.12.2020
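
For orientation: occlusion analysis slides a gray patch over the image and records how much the target class probability drops; large drops mark regions the model relies on. Below is a minimal PyTorch sketch with an untrained stand-in classifier.

import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10), nn.Softmax(dim=1))
image = torch.rand(1, 1, 28, 28)
target_class, patch = 3, 7

with torch.no_grad():
    base_prob = model(image)[0, target_class].item()
    heatmap = torch.zeros(28 // patch, 28 // patch)
    for i in range(0, 28, patch):
        for j in range(0, 28, patch):
            occluded = image.clone()
            occluded[:, :, i:i + patch, j:j + patch] = 0.5      # gray patch
            prob = model(occluded)[0, target_class].item()
            heatmap[i // patch, j // patch] = base_prob - prob  # probability drop
print(heatmap)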


  10. Deconvolution for text

Textual Deconvolution Saliency (TDS): a deep tool box for linguistic analysis (Vanni et al., 2018)

Priyanka Das - 17.12.2020


  11. LRP: Theory

Layer-Wise Relevance Propagation: An Overview (Montavon et al., 2019)

Chapter from Explainable AI: Interpreting, Explaining and Visualizing Deep Learning (see Additional Literature below)

Leonie Lapp - 07.01.2021
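
For orientation: LRP propagates the prediction backwards layer by layer, redistributing "relevance" to the inputs of each layer in proportion to their contribution. Below is a NumPy sketch of the LRP-ε rule for a single linear layer (not a full network pass); shapes and the ε value are illustrative.

import numpy as np

def lrp_epsilon_linear(a, W, b, R_out, eps=1e-6):
    # a: (J,) layer inputs, W: (J, K) weights, R_out: (K,) relevance of the outputs.
    z = a @ W + b                               # pre-activations z_k
    s = R_out / (z + eps * np.sign(z))          # stabilized relevance per unit of z_k
    return a * (W @ s)                          # R_j = a_j * sum_k w_jk * s_k

rng = np.random.default_rng(0)
a, W, b = rng.normal(size=5), rng.normal(size=(5, 3)), np.zeros(3)
R_out = np.array([0.0, 1.0, 0.0])               # relevance assigned to the output neurons
R_in = lrp_epsilon_linear(a, W, b, R_out)
print(R_in, R_in.sum())                         # approximately conserved (up to eps and bias)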


  12. DeepLIFT

Learning Important Features Through Propagating Activation Differences (Shrikumar et al., 2017)

---


  13. Integrated gradients

Axiomatic Attribution for Deep Networks (Sundararajan et al., 2017)

Pin-Jie Lin - 14.01.2021
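
For orientation: integrated gradients averages the input gradient along a straight path from a baseline to the actual input and scales it by (input - baseline); the attributions then sum (approximately) to the difference in model output, the paper's completeness axiom. Below is a minimal PyTorch sketch with an untrained stand-in model and a Riemann-sum approximation of the path integral.

import torch
import torch.nn as nn

model = nn.Linear(4, 1)                         # stand-in model
x = torch.tensor([1.0, 2.0, 3.0, 4.0])
baseline = torch.zeros_like(x)
steps = 50

grads = []
for alpha in torch.linspace(0, 1, steps):
    point = (baseline + alpha * (x - baseline)).requires_grad_(True)
    model(point).sum().backward()               # gradient at one point on the path
    grads.append(point.grad)
avg_grad = torch.stack(grads).mean(dim=0)       # Riemann-sum approximation of the integral
ig = (x - baseline) * avg_grad                  # per-feature attributions
print(ig, ig.sum().item())                      # ig.sum() ~ model(x) - model(baseline)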


  14. Activation maximization with GANs

Synthesizing the preferred inputs for neurons in neural networks via deep generator networks (Nguyen et al., 2016)

Enea Duka - 21.01.2021


  15. Activation maximization: Application to NLP

Interpretable Textual Neuron Representations for NLP (Poerner et al., 2018)

Daniel Biondi - 21.01.2021


  16. Generating textual explanations: for image classification

Generating Visual Explanations (Hendricks et al., 2016)

Anar Amirli - 28.01.2021


  17. Generating textual explanations: for text classification

Towards Explainable NLP: A Generative Explanation Framework for Text Classification (Liu et al., 2019)

Janaki Viswanathan - 28.01.2021


  18. Influence functions

Understanding black-box predictions via influence functions (Koh and Liang, 2017)

Rricha Jalota - 04.02.2021
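
For orientation: the central quantity in Koh and Liang is the influence of up-weighting a training point z on the loss at a test point. Writing \hat{\theta} for the trained parameters and H for the empirical Hessian of the training loss, it is

\mathcal{I}_{\text{up,loss}}(z, z_{\text{test}}) = -\,\nabla_\theta L(z_{\text{test}}, \hat{\theta})^{\top} H_{\hat{\theta}}^{-1} \nabla_\theta L(z, \hat{\theta}),
\qquad
H_{\hat{\theta}} = \frac{1}{n}\sum_{i=1}^{n} \nabla_\theta^{2} L(z_i, \hat{\theta}).

In practice the paper avoids forming H^{-1} explicitly and uses Hessian-vector products (e.g. conjugate gradients or stochastic estimation) to compute H^{-1} \nabla_\theta L(z_{\text{test}}, \hat{\theta}).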


  19. Influence functions for NLP

Explaining Black Box Predictions and Unveiling Data Artifacts through Influence Functions (Han et al., 2020)

Joanna Dietinger - 04.02.2021


  20. LRP: Application to NLP

“What is Relevant in a Text Document?”: An Interpretable Machine Learning Approach (Arras et al., 2016)

Hafeez Ullah - 04.02.2021

Additional literature

  • Explainable AI: Interpreting, Explaining and Visualizing Deep Learning

Samek, Wojciech; Montavon, Grégoire; Vedaldi, Andrea; Hansen, Lars Kai and Müller, Klaus-Robert

Available for download here within the MPI or UdS IP range. A paper copy can be obtained from the semester reserve at the Campus-Bibliothek für Informatik und Mathematik.

  • Interpretable Machine Learning: A Guide for Making Black Box Models Explainable

Christoph Molnar