Explainability Methods for Neural Networks (WS 2020/2021)
Lecturer: Daria Pylypenko
Time: Thursday 14:15-15:45
First session: 05.11.2020
Location: MS Teams
Deep Neural Networks (DNNs) are very powerful. However, they are often considered "opaque", in the sense that it is not always easy to see how they arrive at a particular decision. Mathematically, of course, DNNs are fully transparent: they are just a mix of matrix algebra operations and non-linearities. But they are not always easy to interpret in terms of concepts that are broadly accessible to humans. The seminar will focus on methods that can help explain how neural networks "reason" while performing certain tasks, what they learn, and which information they use for making predictions. The aim is to examine general methods for interpreting neural-network-based models, with a focus on methods that can be applied to NLP tasks.
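As a toy illustration of the "which information" question (not part of the seminar materials): for a one-layer model, input features can be ranked by the gradient magnitude of the output with respect to each input, which is the core idea behind the sensitivity-analysis papers in the schedule below. All weights and inputs here are made up for illustration; a sketch only, not a definitive implementation.

```python
import numpy as np

# Tiny logistic-regression "network": y = sigmoid(w . x + b)
# Weights and input are hypothetical, chosen only for illustration.
w = np.array([2.0, -1.0, 0.5])
b = 0.0
x = np.array([1.0, 1.0, 1.0])

z = w @ x + b
y = 1.0 / (1.0 + np.exp(-z))  # model prediction

# Gradient saliency: |dy/dx_i| = |w_i| * sigmoid'(z),
# i.e. how sensitive the output is to each input feature.
saliency = np.abs(w) * y * (1.0 - y)
ranking = np.argsort(-saliency)  # features ordered by influence
```

For this toy model the ranking simply follows the weight magnitudes; for a deep network the same gradient would be obtained via backpropagation, and the later sessions (Simonyan et al., Li et al.) discuss how to interpret such saliency maps.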
Students will make presentations about research papers.
Prerequisites: familiarity with neural networks (feedforward, recurrent, convolutional), backpropagation, calculus, and linear algebra.
Registration
Prior registration is required. Maximum number of participants: 20 (10 from CS, 10 from CoLi).
CS students:
Please register here: https://seminars.cs.uni-saarland.de/seminars2021.
CoLi students (Language Science and Technology, Computerlinguistik, LCT):
Please send an email to: daria dot pylypenko at uni-saarland dot de
Deadline: October 26th, 23:59 CET (extended to November 4th, 23:59 CET). The seminar is now fully booked.
Schedule
Attention
Effective Approaches to Attention-based Neural Machine Translation (Luong et al., 2015)
Vilém Zouhar - 19.11.2020
Probing
What you can cram into a single $&!#* vector: Probing sentence embeddings for linguistic properties (Conneau et al., 2018)
Christian Cayralat - 19.11.2020
LIME
“Why Should I Trust You?” Explaining the Predictions of Any Classifier (Ribeiro et al., 2016)
Jannis Morsch - 26.11.2020
SHAP
A Unified Approach to Interpreting Model Predictions (Lundberg and Lee, 2017)
Sharmila Upadhyaya - 26.11.2020
Perturbations for NLP (Omission)
Representation of Linguistic Form and Function in Recurrent Neural Networks (Kádár et al., 2017)
Annegret Janzso - 03.12.2020
Meaningful Perturbation
Interpretable Explanations of Black Boxes by Meaningful Perturbation (Fong and Vedaldi, 2017)
---
Sensitivity Analysis and Activation Maximization
Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps (Simonyan et al., 2014)
Yogesh Kumar Baljeet Singh - 10.12.2020
Sensitivity analysis: Application to NLP
Visualizing and Understanding Neural Models in NLP (Li et al., 2016)
Sohaib Arshid - 10.12.2020
Deconvolution and Perturbations (Occlusion)
Visualizing and Understanding Convolutional Networks (Zeiler and Fergus, 2013)
Sangeet Sagar - 17.12.2020
Deconvolution for text
Textual Deconvolution Saliency (TDS): a deep tool box for linguistic analysis (Vanni et al., 2018)
Priyanka Das - 17.12.2020
LRP: Theory
Layer-Wise Relevance Propagation: An Overview (Montavon et al., 2019)
Chapter from Explainable AI: Interpreting, Explaining and Visualizing Deep Learning (see Additional Literature below)
Leonie Lapp - 07.01.2021
DeepLIFT
Learning Important Features Through Propagating Activation Differences (Shrikumar et al., 2017)
---
Integrated gradients
Axiomatic Attribution for Deep Networks (Sundararajan et al., 2017)
Pin-Jie Lin - 14.01.2021
Activation maximization with GANs
Synthesizing the preferred inputs for neurons in neural networks via deep generator networks (Nguyen et al., 2016)
Enea Duka - 21.01.2021
Activation maximization: Application to NLP
Interpretable Textual Neuron Representations for NLP (Poerner et al., 2018)
Daniel Biondi - 21.01.2021
Generating textual explanations: for image classification
Generating Visual Explanations (Hendricks et al., 2016)
Anar Amirli - 28.01.2021
Generating textual explanations: for text classification
Towards Explainable NLP: A Generative Explanation Framework for Text Classification (Liu et al., 2019)
Janaki Viswanathan - 28.01.2021
Influence functions
Understanding black-box predictions via influence functions (Koh and Liang, 2017)
Rricha Jalota - 04.02.2021
Influence functions for NLP
Explaining Black Box Predictions and Unveiling Data Artifacts through Influence Functions (Han et al., 2020)
Joanna Dietinger - 04.02.2021
LRP: Application to NLP
“What is Relevant in a Text Document?”: An Interpretable Machine Learning Approach (Arras et al., 2016)
Hafeez Ullah - 04.02.2021
Additional literature
Explainable AI: Interpreting, Explaining and Visualizing Deep Learning
Samek, Wojciech; Montavon, Grégoire; Vedaldi, Andrea; Hansen, Lars Kai and Müller, Klaus-Robert
Available for download here from within the MPI or UdS IP range. A paper copy can be obtained from the semester reserve at the Campus-Bibliothek für Informatik und Mathematik.
Interpretable Machine Learning: A Guide for Making Black Box Models Explainable
Christoph Molnar