Machine Learning Explainability and Robustness: Connected at the Hip

KDD 2021 Tutorial


Time: 4:00 - 7:00 PM EDT, Aug 14, 2021

Location: Virtual on Zoom (see KDD webpage)

Slides: View Online

Recording: (coming soon)

Tutorial Description

This tutorial examines the synergistic relationship between explainability methods for machine learning and a significant problem related to model quality: robustness against adversarial perturbations. We begin with a broad overview of approaches to explainable AI, before narrowing our focus to post-hoc explanation methods for predictive models. We discuss perspectives on what constitutes a good explanation in various settings, with an emphasis on axiomatic justifications for various explanation methods. In doing so, we will highlight the importance of an explanation method's faithfulness to the target model, as this property allows one to distinguish between explanations that are unintelligible because of the method used to produce them, and cases where a seemingly poor explanation points to model quality issues. Next, we introduce concepts surrounding adversarial robustness, including state-of-the-art adversarial attacks as well as a range of corresponding defenses. Finally, building on the knowledge presented thus far, we present key insights from the recent literature on the connections between explainability and adversarial robustness. We show that many commonly-perceived issues in explanations are actually caused by a lack of robustness. At the same time, we show that a careful study of adversarial examples and robustness can lead to models whose explanations better appeal to human intuition and domain knowledge.
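To make the notion of an adversarial perturbation concrete, here is a minimal one-step FGSM-style sketch (our illustration, not part of the tutorial materials; the model and all names are toy choices) that flips the prediction of a binary logistic-regression classifier:

```python
import numpy as np

def fgsm_perturb(x, w, b, y, eps):
    """One-step FGSM on binary logistic regression.

    The cross-entropy loss gradient w.r.t. the input is
    (sigmoid(w @ x + b) - y) * w; FGSM moves each input coordinate
    a step of size eps in the direction of the gradient's sign.
    """
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))  # predicted P(y = 1)
    grad_x = (p - y) * w                    # d(loss)/dx
    return x + eps * np.sign(grad_x)

# A toy model that classifies x correctly before the attack.
w = np.array([2.0, -1.0])
b = 0.0
x = np.array([1.0, 0.5])                    # logit = 1.5 > 0 -> class 1
x_adv = fgsm_perturb(x, w, b, y=1.0, eps=0.6)
# The perturbed logit is 2*0.4 - 1.1 = -0.3 < 0: the prediction flips.
print(w @ x + b, w @ x_adv + b)
```

Even this linear toy shows the core mechanic the tutorial builds on: a small, gradient-aligned perturbation is enough to change the model's decision.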



Tutorial Outline

  • Robustness captures Conceptual Soundness

  • Explainability and Model Robustness

Section I: Foundation of Good Explanations

  • Background of XAI Methods

  • Evaluation Criteria for XAI Methods

  • Demo: TruLens library for Model Explanations

Section II: Foundation of Adversarial Robustness

  • Adversarial Attacks for Deep Neural Networks

  • Building Robust Deep Neural Networks

Section III: Connecting Explainability and Robustness

  • Can We Trust Explanations?

  • Why Robust Models are More Explainable - A Feature Perspective

  • Why Robust Models are More Explainable - A Geometric Perspective

  • Demo: Boundary-based Explanations


The target audience of this tutorial is ML researchers, practitioners, and policy makers who are interested in explainability and robustness in deep learning or traditional statistical ML models. A basic understanding of the theory and architecture of common models, such as Convolutional Neural Networks (CNNs) and decision trees, is therefore strongly recommended. Since we will include live demos of some explanation tools, basic familiarity with machine learning frameworks such as scikit-learn or TensorFlow is recommended but not required.


Presenters

Anupam Datta Carnegie Mellon University

Anupam Datta is a Professor of Electrical & Computer Engineering and Computer Science at Carnegie Mellon University, Co-founder and Chief Scientist of Truera, and Director of the Accountable Systems Lab. He received his Ph.D. in Computer Science from Stanford University. His research focuses on enabling real-world complex systems to be accountable for their behavior, especially with respect to privacy, fairness, and security.

Matt Fredrikson Carnegie Mellon University

Matt Fredrikson is an Assistant Professor of Computer Science at Carnegie Mellon University, where his research aims to make machine learning systems more accountable and reliable by addressing fundamental problems of security, privacy, and fairness that emerge in real-world settings.

Shayak Sen Truera

Shayak Sen is Co-founder and Chief Technology Officer of Truera, a startup providing an enterprise-class platform that delivers explainability for machine learning models. Shayak obtained his Ph.D. in Computer Science from Carnegie Mellon University, where his research aimed to make machine learning and big data systems more explainable, privacy-compliant, and fair.

Klas Leino Carnegie Mellon University

Klas Leino is a PhD candidate in the Accountable Systems Lab at Carnegie Mellon University, advised by Matt Fredrikson. His research primarily concentrates on demystifying deep learning and understanding its weaknesses and vulnerabilities in order to improve the security, privacy, and transparency of deep neural networks.

Kaiji Lu Carnegie Mellon University

Kaiji Lu is a fourth-year Ph.D. student in Electrical and Computer Engineering at Carnegie Mellon University. His research focuses on the explainability, fairness, and transparency of deep learning models, particularly those with applications in Natural Language Processing (NLP).

Zifan Wang Carnegie Mellon University

Zifan Wang is a third-year student in the Accountable Systems Lab at Carnegie Mellon University, co-advised by Anupam Datta and Matt Fredrikson. His work concentrates on explanation tools for deep neural networks and their applications to Computer Vision tasks.


We present a new library, TruLens, containing attribution and interpretation methods for deep neural networks. To quickly get started with the TruLens library for PyTorch or TensorFlow Keras models, check out the following Colab notebooks:
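TruLens's own API is best explored through the notebooks above. As a framework-free illustration of the kind of attribution such tools compute, here is a minimal integrated-gradients sketch (our example, not the TruLens API) for a toy model whose gradient is known analytically:

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate integrated gradients: (x - baseline) times the
    average gradient along the straight path from baseline to x."""
    alphas = (np.arange(steps) + 0.5) / steps  # midpoint rule
    avg_grad = np.mean(
        [grad_fn(baseline + a * (x - baseline)) for a in alphas], axis=0)
    return (x - baseline) * avg_grad

# Toy model f(x) = x0**2 + 3*x1, so grad f = [2*x0, 3].
grad_f = lambda x: np.array([2 * x[0], 3.0])
x = np.array([2.0, 1.0])
baseline = np.zeros(2)
attrs = integrated_gradients(grad_f, x, baseline)
# Completeness axiom: attributions sum to f(x) - f(baseline) = 4 + 3 = 7.
print(attrs, attrs.sum())
```

The completeness check in the last comment is one of the axiomatic criteria for attribution methods discussed in Section I.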

With the support of TruLens, we also include a demo of boundary-based explanations.

Shared-source Libraries

  • TruLens: cross-framework library for deep learning explainability.

  • GloRo Nets: library for training networks that are provably robust to small-norm adversarial examples.

  • Boundary Attributions: implementation of boundary-based explanations.
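To give a rough sense of the kind of check Lipschitz-based certification performs (a simplified sketch of ours, not the GloRo Nets API): for a linear classifier, the gap between the predicted logit and any other logit can change by at most eps times the norm of the corresponding weight difference under an l2 perturbation of size eps, so a prediction is certifiably robust when every gap exceeds that bound:

```python
import numpy as np

def certified_robust(W, b, x, eps):
    """Certify a linear classifier f(x) = W @ x + b at input x against
    any l2 perturbation of norm <= eps.  The gap logits[i] - logits[j]
    is Lipschitz with constant ||W[i] - W[j]||_2, so the prediction
    cannot change if every gap exceeds eps times that constant."""
    logits = W @ x + b
    i = int(np.argmax(logits))
    for j in range(len(logits)):
        if j == i:
            continue
        lip = np.linalg.norm(W[i] - W[j])
        if logits[i] - logits[j] <= eps * lip:
            return False  # class j could overtake class i
    return True

W = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.zeros(2)
x = np.array([2.0, 0.0])                     # class 0, margin 2
print(certified_robust(W, b, x, eps=1.0))    # margin 2 > 1 * sqrt(2)
print(certified_robust(W, b, x, eps=2.0))    # margin 2 <= 2 * sqrt(2)
```

GloRo Nets apply the same margin-versus-Lipschitz-bound idea to deep networks by tracking a global Lipschitz bound during training.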