Machine Learning Explainability and Robustness: Connected at the Hip
KDD 2021 Tutorial
This tutorial examines the synergistic relationship between explainability methods for machine learning and a significant problem related to model quality: robustness against adversarial perturbations. We begin with a broad overview of approaches to explainable AI, before narrowing our focus to post-hoc explanation methods for predictive models. We discuss perspectives on what constitutes a good explanation in various settings, with an emphasis on axiomatic justifications for various explanation methods. In doing so, we will highlight the importance of an explanation method's faithfulness to the target model, as this property allows one to distinguish between explanations that are unintelligible because of the method used to produce them, and cases where a seemingly poor explanation points to model quality issues. Next, we introduce concepts surrounding adversarial robustness, including state-of-the-art adversarial attacks as well as a range of corresponding defenses. Finally, building on the knowledge presented thus far, we present key insights from the recent literature on the connections between explainability and adversarial robustness. We show that many commonly-perceived issues in explanations are actually caused by a lack of robustness. At the same time, we show that a careful study of adversarial examples and robustness can lead to models whose explanations better appeal to human intuition and domain knowledge.
Robustness Captures Conceptual Soundness
Explainability and Model Robustness
Section I: Foundations of Good Explanations
Background of XAI Methods
Evaluation Criteria for XAI Methods
Demo: TruLens library for Model Explanations
Section II: Foundations of Adversarial Robustness
Adversarial Attacks for Deep Neural Networks
Building Robust Deep Neural Networks
Section III: Connecting Explainability and Robustness
Can We Trust Explanations?
Why Robust Models are More Explainable - A Feature Perspective
Why Robust Models are More Explainable - A Geometric Perspective
Demo: Boundary-based Explanations
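To make the adversarial attacks of Section II concrete, the sketch below implements the fast gradient sign method (FGSM) against a toy logistic-regression classifier in NumPy. This is a minimal illustration of the attack idea, not one of the tutorial's deep networks or demo notebooks; the model weights are made up for the example.

```python
import numpy as np

# Toy differentiable classifier: logistic regression stands in for a DNN.
w = np.array([2.0, -3.0])
b = 0.5

def predict_proba(x):
    """Probability of the positive class."""
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))

def fgsm(x, y_true, eps):
    """Fast gradient sign method: take one step in the direction that
    increases the loss, bounded by eps in the L-infinity norm."""
    p = predict_proba(x)
    # Gradient of the cross-entropy loss w.r.t. the input is (p - y) * w.
    grad_x = (p - y_true) * w
    return x + eps * np.sign(grad_x)

x = np.array([1.0, 0.2])       # confidently classified positive
x_adv = fgsm(x, y_true=1.0, eps=0.5)

# The perturbation stays within the eps-ball but crosses the
# decision boundary, flipping the prediction.
print(predict_proba(x), predict_proba(x_adv))
```

The same one-step idea, iterated with projection onto the eps-ball, gives the PGD attack commonly used both to evaluate and to train robust networks.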
The target audience of this tutorial is ML researchers, practitioners, and policy makers who are interested in explainability and robustness in deep learning or traditional statistical ML models. Accordingly, a basic understanding of the theory and architecture of common models, such as Convolutional Neural Networks (CNNs) and decision trees, is strongly recommended. Since we will include live demos of some explanation tools, basic familiarity with machine learning frameworks such as scikit-learn or TensorFlow is recommended but not required.
Anupam Datta Carnegie Mellon University
Anupam Datta is a Professor of Electrical & Computer Engineering and Computer Science at Carnegie Mellon University, Co-founder and Chief Scientist of Truera, and Director of the Accountable Systems Lab. He received his Ph.D. in Computer Science from Stanford University. His research focuses on enabling real-world complex systems to be accountable for their behavior, especially as it pertains to privacy, fairness, and security.
Matt Fredrikson Carnegie Mellon University
Matt Fredrikson is an Assistant Professor of Computer Science at Carnegie Mellon University, where his research aims to make machine learning systems more accountable and reliable by addressing fundamental problems of security, privacy, and fairness that emerge in real-world settings.
Shayak Sen Truera
Shayak Sen is Co-founder and Chief Technology Officer of Truera, a startup providing an enterprise-class platform that delivers explainability for machine learning models. Shayak obtained his Ph.D. in Computer Science from Carnegie Mellon University, where his research aimed to make machine learning and big data systems more explainable, privacy-compliant, and fair.
Klas Leino Carnegie Mellon University
Klas Leino is a Ph.D. candidate in the Accountable Systems Lab at Carnegie Mellon University, advised by Matt Fredrikson. His research primarily concentrates on demystifying deep learning and understanding its weaknesses and vulnerabilities in order to improve the security, privacy, and transparency of deep neural networks.
Kaiji Lu Carnegie Mellon University
Kaiji Lu is a fourth-year Ph.D. student in Electrical and Computer Engineering at Carnegie Mellon University. His research focuses on the explainability, fairness, and transparency of deep learning models, particularly those with applications in Natural Language Processing (NLP).
Zifan Wang Carnegie Mellon University
Zifan Wang is a third-year student in the Accountable Systems Lab at Carnegie Mellon University, co-advised by Anupam Datta and Matt Fredrikson. His concentrations include explanation tools for deep neural networks and their applications to Computer Vision tasks.
We present a new library, TruLens, containing attribution and interpretation methods for deep neural networks. To quickly get started with the TruLens library for PyTorch or TensorFlow Keras models, check out the following Colab notebooks:
With the support of TruLens, we also include the demo of
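As a taste of the attribution methods showcased in the demo, here is a minimal NumPy sketch of integrated gradients, one of the axiomatically justified attribution methods discussed in Section I. The function names and the analytic toy model are illustrative assumptions for this sketch, not TruLens's actual API; the TruLens notebooks wrap real PyTorch and TensorFlow models instead.

```python
import numpy as np

def integrated_gradients(f_grad, x, baseline, steps=64):
    """Approximate integrated gradients: the path integral of the model's
    input gradient along the straight line from baseline to x, scaled
    elementwise by (x - baseline). f_grad(x) returns the gradient of the
    model output with respect to its input."""
    alphas = (np.arange(steps) + 0.5) / steps   # midpoint rule
    total = np.zeros_like(x)
    for a in alphas:
        total += f_grad(baseline + a * (x - baseline))
    return (x - baseline) * total / steps

# Toy model with an analytic gradient: f(x) = sigmoid(w @ x).
# Note the third feature has zero weight, i.e. it is irrelevant.
w = np.array([1.0, -2.0, 0.0])
f = lambda x: 1.0 / (1.0 + np.exp(-(w @ x)))
f_grad = lambda x: f(x) * (1.0 - f(x)) * w

x = np.array([2.0, 0.5, 3.0])
baseline = np.zeros(3)
attr = integrated_gradients(f_grad, x, baseline)

# Completeness axiom: attributions sum to f(x) - f(baseline),
# and the irrelevant third feature receives zero attribution.
print(attr, attr.sum(), f(x) - f(baseline))
```

The completeness property checked in the comment is one reason faithfulness-oriented methods of this kind are emphasized in Section I: the attribution scores account exactly for the change in the model's output, so a surprising explanation reflects the model rather than the explanation method.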