Beyond Sparsity: Tree-based Regularization of Deep Models for Interpretability

Mike Wu, Stanford University; Michael Hughes, Harvard University; Sonali Parbhoo, University of Basel; Finale Doshi-Velez, Harvard University

Abstract: The lack of interpretability remains a key barrier to the adoption of deep models in many healthcare applications. In this work, we explicitly regularize deep models so human users might step through the process behind their predictions in little time. Specifically, we train deep time-series models so their class-probability predictions have high accuracy while being closely modeled by decision trees with few nodes. On two clinical decision-making tasks, we demonstrate that this new tree-based regularization is distinct from simpler L2 or L1 penalties, resulting in more interpretable models without sacrificing predictive power.
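The penalty described in this abstract can be illustrated with a small sketch: fit a greedy decision tree to the model's own hard predictions and charge the model for the number of nodes required. The names `greedy_tree_nodes` and `tree_penalty` are illustrative, not from the paper, and the paper itself optimizes a differentiable surrogate of tree complexity rather than the non-differentiable node count computed here.

```python
import numpy as np

def greedy_tree_nodes(X, y, max_depth):
    """Node count of a greedy, axis-aligned decision tree (CART-style,
    misclassification-count splits) fit to the labels y."""
    y = np.asarray(y)
    if max_depth == 0 or len(y) < 2 or len(np.unique(y)) == 1:
        return 1  # leaf
    best = None  # (errors, feature, threshold)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:
            mask = X[:, j] <= t
            l, r = y[mask], y[~mask]
            err = (np.sum(l != np.bincount(l).argmax()) +
                   np.sum(r != np.bincount(r).argmax()))
            if best is None or err < best[0]:
                best = (err, j, t)
    if best is None:
        return 1
    _, j, t = best
    mask = X[:, j] <= t
    return (1 + greedy_tree_nodes(X[mask], y[mask], max_depth - 1)
              + greedy_tree_nodes(X[~mask], y[~mask], max_depth - 1))

def tree_penalty(X, predict, strength=0.1, max_depth=3):
    """Tree-based regularizer: charge the model for the number of tree
    nodes needed to mimic its hard predictions on a batch X."""
    return strength * greedy_tree_nodes(X, predict(X), max_depth)
```

A simple axis-aligned decision boundary is mimicked by a three-node tree, while an XOR-like boundary requires many more nodes and so is penalized more heavily, which is the sense in which this regularizer differs from L1/L2 penalties on weights.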


Network Analysis for Explanation

Hiroshi Kuwajima, DENSO CORPORATION; Masayuki Tanaka, National Institute of Advanced Industrial Science and Technology

Safety-critical systems place strong requirements on the quality of artificial intelligence components, including explainability. In this paper, we analyze a trained network to extract the features that contribute most to its inference. Based on this analysis, we develop a simple method for generating explanations of the inference process.

Using prototypes to improve convolutional networks interpretability

Thalita Drumond, INRIA Bordeaux Sud-Ouest; Thierry Vieville, INRIA Bordeaux Sud-Ouest; Frederic Alexandre, INRIA Bordeaux Sud-Ouest

We propose a method for interpreting the data representation learned by a CNN by introducing prototypes in the feature space, which are later classified into a given category. In this way, we can see how the feature space is structured in relation to the classes and the task at hand.

Predict Responsibly: Increasing Fairness by Learning To Defer

David Madras, University of Toronto; Richard Zemel, University of Toronto; Toniann Pitassi, University of Toronto

Machine learning systems, which are often used for high-stakes decisions, suffer from two mutually reinforcing problems: unfairness and opaqueness. Many popular models, though generally accurate, cannot express uncertainty about their predictions. Even in regimes where a model is inaccurate, users may trust the model's predictions too fully, and allow its biases to reinforce the user's own. In this work, we explore models that learn to defer. In our scheme, a model learns to classify accurately and fairly, but also to defer if necessary, passing judgment to a downstream decision-maker such as a human user. We further propose a learning algorithm which accounts for potential biases held by decision-makers later in a pipeline. Experiments on real-world datasets demonstrate that learning to defer can make a model not only more accurate but also less biased. Even when operated by biased users, we show that deferring models can still greatly improve the fairness of the entire pipeline.
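The pipeline effect described in this abstract can be sketched with a toy example. Note the simplifications: the paper learns when to defer jointly with the classifier, whereas this sketch defers by a fixed confidence threshold `tau`, and the downstream decision-maker is simulated as a fixed-accuracy oracle; all names and numbers here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
y = (x > 0).astype(int)  # ground truth

# A weak "model": confident and correct when |x| is large, noisy near the boundary.
p_model = 1.0 / (1.0 + np.exp(-(2 * x + rng.normal(scale=1.0, size=n))))
model_pred = (p_model > 0.5).astype(int)

# Simulated downstream decision-maker: correct 95% of the time.
dm_pred = np.where(rng.random(n) < 0.95, y, 1 - y)

tau = 0.15  # defer whenever the model's confidence is within tau of 0.5
defer = np.abs(p_model - 0.5) < tau
pipeline_pred = np.where(defer, dm_pred, model_pred)

acc_model = (model_pred == y).mean()
acc_pipeline = (pipeline_pred == y).mean()
```

Because the model's errors concentrate exactly where it is least confident, handing those cases to a more reliable decision-maker improves the accuracy of the pipeline as a whole, which is the intuition the learned defer rule exploits.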

Accelerated Primal-Dual Policy Optimization for Safe Reinforcement Learning

Qingkai Liang, MIT; Fanyu Que, Boston College; Eytan Modiano, MIT

Constrained Markov Decision Process (CMDP) is a natural framework for reinforcement learning tasks with safety constraints, where agents learn a policy that maximizes the long-term reward while satisfying the constraints on the long-term cost. A canonical approach for solving CMDPs is the primal-dual method which updates parameters in primal and dual spaces in turn. Existing methods for CMDPs only use on-policy data for dual updates, which results in sample inefficiency and slow convergence. In this paper, we propose a policy search method for CMDPs called Accelerated Primal-Dual Optimization (APDO), which incorporates an off-policy trained dual variable in the dual update procedure while updating the policy in primal space with on-policy likelihood ratio gradient. Experimental results on a simulated robot locomotion task show that APDO achieves better sample efficiency and faster convergence than state-of-the-art approaches for CMDPs.
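The canonical primal-dual loop that this abstract builds on can be shown on a two-action toy problem with known expected rewards and costs (so exact gradients replace the likelihood-ratio estimator, and APDO's off-policy dual training is not modeled). All quantities below are illustrative assumptions, not from the paper.

```python
import numpy as np

# Two-action toy constrained problem: per-action expected rewards and costs.
r = np.array([1.0, 2.0])      # expected rewards
c = np.array([0.0, 1.5])      # expected costs
d = 1.0                       # long-term cost budget (the constraint)

theta = np.zeros(2)           # policy logits (primal variables)
lam = 0.0                     # Lagrange multiplier (dual variable)
alpha, beta = 0.05, 0.05      # primal / dual step sizes
avg_p = np.zeros(2)

for t in range(1, 5001):
    p = np.exp(theta - theta.max()); p /= p.sum()   # softmax policy
    # Primal step: exact policy gradient of the Lagrangian E_p[r - lam * c].
    v = r - lam * c
    theta += alpha * p * (v - p @ v)
    # Dual step: projected gradient ascent on the constraint violation.
    lam = max(0.0, lam + beta * (p @ c - d))
    avg_p += (p - avg_p) / t   # running average of the policy iterates

exp_cost = avg_p @ c
exp_reward = avg_p @ r
```

The averaged policy mixes the cheap and the rewarding action so that the long-term cost hovers near the budget `d`; APDO's contribution is to speed up the dual update by training the dual variable with off-policy data instead of only the on-policy estimates used here.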

Deep Reinforcement Learning for Sepsis Treatment

Aniruddh Raghu, University of Cambridge; Matthieu Komorowski, Imperial College London; Pete Szolovits, MIT; Marzyeh Ghassemi, MIT; Leo Celi, MIT; Imran Ahmed, University of Cambridge

Sepsis is a leading cause of mortality in intensive care units and costs hospitals billions annually. Treating a septic patient is highly challenging, because individual patients respond very differently to medical interventions and there is no universally agreed-upon treatment for sepsis. In this work, we propose an approach to deduce treatment policies for septic patients by using continuous state-space models and deep reinforcement learning. Our model learns clinically interpretable treatment policies, similar in important aspects to the treatment policies of physicians. The learned policies could be used to aid intensive care clinicians in medical decision making and improve the likelihood of patient survival.

Analyzing Feature Relevance for Linear Reject Option SVM using Relevance Intervals

Christina Göpfert, CITEC

When machine learning is applied in safety-critical or otherwise sensitive areas, the analysis of feature relevance can be an important tool for keeping models small, and thus easier to understand, and for analyzing how different features impact the behavior of the model. In the presence of correlated features, feature relevances and the solution to the minimal-optimal feature selection problem are not unique. One approach to this problem is to identify feature relevance intervals, which capture the range of relevance assigned to each feature by a set of equivalent models. In this contribution, we address the issue of calculating relevance intervals -- a unique representation of relevance -- for reject option support vector machines with a linear kernel, which have the option of rejecting a data point if they are unsure about its label.

The Neural LASSO: Local Linear Sparsity for Interpretable Explanations

Andrew Ross, Harvard University; Erika Lage, Harvard University; Finale Doshi-Velez, Harvard University

Neural networks often perform better on prediction problems than simpler classes of models, but their behavior is difficult to explain. This makes it difficult to trust their predictions in safety critical domains. Recent work has focused on explaining their predictions using local linear approximations, but these explanations can be complex when they depend on many features and it is unclear if they can be used to understand global trends in model behavior. In this work, we train neural networks to have sparse local explanations by applying L1 penalties to their input gradients. We show explanations of these networks depend on fewer inputs while their performance remains comparable across datasets and architectures. We illustrate how our approach encourages a different kind of sparsity than L1 weight decay. In a case study with ICU data, we observe that gradients vary smoothly over the input space, which suggests they can be used to gain insight into the global behavior of the model.
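The input-gradient penalty described in this abstract can be sketched on a tiny network. This is a minimal illustration under stated assumptions: a one-hidden-layer tanh network whose input gradient is computed analytically, trained by finite-difference gradient descent (feasible only because the model is tiny); the names `forward`, `loss`, and `train` are illustrative, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: the label depends only on feature 0; feature 1 is a distractor.
X = rng.normal(size=(40, 2))
y = (X[:, 0] > 0).astype(float)

H = 3  # hidden units

def unpack(p):
    W1 = p[:2 * H].reshape(H, 2)
    b1 = p[2 * H:3 * H]
    w2 = p[3 * H:4 * H]
    b2 = p[4 * H]
    return W1, b1, w2, b2

def forward(p, X):
    W1, b1, w2, b2 = unpack(p)
    h = np.tanh(X @ W1.T + b1)
    logits = h @ w2 + b2
    prob = 1.0 / (1.0 + np.exp(-logits))
    # Analytic gradient of the logit w.r.t. the input: W1^T (w2 * (1 - h^2)).
    in_grad = (w2 * (1 - h ** 2)) @ W1   # shape (n, 2)
    return prob, in_grad

def loss(p, lam):
    prob, in_grad = forward(p, X)
    ce = -np.mean(y * np.log(prob + 1e-9) + (1 - y) * np.log(1 - prob + 1e-9))
    return ce + lam * np.mean(np.abs(in_grad))  # L1 penalty on input gradients

def train(lam, steps=400, lr=0.2, eps=1e-5):
    p = 0.1 * rng.standard_normal(4 * H + 1)
    for _ in range(steps):
        g = np.zeros_like(p)
        for i in range(len(p)):       # finite-difference gradient (tiny model only)
            dlt = np.zeros_like(p); dlt[i] = eps
            g[i] = (loss(p + dlt, lam) - loss(p - dlt, lam)) / (2 * eps)
        p -= lr * g
    return p
```

Comparing a model trained with `lam = 0` against one trained with a nonzero `lam` shows the penalized model's local linear explanations (its input gradients) shrinking in L1 norm, which is the sparsity the penalty encourages, as opposed to sparsity of the weights themselves.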

Safe Policy Search with Gaussian Process Models

Kyriakos Polymenakos, University of Oxford; Stephen Roberts, University of Oxford; Alessandro Abate, University of Oxford

We propose a method to optimise the parameters of a policy which will be used to safely perform a given task in a data-efficient manner. We train a Gaussian process model to capture the system dynamics, based on the PILCO framework. Our model has useful analytic properties, which allow closed form computation of error gradients and estimating the probability of violating given state space constraints. During training, as well as operation, only policies that are deemed safe are implemented on the real system, minimising the risk of failure.

Detecting Bias in Black-Box Models Using Transparent Model Distillation

Sarah Tan, Cornell University; Rich Caruana, Microsoft Research; Giles Hooker, Cornell University; Yin Lou, Airbnb, Inc.

Black-box risk scoring models permeate our lives, yet are typically proprietary and opaque. We propose a transparent model distillation approach to detect bias in such models. Model distillation was originally designed to distill knowledge from a large, complex model (teacher model) to a faster, simpler model (student model) without significant loss in prediction accuracy. We add a third restriction - transparency. We use data sets with two labels to train on: risk score from a black-box model, as well as actual outcome the risk score was intended to predict. For a particular class of student models - interpretable tree additive models (GA2Ms) - we provide confidence intervals for the difference between the risk score and actual outcome models. This presents a new method for detecting bias in black-box risk scores by assessing if contributions of protected features to the risk score are statistically different from their contributions to the actual outcome.

Deep Motif Dashboard: Visualizing and Understanding Genomic Sequences Using Deep Neural Networks

Jack Lanchantin, University of Virginia; Ritambhara Singh, University of Virginia; Beilun Wang, University of Virginia; Yanjun Qi, University of Virginia

Deep neural network (DNN) models have recently obtained state-of-the-art prediction accuracy for the transcription factor binding site (TFBS) classification task. However, it remains unclear how these approaches identify meaningful DNA sequence signals and give insights as to why TFs bind to certain locations. In this paper, we propose a toolkit called the Deep Motif Dashboard (DeMo Dashboard), which provides a suite of visualization strategies to extract motifs, or sequence patterns, from deep neural network models for TFBS classification. We demonstrate how to visualize and understand three important DNN models using three visualization methods: saliency maps, temporal output scores, and class optimization. In addition to providing insights into how each model makes its predictions, the visualization techniques indicate that the CNN-RNN model makes predictions by modeling both motifs and the dependencies among them.

Data masking for privacy-sensitive learning

Anh Pham, Oregon State University; Shalini Ghosh, SRI International; Vinod Yegneswaran, SRI International

We study the problem of data release with privacy, where data is made available with privacy guarantees while keeping the usability of the data as high as possible. This is important in healthcare and other domains with sensitive data. In particular, we propose a method of masking sensitive parts of private data while ensuring that a learner trained using the masked data is similar to the learner trained on the original data, to maintain usability. We provide theoretical guarantees about the lower bound of the distance between the masked and the true underlying data, and illustrate the effectiveness of the proposed method of data masking for privacy-sensitive learning on synthetic and real-world data.

CLEAR-DR: Interpretable Computer Aided Diagnosis of Diabetic Retinopathy

Devinder Kumar, University of Waterloo; Graham Taylor, University of Guelph; Alexander Wong, University of Waterloo

One of the main limitations of current Computer Aided Diagnosis (CAD) approaches is that it is very difficult to gain insight into, or a rationale for, how decisions are made, thus limiting their utility to clinicians. In this study, we propose CLEAR-DR, a novel interpretable CAD system based on the notion of CLass-Enhanced Attentive Response (CLEAR) for the purpose of clinical decision support for Diabetic Retinopathy (DR). In addition to disease grading, the CLEAR-DR system also produces a visual interpretation of its decision-making process, providing better insight into and understanding of how the system reaches its decisions. We demonstrate the effectiveness and utility of the proposed CLEAR-DR system in enhancing the interpretability of diagnostic grading results for the application of diabetic retinopathy grading.

Manipulating and Measuring Model Interpretability

Forough Poursabzi-Sangdeh, University of Colorado Boulder; Daniel G. Goldstein, Microsoft Research; Jake Hofman, Microsoft Research; Jennifer Wortman Vaughan, Microsoft Research; Hanna Wallach, Microsoft Research

Despite recent interest in interpretable machine learning methods, there is still disagreement around what interpretability means. We believe that this is because interpretability is not something that can be directly manipulated or measured. Rather, interpretability is a latent property that can be influenced by different manipulable factors and that impacts different measurable outcomes. We therefore argue that to understand interpretability, it is necessary to manipulate and measure the influence that different factors have on real people's abilities to complete tasks. We run a large-scale randomized experiment, varying two factors that are thought to make models more or less interpretable -- the number of features and whether the model is clear or black box -- and measuring how these changes impact lay people's decision making. We view this experiment as a first step toward a larger agenda aimed at quantifying and measuring the impact of factors that influence interpretability.


We invite submissions of full papers on machine learning applications in safety-critical domains, with a focus on healthcare and biomedicine. Research topics of interest include, but are not restricted to, the following:

  • Feature extraction/selection for more interpretable models
  • Reinforcement learning and safety in AI
  • Interpretability of neural network architectures
  • Learning from adversarial examples
  • Transparency and its impact
  • Trust in decision making
  • Integration of medical experts' knowledge in machine learning-based medical decision support systems
  • Decision making in critical care and intensive care units
  • Human safety in machine learning systems
  • Ethics in robotics
  • Privacy and anonymity vs. interpretability in automated individual decision making
  • Interactive visualisation and model interpretability


  • Submission deadline: 29th of October, 2017
  • Acceptance notification: 14th of November, 2017
  • Travel award application (registration ticket): 16th of November, 2017
  • Camera ready due: 28th of November, 2017


All submissions must be made in PDF format, with a limit of four pages including figures and tables, excluding references, in NIPS style. Formatting instructions are provided on the NIPS website. The reviewing process will be blind.

The workshop allows re-submission of already published work, as well as double submissions.

Submissions can be made through the CMT system [HERE] (note that you must create a CMT account if you do not have one already).

Accepted papers (plus the optional supplementary material) will be made available on the workshop website as non-archival reports.

All accepted papers will be presented at the workshop during the poster sessions. The poster dimensions are 36 × 48 in. (91 × 122 cm).

A selected number of accepted papers will be presented during the oral session. The remaining accepted papers will be allocated a slot during the poster spotlight session.


We offer a limited number of free registration tickets to workshop participants, awarded as a refund (four tickets provided by the NIPS Foundation, with extra tickets provided by our sponsor Mind Foundry). We particularly encourage applications from PhD students.

Our sponsor Mind Foundry will provide a BEST PAPER AWARD, which will be announced during the workshop.