This workshop, held at ICML 2016, will bring together researchers who study the interpretability of predictive models, develop interpretable machine learning algorithms, and develop methodology to interpret black-box machine learning models (e.g., post-hoc interpretations). They will exchange ideas on these and allied topics.
We have a very nice room in a very nice building, but that also means an extra security check in the lobby. ;)
Title: Friends Don’t Let Friends Deploy Models They Don’t Understand
Deploying unintelligible black-box machine-learned models is risky: high accuracy on a test set is NOT sufficient. Unfortunately, the most accurate models usually are not very intelligible (e.g., random forests, boosted trees, and neural nets), and the most intelligible models usually are less accurate (e.g., linear or logistic regression). This tradeoff limits the accuracy of models that can be deployed in mission-critical applications such as healthcare, where being able to understand, validate, edit, and ultimately trust the learned model is important. We're developing a learning method based on generalized additive models (GAMs) that is as accurate as full-complexity models, but as intelligible as linear/logistic regression models. I'll present two case studies where these high-performance generalized additive models (GA2Ms) yield state-of-the-art accuracy on healthcare problems while remaining intelligible. In the pneumonia case study, the intelligible model uncovers surprising patterns in the data that previously prevented other black-box models from being deployed; because the model is intelligible and modular, these patterns are easy to recognize and remove. In the 30-day hospital readmission case study, we show that the same methods scale to large datasets containing hundreds of thousands of patients and thousands of attributes while remaining intelligible and providing accuracy comparable to the best (unintelligible) machine learning models.
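The intelligibility here comes from the additive structure: the prediction decomposes into per-feature terms that can be plotted and inspected one at a time. Below is a minimal sketch of that idea (our own illustration, not the GA2M implementation described in the talk), assuming a squared-error objective and fitting the shape functions by cyclic gradient boosting with shallow scikit-learn trees, one feature at a time.

# Minimal sketch (illustration only, not the speaker's GA2M code):
# fit an additive model by cyclic gradient boosting so it decomposes
# into per-feature shape functions that can be plotted and inspected.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_additive_model(X, y, n_rounds=50, lr=0.1, max_leaves=3):
    n, d = X.shape
    intercept = y.mean()
    pred = np.full(n, intercept)
    shape_terms = []                          # (feature_index, tree) pairs
    for _ in range(n_rounds):
        for j in range(d):                    # cycle through the features
            tree = DecisionTreeRegressor(max_leaf_nodes=max_leaves)
            tree.fit(X[:, [j]], y - pred)     # fit the current residual
            pred += lr * tree.predict(X[:, [j]])
            shape_terms.append((j, tree))
    return intercept, shape_terms

def shape_function(shape_terms, j, grid, lr=0.1):
    # contribution of feature j alone; plotting this over a grid of values
    # is what lets a domain expert read, validate, and edit the model
    out = np.zeros(len(grid))
    for feat, tree in shape_terms:
        if feat == j:
            out += lr * tree.predict(np.asarray(grid).reshape(-1, 1))
    return out

Plotting shape_function over a range of values for one feature gives the kind of per-risk-factor curves discussed in the pneumonia case study; the pairwise terms (the "2" in GA2M) would be fit the same way on a selected set of feature pairs.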
Title: Simplicity in human concept learning
The closest parallel to machine learning in human cognition is what psychologists call concept learning, the process by which human learners induce categories from the objects they observe. In this talk I will discuss the role of the simplicity principle (Occam's razor) in psychological models of concept learning and categorization. Despite some notice in the early days of concept learning research, for several decades simplicity criteria played very little role in models of human categorization, which were instead dominated by "exemplar models" based on similarity comparisons with numerous stored examples. However, it can be shown that exemplar models overfit training data relative to human learners, in some cases dramatically; that is, they allow categorization hypotheses that are overly complex compared to the human solution. Instead, human learning seems to rely heavily on various kinds of simplicity principles, which "regularize" human induction in a way that makes it both more cognitively tractable and, perhaps, more effective. Machine learning research may benefit from pursuing closer parallels with human concept learning with regard to complexity minimization and associated computational procedures.
Title: Counterfactual Inference for Consumer Choice Across Many Products
Authors: Susan Athey, David Blei, Robert Donnelly, Francisco Ruiz, and Dustin Tran
In this paper, we develop a model of consumer choice across a large number of products. In contrast to most of the economics and marketing literature, which focuses on choices among a small set of substitutable products in a narrow category, we analyze choices across a large number of products that are not close substitutes for one another, such as different categories of products in a grocery store (e.g., potato chips, lemon-lime soft drinks, or organic apples). Our goal is to make counterfactual inferences about how consumer choices and welfare would change if, for example, prices or product availability change for a category. Our model differs from most economic models in that we model preferences for a large number of products in a single model; we attempt to capture preferences about many products in a lower-dimensional utility function where consumers have preferences about characteristics of products. Our model is designed for settings where the same consumers are observed over time making consumption choices about a large set of products. In our model, a consumer's utility from consuming a product is determined by individual-specific latent preferences for latent product characteristics and an idiosyncratic shock. Some product characteristics and user characteristics may also be observed. We also allow for shocks to utility that are common across a group of users, and vary by product and time period (e.g., date), to incorporate the idea that demand for products may depend on factors such as holidays. We show how to evaluate the assumptions required for our parameter estimates to have a causal interpretation.
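As a rough schematic of the kind of utility specification described above (our notation, not necessarily the paper's), the utility of user u for product i in time period t might be written as

    u_{uit} = \theta_u^{\top} \beta_i + \gamma_{g(u), i, t} + \varepsilon_{uit}

where \theta_u are user-specific latent preferences, \beta_i are latent product characteristics (possibly combined with observed product and user attributes), \gamma_{g(u),i,t} is a utility shock shared by the group of users g(u) for product i in period t (capturing, e.g., holiday effects), and \varepsilon_{uit} is an idiosyncratic shock.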
Title: Interpretability and Measurement
Title: Provenance and Contracts in Machine Learning
This talk poses two questions. The first question is: Why did the model make a certain prediction? I will discuss the importance of making a prediction via the correct means, which not only provides human interpretability but also more robust generalization. For example, a question answering system should not only be able to answer the question but to justify the answer with the proper provenance. The second question is: How should we reason about a model's behavior? The implicit contract in machine learning is that if the training data looks like the test data, then we will get good generalization. But this contract is often broken in practice. We discuss two alternative contracts: one based on the ability to say "don't know", which allows us to obtain 100% precision when the model is well-specified, and the other based on leveraging conditional independence structure, which allows us to perform unsupervised risk estimation.
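As a toy illustration of the first alternative contract (this is our sketch of abstention in general, not the method from the talk), a probabilistic classifier can answer only when its confidence clears a threshold, and precision is then measured over the answered examples only:

# Toy sketch of the "don't know" contract (illustration only):
# abstain whenever confidence is below a threshold, and report
# precision on the answered examples together with coverage.
import numpy as np

def selective_predictions(probs, threshold=0.9):
    # probs: (n_examples, n_classes) predicted class probabilities
    conf = probs.max(axis=1)
    answers = probs.argmax(axis=1)
    abstain = conf < threshold        # below threshold -> say "don't know"
    return answers, abstain

def selective_precision(answers, abstain, y_true):
    answered = ~abstain
    if answered.sum() == 0:
        return float("nan"), 0.0      # answered nothing at all
    precision = (answers[answered] == y_true[answered]).mean()
    coverage = answered.mean()
    return precision, coverage

Raising the threshold trades coverage for precision; the point of the talk's first contract is that, when the model is well-specified, this kind of abstention can drive precision to 100% while still answering a useful fraction of queries.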
List of Papers [download ALL manuscripts (one big PDF)]
The latest trend in machine learning is to use very sophisticated systems involving deep neural networks with many complex layers, kernel methods, and large ensembles of diverse classifiers. While such approaches produce impressive, state-of-the-art prediction accuracies, they give little comfort to decision makers, who must trust their output blindly because very little insight is available into their inner workings or into the provenance of how a decision was made.
Therefore, in order for predictions to be adopted, trusted, and safely used by decision makers in mission-critical applications, it is imperative to develop machine learning methods that produce interpretable models with excellent predictive accuracy. It is in this way that machine learning methods can have an impact on consequential real-world applications.
Workshop: June 23, 2016
Submissions should use the ICML format and be up to 4 pages in length, with 1 additional page containing only acknowledgements and references. The review process will be single blind, so the submissions need not be anonymized.
We invite submissions of full papers (maximum 4 pages excluding references and acknowledgements) as well as works-in-progress, position papers, and papers describing open problems and challenges. Papers must be formatted using the ICML template and submitted online via: