Rich Caruana


Friends Don’t Let Friends Deploy Black-Box Models: The Importance of Intelligibility in Machine Learning


Abstract

In machine learning, a tradeoff must often be made between accuracy and intelligibility: the most accurate models (e.g., deep nets, boosted trees, and random forests) are usually not very intelligible, and the most intelligible models (e.g., logistic regression, small trees, and decision lists) are usually less accurate. This tradeoff often limits the accuracy of models that can safely be deployed in mission-critical applications such as healthcare, where being able to understand, validate, edit, and ultimately trust a learned model is important. We have been working on a learning method based on generalized additive models (GAMs) that is as accurate as full-complexity models, but more intelligible than linear models. This makes it easy to understand what a model has learned, and also makes it easier to edit the model when it learns inappropriate things. Making it possible for experts to understand and then repair a model is critical because most data has unanticipated landmines. In the talk I’ll present a case study where these high-accuracy GAMs discover surprising patterns in the data that would have made deploying a black-box model risky. I’ll also briefly show how we’re using these models to detect bias in domains where fairness and transparency are paramount.

Bio

Rich Caruana is a Principal Researcher at Microsoft. Before joining Microsoft, Rich was on the faculty in Computer Science at Cornell, at UCLA's Medical School, and at CMU's Center for Learning and Discovery. Rich's Ph.D. is from CMU, where he worked with Tom Mitchell and Herb Simon. His thesis on Multi-Task Learning helped create interest in a new subfield of machine learning called Transfer Learning. Rich received an NSF CAREER Award in 2004 (for Meta Clustering) and best paper awards in 2005 (with Alex Niculescu-Mizil), 2007 (with Daria Sorokina), and 2014 (with Todd Kulesza, Saleema Amershi, Danyel Fisher, and Denis Charles), and he co-chaired KDD in 2007 (with Xindong Wu). His current research focuses on learning for medical decision making, intelligible modeling, deep learning, and computational ecology.