Explainable AI in Industry
(KDD 2019 Tutorial)
Artificial Intelligence is increasingly playing an integral role in determining our day-to-day experiences. Moreover, with the proliferation of AI-based solutions in areas such as hiring, lending, criminal justice, healthcare, and education, the resulting personal and professional implications of AI are far-reaching. The dominant role played by AI models in these domains has led to a growing concern regarding potential bias in these models, and a demand for model transparency and interpretability. In addition, model explainability is a prerequisite for building trust in, and driving adoption of, AI systems in high-stakes domains requiring reliability and safety, such as healthcare and automated transportation, and in critical industrial applications with significant economic implications, such as predictive maintenance, exploration of natural resources, and climate change modeling.
As a consequence, AI researchers and practitioners have focused their attention on explainable AI to help them better trust and understand models at scale. The challenges for the research community include (i) defining model explainability, (ii) formulating explainability tasks for understanding model behavior and developing solutions for these tasks, and finally (iii) designing measures for evaluating the performance of models in explainability tasks.
In this tutorial, we will present an overview of model interpretability and explainability in AI, key regulations / laws, and techniques / tools for providing explainability as part of AI/ML systems. Then, we will focus on the application of explainability techniques in industry, wherein we present practical challenges / guidelines for effectively using explainability techniques and lessons learned from deploying explainable models for several web-scale machine learning and data mining applications. We will present case studies across different companies, spanning application domains such as search & recommendation systems, sales, lending, and fraud detection. Finally, based on our experiences in industry, we will identify open problems and research directions for the data mining / machine learning community.
Tutorial Outline and Description
The tutorial will consist of two parts: foundations including motivation, definitions, models, algorithms, tools, and evaluation for explainability in AI/ML systems (1.5 hours) and case studies across different companies, spanning application domains such as search & recommendation systems, sales, lending, and fraud detection (1.5 to 2 hours).
- Need for Transparency and Explainability in AI
- Model Validation: Validation metrics, such as classification accuracy, are an incomplete description of most real-world tasks.
- Scientific Consistency (beyond statistical consistency)
- Feature Importance
- Model Internals
- Explaining by Examples
- Intrinsically Interpretable Models
- Explaining Model Behavior Globally. A global surrogate model is an interpretable model that is trained to approximate the predictions of a black box model.
- Explaining Model Behavior Locally. Local surrogate models are interpretable models that are used to explain individual predictions of black box machine learning models.
- Example-based explanation methods select particular instances of the dataset to explain the behavior of machine learning models or to explain the underlying data distribution.
- Explaining Model Differences
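To make the global surrogate idea concrete, here is a minimal sketch assuming NumPy is available. The `black_box` function is a hypothetical stand-in for an opaque model, and the surrogate is a plain linear regression fit to the black box's predictions (not to ground-truth labels). Its fidelity is reported as R^2 against those predictions; a value well below 1 signals that the surrogate's weights should be read with caution.

```python
import numpy as np

def black_box(X):
    # Hypothetical opaque model (stand-in for, e.g., a neural net or boosted trees).
    return np.tanh(2.0 * X[:, 0]) + 0.5 * X[:, 1] ** 2 - X[:, 2]

def fit_global_surrogate(predict_fn, X):
    """Fit an interpretable linear surrogate to a black box's predictions."""
    y_hat = predict_fn(X)                          # targets come from the black box
    A = np.column_stack([np.ones(len(X)), X])      # intercept column + features
    coef, *_ = np.linalg.lstsq(A, y_hat, rcond=None)
    y_sur = A @ coef
    # Fidelity: R^2 of the surrogate with respect to the black-box predictions.
    ss_res = np.sum((y_hat - y_sur) ** 2)
    ss_tot = np.sum((y_hat - y_hat.mean()) ** 2)
    return coef, 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
coef, r2 = fit_global_surrogate(black_box, X)
print("intercept and weights:", np.round(coef, 2))
print("fidelity R^2:", round(r2, 3))
```

A shallow decision tree is an equally common surrogate choice; the linear model is used here only because it needs nothing beyond NumPy.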
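One simple form of example-based explanation retrieves the training instances most similar to the one being explained. The sketch below assumes NumPy, Euclidean distance, and a hypothetical toy lending dataset (income and debt, in thousands); in practice features would usually be standardized first so that no single feature dominates the distance.

```python
import numpy as np

def explain_by_examples(x, X_train, k=2):
    """Return indices and distances of the k training points nearest to x."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distance to each example
    idx = np.argsort(dists)[:k]                   # closest first
    return idx, dists[idx]

# Hypothetical past applicants: [income, debt] and their outcomes.
X_train = np.array([[50, 10], [52, 12], [20, 15], [80, 5], [48, 30]], dtype=float)
y_train = np.array(["approved", "approved", "denied", "approved", "denied"])

x = np.array([51.0, 10.5])                        # applicant to explain
idx, dists = explain_by_examples(x, X_train, k=2)
labels = y_train[idx]
for i, d in zip(idx, dists):
    print(f"similar applicant {i}: {X_train[i]} -> {y_train[i]} (distance {d:.2f})")
```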
Explainability By Design
- Designing explainable models for prediction.
- LIME and its variants such as xLIME
- Feature Importance (Random Forest)
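The core of LIME on tabular data can be sketched in a few steps: perturb the instance, query the black box, weight the perturbed samples by proximity, and fit a weighted linear model whose coefficients serve as the local explanation. This is a simplified sketch assuming NumPy, Gaussian perturbations, and an exponential proximity kernel; it is not the `lime` package and omits details such as feature discretization and feature selection. The model `black_box` and all parameter values are hypothetical.

```python
import numpy as np

def black_box(X):
    # Hypothetical nonlinear model: its local slope depends on where we look.
    return X[:, 0] ** 2 + 3.0 * X[:, 1]

def lime_style_explanation(predict_fn, x, num_samples=5000, scale=0.1,
                           kernel_width=0.25, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Perturb the instance with Gaussian noise.
    Z = x + rng.normal(scale=scale, size=(num_samples, x.size))
    # 2. Query the black box on the perturbed points.
    y = predict_fn(Z)
    # 3. Weight samples by proximity to x (exponential kernel).
    d = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(d ** 2) / kernel_width ** 2)
    # 4. Weighted least squares: scale rows by sqrt(w), then solve.
    A = np.column_stack([np.ones(num_samples), Z - x])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[1:]  # per-feature local weights (intercept dropped)

x = np.array([2.0, -1.0])
weights = lime_style_explanation(black_box, x)
print("local feature weights:", np.round(weights, 2))
```

Around x = (2, -1) the weights come out close to the model's local slopes (about 4 and 3), illustrating that the explanation is only valid in the neighborhood controlled by `scale` and `kernel_width`.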
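Random forests are commonly explained via feature importance; Breiman's original random forest importance measure was permutation-based, computed on out-of-bag samples. The sketch below shows the model-agnostic permutation variant on a held-out set, assuming NumPy. The `model` function and data are hypothetical stand-ins for a fitted forest and its validation split; the model deliberately ignores feature 2, so shuffling that column leaves the error unchanged.

```python
import numpy as np

def permutation_feature_importance(predict_fn, X, y, n_repeats=10, seed=0):
    """Mean increase in squared error when each feature column is shuffled."""
    rng = np.random.default_rng(seed)
    base = np.mean((predict_fn(X) - y) ** 2)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        deltas = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])  # break the link between feature j and y
            deltas.append(np.mean((predict_fn(Xp) - y) ** 2) - base)
        importances[j] = np.mean(deltas)
    return importances

# Hypothetical fitted model and held-out data.
def model(X):
    return 2.0 * X[:, 0] + 0.5 * X[:, 1]

rng = np.random.default_rng(1)
X_val = rng.normal(size=(1000, 3))
y_val = model(X_val) + rng.normal(scale=0.1, size=1000)
imp = permutation_feature_importance(model, X_val, y_val)
print("importances:", np.round(imp, 3))
```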
Evaluation of Explainability
- Coverage / Representativeness
- Human friendliness (concepts easier for humans to understand)
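For rule-style explanations, coverage can be measured as the fraction of instances a rule applies to, and precision as how often the model's prediction on those covered instances matches the rule's stated outcome (the definitions used, for example, by anchor-style explanations). The sketch assumes NumPy; the rule, features (income and debt), and predictions are hypothetical.

```python
import numpy as np

def rule_coverage_and_precision(rule, X, model_pred, target):
    covered = rule(X)                      # boolean mask: instances the rule applies to
    coverage = covered.mean()              # fraction of the data the explanation covers
    if not covered.any():
        return coverage, float("nan")
    precision = (model_pred[covered] == target).mean()
    return coverage, precision

def income_high_debt_low(X):
    # Hypothetical rule: "income > 50 and debt < 20 => approved (1)".
    return (X[:, 0] > 50) & (X[:, 1] < 20)

X = np.array([[60, 10], [55, 25], [70, 5], [30, 10], [52, 15], [40, 40]], dtype=float)
model_pred = np.array([1, 0, 1, 0, 0, 0])  # hypothetical model outputs
coverage, precision = rule_coverage_and_precision(income_high_debt_low, X, model_pred, 1)
print(f"coverage={coverage:.2f}, precision={precision:.2f}")
```

A rule with high precision but tiny coverage explains little of the data, which is why the two are usually reported together.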
Case Studies (including practical challenges and lessons learned during deployment in industry)
We will present case studies across different companies, spanning application domains such as search & recommendation systems, sales, lending, and fraud detection.
This tutorial is aimed at attendees with a wide range of interests and backgrounds, including researchers interested in learning about model interpretability and explainability in AI, key regulations / laws, and explainability notions / techniques, as well as practitioners interested in implementing explainable models for web-scale machine learning and data mining applications. We will not assume any prerequisite knowledge, and will present the intuition underlying various explainability notions and techniques to ensure that the material is accessible to all KDD attendees.
Krishna Gade is the founder and CEO of Fiddler Labs, an enterprise startup building an explainable AI engine to address problems regarding bias, fairness, and transparency in AI. An entrepreneur and engineering leader with strong technical experience in creating scalable platforms and delightful consumer products, Krishna previously held senior engineering leadership roles at Facebook, Pinterest, Twitter, and Microsoft. He has given several invited talks at prominent practitioner forums, including a talk on addressing bias, fairness, and transparency in AI at Strata Data Conference, 2019.
Sahin Cem Geyik has been part of the Careers/Talent AI teams at LinkedIn for the past three years, focusing on personalized and fairness-aware recommendations across several LinkedIn Talent Solutions products. Prior to LinkedIn, he was a research scientist at Turn Inc., an online advertising startup that was later acquired by Amobee, a subsidiary of Singtel. He received his Ph.D. in Computer Science from Rensselaer Polytechnic Institute in 2012, and his Bachelor's degree in Computer Engineering from Bogazici University, Istanbul, Turkey, in 2007. Sahin has worked on various research topics in ML, spanning Online Advertising Models and Algorithms, Recommender and Search Systems, Fairness-aware ML, and Explainability. He has also performed extensive research in the Systems domain, which resulted in multiple publications in the Ad-hoc/Sensor Networks and Service-Oriented Architecture fields. Sahin has authored papers in several top-tier conferences and journals such as KDD, WWW, INFOCOM, SIGIR, ICDM, CIKM, IEEE TMC, and IEEE TSC, and has presented his work at multiple external venues.
Krishnaram Kenthapadi is part of the AI team at LinkedIn, where he leads the fairness, transparency, explainability, and privacy modeling efforts across different LinkedIn applications. He also serves as LinkedIn's representative in Microsoft's AI and Ethics in Engineering and Research (AETHER) Committee. He shaped the technical roadmap and led the privacy/modeling efforts for the LinkedIn Salary product, and prior to that, served as the relevance lead for the LinkedIn Careers and Talent Solutions Relevance team, which powers search/recommendation products at the intersection of members, recruiters, and career opportunities. Previously, he was a Researcher at Microsoft Research Silicon Valley, where his work resulted in product impact (and Gold Star / Technology Transfer awards) and several publications/patents. Krishnaram received his Ph.D. in Computer Science from Stanford University in 2006, and his Bachelor's in Computer Science from IIT Madras. He serves regularly on the program committees of KDD, WWW, WSDM, and related conferences, and co-chaired the 2014 ACM Symposium on Computing for Development. He received the CIKM best case studies paper award, the SODA best student paper award, and a WWW best paper award nomination. He has published 35+ papers, with 2500+ citations, and has filed 130+ patents. He has taught tutorials and presented lectures on privacy & fairness in industry at forums such as KDD 2018 and WSDM 2019, instructed a course on artificial intelligence at Stanford, and given several talks on his research work.
Varun Mithal is an AI researcher at LinkedIn, where he works on jobs and hiring recommendations. Prior to joining LinkedIn, he received his PhD in Computer Science from the University of Minnesota-Twin Cities, and his Bachelor's in Computer Science from the Indian Institute of Technology, Kanpur. He has developed several algorithms to identify rare classes and anomalies using unsupervised change detection as well as supervised learning from weak labels. His thesis also explored machine learning models for scientific domains that incorporate physics-based constraints, making the models interpretable for domain scientists. He has published 20 papers with 350+ citations. His work has appeared in top-tier data mining conferences and journals such as IEEE TKDE, AAAI, and ICDM.
Ankur Taly is the Head of Data Science at Fiddler Labs, where he is responsible for developing and evangelizing core explainable AI technology. Previously, he was a Staff Research Scientist at Google Brain, where he carried out research in explainable AI and is best known for his contribution to developing and applying Integrated Gradients (220+ citations), a new interpretability algorithm for deep networks. His research in this area has resulted in publications at top-tier machine learning conferences (ICML 2017, ACL 2018) and in prestigious journals such as the journal of the American Academy of Ophthalmology (AAO) and the Proceedings of the National Academy of Sciences (PNAS). He has also given invited talks at several academic and industrial venues, including UC Berkeley (DREAMS seminar), SRI International, Dagstuhl seminar, and Samsung AI Research. Besides explainable AI, Ankur has a broad research background and has published 25+ papers in several other areas, including Computer Security, Programming Languages, Formal Verification, and Machine Learning. He has served on several conference program committees (PLDI 2014 and 2019, POST 2014, PLAS 2013), taught guest lectures at graduate courses, and instructed a short course on distributed authorization at the FOSAD summer school in 2016. Ankur obtained his Ph.D. in computer science from Stanford University in 2012 and a B.Tech. in CS from IIT Bombay in 2007.