Lecture time: TTh 11:30 am - 12:50 pm
Location: Bill & Melinda Gates Center (CSE2) G04
Instructor: Su-In Lee
Teaching assistant: Patrick Yu
This course is about explainable artificial intelligence (XAI), a subfield of machine learning that aims to provide transparency for complex models. Modern machine learning relies heavily on black-box models like tree ensembles and deep neural networks; these models offer state-of-the-art accuracy, but they make it difficult to understand the features, concepts, and data examples that drive their predictions. As a consequence, it's difficult for users, experts, and organizations to trust such models, and it's challenging to learn about the underlying processes we're modeling.
In response, some argue that we should rely on inherently interpretable models in high-stakes applications, such as medicine and consumer finance. Others advocate for post hoc explanation tools that provide a degree of transparency even for complex models. This course explores both perspectives, and we'll discuss a wide range of tools that address different questions about how models make predictions. We'll cover many active research areas in the field, including feature attribution, counterfactual explanations, instance explanations, and human-AI collaboration.
The course is structured as follows:
Introduction and motivation (1 week)
Feature importance: removal-based explanations (including Shapley values), propagation-based explanations, unifying principles, evaluation metrics (4 weeks)
Other explanation paradigms: inherently interpretable models, concept explanations, counterfactual explanations, instance explanations, neuron interpretation (3 weeks)
Human-AI collaboration (1 week)
Applications in medicine (1 week)