Program

23 September 2022 (all times are in BST)

  • 08:30 - 08:55 Registration

  • 08:55 - 09:00 Introduction

  • 09:00 - 09:40 Ruth Byrne - How people reason about counterfactual explanations for decisions of AI systems

  • 09:40 - 10:20 Gabriel Recchia - Towards aligning language models with human intent on difficult-to-evaluate tasks

  • 10:20 - 11:00 Matt Groh - Identifying the Context Shift between Test Benchmarks and Production Data

  • 11:00 - 11:20 Break

  • 11:20 - 12:00 Claire Woodcock - The impact of explanations on layperson trust in AI applications

  • 12:00 - 12:40 Tomer Ullman - Loopholes as a window into understanding and alignment

  • 12:40 - 14:00 Lunch

  • 14:00 - 14:40 Hana Chockler - Causal explanations: why? how? where? for whom?

  • 14:40 - 15:20 Umang Bhatt - Algorithmic Transparency in Machine Learning

  • 15:20 - 15:40 Break

  • 15:40 - 16:20 Sangeet Khemlani - Tracking spatial relations in the real world

  • 16:20 - 17:00 Tobias Gerstenberg - Varieties of Explanation

  • 17:00 - 17:45 Panel discussion: What can cognitive scientists and machine learning researchers learn from each other when it comes to (explainable) AI? Panelists: Andrew Lampinen, Hana Chockler, Noah Goodman, Ruth Byrne; Moderator: Marko Tešić

24 September 2022 (all times are in BST)

  • 08:30 - 09:00 Registration

  • 09:00 - 09:40 Noah Goodman - Language use as reasoning

  • 09:40 - 10:20 Henry Ashton & Matija Franklin - Understanding intent is non-negotiable for XAI

  • 10:20 - 11:00 Yotam Amitai - Interactive Explanations of Agent Behavior

  • 11:00 - 11:20 Break

  • 11:20 - 12:00 Inga Ibs & Claire Ott - Leveraging Human Optimization Strategies for Explainable AI

  • 12:00 - 12:40 Katie Collins & Cathy Wong - Learning from the structure and flexibility of human explanation for more trustworthy AI

  • 12:40 - 13:00 Closing remarks

Titles and Abstracts

  • Gabriel Recchia

    • Title: Towards aligning language models with human intent on difficult-to-evaluate tasks

    • Abstract: For many tasks where training requires human evaluations, evaluation is likely to become more difficult as a model’s performance improves. For example, a human overseer training a language model via reward learning may prefer answers to questions that are true and useful, but as the model's performance on question-answering tasks improves, it may become increasingly difficult for the human to determine which of the model's answers are more true and useful than others. One set of approaches is to ensure that models generate explanations which (appear to) reflect chains of reasoning that are easier to evaluate than the model’s final answers. These approaches can be complemented by training methods which increase the probability that such chains reflect the actual processes by which a model arrives at its answers, and which make it easier to detect errors in these chains of reasoning. I discuss preliminary work in this direction as well as plans for future research.

  • Umang Bhatt

    • Title: Algorithmic Transparency in Machine Learning

    • Abstract: Algorithmic transparency exposes model properties to various stakeholders for purposes that include understanding, improving, and contesting predictions from machine learning (ML) models. To date, research into algorithmic transparency has focused predominantly on explainability, which provides reasons for a model’s behavior. We first discuss research exploring how stakeholders view and use explainability. We report a gap between explainability in practice and the goal of transparency. Explanations primarily serve internal stakeholders; however, understanding model behavior alone may not be enough for stakeholders to gauge whether a model is wrong or lacks sufficient context to solve the task at hand. Drawing from multiple disciplines, we argue for considering a complementary form of transparency by estimating and communicating the uncertainty associated with model predictions. We then discuss the role of transparency in human-machine teams, specifically in the presence of model errors. We show how transparency information can be used to facilitate appropriate levels of trust in set-valued classifiers. We conclude with a call to re-evaluate the need for transparency when building models that incorporate stakeholder feedback into the learning procedure.

  • Ruth Byrne

    • Title: How people reason about counterfactual explanations for decisions of AI systems

    • Abstract: People readily create explanations about how an outcome could have turned out differently, if some aspects of the situation had been different. Their counterfactual explanations differ in important ways from causal explanations. First I consider recent experimental discoveries of differences in how people reason about counterfactuals and causals, including differences in the tendency to focus on controllable actions, and differences in the tendency to consider multiple possibilities. I discuss the implications of these discoveries for the use of counterfactual explanations for decisions by Artificial Intelligence (AI) systems in eXplainable AI (XAI). Next I describe current empirical findings of differences in people’s subjective preferences for counterfactual and causal explanations for decisions by AI systems, in familiar and unfamiliar domains, and their objective accuracy in predicting such decisions. I suggest that experimental evidence from cognitive science can enrich future developments of psychologically plausible automated explanations in XAI.

  • Claire Woodcock

    • Title: The impact of explanations on layperson trust in AI applications

    • Abstract: Trust is essential for AI to be successful in society. Take AI in healthcare, for example. If a smartphone app tells a user not to go to the emergency room, they must trust what they are being told to do in order to follow the instruction. Without that trust, the benefits AI is heralded to bring just won’t materialise. How do we cultivate trust? There are many components, but a good explanation is essential. Using a healthcare symptom checker as a case study, I will demonstrate how today’s commercial AI apps are using an explanation format that isn’t natural to laypeople: one that explains how the model derived its decision. I then go on to compare the trust engendered by this kind of explanatory presentation against causal explanations across a large sample of laypeople. Tune in to my talk to hear what I found.

  • Tomer Ullman

    • Title: Loopholes as a window into understanding and alignment

    • Abstract: Exploiting a loophole takes advantage of the ambiguity of language to do what someone says but not what they want. It's a familiar facet of fable, law, and everyday life. Engaging with loopholes requires a nuanced understanding of goals, social ambiguity, and value alignment. Scientifically, the development of loopholes can help us better understand human communication and design better human-AI interactions. However, cognitive research on this behavior remains scarce. I'll discuss several recent experiments that explore the development of loophole behavior, together with a proposal for a formal framework and its possible relevance to AI.

  • Sangeet Khemlani

    • Title: Tracking spatial relations in the real world

    • Abstract: I describe a novel visual reasoning system that perceives the world by dynamically constructing and updating spatial mental models. These models represent the iconic spatial structure of observations encoded in images and streaming video. The system can be queried with natural-language spatial relations (e.g., "focus on what is to the left of the ___") to focus attention on portions of the input imagery in real time. The system is built on mReasoner, a computational cognitive model of thinking and reasoning. I describe how it can be used to investigate dynamic spatial thinking, and how it's been used for recent applications on an embodied robotic platform.

  • Tobias Gerstenberg

    • Title: Varieties of Explanation

    • Abstract: As humans, we spend much of our time going beyond the here and now. We dwell on the past, long for the future, and ponder how things could have turned out differently. In this talk, I will argue that people's knowledge of the world is organized around causally structured mental models, and that much of human thought can be understood as cognitive operations over these mental models. Specifically, I will highlight the pervasiveness of counterfactual thinking in human cognition. Counterfactuals are critical for how people make causal judgments, how they explain what happened, and how they hold others responsible for their actions.

  • Hana Chockler

    • Title: Causal explanations: why? how? where? for whom?

    • Abstract: In this talk, I will (briefly) introduce the theory of actual causality as defined by Halpern and Pearl. This theory turns out to be extremely useful in various areas of computer science due to a good match between the results it produces and our intuition. Moreover, applying actual causality to the problem of computing explanations for the decisions of black-box AI systems was shown to lead to results that are consistent with our intuition and are useful for understanding how AI systems make their decisions. I will show results on black-box image classifiers and discuss future directions. Then, I will discuss preliminary findings in applying causal explanations to other domains, specifically to healthcare, and argue that the problem of explainability in healthcare applications is far from solved. The talk is based on a number of papers; while the topics are quite broad and have been the subject of very active research not strictly limited to my own, I will mostly be talking about my work with different co-authors. The talk is reasonably self-contained.

  • Henry Ashton and Matija Franklin

    • Title: Understanding intent is non-negotiable for XAI

    • Abstract: Understanding the intent behind another’s actions is a crucial process that humans undertake on a routine basis. The presence or absence of intent is an important modifier when considering the culpability of an actor’s actions and effects. For certain behaviours like deception, the presence or absence of intent defines the very wrongness of the behaviour. For other behaviours that are ambiguous because they are incomplete or interrupted, understanding their aim is required in order to judge whether they are acceptable or not. The pivotal role that intent judgement plays in criminal law reflects the importance that society places on the concept. When the actor is not human but an algorithm, the importance of intent raises a number of problems. Firstly, what does intent mean for an algorithm? Do the folk definitions of intent that experimental psychologists have studied for 25 years translate to the behaviour and cognition of silicon folk? Secondly, even if intent can be quantified in an algorithm, does it really matter when that algorithm has caused some harm? One role of intent in law is to determine the punishment of caused wrongs, but since the punishment of algorithms is conceptually difficult, do people care about intent in robots in the same way they do with humans? We present the results of a series of experiments which attempt to answer some of these questions. We conclude by discussing what the importance of intent might mean for algorithm design and under what conditions intent can be said to exist.

  • Katie Collins and Cathy Wong

    • Title: Learning from the structure and flexibility of human explanation for more trustworthy AI

    • Abstract: People are flexible explainers. We can readily conceive of plausible explanations for a range of situations from the mundane (why is the fire alarm going off? how did the grass get wet?) to the completely arbitrary (how did these zoo animals get out? why is there a crater here?). Prior work in cognitive science has long suggested that people have structured – often generative – internal models through which they combine conceptual knowledge of the world with new observations. Many forms of reasoning, like coming up with explanations, can then be viewed as inference over these kinds of richly structured models. We advance this account of human explanation generation and highlight its relevance to the AI community. First, we highlight a contrast between real human behavior and a purely distributional approach of the kind taken by large language models (LLMs) in a linguistic explanatory domain for novel problems. We find that LLMs are not actually robust generators of explanations, particularly for increasingly out-of-distribution problems – which we argue is due in part to the lack of an explicit world model of the kind humans may possess. Next, we present recent work suggesting another avenue towards reverse engineering these rich latent representations: eliciting soft, probabilistic labels from human annotators about inherently ambiguous observations. We show that using these soft labels improves the training efficiency and downstream performance of supervised AI systems.

  • Noah Goodman

    • Title: Language use as reasoning

    • Abstract: TBA

  • Matt Groh

    • Title: Identifying the Context Shift between Test Benchmarks and Production Data

    • Abstract: Machine learning models are often brittle on production data despite achieving high accuracy on benchmark datasets. Benchmark datasets have traditionally served dual purposes: first, benchmarks offer a standard on which machine learning researchers can compare different methods, and second, benchmarks provide a model, albeit imperfect, of the real world. The incompleteness of test benchmarks (and the data upon which models are trained) hinders robustness in machine learning, enables shortcut learning, and leaves models systematically prone to err on out-of-distribution data in general and adversarially perturbed data in particular. The mismatch between a single static benchmark dataset and a production dataset that such a benchmark is designed to emulate is traditionally described as dataset shift or distribution shift, with sub-categories including covariate shift, prior probability shift, and concept shift. These data distribution shifts are both over-specified and under-specified for addressing the data generating process underlying the mismatch between two datasets. In this paper, we argue that context shift – semantically meaningful changes in the underlying data generation process of different samples – needs to become a focus in applied machine learning to increase the robustness and generalizability of models. This abstract informs further work where we outline three methods for identifying context shift that would otherwise lead to model prediction errors: (1) how human intuition and expert knowledge can identify semantically meaningful features upon which models systematically fail, (2) how dynamic benchmarking with a focus on capturing the data generation process can promote generalizability through corroboration, and (3) how clarifying a model’s limitations can reduce unexpected errors. Robust machine learning is focused on model performance beyond benchmarks, and this abstract frames a research program where we consider three model organism domains – facial expression recognition, deepfake detection, and medical diagnosis – to highlight how implicit assumptions in benchmark tasks lead to errors in practice. By paying close attention to the role of context in prediction tasks, researchers can design more comprehensive benchmarks, reduce context shift errors, and increase generalization performance.

  • Yotam Amitai

    • Title: Interactive Explanations of Agent Behavior

    • Abstract: As reinforcement learning methods increasingly amass accomplishments, the need for comprehending their solutions becomes more crucial. Most explainable reinforcement learning (XRL) methods generate a static explanation depicting their developers' intuition of what should be explained and how. In contrast, literature from the social sciences proposes that meaningful explanations are structured as a dialog between the explainer and the explainee, suggesting a more active role for the user and her communication with the agent. In this paper, we present ASQ-IT – an interactive tool that presents video clips of the agent acting in its environment based on queries given by the user that describe temporal properties of behaviors of interest. Our approach is based on formal methods: queries in ASQ-IT’s user interface map to a fragment of Linear Temporal Logic over finite traces (LTLf), which we developed, and our algorithm for query processing is based on automata theory. We provide experimental results from a user study aimed at testing ASQ-IT’s usability, and report positive outcomes from both objective performance and self-reported ability of participants to use our tool.

  • Inga Ibs and Claire Ott

    • Title: Leveraging Human Optimization Strategies for Explainable AI

    • Abstract: Optimisation is an essential tool for making decisions in complex planning domains such as energy systems. AI methods are indispensable for finding optimal solutions in such domains and explanations for these results are crucial for ensuring trust and applicability. However, even solutions for problems formalised as Linear Programs (LP) and Mixed Integer Linear Programs (MILP), which can be obtained with white-box algorithms, may be hard to interpret and explain. Algorithms for solving LPs – such as interior point or simplex methods – are mathematically well understood and known to find the optimal solution. Unfortunately, this knowledge does not necessarily yield a satisfying explanation as to why a solution is optimal or which aspects of a problem are relevant to finding one. Furthermore, problems often involve many variables and interactions, which are hard to disentangle even for experts. In this talk, we will present an optimisation paradigm – the furniture factory – that enables the analysis of optimisation strategies and problem representations of participants and provides a testbed for methods which generate explanations automatically. A computer game based on this paradigm was used in an experiment on strategy use and representations of optimisation problems. Insights from this experiment can be transferred to explanation methods, which can be verified and tested in further experiments. We will discuss how the paradigm can aid the investigation of goal-oriented explanations and their alignment with human understanding of policies.