Explaining RL Decisions with Trajectories

Shripad Deshmukh1, Arpan Dasgupta1,2, Balaji Krishnamurthy1, Nan Jiang3, Chirag Agarwal1, Georgios Theocharous4, Jayakumar Subramanian1

1Media and Data Science Research, Adobe 2IIIT Hyderabad   3University of Illinois Urbana-Champaign 4Adobe Research 

Idea Overview

The verifiability of AI decisions is critical in building accountability and transparency in the decision-making process. In the present work, we focus on attributing an RL agent's decision back to its relevant past experiences. For instance, if a surgical RL robot wants to perform a new gesture while operating, we would like to understand the gesture in terms of relevant data instances from which the agent has learnt to execute such an action. By understanding how the robot makes decisions, we can ensure that it's making safe and effective choices in the operating room.

Proposed Solution

In this work, we restrict ourselves to trajectory attribution in offline RL. We achieve the attribution via a simple sensitivity analysis of offline trajectory data for RL decision-making, i.e., by studying how the RL agent's decisions change as its training data changes. To accomplish this analysis at scale, we group the offline trajectories into high-level behaviour clusters and train separate explanation policies on variants of the original dataset, each variant obtained by removing one cluster of trajectories.
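As a rough illustration, the clustering and dataset-variant step can be sketched as follows. This is a minimal sketch, not the exact implementation: encode_trajectory stands in for whichever sequence encoder is used, and KMeans is merely a stand-in clustering choice.

```python
# Minimal sketch: cluster trajectory embeddings and build dataset variants.
# `encode_trajectory` is a hypothetical placeholder for a sequence encoder.
import numpy as np
from sklearn.cluster import KMeans

def make_data_variants(trajectories, encode_trajectory, n_clusters=10):
    # 1. Embed each trajectory with a sequence encoder.
    embeddings = np.stack([encode_trajectory(t) for t in trajectories])

    # 2. Group trajectories into high-level behaviour clusters.
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(embeddings)

    # 3. For every cluster j, form a variant dataset with that cluster removed;
    #    an "explanation" policy is later trained on each such variant.
    variants = {
        j: [traj for traj, label in zip(trajectories, labels) if label != j]
        for j in range(n_clusters)
    }
    return embeddings, labels, variants
```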

Finally, the cluster with the maximum impact on the original decision but the least data distance w.r.t. the original dataset is attributed as the source of the agent's action. The following figure depicts our approach step by step:

Trajectory Attribution in Offline RL. First, we encode trajectories in the offline data using sequence encoders and then cluster the trajectories using these encodings. We also generate a single embedding for the entire dataset. Next, we train explanation policies on variants of the original dataset and compute the corresponding data embeddings. Finally, we attribute the decisions of RL agents trained on the entire data to trajectory clusters using action and data-embedding distances.
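The attribution rule described above can likewise be sketched in a few lines. In this sketch, the policy objects, data_distance (e.g., a Wasserstein-style distance between data embeddings), and the embedding containers are all hypothetical placeholders; we assume a cluster is a candidate only if removing it changes the agent's action at the queried state.

```python
# Minimal sketch of the attribution rule: among clusters whose removal flips
# the agent's action at state s, attribute the decision to the one whose data
# embedding stays closest to the original dataset's embedding.
def attribute_decision(s, original_policy, explanation_policies,
                       data_embeddings, original_embedding, data_distance):
    original_action = original_policy(s)

    # Candidate clusters: their removal changes the decision at s.
    candidates = [j for j, policy in explanation_policies.items()
                  if policy(s) != original_action]
    if not candidates:
        return None  # no cluster's removal changes the decision

    # Least data distance w.r.t. the original dataset wins the attribution.
    return min(candidates,
               key=lambda j: data_distance(data_embeddings[j], original_embedding))
```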

Experimental Analysis


We test our methodology on four environments: Grid-world, Seaquest, Breakout, and HalfCheetah. For training, we use the SAC and BCQ algorithms. The results are as follows:

Qualitative Results

Grid-world Trajectory Attribution. The RL agent suggests taking the action ‘right’ in grid cell (1,1). This action is attributed to trajectories (i), (ii), and (iii). (We denote a grid-world trajectory by the annotated arrows ∧, ∨, >, < for the ‘up’, ‘down’, ‘right’, and ‘left’ actions, respectively, along with the 0-indexed time-step of each action.) We observe that RL decisions can be influenced by trajectories distant from the state under consideration, which makes attributing decisions to trajectories important for understanding them better.

Seaquest Trajectory Attribution. The agent (submarine) decides to take ‘left’ for the given observation under the provided context. The top-3 attributed trajectories are shown on the right (for each training-data trajectory, we show 6 sampled observations and the corresponding actions). As depicted in the attributed trajectories, the action ‘left’ is explained in terms of the agent aligning itself to face the enemies coming from the left end of the frame.

Breakout Trajectory Attribution. The agent proposes taking ‘RIGHT’ in the given observation frame. The corresponding attribution result shows how a ball coming from the left is played by moving to the right.

HalfCheetah Trajectory Attribution. The figure shows the agent suggesting torques for the different hinges given the current pose of the cheetah. The decision is influenced by the cheetah runs shown on the right (we show 5 sampled frames from one trajectory). The attributed trajectories explain the torques in terms of the cheetah getting up from the floor.

Quantitative Results

Frequency of behaviour cluster attribution across Seaquest RL agents trained with different algorithms (SAC and BCQ):

Attribution comparison between two RL algorithms: At the aggregate level, we observe that the high-level behaviours have a similar influence over the decision-making of agents trained with the two different offline RL algorithms.

Identifying the behaviours which influence the decision-making of a trained Breakout agent:

Breakout Learning insights: Depletion of life due to the ball falling out and corner-shots that lead to forming a tunnel influence the decision-making significantly, around 26% and 31% of the time, respectively.
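For reference, attribution frequencies like the ones reported above can be aggregated with a simple count. A minimal sketch, assuming each evaluated state yields at most one attributed cluster:

```python
from collections import Counter

def attribution_frequencies(attributed_clusters):
    # attributed_clusters: one attributed cluster id per evaluated state,
    # with None for states where no cluster changes the decision.
    counts = Counter(c for c in attributed_clusters if c is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cluster: count / total for cluster, count in counts.items()}
```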

Human Study

We conducted a trajectory attribution exercise with human subjects on a simple grid-navigation task. We found that humans wrongly attributed the factors influencing an RL decision 30% of the time on average, and they also missed important trajectories responsible for the action.


This highlights the need for data-attribution-based XRL techniques.

Cite using BibTeX

@inproceedings{deshmukh2023explaining,
  title={Explaining {RL} Decisions with Trajectories},
  author={Shripad Vilasrao Deshmukh and Arpan Dasgupta and Balaji Krishnamurthy and Nan Jiang and Chirag Agarwal and Georgios Theocharous and Jayakumar Subramanian},
  booktitle={The Eleventh International Conference on Learning Representations},
  year={2023},
  url={https://openreview.net/forum?id=5Egggz1q575}
}