Towards Interpretable Reinforcement Learning Using Attention Augmented Agents
This site hosts the videos accompanying our NeurIPS 2019 submission.
Layout of the Reinforcement Learning Videos
Figure 2: Basic Attention Patterns
Seaquest
Star Gunner
Figure 3: Reaction to Novel States
Reaction to inserted enemies
The reaction of an agent trained on Seaquest to the introduction of a new enemy (a fish).
Note that the fish is inserted at the pixel level, not at the engine level, so the agent cannot actually interact with it.
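Inserting an object "at the pixel level" means editing the observation image before the agent sees it, while the emulator state stays untouched. The sketch below illustrates that idea under stated assumptions: frames are represented as nested lists of RGB tuples, and the sprite comes with a boolean opacity mask. The function name `paste_sprite` and this representation are hypothetical and not taken from the original experiments.

```python
def paste_sprite(frame, sprite, mask, top, left):
    """Return a copy of `frame` with `sprite` drawn at (top, left).

    Hypothetical helper. frame: H x W list of RGB tuples (the observation).
    sprite: h x w list of RGB tuples; mask: h x w list of bools marking which
    sprite pixels are opaque. Only the image changes; the game engine never
    learns about the new object, so it cannot collide with or be shot by it.
    """
    out = [list(row) for row in frame]  # copy so the original frame is untouched
    for i, (s_row, m_row) in enumerate(zip(sprite, mask)):
        for j, (pixel, opaque) in enumerate(zip(s_row, m_row)):
            y, x = top + i, left + j
            # Skip transparent sprite pixels and anything outside the frame.
            if opaque and 0 <= y < len(out) and 0 <= x < len(out[0]):
                out[y][x] = pixel
    return out
```

For example, pasting a 2x2 sprite at the bottom-right corner of a 3x3 frame changes only the single in-bounds opaque pixel and leaves the input frame unmodified.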
Figure 4: Forward Planning / Scanning
Ms Pacman
Alien
Figure 5: Trip Wires
Breakout
Space Invaders
Figure 6: What/Where
Enduro: What-Where
This video shows the attention maps on Enduro, colored according to whether the query is more spatial (blue), more content-based (red), or balanced between the two (white). To compute this, for each pixel we sum the logits over the spatial channels and, separately, over the content channels, and then take the difference of the two sums. We clip this difference to the range [-log(10), log(10)] and then weight each pixel by the frame's attention weights. In each frame, the minimum value is mapped to bright blue and the maximum to bright red.
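The coloring procedure above can be sketched as follows. This is a minimal illustration under several assumptions not stated on this page: the per-channel logit contributions for each pixel are already available as lists, the difference is taken as content minus spatial (so that the minimum maps to blue/spatial and the maximum to red/content, matching the color legend), and the intermediate colors are a linear blue-white-red ramp. The function names are hypothetical.

```python
import math


def what_where_scores(spatial_logits, content_logits, attention):
    """Per-pixel content-vs-spatial score, weighted by attention (hypothetical).

    spatial_logits, content_logits: H x W lists, each entry a list of
    per-channel logit contributions. attention: H x W attention weights.
    """
    bound = math.log(10.0)
    scores = []
    for s_row, c_row, a_row in zip(spatial_logits, content_logits, attention):
        row = []
        for s, c, a in zip(s_row, c_row, a_row):
            # Assumed sign: positive = content-based (red), negative = spatial (blue).
            diff = sum(c) - sum(s)
            diff = max(-bound, min(bound, diff))  # clip to [-log 10, log 10]
            row.append(diff * a)  # weight by this pixel's attention
        scores.append(row)
    return scores


def to_rgb(scores):
    """Map the frame minimum to bright blue, the maximum to bright red."""
    flat = [v for row in scores for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero on a constant frame

    def color(v):
        t = (v - lo) / span  # 0 at the minimum, 1 at the maximum
        if t < 0.5:
            u = 2 * t  # blue (0, 0, 1) -> white (1, 1, 1)
            return (u, u, 1.0)
        u = 2 * (t - 0.5)  # white (1, 1, 1) -> red (1, 0, 0)
        return (1.0, 1.0 - u, 1.0 - u)

    return [[color(v) for v in row] for row in scores]
```

Because normalization is per frame, white marks the midpoint of that frame's range; it coincides with a zero score (an exact balance) only when the range is symmetric.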
Figure 7: Saliency Analysis
Ms Pacman: Saliency Maps
In this video we show saliency maps on Ms Pacman for a baseline agent and for our attention agent. For the attention agent, we also show the attention map that is most similar to its saliency map on the same frame. The frames are not aligned between the two agents (because they follow different policies), but both agents go through a range of similar situations.
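Selecting "the most similar attention map" requires a similarity measure, which this page does not specify. The sketch below assumes Pearson correlation between the flattened saliency map and each candidate attention map (for example, one map per attention head) and returns the index of the best match. Both the measure and the function names are illustrative assumptions, not the method used for the video.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equally sized 2-D maps."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # correlation is undefined for a constant map
    return cov / (sx * sy)


def most_similar_attention(saliency, attention_maps):
    """Index of the attention map most correlated with `saliency` (hypothetical)."""
    scores = [pearson(saliency, m) for m in attention_maps]
    return max(range(len(scores)), key=scores.__getitem__)
```

Correlation ignores overall scale, so an attention map that peaks in the same place as the saliency map scores highly even if its values are larger or smaller.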