Hard Attention Control by Mutual Information Maximization

Abstract

Biological agents have adopted the principle of attention to limit the rate of incoming information from the environment. This raises the question: if an artificial agent has access to only a limited view of its surroundings, how can it use attention to solve tasks effectively? We propose an approach for learning to control a hard attention window by maximizing the mutual information between the environment state and the attention location at each step. The agent employs an internal world model to make predictions and directs attention to where those predictions may be wrong. Attention is trained jointly with a dynamic memory architecture that stores partial observations and keeps track of the unobserved state. We demonstrate that our approach is effective in predicting the full state from a sequence of partial observations. We also show that the agent's internal representation of its surroundings, a live mental map, can be used for control in two partially observable reinforcement learning tasks.
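The core idea, directing the hard attention window toward locations where the world model's predictions are likely wrong, can be sketched as a softmax over a per-location prediction-error ("surprise") map. The function names and the toy 6x6 grid below are illustrative assumptions for exposition, not the paper's implementation:

```python
import numpy as np

def attention_probs(pred_error, temperature=1.0):
    """Turn a per-location prediction-error map into attention
    probabilities via a softmax. Locations where the world model's
    predictions are likely wrong receive higher probability."""
    logits = pred_error.ravel() / temperature
    logits -= logits.max()              # subtract max for numerical stability
    p = np.exp(logits)
    return (p / p.sum()).reshape(pred_error.shape)

def glimpse(state, loc, size):
    """Extract a hard-attention window (partial observation) at `loc`."""
    r, c = loc
    return state[r:r + size, c:c + size]

# Toy example: a 6x6 error map with one highly surprising cell.
error_map = np.full((6, 6), 0.1)
error_map[2, 4] = 5.0                   # an unpredicted event happened here
probs = attention_probs(error_map)
loc = tuple(int(i) for i in np.unravel_index(probs.argmax(), probs.shape))
print(loc)  # → (2, 4): the glimpse is directed at the surprising location
```

In practice the glimpse location would be sampled from these probabilities rather than taken greedily, and the error map itself would come from comparing the world model's predicted state against the current reconstruction.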

Videos

Reconstructions of full state and glimpse agent policy in PhysEnv environment.

movie.mp4

| Observation | Reconstruction | Next Full State (unobserved) | Attention Probabilities |

Initially, the reconstruction is blank, as no observation has yet been made. The first thing the attention focuses on is the goal (red ball). Since the first goal of every episode appears in the same place, the attention control has learned over time where to look for it. Next, without ever observing the agent (dark blue), the reconstruction can predict its location. This is because the agent also spawns at the same spot in every episode: since the memory structure has access to the agent's actions, it has learned over many training episodes to predict the agent's position without observations, freeing the attention to focus on the less predictable, randomly initialized enemies (light blue).

By the time the first goal is collected, the attention has already spotted and tracked four enemy locations. In the immediately following frame, the reconstruction predicts the goal's next location to be slightly below the first. In the next frame, the attention control selects this candidate location to check whether the goal is really there. That location turns out to be empty, and the guess for the goal's position moves to the far left of the state. This time, the attention confirms the presence of the goal there, and it is written into memory. By this point, all enemies have been spotted and are being actively tracked by the attention. The attention then continues alternating among high-surprise locations, such as enemies colliding with each other or with the walls.

Reconstructions of full state and glimpse agent policy in gridworld environment.

movie2.mp4

| Observation | Reconstruction | Full State (unobserved) | Attention Probabilities |