Explaining Recurrent Attention Models

joint work with Kausik Sivakumar and Harsh Goel

August - December 2021

GitHub

The Recurrent Attention Model (RAM) is a Recurrent Neural Network (RNN) that attends to specific locations (glimpses) of an input image and builds up an accurate internal representation from which the image can be reconstructed. We built a Variational Autoencoder (VAE) on the hidden state of the RNN and visualized MNIST image reconstructions from the glimpses. We used expected information gain for reward shaping in the underlying reinforcement learning algorithm and showed empirically that the new reward structure outperforms the prior method on the MNIST dataset.
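For intuition, here is a minimal PyTorch sketch of the glimpse-plus-VAE idea described above. All names (GlimpseVAE, encode_patch, locator) and architectural details (GRU cell, linear VAE head, a deterministic glimpse policy) are illustrative assumptions on our part, not the code from the repository; in RAM proper the glimpse location is sampled stochastically and trained with REINFORCE.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    PATCH, HIDDEN, LATENT = 8, 256, 32  # glimpse size, RNN state size, VAE latent size

    class GlimpseVAE(nn.Module):
        def __init__(self):
            super().__init__()
            self.encode_patch = nn.Linear(PATCH * PATCH + 2, HIDDEN)  # patch pixels + (x, y)
            self.rnn = nn.GRUCell(HIDDEN, HIDDEN)
            self.locator = nn.Linear(HIDDEN, 2)         # proposes the next glimpse location
            self.to_mu = nn.Linear(HIDDEN, LATENT)      # VAE posterior mean
            self.to_logvar = nn.Linear(HIDDEN, LATENT)  # VAE posterior log-variance
            self.decode = nn.Linear(LATENT, 28 * 28)    # reconstruct the full image

        def glimpse(self, image, loc):
            # Crop a PATCH x PATCH window centred at loc (normalised to [-1, 1]);
            # affine_grid + grid_sample keep the crop differentiable.
            b = image.size(0)
            theta = torch.zeros(b, 2, 3, device=image.device)
            theta[:, 0, 0] = PATCH / 28.0
            theta[:, 1, 1] = PATCH / 28.0
            theta[:, :, 2] = loc
            grid = F.affine_grid(theta, (b, 1, PATCH, PATCH), align_corners=False)
            return F.grid_sample(image, grid, align_corners=False)

        def forward(self, image, n_glimpses=6):
            b = image.size(0)
            h = torch.zeros(b, HIDDEN, device=image.device)
            loc = torch.zeros(b, 2, device=image.device)  # start at the image centre
            recons, mus, logvars = [], [], []
            for _ in range(n_glimpses):
                patch = self.glimpse(image, loc).flatten(1)
                h = self.rnn(torch.tanh(self.encode_patch(torch.cat([patch, loc], 1))), h)
                mu, logvar = self.to_mu(h), self.to_logvar(h)
                z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterisation
                recons.append(torch.sigmoid(self.decode(z)).view(b, 1, 28, 28))
                mus.append(mu)
                logvars.append(logvar)
                loc = torch.tanh(self.locator(h))  # deterministic here; RAM samples it
            return recons, mus, logvars

At each step the network crops a small patch at the current location, folds it into the RNN state, and the VAE head decodes that state into a full 28x28 reconstruction; these per-glimpse reconstructions are what the figures below visualize.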

MNIST image reconstruction

(Leftmost: the original MNIST image; the remaining images are reconstructions after successive glimpses.)

MNIST cluttered image reconstruction

(Leftmost: the original MNIST image; second: its cluttered version; the remaining images are reconstructions after successive glimpses of the cluttered image.)

Comparison of information gain between HardAttReshaped (ours) and HardAtt. Our method performs better on MNIST. On Cluttered MNIST, however, our method's performance degrades as the number of glimpses grows, possibly because the additional glimpses feed noisier information into the reward shaping.
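As a rough illustration of the reward shaping, one plausible reading of "expected information gain" is the KL divergence between consecutive VAE posteriors: a glimpse is rewarded by how much it shifts the model's belief about the latent code. This exact form is an assumption on our part for the sketch; the reward actually used in the project is in the linked repository.

    import torch

    def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
        # KL( N(mu_q, var_q) || N(mu_p, var_p) ) for diagonal Gaussians,
        # summed over the latent dimensions.
        var_q, var_p = logvar_q.exp(), logvar_p.exp()
        return 0.5 * ((logvar_p - logvar_q)
                      + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0).sum(-1)

    def shaped_rewards(mus, logvars, task_reward):
        # mus/logvars: per-glimpse VAE posterior parameters (e.g. from the
        # sketch above). The dense reward at glimpse t is the information
        # gained by that glimpse, measured as the KL between consecutive
        # posteriors; the sparse task reward is added at the final step.
        rewards = [gaussian_kl(mus[t], logvars[t], mus[t - 1], logvars[t - 1])
                   for t in range(1, len(mus))]
        rewards[-1] = rewards[-1] + task_reward
        return rewards

A shaped reward of this kind is dense, arriving at every glimpse rather than only at the end of the episode. It also suggests why cluttered images are harder: a glimpse that lands on clutter can shift the posterior without conveying useful signal, consistent with the degradation noted above.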