We pre-train a network using non-expert human demonstration data and use it to initialize A3C's network. A3C's final policy is obtained after training for 50 million steps. We then use the final policy to play a game, which runs until either 5,000 steps have elapsed or the game ends, whichever comes first. During the game, we collect all of the game state frames in sequence. These game state frames are used to generate Grad-CAM videos at three different points of the policy (a rollout and Grad-CAM sketch is given after this list):
1) Final policy. The agent has been trained for 50 million steps, starting from the pre-trained network. We investigate which features are retained and which features the RL agent considers most important.
2) Initialization from the pre-trained network. The deep RL network is initialized from the pre-trained network, and we investigate which features are learned from pre-training and how they compare to the features in the final policy.
3) Orthogonal initializer. The deep RL network is initialized with TensorFlow's orthogonal random initializer.
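To make the frame-collection and Grad-CAM step concrete, the sketch below shows one way to roll out a policy for up to 5,000 steps while producing a Grad-CAM heatmap for each frame. It is a minimal illustration under stated assumptions, not our exact implementation: the network architecture, the `last_conv` layer name, the 84x84x4 input shape, and the random placeholder frames are all assumptions made for the example; in practice the frames come from the game emulator and the weights come from the trained (or pre-trained) A3C network.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

# Stand-in for the A3C policy network (hypothetical architecture; the real
# network and its weights come from pre-training / RL training, not from here).
inputs = layers.Input(shape=(84, 84, 4))                 # stacked game frames
x = layers.Conv2D(16, 8, strides=4, activation="relu")(inputs)
conv = layers.Conv2D(32, 4, strides=2, activation="relu", name="last_conv")(x)
x = layers.Flatten()(conv)
x = layers.Dense(256, activation="relu")(x)
logits = layers.Dense(6)(x)                              # one logit per action
policy_net = Model(inputs, [logits, conv])

def grad_cam(frame):
    """Grad-CAM heatmap for the action the policy would take on `frame`."""
    frame = tf.convert_to_tensor(frame[None], tf.float32)  # add batch dim
    with tf.GradientTape() as tape:
        logits, conv_maps = policy_net(frame)
        action = int(tf.argmax(logits[0]))               # greedy action
        score = logits[0, action]                        # chosen action's logit
    grads = tape.gradient(score, conv_maps)              # d(score)/d(conv maps)
    weights = tf.reduce_mean(grads, axis=(1, 2))         # average grads per channel
    cam = tf.nn.relu(tf.reduce_sum(weights[:, None, None, :] * conv_maps, -1))
    cam = cam[0] / (tf.reduce_max(cam[0]) + 1e-8)        # normalize to [0, 1]
    return cam.numpy()

# Rollout loop: random frames stand in for the emulator's game states here.
MAX_STEPS = 5000
collected_frames, heatmaps = [], []
for t in range(MAX_STEPS):
    frame = np.random.rand(84, 84, 4).astype(np.float32)  # placeholder frame
    collected_frames.append(frame)
    heatmaps.append(grad_cam(frame))
    game_over = False                                    # real code: emulator's done flag
    if game_over:                                        # stop early if the game ends
        break
```

The same loop can be pointed at any of the three checkpoints above by loading the corresponding weights into `policy_net`; for case 3, the weights would instead be drawn from an orthogonal initializer (e.g. `tf.keras.initializers.Orthogonal` in current TensorFlow) rather than loaded from training.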