Scaling All-Goals Updates in Reinforcement Learning Using Convolutional Neural Networks

Fabio Pardo, Vitaly Levdik, Petar Kormushev

AAAI 2020

Being able to reach any desired location in the environment can be a valuable asset for an agent. Learning a policy to navigate between all pairs of states individually is often not feasible. An all-goals updating algorithm uses each transition to learn Q-values towards all goals simultaneously and off-policy. However, performing these numerous updates in parallel is expensive, which has so far limited the approach to small tabular cases. To tackle this problem, we propose to use convolutional network architectures to generate Q-values and updates for a large number of goals at once. We demonstrate the accuracy and generalization qualities of the proposed method on randomly generated mazes and Sokoban puzzles. In the case of on-screen goal coordinates, the resulting mapping from frames to distance-maps directly informs the agent about which places are reachable and in how many steps. As an example application, we show that replacing the random actions in epsilon-greedy exploration with several actions towards feasible goals generates better exploratory trajectories on Montezuma's Revenge and Super Mario All-Stars games.
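To make the all-goals idea concrete, here is a minimal tabular sketch in Python, that is, the small-scale setting the abstract says the approach was previously limited to, not the convolutional method proposed in the paper. It rests on several assumptions that are not taken from the paper: each goal is "reach state g", the reward is 1 when the transition lands on the goal and 0 otherwise, reaching a goal is terminal for that goal, and the environment is a hypothetical one-dimensional corridor. The names `all_goals_update` and `demo_corridor` are illustrative only.

```python
import numpy as np

def all_goals_update(q, s, a, s_next, alpha=1.0, gamma=0.9):
    """Apply one Q-learning step for every goal using a single transition.

    q has shape (n_states, n_actions, n_goals); goal g means "reach state g".
    Assumed reward: 1 if s_next is the goal, else 0 (terminal for that goal).
    """
    n_goals = q.shape[2]
    reached = np.arange(n_goals) == s_next       # goals achieved by this transition
    bootstrap = gamma * q[s_next].max(axis=0)    # best next value, one entry per goal
    target = np.where(reached, 1.0, bootstrap)   # no bootstrapping once a goal is reached
    q[s, a] += alpha * (target - q[s, a])        # vectorized update over all goals
    return q

def demo_corridor(n_states=5, sweeps=10, gamma=0.9):
    """Hypothetical corridor: action 0 moves left, action 1 moves right."""
    q = np.zeros((n_states, 2, n_states))
    for _ in range(sweeps):
        for s in range(n_states):
            for a in (0, 1):
                step = 1 if a == 1 else -1
                s_next = min(max(s + step, 0), n_states - 1)  # walls at both ends
                all_goals_update(q, s, a, s_next, gamma=gamma)
    return q

if __name__ == "__main__":
    q = demo_corridor()
    # Values for goal state 4 when moving right from each state.
    print(np.round(q[:, 1, 4], 3))
```

Under these assumptions, a goal that is k steps away along the greedy path ends up with a value of gamma to the power k - 1, so the learned table can be read as a distance map: larger values mean closer goals, and a value of zero means the goal has not been found reachable from that state.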

Montezuma's Revenge room 1: exploration comparison

Random exploration

montezuma_random_slow.mp4

Proposed Q-map exploration

montezuma_qmap_slow.mp4

Super Mario All-Stars level 1.1: training of the proposed DQN + Q-map agent

Episode 1

mario_1_1_DQN_Qmap_seed_0_episode_1.mp4

Episode 12

mario_1_1_DQN_Qmap_seed_0_episode_12.mp4

Episode 112

mario_1_1_DQN_Qmap_seed_0_episode_112.mp4

Episode 1120

mario_1_1_DQN_Qmap_seed_0_episode_1120.mp4

Super Mario All-Stars level 1.1: best episodes

DQN

mario_DQN_best_seed_2.mp4

Proposed DQN + Q-map

mario_DQN_Qmap_best_seed_2.mp4

Super Mario All-Stars level 2.1: training of the proposed DQN + Q-map agent

Episode 1

mario_2_1_DQN_Qmap_seed_0_episode_1.mp4

Episode 12

mario_2_1_DQN_Qmap_seed_0_episode_12.mp4

Episode 101

mario_2_1_DQN_Qmap_seed_0_episode_101.mp4

Episode 990

mario_2_1_DQN_Qmap_seed_0_episode_990.mp4

Super Mario All-Stars level 2.1: training of the proposed DQN + Q-map agent with pre-training on level 1.1

Episode 1

mario_2_1_DQN_Qmap_pretrained_seed_0_episode_1.mp4

Episode 11

mario_2_1_DQN_Qmap_pretrained_seed_0_episode_11.mp4

Episode 91

mario_2_1_DQN_Qmap_pretrained_seed_0_episode_91.mp4

Episode 1245

mario_2_1_DQN_Qmap_pretrained_seed_0_episode_1245.mp4