An investigation of model-free planning

Arthur Guez*, Mehdi Mirza*, Karol Gregor*, Rishabh Kabra*, Sébastien Racanière, Théophane Weber, David Raposo, Adam Santoro, Laurent Orseau, Tom Eccles, Greg Wayne, David Silver, Timothy Lillicrap

* Equal contribution

DeepMind, London, UK

https://arxiv.org/abs/1901.03559

Solving complex planning domains

Ms Pacman (Atari)

Boxworld

Sokoban

Asteroids (Atari)

DRC (D,N) architecture

`D` denotes the number of ConvLSTM modules stacked together. `N` denotes the number of times the network is repeated within each agent time-step.

This simple model-free agent exhibits strong performance on complex planning domains (see videos above) and several behavioral characteristics of planning.
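As a rough illustration of the DRC(D,N) control flow, the sketch below applies a stack of `D` modules `N` times to the same input within one agent time-step. It is only a minimal sketch: `toy_cell` is a hypothetical scalar stand-in for a ConvLSTM module (no gates or convolutions), `drc_step` is an illustrative name, and feeding the top module's previous hidden state into the bottom module is an assumption of this sketch rather than something stated on this page.

```python
from typing import Callable, List, Tuple

# (cell, hidden) pair; a scalar stand-in for the ConvLSTM state tensors.
State = Tuple[float, float]
Cell = Callable[[float, float, State], State]


def toy_cell(x: float, below: float, state: State) -> State:
    """Hypothetical stand-in for one ConvLSTM module (not a real ConvLSTM)."""
    c, _ = state
    c = 0.5 * c + 0.5 * (x + below)  # blend old memory with the new inputs
    h = c                            # a real module would gate and convolve here
    return c, h


def drc_step(x: float, states: List[State], cells: List[Cell],
             n_ticks: int) -> Tuple[float, List[State]]:
    """One agent time-step: run the stack of D modules N times on input x."""
    assert len(states) == len(cells)
    states = list(states)
    for _ in range(n_ticks):                 # N repeats per agent time-step
        # Assumption: the top module's previous hidden state feeds the bottom.
        below = states[-1][1]
        for i, cell in enumerate(cells):     # D stacked modules
            states[i] = cell(x, below, states[i])
            below = states[i][1]             # pass hidden state up the stack
    return states[-1][1], states


# DRC(3,3): three stacked modules, repeated three times per time-step.
initial = [(0.0, 0.0)] * 3
output, new_states = drc_step(1.0, initial, [toy_cell] * 3, n_ticks=3)
```

With `D = 3` and `N = 3`, a single agent time-step involves nine module applications, which is where the extra computation per step comes from. The recurrent state returned by `drc_step` would be carried over to the next time-step.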

What are the behavioral characteristics of a good planner?

(1) It should generalize to novel scenarios.

DRC(3,3) versus baselines on Gridworld.

DRC(3,3) versus baselines on Sokoban (unfiltered).

(2) It should be able to learn from limited amounts of data.

(3) It should make effective use of additional computation time.

Think you can plan? Try Boxoban

This is a selection of the 3332 hard levels described in our paper, shown with simplified graphics.

Instructions:

Left-click on the green character to start playing. The goal is to push the brown boxes on top of the red targets. Use the arrow keys to move the agent wisely.

Press 'N' to skip to a random new level, 'R' to restart the current level, or 'U' (if you want to cheat!) to undo your last move.

If you don't see the game to the left, you'll need to enable JavaScript in your browser.
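The rules described in the instructions above (move, push a box, undo, restart) can be sketched in a few lines. This is a minimal sketch of standard Sokoban mechanics, not the code behind the game on this page: the `Level` class, its method names, and the coordinate convention are all hypothetical, and levels are assumed to be enclosed by walls.

```python
from typing import FrozenSet, Iterable, List, Tuple

Pos = Tuple[int, int]  # (row, column)
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


class Level:
    """Hypothetical Sokoban level with move, undo and restart."""

    def __init__(self, walls: Iterable[Pos], boxes: Iterable[Pos],
                 targets: Iterable[Pos], agent: Pos) -> None:
        self.walls = set(walls)
        self.boxes = set(boxes)
        self.targets = set(targets)
        self.agent = agent
        self._start = (agent, frozenset(self.boxes))
        self._history: List[Tuple[Pos, FrozenSet[Pos]]] = []

    def move(self, direction: str) -> bool:
        """Try to move the agent one cell; return False if the move is blocked."""
        dr, dc = MOVES[direction]
        nxt = (self.agent[0] + dr, self.agent[1] + dc)
        if nxt in self.walls:
            return False
        snapshot = (self.agent, frozenset(self.boxes))
        if nxt in self.boxes:
            beyond = (nxt[0] + dr, nxt[1] + dc)
            # A box can only be pushed into a free cell, never pulled.
            if beyond in self.walls or beyond in self.boxes:
                return False
            self.boxes.remove(nxt)
            self.boxes.add(beyond)
        self._history.append(snapshot)
        self.agent = nxt
        return True

    def undo(self) -> None:
        """Revert the last successful move, if any (the 'U' key)."""
        if self._history:
            self.agent, boxes = self._history.pop()
            self.boxes = set(boxes)

    def restart(self) -> None:
        """Reset to the initial configuration (the 'R' key)."""
        self.agent, boxes = self._start
        self.boxes = set(boxes)
        self._history.clear()

    def solved(self) -> bool:
        """The level is solved when every box sits on a target."""
        return all(box in self.targets for box in self.boxes)


# A one-row corridor: agent, box, target, wall.
level = Level(walls=[(0, 3)], boxes=[(0, 1)], targets=[(0, 2)], agent=(0, 0))
level.move("right")  # pushes the box onto the target
```

Because boxes can be pushed but not pulled, a single careless push can make a level unsolvable, which is why the undo key counts as cheating.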

Full dataset:

https://github.com/deepmind/boxoban-levels