An investigation of model-free planning
Arthur Guez*, Mehdi Mirza*, Karol Gregor*, Rishabh Kabra*, Sébastien Racanière, Théophane Weber, David Raposo, Adam Santoro, Laurent Orseau, Tom Eccles, Greg Wayne, David Silver, Timothy Lillicrap
* Equal contribution
DeepMind, London, UK
Solving complex planning domains
Ms Pacman (Atari)
Boxworld
Sokoban
Asteroids (Atari)
DRC (D,N) architecture
`D` denotes the number of ConvLSTM modules stacked together. `N` denotes the number of times the network is repeated at each agent time-step.
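The DRC(D,N) naming can be made concrete with a small sketch of the core's control flow. This is a simplification under stated assumptions, not the paper's code. Each module is assumed to receive the encoded observation `x` and the output of the module below it. The bottom module is assumed to receive the top module's output from the previous tick. A scalar toy cell (`ToyLSTMCell`, hypothetical) stands in for a real ConvLSTM, and the encoder, pooling and skip connections are omitted.

```python
import math
from dataclasses import dataclass


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


@dataclass
class ToyLSTMCell:
    """Hypothetical scalar stand-in for one ConvLSTM module (not the real cell)."""
    w: float

    def __call__(self, x: float, h_below: float, h: float, c: float):
        z = self.w * (x + h_below + h)
        i, f, o = sigmoid(z), sigmoid(z + 1.0), sigmoid(z - 1.0)  # toy gates
        c_new = f * c + i * math.tanh(z)
        h_new = o * math.tanh(c_new)
        return h_new, c_new


class DRC:
    """Sketch of a DRC(D, N) core: D stacked modules, ticked N times per time-step."""

    def __init__(self, depth: int, repeats: int):
        # D distinct modules; the same weights are reused on every one of the N ticks.
        self.cells = [ToyLSTMCell(w=0.5 + 0.1 * d) for d in range(depth)]
        self.repeats = repeats
        self.cell_calls = 0  # counts module updates to expose the D * N cost

    def initial_state(self):
        return [(0.0, 0.0) for _ in self.cells]  # one (h, c) pair per module

    def step(self, x: float, state):
        state = list(state)
        for _ in range(self.repeats):
            # Assumed wiring: the bottom module sees the top module's last output.
            h_below = state[-1][0]
            for d, cell in enumerate(self.cells):
                h, c = cell(x, h_below, *state[d])
                state[d] = (h, c)
                h_below = h  # feeds the next module up the stack
                self.cell_calls += 1
        return state[-1][0], state  # top module's output is the core's output


drc = DRC(depth=3, repeats=3)
out, state = drc.step(1.0, drc.initial_state())
print(drc.cell_calls)
```

With `D = 3` and `N = 3`, one agent time-step performs 9 module updates while the parameter count stays that of 3 modules. In this sketch, a larger `N` buys extra computation per step without adding weights.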
This simple model-free agent exhibits strong performance on complex planning domains (see videos above) and several behavioral characteristics of planning.
What are the behavioral characteristics of a good planner?
(1) It should generalize to novel scenarios.
(2) It should be able to learn from limited amounts of data.
(3) It should make effective use of additional computation time.
Think you can plan? Try Boxoban
The playable levels are a selection of the 3332 hard levels described in our paper, shown with simplified graphics.
Instructions:
Left-click on the green character to start playing. The goal is to push the brown boxes on top of the red targets. Use the arrow keys to move the agent wisely.
Press 'N' to skip to a random new level, 'R' to restart the current level, or 'U' (if you want to cheat!) to undo your last move.
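The push rule and the undo key described above can be sketched in a few lines. This is a hypothetical sketch, not the code behind the playable demo. It assumes the standard Sokoban ASCII notation: `#` wall, `.` target, `$` box, `*` box on target, `@` player, `+` player on target. The class name `Sokoban` and its methods are illustrative only.

```python
from dataclasses import dataclass, field

# Assumed direction names; the demo itself uses the arrow keys.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


@dataclass
class Sokoban:
    walls: set
    targets: set
    boxes: set
    player: tuple
    history: list = field(default_factory=list)  # stack of prior states for undo

    @classmethod
    def parse(cls, text: str) -> "Sokoban":
        """Parse a level in the assumed standard Sokoban ASCII notation."""
        walls, targets, boxes, player = set(), set(), set(), None
        for r, line in enumerate(text.splitlines()):
            for c, ch in enumerate(line):
                if ch == "#":
                    walls.add((r, c))
                if ch in ".*+":
                    targets.add((r, c))
                if ch in "$*":
                    boxes.add((r, c))
                if ch in "@+":
                    player = (r, c)
        return cls(walls, targets, boxes, player)

    def move(self, direction: str) -> bool:
        """Step the player, pushing a box if one is in the way; False if blocked."""
        dr, dc = MOVES[direction]
        nxt = (self.player[0] + dr, self.player[1] + dc)
        if nxt in self.walls:
            return False
        new_boxes = self.boxes
        if nxt in self.boxes:
            beyond = (nxt[0] + dr, nxt[1] + dc)
            # A box can only be pushed onto free floor or a free target.
            if beyond in self.walls or beyond in self.boxes:
                return False
            new_boxes = (self.boxes - {nxt}) | {beyond}
        self.history.append((self.player, frozenset(self.boxes)))
        self.player, self.boxes = nxt, set(new_boxes)
        return True

    def undo(self) -> None:
        """Revert the last successful move, like the 'U' key."""
        if self.history:
            self.player, boxes = self.history.pop()
            self.boxes = set(boxes)

    def solved(self) -> bool:
        return self.boxes <= self.targets  # every box sits on a target


game = Sokoban.parse("#####\n#@$.#\n#####")
game.move("right")
print(game.solved())
```

Restarting a level ('R') amounts to parsing the level text again; pushes are irreversible in the game itself, which is why undo has to store the previous state rather than invert the move.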
Full dataset: