Representation Learning for Hierarchical RL
Videos of successfully trained policies and their learned representations. The square at the upper left visualizes the learned representation space: green is the representation of the current observation, blue is the trajectory of representations, and magenta is the goal chosen by the higher-level policy.
Our method learns good representations, as well as successful hierarchical policies on top of those representations, regardless of whether the raw observation is position-based or image-based.
All tasks are navigation tasks. In the 'Block' tasks, the directive is to move the small red block to a target location (green arrow). In the other tasks, the directive is to move the agent (a simulated ant or point mass) to the target location.
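The interaction shown in the videos can be sketched as follows: the higher-level policy picks a goal in the learned representation space, and the lower-level policy is rewarded for steering the current observation's representation toward that goal. This is only an illustrative sketch; the representation map (`represent`), its random linear weights, and the observation dimensions are stand-in assumptions, not the trained networks used in the videos.

```python
import numpy as np

# Stand-in linear representation for illustration only; in the actual
# system this is a learned neural network mapping observations (possibly
# images) to a low-dimensional space.
rng = np.random.default_rng(0)
W = rng.standard_normal((2, 8))  # maps an 8-dim raw observation to 2 dims


def represent(obs):
    """Representation of the current observation (green in the videos)."""
    return W @ obs


def lower_level_reward(obs, goal_repr):
    """Reward the lower-level policy for moving the representation of the
    current observation toward the goal (magenta) chosen by the
    higher-level policy: negative distance in representation space."""
    return -np.linalg.norm(represent(obs) - goal_repr)


obs = rng.standard_normal(8)
goal = represent(rng.standard_normal(8))  # a goal the higher level might emit
reward = lower_level_reward(obs, goal)
```

As the lower-level policy acts, the sequence of `represent(obs)` values traces out the blue trajectory shown in the upper-left square of each video.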
Ant Block
Ant Block Maze
Ant Maze
Ant Maze (Images)
Point Maze
Point Maze (Images)
Ant Fall
Ant Fall (Images)
Ant Push
Ant Push (Images)