Representation Learning for Hierarchical RL
Videos of successfully trained policies and their learned representations. The square at the upper left visualizes the learned representation space: green is the representation of the current observation, blue is the trajectory of representations, and magenta is the goal chosen by the higher-level policy.
Our method learns good representations, as well as successful hierarchical policies on top of those representations, regardless of whether the raw observation is position-based or image-based.
All tasks are navigation tasks. In the 'Block' tasks, the directive is to move the small red block to a target location (green arrow). In the other tasks, the directive is to move the agent (a simulated ant or point mass) to the target location.
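The interaction shown in the videos can be sketched as follows: the higher-level policy picks a goal in the learned representation space, and the lower-level policy is rewarded for steering the current observation's representation toward that goal. This is only an illustrative sketch; the representation map (`represent`), its random linear weights, and the observation dimensions are stand-in assumptions, not the trained networks used in the videos.

```python
import numpy as np

# Stand-in linear representation for illustration only; in the actual
# system this is a learned neural network mapping observations (possibly
# images) to a low-dimensional space.
rng = np.random.default_rng(0)
W = rng.standard_normal((2, 8))  # maps an 8-dim raw observation to 2 dims


def represent(obs):
    """Representation of the current observation (green in the videos)."""
    return W @ obs


def lower_level_reward(obs, goal_repr):
    """Reward the lower-level policy for moving the representation of the
    current observation toward the goal (magenta) chosen by the
    higher-level policy: negative distance in representation space."""
    return -np.linalg.norm(represent(obs) - goal_repr)


obs = rng.standard_normal(8)
goal = represent(rng.standard_normal(8))  # a goal the higher level might emit
reward = lower_level_reward(obs, goal)
```

As the lower-level policy acts, the sequence of `represent(obs)` values traces out the blue trajectory shown in the upper-left square of each video.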
Ant Block
Ant Block Maze
Ant Maze
Ant Maze (Images)
Point Maze
Point Maze (Images)
Ant Fall
Ant Fall (Images)
Ant Push
Ant Push (Images)