Active Hierarchical Exploration

with Stable Subgoal Representation Learning

1. Subgoal Representation Learning

The following videos show the subgoal representation learning process in the Ant Maze (Images) task. Each frame shows the 2D state embeddings of five trajectories in a U-shaped maze (red marks the start of each trajectory, blue the end). Notably, the representations are learned from low-resolution images.

Ant Maze task

(a) HESS

(b) without stability regularization
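To make the comparison above concrete, here is a minimal sketch (not the authors' released code) of the two ingredients it contrasts: a contrastive "slowness" objective that shapes the low-dimensional subgoal embedding, and a stability term that penalizes drift of the embedding away from an earlier snapshot. All function names, the hinge form of the loss, and the visitation weighting are illustrative assumptions.

```python
import numpy as np

def contrastive_loss(phi_s, phi_next, phi_rand, margin=1.0):
    """Illustrative embedding objective: pull temporally adjacent states
    together in latent space, push randomly sampled states at least
    `margin` apart (hinge term)."""
    pos = np.sum((phi_s - phi_next) ** 2, axis=1)
    neg = np.sum((phi_s - phi_rand) ** 2, axis=1)
    return float(np.mean(pos) + np.mean(np.maximum(0.0, margin - neg)))

def stability_penalty(phi_new, phi_snapshot, weights):
    """Illustrative stability regularizer: penalize how far each state's
    current embedding has drifted from a frozen snapshot, weighted
    (e.g. by visitation) so well-explored regions change slowly."""
    drift = np.sum((phi_new - phi_snapshot) ** 2, axis=1)
    return float(np.mean(weights * drift))
```

Without the stability term, the embedding is free to reshuffle between updates, which is what panel (b) illustrates; the penalty anchors already-learned structure while new regions are still being shaped.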

2. Hierarchical Policy Learning

Building on the stable subgoal representation learning, we develop an active hierarchical exploration strategy that seeks out novel and promising latent states. To the best of our knowledge, our method is the first to support concurrent learning of hierarchical policies and the subgoal representation in long-horizon, sparse-reward continuous-control tasks. In the videos below, the small red squares denote subgoals in the representation space.

AntMaze.mp4: Ant Maze
AntMazeImage.mp4: Ant Maze (Images)
AntPush.mp4: Ant Push
AntPushImage.mp4: Ant Push (Images)
AntFourRooms.mp4: Ant FourRooms
PointMaze.mp4: Point Maze
hurdle.mp4: Cheetah Hurdle
slope.mp4: Cheetah Ascending
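The "novel and promising" criterion in the exploration strategy can be sketched as a scoring rule over candidate latent states: a count-based novelty bonus plus a value ("promise") estimate, with the highest-scoring candidate commanded as the next subgoal. The function name, the inverse-square-root bonus, and the additive combination below are illustrative assumptions, not the paper's exact equations.

```python
import numpy as np

def select_subgoal(candidates, visit_counts, value_estimates, coef=1.0):
    """Illustrative active subgoal selection.

    candidates:      (N, d) array of candidate latent states
    visit_counts:    (N,) visitation counts in the latent space
    value_estimates: (N,) estimated "promise" of reaching each candidate
    """
    novelty = 1.0 / np.sqrt(visit_counts + 1.0)  # rarely visited => more novel
    scores = novelty + coef * value_estimates    # novel AND promising
    return candidates[int(np.argmax(scores))]
```

For example, among three candidates where one has never been visited, the selection favors the unvisited latent state unless another candidate's value estimate outweighs its novelty bonus.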