The video shows the agent's plan throughout the episode. The black dot is the agent, the green dots are the stored subgoals, and the yellow dot labeled with index 0 is the final goal. The agent plans hierarchically by recursively generating subgoals (s_1, s_2, s_3, ...). The recursion stops at the first reachable subgoal, which is then used as the worker's goal. The subgoals are shown below in two colors, red and blue. The red subgoals are the agent's direct predictions and can be prone to error. To correct for this, the goal buffer is used to snap each prediction to a nearby stored state, shown as a blue subgoal.
AntMaze Large
AntMaze Giant
HumanoidMaze Large
HumanoidMaze Giant
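
The planning loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `propose_subgoal`, `is_reachable`, `snap_to_buffer`, and `max_depth` are hypothetical, the Euclidean nearest-neighbor snap is an assumed stand-in for however the goal buffer actually selects a stored state, and the toy midpoint proposer and distance threshold exist only to make the example runnable.

```python
import numpy as np

def snap_to_buffer(pred, goal_buffer):
    """Return the stored state closest to a predicted subgoal.

    Assumption: closeness is plain Euclidean distance.
    """
    dists = np.linalg.norm(goal_buffer - pred, axis=1)
    return goal_buffer[int(np.argmin(dists))]

def plan_worker_goal(state, final_goal, propose_subgoal, is_reachable,
                     goal_buffer, max_depth=8):
    """Recursively generate subgoals until one is reachable from `state`.

    Returns the worker's goal and the chain of (red, blue) subgoal pairs,
    where red is the raw prediction and blue is the snapped stored state.
    """
    target = final_goal
    chain = []
    for _ in range(max_depth):  # hypothetical cap on recursion depth
        if is_reachable(state, target):
            break
        red = propose_subgoal(state, target)      # direct, possibly noisy prediction
        blue = snap_to_buffer(red, goal_buffer)   # nearby stored state
        chain.append((red, blue))
        target = blue
    return target, chain

# Toy 2D setup (all values are illustrative assumptions).
goal_buffer = np.array([[float(i), 0.0] for i in range(9)])
state = np.array([0.0, 0.0])
final_goal = np.array([8.0, 0.0])

def propose_subgoal(s, g):
    # Midpoint plus a fixed offset that mimics prediction error.
    return (s + g) / 2.0 + np.array([0.2, 0.3])

def is_reachable(s, g):
    # Assumed reachability test: within a fixed distance of the agent.
    return np.linalg.norm(g - s) <= 1.5

worker_goal, chain = plan_worker_goal(
    state, final_goal, propose_subgoal, is_reachable, goal_buffer
)
```

In this toy run each red prediction lies slightly off the line of stored states, and the snap pulls it back onto a stored state before the next level of recursion uses it as its target.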