Hierarchical Foresight: Self-Supervised Learning of Long-Horizon Tasks via Visual Subgoal Generation

Suraj Nair, Chelsea Finn

Stanford University | Google Brain


Hierarchical Visual Foresight (HVF)

  • We propose hierarchical visual foresight (HVF), a novel subgoal generation technique for long-horizon planning in visuomotor tasks.
  • HVF combines video prediction with a generative model that produces subgoal images conditioned on the goal image.
  • The subgoal images are optimized directly so that each subsegment is easy to plan. As a result, HVF identifies semantically meaningful substeps in long-horizon tasks.
  • Planning with subgoals yields nearly a 200% improvement on long-horizon desk manipulation tasks.
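This page does not spell out HVF's generative model, planner, or how segment costs are combined. The sketch below is only a toy illustration of the idea of optimizing subgoals so that every subsegment is easy to plan. It makes several assumptions. States are 2D points rather than images. A hypothetical `segment_cost` uses a quadratic surrogate for the planning cost of one segment, where real visual foresight would use a video prediction model. The hardest segment, via `max`, scores the whole plan. A cross-entropy method (CEM) searches the latent space of a hypothetical goal-conditioned `decode` function. None of these names come from the HVF code.

```python
import random
import statistics
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def segment_cost(start: Point, end: Point) -> float:
    """Hypothetical stand-in for the cost of planning one segment.

    In visual foresight this would be the residual cost of the best action
    sequence found with a video prediction model. Here we simply assume the
    cost grows quadratically with distance, so long segments are hard to plan.
    """
    return (end[0] - start[0]) ** 2 + (end[1] - start[1]) ** 2


def plan_cost(start: Point, subgoals: Sequence[Point], goal: Point) -> float:
    """Score start -> subgoal_1 -> ... -> goal by its hardest segment (assumed max)."""
    waypoints = [start, *subgoals, goal]
    return max(segment_cost(a, b) for a, b in zip(waypoints, waypoints[1:]))


def decode(z: Sequence[float], goal: Point) -> List[Point]:
    """Hypothetical goal-conditioned generator: each latent pair is an offset from the goal."""
    return [(goal[0] + z[i], goal[1] + z[i + 1]) for i in range(0, len(z), 2)]


def optimize_subgoals(start: Point, goal: Point, num_subgoals: int = 1,
                      iters: int = 20, pop: int = 200, elite_frac: float = 0.1,
                      seed: int = 0) -> Tuple[List[Point], float]:
    """Cross-entropy method (CEM) search over the generator's latent space."""
    rng = random.Random(seed)
    dim = 2 * num_subgoals
    mean, std = [0.0] * dim, [2.0] * dim  # start from the generator's "prior"
    n_elite = max(2, int(pop * elite_frac))
    best_z, best_cost = None, float("inf")

    def cost(z: Sequence[float]) -> float:
        return plan_cost(start, decode(z, goal), goal)

    for _ in range(iters):
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(pop)]
        samples.sort(key=cost)  # lowest (easiest) plans first
        elites = samples[:n_elite]
        if cost(elites[0]) < best_cost:
            best_z, best_cost = elites[0], cost(elites[0])
        # Refit the Gaussian sampling distribution to the elite latents.
        mean = [statistics.mean(col) for col in zip(*elites)]
        std = [max(statistics.pstdev(col), 1e-3) for col in zip(*elites)]

    return decode(best_z, goal), best_cost


if __name__ == "__main__":
    start, goal = (0.0, 0.0), (4.0, 0.0)
    print("direct cost:", plan_cost(start, [], goal))  # 16.0
    subgoals, cost = optimize_subgoals(start, goal)
    print("subgoal:", subgoals, "cost:", cost)
```

Under the quadratic surrogate, the direct plan costs 16. A single subgoal near the midpoint (2, 0) lowers the hardest-segment cost to about 4. The `max` aggregation is a design choice that pushes the search toward subgoals where no segment is much harder than the others.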

Simulated Desk Manipulation

  • We demonstrate our method on long-horizon manipulation of objects and a sliding door on a desk.
  • HVF identifies semantically meaningful subtasks as subgoal images, such as grasping the door handle, or pushing the block and repositioning the arm.
  • Using subgoals significantly improves performance on all 4 tasks.

Additional Results

Maze Navigation

  • We also evaluate our method on visual maze navigation.
  • The agent (green block) must navigate through narrow gaps in the walls to reach the goal specified by the goal image.
  • The longer the task, the more HVF outperforms standard visual foresight and "Time Agnostic Prediction" (Jayaraman et al.).

BAIR Robot Dataset

  • We show qualitative results of HVF on real, cluttered images from the BAIR robot dataset (Ebert et al.).
  • HVF produces semantically meaningful subgoal images. For example, the arm reaches toward objects when the goal state requires them to be manipulated.