Long-Horizon Visual Planning with Goal-Conditioned Hierarchical Predictors

Supplementary Material

Video Results

Pick&Place

Human 3.6 Million

9 Rooms

25 Rooms

Visual Control on Navigation

Pick&Place

Ground Truth

GCP-tree

GCP-sequential

Deep Voxel Flow (Liu'17)

CIGAN (Wang'19)

Reconstructions on the Pick&Place data. Each sequence contains 80 frames of 64x64 pixels.

GCP-tree

GCP-sequential

Prior samples from GCP on the Pick&Place data. Each column corresponds to different conditioning information. Each sequence contains 80 frames of 64x64 pixels.

Human 3.6 Million

Ground Truth

GCP-tree

GCP-sequential

Deep Voxel Flow (Liu'17)

CIGAN (Wang'19)

Reconstructions on the Human 3.6M dataset. Each sequence contains 500 frames of 64x64 pixels.

GCP-tree

GCP-sequential

Prior samples from GCP on the Human 3.6M dataset. Each column corresponds to different conditioning information. Each sequence contains 500 frames of 64x64 pixels.

9 Rooms

Ground Truth

GCP-tree

GCP-sequential

Deep Voxel Flow (Liu'17)

CIGAN (Wang'19)

Reconstructions on the 9-room data. Each sequence contains 100 frames of 32x32 pixels.

GCP-tree

GCP-sequential

Prior samples from GCP on the 9-room data. Each column corresponds to different conditioning information. Each sequence contains 100 frames of 32x32 pixels.

25 Rooms

Ground Truth

GCP-tree

GCP-sequential

Deep Voxel Flow (Liu'17)

CIGAN (Wang'19)

Reconstructions on the 25-room data. Each sequence contains 200 frames of 32x32 pixels.

GCP-tree

GCP-sequential

Prior samples from GCP on the 25-room data. Each column corresponds to different conditioning information. Each sequence contains 200 frames of 32x32 pixels.

Visual Control on Navigation

Comparison of visual planning and control approaches. Execution traces of Visual Foresight (left), GCP-tree with non-hierarchical planning (middle), and GCP-tree with hierarchical planning (right) on two 25-room navigation tasks. The start and goal observations are shown for all approaches; predicted subgoals are shown for hierarchical planning.