DMLab

Below on the left, we visualize 300-frame videos: 264 generated frames conditioned on 36 frames, without action conditioning.

On the right, we visualize the 3D scenes reconstructed from these videos. Auxiliary 3D data is used only for evaluation; generation uses only RGB frames.

TECO (ours)

Copy of samples_dl_maze_t_git.mp4

Copy of 3d_alone_t_git.mp4

Latent LDM

Copy of samples_dl_maze_vdm.mp4

Copy of 3d_alone_vdm.mp4

Perceiver-AR

Copy of samples_dl_maze_perceiver_ar.mp4

Copy of 3d_alone_perceiver_ar.mp4

CW-VAE

Copy of samples_dl_maze_cwvae.mp4

Copy of 3d_alone_cwvae.mp4

FitVid

Copy of samples_dl_maze_fitvid.mp4

Copy of 3d_alone_fitvid.mp4

Below on the left, we visualize 300-frame videos: 156 generated frames conditioned on 144 frames, with action conditioning.

On the right, we visualize the 3D mazes constructed from our video predictions: 264 generated frames conditioned on 36 frames, without action conditioning. Video prediction uses only RGB frames.

Copy of teco_dl_maze.mp4

Copy of teco_dmlab_3d_all.mp4
