We generate 264 frames conditioned on 36 with action-conditioning
TECO (ours)
 Copy of samples_habitat_t_git.mp4
Latent FDM
 Copy of samples_habitat_vdm.mp4
Perceiver-AR
 Copy of samples_habitat_perceiver_ar.mp4
We generate 164 frames conditioned on 144 with action-conditioning
 Copy of teco_habitat.mp4