We generate 264 frames conditioned on 36 with action-conditioning
TECO (ours)
Latent FDM
Perceiver-AR
We generate 164 frames conditioned on 144 with action-conditioning