Convolutional State Space Models for
Long-Range Spatiotemporal Modeling

Jimmy T.H. Smith, Shalini De Mello, Jan Kautz, Scott W. Linderman, Wonmin Byeon

Below are randomly sampled trajectories for each model on each dataset. We show 16 samples per model and compare them to the ground truth. The green box marks frames from the conditioning context window (identical to the ground truth), while the red box marks generated frames. For the best comparison, we recommend setting the video players to 1080p resolution.

Moving-MNIST Samples

1200 frames generated conditioned on 100

mnist.mp4

DMLab Samples

156 frames generated conditioned on 144 (action-conditioned)

dmlab_action_big.mp4

264 frames generated conditioned on 36 (no action-conditioning)

dmlab_big.mp4

Minecraft Samples

156 frames generated conditioned on 144 (action-conditioned)

minecraft_action.mp4

264 frames generated conditioned on 36 (action-conditioned)

minecraft.mp4

Habitat Samples

156 frames generated conditioned on 144 (action-conditioned)

habitat_action.mp4

264 frames generated conditioned on 36 (no action-conditioning)

habitat.mp4

