Marco Fraccaro*¹, Simon Kamronn*¹, Ulrich Paquet², Ole Winther¹
¹Technical University of Denmark, ²DeepMind
* indicates equal contribution
The Kalman variational auto-encoder (KVAE) is a framework for unsupervised learning of sequential data that disentangles two latent representations: an object representation, coming from a recognition model, and a latent state describing its dynamics. The recognition model is a convolutional variational auto-encoder, and the latent dynamics are modeled by a linear Gaussian state space model (LGSSM).
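To make the two-level structure concrete, here is a minimal numpy sketch of the generative side of such a model. All dimensions and parameter values are illustrative assumptions, not the learned parameters from the paper: a latent state z_t evolves under linear Gaussian dynamics, is emitted into the low-dimensional VAE latent a_t, and a convolutional decoder (omitted here) would map each a_t to a video frame.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: LGSSM state z_t and VAE latent a_t.
dim_z, dim_a, T = 4, 2, 10

# Illustrative LGSSM parameters (in the KVAE these are learned).
A = 0.9 * np.eye(dim_z)               # state transition matrix
C = rng.normal(size=(dim_a, dim_z))   # emission into the VAE latent space
Q = 0.01 * np.eye(dim_z)              # transition noise covariance
R = 0.01 * np.eye(dim_a)              # emission noise covariance

# Generate a latent trajectory: z_t = A z_{t-1} + w_t,  a_t = C z_t + v_t.
z = np.zeros((T, dim_z))
a = np.zeros((T, dim_a))
z[0] = rng.multivariate_normal(np.zeros(dim_z), np.eye(dim_z))
a[0] = C @ z[0] + rng.multivariate_normal(np.zeros(dim_a), R)
for t in range(1, T):
    z[t] = A @ z[t - 1] + rng.multivariate_normal(np.zeros(dim_z), Q)
    a[t] = C @ z[t] + rng.multivariate_normal(np.zeros(dim_a), R)

# A VAE decoder (not shown) would then render each a_t as a frame x_t.
print(a.shape)
```

The key design point is the separation of concerns: all temporal structure lives in the cheap, low-dimensional (z_t, a_t) chain, while appearance is handled per frame by the auto-encoder.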
As shown in the paper, the KVAE can be trained end-to-end and is able to learn both a recognition and a dynamics model from the videos. The model can be used to generate new sequences, as well as to impute missing data, without the need to generate high-dimensional frames at each time step.
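The reason imputation avoids frame generation is that missing time steps can be handled entirely inside the LGSSM: a Kalman filter simply skips the update step when no observation is available, so the prediction carries the state forward. The sketch below illustrates this with a plain Kalman filter over the pseudo-observations a_t; the function name and all parameter values are assumptions for illustration (the paper uses smoothing and learned parameters).

```python
import numpy as np

def kalman_filter_missing(a, A, C, Q, R, observed):
    """Kalman filter over pseudo-observations a_t. When observed[t] is
    False, the measurement update is skipped and the predicted state
    stands in for the missing step -- no frame is ever decoded."""
    T, _ = a.shape
    dim_z = A.shape[0]
    mu = np.zeros(dim_z)
    P = np.eye(dim_z)
    filtered = np.zeros((T, dim_z))
    for t in range(T):
        if t > 0:                      # predict: propagate through dynamics
            mu = A @ mu
            P = A @ P @ A.T + Q
        if observed[t]:                # update only where data exists
            S = C @ P @ C.T + R
            K = P @ C.T @ np.linalg.inv(S)
            mu = mu + K @ (a[t] - C @ mu)
            P = (np.eye(dim_z) - K @ C) @ P
        filtered[t] = mu
    return filtered

# Toy usage with hypothetical parameters; ~30% of steps are dropped.
rng = np.random.default_rng(0)
dim_z, dim_a, T = 4, 2, 10
A = 0.9 * np.eye(dim_z)
C = rng.normal(size=(dim_a, dim_z))
Q, R = 0.01 * np.eye(dim_z), 0.01 * np.eye(dim_a)
a = rng.normal(size=(T, dim_a))
observed = rng.random(T) > 0.3
zf = kalman_filter_missing(a, A, C, Q, R, observed)
print(zf.shape)
```

Frames for the missing steps are then reconstructed only once at the end, by decoding the imputed latents.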
The dynamics parameter network learns the appropriate mixture of multiple linear dynamics at each step by observing only the low-dimensional latent representation.
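A minimal sketch of this mixing idea: a bank of K base transition matrices is combined with softmax weights produced from the previous low-dimensional latent. The single linear layer standing in for the network, and all values, are assumptions for illustration (the paper's parameter network is a learned neural network conditioned on the history of latents).

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(1)
K, dim_z, dim_a = 3, 4, 2

# Bank of K base dynamics matrices A^(k) (illustrative random values).
A_bank = rng.normal(size=(K, dim_z, dim_z))
# Stand-in for the dynamics parameter network: one linear layer.
W = rng.normal(size=(K, dim_a))

# Mixture weights from the previous VAE latent a_{t-1} only --
# the network never sees the high-dimensional frames.
a_prev = rng.normal(size=dim_a)
alpha = softmax(W @ a_prev)                 # weights sum to 1
A_t = np.tensordot(alpha, A_bank, axes=1)   # effective transition at step t

print(A_t.shape)
```

The effective transition sum_k alpha_k A^(k) varies over time, letting a globally nonlinear dynamics be represented as a smoothly interpolated set of linear ones.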
We demonstrate the model’s ability to separately learn a recognition and dynamics model from video, and use it to impute missing data and perform long-term generation in four different environments.
All videos and data used for training are available here.
Imputation where 30% of the frames are dropped at random. The red ball is the ground truth.
Long-term generation with 4 frame initialization.
Imputation where 30% of the frames are dropped at random. The red ball is the ground truth.
Long-term generation with 4 frame initialization.
Imputation where 30% of the frames are dropped at random. The red ball is the ground truth.
Long-term generation with 4 frame initialization.
Imputation where 30% of the frames are dropped at random. The red ball is the ground truth.
Long-term generation with 4 frame initialization.