In this post, I will summarize and share my thoughts on the paper "Learning Latent Dynamics for Planning from Pixels" from Google [2]. The main contribution of this paper, in my mind, is learning a latent dynamics model directly from images and then selecting actions by planning entirely within that learned latent space.
Ok, so let's start off with what a latent space, or latent model, is. The type of problem we are dealing with is a partially observable Markov decision process, or POMDP for short. See the diagram in Figure 1 for the classical graphical model.
The idea is that the state of the world is represented by a sequence of random variables S1, S2, ..., and we claim that our robot's observations of the world are a function of the current world state, and importantly, only the current world state. No information about the history of the world is needed to generate that observation. Formally, we might say that O2 is conditionally independent of S1 given S2.
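To make this conditional-independence structure concrete, here is a toy rollout of a POMDP generative process. The dynamics, noise levels, and action here are invented purely for illustration; the key point is that `observe` takes only the current state as input, never the history.

```python
import numpy as np

rng = np.random.default_rng(0)

def transition(s, a):
    # S_{t+1} depends on the current state S_t and action A_t (plus noise).
    return 0.9 * s + a + 0.1 * rng.normal()

def observe(s):
    # O_t depends only on the current state S_t -- no history needed.
    return s + 0.05 * rng.normal()

s = 0.0
for t in range(3):
    o = observe(s)        # generated from s alone
    a = 0.1               # some arbitrary action
    s = transition(s, a)
```

If `observe` needed, say, S1 as well as S2 to produce O2, the graphical model in Figure 1 would have an extra arrow and the Markov structure would be broken.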
In this paper, the authors propose learning a set of functions that together describe this POMDP. They learn a mapping from S to O, which can be thought of as "rendering" the world state into a 2D image. They also learn a function that predicts reward from state, and a transition function. Finally, they learn a mapping back from a history of 2D images and actions to the world state.
Together, these functions form a complete model of the environment. Assuming this model is correct, the final task is to select actions that the model predicts will yield high reward. This is where the planner comes in. First, note that while it is technically possible to use value iteration to solve for the optimal policy using this model and a specific goal, doing so is intractable [1]. So instead the paper uses a sampling-based model predictive control technique called CEM (the Cross-Entropy Method), which I won't attempt to explain in full here. The paper doesn't focus on the planning side of things, but rather on the model learning. How can we learn all of these functions from data?
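For readers curious about the planner, here is a minimal sketch of CEM-style action selection. It assumes we already have a learned transition model `step(s, a)` and reward model `reward(s)`; all names and hyperparameters are illustrative, not the paper's code. CEM repeatedly samples candidate action sequences from a Gaussian, scores them under the model, and refits the Gaussian to the best ("elite") samples.

```python
import numpy as np

def cem_plan(s0, step, reward, horizon=8, pop=500, elites=50, iters=10, act_dim=1):
    """Return the first action of the best action sequence found by CEM."""
    mean = np.zeros((horizon, act_dim))
    std = np.ones((horizon, act_dim))
    for _ in range(iters):
        # Sample candidate action sequences from the current Gaussian.
        seqs = mean + std * np.random.randn(pop, horizon, act_dim)
        returns = np.zeros(pop)
        for i in range(pop):
            s = s0
            for t in range(horizon):
                s = step(s, seqs[i, t])   # roll the learned model forward
                returns[i] += reward(s)   # score with the learned reward model
        # Refit the Gaussian to the elite (highest-return) sequences.
        elite = seqs[np.argsort(returns)[-elites:]]
        mean, std = elite.mean(axis=0), elite.std(axis=0)
    return mean[0]  # model predictive control: execute only the first action
```

In the model-predictive-control style the paper uses, only the first action of the planned sequence is executed, and the whole optimization is rerun at the next timestep.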
The approach is rather simple. You begin with a small dataset of state, action, reward, next-state tuples (S1, U1, R1, S2), which you might collect with random actions. From this data, the authors train a set of deep neural networks, all chained together in a large computational graph, using the "Latent Overshooting" objective described in Equation (7) of the paper. Of course there are many details that really make this work, but for the rest of this post I want to instead pose some questions about this approach and discuss some potential follow-up studies.
1. I would have liked to see results in this paper showing how pure video-prediction methods like [3] perform on these tasks. My assumption is that they do not work, but it would be useful to see a comparison.
2. In this paper, they show results on some interesting tasks, such as using a finger-like robot to spin a rotating bar. I wonder what would happen if the bar were moved 5 cm from its position. Assuming all the training data was collected with just one bar in one position, the learned model might be unable to perform the task if the bar were moved or its size were changed.
3. This idea brings me to a deeper question: how can we, in general, learn models that are useful for a wide variety of tasks? My main complaint about traditional system-identification and modeling approaches in engineering is that they are specific to one robot or one system. I want an algorithm that allows a robot to essentially do the work of learning a model for us. Many people, including the authors of this paper, have shown that deep neural networks (DNNs) can be fit to complex dynamical systems, but what remains to be seen is the ability to transfer these models to new tasks. Why is this important? To me, it is essential because the amount of data currently needed to fit DNNs to complex systems means this can only be done in simulation, where hundreds of thousands of trajectories can be collected. If we want to operate on real robots, we need to do it with far fewer examples. Therefore, I would like to explore transferring learned models from simulation to reality, and from tasks where data is plentiful to those where it is precious.
Thanks for reading, check back on the 1st next month (March 1st, 2019) for the next post.