Planning in Latent Space

In this post, I will discuss the challenges of combining motion planning techniques with learned latent-space dynamics models.

Introduction

A number of papers have proposed latent-space dynamics models [1, 2, 3]. Usually, the robot makes observations in the form of images, a neural network maps those observations into a latent space, and dynamics are then learned in that latent space from example trajectories in the environment. How then do we use these models? We can view this as a planning problem, and for the manipulation tasks I am interested in, a motion planning problem.

Let's assume our model learning method gives us the following three functions (sketched in code after the list):

1. Forward model: predicts the next latent state from the current latent state and a control action

2. Steering function: proposes controls that drive the system from one latent state toward another

3. Heuristic (cost-to-goal): estimates the remaining cost from a latent state to the goal
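
To pin down what I mean, here is a minimal sketch of the three signatures in Python. The names and the use of NumPy arrays for latent states and controls are my own assumptions, not taken from any particular paper:

```python
import numpy as np

def forward_model(z: np.ndarray, u: np.ndarray) -> np.ndarray:
    """One step of learned dynamics: predict the next latent state
    from the current latent state z and control action u."""
    ...

def steer(z_start: np.ndarray, z_goal: np.ndarray) -> np.ndarray:
    """Propose a control that (approximately) drives the system
    from z_start toward z_goal (a learned local BVP solver)."""
    ...

def heuristic(z: np.ndarray, z_goal: np.ndarray) -> float:
    """Estimate the remaining cost-to-goal from latent state z."""
    ...
```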

All of these models are learned, so we are not going to assume they are perfectly accurate. But surely that's not a big problem... right?

Is this a hard planning problem?

As I see it, the main challenges are:

  1. Modeling errors can be exacerbated as they are rolled forward
  2. The learned dynamics are under-actuated and non-holonomic, and in general no steering function or BVP solver exists
  3. Setting thresholds and step sizes in latent space is not obvious

One issue with most learned dynamics models is that they are deterministic, or one-step, or both. Of course, the most successful approaches have used stochastic, probabilistic, and/or multi-step models, but how to learn such models is in general still an open question. If your model is deterministic and your one-step prediction is off, and you then take that prediction to be the true next state and predict another step into the future, your error is likely to be even worse. In this way, small modeling errors are exacerbated as planning rolls them forward.
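
To make this concrete, here is a toy illustration of compounding error: a "true" linear system and a learned model whose dynamics matrix is off by about 1% per entry. Both the system and the noise level are made up for illustration; the point is only that the open-loop prediction error grows as the rollout gets longer.

```python
import numpy as np

rng = np.random.default_rng(0)

# "True" dynamics and a learned model whose matrix is off by ~1%.
A_true = np.array([[1.0, 0.1],
                   [0.0, 1.0]])
A_learned = A_true + 0.01 * rng.standard_normal((2, 2))

z_true = np.array([1.0, 0.0])
z_pred = z_true.copy()
for t in range(1, 21):
    z_true = A_true @ z_true      # what actually happens
    z_pred = A_learned @ z_pred   # open-loop rollout of the learned model
    if t % 5 == 0:
        print(f"step {t:2d}: error = {np.linalg.norm(z_pred - z_true):.4f}")
```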

The second issue is less obvious, and to me it is the most interesting part. Imagine a robot manipulating a piece of cloth, and let's say we (arbitrarily) pick a 16-dimensional latent representation for the cloth. The robot's two grippers together probably have a combined 12 degrees of freedom. See why this might be a problem? If we are at some point in latent space and our heuristic tells us to move in some direction to decrease cost, then because we have fewer controls than state dimensions, we may not be able to move in that direction at all.
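
One rough way to detect this on a learned model is to linearize it: take the Jacobian of the forward model with respect to the controls and check its rank. With 12 controls and a 16-dimensional latent state the rank is at most 12, so at least 4 latent directions are instantaneously unreachable. Here is a finite-difference sketch, assuming a differentiable `forward_model` with the signature above; the function name and step size are my own choices:

```python
import numpy as np

def control_jacobian(forward_model, z, u, eps=1e-4):
    """Finite-difference Jacobian of the next latent state with
    respect to the controls, i.e. d z' / d u evaluated at (z, u)."""
    z_next = forward_model(z, u)
    J = np.zeros((z_next.size, u.size))
    for i in range(u.size):
        du = np.zeros_like(u)
        du[i] = eps
        J[:, i] = (forward_model(z, u + du) - z_next) / eps
    return J

# If rank(J) < z.size, there are latent directions no control can
# produce in one step. With 16 latent dims and 12 controls:
# J = control_jacobian(forward_model, z, u)   # J is 16 x 12
# print(np.linalg.matrix_rank(J))             # at most 12 < 16
```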

Here's a simple example with a two-link rope. How can we move the end of the rope (the green dot) to the goal? Because the rope is soft, you can't directly move that end left or right at all. The gray shaded region illustrates the directions in which the green point can move when you pull on the other end of the rope. This means the system is underactuated.

We can also say that the system is non-holonomic, in this case for the same reason that it is underactuated, although the two properties do not coincide in general. The takeaway here is that latent models are likely to be underactuated and may also be non-holonomic.

Non-holonomic motion planning is its own sub-field with plenty of active research, and I suspect this difficulty is one of the main reasons I have not seen any papers that use traditional motion planning techniques with learned latent-space models.

Finally, there is a third issue. It may not be as deep and theoretical as the others, but in practice it is very annoying. Anyone who is familiar with any kind of planning algorithm will know that you always need to set some parameters like step sizes and time steps. For discrete search, you might choose a step size in position (5 cm) or a bound on your environment (10 by 10 meters). When you are planning in a latent space, however, it is unclear how to set these values. The naive approach would be to take two points in the observation space which you believe to be "1 unit" apart, map them into the latent space, and look at the Euclidean distance between the resulting vectors. But does this actually make sense? Why Euclidean? How do you pick which specific observations to use? I don't have an answer to this.
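
For concreteness, here is what that naive calibration would look like in code. The `encoder` and the two observations are placeholders, and nothing here resolves the question of whether Euclidean distance in latent space is meaningful:

```python
import numpy as np

def calibrate_step_size(encoder, obs_a, obs_b):
    """Naive latent step-size calibration: encode two observations
    believed to be "1 unit" apart in the world and return their
    Euclidean distance in latent space."""
    z_a = encoder(obs_a)
    z_b = encoder(obs_b)
    return float(np.linalg.norm(z_a - z_b))

# step = calibrate_step_size(encoder, obs_a, obs_b)
# Use `step` as the planner's extension distance, with the caveat
# that both the choice of observation pair and the Euclidean metric
# are arbitrary.
```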

Citations & Resources

[1] Nagabandi, A., Clavera, I., Liu, S., Fearing, R. S., Abbeel, P., Levine, S., & Finn, C. (2018). Learning to Adapt in Dynamic, Real-World Environments Through Meta-Reinforcement Learning. Retrieved from http://arxiv.org/abs/1803.11347

[2] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., & Davidson, J. (2018). Learning Latent Dynamics for Planning from Pixels. Retrieved from http://arxiv.org/abs/1811.04551

[3] Srinivas, A., Jabri, A., Abbeel, P., Levine, S., & Finn, C. (2018). Universal Planning Networks. Retrieved from http://arxiv.org/abs/1804.00645

Thanks for reading. Check back on the 1st of next month (April 1st, 2019) for the next post.