Accurately predicting the dynamics of robotic systems is crucial for model-based control. A common way to estimate dynamics is to model the one-step-ahead prediction and then use it to recursively propagate the predicted state distribution over long horizons. Unfortunately, this approach is known to compound even small prediction errors, making long-term predictions inaccurate. In this paper we propose a new parametrization for supervised learning on state-action data that predicts stably at longer horizons -- which we call a trajectory-based model. This trajectory-based model takes an initial state, a time index, and control parameters as inputs, and predicts the state at that time. Our results on simulated and experimental robotic tasks show accurate long-term predictions, improved sample efficiency, and the ability to predict task reward.
The new trajectory-based model (T) focuses model capacity on how trajectories evolve over time rather than on discrete steps. This is captured by training the model to predict the evolution from a starting state, s_n, subject to control parameters, θ_π, and a future time index, t, as in Eq. (1) and Fig. 1b. The trajectory-based formulation conveys substantial benefits in stable, uncertainty-aware estimates of trajectories and in the data-efficiency of model learning. The labelled dataset D grows by re-labelling every sub-sequence from each state s_i in a trajectory τ = {s_i}_{i=0}^{L} to the final state s_L. By training on all sub-trajectories, the model gains two strengths: 1) it can predict into the future from any state, not just those given as initial states by an environment, and 2) the number of training points grows proportionally to the square of the trajectory length: n_traj L² points from n_traj trajectories of length L, as shown below.

For prediction propagation, by passing in only a time index t from a current state, a planner can evaluate the future with one forward pass, alleviating the computational burden and the compounding multiplicative error associated with evaluating many steps of one-step models. By propagating time directly, the uncertainty of a probabilistic, trajectory-based model is proportional to the variation in dynamics in the training set (i.e., the model is more uncertain in areas of rapid movement, and uncertainty can shrink when motion converges). The uncertainty propagation is drawn in Fig. 2a and shown experimentally in Fig. 2b, and is compared to one-step models, whose uncertainty diverges as the states encountered leave the training distribution.
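The sub-trajectory re-labelling described above might be sketched as follows. This is a minimal illustration, not the paper's implementation; the function name `relabel_trajectories` and the flat input encoding (state, time index, and controller parameters concatenated into one vector) are assumptions for the sketch.

```python
import numpy as np

def relabel_trajectories(trajectories, controller_params):
    """Build a trajectory-based training set by re-labelling every
    sub-sequence of each trajectory: input (s_i, t, theta) -> target s_{i+t}.

    trajectories: list of arrays, each of shape (L+1, state_dim)
    controller_params: list of parameter vectors, one per trajectory
    """
    inputs, targets = [], []
    for tau, theta in zip(trajectories, controller_params):
        L = len(tau)
        for i in range(L):              # every state can serve as a start state
            for t in range(1, L - i):   # every future offset within the trajectory
                inputs.append(np.concatenate([tau[i], [t], theta]))
                targets.append(tau[i + t])
    return np.array(inputs), np.array(targets)
```

For n trajectories of length L this yields n·L(L-1)/2 input-target pairs, which is the O(n_traj L²) growth in labelled data noted above, versus the n·(L-1) pairs available to a one-step model.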
The trajectory-based models accrue labelled data at a rate proportional to the square of the trajectory length. Taking a slice of the plot below at a fixed training trajectory length gives a visualization of prediction accuracy versus the number of training points, where the trajectory-based model is superior.
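The single-forward-pass evaluation contrasted with a recursive one-step rollout can be sketched as below. Both models here are hypothetical closed-form stand-ins for learned models (a contracting linear map with additive prediction noise), chosen only to make the two propagation patterns concrete; they are not the paper's models.

```python
import numpy as np

rng = np.random.default_rng(0)

def one_step_model(s):
    """Hypothetical learned one-step dynamics: s_{n+1} = f(s_n) + noise."""
    return 0.99 * s + rng.normal(scale=0.01, size=s.shape)

def trajectory_model(s0, t):
    """Hypothetical trajectory-based model: predicts s_{n+t} directly
    from (s_n, t) in a single forward pass."""
    return 0.99 ** t * s0 + rng.normal(scale=0.01, size=s0.shape)

s0 = np.ones(2)
horizon = 100

# One-step propagation: `horizon` sequential model calls, each feeding its
# own (noisy) output back in, so per-step errors compound over the rollout.
s = s0
for _ in range(horizon):
    s = one_step_model(s)

# Trajectory-based propagation: one model call covers the same horizon, so
# there is no recursive feedback of prediction error.
s_direct = trajectory_model(s0, horizon)
```

A planner evaluating many candidate θ_π over long horizons therefore pays one forward pass per queried time index instead of one per simulated step.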