Our aim here is to learn a cost function of the form C_π(z_{t∈T}, z_goal), where the cost takes as input the states z_{t∈T} of the generated trajectory and the goal state z_goal. The cost C is parametrized by learnable parameters π, which are trained using bi-level optimization.
This simple cost function parametrization provides a constant learnable weight per (x, y) dimension of each of the K keypoints. This cost function thus has 2K parameters.
This cost extends the previous formulation by providing a learnable weight for each time step t. This adds flexibility and allows the cost to capture the time-dependent importance of specific keypoints. This cost function has 2TK parameters, which scale linearly with the horizon length.
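The two parametrizations above can be sketched as weighted squared distances between trajectory keypoints and the goal keypoints. The shapes and function names below are illustrative assumptions, not the paper's implementation; the only structural claim taken from the text is the parameter count (2K constant weights vs. 2TK time-dependent weights).

```python
import numpy as np

def constant_weight_cost(z, z_goal, w):
    """Constant-weight keypoint cost (hypothetical sketch).

    z:      (T, K, 2) trajectory of K 2-D keypoints over T time steps
    z_goal: (K, 2)    goal keypoint positions
    w:      (K, 2)    one learnable weight per (x, y) dimension of each
                      keypoint -> 2K parameters, broadcast over time
    """
    return np.sum(w * (z - z_goal) ** 2)

def time_varying_cost(z, z_goal, w_t):
    """Time-dependent keypoint cost (hypothetical sketch).

    w_t: (T, K, 2) -- a separate learnable weight for every time step,
         keypoint and dimension -> 2TK parameters
    """
    return np.sum(w_t * (z - z_goal) ** 2)
```

Broadcasting applies the (K, 2) weights of the first cost identically at every time step, while the second cost can up- or down-weight individual keypoints at specific points in the horizon.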
This cost, like the previous one, is also time-dependent, but it scales more easily to longer time horizons: it has 2JK parameters, with J < T. Kernels are uniformly spaced in time, and b is chosen to create some overlap between neighboring kernels.
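One way to realize this kernel-based parametrization is to mix J radial basis functions into per-time-step weights, so only 2JK weights are learned while the cost still varies over all T steps. The Gaussian kernel shape, the normalization, and all names below are assumptions for illustration; the text only specifies uniformly spaced kernels, an overlap parameter b, and the 2JK parameter count.

```python
import numpy as np

def kernel_weights(T, b, W):
    """Expand J kernel weights into per-time-step weights (sketch).

    W: (J, K, 2) learnable weights, one set per kernel -> 2JK parameters.
    Kernel centers are uniformly spaced over the horizon; b controls the
    kernel width and hence the overlap between neighboring kernels.
    """
    J = W.shape[0]
    centers = np.linspace(0, T - 1, J)                 # (J,)
    t = np.arange(T)[:, None]                          # (T, 1)
    phi = np.exp(-((t - centers) ** 2) / (2 * b**2))   # (T, J) Gaussian kernels
    phi /= phi.sum(axis=1, keepdims=True)              # normalize per time step
    return np.einsum('tj,jkd->tkd', phi, W)            # (T, K, 2)

def kernel_cost(z, z_goal, W, b):
    """Time-dependent keypoint cost with 2JK learnable parameters."""
    w_t = kernel_weights(z.shape[0], b, W)
    return np.sum(w_t * (z - z_goal) ** 2)
```

Because the normalized kernels sum to one at every time step, a constant W recovers the constant-weight cost, and J interpolates between the 2K and 2TK parametrizations.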
Following previous work [1, 2], we employ gradient-based bi-level optimization to learn the parameters π of the cost function. This involves the following interleaved optimization processes:
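The interleaved structure can be sketched as an inner loop that optimizes a trajectory against the learned cost C_π, and an outer loop that updates π so the resulting trajectory improves a task loss (here, matching a demonstration). This toy sketch treats the trajectory itself as the inner optimization variable and uses a finite-difference outer gradient for simplicity; the cited works instead differentiate through the inner optimization, and all names and shapes here are illustrative assumptions.

```python
import numpy as np

def inner_opt(z0, z_goal, pi, steps=50, lr=0.1):
    """Inner loop: minimize the learned cost sum(pi * (z - z_goal)^2) over z."""
    z = z0.copy()
    for _ in range(steps):
        grad = 2 * pi * (z - z_goal)  # gradient of the learned cost w.r.t. z
        z -= lr * grad
    return z

def outer_step(z0, z_goal, z_demo, pi, outer_lr=0.01, eps=1e-4):
    """Outer loop: update pi so the inner solution matches a demonstration.

    Uses a finite-difference estimate of d(task loss)/d(pi) to keep the
    sketch self-contained (the papers use analytic gradients instead).
    """
    base = np.sum((inner_opt(z0, z_goal, pi) - z_demo) ** 2)
    grad = np.zeros_like(pi)
    for i in np.ndindex(pi.shape):
        p = pi.copy()
        p[i] += eps
        perturbed = np.sum((inner_opt(z0, z_goal, p) - z_demo) ** 2)
        grad[i] = (perturbed - base) / eps
    return pi - outer_lr * grad
```

Interleaving these two loops is what makes the optimization bi-level: the outer task loss is evaluated only at the trajectory produced by the inner cost minimization.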
[1] S. Bechtle, A. Molchanov, Y. Chebotar, E. Grefenstette, L. Righetti, G. Sukhatme, and F. Meier. Meta-learning via learned loss. arXiv preprint arXiv:1906.05374, 2019.
[2] C. Finn, P. Abbeel, and S. Levine. Model-agnostic meta-learning for fast adaptation of deep networks. arXiv preprint arXiv:1703.03400, 2017.