# Learning Implicit Priors for Motion Optimization

## Julen Urain*, An T. Le*, Alexander Lambert*, Georgia Chalvatzaki, Byron Boots, Jan Peters

*Authors contributed equally.

**Abstract:** In this paper, we focus on the problem of integrating Energy-based Models (EBMs) as guiding priors for motion optimization. EBMs are a set of neural networks that can represent expressive probability density distributions in terms of a Gibbs distribution parameterized by a suitable energy function. Due to their implicit nature, they can easily be integrated as optimization factors or as initial sampling distributions in the motion optimization problem, making them good candidates for integrating data-driven priors into motion optimization. In this work, we present a set of required modeling and algorithmic choices to adapt EBMs to motion optimization. We investigate the benefit of including additional regularizers in the learning of the EBMs so that they can be used with gradient-based optimizers, and we present a set of EBM architectures to learn generalizable distributions for manipulation tasks. We present multiple cases in which the EBM can be integrated into motion optimization and evaluate the performance of learned EBMs as guiding priors in both simulated and real robot experiments.

**Supplementary Materials**

## I. Stochastic Gaussian Process Motion Planning

## II. Planar navigation experiments

We begin by testing our framework on a simple planar navigation problem, where a holonomic robot must reach a goal location while avoiding obstacles. We assume that the start, goal, and obstacle locations are known for a given planning problem, but the obstacle geometries (e.g., size, shape) are unknown. We want to learn an implicit distribution that captures the collision-free trajectories that lead to a particular goal. Here, we investigate two possible sources of empirical data: (1) sparsely populated point distributions in free space and (2) a set of expert trajectory distributions. The former can be seen as a stand-in for free-space measurements taken from a depth sensor, e.g., a lidar.
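As a reminder of what "implicit distribution" means here, the Gibbs form mentioned in the abstract can be written out explicitly. The symbols below are assumptions introduced for illustration and do not appear in the original text: $\tau$ is a trajectory, $g$ is the goal (context), and $E_\theta$ is the learned energy function.

$$
p_\theta(\tau \mid g) = \frac{\exp\big(-E_\theta(\tau, g)\big)}{Z_\theta(g)},
\qquad
Z_\theta(g) = \int \exp\big(-E_\theta(\tau, g)\big)\, d\tau
$$

Taking the negative logarithm gives $-\log p_\theta(\tau \mid g) = E_\theta(\tau, g) + \log Z_\theta(g)$. Since $Z_\theta(g)$ does not depend on $\tau$, a gradient-based optimizer only needs $\nabla_\tau E_\theta(\tau, g)$, which is why the energy can be used directly as a cost term without ever computing the normalizer.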

### 1. Data source: sparsely populated point-distributions

**Figure 1:** Demonstration data (red points) generated by uniform point-based sampling with rejection.
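The sampling scheme named in the caption can be sketched as follows. This is a minimal illustration, not the authors' data generator: the circular obstacle model, the square workspace bounds, and the function name `sample_free_space` are all assumptions made for this example. The ground-truth obstacle geometry is used only to generate the data, in line with the setup where the planner itself does not know it.

```python
import numpy as np

def sample_free_space(n_points, obstacles, low=-1.0, high=1.0, seed=0):
    """Uniformly sample 2D points, rejecting any that fall inside an obstacle.

    obstacles: rows of (center_x, center_y, radius); circles are an
    assumption of this sketch, not something stated in the source.
    Note: loops forever if the obstacles cover the whole workspace.
    """
    rng = np.random.default_rng(seed)
    obstacles = np.asarray(obstacles, dtype=float).reshape(-1, 3)
    accepted, n_accepted = [], 0
    while n_accepted < n_points:
        # Propose a batch of candidates uniformly over the square workspace.
        candidates = rng.uniform(low, high, size=(n_points, 2))
        # Distance from every candidate to every obstacle center: (n, k).
        dists = np.linalg.norm(
            candidates[:, None, :] - obstacles[None, :, :2], axis=-1
        )
        # Reject candidates lying inside (or on the boundary of) any circle.
        in_collision = (dists <= obstacles[None, :, 2]).any(axis=1)
        free = candidates[~in_collision]
        accepted.append(free)
        n_accepted += len(free)
    return np.concatenate(accepted)[:n_points]

# Hypothetical scene with two circular obstacles.
points = sample_free_space(200, [[0.0, 0.0, 0.3], [0.5, 0.5, 0.2]])
```

Lowering `n_points` gives the sparsely populated point sets described above; the accepted points would then serve as positive samples for training the EBM.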

**Figure 2:** Learned EBM energy landscape with the corresponding demonstration data. Blue lines are trajectories optimized using the learned EBM priors.

### 2. Data source: sets of expert trajectory distributions

**Figure 3:** Expert trajectory distributions for randomly sampled goal locations along the top and right sides, with the start location in the bottom-left corner.

**Figure 4:** Roll-outs of the learned behavioral cloning (BC) models, shown with the corresponding expert trajectory data. The BC models suffer from covariate shift.

**Figure 5:** Learned energy landscape with the corresponding expert trajectory data. Blue lines are trajectories optimized using the learned EBM priors.

## III. Planar manipulator experiments

This experiment evaluates how object-centric EBMs help solve the task of grasping an object and inserting it into a walled cubby. We consider a simulated 3-DoF planar manipulator with a 2D task space. The objective is to find a smooth joint-space trajectory that starts from the initial joint configuration, grasps the white cube, and inserts it into the cubby while avoiding collisions. Note that the white cube in our case is a visual mesh with no collision model.

Noisy (Left) and fair (Right) trajectory roll-outs of BC models learned from the same expert trajectories generated by a planner.

The first baseline employs Gaussian distributions as priors to define the object’s grasp and insertion pose potentials in the task space. These videos show failure cases in which the planner is unable to find a solution that inserts the object into the cubby.

The second baseline employs the same Gaussian priors as the first baseline. We additionally warm-start the optimization with initial trajectories sampled from the BC model above. However, the noisy initial trajectories sometimes cause the planner to fail.

**Figure 6:** Learned grasp and insert EBM priors. (Left) Free-space point sampling and (right) expert trajectory distributions, with multi-goal planning solutions depicted by blue trajectories. Discontinuities and implicit obstacle surfaces are well captured using sparse free-space point samples during training, whereas distributions of trajectory-based demonstrations can be captured neatly by the EBMs. The latter provides a convenient “guiding” energy function for a new context, improving samples optimized by StochGPMP.

When employing the learned implicit priors (the EBMs above), the trajectory optimizer finds successful solutions in most cases. Note that all experiment instances (baselines and our method) are run with the same budget of 100 optimization steps and a trajectory horizon of 64. The priors are combined with other handcrafted costs, such as smoothness, joint limits, and obstacle avoidance, to completely define the trajectory optimization objective.
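To make the composition of the objective concrete, the sketch below sums a prior energy with handcrafted smoothness and joint-limit costs and runs 100 steps over a horizon of 64. Several things here are assumptions for illustration only: the quadratic `prior_energy` is a stand-in for a learned EBM, the weights and learning rate are arbitrary, the obstacle-avoidance term is omitted for brevity, and the optimizer is plain gradient descent rather than StochGPMP.

```python
import numpy as np

HORIZON = 64   # trajectory horizon stated in the text
N_STEPS = 100  # number of optimization steps stated in the text

def prior_energy(traj, goal):
    # Stand-in for a learned EBM (assumption): pulls the final state to the goal.
    return 0.5 * np.sum((traj[-1] - goal) ** 2)

def prior_energy_grad(traj, goal):
    grad = np.zeros_like(traj)
    grad[-1] = traj[-1] - goal
    return grad

def smoothness_cost(traj):
    # Penalizes squared finite differences between consecutive waypoints.
    return 0.5 * np.sum(np.diff(traj, axis=0) ** 2)

def smoothness_grad(traj):
    d = np.diff(traj, axis=0)
    grad = np.zeros_like(traj)
    grad[:-1] -= d
    grad[1:] += d
    return grad

def joint_limit_cost(traj, q_min, q_max):
    # Quadratic penalty on any violation of the box limits.
    over = np.maximum(traj - q_max, 0.0)
    under = np.maximum(q_min - traj, 0.0)
    return 0.5 * np.sum(over ** 2 + under ** 2)

def joint_limit_grad(traj, q_min, q_max):
    return np.maximum(traj - q_max, 0.0) - np.maximum(q_min - traj, 0.0)

def total_cost(traj, goal, q_min, q_max, w):
    return (w["prior"] * prior_energy(traj, goal)
            + w["smooth"] * smoothness_cost(traj)
            + w["limit"] * joint_limit_cost(traj, q_min, q_max))

def optimize(q_start, goal, q_min, q_max, w, lr=0.05):
    # Initialize with a constant trajectory at the start configuration.
    traj = np.tile(np.asarray(q_start, dtype=float), (HORIZON, 1))
    for _ in range(N_STEPS):
        grad = (w["prior"] * prior_energy_grad(traj, goal)
                + w["smooth"] * smoothness_grad(traj)
                + w["limit"] * joint_limit_grad(traj, q_min, q_max))
        grad[0] = 0.0  # keep the initial configuration fixed
        traj -= lr * grad
    return traj

# Hypothetical 3-DoF problem instance.
weights = {"prior": 5.0, "smooth": 1.0, "limit": 10.0}
q0, q_goal = np.zeros(3), np.array([0.5, -0.3, 0.2])
solution = optimize(q0, q_goal, -np.pi * np.ones(3), np.pi * np.ones(3), weights)
```

Because every term is just an additive cost with a gradient, swapping the stand-in prior for a learned energy network only changes `prior_energy` and `prior_energy_grad`; the handcrafted terms stay untouched.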

## IV. Robot pouring amid obstacles

In this experiment, we evaluate the integration of learned EBM components for solving a pouring task amid obstacles. We consider a 7-DoF Kuka-LWR robot manipulator. Given the robot’s initial joint configuration, we aim to find a trajectory that moves the robot to pour into an arbitrarily positioned pot and then return to the initial joint configuration.

The videos show the real robot performing the pouring task amid obstacles, where each trial uses random poses for the green pot and the obstacles. The learned pouring EBM prior is combined with other handcrafted costs, such as smoothness, joint limits, and obstacle avoidance, to completely define the trajectory optimization objective. All experiment instances are shown in real time.