Experiments

Reaching Experiments

Experimental Data

In this experiment, we automatically generate a set of 15 demonstrations by moving the robot's end effector from one end to the other. For these demonstrations, we ensure that the arm moves mostly in the X-direction, with little or no movement in the Y-direction. We compare against the following baselines:

Baselines

We compare our learned costs to two baselines:

  • DEFAULT COST: The Euclidean distance between the keypoint trajectory and the goal keypoints (sketched below).

  • WEIGHTED ABBEEL COST: A cost parametrized like our weighted cost, with weights learned using the apprenticeship-learning IRL algorithm of Abbeel and Ng [1].
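
For concreteness, here is a minimal sketch of the two baseline costs, assuming keypoint trajectories are stored as arrays of shape (T, K, 2) holding the (x, y) locations of K keypoints over T time steps; the array layout and function names are illustrative, not the exact implementation:

    import numpy as np

    def default_cost(keypoint_traj, goal_keypoints):
        """Default cost: Euclidean distance between the predicted keypoint
        trajectory (T, K, 2) and the goal keypoints (K, 2), summed over time."""
        diffs = keypoint_traj - goal_keypoints[None]        # (T, K, 2)
        return np.linalg.norm(diffs, axis=-1).sum()

    def abbeel_weighted_cost(keypoint_traj, goal_keypoints, weights):
        """Weighted Abbeel baseline (illustrative): the same squared-distance
        features as our weighted cost, with the (K, 2) weight vector learned
        via apprenticeship-learning IRL [1]."""
        sq_diffs = (keypoint_traj - goal_keypoints[None]) ** 2
        return (weights[None] * sq_diffs).sum()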

Results

IRL training and test evaluation: (a) and (c) show the IRL loss during training of the parametrized costs from 1 and 10 demonstrations. (b) and (d) show the relative distance to the goal keypoints achieved at test time when optimizing the action trajectory with the learned costs and the baselines. Results are averaged across 3 seeds. Our learned costs perform much better than the baselines. Although the Weighted Abbeel Cost is parametrized similarly to our weighted cost and learns a similar distribution of cost-function parameters (see below), its weights are too small, and the learning rate used for the cost-function optimization would have to be tuned manually to match the performance of our learned costs.
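
The action optimization referenced above can be sketched as gradient descent on the action trajectory through a differentiable keypoint-prediction model; predict_keypoints, the optimizer choice, and the hyperparameters below are placeholders, not the exact setup:

    import torch

    def optimize_actions(cost_fn, predict_keypoints, init_actions, lr=1e-2, steps=100):
        """Illustrative inner loop: descend the (learned or baseline) cost with
        respect to the action trajectory. `predict_keypoints` is assumed to be a
        differentiable model mapping actions to a predicted keypoint trajectory."""
        actions = init_actions.detach().clone().requires_grad_(True)
        opt = torch.optim.SGD([actions], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            cost = cost_fn(predict_keypoints(actions))
            cost.backward()
            opt.step()
        return actions.detach()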

Learned Cost Parameters

We plot here the average of the learned cost parameters. Note that the parameters corresponding to the Y-dimension of the keypoints receive less weight. Additionally, the weights corresponding to keypoint 4, a stationary keypoint in the background of the image, are also learned to be low, de-emphasizing its contribution to the predicted trajectory, as they should.

Placing Experiments

Experimental Data

For this experiment, we acquire video demonstrations of a human placing a water bottle on a shelf. Using the keypoint detector, we obtain the keypoint trajectory corresponding to the demonstration video. We then use this demonstration to learn different cost functions that the robot can optimize to accomplish the demonstrated task.
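
As a rough illustration, the per-frame keypoint extraction can be written as follows; keypoint_detector stands in for the trained detector and is not its actual API:

    import numpy as np

    def extract_keypoint_trajectory(video_frames, keypoint_detector):
        """Run the keypoint detector on every frame of the demonstration video
        and stack the per-frame (K, 2) detections into a (T, K, 2) trajectory.
        `keypoint_detector` is a placeholder callable: image -> (K, 2) array."""
        return np.stack([keypoint_detector(frame) for frame in video_frames])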

We place the robot in two starting configurations, both very different from the starting pose of the human demonstration, to show our method's effectiveness.

Start Position 1

Start Position 2

Results for Start Position 1

We show the robot trajectories that result from executing actions optimized with the different costs. A table reporting the relative distance to the goal keypoints (shown as hollow circles over the shelf in the video) is also shown. Note that the costs with time-dependent parametrizations perform best (a sketch of such a parametrization follows the videos), while the default cost, having not learned from the demonstrations, collides with the shelf while trying to take the shortest path.

Weighted

Time Dependent

RBF Weighted

Default
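
For reference, a minimal sketch of a time-dependent weighted cost, assuming one weight per time step and keypoint dimension; the exact parametrization used in our experiments may differ:

    import numpy as np

    def time_dependent_weighted_cost(keypoint_traj, goal_keypoints, weights):
        """Illustrative time-dependent weighted cost: a separate (K, 2) weight
        vector at every time step lets the cost emphasize different keypoint
        dimensions at different phases of the motion.

        keypoint_traj: (T, K, 2), goal_keypoints: (K, 2), weights: (T, K, 2)
        """
        sq_diffs = (keypoint_traj - goal_keypoints[None]) ** 2
        return (weights * sq_diffs).sum()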

Results for Start Position 2

In our experiments with a different starting configuration for the robot, we note that all of the learned costs perform considerably better than the default cost.

Weighted

Time Dependent

RBF Weighted

Default

Learned Cost Parameters

We note that the parameters of the time-dependent cost functions (second and third plots) learn to emphasize the distance to the goal in the X-direction during the first half of the motion and in the Y-direction during the latter half.
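
One way to read the RBF-weighted variant is that the time-varying weights are expressed as a smooth combination of a few radial basis functions over normalized time; the basis count, bandwidth, and names below are assumptions made purely for illustration:

    import numpy as np

    def rbf_time_weights(rbf_coeffs, T, n_basis=5, bandwidth=0.1):
        """Illustrative RBF parametrization of time-varying weights: the
        (T, K, 2) weight sequence is generated from (n_basis, K, 2) learned
        coefficients via Gaussian basis functions in normalized time."""
        t = np.linspace(0.0, 1.0, T)[:, None]              # (T, 1)
        centers = np.linspace(0.0, 1.0, n_basis)[None]     # (1, n_basis)
        basis = np.exp(-(t - centers) ** 2 / (2 * bandwidth ** 2))  # (T, n_basis)
        return np.einsum('tb,bkd->tkd', basis, rbf_coeffs)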

References

[1] P. Abbeel and A. Y. Ng. Apprenticeship learning via inverse reinforcement learning. In Proceedings of the Twenty-First International Conference on Machine Learning (ICML), page 1, 2004.