CS 188 Project

Stanley Wei & Ryu Adams

Our Project

Project Goal: We implemented a DMP-based algorithm and neural network to solve the Robosuite Nut Assembly Task using expert demonstrations, then compared the training time and accuracy of each of the methods.

To do this, we implemented two separate methods for learning a policy from data:

DMPs: We implemented a Cartesian-space variant of dynamic movement primitives that is capable of generating both position and orientation trajectories.
Neural network: We trained a neural network that directly maps sensor inputs to control outputs.

Success Criteria

Criteria for a successful policy: Trained policy can successfully & consistently complete the Square Assembly task, including for targets with varying orientations.

Goal: Develop successful policies (i.e. meeting the above criteria) using separate DMP and deep learning approaches, and compare the results.

Metrics for comparison: success rate, average time to complete the task
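
The two metrics above can be computed from per-episode evaluation logs. The following is a minimal sketch, not our evaluation script: the `EpisodeResult` record and the `summarize` helper are hypothetical names, and averaging completion time over successful episodes only is an assumption.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class EpisodeResult:
    """Hypothetical per-episode record; field names are illustrative."""
    success: bool      # True if the task was completed in this episode
    duration_s: float  # time until completion (or until timeout on failure)


def summarize(results: Sequence[EpisodeResult]) -> Tuple[float, float]:
    """Return (success rate, mean completion time over successful episodes)."""
    if not results:
        raise ValueError("no episodes to summarize")
    successes = [r for r in results if r.success]
    rate = len(successes) / len(results)
    # Assumption: failed episodes are excluded from the time average.
    if successes:
        mean_time = sum(r.duration_s for r in successes) / len(successes)
    else:
        mean_time = float("nan")
    return rate, mean_time


# Made-up numbers, for illustration only.
example = [
    EpisodeResult(True, 10.0),
    EpisodeResult(False, 30.0),
    EpisodeResult(True, 20.0),
    EpisodeResult(True, 12.0),
]
rate, mean_time = summarize(example)
```

Excluding failures from the time average keeps timeouts from inflating the figure, at the cost of making the two metrics less independent.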

Pictured: The Robosuite Nut Assembly task


Approach 1: DMPs

Our first approach used the rotation-matrix formulation of Cartesian-space DMPs presented in [1] to learn from demonstrated position & orientation trajectories. Six independent PID controllers, one each for linear x/y/z and roll/pitch/yaw, were then used to follow the DMP-generated trajectories.
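
To illustrate the position part of this approach, here is a minimal sketch of a one-dimensional discrete DMP together with a basic PID controller. It is not our implementation: the class names, gains, number of basis functions, and the synthetic minimum-jerk demonstration are all assumptions, and the rotation-matrix orientation DMP from [1] is not shown.

```python
import numpy as np


class DMP1D:
    """One-dimensional discrete DMP (position only); all gains are illustrative."""

    def __init__(self, n_basis=20, alpha_z=25.0, alpha_x=3.0):
        self.alpha_z = alpha_z
        self.beta_z = alpha_z / 4.0  # critically damped spring-damper
        self.alpha_x = alpha_x
        # Basis centers evenly spaced in time, expressed in the phase variable x.
        self.c = np.exp(-alpha_x * np.linspace(0.0, 1.0, n_basis))
        self.h = 1.0 / np.gradient(self.c) ** 2  # widths from center spacing
        self.w = np.zeros(n_basis)

    def _features(self, x):
        x = np.atleast_1d(x)[:, None]                # (T, 1)
        psi = np.exp(-self.h * (x - self.c) ** 2)    # (T, n_basis)
        return psi * x / psi.sum(axis=1, keepdims=True)

    def fit(self, y_demo, dt):
        """Fit forcing-term weights to one demonstration by least squares."""
        y = np.asarray(y_demo, dtype=float)
        tau = (len(y) - 1) * dt
        yd = np.gradient(y, dt)
        ydd = np.gradient(yd, dt)
        y0, g = y[0], y[-1]  # assumes the demonstration ends away from its start
        x = np.exp(-self.alpha_x * np.arange(len(y)) * dt / tau)
        f_target = tau**2 * ydd - self.alpha_z * (self.beta_z * (g - y) - tau * yd)
        phi = self._features(x) * (g - y0)
        self.w = np.linalg.lstsq(phi, f_target, rcond=None)[0]
        return tau

    def rollout(self, y0, g, tau, dt):
        """Integrate the DMP with simple Euler steps; returns the position trajectory."""
        y, z, x = float(y0), 0.0, 1.0
        traj = [y]
        for _ in range(int(round(tau / dt))):
            f = float(self._features(x)[0] @ self.w) * (g - y0)
            z += dt * (self.alpha_z * (self.beta_z * (g - y) - z) + f) / tau
            y += dt * z / tau
            x += dt * (-self.alpha_x * x) / tau
            traj.append(y)
        return np.array(traj)


class PID:
    """Textbook PID on a scalar error; one instance per controlled axis."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def step(self, error, dt):
        self.integral += error * dt
        deriv = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * deriv


# Synthetic minimum-jerk demonstration from 0 to 1 over 2 s (stand-in for real data).
dt = 0.01
t = np.arange(0.0, 2.0 + dt / 2, dt)
s = t / t[-1]
demo = 10 * s**3 - 15 * s**4 + 6 * s**5

dmp = DMP1D()
tau = dmp.fit(demo, dt)
repro = dmp.rollout(demo[0], demo[-1], tau, dt)   # reproduce the demonstration
shifted = dmp.rollout(0.0, 0.5, tau, dt)          # same shape, new goal
```

In a setup like ours, one such DMP would run per linear axis, and at each control step a PID instance would turn the difference between the DMP output and the measured end-effector position into a command for that axis.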

DMP Orientation (Rotation Matrix) Coefficient Trajectories

Approach 2: Deep Learning

Our second approach used a deep neural network to perform behavior cloning using expert demonstrations. The trained NN uses end-effector and target object poses as input and outputs an action vector for each time step.
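
Behavior cloning reduces to supervised regression from observations to expert actions. The sketch below shows the idea with a small NumPy network trained by gradient descent on a mean-squared-error loss. It is an illustration under assumptions, not our trained model: the observation layout (end-effector and target positions only, no orientations), the layer sizes, the learning rate, and the synthetic "expert" that simply moves toward the target are all hypothetical.

```python
import numpy as np


def init_mlp(rng, in_dim, hidden, out_dim):
    """Create parameters for a one-hidden-layer tanh network."""
    return {
        "W1": rng.normal(0.0, np.sqrt(1.0 / in_dim), (in_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, np.sqrt(1.0 / hidden), (hidden, out_dim)),
        "b2": np.zeros(out_dim),
    }


def forward(params, obs):
    """Return (predicted actions, hidden activations) for a batch of observations."""
    h = np.tanh(obs @ params["W1"] + params["b1"])
    return h @ params["W2"] + params["b2"], h


def bc_step(params, obs, act, lr):
    """One full-batch gradient step on the MSE between predicted and expert actions."""
    pred, h = forward(params, obs)
    err = pred - act
    loss = float(np.mean(err**2))
    g_out = 2.0 * err / err.size                      # dLoss / dPred
    g_h = (g_out @ params["W2"].T) * (1.0 - h**2)     # backprop through tanh
    grads = {
        "W2": h.T @ g_out,
        "b2": g_out.sum(axis=0),
        "W1": obs.T @ g_h,
        "b1": g_h.sum(axis=0),
    }
    for name in params:
        params[name] -= lr * grads[name]
    return loss


# Synthetic stand-in for demonstration data (not real Robosuite observations).
rng = np.random.default_rng(0)
obs = rng.uniform(-1.0, 1.0, size=(512, 6))   # [end-effector xyz, target xyz]
act = obs[:, 3:] - obs[:, :3]                 # hypothetical expert: move toward target

params = init_mlp(rng, in_dim=6, hidden=32, out_dim=3)
losses = [bc_step(params, obs, act, lr=0.05) for _ in range(500)]
```

At run time, the policy is just `forward(params, current_obs)[0]` evaluated once per time step, which is part of why a cloned policy can act quickly once trained.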

Results

Between our two approaches, the DMP had a higher success rate, but took significantly longer to complete the task than the neural network.

We had a few guesses as to why:

The neural network generally took efficient, direct paths to the target, whereas the DMP often took winding or otherwise inefficient paths, likely because the demonstration it imitated followed such a path.
The runtime behavior of the DMP depends far more on its parameters than that of the neural network; as such, the DMP is much more susceptible to poorly chosen parameters or gains degrading performance, as was likely the case for our DMP.

References

[1] A. Ude, B. Nemec, T. Petrić and J. Morimoto, "Orientation in Cartesian space dynamic movement primitives," 2014 IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, 2014, pp. 2997-3004, doi: 10.1109/ICRA.2014.6907291.
