Framework

Outline of the residual learning-from-demonstration framework introduced in this work. We collect demonstrations using an HTC Vive tracker and extract an initial full-pose policy using Dynamical Movement Primitives (running at 100 Hz). The desired control command produced by the DMPs is corrected by an additional residual policy trained with model-free RL (running at 10 Hz). The resulting term is then fed into a real-time impedance controller (running at 500 Hz) on a Franka Panda that performs peg, gear, or LAN cable insertion in the physical world.
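The multi-rate structure above can be sketched as follows. This is an illustrative mock-up, not the actual implementation: `dmp_step` and `residual_policy` are hypothetical placeholders standing in for the DMP rollout and the learned RL correction, and the residual output is simply held constant between its 10 Hz updates while the DMP runs at 100 Hz.

```python
import numpy as np

# Hypothetical sketch of the multi-rate pipeline: DMP at 100 Hz,
# residual RL policy at 10 Hz (output held between updates), with the
# combined command passed on to the 500 Hz impedance controller.
DMP_HZ, RL_HZ = 100, 10

def dmp_step(t):
    """Placeholder DMP rollout: desired 3D position at time t."""
    return np.array([0.4, 0.1 * np.sin(t), 0.3])

def residual_policy(obs):
    """Placeholder residual policy: small task-space correction."""
    return 0.01 * np.tanh(obs)

residual = np.zeros(3)
for step in range(DMP_HZ):                   # one second of control
    t = step / DMP_HZ
    x_dmp = dmp_step(t)                      # base policy command
    if step % (DMP_HZ // RL_HZ) == 0:        # refresh residual at 10 Hz
        residual = residual_policy(x_dmp)
    x_des = x_dmp + residual                 # corrected desired pose
    # x_des would be sent to the impedance controller here
```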

Jacobian Transpose Impedance Controller

We utilise a Cartesian impedance controller with null-space correction and without any dynamics model, using a publicly available implementation from GitHub.
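A minimal sketch of such a controller is given below, assuming the standard Jacobian-transpose impedance law with a posture-regularising null-space term. The gains `Kp`, `Kd`, `k_null` and the rest posture `q_rest` are illustrative names, not taken from the cited implementation; note that no inertia or Coriolis model appears, matching the model-free formulation above.

```python
import numpy as np

def impedance_torques(J, x_err, xdot, q, q_rest, Kp, Kd, k_null=1.0):
    """Jacobian-transpose Cartesian impedance law with a simple
    null-space term pulling the arm toward a rest posture q_rest.
    J: (6, n) geometric Jacobian; x_err: 6D pose error; xdot: 6D
    end-effector velocity; q, q_rest: joint positions. Illustrative."""
    # Task-space spring-damper, mapped to joint torques via J^T.
    tau_task = J.T @ (Kp @ x_err - Kd @ xdot)
    # Null-space projector N = I - J^T (J^T)^+ keeps the posture term
    # from disturbing the task-space behaviour.
    N = np.eye(J.shape[1]) - J.T @ np.linalg.pinv(J.T)
    tau_null = N @ (k_null * (q_rest - q))
    return tau_task + tau_null
```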

Types of residual perturbations with Gaussian noise, η, applied to a DMP policy tracing a simple Archimedean spiral. Perturbing directly in task space (green) results in the local exploration that is important for contact-rich manipulation.
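The task-space perturbation can be reproduced in a few lines. The spiral parameters and noise scale below are illustrative choices, not values from the experiments; the point is simply that adding Gaussian noise η directly to the task-space trajectory yields small local deviations around the nominal path.

```python
import numpy as np

rng = np.random.default_rng(0)

# Nominal DMP-like trajectory: Archimedean spiral r = a + b * theta.
theta = np.linspace(0, 4 * np.pi, 400)
a, b = 0.0, 0.01                       # illustrative spiral parameters
r = a + b * theta
x, y = r * np.cos(theta), r * np.sin(theta)

# Task-space residual perturbation: Gaussian noise added to positions.
sigma = 0.002                          # illustrative noise scale (m)
eta = rng.normal(0.0, sigma, size=(2, theta.size))
x_pert, y_pert = x + eta[0], y + eta[1]
```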

In this work we learn a base policy from expert demonstrations, applied directly in task space. We use two separate formulations, namely positional and orientational. The learned behaviour is then executed by a suitable hardware controller, i.e. positional or impedance, which produces the final motions of the robot arm. We improve the generality of LfD with an additional residual policy trained using state-of-the-art model-free RL.

The orientation action is described in terms of the real value α and a 3D vector r.
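One common reading of this (α, r) parameterisation is an axis-angle representation, which can be mapped to a unit quaternion for the controller; the sketch below assumes that interpretation, which is not confirmed by the text above.

```python
import numpy as np

def axis_angle_to_quat(alpha, r):
    """Map an orientation action (alpha, r) to a unit quaternion
    (w, x, y, z), assuming alpha is a rotation angle in radians and
    r is the (unnormalised) rotation axis. Illustrative interpretation."""
    r = np.asarray(r, dtype=float)
    axis = r / np.linalg.norm(r)       # unit rotation axis
    w = np.cos(alpha / 2.0)            # scalar part
    xyz = np.sin(alpha / 2.0) * axis   # vector part
    return np.concatenate(([w], xyz))
```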