Juyan Zhang, Dana Kulić and Michael Burke
Department of Electrical and Computer Systems Engineering, Faculty of Engineering, Monash University
[Header image generated by ChatGPT]
Manipulation tasks often consist of subtasks, each representing a distinct skill. Mastering these skills is essential for robots, as it enhances their autonomy, efficiency, adaptability, and ability to collaborate with humans. Learning from demonstrations allows robots to rapidly acquire new skills without starting from scratch, since demonstrations typically sequence several skills to accomplish a task. Behaviour cloning approaches to learning from demonstration commonly rely on mixture density network (MDN) output heads to predict robot actions. In this work, we first reinterpret the mixture density network as a library of feedback controllers (or skills) conditioned on latent states. This reinterpretation arises from the observation that a one-layer linear network with different initializations is functionally equivalent to a classical feedback controller with different gains. We use these insights to derive a probabilistic graphical model that combines these elements, framing skill discovery as segmentation in the latent space, where each skill policy acts as a feedback control law driving the latent state towards a goal. Our approach significantly improves not only the task success rate but also robustness to observation noise when trained on perfect demonstrations. Our physical robot experiments further show that this induced robustness simplifies model deployment on real robots.
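To make the network-controller equivalence concrete, here is a minimal NumPy sketch (illustrative only, not the paper's implementation): a one-layer linear action head u = W z + b can be rewritten as a feedback law u = K(g - z) with gain K = -W and latent goal g = K^{-1} b, provided K is invertible. All values below are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# One-layer linear action head: u = W z + b
# (e.g. the mean head of a single MDN component acting on latent state z).
W = rng.normal(size=(3, 3))
b = rng.normal(size=3)

# The same map rewritten as a classical feedback law u = K (g - z):
# K (g - z) = -K z + K g, so K = -W and g = K^{-1} b (assuming K is invertible).
K = -W
g = np.linalg.solve(K, b)

z = rng.normal(size=3)       # an arbitrary latent state
u_network = W @ z + b        # network view
u_control = K @ (g - z)      # feedback-controller view
assert np.allclose(u_network, u_control)
```

Different initializations of (W, b) thus correspond to different gains and set-points, which is what lets each mixture component be read as a distinct skill.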
Better performance: When trained on perfect demonstrations, our model improves upon the MDN baseline by 12% on average, as shown in Table IIa, achieving an equivalent or better success rate on every task.
Greater robustness: We evaluate under observation noise at scales of [0, 0.1, 0.2, 0.3, 0.5, 1, 2, 3]; our model outperforms the MDN by 8% on average (see the evaluation sketch after this list).
Convergence to the set-point: Our model stops appropriately at the end of the task, thanks to the explicit feedback controller design, which yields stable convergence to a latent set-point or goal.
Simpler deployment: The induced robustness simplifies model deployment on the physical robot.
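The noise evaluation above can be sketched as follows; `policy`, `env`, and the `env.step` return signature are hypothetical placeholders standing in for the actual simulator interface.

```python
import numpy as np

NOISE_SCALES = [0, 0.1, 0.2, 0.3, 0.5, 1, 2, 3]

def success_rate_under_noise(policy, env, noise_scale, n_episodes=50, seed=0):
    """Roll out a trained policy while corrupting every observation with
    zero-mean Gaussian noise of the given scale."""
    rng = np.random.default_rng(seed)
    successes = 0
    for _ in range(n_episodes):
        obs, done, success = env.reset(), False, False
        while not done:
            noisy_obs = obs + noise_scale * rng.standard_normal(obs.shape)
            obs, done, success = env.step(policy(noisy_obs))  # hypothetical interface
        successes += int(success)
    return successes / n_episodes

# for scale in NOISE_SCALES:
#     print(scale, success_rate_under_noise(policy, env, scale))
```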
Demo Video
Our Model vs. MDN:
Our model stops when the task is finished, whereas the MDN keeps drawing the letter.
All Demos
We also show all the written letters; their execution time varies with task complexity.
Task Performance:
Similar performance upper bound: The MDN can achieve performance comparable to ours given enough training samples and an optimal number of skills.
Better sample efficiency: Across sample sizes, our model consistently performs well, matching or exceeding the success rates of the MDN and BC baselines.
Robustness:
More robust to sample size & skill number: Our model performs significantly better than the other models across all noise levels.
The sources of robustness
Latent space: Adding the latent state (through the encoder-decoder structure) improves robustness by around 3%.
Switching KL constraint: Adding the switching KL divergence improves robustness across different numbers of skills, indicating the potential for learning with larger skill libraries (a sketch of such a term follows this list).
High-gain controller: A higher-gain controller (the last row in Table Va) achieves better robustness, especially when the level of perturbation is large, as shown in Figure 5.
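The paper's exact loss is not reproduced here; as a hedged illustration, a switching-consistency term can be written as a KL divergence between skill posteriors at consecutive timesteps, penalising rapid skill switching. The `skill_logits` interface and the weight `beta` are assumptions for this sketch.

```python
import torch
import torch.nn.functional as F

def switching_kl(skill_logits: torch.Tensor) -> torch.Tensor:
    """Mean KL(q_t || q_{t-1}) between skill distributions at consecutive
    timesteps. skill_logits has shape (T, K): per-step logits over K skills.
    (Assumed interface; the paper's actual constraint may differ in form.)"""
    log_q = F.log_softmax(skill_logits, dim=-1)       # (T, K)
    q = log_q.exp()
    kl = (q[1:] * (log_q[1:] - log_q[:-1])).sum(-1)   # (T-1,) stepwise KLs
    return kl.mean()

# total_loss = reconstruction_loss + beta * switching_kl(skill_logits)  # beta assumed
```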
The Fréchet distance measures the distance between generated trajectories and training-data trajectories. It is reported for MDN, MDN + FB (MDN + feedback control design), and MDN + FB + SW (MDN + feedback control design + switching skill consistency); a sketch of the metric is given below.
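For reference, here is a minimal NumPy implementation of the standard discrete Fréchet distance recurrence (the common formulation; the paper may use a different variant or library).

```python
import numpy as np

def discrete_frechet(P, Q):
    """Discrete Fréchet distance between trajectories P (n, d) and Q (m, d)."""
    n, m = len(P), len(Q)
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)  # pairwise distances
    ca = np.full((n, m), np.inf)                                # coupling costs
    ca[0, 0] = d[0, 0]
    for i in range(1, n):
        ca[i, 0] = max(ca[i - 1, 0], d[i, 0])
    for j in range(1, m):
        ca[0, j] = max(ca[0, j - 1], d[0, j])
    for i in range(1, n):
        for j in range(1, m):
            ca[i, j] = max(min(ca[i - 1, j], ca[i - 1, j - 1], ca[i, j - 1]), d[i, j])
    return ca[-1, -1]

# Example: a generated trajectory vs. a demonstration trajectory.
gen = np.array([[0.0, 0.0], [1.0, 0.1], [2.0, 0.0]])
demo = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
print(discrete_frechet(gen, demo))  # 0.1
```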
Bounded behaviour from feedback control design
The latent controller goals and gains anchor the policy close to the training data distribution.
The switching skill consistency further enhances this bounded behaviour.