Diff-LfD: Contact-aware Model-based Learning from Visual Demonstration for Robotic Manipulation via Differentiable Physics-based Simulation and Rendering

Abstract:

Learning from Demonstration (LfD) has emerged as an efficient technique for robots to acquire new skills by observing experts, significantly reducing the need for laborious manual reward-function design. This paper introduces a novel framework for model-based LfD in robotic manipulation. The proposed pipeline rests on two primary components: self-supervised pose and shape estimation, and contact sequence generation. The former uses differentiable rendering to estimate object poses and shapes from demonstration videos; the latter iteratively optimizes contact points and forces with differentiable simulation to realize the demonstrated object motions. Experiments demonstrate that our LfD pipeline acquires manipulation actions from human demonstrations. Complementary ablation studies on object tracking and contact sequence inference show that our approach generates long-horizon manipulation actions robustly and efficiently, even under environmental noise. We further validate the pipeline through real-world deployment.
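The core idea behind the pose-estimation component is that a differentiable renderer makes the image-formation process amenable to gradient descent, so object pose can be recovered by minimizing a loss between rendered and observed frames. The following toy sketch (not the paper's implementation) illustrates this with a 2D keypoint loss standing in for the rendered-image comparison, and a single planar rotation angle as the pose parameter; all names and values are illustrative.

```python
import numpy as np

def rotate(points, theta):
    """Apply a 2D rotation by angle theta to an (N, 2) point array."""
    c, s = np.cos(theta), np.sin(theta)
    return points @ np.array([[c, -s], [s, c]]).T

def d_rotate(points, theta):
    """Derivative of the rotated points with respect to theta."""
    c, s = np.cos(theta), np.sin(theta)
    return points @ np.array([[-s, -c], [c, -s]]).T

rng = np.random.default_rng(0)
model_pts = rng.standard_normal((20, 2))  # object model keypoints
true_theta = 0.7                          # unknown pose in the "video" frame
observed = rotate(model_pts, true_theta)  # stand-in for the observed frame

# Gradient descent on the mean squared residual, the 1-D analogue of
# optimizing a full pose through a differentiable renderer.
theta, lr = 0.0, 0.1
for _ in range(500):
    resid = rotate(model_pts, theta) - observed
    grad = 2.0 * np.mean(np.sum(resid * d_rotate(model_pts, theta), axis=1))
    theta -= lr * grad
```

In the full pipeline the same principle applies with 6-DoF poses and shape parameters, and the loss is computed on rendered images rather than keypoints.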

Bibtex

@inproceedings{Zhu2023DiffLfD,
  title={Diff-LfD: Contact-aware Model-based Learning from Visual Demonstration for Robotic Manipulation via Differentiable Physics-based Simulation and Rendering},
  author={Zhu, Xinghao and Ke, Jinghan and Xu, Zhixuan and Sun, Zhixin and Bai, Bizhe and Lv, Jun and Liu, Qingtao and Zeng, Yuwei and Ye, Qi and Lu, Cewu and Tomizuka, Masayoshi and Shao, Lin},
  booktitle={Conference on Robot Learning (CoRL)},
  year={2023},
}

Video:

DiffLfDCoRL.mp4

Supplementary Materials:

supp_material.pdf

Comparison with NoPe-NeRF on synthesized videos

Given Videos | Ours | NoPe-NeRF

[Video comparison table: for each of the five synthesized input videos, our method produces a valid render, while NoPe-NeRF fails on all five sequences (blank render videos).]

Reconstruction Comparison: With and Without Diffusion

Reference | No diffusion | With diffusion

[Video comparison table: reconstructions without and with the diffusion component, shown against the reference.]

 Generalization of the Closed-Loop Policy for Unseen Objects

(Caption indicates the final tracking error for each test object; the policy is trained on a cube.)

Train object: cube

Test object | Final tracking error
cylinder    | 3.8 degrees
ball        | 2.4 degrees
lemon       | 6.3 degrees
avocado     | 5.9 degrees

Ground Truth synthesized Videos and Estimated Shapes and Poses