Chen Yizhou¹, Xu Hang¹, Yu Dongjie¹, Ren Yi², Pan Jia¹
¹The University of Hong Kong ²Huawei Technologies Co., Ltd
TL;DR: We marry bimanual visuomotor policies with long-horizon planning, addressing out-of-distribution (OOD) observations while complying with novel goals and constraints.
Imitation learning (IL), particularly when leveraging high-dimensional visual inputs for policy training, has proven intuitive and effective in complex bimanual manipulation tasks. Nonetheless, the generalization capability of visuomotor policies remains limited, especially when only small demonstration datasets are available.
Accumulated errors in visuomotor policies significantly hinder their ability to complete long-horizon tasks. To address these limitations, we propose SViP, a framework that seamlessly integrates visuomotor policies into task and motion planning (TAMP). SViP partitions human demonstrations into bimanual and unimanual operations using a semantic scene graph monitor. Continuous decision variables from the key scene graph are employed to train a switching condition generator. This generator produces parameterized scripted primitives that ensure reliable performance even when encountering out-of-distribution observations.
Using only 20 real-world demonstrations, we show that SViP enables visuomotor policies to generalize across out-of-distribution initial conditions without requiring object pose estimators. For previously unseen tasks, SViP automatically discovers effective solutions to achieve the goal, leveraging constraint modeling in the TAMP formalism. In real-world experiments, SViP outperforms state-of-the-art generative IL methods, indicating wider applicability for more complex tasks.
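The partitioning step mentioned above (splitting a demonstration into bimanual and unimanual operations by monitoring a semantic scene graph) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not SViP's actual implementation: the scene graph is represented as a set of (subject, predicate, object) triples, a segment boundary is placed wherever that set changes, and a segment is called bimanual when both grippers hold an object. The names `segment_by_scene_graph`, `label_segment`, `left_gripper`, `right_gripper`, and the `holds` predicate are hypothetical.

```python
from typing import FrozenSet, List, Tuple

Relation = Tuple[str, str, str]        # (subject, predicate, object)
SceneGraph = FrozenSet[Relation]       # one graph per demonstration time step


def segment_by_scene_graph(
    graphs: List[SceneGraph],
) -> List[Tuple[int, int, SceneGraph]]:
    """Split a demonstration into maximal runs of identical scene graphs.

    Returns (start, end_exclusive, graph) for each run.
    """
    segments = []
    start = 0
    for t in range(1, len(graphs) + 1):
        # Close the current run at the end of the demo or when the graph changes.
        if t == len(graphs) or graphs[t] != graphs[start]:
            segments.append((start, t, graphs[start]))
            start = t
    return segments


def label_segment(graph: SceneGraph) -> str:
    """Hypothetical rule: a segment is bimanual if both grippers hold something."""
    holders = {subj for (subj, pred, _obj) in graph if pred == "holds"}
    if {"left_gripper", "right_gripper"} <= holders:
        return "bimanual"
    return "unimanual"


# Toy peg-in-hole demonstration, hand-written for illustration (not real data).
g0 = frozenset({("peg", "on", "table"), ("socket", "on", "table")})
g1 = frozenset({("right_gripper", "holds", "peg"), ("socket", "on", "table")})
g2 = frozenset({("right_gripper", "holds", "peg"),
                ("left_gripper", "holds", "socket")})
g3 = frozenset({("peg", "in", "socket"), ("left_gripper", "holds", "socket")})
demo = [g0, g0, g1, g1, g1, g2, g2, g3]

segments = segment_by_scene_graph(demo)
labels = [label_segment(graph) for (_start, _end, graph) in segments]
for (start, end, _graph), label in zip(segments, labels):
    print(f"steps [{start}, {end}): {label}")
```

In this toy demonstration, only the run in which both grippers hold an object is labeled bimanual; the remaining runs are unimanual and would be the natural candidates for scripted primitives in a planner.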
Peg-in-hole task (simulation, speed: 2X)
OOD Setup:
The peg and the socket are placed on the right half and left half of the table, respectively, with the position and orientation randomly initialized.
Unreachable Setup:
The peg and the socket are initialized on one side of the table, so the bimanual manipulation cannot be performed directly.
Unsafe Setup:
An unexpected obstacle obstructs the space required by the bimanual manipulation.
Overcoming unseen, random setups (real-world, speed: 4X)
Handing off an object from right to left
Grasping a screwdriver and packing it up
With a small number of demonstrations, SViP can adapt to unseen object placements and accomplish gestures that the leader arm of the teleoperation system cannot reach.