Bi-KVIL
Keypoints-based Visual Imitation Learning of
Bimanual Manipulation Tasks
Abstract
Visual imitation learning has achieved impressive progress in learning unimanual manipulation tasks from a small set of visual observations, thanks to the latest advances in computer vision. However, learning bimanual coordination strategies and complex object relations from bimanual visual demonstrations, as well as generalizing them to categorical objects in novel cluttered scenes, remain unsolved challenges. In this paper, we extend our previous work on keypoints-based visual imitation learning (K-VIL) to bimanual manipulation tasks. The proposed Bi-KVIL jointly extracts so-called Hybrid Master-Slave Relationships (HMSR) among objects and hands, bimanual coordination strategies, and sub-symbolic task representations. Our bimanual task representation is object-centric, embodiment-independent, and viewpoint-invariant, and thus generalizes well to categorical objects in novel scenes. We evaluate our approach in various real-world applications, showcasing its ability to learn fine-grained bimanual manipulation tasks from a small number of human demonstration videos.
Symbolic: Master-Slave Relationship (MSR)
Subsymbolic: Keypoints-based Task Representation
Geometric Constraints
Comparison to K-VIL
K-VIL
unimanual
single master-slave pair
human demonstration videos of unimanual pouring
extracted task representation including
local frame, p2p, and p2c constraints
Single MSR
Reproduction with KAC
Bi-KVIL
bimanual
hybrid master-slave graph
human demonstration videos of bimanual pouring
extracted task representation including
local frame, p2p, p2c, and pose constraints
Hybrid MSR
Reproduction with Bi-KAC
Evaluation
Place Spoon on plate (Plsp)
Plsp_1 (6): the plate moves to the initial position of the spoon head while keeping the spoon head right on top of the center of the plate.
The first column shows the demonstration videos with the candidate points, masks, and hand meshes/skeletons overlaid. The middle column displays the hybrid master-slave relationship graph, with the concrete types of geometric constraints annotating each master-slave pair. The last column shows examples of these geometric constraints in the aligned view centered on the extracted local frame of the master object. The legend is discussed in the paper.
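As a reading aid for the HMSR graphs, the following minimal sketch stores master-slave pairs as directed edges annotated with constraint types. It is an illustration under stated assumptions, not the Bi-KVIL implementation: the class `HMSRGraph`, its methods, and the node names `object_a`, `object_b`, and `hand_x` are hypothetical placeholders and do not correspond to any specific task on this page.

```python
from dataclasses import dataclass, field


@dataclass
class HMSRGraph:
    """Hypothetical container: directed master -> slave edges with constraint labels."""

    # edges[(master, slave)] -> list of constraint type labels, e.g. ["p2p", "p2c"]
    edges: dict = field(default_factory=dict)

    def add_pair(self, master, slave, constraints):
        """Register one master-slave pair and the constraint types annotating it."""
        self.edges[(master, slave)] = list(constraints)

    def slaves_of(self, master):
        """Return all slaves of a given master, sorted by name."""
        return sorted(s for (m, s) in self.edges if m == master)

    def constraints_between(self, master, slave):
        """Return the constraint labels of a pair, or an empty list if unrelated."""
        return self.edges.get((master, slave), [])


# Placeholder graph; node names and constraint assignments are made up.
graph = HMSRGraph()
graph.add_pair("object_a", "object_b", ["p2p"])
graph.add_pair("object_b", "hand_x", ["p2c", "pose"])
```

Keeping the constraint labels on the edges mirrors how the middle column annotates each master-slave pair with its geometric constraint types.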
Loosely-coupled/Asymmetric, left-dominant
Human demonstration videos
HMSR
Plsp_2 (6): Similar to Plsp_1 (6), except the plates may start from different positions above the table.
Loosely-coupled/Asymmetric, left-dominant
Task reproduction with categorical objects in novel cluttered scenes and with the ARMAR-6 robot.
The first row shows the scene status before execution. Bi-KVIL uses these images to adapt the extracted sub-symbolic constraints to the corresponding categorical objects in the scene. The third row shows the sub-symbolic task representations, including the local frame, keypoints, and the corresponding constraints, as well as, for clarity, the Movement Primitives (MPs) of the p2p constraints.
Plsp_3 (6): Similar to Plsp_1 (6), except that the spoon may be placed anywhere on the plate. Note that the plate still moves to the initial position of the spoon head in this task.
Loosely-coupled/Asymmetric, left-dominant
Plsp_4 (6): the plate moves to anywhere on the table while keeping the spoon head right on top of the center of the plate.
Loosely-coupled/Asymmetric, left-dominant
Plsp_5 (6): place the spoon head on the center of the plate with only one arm.
Uncoordinated unimanual
Pour Water (Pow) and Pour Beer (Pob)
Pow (8) with an upright cup
Loosely-coupled/Asymmetric, right-dominant
Human Demonstration Videos
HMSR
Task reproduction and details
The RGB image and correspondence detection of DON
The perceived point cloud and TCP poses
The p2p constraint
The p2c constraint
The execution status of both hands
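To make the p2p and p2c panels above more concrete, here is a minimal sketch of how such constraints could be evaluated once a keypoint is expressed in the local frame of the master object. It assumes, following K-VIL's naming, that p2p and p2c denote point-to-point and point-to-curve constraints. The function names, the use of a sampled curve, and the residual definitions are illustrative assumptions, not the method used by Bi-KVIL.

```python
import numpy as np


def to_local(point_world, frame_origin, frame_rotation):
    """Express a world-frame point in a local frame.

    frame_rotation is a 3x3 matrix whose columns are the local axes
    written in world coordinates (assumed convention).
    """
    return frame_rotation.T @ (point_world - frame_origin)


def p2p_residual(keypoint_local, target_local):
    """Point-to-point: distance between the keypoint and a target point."""
    return float(np.linalg.norm(keypoint_local - target_local))


def p2c_residual(keypoint_local, curve_samples):
    """Point-to-curve: distance to the nearest sample of a densely sampled curve."""
    distances = np.linalg.norm(curve_samples - keypoint_local, axis=1)
    return float(distances.min())


# Local frame rotated by 90 degrees about z and shifted along x (made-up values).
origin = np.array([1.0, 0.0, 0.0])
rotation = np.array([[0.0, -1.0, 0.0],
                     [1.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0]])
keypoint = to_local(np.array([1.0, 2.0, 0.0]), origin, rotation)

# A straight "curve" along the local y axis, sampled at 11 points.
curve = np.stack([np.zeros(11), np.linspace(-1.0, 1.0, 11), np.zeros(11)], axis=1)
```

Evaluating the residuals in the master's local frame rather than in camera coordinates is what makes such a check independent of the viewpoint.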
Pow (8) with a tilted cup
Loosely-coupled/Asymmetric, right-dominant
Pow (8) with a tilted cup
Loosely-coupled/Asymmetric, right-dominant
Pow (8) cup is taken from a far position
Loosely-coupled/Asymmetric, right-dominant
Pow (8) with multiple cups
Loosely-coupled/Asymmetric, right-dominant
Place Spoon and Plate (Plsp,pt)
Place the plate right above the center of the tablemat, while placing the spoon right above the center of the plate.
Plsp,pt (6), with the plates and spoons starting from arbitrary positions
Loosely-coupled/Asymmetric, left-dominant
Place Cutting board and Pan (Plcb,pa)
Plcb,pa (6): Transport the cutting board to the center of the pan while placing the pan at the center of a potmat.
Loosely-coupled/Asymmetric, left-dominant
Plcb,pa (8)
Loosely-coupled/Asymmetric, left-dominant
Place serving Tray (Plst)
Plst (6): the serving trays start from an arbitrary position above the table and end at the center of the tablemat.
Tightly-coupled Symmetric
Plst (6): the serving trays start from an arbitrary position above the table and end at an arbitrary position on the tablemat.
Tightly-coupled Symmetric
Place Spoon and Banana (Plsp,ba)
Plsp,ba (6): the left hand places the spoon on the plate while the right hand places the banana on the tablemat. The two arms are not coordinated.
Uncoordinated Bimanual
Clean Table (CT)
CT (6): The right arm moves the brush, which is constrained by two p2l constraints.
Loosely-coupled/Asymmetric, right-dominant
Demonstrations
HMSR
The 1st p2l constraint (green)
The 2nd p2l constraint (blue)
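The two p2l constraints above can be read, assuming p2l denotes a point-to-line constraint as in K-VIL, as requiring a keypoint on the brush to stay on a line defined relative to the master. The sketch below measures how far a keypoint is from such a line. The function name `p2l_residual`, the line parameterization, and the numeric values are hypothetical and serve only as an illustration.

```python
import numpy as np


def p2l_residual(keypoint, line_point, line_direction):
    """Perpendicular distance from a keypoint to an infinite line.

    The line is given by a point on it and a (not necessarily unit) direction.
    """
    direction = line_direction / np.linalg.norm(line_direction)
    offset = keypoint - line_point
    # Remove the component along the line; what remains is perpendicular to it.
    perpendicular = offset - np.dot(offset, direction) * direction
    return float(np.linalg.norm(perpendicular))


# Made-up example: a line along the x axis through the origin.
residual = p2l_residual(
    keypoint=np.array([1.0, 3.0, 4.0]),
    line_point=np.zeros(3),
    line_direction=np.array([2.0, 0.0, 0.0]),
)
```

A residual of zero means the keypoint lies on the line. With two such constraints, as in the CT task, one residual per constraint would be evaluated.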