Abstract
Bimanual dexterous manipulation is a critical yet underexplored area in robotics. Its high-dimensional action space and inherent task complexity present significant challenges for policy learning, and the limited task diversity in existing benchmarks hinders general-purpose skill development. Existing approaches largely depend on reinforcement learning, often constrained by intricately designed reward functions tailored to a narrow set of tasks. In this work, we present a novel approach for efficiently learning diverse bimanual dexterous skills from abundant human demonstrations. Specifically, we introduce BiDexHD (Bimanual Dexterity from Human Demonstrations), a framework that unifies task construction from existing bimanual datasets and employs teacher-student policy learning to address all tasks. The teacher learns state-based policies using a general two-stage reward function across tasks with shared behaviors, while the student distills the learned multi-task policies into a vision-based policy. With BiDexHD, scalable learning of numerous bimanual dexterous skills from auto-constructed tasks becomes feasible, offering promising advances toward universal bimanual dexterous manipulation. Our empirical evaluation on the TACO dataset, spanning 141 tasks across six categories, demonstrates a task fulfillment rate of 74.59% on trained tasks and 51.07% on unseen tasks, showcasing the effectiveness and competitive zero-shot generalization capabilities of BiDexHD.
BiDexHD: Unified and Scalable Bimanual Framework
The three-phase framework, BiDexHD, unifies task construction and policy learning from human bimanual datasets rather than hand-designed benchmarks, providing a robust and scalable solution to diverse, challenging bimanual manipulation tasks.
In phase one, BiDexHD constructs each bimanual task from a human demonstration.
In phase two, BiDexHD learns diverse state-based policies from a generally designed two-stage reward function via multi-task reinforcement learning.
In phase three, a group of learned policies is distilled into a single vision-based policy for inference.
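The distillation in phase three can be illustrated with a minimal sketch: a student policy is fit by regression onto the teacher's actions. Here a linear least-squares student stands in for the vision-based network; the observation features, dimensions, and loss are illustrative assumptions, not the actual BiDexHD implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical student observations (e.g. visual features) and the
# corresponding actions produced by the state-based teacher policies.
obs = rng.normal(size=(256, 8))
teacher_actions = obs @ rng.normal(size=(8, 4))

# Behavior-cloning-style distillation: fit the student to reproduce the
# teacher's actions (a linear student via least squares for illustration).
W, *_ = np.linalg.lstsq(obs, teacher_actions, rcond=None)
student_actions = obs @ W
```

In practice the student is a neural network trained on rendered observations, but the objective is the same: minimize the gap between student and teacher actions.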
In particular, general two-stage multi-task reinforcement learning is adopted for acquiring expert state-based bimanual skills.
At stage zero, for each task, all hand joints are initialized at the zero pose and the pair of objects is initialized at a fixed pose.
At stage one, the approaching reward encourages both hands to move toward their grasping centers, while the lifting reward and its bonus incentivize moving both objects to their respective reference poses.
After simulation alignment, at stage two the dual hands manipulate the objects under the guidance of the tracking reward.
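The two-stage reward described above can be sketched as follows. The term forms, distance thresholds, and coefficients are hypothetical stand-ins; the actual BiDexHD reward terms are not specified here.

```python
import numpy as np

def stage_one_reward(hand_pos, grasp_center, obj_pos, ref_pos, bonus=1.0):
    """Stage one (illustrative): approach the grasping centers, then lift
    both objects toward their reference poses, with a bonus on success."""
    approach = -np.linalg.norm(hand_pos - grasp_center, axis=-1).sum()
    lift = -np.linalg.norm(obj_pos - ref_pos, axis=-1).sum()
    # Assumed success criterion: both objects within 5 cm of their references.
    reached = np.all(np.linalg.norm(obj_pos - ref_pos, axis=-1) < 0.05)
    return approach + lift + (bonus if reached else 0.0)

def stage_two_reward(obj_pose, demo_pose):
    """Stage two (illustrative): track the demonstrated object trajectory."""
    return -np.linalg.norm(obj_pose - demo_pose, axis=-1).sum()
```

A single scalar reward per stage keeps the design general across tasks with shared behaviors, which is what allows one reward function to cover all constructed tasks.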
Parallel Training in IsaacGym
All TACO tasks can be represented in the form of (verb, tool, target object).
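This triplet structure can be captured with a simple record type; the class and field names below are illustrative, and the two example tasks are taken from the demonstrations shown on this page.

```python
from typing import NamedTuple

class BimanualTask(NamedTuple):
    """Hypothetical (verb, tool, target) representation of a TACO task."""
    verb: str
    tool: str
    target: str

tasks = [
    BimanualTask("empty", "bowl", "bowl"),
    BimanualTask("pour in", "teapot", "cup"),
]
```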
Diverse Bimanual Skills of TACO tasks
Human Demonstrations vs. Policy Deployment of TACO tasks
Demonstrations of (Empty, Bowl, Bowl) and (Pour in, Teapot, Cup).
Quantitative Results of TACO tasks
BC Failure modes of TACO tasks
We display two failure cases of vanilla behavior cloning (BC) on the tasks "dust" and "empty", with similar issues observed in other tasks. We identify two main reasons for the failure:
Limited Demonstrations: Only one demonstration is available for imitation learning, leaving large portions of the observation space unexplored. As a result, BC struggles with unvisited states.
Lack of Kinematics & Dynamics: Actions derived from retargeting approximate human demonstrations spatially and temporally but fail to take true kinematics and dynamics into account, resulting in fragile policies prone to the failures and stationary states shown in the video.
In contrast, existing practice in IL-based bimanual manipulation usually requires 20-50 high-quality teleoperated demonstrations (not retargeted human data) per task. In short, both data quality and quantity account for the poor performance of BC.
Diverse Bimanual Skills of Arctic tasks