Learning Prehensile Dexterity by Imitating and Emulating State-only Observations

Yunhai Han, Zhenyang Chen, Harish Ravichandar

Georgia Institute of Technology

Paper | Code

Abstract

When humans acquire physical skills (e.g., tennis) from experts, we tend to first learn by merely observing the expert. But this is often insufficient. We then engage in practice, where we try to emulate the expert and ensure that our actions produce similar effects on our environment. Inspired by this observation, we introduce Combining IMitation and Emulation for Motion Refinement (CIMER) -- a two-stage framework to learn dexterous prehensile manipulation skills from state-only observations. CIMER's first stage involves imitation: simultaneously encode the complex interdependent motions of the robot hand and the object in a structured dynamical system. This results in a reactive motion generation policy that provides a reasonable motion prior, but lacks the ability to reason about contact effects due to the lack of action labels. The second stage involves emulation: learn a motion refinement policy via reinforcement learning that adjusts the robot hand's motion prior such that the desired object motion is reenacted. CIMER is both task-agnostic (no task-specific reward design or shaping) and intervention-free (no additional teleoperated or labeled demonstrations). Detailed experiments with prehensile dexterity reveal that i) imitation alone is insufficient, but adding emulation drastically improves performance, ii) CIMER outperforms existing methods in terms of sample efficiency and the ability to generate realistic and stable motions, and iii) CIMER can either zero-shot generalize or learn to adapt to novel objects from the YCB dataset, even outperforming expert policies trained with action labels in most cases.

Video

CIMER Framework

CIMER is a two-stage framework to learn dexterous prehensile manipulation skills from state-only observations. The first stage involves imitation: simultaneously encode the complex interdependent motions of the robot hand and the object in a structured dynamical system. This results in a reactive motion generation policy that provides a reasonable motion prior, but lacks the ability to reason about contact effects due to the lack of action labels. The second stage involves emulation: learn a motion refinement policy via reinforcement learning that adjusts the robot hand's motion prior such that the desired object motion is reenacted.
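To make the two-stage recipe concrete, below is a minimal, self-contained Python sketch. Everything in it is an illustrative assumption: the 6-dimensional toy trajectory, the linear least-squares dynamics standing in for the structured dynamical system, and the random-search update standing in for the reinforcement learning algorithm. It only shows the division of labor between the two stages; see the code link above for the actual implementation.

import numpy as np

rng = np.random.default_rng(0)

# --- Stage 1: imitation ---------------------------------------------------
# Fit a linear dynamical system x_{t+1} = x_t @ A to a state-only demo, where
# x stacks the hand state (dims 0:3) and object state (dims 3:6). Plain least
# squares stands in for the structured dynamical system used in the paper.
demo = np.cumsum(rng.normal(size=(50, 6)), axis=0)  # toy hand+object trajectory
X, Y = demo[:-1], demo[1:]
A = np.linalg.lstsq(X, Y, rcond=None)[0]            # motion prior dynamics

def motion_prior(x):
    """Reactive motion generation: predict the next desired hand+object state."""
    return x @ A

# --- Stage 2: emulation ---------------------------------------------------
# Learn an additive refinement on the hand's prior motion so that the rolled-out
# object motion tracks the demonstrated object motion. A crude random search
# stands in for the actual RL update.
def rollout(delta, x0, horizon=10):
    x, reward = x0.copy(), 0.0
    for t in range(horizon):
        x = motion_prior(x)
        x[:3] += delta  # refine only the hand's motion prior
        # task-agnostic reward: negative object-tracking error
        reward -= np.linalg.norm(x[3:] - demo[min(t + 1, len(demo) - 1), 3:])
    return reward

delta = np.zeros(3)
for _ in range(200):  # hill-climb the refinement parameters
    cand = delta + 0.01 * rng.normal(size=3)
    if rollout(cand, demo[0]) > rollout(delta, demo[0]):
        delta = cand
print("refined tracking reward:", rollout(delta, demo[0]))

Note that the tracking reward depends only on the object's demonstrated motion, which is what keeps the emulation stage free of task-specific reward design.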

CIMER generates more realistic and stable motions

We qualitatively compare CIMER against two baselines on three dexterous prehensile manipulation tasks, and observe that the baselines tend to exploit the simulator and generate unrealistic or unsafe motions. In stark contrast, CIMER generates more realistic and stable motions. Below we show exemplar rollouts; the quantitative training performance can be found in the paper.

Pure RL Policies (Baseline 1)

Tool Use

Object Relocation

Door Opening

SOIL Policies (Baseline 2)

Tool Use

Object Relocation

Door Opening

CIMER Policies (Ours)

Tool Use

Object Relocation

Door Opening

CIMER outperforms the expert in zero-shot generalization for relocating novel objects

We also evaluated CIMER and the expert (trained with action labels) on their ability to generalize to 17 novel objects, and observed that CIMER achieved comparable or even superior performance to the expert on the majority of these objects. Below we show exemplar rollouts on these objects; the quantitative evaluation results can be found in the paper.

CIMER significantly outperforms the expert on 4 out of 17 objects

CIMER Policies

Gelatin box

Mug

Mustard bottle

Sugar box

Expert Policies

CIMER exhibits comparable performance to the expert on 6 out of 17 objects

CIMER Policies

Banana

Foam brick

Large clamp

Potted meat can

Tomato soup can

Tuna fish can

Expert Policies

CIMER and the expert both show poor performance on 7 out of 17 objects

CIMER Policies

Cracker box

Cube

Cylinder

Master chef can

Power drill

Pudding box

Small ball

Expert Policies

Skill Transfer: Fine-tuning CIMER policies on novel objects

We also evaluated whether the skills learned on the default object can be transferred to novel objects. Specifically, we fine-tuned CIMER's motion refinement policy on six of the novel objects that exhibited poor performance under zero-shot generalization. Below we show exemplar rollouts of the fine-tuned CIMER policies; the quantitative fine-tuning performance can be found in the paper.
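As a rough illustration of this transfer recipe, the sketch below freezes the imitation-stage motion prior and continues only the emulation-stage updates, warm-started from the parameters learned on the default object. The toy environment, the refinement parameters, and the random-search update are all hypothetical stand-ins, not CIMER's actual training code.

import numpy as np

rng = np.random.default_rng(1)

class NovelObjectEnv:
    """Toy stand-in for simulation with a novel YCB object: a different mass
    shifts the dynamics, so the old refinement parameters underperform."""
    def __init__(self, mass=2.0):
        self.mass = mass

    def episode_return(self, refine_params):
        # return peaks when the refinement compensates for the new mass
        return -float(np.sum((refine_params - self.mass) ** 2))

env = NovelObjectEnv(mass=2.0)

# Warm-start from the refinement parameters learned on the default object;
# the imitation-stage motion prior stays frozen throughout fine-tuning.
refine_params = np.ones(3)  # pretend these came from Stage 2 on the default object

for _ in range(300):  # continue only the emulation-stage updates
    cand = refine_params + 0.05 * rng.normal(size=3)
    if env.episode_return(cand) > env.episode_return(refine_params):
        refine_params = cand

print("fine-tuned return:", env.episode_return(refine_params))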

Fine-tuned CIMER policies perform well on the novel objects

Cracker box

Cube

Cylinder

Master chef can

Pudding box

Small ball

BibTeX

@article{han2024CIMER,
  title={Learning Prehensile Dexterity by Imitating and Emulating State-only Observations},
  author={Han, Yunhai and Chen, Zhenyang and Ravichandar, Harish},
  journal={arXiv preprint arXiv:2404.05582},
  year={2024}
}