Meta-Evolve: Continuous Robot Evolution for One-to-many Policy Transfer
ICLR 2024
ICLR 2024 Conference Paper: arXiv
Code: GitHub
Slides: ICLR.cc
Remote Presentation: SlidesLive.com
Contact: xingyul3 {at} cs {dot} cmu {dot} edu
We investigate the problem of transferring an expert policy from a source robot to multiple different robots. To solve this problem, we propose a method named Meta-Evolve that uses continuous robot evolution to efficiently transfer the policy to each target robot through a set of tree-structured evolutionary robot sequences. The robot evolution tree allows evolution paths to be shared across targets, so our approach can significantly outperform naive one-to-one policy transfer. We present a heuristic approach to determine an optimized robot evolution tree. Experiments show that, in terms of simulation cost, our method improves the efficiency of one-to-three transfer of a manipulation policy by up to 3.2× and one-to-six transfer of an agile locomotion policy by up to 2.4× over the baseline of launching multiple independent one-to-one policy transfers.
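To give intuition for why sharing evolution paths helps, here is a minimal sketch (not the paper's actual algorithm) comparing the total evolution distance of independent one-to-one transfers against a tree that routes through a shared intermediate robot. The morphology vectors and the centroid heuristic are illustrative assumptions:

```python
import numpy as np

# Hypothetical morphology parameter vectors (e.g. normalized finger lengths).
source = np.array([1.0, 1.0, 1.0, 1.0])
targets = [
    np.array([0.2, 0.2, 0.0, 0.0]),  # two-finger gripper
    np.array([0.2, 0.2, 0.2, 0.0]),  # three-finger gripper
    np.array([0.2, 0.2, 0.2, 0.2]),  # four-finger gripper
]

def independent_cost(src, tgts):
    """Total evolution distance of separate one-to-one transfers."""
    return sum(np.linalg.norm(t - src) for t in tgts)

def tree_cost(src, tgts):
    """Route through one shared intermediate robot (simple centroid heuristic)."""
    mid = np.mean(tgts, axis=0)
    return np.linalg.norm(mid - src) + sum(np.linalg.norm(t - mid) for t in tgts)

print(independent_cost(source, targets))  # long shared prefix traversed 3 times
print(tree_cost(source, targets))         # shared prefix traversed only once
```

Because the three targets are similar to each other but far from the source, the tree traverses the long common segment only once, so its total cost is much lower; the paper's heuristic optimizes the tree structure rather than fixing a single centroid.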
To show that our Meta-Evolve can be applied to real robots and real-world tasks, we conduct an additional set of experiments of transferring an object manipulation policy to multiple real commercial robots.
The source robot is the same ADROIT dexterous hand. The three target robots are as follows:
Jaco: Jaco is a 7-DoF robot produced by Kinova Robotics. It is equipped with the Jaco Three-Finger Gripper, a three-finger gripper with multi-jointed fingers.
Kinova3: Kinova3 is a 7-DoF robot produced by Kinova Robotics. It is equipped with the Robotiq-85 Gripper, the 85mm variation of Robotiq’s multi-purpose two-finger gripper.
IIWA: IIWA is an industrial-grade 7-DoF robot produced by KUKA. It is equipped with the Robotiq-140 Gripper, the 140mm variation of Robotiq’s multi-purpose two-finger gripper.
We follow the high-fidelity robot models introduced in Zhu et al. (2020) for the detailed physical specifications of the target robots to minimize the sim-to-real gap. The task is the manipulation task from the DexYCB dataset: the robot must pick up the object and carry it to the desired goal position. The task is considered a success if the distance from the object to the goal is sufficiently small. The reward is a sparse task-completion reward. The source expert policy is trained by learning from the human hand demonstrations in the DexYCB dataset.
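The sparse task-completion reward described above can be sketched as follows; the function name and the threshold value are our assumptions, not taken from the released code:

```python
import numpy as np

def sparse_completion_reward(object_pos, goal_pos, threshold=0.1):
    """Return 1.0 only when the object is within `threshold` meters of the goal.

    `threshold` is an assumed success radius; the actual value is task-specific.
    """
    dist = np.linalg.norm(np.asarray(object_pos) - np.asarray(goal_pos))
    return 1.0 if dist < threshold else 0.0
```

A sparse reward like this gives no gradient signal away from the goal, which is part of why the source policy is bootstrapped from human demonstrations rather than trained from scratch.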
We conduct a real-world experiment and deploy the target policy for Kinova3 on the real robot. The real robot demo is at the end of the video.
Select 1080p for the best video quality:
We utilize the five-finger ADROIT dexterous hand as the source robot and follow Rajeswaran et al. (2018) for the initial settings. The target robots are three robot grippers with two, three, and four fingers, respectively. The target robots are produced by gradually shrinking the fingers of the source robot.
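The gradual shrinking of the source hand into each gripper can be pictured as interpolating morphology parameters along an evolution path, with the policy fine-tuned at each intermediate robot. The following is a hypothetical sketch; the parameter vectors and step count are illustrative, not the paper's actual schedule:

```python
import numpy as np

def evolution_path(source_params, target_params, num_steps=10):
    """Linearly interpolate morphology parameters from source to target,
    yielding a sequence of intermediate robots for policy fine-tuning."""
    src = np.asarray(source_params, dtype=float)
    tgt = np.asarray(target_params, dtype=float)
    return [(1.0 - a) * src + a * tgt for a in np.linspace(0.0, 1.0, num_steps + 1)]

# Example: shrink three of five finger-length parameters toward a two-finger gripper.
hand = [1.0, 1.0, 1.0, 1.0, 1.0]
gripper = [1.0, 1.0, 0.0, 0.0, 0.0]
path = evolution_path(hand, gripper, num_steps=4)
```

Each element of `path` defines one intermediate robot; transferring the policy along such small morphology steps is what makes the evolution "continuous".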
We use the three tasks from the task suite in Rajeswaran et al. (2018): Door, Hammer, and Relocate, illustrated in Figure 3. In the Door task, the goal is to turn the door handle and fully open the door; in the Hammer task, the goal is to pick up the hammer and drive the nail into the board; in the Relocate task, the goal is to pick up the ball and carry it to the target position. The source expert policy was trained by learning from human demonstrations collected with a VR-enabled sensor glove.
Select 1080p for the best video quality:
To show that our Meta-Evolve can generalize to diverse tasks and robot morphologies, we conduct additional policy transfer experiments on an agile locomotion task. The goal of the robot is to move out of the maze from the starting position. The source robot is the Ant-v2 robot from MuJoCo Gym. The six target robots are four-legged agile locomotion robots with different torso lengths, leg thicknesses, and hip and shoulder widths.
Select 1080p for the best video quality:
If you find our work useful in your research, please cite the following:
@inproceedings{meta:evolve:liu:2024,
title={Meta-Evolve: Continuous Robot Evolution for One-to-many Policy Transfer},
author={Xingyu Liu and Deepak Pathak and Ding Zhao},
booktitle={International Conference on Learning Representations (ICLR)},
year={2024},
}