Meta-Evolve: Continuous Robot Evolution for One-to-many Policy Transfer
ICLR 2024
ICLR 2024 Conference Paper: arXiv
Code: GitHub
Slides: ICLR.cc
Remote Presentation: SlidesLive.com
Contact: xingyul3 {at} cs {dot} cmu {dot} edu
We investigate the problem of transferring an expert policy from a source robot to multiple different robots. To solve this problem, we propose a method named Meta-Evolve that uses continuous robot evolution to efficiently transfer the policy to each target robot through a set of tree-structured evolutionary robot sequences. The robot evolution tree allows evolution paths to be shared across targets, so our approach can significantly outperform naive one-to-one policy transfer. We present a heuristic approach to determine an optimized robot evolution tree. Experiments show that, in terms of simulation cost, our method improves the efficiency of one-to-three transfer of a manipulation policy by up to 3.2× and one-to-six transfer of an agile locomotion policy by up to 2.4× over the baseline of launching multiple independent one-to-one policy transfers.
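To give intuition for why sharing evolution paths helps, here is a minimal sketch (not the paper's actual algorithm) comparing the total evolution distance of independent one-to-one transfers against a tree that routes through a shared intermediate robot. The morphology vectors and the centroid heuristic are illustrative assumptions:

```python
import numpy as np

# Hypothetical morphology parameter vectors (e.g. normalized finger lengths).
source = np.array([1.0, 1.0, 1.0, 1.0])
targets = [
    np.array([0.2, 0.2, 0.0, 0.0]),  # two-finger gripper
    np.array([0.2, 0.2, 0.2, 0.0]),  # three-finger gripper
    np.array([0.2, 0.2, 0.2, 0.2]),  # four-finger gripper
]

def independent_cost(src, tgts):
    """Total evolution distance of separate one-to-one transfers."""
    return sum(np.linalg.norm(t - src) for t in tgts)

def tree_cost(src, tgts):
    """Route through one shared intermediate robot (simple centroid heuristic)."""
    mid = np.mean(tgts, axis=0)
    return np.linalg.norm(mid - src) + sum(np.linalg.norm(t - mid) for t in tgts)

print(independent_cost(source, targets))  # long shared prefix traversed 3 times
print(tree_cost(source, targets))         # shared prefix traversed only once
```

Because the three targets are similar to each other but far from the source, the tree traverses the long common segment only once, so its total cost is much lower; the paper's heuristic optimizes the tree structure rather than fixing a single centroid.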
To show that our Meta-Evolve can be applied to real robots and real-world tasks, we conduct an additional set of experiments of transferring an object manipulation policy to multiple real commercial robots.
The source robot is the same ADROIT dexterous hand. The three target robots are as follows:
Jaco: Jaco is a 7-DoF robot produced by Kinova Robotics. It is equipped with the Jaco Three-Finger Gripper, a three-finger gripper with multi-jointed fingers.
Kinova3: Kinova3 is a 7-DoF robot produced by Kinova Robotics. It is equipped with the Robotiq-85 Gripper, the 85mm variation of Robotiq’s multi-purpose two-finger gripper.
IIWA: IIWA is an industrial-grade 7-DoF robot produced by KUKA. It is equipped with the Robotiq-140 Gripper, the 140mm variation of Robotiq’s multi-purpose two-finger gripper.
We follow the high-fidelity robot models introduced in Zhu et al. (2020) for the detailed physical specifications of the target robots to minimize the sim-to-real gap. The task is the manipulation task from the DexYCB dataset: the robot must pick up the object and carry it to the desired goal position. The task is considered a success if the distance from the object to the goal is sufficiently small. The reward is a sparse task-completion reward. The source expert policy is trained by learning from the human hand demonstrations in the DexYCB dataset.
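The sparse task-completion reward described above can be sketched as follows; the function name and the threshold value are our assumptions, not taken from the released code:

```python
import numpy as np

def sparse_completion_reward(object_pos, goal_pos, threshold=0.1):
    """Return 1.0 only when the object is within `threshold` meters of the goal.

    `threshold` is an assumed success radius; the actual value is task-specific.
    """
    dist = np.linalg.norm(np.asarray(object_pos) - np.asarray(goal_pos))
    return 1.0 if dist < threshold else 0.0
```

A sparse reward like this gives no gradient signal away from the goal, which is part of why the source policy is bootstrapped from human demonstrations rather than trained from scratch.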
We conduct a real-world experiment and deploy the target policy for Kinova3 on the real robot. The real robot demo is at the end of the video.
Select 1080p for the best video quality:
We utilize the five-finger ADROIT dexterous hand as the source robot and follow Rajeswaran et al. (2018) for the initial settings. The target robots are three robot grippers with two, three, and four fingers, respectively. The target robots are produced by gradually shrinking the fingers of the source robot.
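The gradual shrinking of the source hand into each gripper can be pictured as interpolating morphology parameters along an evolution path, with the policy fine-tuned at each intermediate robot. The following is a hypothetical sketch; the parameter vectors and step count are illustrative, not the paper's actual schedule:

```python
import numpy as np

def evolution_path(source_params, target_params, num_steps=10):
    """Linearly interpolate morphology parameters from source to target,
    yielding a sequence of intermediate robots for policy fine-tuning."""
    src = np.asarray(source_params, dtype=float)
    tgt = np.asarray(target_params, dtype=float)
    return [(1.0 - a) * src + a * tgt for a in np.linspace(0.0, 1.0, num_steps + 1)]

# Example: shrink three of five finger-length parameters toward a two-finger gripper.
hand = [1.0, 1.0, 1.0, 1.0, 1.0]
gripper = [1.0, 1.0, 0.0, 0.0, 0.0]
path = evolution_path(hand, gripper, num_steps=4)
```

Each element of `path` defines one intermediate robot; transferring the policy along such small morphology steps is what makes the evolution "continuous".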
We use the three tasks from the task suite in Rajeswaran et al. (2018): Door, Hammer, and Relocate, illustrated in Figure 3. In the Door task, the goal is to turn the door handle and fully open the door; in the Hammer task, the goal is to pick up the hammer and drive the nail into the board; in the Relocate task, the goal is to pick up the ball and carry it to the target position. The source expert policy was trained by learning from human demonstrations collected with a VR-enabled sensor glove.
Select 1080p for the best video quality:
To show that our Meta-Evolve can generalize to diverse tasks and robot morphologies, we conduct additional policy transfer experiments on an agile locomotion task. The goal of the robot is to move out of the maze from the starting position. The source robot is the Ant-v2 robot from MuJoCo Gym. The six target robots are four-legged agile locomotion robots with different torso lengths, leg thicknesses, and hip and shoulder widths.
Select 1080p for the best video quality:
If you find our work useful in your research, please cite the following:
@inproceedings{meta:evolve:liu:2024,
title={Meta-Evolve: Continuous Robot Evolution for One-to-many Policy Transfer},
author={Xingyu Liu and Deepak Pathak and Ding Zhao},
booktitle={International Conference on Learning Representations (ICLR)},
year={2024},
}