At the heart of many robotics problems is the challenge of learning correspondences across domains. For instance, imitation learning requires a correspondence between humans and robots, sim-to-real requires a correspondence between physics simulators and real robots, and transfer learning requires correspondences between different robot environments. In this work, we propose to learn correspondences across such domains with an emphasis on differing modalities (vision and internal state), physics parameters (mass and friction), and morphologies (number of limbs). Importantly, correspondences are learned from unpaired, randomly collected data from the two domains. To do so, we propose dynamics cycles that align dynamic robotic behavior across two domains through a cycle-consistency constraint. Once this correspondence is found, we can directly transfer a policy trained in one domain to the other, without any additional fine-tuning in the second domain. We perform experiments across a variety of problem domains, both in simulation and on real robots. In particular, our framework is able to align uncalibrated monocular video of a real robot arm to dynamic state-action trajectories of a simulated arm without paired data.
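To make the dynamics cycle-consistency idea above concrete, the sketch below translates a domain-X transition into domain Y and penalizes disagreement between a domain-Y forward model's prediction and the translation of the true next state. All names (phi_x2y, psi_a, f_y), network sizes, and dimensions are illustrative assumptions, not the exact architecture used in this work.

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """Small fully connected network; sizes are placeholders."""
    def __init__(self, in_dim, out_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x):
        return self.net(x)

def dynamics_cycle_loss(phi_x2y, psi_a, f_y, x_t, a_t, x_t1):
    """Translate a domain-X transition (x_t, a_t, x_t1) into domain Y and
    require the Y-domain forward model's prediction to match the
    translation of the true next state."""
    y_t = phi_x2y(x_t)                                # state correspondence
    a_y = psi_a(torch.cat([x_t, a_t], dim=-1))        # action correspondence
    y_t1_pred = f_y(torch.cat([y_t, a_y], dim=-1))    # roll Y dynamics forward
    y_t1_target = phi_x2y(x_t1)                       # translation of x_{t+1}
    return nn.functional.l1_loss(y_t1_pred, y_t1_target)

# Usage on random, unpaired transitions (dimensions are placeholders).
x_dim, a_dim, y_dim, ay_dim = 17, 6, 20, 7
phi_x2y = MLP(x_dim, y_dim)              # X state  -> Y state
psi_a   = MLP(x_dim + a_dim, ay_dim)     # X action -> Y action
f_y     = MLP(y_dim + ay_dim, y_dim)     # learned Y forward dynamics

x_t, a_t, x_t1 = torch.randn(32, x_dim), torch.randn(32, a_dim), torch.randn(32, x_dim)
loss = dynamics_cycle_loss(phi_x2y, psi_a, f_y, x_t, a_t, x_t1)
loss.backward()
```

In practice this term would typically be combined with additional losses that keep the translated states and actions in-distribution; the sketch isolates only the dynamics cycle term.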
Robot arm joint pose estimation without any labels. Left column: real image input; middle column: our model's prediction; right column: Cycle-GAN model's prediction.
Cross-morphology experiment visualization: (a) train a policy on the two-leg cheetah and (b) test it on the three-leg cheetah; (c) train a policy on the three-limb swimmer and (d) test it on the four-limb swimmer. Note that no new reward is used for fine-tuning at test time; see the sketch after the panel captions below.
(a) Train a policy on the two-leg cheetah
(b) Test it on the three-leg cheetah
(c) Train a policy on the three-limb swimmer
(d) Test it on the four-limb swimmer
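To make the "no fine-tuning" claim concrete, here is a hedged sketch of how a policy trained on the source morphology could be deployed on the target morphology through learned correspondences. The names phi_y2x, psi_x2y, policy, and env_y are illustrative assumptions, not the exact interfaces used in this work.

```python
import torch

@torch.no_grad()
def zero_shot_step(env_y, policy, phi_y2x, psi_x2y, y_state):
    """One control step on the target morphology (e.g. three-leg cheetah) using
    a policy trained only on the source morphology (e.g. two-leg cheetah).
    No reward signal or gradient update is used in the target domain."""
    x_state = phi_y2x(y_state)                   # translate target state into the source state space
    x_action = policy(x_state)                   # query the frozen source-domain policy
    y_action = psi_x2y(torch.cat([x_state, x_action], dim=-1))  # map the action back to the target space
    return env_y.step(y_action.numpy())          # act in the (gym-like) target environment
```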
Estimated state visualization during training: Left image: a randomly sampled image from the dataset. Middle video: the state estimated at different training iterations, re-rendered as an image. Right curve: L1 error between the model's output state and the ground truth over the course of training; the horizontal axis is the iteration number and the vertical axis is the L1 error. Note that no paired data is used for training.
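Since ground-truth states enter only the monitoring curve described above and never the training loss, the metric can be computed in an evaluation-only pass such as the sketch below; state_estimator, image, and gt_state are placeholder names.

```python
import torch
import torch.nn as nn

@torch.no_grad()  # evaluation only: the ground truth never produces gradients
def monitor_l1(state_estimator: nn.Module,
               image: torch.Tensor,
               gt_state: torch.Tensor) -> float:
    """Mean absolute error between the estimated joint state and ground truth,
    logged for the training curve but not used in any training objective."""
    was_training = state_estimator.training
    state_estimator.eval()
    l1 = nn.functional.l1_loss(state_estimator(image), gt_state).item()
    state_estimator.train(was_training)
    return l1
```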
Self-supervised state estimation visualization: the state is estimated from the image observation and re-rendered as a new image. (a) Our (Cycle-Dynamics) results. (b) Cycle-GAN baseline results.