Weakly Supervised Correspondence Learning

Zihan Wang*, Zhangjie Cao*, Yilun Hao, Dorsa Sadigh

Paper / Code / ICRA 2022 Talk

  • * denotes equal contribution

Abstract

Correspondence learning is a fundamental problem in robotics, which aims to learn a mapping between state-action pairs of agents with different dynamics or embodiments. However, current correspondence learning methods either leverage strictly paired data, which are often difficult to collect, or learn in an unsupervised fashion from unpaired data using regularization techniques such as cycle-consistency, which suffer from severe misalignment issues. In this paper, we propose a weakly supervised correspondence learning approach that trades off between strong supervision over strictly paired data and unsupervised learning with a regularizer over unpaired data. Our idea is to leverage two types of weak supervision: i) temporal ordering of states and actions to reduce the compounding error, and ii) paired abstractions, instead of paired data, to alleviate the misalignment problem and learn a more accurate correspondence. Both types of weak supervision are easy to access in real-world applications, which simultaneously reduces the high cost of annotating strictly paired data and improves the quality of the learned correspondence. Our experimental results in MuJoCo simulation, a simulated robot, and a real robot environment show that our method substantially outperforms prior work in various correspondence learning settings, including cross-morphology, cross-physics, and cross-modality.


An example of paired abstractions. Given trajectories of a four-legged and a five-legged ant robot, it is difficult to decide whether two full states, which include the joint angles of each agent, are aligned. It is much easier to align simpler abstractions over these states, such as whether the two ants are at the same spatial location.

Weakly Supervised Correspondence Learning

Multi-Step Dynamics Cycle-Consistency

Even a small error in the state map and the action maps at each step can cause a large deviation over a long horizon, because the current dynamics cycle-consistency method enforces only one-step consistency and, with no further constraint, the error accumulates across time steps. To address this problem, we use the weak supervision provided by consecutive states and actions to enforce dynamics cycle-consistency over multiple steps. The detailed formulation can be found in our paper.
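To make the multi-step idea concrete, here is a minimal sketch in plain Python. It is not the formulation from the paper: the function name `multi_step_dcc_loss`, the scalar states and actions, the squared-error penalty, and the toy dynamics are all assumptions chosen for illustration. The sketch translates only the first state, rolls the target dynamics forward using translated actions, and compares each rolled-out state with the translation of the corresponding source state.

```python
from typing import Callable, List, Sequence, Tuple


def multi_step_dcc_loss(
    states: Sequence[float],                         # s_0 ... s_H from the source agent
    actions: Sequence[float],                        # a_0 ... a_{H-1} from the source agent
    state_map: Callable[[float], float],             # hypothetical state translation
    action_map: Callable[[float, float], float],     # hypothetical action translation
    target_dynamics: Callable[[float, float], float],  # hypothetical target forward model
) -> float:
    """Mean squared gap between rolled-out and translated target states."""
    if len(states) != len(actions) + 1:
        raise ValueError("need H + 1 states for H actions")
    if not actions:
        raise ValueError("need at least one action")
    rolled = state_map(states[0])  # only the first state is translated directly
    total = 0.0
    for t, action in enumerate(actions):
        target_action = action_map(states[t], action)
        # Step from the rolled-out state (not a freshly translated one), so
        # errors made at earlier steps stay visible at later steps.
        rolled = target_dynamics(rolled, target_action)
        total += (rolled - state_map(states[t + 1])) ** 2
    return total / len(actions)


def toy_rollout(horizon: int) -> Tuple[List[float], List[float]]:
    """Toy source trajectory with dynamics s' = s + a and a = 1 at every step."""
    states = [float(t) for t in range(horizon + 1)]
    actions = [1.0] * horizon
    return states, actions


def toy_target_step(state: float, action: float) -> float:
    """Toy target dynamics: s' = s + a, measured in doubled units."""
    return state + action


def exact_state_map(state: float) -> float:
    return 2.0 * state


def exact_action_map(state: float, action: float) -> float:
    return 2.0 * action


def biased_action_map(state: float, action: float) -> float:
    return 2.1 * action  # slightly wrong on purpose
```

In this toy setting the exact maps give zero loss at any horizon. The slightly biased action map gives a loss of roughly 0.01 over one step and roughly 0.11 over five steps, which is the accumulation effect a one-step constraint cannot see.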

We evaluate translation performance as a function of the final horizon in the HalfCheetah environment. The results are shown in the left figure. Translation performance improves with a longer horizon at first but saturates from horizon 5 onwards. This observation indicates that we can treat the final horizon as a hyperparameter and tune it by gradually increasing the horizon until performance saturates.
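The tuning procedure described above can be sketched as a simple search loop. The function `pick_horizon`, its `min_gain` threshold, and the synthetic score curve are hypothetical and only illustrate the "increase until saturation" rule; the curve is not measured data from our experiments.

```python
from typing import Callable, Iterable, Optional


def pick_horizon(
    evaluate: Callable[[int], float],  # hypothetical: horizon -> score, higher is better
    horizons: Iterable[int],           # candidate horizons in increasing order
    min_gain: float = 0.05,            # assumed threshold for "still improving"
) -> int:
    """Return the last horizon whose score improved by at least min_gain."""
    best_horizon: Optional[int] = None
    best_score = float("-inf")
    for horizon in horizons:
        score = evaluate(horizon)
        if best_horizon is not None and score - best_score < min_gain:
            break  # performance has saturated, stop increasing the horizon
        best_horizon, best_score = horizon, score
    if best_horizon is None:
        raise ValueError("horizons must not be empty")
    return best_horizon


def synthetic_score(horizon: int) -> float:
    """Made-up saturating curve used only to exercise pick_horizon."""
    return 1.0 - 0.5 ** horizon


chosen = pick_horizon(synthetic_score, range(1, 11))
```

In practice `evaluate` would train and score a translation model at the given horizon, so stopping at the first saturated value also avoids paying for longer rollouts that no longer help.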

Learning Correspondence by Weak Supervision

The current dynamics cycle-consistency method still suffers from misalignment, which can occur whenever strong supervision from paired data is absent. However, strictly paired data is often difficult to collect, so we aim for weakly supervised correspondence learning. We adopt weak supervision from paired abstractions over states or state-action pairs, where a similarity metric is defined on the abstractions.

The key difference between strictly paired data and paired abstractions is that strictly paired data requires comprehensively assessing every aspect of the two states or state-action pairs, which makes it difficult to collect. Paired abstractions, on the other hand, consider similarity only over an abstraction of the state and are therefore easier to annotate. The resulting improvements in compounding error are shown in the left figure.
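As an illustration of a paired abstraction, the sketch below follows the ant example from the figure caption: two agents with state vectors of different sizes are compared only through their spatial location. The state layout (x and y in the first two entries), the function names, and the exponential similarity are assumptions made for this example, not the metric used in the paper.

```python
import math
from typing import Sequence, Tuple


def location_abstraction(state: Sequence[float]) -> Tuple[float, float]:
    """Keep only the spatial location of the agent.

    Assumption: the first two entries of the state are the x and y position.
    """
    return (state[0], state[1])


def abstraction_similarity(
    state_a: Sequence[float],
    state_b: Sequence[float],
    scale: float = 1.0,  # assumed length scale controlling how fast similarity decays
) -> float:
    """Similarity in (0, 1] that depends only on the abstracted locations."""
    ax, ay = location_abstraction(state_a)
    bx, by = location_abstraction(state_b)
    distance = math.hypot(ax - bx, ay - by)
    return math.exp(-distance / scale)


# Hypothetical states: location followed by joint angles of different lengths.
four_legged = [1.0, 2.0, 0.1, 0.2, 0.3, 0.4]
five_legged_same_place = [1.0, 2.0, 0.5, 0.4, 0.3, 0.2, 0.1]
five_legged_far_away = [4.0, 6.0, 0.5, 0.4, 0.3, 0.2, 0.1]
```

The joint angles never enter the comparison, so an annotator (or a simple rule like this one) can label similarity without deciding how the limbs of the two morphologies should line up.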

Experimental Details

Visualization of Environments

Cross-Morphology, MuJoCo Environments:

Ant-v3: CC, DCC-1, DCC-2, WeaSCL

Swimmer-v3: CC, DCC-1, DCC-2, WeaSCL

Cross-Morphology, Simulated Robot Environment:

(The red dot denotes the final reached position of the end effector.)

CC, DCC-1, DCC-2, WeaSCL

Cross-Physics, MuJoCo Environments:

Hopper-v3: Direct, DCC-1, DCC-2, WeaSCL

Walker-v3: Direct, DCC-1, DCC-2, WeaSCL

Cross-Modality, Real Robot Environment:

DCC-1, WeaSCL-1, WeaSCL-5

Network Implementation Details

For the network architecture, we use a three-layer fully-connected network to model the similarity function and every network in the translation model in the MuJoCo and simulated robot environments. In the real robot environment, we use two convolutional layers and three fully-connected layers as an image encoder and a three-layer fully-connected network as a state encoder. The outputs of both encoders are concatenated and fed to a three-layer fully-connected network that outputs a similarity value. We use a similar architecture for the forward dynamics model. We use a convolutional network for Phi.

Training Details

Cross-Morphology, MuJoCo Environments: We train the similarity function for 100 epochs, the forward dynamics model for 10 epochs, and the translation model for 30 epochs. We use the Adam optimizer with a learning rate of 0.001 for all model training. For the trade-off parameters, we use lambda_0 = 15, lambda_1 = lambda_2 = 1, and lambda_3 = 10.

Cross-Morphology, Simulated Robot Environment: We train the similarity function for 100 epochs, the forward dynamics model for 10 epochs, and the translation model for 30 epochs. We use the Adam optimizer with a learning rate of 0.001 for all model training. For the trade-off parameters, we use lambda_0 = 30, lambda_1 = lambda_2 = 1, and lambda_3 = 10.

Cross-Physics, MuJoCo Environments:

To learn the confidence similarity functions, we sample 10,000 pairs of state-action pairs with similarity in the Hopper environment and 40,000 pairs of state-action pairs with similarity in the Walker2d environment.

We train the similarity function for 10 epochs, the forward dynamics model for 20 epochs, and the translation model for 30 epochs. We use the Adam optimizer with a learning rate of 0.001 for all model training. For the trade-off parameters, we use lambda_0 = 20, lambda_1 = 1, lambda_2 = 1, and lambda_3 = 30.

Cross-Modality, Real Robot Environment:

We train the similarity function for 100 epochs, the forward dynamics model for 30 epochs, and the translation model for 50 epochs. We use the Adam optimizer with a learning rate of 0.001 for all model training. For the trade-off parameters, we use lambda_0 = 10, lambda_1 = lambda_2 = 1, and lambda_3 = 15.
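For quick reference, the training settings listed above can be collected into one structure. This is only a convenience sketch: the class name `TrainingConfig`, the field names, and the dictionary keys are hypothetical and do not come from the released code. It also assumes that "lambda_1, lambda_2 = 1" means both parameters equal 1, as written out explicitly for the cross-physics setting.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class TrainingConfig:
    similarity_epochs: int    # epochs for the similarity function
    dynamics_epochs: int      # epochs for the forward dynamics model
    translation_epochs: int   # epochs for the translation model
    lambdas: Tuple[float, float, float, float]  # (lambda_0, lambda_1, lambda_2, lambda_3)
    learning_rate: float = 1e-3  # Adam learning rate, the same in every setting


# Keys are hypothetical labels for the four experimental settings.
CONFIGS: Dict[str, TrainingConfig] = {
    "cross_morphology_mujoco": TrainingConfig(100, 10, 30, (15, 1, 1, 10)),
    "cross_morphology_sim_robot": TrainingConfig(100, 10, 30, (30, 1, 1, 10)),
    "cross_physics_mujoco": TrainingConfig(10, 20, 30, (20, 1, 1, 30)),
    "cross_modality_real_robot": TrainingConfig(100, 30, 50, (10, 1, 1, 15)),
}
```

Side by side, the settings differ mainly in lambda_0 and lambda_3 and in the number of epochs for each model, while the optimizer, the learning rate, lambda_1, and lambda_2 are shared across all four.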

Summary

Summary. We propose a weakly supervised correspondence learning approach (WeaSCL) that leverages weak supervision in the form of temporal ordering and paired abstraction data. This reduces the need for expensive paired data and enables more accurate correspondence learning. Experimental results show that WeaSCL outperforms state-of-the-art correspondence learning methods based on unpaired data.

Limitations and Future Work. Although we leverage easy-to-access weak supervision to improve correspondence learning, this type of supervision still requires domain knowledge or human experts for annotation. One potential future direction is to learn from unlabeled or unpaired data. We also plan to automatically detect the abstraction needed for weak supervision.