What Went Wrong?
Closing the Sim-to-Real Gap via Differentiable Causal Discovery

Conference on Robot Learning (CoRL), 2023

Peide Huang, Xilun Zhang*, Ziang Cao*, Shiqi Liu*,

Mengdi Xu, Wenhao Ding, Jonathan Francis, Bingqing Chen, Ding Zhao

Carnegie Mellon University, Bosch Center for Artificial Intelligence

Abstract

Training control policies in simulation is more appealing than on real robots directly, as it allows for exploring diverse states in a safe and efficient manner. Yet, robot simulators inevitably exhibit disparities from the real world, yielding inaccuracies that manifest as the simulation-to-real gap. Existing literature has proposed to close this gap by actively modifying specific simulator parameters to align the simulated data with real-world observations. However, the set of tunable parameters is usually manually selected to reduce the search space in a case-by-case manner, which is hard to scale up for complex systems and requires extensive domain knowledge. To address the scalability issue and automate the parameter-tuning process, we introduce an approach that aligns the simulator with the real world by discovering the causal relationship between the environment parameters and the sim-to-real gap. Concretely, our method learns a differentiable mapping from the environment parameters to the differences between simulated and real-world robot-object trajectories. This mapping is governed by a simultaneously-learned causal graph to help prune the search space of parameters, provide better interpretability, and improve generalization. We perform experiments to achieve both sim-to-sim and sim-to-real transfer, and show that our method has significant improvements in trajectory alignment and task success rate over strong baselines in a challenging manipulation task.

Motivation

Robot simulators inevitably differ from real-world dynamics.

[Figure: a policy that succeeds in simulation (Success ✅) fails in the real world (Failure ❌) after direct transfer.]

Overview

In this work, we propose COMPASS, a method that aligns the simulator with the real world by discovering the causal relationships between environment parameters and the sim-to-real gap. COMPASS learns a differentiable mapping from the simulation environment parameters to the differences between simulated and real-world trajectories of dynamic robot-object interactions; this mapping is governed by a simultaneously learned causal graph. With the differentiable causal model fixed, COMPASS back-propagates gradients to optimize the simulation environment parameters end to end, reducing the domain gap. Beyond interpretability, the causal graph also helps prune the parameter search space, improving both the efficiency of domain randomization and scalability. We summarize our contributions as follows:

We propose a new causality-guided parameter estimation framework to close the sim-to-real gap and improve agent performance in the real world.
We design a fully-differentiable model that explicitly embeds the causal structure to provide better interpretability, prune the search space of parameters, and improve generalization.
We empirically evaluate our method in both simulation and the real world, where it outperforms baselines in trajectory alignment and task success rate with the same sample size.
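
The parameter-optimization step described above can be illustrated with a minimal sketch. It assumes a hypothetical, already-trained surrogate that is linear in the environment parameters (`gap = A @ theta + b`); the actual COMPASS model is a neural network governed by a learned causal graph, and the names `predict_gap`, `optimize_params`, `A`, and `b` are illustrative only. With the surrogate held fixed, the parameters are updated by gradient descent on the squared predicted gap.

```python
# Hypothetical fixed surrogate (an assumption, not the paper's model):
# predicted trajectory gap = A @ theta + b, with A and b held constant.

def predict_gap(A, b, theta):
    """Predicted sim-to-real trajectory difference for parameters theta."""
    return [sum(a * t for a, t in zip(row, theta)) + b_i
            for row, b_i in zip(A, b)]

def gap_loss(A, b, theta):
    """Squared norm of the predicted gap."""
    return sum(g * g for g in predict_gap(A, b, theta))

def loss_grad(A, b, theta):
    """Analytic gradient of gap_loss w.r.t. theta: 2 * A^T (A theta + b)."""
    gap = predict_gap(A, b, theta)
    return [2.0 * sum(A[i][j] * gap[i] for i in range(len(A)))
            for j in range(len(theta))]

def optimize_params(A, b, theta0, lr=0.1, steps=200):
    """Gradient descent on the environment parameters; the model stays fixed."""
    theta = list(theta0)
    for _ in range(steps):
        grad = loss_grad(A, b, theta)
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta

# Toy setup: the third parameter has no causal effect on the gap (zero column).
A = [[1.0, 0.0, 0.0],
     [0.0, 0.5, 0.0]]
b = [-0.3, 0.2]
theta = optimize_params(A, b, [0.0, 0.0, 0.7])
print(theta, gap_loss(A, b, theta))
```

In this toy setup the third parameter receives a zero gradient and is never moved, which mirrors how a sparse causal structure restricts the update to parameters that actually influence the gap.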

COMPASS model architecture

In a standard fully-connected multilayer perceptron (MLP), the entire input vector is fed into the first linear layer, which blends all information into the features of subsequent layers and makes it difficult to separate causes from effects. To highlight the difference between our model and a traditional MLP, we plot the detailed architecture of the COMPASS model below.
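
The contrast with a dense first layer can be sketched as follows. This is a generic illustration of masking a linear layer with a graph, not the COMPASS architecture itself: the binary `mask` is fixed by hand here, whereas the page describes a causal graph that is learned jointly with the model, and the function names are hypothetical.

```python
def linear(weights, x):
    """Dense layer: every input feeds every output."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in weights]

def masked_linear(weights, mask, x):
    """Graph-masked layer: input j reaches output i only if mask[i][j] == 1."""
    return [sum(m * w * xj for m, w, xj in zip(mrow, wrow, x))
            for mrow, wrow in zip(mask, weights)]

weights = [[0.5, -1.0, 2.0],
           [1.5, 0.3, -0.7]]
# Assumed graph: output 0 depends on input 0 only, output 1 on input 1 only.
mask = [[1, 0, 0],
        [0, 1, 0]]

x_a = [1.0, 2.0, 3.0]
x_b = [1.0, 2.0, -5.0]  # differs from x_a only in the third input

dense_a, dense_b = linear(weights, x_a), linear(weights, x_b)
masked_a, masked_b = (masked_linear(weights, mask, x_a),
                      masked_linear(weights, mask, x_b))

print(dense_a != dense_b)    # the dense layer reacts to the third input
print(masked_a == masked_b)  # the masked layer ignores it
```

Because the third input has no edge in the mask, perturbing it leaves the masked outputs unchanged, while the dense layer mixes it into every feature.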

Experiments

Simulation setup

Real setup

Learned causal graph parameters

Mini-Air-hockey

Double-bouncing-ball

The learned causal graph parameters after 2 iterations are shown above (darker colors represent values closer to 1). We observe that the learned causal graph is very sparse, reducing the search space by orders of magnitude without extensive domain knowledge. Our method automatically discovers different types of relevant environment parameters, such as actuation, sensing, and dynamics.
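
To make the pruning effect concrete, here is a small sketch under stated assumptions: the parameter names, the graph values, the 0.5 threshold, and the 10-point grid per parameter are all hypothetical and chosen only for illustration; none of them come from the experiments.

```python
def select_relevant(param_names, graph, threshold=0.5):
    """Keep parameters with at least one graph value above the threshold."""
    return [name for name, row in zip(param_names, graph)
            if any(w > threshold for w in row)]

# Hypothetical environment parameters (illustrative names only).
param_names = ["friction", "puck_mass", "actuator_gain",
               "camera_offset", "air_damping", "table_tilt"]

# Hypothetical learned graph values in [0, 1]; rows are parameters,
# columns are trajectory dimensions.
graph = [
    [0.97, 0.02],
    [0.01, 0.03],
    [0.04, 0.91],
    [0.88, 0.05],
    [0.02, 0.01],
    [0.03, 0.02],
]

relevant = select_relevant(param_names, graph)

# If each parameter were searched on a 10-point grid (an assumption),
# the search space shrinks from 10**6 to 10**3 combinations.
grid_points = 10
full_space = grid_points ** len(param_names)
pruned_space = grid_points ** len(relevant)
print(relevant, full_space // pruned_space)
```

The reduction grows exponentially with the number of parameters removed, which is why a sparse graph can shrink the search space by orders of magnitude.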

Environment parameters optimization

Trajectory-aligning Results

(a), (b) Top-view visualization of sampled trajectory-alignment results using EXI-Net and COMPASS, respectively. The apparent overlap between the trajectories and the obstacle is due to camera bias. (c) Mean trajectory difference over 10 iterations, evaluated using 5 random seeds and 10 pairs of rollouts per seed. COMPASS (b) outperforms all baselines in terms of trajectory difference using the same number of "real" and simulation rollouts.
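
The page does not define the trajectory-difference metric. As one plausible reading, the sketch below assumes time-aligned 2D trajectories and uses the mean Euclidean distance per time step, averaged over rollout pairs. The function names and the toy trajectories are hypothetical.

```python
import math

def trajectory_difference(sim_traj, real_traj):
    """Mean Euclidean distance between time-aligned 2D positions (assumed metric)."""
    if len(sim_traj) != len(real_traj):
        raise ValueError("trajectories must have the same number of time steps")
    return sum(math.dist(p, q) for p, q in zip(sim_traj, real_traj)) / len(sim_traj)

def mean_trajectory_difference(pairs):
    """Average the per-pair difference over a list of (sim, real) rollout pairs."""
    return sum(trajectory_difference(s, r) for s, r in pairs) / len(pairs)

# Toy data: one "real" rollout and two simulated rollouts,
# before and after a hypothetical parameter update.
real = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
sim_before = [(0.0, 0.0), (1.0, 0.3), (2.0, 0.6)]
sim_after = [(0.0, 0.0), (1.0, 0.03), (2.0, 0.06)]

before = mean_trajectory_difference([(sim_before, real)])
after = mean_trajectory_difference([(sim_after, real)])
print(before, after)
```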