Causal Reasoning in Simulation for Structure and Transfer Learning
of Robot Manipulation Policies


Tabitha Edith Lee, Jialiang (Alan) Zhao, Amrita S. Sawhney, Siddharth Girdhar, and Oliver Kroemer
The Robotics Institute, Carnegie Mellon University

Accepted to the 2021 IEEE International Conference on Robotics and Automation (ICRA 2021).


Which features are important for learning a control policy for a task?

We answer this question using CREST, our approach for causal feature selection with an internal model of a task. Our key insight is that, under certain assumptions, it may be advantageous to first identify which variables are important (i.e., the policy inputs) before learning how they matter (i.e., the learned policy function).

Abstract

We present CREST, an approach for causal reasoning in simulation to learn the relevant state space for a robot manipulation policy. Our approach conducts interventions using internal models, which are simulations with approximate dynamics and simplified assumptions. These interventions elicit the structure between the state and action spaces, enabling construction of neural network policies with only relevant states as input. These policies are pretrained using the internal model with domain randomization over the relevant states. The policy network weights are then transferred to the target domain (e.g., the real world) for fine-tuning. We perform extensive policy transfer experiments in simulation for two representative manipulation tasks: block stacking and crate opening. Our policies are shown to be more robust to domain shifts, more sample efficient to learn, and able to scale to more complex settings with larger state spaces. We also show improved zero-shot sim-to-real transfer of our policies for the block stacking task.

CREST is one approach within a broader methodology of "structural sim-to-real" transfer: causal structure and transfer learning from simulation. In this illustration, the policy structure has first been learned using causal reasoning with the internal model. Here, the Reduced MLP network takes as input only the relevant, causal features (τ) that generalize a given low-level controller (parameterized by θ) across task instances. A non-causal approach would instead use the full set of input features (c) rather than the causal features (τ), which would make the policy brittle if the distribution of the non-causal variables shifts when transferring from simulation to reality.
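The role of the Reduced MLP can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `ReducedMLP`, the layer sizes, the random (untrained) weights, and the index list `relevant_idx` are all hypothetical and chosen only for illustration. The sketch shows one way a policy could read only the causal features (τ) out of the full context (c), so that its prediction of θ cannot change when the remaining variables shift.

```python
import numpy as np


class ReducedMLP:
    """Hypothetical two-layer policy that only sees the relevant features tau."""

    def __init__(self, relevant_idx, hidden=16, out_dim=1, seed=0):
        rng = np.random.default_rng(seed)
        # Indices of the causal features tau inside the full context c (assumed known).
        self.relevant_idx = np.asarray(relevant_idx)
        in_dim = self.relevant_idx.size
        # Untrained, randomly initialized weights; training is out of scope here.
        self.W1 = rng.normal(scale=0.1, size=(in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(scale=0.1, size=(hidden, out_dim))
        self.b2 = np.zeros(out_dim)

    def __call__(self, c):
        # Select tau from c; every other entry of c is ignored by construction.
        tau = np.asarray(c, dtype=float)[..., self.relevant_idx]
        h = np.tanh(tau @ self.W1 + self.b1)
        return h @ self.W2 + self.b2  # predicted controller parameters theta


policy = ReducedMLP(relevant_idx=[0, 1])
c_sim = np.array([0.2, 0.3, 0.7, 1.0])     # context seen in simulation
c_real = np.array([0.2, 0.3, -5.0, 9.0])   # same tau, shifted irrelevant variables
theta_sim = policy(c_sim)
theta_real = policy(c_real)
```

Because the last two entries of the context never reach the network, `theta_sim` and `theta_real` coincide even though those entries differ substantially between the two contexts.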

Illustration of how CREST uses causal reasoning through interventions with an internal model to identify the causal features for a policy that solves a task. Depicted is the second stage of CREST, wherein the relevant variable set (τ) is selected.
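The idea of selecting relevant variables through interventions can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure from the paper: the toy `internal_model_solution` function, the perturbation size `delta`, the tolerance `tol`, and the choice to intervene on one variable at a time are all hypothetical. The sketch perturbs each context variable in a toy internal model and marks the variable as relevant for a controller parameter if the task solution changes.

```python
import numpy as np


def internal_model_solution(context):
    """Toy stand-in for an internal model (hypothetical).

    Returns controller parameters theta that solve the task for a context.
    Here theta depends only on context[0] and context[1]; the remaining
    entries are distractors that do not affect the solution.
    """
    return np.array([context[0] + 0.1, context[1] - 0.05])


def select_relevant_variables(solve, base_context, n_interventions=5,
                              delta=0.5, tol=1e-6, seed=0):
    """Return a boolean matrix: entry [i, j] is True if intervening on
    context variable j changed controller parameter i."""
    rng = np.random.default_rng(seed)
    base_theta = solve(base_context)
    relevant = np.zeros((base_theta.size, base_context.size), dtype=bool)
    for j in range(base_context.size):
        for _ in range(n_interventions):
            intervened = base_context.copy()
            # Intervention: change variable j only, hold all others fixed.
            intervened[j] += rng.uniform(-delta, delta)
            theta = solve(intervened)
            relevant[:, j] |= np.abs(theta - base_theta) > tol
    return relevant


base_context = np.array([0.2, 0.3, 0.7, 1.0])
mask = select_relevant_variables(internal_model_solution, base_context)
# Column indices with any True entry form the relevant variable set.
relevant_set = np.flatnonzero(mask.any(axis=0))
```

In this toy setting, only the first two context variables influence the solution, so only they are retained as policy inputs; the distractor variables are dropped before any policy is trained.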

Citation

Please cite our work if you use it for your research. Thank you!

@inproceedings{lee2021icra:crest,
  title={Causal Reasoning in Simulation for Structure and Transfer Learning of Robot Manipulation Policies},
  author={Lee, Tabitha Edith and Zhao, Jialiang (Alan) and Sawhney, Amrita S. and Girdhar, Siddharth and Kroemer, Oliver},
  booktitle={2021 IEEE International Conference on Robotics and Automation (ICRA)},
  year={2021},
  organization={IEEE},
  url={https://sites.google.com/view/crest-causal-struct-xfer-manip},
}