Causal Reasoning in Simulation for Structure and Transfer Learning
of Robot Manipulation Policies


Tabitha Edith Lee, Jialiang (Alan) Zhao, Amrita S. Sawhney, Siddharth Girdhar, and Oliver Kroemer
The Robotics Institute, Carnegie Mellon University

Accepted to the 2021 IEEE International Conference on Robotics and Automation (ICRA 2021).

[IEEE Xplore] [arXiv] [Talk]

Below is a list of frequently asked questions (FAQ) about this work. If you have a question that is not answered here, please contact the authors, and we will be happy to update this list accordingly.

What do you mean by "structure"?

We are referring to "structure" in the causal sense of the word. CREST learns the causal factors that a control policy should take as input in order to generalize across task instances. These causal factors arise as variables in the underlying causal structure of the task, which spans perception, control, and the obtained reward. This structure can be represented as a structural causal model, usually taking the form of a directed acyclic graph. In this work, CREST does not learn the underlying graph itself; it only selects the features that are used within this graph, as sketched below. A policy is then trained to learn the relationship between these features and the action the robot should take.
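For intuition, here is a minimal sketch of this intervention-based feature selection, assuming a hypothetical simulator interface (`sim.save_scene`, `sim.intervene`, `sim.sample_value`, `sim.rollout`, and `sim.restore_scene` are illustrative names, not the authors' API) and a simple reward-change test rather than the paper's exact procedure:

```python
import numpy as np

def select_causal_factors(sim, state_vars, policy,
                          num_interventions=10, threshold=1e-3):
    """Keep the state variables whose interventions change the reward.

    For each candidate variable, intervene in the internal model by
    resampling that variable while holding the rest of the scene fixed;
    variables whose interventions move the obtained reward are treated
    as causal factors and become the policy's inputs.
    """
    baseline_reward = sim.rollout(policy)        # reward in the original scene
    causal_factors = []
    for var in state_vars:
        reward_deltas = []
        for _ in range(num_interventions):
            snapshot = sim.save_scene()          # remember the current scene
            sim.intervene(var, sim.sample_value(var))  # do(var := new value)
            reward_deltas.append(abs(sim.rollout(policy) - baseline_reward))
            sim.restore_scene(snapshot)          # undo the intervention
        if np.mean(reward_deltas) > threshold:   # reward moved => causal factor
            causal_factors.append(var)
    return causal_factors
```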


What assumptions are made in this work?

In this work, our approach learns generalizable policies under the following assumptions:

  1. an internal model (approximate simulator) of the task exists;

  2. the robot can conduct interventions within this internal model to construct the scenes needed for causal reasoning (this assumption generally requires that the internal model has a causal/disentangled representation; see the sketch after this list);

  3. the robot is given a low-level control policy that handles actuation (e.g., a parameterized trajectory generator), and the preconditions of this policy are known.
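To make the second assumption concrete, the sketch below (all names hypothetical, not the paper's implementation) shows why a disentangled representation matters: each semantic factor maps to one independently settable simulator parameter, so a do-style intervention can change exactly one factor at a time.

```python
# Hypothetical disentangled scene representation: each semantic factor
# corresponds to one independently settable simulator parameter.
scene = {
    "target_block_x": 0.40,               # position, meters
    "target_block_y": 0.10,               # position, meters
    "target_block_mass": 0.25,            # kilograms
    "distractor_color": (1.0, 0.0, 0.0),  # RGB
}

def intervene(sim, scene, var, value):
    """Apply do(var := value): rebuild the scene with only `var` changed.

    `sim.reset_to` is an illustrative internal-model call; with an
    entangled representation (e.g., raw pixels), no such single-factor
    edit would be available.
    """
    new_scene = dict(scene)
    new_scene[var] = value
    sim.reset_to(new_scene)
    return new_scene
```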


What are the preconditions of the low-level control policy? Why are these important?

The preconditions of the low-level control policy specify the state conditions under which the policy is expected to succeed. They matter for our work because the causal factors that CREST learns are valid only while these preconditions hold. For example, in the block stacking task, one precondition is that there are no obstructions to robot movement. This eliminates cases where the robot would need to maneuver around obstacles; in those cases, additional causal factors would be involved (e.g., the geometry of the obstructing blocks). Restricting attention to states where the preconditions hold keeps the learned policies small and compact, since we do not expect them to generalize to states where the preconditions no longer hold.
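As an illustration, such a precondition might gate the learned policy as in the following sketch (the names `path_obstructed`, `extract_causal_inputs`, and `execute_parameterized_trajectory` are hypothetical, not from the paper's code):

```python
def preconditions_hold(state):
    """Hypothetical check for the block-stacking skill: the path from the
    gripper to the grasp pose must be free of obstructing blocks."""
    return not state.path_obstructed(state.gripper_pose, state.grasp_pose)

def execute_skill(robot, policy, extract_causal_inputs, state):
    # The learned policy is only valid where its preconditions hold; outside
    # that region, other causal factors (e.g., the geometry of obstructing
    # blocks) would matter, and the compact policy is not expected to
    # generalize there.
    if not preconditions_hold(state):
        raise RuntimeError("Preconditions violated; skill not applicable.")
    params = policy(extract_causal_inputs(state))   # CREST-selected features
    robot.execute_parameterized_trajectory(params)  # low-level actuation
```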


Where would the internal model come from? Must it be provided by a human?

For this work, the internal model is human-specified, but that isn't a requirement: in principle, the internal model can be learned! All that is needed is a model that supports causal interventions. Learning such models is an active area of research on "world models", "intuitive physics", and "causal representation learning".


Do you have plans to release the code?

Yes! We plan to release the CREST code as part of the code release for an upcoming follow-up work. If you would like to be notified when the code for this work is released, please contact the authors.