Causal Induction from Visual Observations for Goal-Directed Tasks

Suraj Nair, Yuke Zhu, Silvio Savarese, Li Fei-Fei

Stanford University

Paper/Code

  • In this work, we propose to endow a learning-based interactive agent with the capacity for causal reasoning to complete goal-directed tasks in visual environments, which we cast as a two-phase meta-learning problem.

  • In the first phase, a causal induction model constructs a causal structure, i.e., a directed acyclic graph over random variables, from visual observations of the agent’s interventions.

  • In the second phase, the induced causal structure contextualizes a goal-conditioned policy that acts to achieve the given goal.

  • Our key insight is that by leveraging iterative predictions and attention bottlenecks, our causal induction model and goal-conditioned policy can focus on the relevant parts of the causal graph, leading to better downstream generalization.

  • We propose an iterative causal induction network (left), which updates the predicted graph with each observed transition, using an attention bottleneck to keep each update isolated to the relevant part of the graph.

  • Similarly, we propose an attention-augmented policy (right), which uses an attention bottleneck to select which portion of the induced causal graph to attend to when choosing each action.

Iterative Causal Induction Network
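As a concrete sketch of the iterative update described above — layer sizes, encoders, and the exact update rule here are illustrative assumptions, not the paper's architecture — one transition at a time, an attention bottleneck over graph nodes gates which rows of the adjacency matrix may change:

```python
# Hypothetical sketch: iterative causal induction with an attention
# bottleneck. All module shapes are illustrative, not from the paper.
import torch
import torch.nn as nn


class IterativeCausalInduction(nn.Module):
    """Iteratively refines a predicted causal graph, one transition at a time.

    A softmax attention over the N graph nodes decides which rows of the
    adjacency matrix each observed transition is allowed to update, keeping
    every update isolated to the relevant part of the graph.
    """

    def __init__(self, obs_dim, n_nodes, hidden=64):
        super().__init__()
        self.n_nodes = n_nodes
        # Encodes a concatenated (o_t, o_{t+1}) observation pair.
        self.encoder = nn.Sequential(nn.Linear(2 * obs_dim, hidden), nn.ReLU())
        self.attn = nn.Linear(hidden, n_nodes)         # which node to update
        self.edge_update = nn.Linear(hidden, n_nodes)  # proposed outgoing edges

    def forward(self, transitions):
        # transitions: (T, 2*obs_dim) tensor of (o_t, o_{t+1}) pairs.
        G = torch.zeros(self.n_nodes, self.n_nodes)    # predicted adjacency
        for x in transitions:
            h = self.encoder(x)
            a = torch.softmax(self.attn(h), dim=-1)    # attention bottleneck
            delta = torch.sigmoid(self.edge_update(h)) # candidate edge weights
            # Convex row-wise update: only strongly attended rows change.
            G = (1 - a.unsqueeze(1)) * G + a.unsqueeze(1) * delta.unsqueeze(0)
        return G
```

Because the attention weights sum to one, a near-one-hot attention rewrites a single row of the graph while leaving the rest untouched, which is the "isolated update" behavior the bullet above describes.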

Attention Augmented Policy
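A matching sketch of the policy side — again with assumed, illustrative shapes rather than the paper's exact network — attends over rows of the induced graph before producing action logits:

```python
# Hypothetical sketch: goal-conditioned policy with an attention
# bottleneck over an induced causal graph. Shapes are illustrative.
import torch
import torch.nn as nn


class AttentionAugmentedPolicy(nn.Module):
    """Selects actions while attending to part of an induced causal graph."""

    def __init__(self, obs_dim, goal_dim, n_nodes, n_actions, hidden=64):
        super().__init__()
        # Encodes the current observation together with the goal.
        self.state_enc = nn.Sequential(
            nn.Linear(obs_dim + goal_dim, hidden), nn.ReLU())
        self.attn = nn.Linear(hidden, n_nodes)       # attention over graph rows
        self.head = nn.Linear(hidden + n_nodes, n_actions)

    def forward(self, obs, goal, G):
        # G: (n_nodes, n_nodes) adjacency from the causal induction model.
        h = self.state_enc(torch.cat([obs, goal], dim=-1))
        a = torch.softmax(self.attn(h), dim=-1)      # attention bottleneck
        ctx = a @ G                                  # attended graph rows
        return self.head(torch.cat([h, ctx], dim=-1))  # action logits
```

The bottleneck forces the policy to summarize the graph through a small attended context vector, so at each step it conditions only on the portion of the causal structure relevant to the current goal.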

We observe that our proposed approach significantly improves task success rates on downstream environments with unseen causal structures. In particular, our method generalizes to unseen environments while requiring fewer training environments.