Causal Triplet

Causal Triplet: An Open Challenge for Intervention-centric Causal Representation Learning
Yuejiang Liu, Alexandre Alahi, Chris Russell, Max Horn, Dominik Zietlow, Bernhard Schölkopf, Francesco Locatello

Recent years have seen a surge of interest in learning high-level causal representations from low-level image pairs under interventions. Yet, existing efforts are largely limited to simple synthetic settings that are far removed from real-world problems. We present CausalTriplet, a causal representation learning benchmark featuring not only visually more complex scenes, but also two crucial desiderata commonly overlooked in previous works: (i) an actionable counterfactual setting, where only certain (object-level) variables allow for counterfactual observations whereas others do not; (ii) an interventional downstream task with an emphasis on out-of-distribution robustness from the independent causal mechanisms principle. Through extensive experiments, we find that models built with knowledge of disentangled or object-centric representations significantly outperform their distributed counterparts. However, recent causal representation learning methods still struggle to identify such latent structures, indicating substantial challenges and opportunities in CausalTriplet.

Benchmark Design

Modeling paired observations under interventions has emerged as an important setting for causal representation learning. In contrast to prior efforts focused on the identification of latent variables, we introduce an intervention-centric downstream task, where a model learns to infer the type of intervention, presented as a high-level action, that takes place between a given pair of images. 

Solving the task (left figure) in the presence of distribution shifts (right figure) is challenging, as it requires the discovery of not only causal variables that can allow for interventions but also causal mechanisms behind interventions that remain invariant beyond the training distribution. This challenge is further compounded by the use of an egocentric camera attached to an embodied agent, which conceals much of the actor and thereby necessitates direct reasoning about the effects of actions on objects.
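The task above can be sketched with a toy stand-in: each sample is a (before, after) state pair, and the label is the action linking them. The feature dimensions, the additive-offset action model, and the nearest-offset classifier below are illustrative assumptions, not part of the benchmark; they only show what "inferring the type of intervention from a pair of observations" means operationally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: action k adds a fixed (unknown to the model) offset
# delta[k] to the "before" state, plus observation noise.
num_actions, dim, n = 4, 8, 256
delta = rng.normal(size=(num_actions, dim))
before = rng.normal(size=(n, dim))
labels = rng.integers(0, num_actions, size=n)
after = before + delta[labels] + 0.1 * rng.normal(size=(n, dim))

# A minimal classifier: pick the action whose offset best explains the
# observed change, i.e. argmin_k ||(after - before) - delta[k]||^2.
diff = after - before
scores = diff @ delta.T - 0.5 * (delta ** 2).sum(axis=1)
pred = scores.argmax(axis=1)
accuracy = (pred == labels).mean()
```

Note that the classifier only looks at the change between the two observations, mirroring the requirement to reason directly about the effects of actions on objects rather than about the (hidden) actor.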

Causal Perspective

The diagram below shows a causal graph that describes a pair of scene observations before and after a high-level action. The data-generating process of each observation is characterized by a collection of latent factors, including global scene-level factors and local object-level factors. These latent factors are statistically dependent due to the presence of unobserved confounders. The high-level action is presumed to influence only one or a few object-level factors in the scene. The other latent factors may remain constant in photo-realistic simulations but vary over time in real-world observations.
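The data-generating process described above can be written as a toy structural model. The Gaussian distributions, the single-object intervention, and the unit-size effect below are assumptions made for illustration; only the graph structure (confounder → scene and object factors, action → one object-level factor) follows the description.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_pair(num_objects=3, action=0):
    """Sample a (before, after) observation pair from a toy causal graph."""
    confounder = rng.normal()                    # unobserved common cause
    scene = confounder + rng.normal(scale=0.5)   # global scene-level factor
    objects = confounder + rng.normal(scale=0.5, size=num_objects)

    before = {"scene": scene, "objects": objects.copy()}

    # The action intervenes on a single object-level factor. Scene-level
    # factors stay fixed here, as in photo-realistic simulation; in
    # real-world observations they could also drift over time.
    objects[action] += 1.0
    after = {"scene": scene, "objects": objects}
    return before, after

before, after = sample_pair(action=1)
changed = before["objects"] != after["objects"]
```

Because the confounder feeds into both scene-level and object-level factors, the latents are dependent even though the action itself is sparse.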

Empirical Study

The main goal of our empirical study is to examine the potential and limitations of recent hypotheses and methods for causal representation learning. In particular, we seek to answer the following two questions: (i) do structured (disentangled or object-centric) representations yield more robust downstream performance than distributed ones? (ii) can recent causal representation learning methods identify such latent structures from raw observations?

To this end, we consider three classes of representations: (i) modern distributed representations, (ii) oracle structured representations, (iii) learned structured representations; and different experimental settings of growing complexity: (i) from compositional to systematic distribution shifts, (ii) from single-object images to multi-object scenes, (iii) from photo-realistic simulations to real-world observations.
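The distinction between compositional and systematic distribution shifts can be made concrete on a toy attribute grid. The object and color values below are hypothetical placeholders; the actual benchmark splits differ in detail.

```python
from itertools import product

objects = ["cup", "book", "drawer"]
colors = ["red", "green", "blue"]
all_combos = set(product(objects, colors))

# Compositional shift: the held-out test combinations are unseen, but every
# individual object and color value still appears somewhere in training.
held_out = {("cup", "blue"), ("book", "red")}
train_compositional = all_combos - held_out

# Systematic shift: an entire attribute value ("drawer") is absent from
# training and appears only at test time, requiring extrapolation.
train_systematic = {(o, c) for o, c in all_combos if o != "drawer"}
test_systematic = all_combos - train_systematic
```

A model evaluated under the compositional split must recombine familiar factor values, whereas the systematic split additionally demands generalization to factor values never seen during training.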

Our results suggest the following key takeaways: models built with knowledge of disentangled or object-centric representations significantly outperform their distributed counterparts, yet recent causal representation learning methods still struggle to identify such latent structures from raw observations.

BibTex

@conference{Liu2023CausalTriplet,
title = {Causal Triplet: An Open Challenge for Intervention-centric Causal Representation Learning},
author = {Liu, Y. and Alahi, A. and Russell, C. and Horn, M. and Zietlow, D. and Sch{\"o}lkopf, B. and Locatello, F.},
booktitle = {2nd Conference on Causal Learning and Reasoning (CLeaR)},
month = apr,
year = {2023}
}