Causality and Deep Learning: Synergies, Challenges and the Future

ICML 2022 Tutorial

Deep neural networks have achieved outstanding success in many tasks, ranging from computer vision to natural language processing and robotics. However, such models remain limited in their ability to understand the world around us, as well as to generalize and adapt to new tasks or environments. One possible solution to this problem is models that comprehend causality, since such models can reason about the connections between causal variables and the effect of intervening on them. However, existing causal algorithms are typically neither scalable nor applicable to highly nonlinear settings, and they also assume that the causal variables are meaningful and given. Recently, there has been increased interest and research activity at the intersection of causality and deep learning to tackle the above challenges, using deep learning for the benefit of causal algorithms and vice versa. This tutorial introduces the fundamental concepts of causality and deep learning for both audiences, provides an overview of recent works, and presents synergies, challenges, and opportunities for research in both fields.
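To make the notion of intervention concrete, here is a minimal sketch (ours, not part of the tutorial materials) of a two-variable structural causal model X → Y in Python: observing Y is informative about X, but the intervention do(Y = 1) replaces the mechanism that generates Y and leaves X untouched.

    # Minimal sketch: observing vs. intervening in a two-variable SCM (X -> Y).
    import numpy as np

    rng = np.random.default_rng(0)

    def sample_observational(n):
        # Structural assignments: X ~ N(0, 1), Y := 2*X + noise.
        x = rng.normal(size=n)
        y = 2.0 * x + 0.1 * rng.normal(size=n)
        return x, y

    def sample_do_y(n, y_value):
        # do(Y = y_value) replaces Y's assignment with a constant;
        # X keeps its own mechanism and becomes independent of Y.
        x = rng.normal(size=n)
        y = np.full(n, float(y_value))
        return x, y

    x_obs, y_obs = sample_observational(10_000)
    x_int, _ = sample_do_y(10_000, y_value=1.0)
    print(np.corrcoef(x_obs, y_obs)[0, 1])  # close to 1: strong observational dependence
    print(np.mean(x_int))                   # close to 0: setting Y does not move X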

Deep Learning for Causality

  • Modeling functional relationships

  • Learning distributions over graphs (see the sketch after this list)

  • Representations as rich compositions of learned features

  • Latent causal variables
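To give one concrete flavor of these topics (a sketch of ours, not code from the tutorial), the snippet below implements the continuous acyclicity penalty h(W) = tr(exp(W ∘ W)) − d from Zheng et al. (2018; see the references), which is zero exactly when the weighted adjacency matrix W encodes a DAG. Many of the gradient-based structure-learning methods referenced below optimize a data-fit term subject to this penalty.

    # Sketch of the NOTEARS acyclicity penalty (Zheng et al., 2018).
    import numpy as np
    from scipy.linalg import expm

    def acyclicity(W):
        # h(W) = tr(exp(W * W)) - d. The elementwise square keeps entries >= 0,
        # and tr(exp(.)) sums weighted closed walks, so h(W) = 0 iff W is acyclic.
        d = W.shape[0]
        return np.trace(expm(W * W)) - d

    dag = np.array([[0.0, 1.5], [0.0, 0.0]])  # edge 0 -> 1 only: acyclic
    cyc = np.array([[0.0, 1.5], [0.7, 0.0]])  # edges 0 -> 1 and 1 -> 0: a cycle
    print(acyclicity(dag))  # 0.0
    print(acyclicity(cyc))  # > 0, penalized during gradient-based search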




Causality for Deep Learning

  • Why causality for DL

  • Benchmarks for causal learning in DL

  • Objectives & architectures for causal learning in DL (see the sketch after this list)

  • Using notions of causality to help DL
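As one example of such an objective (again a sketch under our own assumptions, not the tutorial's code), the snippet below shows the IRMv1 penalty of Arjovsky et al. (2019; see the references): the squared gradient of each environment's risk with respect to a dummy scale fixed at 1.0 is added to the empirical risk, encouraging a predictor that is simultaneously optimal across training environments.

    # Sketch of the IRMv1 objective (Arjovsky et al., 2019); binary classification.
    import torch
    import torch.nn.functional as F

    def irm_penalty(logits, y):
        # Squared norm of the risk gradient w.r.t. a dummy scale fixed at 1.0;
        # it vanishes when the classifier is already optimal for this environment.
        scale = torch.ones(1, requires_grad=True)
        loss = F.binary_cross_entropy_with_logits(logits * scale, y)
        grad = torch.autograd.grad(loss, [scale], create_graph=True)[0]
        return (grad ** 2).sum()

    def irm_objective(model, envs, lam=1.0):
        # envs: list of (x, y) batches, one per training environment;
        # y holds float targets in {0, 1}; model maps x to a single logit.
        total = 0.0
        for x, y in envs:
            logits = model(x).squeeze(-1)
            total = total + F.binary_cross_entropy_with_logits(logits, y)
            total = total + lam * irm_penalty(logits, y)
        return total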



Due to space and time limitations, we are unfortunately not able to reference every work on the slides of the tutorial directly. That is why we have curated a list of important works for each chapter here. In case you think we missed a work or cited it incorrectly, please reach out to us!

References


Introduction and Background


Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen. DALL·E 2: Hierarchical Text-Conditional Image Generation with CLIP Latents. arXiv, 2022.


Christopher G Lucas, Sophie Bridgers, Thomas L Griffiths, Alison Gopnik. When children are better (or at least more open-minded) learners than adults: developmental differences in learning the forms of causal relationships. Cognition, 2013.

Eliza Kosoy, Adrian Liu, Jasmine Collins, David M Chan, Jessica B Hamrick, Nan Rosemary Ke, Sandy H Huang, Bryanna Kaufmann, John Canny, Alison Gopnik. Learning Causal Overhypotheses through Exploration in Children and Computational Models. CLeaR, 2022.


Frederick Eberhardt. Almost optimal intervention sets for causal discovery. arXiv preprint arXiv:1206.3250, 2012.


Frederick Eberhardt, Clark Glymour, and Richard Scheines. On the number of experiments sufficient and in the worst case necessary to identify all causal relations among n variables. arXiv preprint arXiv:1207.1389, 2012.

Christina Heinze-Deml, Marloes H Maathuis, and Nicolai Meinshausen. Causal structure learning. Annual Review of Statistics and Its Application, 2018.

Miguel A. Hernán and James M. Robins. Causal Inference: What If. Boca Raton: Chapman & Hall/CRC, 2020.

Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katie Millican, Malcolm Reynolds, Roman Ring, Eliza Rutherford, Serkan Cabi, Tengda Han, Zhitao Gong, Sina Samangooei, Marianne Monteiro, Jacob Menick, Sebastian Borgeaud, Andrew Brock, Aida Nematzadeh, Sahand Sharifzadeh, Mikolaj Binkowski, Ricardo Barreira, Oriol Vinyals, Andrew Zisserman, Karen Simonyan. Flamingo: a Visual Language Model for Few-Shot Learning. arXiv, 2022.

Jonas Peters, Dominik Janzing, and Bernhard Schölkopf. Elements of causal inference: foundations and learning algorithms. The MIT Press, 2017.

Judea Pearl. Causality. Cambridge University Press, 2009.

Judea Pearl and Dana Mackenzie. The book of why: the new science of cause and effect. Basic Books, 2018.

Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap, David Silver. Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model. Nature, 2020.

Peter Spirtes, Clark Glymour, and Richard Scheines. Causation, Prediction, and Search. MIT Press, 2000.

Sara Beery, Grant van Horn, Pietro Perona. Recognition in Terra Incognita. ECCV, 2018.


Sergey Levine, Chelsea Finn, Trevor Darrell, Pieter Abbeel. End-to-end training of deep visuomotor policies. JMLR, 2016.



Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei. Language Models are Few-Shot Learners. NeurIPS, 2020.

Yoshua Bengio, Aaron Courville, Pascal Vincent. Representation Learning: A Review and New Perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013.


Causality for Deep Learning


Michael Ahn, et al. Do As I Can, Not As I Say: Grounding language in robotic affordances. arXiv preprint arXiv:2204.01691, 2022.


Kartik Ahuja, Jason Hartford, and Yoshua Bengio. Properties from mechanisms: an equivariance perspective on identifiable representation learning. NeurIPS, 2021.


Martin Arjovsky, et al. Invariant risk minimization. arXiv preprint arXiv:1907.02893, 2019.


Ossama Ahmed, Frederik Träuble, Anirudh Goyal, Alexander Neitz, Yoshua Bengio, Bernhard Schölkopf, Manuel Wüthrich, and Stefan Bauer. CausalWorld: A robotic manipulation benchmark for causal structure and transfer learning. ICLR, 2021.


Sara Beery, Grant Van Horn, and Pietro Perona. Recognition in Terra Incognita. ECCV, 2018.


Matthew Botvinick, et al. Reinforcement learning, fast and slow. Trends in Cognitive Sciences, 2019.


Robert Geirhos, et al. Shortcut learning in deep neural networks. Nature Machine Intelligence, 2020.


Anirudh Goyal, et al. Neural production systems. NeurIPS, 2021.


Anirudh Goyal, et al. Recurrent independent mechanisms. arXiv preprint arXiv:1909.10893, 2019.


Alison Gopnik, Clark Glymour, David M Sobel, Laura E Schulz, Tamar Kushnir, and David Danks. A theory of causal learning in children: causal maps and Bayes nets. Psychological Review, 2004.


Pim De Haan, Dinesh Jayaraman, and Sergey Levine. Causal confusion in imitation learning. NeurIPS, 2019.


Christina Heinze-Deml, Jonas Peters, and Nicolai Meinshausen. Invariant causal prediction for nonlinear models. Journal of Causal Inference, 2018.


Brendan Lake and Marco Baroni. Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks. ICML, 2018.


Francesco Locatello, et al. Challenging common assumptions in the unsupervised learning of disentangled representations. ICML, 2019.


Maximilian Ilse, Jakub M. Tomczak, and Patrick Forré. Selecting data augmentation for simulating interventions. ICML, 2021.


Nan Rosemary Ke, Aniket Didolkar, Sarthak Mittal, Anirudh Goyal, Guillaume Lajoie, Stefan Bauer, Danilo Rezende, Yoshua Bengio, Michael Mozer, and Christopher Pal. Systematic evaluation of causal discovery in visual model based reinforcement learning. arXiv preprint arXiv:2107.00848, 2021.


Nan Rosemary Ke, et al. Sparse attentive backtracking: Temporal credit assignment through reminding. NeurIPS, 2018.


Eliza Kosoy, et al. Learning Causal Overhypotheses through Exploration in Children and Computational Models. CLeaR, 2022.


David Krueger, et al. Out-of-distribution generalization via risk extrapolation (REx). ICML, 2021.


Kanika Madan, et al. Fast and slow learning of recurrent independent mechanisms. ICLR, 2021.


Jonas Peters, Peter Bühlmann, and Nicolai Meinshausen. Causal inference by using invariant prediction: identification and confidence intervals. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2016.


Bernhard Schölkopf, Francesco Locatello, Stefan Bauer, Nan Rosemary Ke, Nal Kalchbrenner, Anirudh Goyal, and Yoshua Bengio. Toward causal representation learning. Proceedings of the IEEE, 2021.


Jane X. Wang, et al. Alchemy: A structured task distribution for meta-reinforcement learning. CoRR, 2021.


Deep Learning for Causality


Raj Agrawal, Chandler Squires, Karren Yang, Karthikeyan Shanmugam, and Caroline Uhler. ABCD-Strategy: Budgeted experimental design for targeted causal structure discovery. AISTATS, 2019.


Yashas Annadani, Jonas Rothfuss, Alexandre Lacoste, Nino Scherrer, Anirudh Goyal, Yoshua Bengio, and Stefan Bauer. Variational causal networks: Approximate Bayesian inference over causal structures. arXiv preprint arXiv:2106.07635, 2021.


Yoshua Bengio, Tristan Deleu, Nasim Rahaman, Rosemary Ke, Sébastien Lachapelle, Olexa Bilaniuk, Anirudh Goyal, and Christopher Pal. A meta-transfer objective for learning to disentangle causal mechanisms. ICLR, 2020.


Rohit Bhattacharya, Tushar Nagarajan, Daniel Malinsky, and Ilya Shpitser. Differentiable causal discovery under unmeasured confounding. AISTATS, 2021.


Jorg Bornschein, Silvia Chiappa, Alan Malek, and Rosemary Nan Ke. Prequential MDL for causal structure learning with neural networks. arXiv preprint arXiv:2107.05481, 2021.


Philippe Brouillard, Sébastien Lachapelle, Alexandre Lacoste, Simon Lacoste-Julien, and Alexandre Drouin. Differentiable causal discovery from interventional data. NeurIPS, 2020.


Bertrand Charpentier, Simon Kibler, and Stephan Günnemann. Differentiable DAG sampling. ICLR, 2022.


Chris Cundy, Aditya Grover, and Stefano Ermon. BCD Nets: Scalable variational approaches for Bayesian causal discovery. NeurIPS, 2021.


Tristan Deleu, António Góis, Chris Emezue, Mansi Rankawat, Simon Lacoste-Julien, Stefan Bauer, and Yoshua Bengio. Bayesian structure learning with generative flow networks. UAI, 2022.


Gonçalo Rui Alves Faria, Andre Martins, and Mario AT Figueiredo. Differentiable causal discovery under latent interventions. CLeaR, 2022.


Yinghua Gao, Li Shen, and Shu-Tao Xia. DAG-GAN: Causal structure learning with generative adversarial nets. ICASSP, 2021.


Tomas Geffner, Javier Antoran, Adam Foster, Wenbo Gong, Chao Ma, Emre Kiciman, Amit Sharma, Angus Lamb, Martin Kukla, Nick Pawlowski, et al. Deep end-to-end causal inference. arXiv preprint arXiv:2202.02195, 2022.


Alexander Hägele, Jonas Rothfuss, Lars Lorch, Vignesh Ram Somnath, Bernhard Schölkopf, and Andreas Krause. BaCaDI: Bayesian causal discovery with unknown interventions. arXiv preprint arXiv:2206.01665, 2022.


Fredrik Johansson, Uri Shalit, and David Sontag. Learning representations for counterfactual inference. ICML, 2016.


Nan Rosemary Ke, Olexa Bilaniuk, Anirudh Goyal, Stefan Bauer, Hugo Larochelle, Bernhard Schölkopf, Michael C Mozer, Chris Pal, and Yoshua Bengio. Learning neural causal models from unknown interventions. arXiv preprint arXiv:1910.01075, 2019.


Nan Rosemary Ke, Jane Wang, Jovana Mitrovic, Martin Szummer, Danilo J Rezende, et al. Amortized learning of neural causal representations. arXiv preprint arXiv:2008.09301, 2020.


Nan Rosemary Ke, Silvia Chiappa, Jane Wang, Jorg Bornschein, Theophane Weber, Anirudh Goyal, Matthew Botvinick, Michael Mozer, and Danilo Jimenez Rezende. Learning to induce causal structure. arXiv preprint arXiv:2204.04875, 2022.


Trent Kyono, Yao Zhang, and Mihaela van der Schaar. CASTLE: Regularization via auxiliary causal graph discovery. NeurIPS, 2020.


Sébastien Lachapelle, Philippe Brouillard, Tristan Deleu, and Simon Lacoste-Julien. Gradient-based neural DAG learning. ICLR, 2020.


Hebi Li, Qi Xiao, and Jin Tian. Supervised whole DAG causal discovery. arXiv preprint arXiv:2006.04697, 2020.


Phillip Lippe, Taco Cohen, and Efstratios Gavves. Efficient neural causal discovery without acyclicity constraints. ICLR, 2022.


David Lopez-Paz, Krikamol Muandet, Bernhard Schölkopf, and Ilya Tolstikhin. Towards a learning theory of cause-effect inference. ICML, 2015.


Lars Lorch, Jonas Rothfuss, Bernhard Schölkopf, and Andreas Krause. DiBS: Differentiable Bayesian structure learning. NeurIPS, 2021.


Lars Lorch, Scott Sussex, Jonas Rothfuss, Andreas Krause, and Bernhard Schölkopf. Amortized inference for causal structure learning. arXiv preprint arXiv:2205.12934, 2022.


Christos Louizos, et al. Causal effect inference with deep latent-variable models. NeurIPS, 2017.


Sindy Löwe, David Madras, Richard Zemel, and Max Welling. Amortized causal discovery: Learning to infer causal graphs from time-series data. arXiv preprint arXiv:2006.10833, 2020.


Ignavier Ng, Zhuangyan Fang, Shengyu Zhu, Zhitang Chen, and Jun Wang. Masked gradient-based causal structure learning. arXiv preprint arXiv:1910.08527, 2019.


Ignavier Ng, Shengyu Zhu, Zhitang Chen, and Zhuangyan Fang. A graph autoencoder approach to causal structure learning. arXiv preprint arXiv:1911.07420, 2019.


Ignavier Ng, AmirEmad Ghassami, and Kun Zhang. On the role of sparsity and DAG constraints for learning linear DAGs. NeurIPS, 2020.


Roxana Pamfil, Nisara Sriwattanaworachai, Shaan Desai, Philip Pilgerstorfer, Konstantinos Georgatzis, Paul Beaumont, and Bryon Aragam. DYNOTEARS: Structure learning from time-series data. AISTATS, 2020.


Nino Scherrer, Olexa Bilaniuk, Yashas Annadani, Anirudh Goyal, Patrick Schwab, Bernhard Schölkopf, Michael C Mozer, Yoshua Bengio, Stefan Bauer, and Nan Rosemary Ke. Learning neural causal models with active interventions. arXiv preprint arXiv:2109.02429, 2021.


Nino Scherrer, Anirudh Goyal, Stefan Bauer, Yoshua Bengio, and Nan Rosemary Ke. On the generalization and adaptation performance of causal models. arXiv preprint arXiv:2206.04620, 2022.


Chandler Squires and Caroline Uhler. Causal structure learning: a combinatorial perspective. arXiv preprint arXiv:2206.01152, 2022.


Severi Rissanen and Pekka Marttinen. A critical look at the consistency of causal estimation with deep latent variable models. NeurIPS, 2021.


Panagiotis Tigas, Yashas Annadani, Andrew Jesson, Bernhard Schölkopf, Yarin Gal, and Stefan Bauer. Interventions, where and how? Experimental design for causal models at scale. arXiv preprint arXiv:2203.02016, 2022.


Matthew J Vowels, Necati Cihan Camgoz, and Richard Bowden. D’ya like DAGs? A survey on structure learning and causal discovery. arXiv preprint arXiv:2103.02582, 2021.


Xiaoqiang Wang, Yali Du, Shengyu Zhu, Liangjun Ke, Zhitang Chen, Jianye Hao, and Jun Wang. Ordering based causal discovery with reinforcement learning. arXiv preprint arXiv:2105.06631, 2021.


Dennis Wei, Tian Gao, and Yue Yu. DAGs with No Fears: A closer look at continuous optimization for learning Bayesian networks. NeurIPS, 2020.


Yue Yu, Jie Chen, Tian Gao, and Mo Yu. DAG-GNN: DAG structure learning with graph neural networks. ICML, 2019.


Yue Yu, Tian Gao, Naiyu Yin, and Qiang Ji. DAGs with No Curl: An efficient DAG structure learning approach. ICML, 2021.


Xun Zheng, Bryon Aragam, Pradeep K Ravikumar, and Eric P Xing. DAGs with NO TEARS: Continuous optimization for structure learning. NeurIPS, 2018.


Xun Zheng, Chen Dan, Bryon Aragam, Pradeep Ravikumar, and Eric Xing. Learning sparse nonparametric DAGs. AISTATS, 2020.


Shengyu Zhu, Ignavier Ng, and Zhitang Chen. Causal discovery with reinforcement learning. ICLR, 2020.


Future Directions


Stephan Bongers, Patrick Forré, Jonas Peters, and Joris M Mooij. Foundations of structural causal models with cycles and latent variables. The Annals of Statistics, 2021.


Yuge Ji, et al. Machine learning for perturbational single-cell omics. Cell Systems, 2021.


Alexander Reisach, Christof Seiler, and Sebastian Weichwald. Beware of the simulated DAG! Causal discovery benchmarks may be easy to game. NeurIPS, 2021.






Questions, feedback or spotted a mistake?

Please reach out to us for any of the above at causalityanddeeplearning@gmail.com