Title: Definite Non-Ancestral Relations and Structure Learning
Authors: Wenyu Chen (University of Washington); Mathias Drton (Technical University of Munich); Ali Shojaie (University of Washington)
Abstract: In causal graphical models based on directed acyclic graphs (DAGs), directed paths represent causal pathways between the corresponding variables. The variable at the beginning of such a path is referred to as an ancestor of the variable at the end of the path. Ancestral relations between variables play an important role in causal modeling. In the existing literature on structure learning, these relations are usually deduced from learned structures and used for orienting edges or formulating constraints on the space of possible DAGs. However, they are not usually posed as an immediate target of inference. In this work, we investigate the graphical characterization of ancestral relations via CPDAGs and d-separation relations. We propose a framework that can learn definite non-ancestral relations without first learning the skeleton. This framework yields structural information that can be used in both score- and constraint-based algorithms to learn causal DAGs more efficiently.
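As a minimal concrete illustration of the relation being targeted (a toy sketch over a known DAG, not the paper's CPDAG-based learning framework): a variable is a definite non-ancestor of another exactly when no directed path connects them.

```python
# Toy illustration of (non-)ancestral relations in a known DAG via networkx.
# This is a minimal sketch, not the paper's algorithm, which learns such
# relations without first learning the graph skeleton.
import networkx as nx

dag = nx.DiGraph([("X", "Y"), ("Y", "Z"), ("X", "W")])

def is_non_ancestor(g, a, b):
    """True iff there is no directed path from a to b."""
    return a not in nx.ancestors(g, b)

print(is_non_ancestor(dag, "X", "Z"))  # False: X -> Y -> Z is a directed path
print(is_non_ancestor(dag, "Z", "X"))  # True: no directed path from Z to X
```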
Title: Causal Gradient Boosting: Boosted Instrumental Variables Regression
Authors: Edvard Bakhitov (University of Pennsylvania); Amandeep Singh (The Wharton School, University of Pennsylvania)
Abstract: Recent advances in the literature have demonstrated that standard supervised learning algorithms are ill-suited for problems with endogenous explanatory variables. To correct for the endogeneity bias, many variants of nonparametric instrumental variable regression methods have been developed. In this paper, we propose an alternative algorithm called boostIV that builds on the traditional gradient boosting algorithm and corrects for the endogeneity bias. The algorithm is very intuitive and resembles an iterative version of the standard 2SLS estimator. We show that our estimator is consistent under mild conditions and exhibits outstanding finite-sample performance.
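One plausible reading of the "iterative 2SLS" idea, as a hedged sketch: at each boosting round, project the current residual onto the instruments (a 2SLS-like first stage), then fit a weak learner to that projection. The linear first stage, tree weak learners, and fixed step size below are our illustrative assumptions, not the paper's exact algorithm.

```python
# Hedged sketch of a boosting-style IV procedure in the spirit of boostIV.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

def boost_iv(X, Z, y, n_rounds=100, lr=0.1):
    f = np.zeros_like(y)
    learners = []
    for _ in range(n_rounds):
        r = y - f                                          # current residual
        r_proj = LinearRegression().fit(Z, r).predict(Z)   # project onto instruments
        h = DecisionTreeRegressor(max_depth=2).fit(X, r_proj)
        f += lr * h.predict(X)
        learners.append(h)
    return lambda Xnew: lr * sum(h.predict(Xnew) for h in learners)

# Toy endogenous data: u confounds x and y; z is a valid instrument.
rng = np.random.default_rng(0)
n = 2000
u = rng.normal(size=n)
z = rng.normal(size=(n, 1))
x = z[:, 0] + u + 0.1 * rng.normal(size=n)
y = np.sin(x) + u                                          # structural function: sin
predict = boost_iv(x.reshape(-1, 1), z, y)
```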
Title: Heterogeneous effects of waste pricing policies
Author: Marica S Valente (Humboldt University Berlin and DIW Berlin)
Abstract: Using machine learning methods in a quasi-experimental setting, I study the heterogeneous effects of introducing waste prices—unit prices on household unsorted waste disposal—on waste demands and social welfare. First, using a unique panel of Italian municipalities with large variation in prices and observables, I show that waste demands are nonlinear. I find evidence of nudge effects at low prices and of increasing elasticities at high prices, driven by income effects and waste habits formed before the policy. Second, I estimate policy impacts on pollution and municipal management costs and compute the overall social cost savings for each municipality. Social welfare effects become positive for most municipalities after three years of adoption, when waste prices cause significant waste avoidance.
Title: Providing Causal-Aware Counterfactual Explanations via Latent Space Representation
Authors: Riccardo Crupi (Intesa Sanpaolo); Alessandro Castelnovo (Intesa Sanpaolo); Beatriz San Miguel Gonzalez (Fujitsu Laboratories of Europe); Daniele Regoli (Intesa Sanpaolo)
Abstract: In the field of Explainable Artificial Intelligence (XAI), counterfactual explanations are local (example-based) and mostly model-agnostic (independent of the type of model) statements that communicate to end-users how their characteristics should change to receive a different outcome from an AI model (e.g., a loan approval). In this paper, we present a method called Counterfactual Explanations as Interventions on Latent Space (CEILS) to generate more feasible counterfactual explanations and recommendations. CEILS has the advantage of leveraging the underlying causal relations by design, and it can be set on top of standard counterfactual generators, thus avoiding the complexity of current approaches. The explanations are found in the latent space defined by the residuals of an Additive Noise Model over the input variables and their Structural Causal Model. We test our approach on a synthetic dataset.
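A minimal sketch of the latent-space idea, assuming a known two-variable linear additive-noise SCM (our own toy, not the CEILS implementation): map an input to its residuals, intervene in residual space, and push the change back through the structural equations.

```python
# Toy counterfactual via the residual (latent) space of an additive noise
# model. SCM: X1 = U1;  X2 = 0.8 * X1 + U2, where the U's are the residuals.
import numpy as np

def encode(x):
    """Input space -> latent residual space."""
    return np.array([x[0], x[1] - 0.8 * x[0]])

def decode(u):
    """Latent residual space -> input space via the structural equations."""
    x1 = u[0]
    return np.array([x1, 0.8 * x1 + u[1]])

x = np.array([1.0, 1.5])
u = encode(x)
u[0] += 0.5            # act on X1's own noise in latent space
x_cf = decode(u)       # the change propagates causally: X2 moves with X1
print(x_cf)
```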
Title: On Equivalence of Causal Models
Authors: Jun Otsuka (Kyoto University); Hayato Saigo (Nagahama Institute of Bio-Science and Technology)
Abstract: We develop a category-theoretic criterion for determining the equivalence of causal models having different sets of variables or graphs. Following Jacobs et al. (2019), we define a causal model as a probabilistic interpretation of a causal string diagram, i.e., a functor from the “syntactic” category Syn_G of a graph G to the category Stoch of finite sets and stochastic matrices. The equivalence of causal models is then defined in terms of a natural transformation or isomorphism between two such functors. We illustrate this idea first with models that have different variables (but the same graph), and second with models on different (but homomorphic) graphs.
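As a toy numerical illustration (ours, not from the paper): in Stoch, morphisms are stochastic matrices composed by matrix product, so a natural transformation amounts to commuting squares of such matrices. Two models of a one-edge graph that differ only by a relabeling of states are equivalent via a permutation matrix.

```python
# Toy naturality check in Stoch. All matrices are illustrative.
import numpy as np

F_f = np.array([[0.9, 0.2],          # F(f): a column-stochastic matrix
                [0.1, 0.8]])
P = np.array([[0.0, 1.0],            # permutation = relabeling of states
              [1.0, 0.0]])
G_f = P @ F_f @ P.T                  # second model: same mechanism, relabeled

# Naturality square eta_B @ F(f) == G(f) @ eta_A with eta_A = eta_B = P:
print(np.allclose(P @ F_f, G_f @ P))  # True: the two models are equivalent
```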
Title: Condition Number Balancing for Causal Continuous Treatment-Effect Estimation
Authors: Taha Bahadori (Amazon); Eric Tchetgen Tchetgen (The Wharton School, University of Pennsylvania); David Heckerman (Amazon)
Abstract: We study the problem of observational causal inference with continuous treatment. We focus on the challenge of estimating the causal response curve for infrequently observed treatment values. We design a new algorithm based on the framework of entropy balancing that learns weights directly maximizing causal inference accuracy using end-to-end optimization. Our weights can be customized for different datasets and causal inference algorithms. We also develop a new consistency theory for entropy balancing with continuous treatments. Using synthetic and real-world data, we show that our proposed algorithm outperforms standard entropy balancing in terms of causal inference accuracy.
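For context, a sketch of classical entropy balancing with a continuous treatment (the baseline the paper improves on, in a simplified textbook form of our own): choose maximum-entropy weights subject to moment constraints that decorrelate the treatment from the covariates.

```python
# Simplified classical entropy balancing: weights w >= 0, sum w = 1,
# maximizing entropy subject to zero weighted covariance between treatment
# t and covariates X. Not the paper's end-to-end optimized variant.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, d = 100, 3
X = rng.normal(size=(n, d))
t = X @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=n)

C = (X - X.mean(axis=0)) * (t - t.mean())[:, None]   # balance targets

def neg_entropy(w):
    return np.sum(w * np.log(w + 1e-12))

cons = [{"type": "eq", "fun": lambda w: w.sum() - 1.0},
        {"type": "eq", "fun": lambda w: C.T @ w}]     # zero weighted covariance
res = minimize(neg_entropy, np.full(n, 1.0 / n), constraints=cons,
               bounds=[(1e-10, 1.0)] * n, method="SLSQP")
print(res.x @ C)   # approximately zero: treatment decorrelated from covariates
```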
Title: Equality Constraints in Linear Hawkes Processes
Author: Søren W Mogensen (Lund University)
Abstract: Multivariate linear Hawkes processes are convenient for modeling mutually exciting streams of events. One can use tests of so-called local independence to learn a graphical representation of point processes such as linear Hawkes processes. Using an integrated measure of covariance between the coordinate processes of a linear Hawkes process, we show that these models are, in a certain sense, structurally similar to classical linear structural equation models. This allows us to find equality constraints that are not described by local independence, which enables a more refined constraint-based learning of causal graphs of partially observed linear Hawkes processes.
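To make the "structurally similar to linear SEMs" point concrete (our own toy numeric check, not the paper's derivation): linear-SEM-style covariance matrices satisfy polynomial equality constraints, such as vanishing tetrads, that are not conditional independences; the paper shows that integrated covariances of linear Hawkes processes obey analogous constraints.

```python
# A vanishing tetrad in a linear SEM covariance. Illustrative numbers.
import numpy as np

# DAG 1 -> 2, 1 -> 3, 1 -> 4; Lambda[i, j] is the weight of edge i -> j.
L = np.zeros((4, 4))
L[0, 1], L[0, 2], L[0, 3] = 0.7, -0.4, 1.1
Omega = np.diag([1.0, 0.5, 0.8, 1.2])            # noise variances

B = np.linalg.inv(np.eye(4) - L)
Sigma = B.T @ Omega @ B                          # SEM covariance

# Equality constraint implied by the shared parent: s23*s14 - s24*s13 = 0
tetrad = Sigma[1, 2] * Sigma[0, 3] - Sigma[1, 3] * Sigma[0, 2]
print(np.isclose(tetrad, 0.0))                   # True
```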
Title: Efficient Neural Causal Discovery without Acyclicity Constraint
Authors: Phillip Lippe (University of Amsterdam); Taco Cohen (Qualcomm); Efstratios Gavves (University of Amsterdam)
Abstract: Learning the structure of a causal graphical model using both observational and interventional data is a fundamental problem in many scientific fields. A promising direction is continuous optimization for score-based methods, which efficiently learns the causal graph in a data-driven manner. However, existing methods of this kind either require constrained optimization to enforce acyclicity or lack convergence guarantees. In this paper, we present ENCO, an efficient structure learning method leveraging observational and interventional data. ENCO formulates the graph search as an optimization of independent edge likelihoods, with the edge orientation being modeled as a separate parameter. Consequently, we can provide convergence guarantees for ENCO under mild conditions, without constraining the score function with respect to acyclicity. In experiments, we show that ENCO can efficiently recover graphs with hundreds of nodes, an order of magnitude larger than what was previously possible.
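A minimal sketch of the parameterization idea (our simplification of the abstract's description; the gradient estimators and convergence analysis are in the paper): each candidate edge gets an existence parameter gamma and an orientation parameter theta, and graphs are sampled edge-by-edge from the resulting probabilities.

```python
# Simplified ENCO-style graph parameterization: P(i -> j present) is the
# product of an existence probability and an orientation probability.
import numpy as np

rng = np.random.default_rng(0)
d = 4
gamma = rng.normal(size=(d, d))   # edge-existence logits
theta = rng.normal(size=(d, d))   # orientation logits (i -> j vs j -> i)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_graph(gamma, theta, rng):
    """Sample an adjacency matrix A, where A[i, j] = 1 means edge i -> j."""
    p_edge = sigmoid(gamma) * sigmoid(theta)
    A = (rng.random((d, d)) < p_edge).astype(int)
    np.fill_diagonal(A, 0)
    return A

print(sample_graph(gamma, theta, rng))
```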
Title: Efficient inference of interventional distributions
Authors: Arnab Bhattacharyya (National University of Singapore); Sutanu Gayen (National University of Singapore); Saravanan Kandasamy (Cornell University); Vedant Raval (IIT Delhi); N. V. Vinodchandran (University of Nebraska)
Abstract: We consider the problem of efficiently inferring interventional distributions in a causal Bayesian network from a finite number of observations. Let $\mathcal{P}$ be a causal model on a set $\mathbf{V}$ of observable variables on a given causal graph $G$. For sets $\mathbf{X},\mathbf{Y}\subseteq \mathbf{V}$ and a setting $\mathbf{x}$ of the variables in $\mathbf{X}$, let $P_{\mathbf{x}}(\mathbf{Y})$ denote the interventional distribution of $\mathbf{Y}$ under the intervention that sets $\mathbf{X}$ to $\mathbf{x}$. Shpitser and Pearl (AAAI 2006), building on the work of Tian and Pearl (AAAI 2001), gave an exact characterization of the class of causal graphs for which the interventional distribution $P_{\mathbf{x}}(\mathbf{Y})$ can be uniquely determined. We give the first efficient version of the Shpitser-Pearl algorithm. In particular, under natural assumptions, we give a polynomial-time algorithm that, on input a causal graph $G$ on observable variables $\mathbf{V}$ and a setting $\mathbf{x}$ of a set $\mathbf{X} \subseteq \mathbf{V}$, outputs succinct descriptions of both an evaluator and a generator for a distribution $\hat{P}$ that is $\varepsilon$-close (in total variation distance) to $P_{\mathbf{x}}(\mathbf{Y})$, where $\mathbf{Y}=\mathbf{V}\setminus \mathbf{X}$, whenever $P_{\mathbf{x}}(\mathbf{Y})$ is identifiable. We also show that when $\mathbf{Y}$ is an arbitrary set, there is no efficient algorithm that outputs both an evaluator and a generator of a distribution that is $\varepsilon$-close to $P_{\mathbf{x}}(\mathbf{Y})$, unless all problems that have statistical zero-knowledge proofs, including the Graph Isomorphism problem, have efficient randomized algorithms. Thus it is unlikely that the Shpitser-Pearl algorithm in its full generality can be efficiently implemented as both an evaluator and a generator for arbitrary sets $\mathbf{Y}$.
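For orientation, the easy special case with no hidden variables (a warm-up of our own, far simpler than the identification setting the paper handles): an evaluator and a generator for $P_{\mathbf{x}}(\mathbf{Y})$ follow directly from the truncated factorization $P_{\mathbf{x}}(\mathbf{v}) = \prod_{i \notin \mathbf{X}} P(v_i \mid \mathrm{pa}_i)$.

```python
# Evaluator and generator for an intervention in a fully observed causal
# Bayesian network (chain A -> B over binary variables) via the truncated
# factorization: A's factor is dropped under do(A = a), B's factor is kept.
import numpy as np

p_B1_given_A = {0: 0.2, 1: 0.9}   # P(B = 1 | A = a)

def evaluate_P_do_a(b, a):
    """Evaluator: P_{do(A=a)}(B = b)."""
    p1 = p_B1_given_A[a]
    return p1 if b == 1 else 1.0 - p1

def generate_P_do_a(a, rng):
    """Generator: sample B by forward sampling in topological order."""
    return int(rng.random() < p_B1_given_A[a])

rng = np.random.default_rng(0)
print(evaluate_P_do_a(1, a=1))                                   # 0.9
print(np.mean([generate_P_do_a(1, rng) for _ in range(10000)]))  # approx 0.9
```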
Title: Near-Optimal Learning of Tree-Structured Distributions by Chow-Liu
Authors: Arnab Bhattacharyya (National University of Singapore); Sutanu Gayen (National University of Singapore); Eric Price (University of Texas at Austin); N. V. Vinodchandran (University of Nebraska)
Abstract: We provide finite-sample guarantees for the classical Chow-Liu algorithm (IEEE Trans. Inform. Theory, 1968) to learn a tree-structured graphical model of a distribution. For a distribution $P$ on $\Sigma^n$ and a tree $T$ on $n$ nodes, we say $T$ is an $\varepsilon$-approximate tree for $P$ if there is a $T$-structured distribution $Q$ such that $D_{\mathrm{KL}}(P \,\|\, Q)$ is at most $\varepsilon$ more than the smallest KL divergence from $P$ achievable by any tree-structured distribution. We show that if $P$ itself is tree-structured, then the Chow-Liu algorithm with the plug-in estimator for mutual information, given $\widetilde{O}(|\Sigma|^3 n\varepsilon^{-1})$ i.i.d. samples, outputs an $\varepsilon$-approximate tree for $P$ with constant probability. In contrast, for a general $P$ (which may not be tree-structured), $\Omega(n^2\varepsilon^{-2})$ samples are necessary to find an $\varepsilon$-approximate tree. Our upper bound is based on a new conditional independence tester that addresses an open problem posed by Canonne, Diakonikolas, Kane, and Stewart (STOC, 2018): we prove that for three random variables $X,Y,Z$, each over $\Sigma$, testing whether $I(X; Y \mid Z)$ is $0$ or $\geq \varepsilon$ is possible with $\widetilde{O}(|\Sigma|^3/\varepsilon)$ samples. Finally, we show that for a specific tree $T$, with $\widetilde{O}(|\Sigma|^2 n\varepsilon^{-1})$ samples from a distribution $P$ over $\Sigma^n$, one can efficiently learn the closest $T$-structured distribution in KL divergence by applying the add-1 estimator at each node.
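The algorithm whose sample complexity is analyzed here is compact; a standard implementation sketch with the plug-in mutual-information estimator:

```python
# Chow-Liu: estimate pairwise mutual information from empirical counts, then
# take a maximum-weight spanning tree. (The finite-sample analysis of this
# plug-in pipeline is the abstract's subject.)
import numpy as np
import networkx as nx

def plug_in_mi(x, y):
    """Plug-in estimate of I(X; Y) from paired discrete samples."""
    mi = 0.0
    for a in np.unique(x):
        for b in np.unique(y):
            p_ab = np.mean((x == a) & (y == b))
            p_a, p_b = np.mean(x == a), np.mean(y == b)
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

def chow_liu(data):
    """data: (n_samples, n_vars) discrete array; returns the tree's edges."""
    _, d = data.shape
    g = nx.Graph()
    for i in range(d):
        for j in range(i + 1, d):
            g.add_edge(i, j, weight=plug_in_mi(data[:, i], data[:, j]))
    return list(nx.maximum_spanning_tree(g).edges())

rng = np.random.default_rng(0)
x0 = rng.integers(0, 2, 1000)
x1 = (x0 + (rng.random(1000) < 0.1)) % 2        # noisy copy of x0
x2 = rng.integers(0, 2, 1000)                   # independent variable
print(chow_liu(np.column_stack([x0, x1, x2])))  # edge (0, 1) plus one weak edge
```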
Title: Deep Causal Inequalities: Demand Estimation in Differentiated Products Markets
Authors: Amandeep Singh (The Wharton School, University of Pennsylvania); Edvard Bakhitov (University of Pennsylvania); Jiding Zhang (NYU)
Abstract: Supervised machine learning algorithms fail to perform well in the presence of endogeneity in the explanatory variables. In this paper, we borrow from the literature on partial identification to propose deep causal inequalities, an estimator (DeepCI) that overcomes this issue. Instead of relying on observed labels, the DeepCI estimator uses inequalities inferred from the observed behavior of agents in the data. By construction, this allows us to circumvent the issue of endogenous explanatory variables in many cases. We provide theoretical guarantees for our estimator and show that it is consistent under very mild conditions. We demonstrate through extensive simulations that our estimator outperforms standard supervised machine learning algorithms and existing partial identification methods.
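One way to picture "inequalities inferred from observed behavior" (a schematic revealed-preference loss assumed by us, not taken from the paper): the model's predicted utility for each agent's chosen alternative should weakly exceed its utility for every non-chosen alternative, and violations are penalized with a hinge.

```python
# Schematic moment-inequality loss for discrete choice. Our hedged sketch
# of the idea, not the DeepCI architecture.
import numpy as np

def inequality_loss(U, choices, margin=0.0):
    """U: (n_agents, n_products) predicted utilities; choices: chosen indices.
    Penalizes hinge violations of u[chosen] >= u[other] + margin."""
    chosen = U[np.arange(len(choices)), choices][:, None]
    violations = np.maximum(0.0, U - chosen + margin)
    violations[np.arange(len(choices)), choices] = 0.0  # ignore the chosen one
    return violations.mean()

U = np.array([[1.0, 0.2, -0.3],
              [0.1, 0.5,  0.4]])
print(inequality_loss(U, np.array([0, 1])))   # 0.0: both choices rationalized
```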
Title: Parameterizing and Simulating from Causal Models
Authors: Robin J. Evans (Oxford); Vanessa Didelez (Leibniz Institute for Prevention Research and Epidemiology)
Abstract: Many statistical problems in causal inference involve a probability distribution other than the one from which data are actually observed; as an additional complication, the object of interest is often a marginal quantity of this other probability distribution. This creates many practical complications for statistical inference, even where the problem is non-parametrically identified. For example, naive attempts to specify a model parametrically can lead to unwanted consequences such as incompatible parametric assumptions. As a consequence, it is difficult to perform likelihood-based inference, or even to simulate from the model in a general way. We introduce the ‘frugal parameterization’, which places the causal effect of interest at its centre and then builds the rest of the model around it. We do this in a way that provides a recipe for constructing a smooth, non-redundant parameterization using causal quantities of interest. In the case of discrete variables we use odds ratios to complete the parameterization, while in the continuous case we use copulas.

Our methods allow us to construct and simulate from models with parametrically specified causal distributions, and to fit them using likelihood-based methods, including fully Bayesian approaches. Models we can fit and simulate from exactly include marginal structural models, structural nested models and instrumental variable models. Our proposal includes parameterizations for the average causal effect and the effect of treatment on the treated. Our results will allow practitioners to test their methods against the best possible estimators for correctly specified models, in a way which has previously been impossible. We argue that thinking of causal models as marginal models may lead to many other breakthroughs of this kind.
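A minimal simulation sketch in the frugal spirit, with all specifics ours rather than the paper's recipe: specify the causal margin Y | do(x) directly, specify "the past" (a confounder and a treatment mechanism), and couple outcome and confounder with a Gaussian copula, so the marginal causal effect is exact by construction while naive regression is biased.

```python
# Frugal-flavoured simulation toy: Y | do(x) ~ N(2x, 1) holds exactly by
# construction; a Gaussian copula (here realized as correlated normal
# residuals) links the outcome residual to the confounder Z.
import numpy as np

rng = np.random.default_rng(0)
n, b, rho = 100_000, 2.0, 0.6

Z = rng.normal(size=n)                                     # confounder
e_Y = rho * Z + np.sqrt(1 - rho**2) * rng.normal(size=n)   # N(0,1), corr rho with Z

X = (rng.random(n) < 1 / (1 + np.exp(-Z))).astype(float)   # treatment depends on Z
Y = b * X + e_Y                  # => Y | do(x) ~ N(b*x, 1), exactly

naive = Y[X == 1].mean() - Y[X == 0].mean()
print(naive)                     # biased away from b = 2.0 due to confounding
```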
Title: Quantum Causal Inference: An Entropic Approach
Authors: Mohammad Ali Javidian (Purdue University); Vaneet Aggarwal (Purdue University); Zubin Jacob (Purdue University)
Abstract: A direct generalization of existing causal inference techniques to the quantum domain is not possible due to superposition and entanglement. We put forth a new theoretical framework for merging quantum information science and causal inference by exploiting entropic principles. First, we build the fundamental connection between the celebrated quantum marginal problem and entropic causal inference. Second, inspired by the definition of geometric quantum discord, we fill the gap between classical conditional probabilities and quantum conditional density matrices. These fundamental theoretical advances are exploited to develop a scalable algorithmic approach for quantum entropic causal inference. We apply our proposed framework to an experimentally relevant scenario of identifying message senders on noisy quantum links, where it is validated that the input before noise is the cause of the noisy output. This successful inference on a synthetic quantum dataset can lay the foundation for identifying the originators of malicious activity on future multi-node quantum networks.
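For orientation only, a minimal illustration of the entropic quantity the quantum setting trades in (a generic helper of our own, not the paper's causal-inference algorithm): classical Shannon entropy is replaced by the von Neumann entropy of a density matrix.

```python
# Von Neumann entropy S(rho) = -tr(rho log rho) of a density matrix.
import numpy as np

def von_neumann_entropy(rho):
    """Entropy in bits of a density matrix (Hermitian, PSD, trace 1)."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]          # convention: 0 log 0 = 0
    return float(-(evals * np.log2(evals)).sum())

pure = np.array([[1.0, 0.0], [0.0, 0.0]])   # pure state: entropy 0
mixed = np.eye(2) / 2                        # maximally mixed qubit: 1 bit
print(von_neumann_entropy(pure), von_neumann_entropy(mixed))  # 0.0 1.0
```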
Title: Dependency in DAG Models with Hidden Variables
Author: Robin J. Evans (Oxford)
Abstract: Directed acyclic graph models with hidden variables have been much studied, particularly in view of their computational efficiency and connection with causal methods. In this paper, we characterize the circumstances under which it is possible for two variables to be identically equal while all other observed variables remain jointly independent of them and mutually independent of each other. We find that this is possible if and only if the two variables are ‘densely connected’; in other words, if applications of identifiable causal interventions on the graph cannot (non-trivially) separate them. As a consequence, we can also allow such pairs of random variables to have any bivariate joint distribution that we choose. This has implications for model search, since it suggests that we can restrict attention to graphs in which densely connected vertices are always joined by an edge.
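A toy simulation of the phenomenon being characterized (our own construction, giving one instance of the kind of distribution whose existence the paper characterizes graphically): a single hidden variable makes two observed variables identically equal while a third observed variable stays independent of both.

```python
# Hidden U makes observed X and Y identically equal; W is independent of both.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
U = rng.normal(size=n)        # hidden common cause
X, Y = U.copy(), U.copy()     # densely connected pair: X == Y always
W = rng.normal(size=n)        # another observed variable, independent of (X, Y)

print(np.allclose(X, Y))                    # True: identically equal
print(abs(np.corrcoef(X, W)[0, 1]) < 0.01)  # True: (near-)zero correlation
```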
Title: Causal Markov Boundaries
Authors: Sofia Triantafillou (University of Pittsburgh); Fattaneh Jabbari (University of Pittsburgh); Gregory F. Cooper (University of Pittsburgh)
Abstract: Feature selection is an important problem in machine learning, which aims to select variables that lead to an optimal predictive model. In this paper, we focus on feature selection for post-intervention outcome prediction from pre-intervention variables. We are motivated by healthcare settings, where the goal is often to select the treatment that will maximize a specific patient’s outcome; however, we often do not have sufficient randomized controlled trial data to reliably identify the conditional treatment effect. We show how observational data can be used to improve feature selection and effect estimation in two cases: (a) when we know the causal graph, and (b) when we do not know the causal graph but have both observational and limited experimental data. Our paper extends the notion of Markov boundary to treatment-outcome pairs. We provide theoretical guarantees for the methods we introduce. In simulated data, we show that combining observational and experimental data improves feature selection and effect estimation.
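As background, the classical single-variable notion that the paper extends to treatment-outcome pairs: in a faithful DAG, the Markov boundary of a node is its parents, children, and the children's other parents, which is easy to read off the graph.

```python
# Classical Markov boundary of one node in a DAG, via networkx. Background
# for the treatment-outcome extension developed in the paper.
import networkx as nx

def markov_boundary(g, v):
    parents = set(g.predecessors(v))
    children = set(g.successors(v))
    co_parents = {p for c in children for p in g.predecessors(c)}
    return (parents | children | co_parents) - {v}

g = nx.DiGraph([("T", "Y"), ("Z", "Y"), ("A", "T"), ("Y", "B")])
print(markov_boundary(g, "T"))   # {'A', 'Y', 'Z'}: parent, child, co-parent
```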