Accepted Papers


research paper (4-6 pages)

Authors: Serge Assaad (Duke University); Shuxi Zeng (Duke University); Henry Pfister (Duke University); Fan Li (Duke University); Lawrence Carin (Duke University)

Abstract: We examine interval estimation of the effect of a treatment T on an outcome Y given the existence of an unobserved confounder U. Using Hölder's inequality, we derive a set of bounds on the confounding bias |E[Y|T=t]-E[Y|do(T=t)]| based on the degree of unmeasured confounding (i.e., the strength of the connection U->T, and the strength of U->Y). These bounds are tight when either U⊥T or U⊥Y | T (i.e., when there is no unobserved confounding). We focus on a special case of this bound that depends on the total variation distance between the distributions p(U) and p(U|T=t), as well as the maximum (over all possible values of U) deviation of the conditional expected outcome E[Y|U=u,T=t] from the average expected outcome E[Y|T=t]. We discuss possible calibration strategies for this bound to obtain interval estimates for treatment effects, and experimentally validate the bound using synthetic and semi-synthetic datasets.
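The decomposition behind a bound of this form can be checked numerically. Below is a minimal NumPy sketch (a toy discrete confounder of our own construction, not the paper's calibration procedure) verifying the elementary inequality |E[Y|T=t]-E[Y|do(T=t)]| <= 2 * TV(p(U), p(U|T=t)) * max_u |E[Y|U=u,T=t]-E[Y|T=t]|:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy discrete confounder U with 4 states; binary treatment T; we look at T=1.
p_u = rng.dirichlet(np.ones(4))              # p(U)
p_t_given_u = rng.uniform(0.1, 0.9, size=4)  # p(T=1 | U=u): the U -> T connection
ey_given_ut = rng.normal(size=4)             # E[Y | U=u, T=1]: the U -> Y connection

# p(U | T=1) by Bayes' rule.
p_u_given_t = p_u * p_t_given_u
p_u_given_t /= p_u_given_t.sum()

ey_obs = p_u_given_t @ ey_given_ut           # E[Y | T=1] (observational)
ey_do = p_u @ ey_given_ut                    # E[Y | do(T=1)] (back-door over U)
bias = abs(ey_obs - ey_do)

tv = 0.5 * np.abs(p_u - p_u_given_t).sum()   # total variation distance
max_dev = np.abs(ey_given_ut - ey_obs).max() # max deviation of E[Y|U=u,T=1]

# The confounding bias is controlled by the two quantities in the abstract.
assert bias <= 2 * tv * max_dev + 1e-12
```

The inequality follows from writing the bias as sum_u (p(u|T=1) - p(u)) * (E[Y|u,T=1] - E[Y|T=1]) and bounding each factor; it vanishes when U⊥T (TV term is zero) or U⊥Y | T (deviation term is zero), matching the tightness conditions stated above.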


research paper (4-6 pages)

Authors: Edvard Bakhitov (University of Pennsylvania); Amandeep Singh (The Wharton School)

Abstract: Recent advances in the literature have demonstrated that standard supervised learning algorithms are ill-suited for problems with endogenous explanatory variables. To correct for the endogeneity bias, many variants of nonparametric instrumental variable regression methods have been developed. In this paper, we propose an alternative algorithm called boostIV that builds on the traditional gradient boosting algorithm and corrects for the endogeneity bias. The algorithm is very intuitive and resembles an iterative version of the standard 2SLS estimator. We show that our estimator is consistent under mild conditions and demonstrates outstanding finite-sample performance.


research paper (4-6 pages)

Authors: Keegan Harris (Carnegie Mellon University); Dung Daniel T Ngo (University of Minnesota); Logan Stapleton (University of Minnesota); Hoda Heidari (Carnegie Mellon University); Steven Wu (Carnegie Mellon University)

Abstract: In social domains, Machine Learning algorithms often prompt individuals to strategically modify their observable attributes to receive more favorable predictions. As a result, the distribution the predictive model is trained on may differ from the one it operates on in deployment. While such distribution shifts, in general, hinder accurate predictions, our work identifies a unique opportunity associated with shifts due to strategic responses: We show that we can use strategic responses effectively to recover causal relationships between the observable features and outcomes we wish to predict. More specifically, we study a game-theoretic model in which a principal deploys a sequence of models to predict an outcome of interest (e.g., college GPA) for a sequence of strategic agents (e.g., college applicants). In response, strategic agents invest efforts and modify their features for better predictions. In such settings, unobserved confounding variables (e.g., family educational background) can influence both an agent's observable features (e.g., high school records) and outcomes (e.g., college GPA). Therefore, standard regression methods (such as OLS) generally produce biased estimators. In order to address this issue, our work establishes a novel connection between strategic responses to machine learning models and instrumental variable (IV) regression, by observing that the sequence of deployed models can be viewed as an instrument that affects agents' observable features but does not directly influence their outcomes. Therefore, two-stage least squares (2SLS) regression can recover the causal relationships between observable features and outcomes.
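The identification strategy is standard 2SLS once the sequence of deployed models is treated as an instrument. A minimal NumPy sketch (a hypothetical linear model with made-up coefficients, not the paper's game-theoretic setup) shows OLS absorbing the confounding while 2SLS recovers the causal coefficient:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
theta = 2.0                                  # true causal effect of feature on outcome

u = rng.normal(size=n)                       # unobserved confounder (e.g. family background)
z = rng.normal(size=n)                       # instrument (e.g. the deployed model's weight)
x = 1.5 * z + u + rng.normal(size=n)         # observable feature, moved by z and by u
y = theta * x + 3.0 * u + rng.normal(size=n) # outcome (e.g. GPA), confounded through u

# OLS is biased because u enters both x and y (all variables are mean zero,
# so no intercepts are needed in this sketch).
ols = (x @ y) / (x @ x)

# 2SLS: regress x on z, then regress y on the fitted values.
x_hat = z * ((z @ x) / (z @ z))
tsls = (x_hat @ y) / (x_hat @ x_hat)

assert abs(ols - theta) > 0.5   # clearly biased
assert abs(tsls - theta) < 0.05 # close to the truth
```

The key requirement mirrored in the simulation is the exclusion restriction: z shifts x but enters y only through x, which is exactly the property the paper argues deployed models have with respect to strategic agents' outcomes.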


  • Deep Causal Inequalities: Demand Estimation in Differentiated Products Markets

research paper (4-6 pages)

Authors: Edvard Bakhitov (University of Pennsylvania); Amandeep Singh (The Wharton School); Jiding Zhang (The Wharton School)

Abstract: Supervised machine learning algorithms fail to perform well in the presence of endogeneity in the explanatory variables. In this paper, we borrow from the literature on partial identification to propose deep causal inequalities (DeepCI) that overcome this issue. Instead of relying on observed labels, the DeepCI estimator uses inequalities inferred from the observed behavior of agents in the data. By construction, this allows us to circumvent the issue of endogenous explanatory variables in many cases. We provide theoretical guarantees for our estimator and demonstrate that it is consistent under very mild conditions. We demonstrate through extensive simulations that our estimator outperforms standard supervised machine learning algorithms and existing partial identification methods.


research paper (4-6 pages)

Authors: Smitha Milli (UC Berkeley); Luca Belli (Twitter); Moritz Hardt (University of California, Berkeley)

Abstract: Online platforms regularly conduct randomized experiments to understand how changes to the platform causally affect various outcomes of interest. However, experimentation on online platforms has been criticized for having, among other issues, a lack of meaningful oversight and user consent. As platforms give users greater agency, it becomes possible to conduct observational studies in which users self-select into the treatment of interest as an alternative to experiments in which the platform controls whether the user receives treatment or not. In this paper, we conduct four large-scale within-study comparisons on Twitter aimed at assessing the effectiveness of observational studies derived from user self-selection on online platforms. In a within-study comparison, treatment effects from an observational study are assessed based on how effectively they replicate results from a randomized experiment with the same target population. We test the naive difference in group means estimator, exact matching, regression adjustment, and propensity score weighting while controlling for plausible confounding variables. In every case, the observational estimates perform poorly at recovering the ground-truth estimate from the analogous randomized experiments. Our results suggest that observational studies derived from user self-selection are a poor alternative to randomized experimentation on online platforms. In discussing our results, we present a “Catch-22” that undermines the use of causal inference in these settings: we give users control because we postulate that there is no adequate model for predicting user behavior, but performing observational causal inference successfully requires exactly that.
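For reference, the gap a within-study comparison measures can be illustrated with a minimal NumPy sketch (fully synthetic data of our own, not the Twitter studies): when the confounder is observed and the propensity model is correct, inverse propensity weighting closes the gap that the naive difference in group means leaves open.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
x = rng.uniform(-1, 1, size=n)              # observed confounder
p = 1 / (1 + np.exp(-2 * x))                # true propensity p(T=1 | x)
t = rng.binomial(1, p)
y = 1.0 * t + 2.0 * x + rng.normal(size=n)  # true average treatment effect = 1.0

# Naive difference in group means: biased because treated units have higher x.
naive = y[t == 1].mean() - y[t == 0].mean()

# Inverse propensity weighting with the (here, known) propensity score.
ipw = np.mean(t * y / p) - np.mean((1 - t) * y / (1 - p))

assert abs(naive - 1.0) > 0.5
assert abs(ipw - 1.0) < 0.1
```

The paper's negative result is precisely that, with real self-selection, the adjustments tested (matching, regression adjustment, propensity weighting) did not achieve what this idealized sketch achieves, because the plausible covariates did not capture the true selection mechanism.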


position piece (4-6 pages)

Authors: Fredrik Savje (Yale University)

Abstract: A common assumption in causal inference is that random treatment assignment ensures that potential outcomes are independent of treatment, or in one word, unconfoundedness. This paper highlights that randomization and unconfoundedness are separate properties, and neither implies the other. A study with random treatment assignment does not have to be unconfounded, and a study with deterministic assignment can still be unconfounded. A corollary is that a propensity score is not the same thing as a treatment assignment probability. These facts should not be taken as arguments against randomization. The moral of this paper is that randomization is useful only when investigators know or can reconstruct the assignment process.


extended abstract (<1000 words)

Authors: Oliver J Maclaren (The University of Auckland)

Abstract: Here we discuss two common but, in our view, misguided assumptions in causal inference. The first assumption is that one requires potential outcomes, directed acyclic graphs (DAGs), or structural causal models (SCMs) for thinking about causal inference in statistics. The second is that identifiability of a quantity implies estimability of that quantity. These views are not universal, but we believe they are sufficiently common to warrant comment.


research paper (4-6 pages)

Authors: Yashas Annadani (ETH Zurich); Jonas Rothfuss (ETH); Alexandre Lacoste (Element AI); Nino Scherrer (ETH Zürich); Anirudh Goyal (University of Montreal); Yoshua Bengio (Mila); Stefan Bauer (MPI IS)

Abstract: Learning the causal structure that underlies data is a crucial step towards robust real-world decision making. The majority of existing work in causal inference focuses on determining a single directed acyclic graph (DAG) or a Markov equivalence class thereof. However, acting intelligently upon causal structure inferred from finite data demands reasoning about the uncertainty of that inference. For instance, planning interventions to find out more about the causal mechanisms that govern our data requires quantifying epistemic uncertainty over DAGs. While Bayesian causal inference allows one to do so, the posterior over DAGs becomes intractable even for a small number of variables. Aiming to overcome this issue, we propose a form of variational inference over the graphs of Structural Causal Models (SCMs). To this end, we introduce a parametric variational family modelled by an autoregressive distribution over the space of discrete DAGs. Its number of parameters does not grow exponentially with the number of variables, and it can be tractably learned by maximising an Evidence Lower Bound (ELBO). In our experiments, we demonstrate that the proposed variational posterior is able to provide a good approximation of the true posterior.


research paper (4-6 pages)

Authors: Ruibo Tu (KTH Royal Institute of Technology); Kun Zhang (Carnegie Mellon University); Hedvig Kjellström (KTH Royal Institute of Technology); Cheng Zhang (Microsoft)

Abstract: Recently, approaches based on Functional Causal Models (FCMs) have been proposed to determine causal direction between two variables, by restricting model classes; however, their performance is sensitive to the model assumptions. In this paper, we provide a dynamical-system view of FCMs and propose a new framework for identifying causal direction in the bivariate case. We first interpret FCMs in the bivariate case as an optimal transport problem under proper structural constraints. By exploiting the dynamical interpretation of optimal transport, we then derive the underlying time evolution of static cause-effect pair data under the least action principle. It provides a new dimension for describing static causal discovery tasks, while enjoying more freedom for modeling the quantitative causal influences. In particular, we show that additive noise models correspond to volume-preserving pressureless flows. Consequently, based on their velocity field divergence, we derive a criterion to determine causal direction. With this criterion, we propose a novel optimal transport based algorithm which is robust to the choice of models. Our method demonstrates promising results on both synthetic and real cause-effect pair datasets.


research paper (4-6 pages)

Authors: Alicia Curth (University of Cambridge); Mihaela van der Schaar (University of Cambridge)

Abstract: The machine learning toolbox for estimation of heterogeneous treatment effects from observational data is expanding rapidly, yet many of its algorithms have been evaluated only on a very limited set of semi-synthetic benchmark datasets. In this paper, we show that even in arguably the simplest setting -- estimation under ignorability assumptions -- the results of such empirical evaluations can be misleading if (i) the assumptions underlying the data-generating mechanisms in benchmark datasets and (ii) their interplay with baseline algorithms are inadequately discussed. We consider two popular machine learning benchmark datasets for evaluation of heterogeneous treatment effect estimators -- the IHDP and ACIC2016 datasets -- in detail. We identify problems with their current use and highlight that the inherent characteristics of the benchmark datasets favor some algorithms over others -- a fact that is rarely acknowledged but of immense relevance for interpretation of empirical results. We close by discussing implications and possible next steps.


research paper (4-6 pages)

Authors: Nihal Sharma (The University of Texas at Austin); Soumya Basu (Google); Karthikeyan Shanmugam (IBM Research NY); Sanjay Shakkottai (University of Texas at Austin)

Abstract: We study the problem of Multi-Armed Bandits with mean bounds where each arm is associated with an interval in which its mean reward lies. We develop the GLobal Under-Explore (GLUE) algorithm which, for each arm, uses these intervals to infer "pseudo-variances" that instruct the rate of exploration. We provide regret guarantees for GLUE and show that it is never worse than the standard Upper Confidence Bound algorithm. Further, we show regimes in which GLUE improves upon existing regret guarantees for structured bandit problems. Finally, we present the practical setting of learning adaptive interventions using prior confounded data in which unrecorded variables affect rewards. We show that mean bounds for each intervention can be extracted from such logs and can thus be used to improve the learning process. We also provide semi-synthetic experiments on real-world data sets to validate our findings.


research paper (4-6 pages)

Authors: Masahiro Kato (Cyberagent); Shota Yasui (Cyberagent); Kenichiro McAlinn (Temple University)

Abstract: We consider policy evaluation with dependent samples gathered from adaptive experiments. To deal with the dependency, existing studies, such as van der Laan (2008), proposed estimators that include an inverse probability weight whose score function has a martingale property. However, these estimators require the true logging policy (the probability of choosing an action) in order to use the martingale property. To relax this often-overlooked assumption, we propose a doubly robust (DR) estimator for dependent samples, which combines two nuisance estimators: one of the conditional mean outcome and one of the logging policy. To obtain an asymptotically normal semiparametric estimator from dependent samples without Donsker conditions on the nuisance estimators or the martingale property, we propose adaptive fitting, a variant of the sample splitting proposed by Chernozhukov et al. (2018) for independent and identically distributed samples. We confirm the empirical performance through simulation studies and report that the DR estimator also has a stabilizing effect.
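As background, the double-robustness property the estimator builds on can be shown in the simplest i.i.d. setting. The NumPy sketch below (a toy simulation of our own, not the paper's adaptive-experiment construction) deliberately misspecifies the outcome model; a correctly specified logging policy still yields a consistent AIPW-style estimate:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000
x = rng.uniform(-1, 1, size=n)
e = 1 / (1 + np.exp(-x))                  # known logging policy / propensity score
t = rng.binomial(1, e)
y = 2.0 * t + x**2 + rng.normal(size=n)   # true average effect = 2.0

# Deliberately misspecified outcome-model nuisances: mu1 = mu0 = 0.
mu1 = np.zeros(n)
mu0 = np.zeros(n)

# Doubly robust estimate: consistent here because the logging policy e is correct,
# even though the outcome model is wrong (and vice versa in the symmetric case).
dr = np.mean(mu1 - mu0
             + t * (y - mu1) / e
             - (1 - t) * (y - mu0) / (1 - e))

assert abs(dr - 2.0) < 0.1
```

The paper's contribution is to make this kind of estimator valid for dependent samples from adaptive experiments, where neither the i.i.d. assumption above nor knowledge of the true logging policy can be taken for granted.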


research paper (4-6 pages)

Authors: Andrew Jesson (University of Oxford); Panagiotis Tigas (Oxford University); Joost van Amersfoort (University of Oxford); Andreas Kirsch (University of Oxford); Uri Shalit (Technion); Yarin Gal (University of Oxford)

Abstract: Estimating personalized treatment effects from high-dimensional observational data is essential in situations where experimental designs are infeasible, unethical or expensive. Existing approaches rely on fitting deep models on outcomes observed for treated and control populations, but when measuring the outcome for an individual is costly (e.g. biopsy) a sample efficient strategy for acquiring outcomes is required. Deep Bayesian active learning provides a framework for efficient data acquisition by selecting points with high uncertainty. However, naive application of existing methods selects training data that is biased toward regions where the treatment effect cannot be identified because there is non-overlapping support between the treated and control populations. To maximize sample efficiency for learning personalized treatment effects, we introduce new acquisition functions grounded in information theory that bias data acquisition towards regions where overlap is satisfied, by combining insights from deep Bayesian active learning and causal inference. We demonstrate the performance of the proposed acquisition strategies on synthetic and semi-synthetic datasets IHDP and CMNIST and their extensions which aim to simulate common dataset biases and pathologies.


research paper (4-6 pages)

Authors: Pablo Morales-Alvarez (Universidad de Granada); Angus Lamb (Microsoft); Simon Woodhead (Eedi); Simon Peyton Jones (Microsoft); Miltiadis Allamanis (MSR Cambridge); Cheng Zhang (Microsoft)

Abstract: Missing values constitute an important challenge in real-world machine learning, for both prediction and causal discovery tasks. However, only a few methods in causal discovery can handle missing data efficiently, while existing imputation methods are agnostic to causality. In this work we propose VICAUSE, a novel approach that simultaneously tackles missing value imputation and causal discovery efficiently with deep learning. In particular, we propose a generative model with a structured latent space and a graph neural network-based architecture that scales to a large number of variables. Moreover, our method can discover relationships between groups of variables, which is useful in many real-world applications. VICAUSE shows improved performance compared to popular and recent approaches in both missing value imputation and causal discovery.


research paper (4-6 pages)

Authors: Jonathan Y Huang (Singapore Institute for Clinical Sciences)

Abstract: Background Exploratory null-hypothesis significance testing (e.g. GWAS, EWAS) forms the backbone of molecular mechanism discovery; however, methods to identify true causal signals are underdeveloped. We evaluate two negative control approaches to quantitatively control for shared unmeasured confounding and recover unbiased effects, using epigenomic data and biologically-informed structural assumptions.

Methods We consider the application of the control outcome calibration approach (COCA) and proximal g-computation (PGC) to case studies in reproductive genomics. COCA may be employed when the maternal epigenome has no direct effect on the phenotype and proxies shared unmeasured confounders; PGC additionally requires suitable genetic instruments (e.g. mQTLs). Baseline covariates were extracted from 777 mother-child pairs in a birth cohort with maternal blood and fetal cord DNA methylation array data. Treatment, negative control, and outcome values were simulated in 2000 bootstraps under a plasmode simulation framework. Bootstrapped ordinary (COCA) and 2-stage (PGC) least squares were fitted to estimate treatment effects and standard errors under various settings of missing confounders (e.g. paternal data). Regression adjustment and a naive application of doubly-robust, ensemble learning efficient estimators were compared.

Results COCA and PGC performed well in simplistic data generating processes. However, in real-world cohort simulations, COCA performed acceptably only in settings with strong proxy confounders, but otherwise poorly (median bias 610%; coverage 29%). PGC performed slightly better. Alternatively, simple covariate adjustments generally outperformed all others in bias and confidence interval coverage across scenarios (median bias 22%; 71% coverage).

Discussion Molecular epidemiology provides a key opportunity to leverage biological knowledge against unmeasured confounding, but these identification strategies are underutilized and understudied in this context. Negative control calibration or adjustment may help in limited scenarios where the assumptions are fulfilled, but should be tested with simulations closer to real-world conditions.


research paper (4-6 pages)

Authors: Brian G Vegetabile (RAND Corporation)

Abstract: Recent years have seen a swell in methods that focus on estimating "individual treatment effects". These methods are often focused on the estimation of heterogeneous treatment effects under ignorability assumptions. This paper hopes to draw attention to the fact that there is nothing necessarily "individual" about such effects under ignorability assumptions, and that isolating individual effects may require additional assumptions. Such individual effects, more often than not, are more precisely described as "conditional average treatment effects", and confusion between the two has the potential to hinder advances in personalized and individualized effect estimation.
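The paper's point can be made concrete with a toy simulation (ours, not the paper's): two data-generating processes share exactly the same conditional average treatment effect, yet individual effects are constant in one and highly variable in the other, and nothing identified under ignorability distinguishes them.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
x = rng.binomial(1, 0.5, size=n)          # a single binary covariate

# Two worlds with identical CATE(x) = 1 + x but different individual effects.
ite_a = 1.0 + x                           # everyone with the same x is identical
ite_b = 1.0 + x + rng.normal(0, 2, size=n)  # effects vary across individuals at each x

# Both worlds have the same conditional average treatment effects.
for ite in (ite_a, ite_b):
    assert abs(ite[x == 0].mean() - 1.0) < 0.05
    assert abs(ite[x == 1].mean() - 2.0) < 0.05

# But the within-x spread of individual effects differs drastically, and this
# spread is not identified from observational data under ignorability alone.
assert ite_a[x == 1].std() < 1e-12
assert ite_b[x == 1].std() > 1.5
```

Because each unit reveals only one potential outcome, data drawn from either world are observationally indistinguishable, which is why calling CATE estimates "individual" effects requires assumptions beyond ignorability.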


  • Signal Manipulation and the Causal Status of Race

extended abstract (<1000 words)

Authors: Naftali Weinberger (Munich Center for Mathematical Philosophy)


research paper (4-6 pages)

Authors: Philippe Brouillard (Mila, Université de Montréal); Perouz Taslakian (Element AI); Alexandre Lacoste (ServiceNow); Sebastien Lachapelle (Mila, Université de Montréal); Alexandre Drouin (Element AI, a ServiceNow company)

Abstract: Causal discovery from observational data is a challenging task to which an exact solution cannot always be identified. Under assumptions about the data-generative process, the causal graph can often be identified up to an equivalence class. Proposing new realistic assumptions to circumscribe such equivalence classes is an active field of research. In this work, we propose a new set of assumptions that constrain possible causal relationships based on the nature of the variables. We thus introduce typed directed acyclic graphs, in which variable types are used to determine the validity of causal relationships. We demonstrate, both theoretically and empirically, that the proposed assumptions can result in significant gains in the identification of the causal graph.


  • Understanding the Role of Prognostic Factors and Effect Modifiers in Heterogeneity of Treatment Effect using a Within-Subjects Analysis of Variance

research paper (4-6 pages)

Authors: Rianne M Schouten (Eindhoven University of Technology); Mykola Pechenizkiy (TU Eindhoven)

Abstract: Personalized Evidence-Based Medicine (EBM) aims to estimate patient specific causal effects using covariate information. In order to adequately estimate these Individual Treatment Effects (ITEs), a thorough understanding of the role of covariates in heterogeneous datasets is necessary. In this preliminary work, we distinguish prognostic factors that influence the outcome variable, from effect modifiers, which influence the treatment effect. By means of a small synthetic data experiment where we temporarily disregard the fundamental problem of causal inference, we evaluate within-subjects variance for three possible distributions of ITEs, while keeping the Average Treatment Effect (ATE) fixed. The hypothetical nature of the experiment allows us to further understand the role of prognostic factors and effect modifiers in estimating ATEs and ITEs.


research paper (4-6 pages)

Authors: Sebastien Lachapelle (Mila, Université de Montréal); Pau Rodriguez (Element AI); Remi Le Priol (Mila, Université de Montréal); Alexandre Lacoste (ServiceNow); Simon Lacoste-Julien (Université de Montréal)

Abstract: It can be argued that finding an interpretable low-dimensional representation of a potentially high-dimensional phenomenon is central to the scientific enterprise. Independent component analysis (ICA) refers to an ensemble of methods which formalize this goal and provide estimation procedures for practical applications. This work proposes mechanism sparsity regularization as a new principle to achieve nonlinear ICA when latent factors depend sparsely on observed auxiliary variables and/or past latent factors. We show that the latent variables can be recovered up to a permutation if one regularizes the latent mechanisms to be sparse and if some graphical criterion is satisfied by the data generating process. As a special case, our framework shows how one can leverage unknown-target interventions on the latent factors to disentangle them, thus drawing further connections between ICA and causality. We validate our theoretical results with toy experiments.


research paper (4-6 pages)

Authors: Alexander Franks (UCSantaBarbara); Jiajing Zheng (University of California, Santa Barbara)

Abstract: In Bayesian causal inference for partially identified parameters, there is a delicate balance between parameterizing models directly in terms of the fully identified and unidentified parameters versus modeling the parameters of primary scientific interest. We explore the challenges of Bayesian inference for partially identified models in the context of multi-treatment causal inference with unobserved confounding in the linear model, where the treatment effects are partially identified. We demonstrate how carefully chosen priors can be used to incorporate additional scientific assumptions that further constrain the set of causal conclusions, and describe how our approach can be used to assess the robustness and sensitivity of the outcomes. We illustrate our approach to multi-treatment causal inference in an example quantifying the effect of gene expression levels on mouse obesity.


research paper (4-6 pages)

Authors: Garima Gupta (Tata Consultancy Services); Lovekesh Vig (Innovation Labs, Tata Consultancy Services Limited); Gautam Shroff (Tata Consultancy Services Ltd.)

Abstract: Medical professionals evaluating alternative treatment plans for a patient often encounter time-varying confounders: covariates that affect both the future treatment assignment and the patient outcome. The recently proposed Counterfactual Recurrent Network (CRN) accounts for time-varying confounders by using adversarial training to balance recurrent historical representations of patient data. However, this work assumes that all time-varying covariates are confounding and thus attempts to balance the full state representation. Since the actual subset of covariates that is confounding is in general unknown, recent work on counterfactual evaluation in the static, non-temporal setting has suggested disentangling the covariate representation into separate factors, each of which influences treatment selection, patient outcome, or both. Such disentanglement helps isolate selection bias and restricts balancing efforts to the factors that influence outcome, leaving the remaining factors, which only predict treatment, unbalanced. We hypothesize that such disentanglement is possible in the temporal setting as well, and would be beneficial when dealing with time-varying confounders. We propose DRTCI, a model for temporal causal inference that uses a recurrent neural network to learn a hidden representation of the patient's evolving covariates which disentangles into three factors, each causally determining either treatment, outcome, or both treatment and outcome. The model is evaluated on the same simulated model of tumour growth used to evaluate the CRN, with varying degrees of time-dependent confounding. The resulting outcome predictions from DRTCI significantly outperform the predictions from existing baselines, especially for cases with high confounding and minimal historical data (early prediction). Ablation experiments are additionally performed to identify the key factors contributing to the performance of DRTCI.


research paper (4-6 pages)

Authors: Tony Liu (University of Pennsylvania); Lyle Ungar (University of Pennsylvania); Konrad Kording (Upenn)

Abstract: When conducting instrumental variable studies, practitioners may exclude units in data processing prior to estimation. This exclusion step is critical for the study design but is often neglected. Here we view this problem as a well-defined tradeoff between statistical power and external validity, which can be navigated with a data-driven strategy. Our method estimates the probability of units being compliant and increases statistical power by excluding units with low compliance probability. This data-driven exclusion criterion can help navigate the tradeoff between power and external validity in many quasi-experimental settings.


  • Nonparametric identification is not enough, but randomized controlled trials are [contributed talk]

position piece (4-6 pages)

Authors: P M Aronow (Yale University); Theo Saarinen (University of California Berkeley); Jasjeet Sekhon (Yale University)

Abstract: We argue that randomized controlled trials (RCTs) are special even among settings where average treatment effects are identified by a nonparametric unconfoundedness assumption. We argue that this claim follows from two results of Robins and Ritov (1997): (1) with at least one continuous covariate control, no estimator of the average treatment effect exists which is uniformly consistent without further assumptions, (2) knowledge of the propensity score yields a consistent estimator and confidence intervals at parametric rates, regardless of how complicated the propensity score function is. We emphasize the latter point, and note that successfully-conducted RCTs provide knowledge of the propensity score to the researcher. We discuss modern developments in covariate adjustment for RCTs, noting that statistical models and machine learning methods can be used to improve efficiency while preserving finite sample unbiasedness. We conclude that statistical inference may be fundamentally more difficult in observational settings than it is in RCTs, even when all confounders are measured.


research paper (4-6 pages)

Authors: David A Bruns-Smith (UC Berkeley)

Abstract: We explore the conditions necessary to guarantee sharp upper bounds on the mean squared error when estimating mean counterfactual outcomes from observational data. In particular, we analyze the large family of design-based weighting estimators, which includes balancing weights and matching. Beginning from the bias-variance decomposition, we argue that assumptions have to be made about the outcome function in order to choose a high performance estimator. For a theoretical framework, we use integral probability metrics and $\phi$-divergences to analyze the bias-variance trade-off. Finally, we consider conditions under which our mean squared error bounds are robust to failure of our assumptions.


research paper (4-6 pages)

Authors: Duligur Ibeling (Stanford University); Thomas Icard (Stanford University)

Abstract: As an approach to the workshop theme of causal assumptions, we offer a topological learning-theoretic perspective on causal inference by introducing a series of topologies defined on general spaces of structural causal models (SCMs). To illustrate the power of the framework we prove a topological causal hierarchy theorem, showing that substantive assumption-free causal inference is possible only in a meager set of SCMs. Thanks to a correspondence between open sets in the weak topology and statistically verifiable hypotheses, our results show that inductive assumptions sufficient to license valid causal inferences are statistically unverifiable in principle. Similar to no-free-lunch theorems for statistical inference, the present results clarify the inevitability of substantial assumptions for causal inference. We furthermore suggest that the framework may be helpful for the positive project of exploring and assessing alternative causal-inductive assumptions.


research paper (4-6 pages)

Authors: Gokul Swamy (Carnegie Mellon University); Sanjiban Choudhury (Aurora Innovation); Drew Bagnell (); Steven Wu (Carnegie Mellon University)

Abstract: We derive and implement nonlinear extensions of the classical instrumental variable regression (IVR) technique. Our key insight is that even in the nonlinear setting, finding a causally consistent estimate of a structural equation is equivalent to satisfying constraints on conditional outcome moments. This insight allows us to leverage standard constrained optimization techniques to reframe the work of Dikkala et al. as optimizing a regularized Lagrangian and reveal underlying smoothness assumptions. We then propose a new algorithm, CausAL, that instead optimizes an augmented Lagrangian, requiring a different definition of smoothness and no adversarial training. We then extend our method to handle matching outcome distributions instead of just expected values, propose an efficient no-regret procedure, and implement a practical realization via a modification of an Integral Probability Metric (IPM) GAN which we call ACADIMI.


position piece (4-6 pages)

Authors: Amit Sharma (Microsoft Research); Vasilis Syrgkanis (Microsoft Research); Cheng Zhang (Microsoft Research); Emre Kiciman (Microsoft Research)

Abstract: Estimation of causal effects involves crucial assumptions about the data-generating process, such as directionality of effect, presence of instrumental variables or mediators, and whether all relevant confounders are observed. Violation of any of these assumptions leads to significant error in the effect estimate. However, unlike cross-validation for predictive models, there is no global validator method for a causal estimate. As a result, expressing different causal assumptions formally and validating them (to the extent possible) becomes critical for any analysis. We present DoWhy, a framework that allows explicit declaration of assumptions through a causal graph and provides multiple validation tests to check a subset of these assumptions. Our experience with DoWhy highlights a number of open questions for future research: developing new ways beyond causal graphs to express assumptions, the role of causal discovery in learning relevant parts of the graph, and developing validation tests that can better detect errors, both for average and conditional treatment effects. DoWhy is available at https://github.com/microsoft/dowhy.


position piece (4-6 pages)

Authors: Sonali Parbhoo (Harvard University); Shalmali Joshi (Harvard University (SEAS)); Finale Doshi-Velez (Harvard)

Abstract: Assessing the effects of deploying a policy based on retrospective data collected from a different policy is a common problem across several high-stakes decision-making domains. A number of off-policy evaluation (OPE) techniques have been proposed for this purpose with different bias-variance tradeoffs. However, these methods largely formulate OPE as a problem disassociated from the process used to generate the data. Posing OPE instead as a causal estimand has strong implications ranging from our fundamental understanding of the complexity of the OPE problem to which methods we apply in practice, and can help highlight gaps in existing literature in terms of the overall objective of OPE. Many formalisms of OPE additionally overlook the role of uncertainty entirely in the estimation process, which can significantly bias the estimation of counterfactuals and produce large errors in OPE as a result. Finally, depending on how we formalise OPE, human expertise can be particularly helpful in assessing the validity of OPE estimates or improving estimation from a finite number of samples to achieve certain efficiency guarantees. In this position paper, we discuss each of these issues in terms of the role they play in OPE. Importantly, each of these aspects may be viewed as a means of assessing the validity of various other common assumptions made in causal inference.
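The paper's own formalism is not reproduced here; as a hedged illustration of the baseline the abstract critiques, the following toy two-armed bandit (all policies and rewards assumed for illustration) shows the standard importance-sampling OPE estimator and why its variance depends on the mismatch between logging and target policies:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Two-armed bandit: behavior policy b logs the data; target policy pi is evaluated.
b = np.array([0.8, 0.2])    # action probabilities under the logging policy
pi = np.array([0.3, 0.7])   # action probabilities under the target policy
mean_reward = np.array([1.0, 2.0])

a = rng.choice(2, size=n, p=b)
r = mean_reward[a] + 0.5 * rng.normal(size=n)

# Importance-sampling OPE: reweight each logged reward by pi(a)/b(a).
w = pi[a] / b[a]
v_is = np.mean(w * r)

v_true = float(pi @ mean_reward)  # ground-truth value of pi, here 1.7
print(v_is, v_true)
```

The weight on the rarely-logged arm (0.7/0.2 = 3.5) inflates the estimator's variance; the abstract's point is that treating OPE as a causal estimand, rather than a reweighting exercise detached from the data-generating process, makes such tradeoffs and their assumptions explicit.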


research paper (4-6 pages)

Authors: Konstantin Genin (University of Tübingen)

Abstract: Since Spirtes et al. (2000), it has been well known that if causal relationships are linear and noise terms are independent and Gaussian, causal orientation is not identified from observational data — even if causal faithfulness is satisfied. Shimizu et al. (2006) showed that linear, non-Gaussian (LiNGAM) causal models are identified from observational data, so long as no latent confounders are present. That holds even when faithfulness fails. Genin and Mayo-Wilson (2020) refine that identifiability result: not only are causal relationships identified, but causal orientation is statistically decidable. That means that for every α > 0, there is a method that converges in probability to the correct orientation and, at every sample size, outputs an incorrect orientation with probability less than α. These results naturally raise questions about what happens in the presence of latent confounders. Hoyer et al. (2008) and Salehkaleybar et al. (2020) show that, although the causal model is not uniquely identified, causal orientation among observed variables is identified in the presence of latent confounders, so long as faithfulness is satisfied. This paper refines these results. When we allow for the presence of latent confounders, causal orientation is no longer statistically decidable. Although it is possible to converge in probability to the correct orientation, it is not possible to do so with finite-sample bounds on the probability of orientation errors. That is true even if causal faithfulness is satisfied.
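The decidability construction itself is beyond a code snippet, but the LiNGAM identifiability that the abstract builds on can be illustrated with a toy pairwise orientation test (the cubic-correlation dependence score below is an assumed crude proxy for a proper independence measure, not the authors' method): in the causal direction the regression residual is independent of the regressor, in the anti-causal direction it is not — provided the noise is non-Gaussian.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000

# Linear structural model x -> y with non-Gaussian (uniform) noise terms.
x = rng.uniform(-1, 1, size=n)
y = 2.0 * x + rng.uniform(-1, 1, size=n)

def dependence(cause, effect):
    """Regress effect on cause; return a crude higher-order dependence score
    between the residual and the candidate cause (near zero if independent)."""
    slope = np.cov(cause, effect)[0, 1] / np.var(cause)
    resid = effect - slope * cause
    return abs(np.corrcoef(cause**3, resid)[0, 1])

forward = dependence(x, y)   # residual independent of x: score near 0
backward = dependence(y, x)  # residual dependent on y: score bounded away from 0
print(forward, backward)     # orient in the direction with the smaller score
```

With Gaussian noise in place of the uniform draws, both scores shrink to zero and the test is uninformative — mirroring the Spirtes et al. non-identifiability result the abstract opens with.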


research paper (4-6 pages)

Authors: Michel Besserve (MPI for Intelligent Systems, Tübingen); Bernhard Schölkopf (MPI for Intelligent Systems, Tübingen)

Abstract: Complex systems often contain feedback loops, which can be described as cyclic causal models. Intervening in such systems may lead to counterintuitive effects, which cannot be inferred directly from the graph structure. After establishing a framework for differentiable interventions based on Lie groups, we take advantage of modern automatic differentiation techniques and their application to implicit functions in order to optimize interventions in cyclic causal models. We illustrate the use of this framework by investigating scenarios of transition to sustainable economies.
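The Lie-group machinery is not sketched here; as a minimal illustration of why interventions in cyclic models require solving the modified system rather than reading effects off the graph, consider a linear cyclic SCM at equilibrium, x = Ax + b (all numbers assumed for illustration). An intervention do(x0 = v) severs x0's structural equation and re-solves for the new equilibrium:

```python
import numpy as np

# Linear cyclic SCM at equilibrium: x = A x + b, with a feedback loop x0 <-> x1.
A = np.array([[0.0, -0.8],   # x0 is suppressed by x1
              [0.9,  0.0]])  # x1 is driven by x0
b = np.array([1.0, 0.2])

# Observational equilibrium: x* = (I - A)^{-1} b.
x_obs = np.linalg.solve(np.eye(2) - A, b)

# Intervention do(x0 = 2.0): delete x0's incoming edges, fix its value,
# and let the rest of the system re-equilibrate.
A_do, b_do = A.copy(), b.copy()
A_do[0, :] = 0.0
b_do[0] = 2.0
x_do = np.linalg.solve(np.eye(2) - A_do, b_do)

print(x_obs, x_do)
```

Because the equilibrium is defined implicitly by (I − A)x = b, its derivative with respect to an intervention parameter is available via the implicit function theorem — the property the abstract exploits, via automatic differentiation, to optimize interventions.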


extended abstract (<1000 words)

Authors: Bernard Koch (UCLA)

Abstract: This abstract describes a survey on deep causal estimators for a social science audience. While the machine learning community has moved quickly to leverage causal reasoning to improve predictive models, adoption of deep learning has been slower in areas of science that prioritize interpretability and robust evidence of causality for inference (e.g., epidemiology, social science, social statistics). Here we summarize deep learning models that adjust for confounding in creative ways (e.g., representation learning and generative modeling) to estimate/predict unbiased treatment effects, and/or extend causal inference beyond tabular data to text and networks. We discuss the strengths and weaknesses of these models from an applied social science perspective, and how the machine learning community might better frame/support their contributions to increase adoption by social and data scientists. Finally, we provide step-by-step tutorials for implementing algorithms, training them, and performing model selection in TensorFlow. Tutorials are available at https://github.com/kochbj/Deep-Learning-for-Causal-Inference.