Challenges in Causal Machine Learning and Its Applications
Emre Kiciman @ Microsoft
Abstract:
Causal machine learning promises improved generalizability and robustness as compared to conventional ML, including robustness under interventional distribution shifts seen in many decision-making scenarios. Causal ML, however, faces key challenges in practical deployments. For example, causal approaches require making crucial assumptions about a system or data-generating process. Not only is eliciting these causal assumptions from domain experts difficult, but many of these assumptions are also unverifiable in the absence of active experiments. This talk describes our perspective and research efforts to address such challenges.
Bio:
Emre Kıcıman is a Senior Principal Researcher at Microsoft Research, where his research interests span causal inference, machine learning, and AI’s implications for people and society.
Summary:
Traditional ML is not robust to shifts in the data-generating process
This is a critical problem for decision-making applications, where actions themselves change the data-generating process
Example:
Decide whether to irrigate farms based on environmental conditions
The historical record shows that soil moisture is high whenever the weather is hot
A naive model therefore recommends not irrigating on hot days, which is wrong
This happened because farmers usually irrigate on hot days, creating a spurious correlation that the model learned
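The spurious correlation in this example can be reproduced in a small simulation (the numbers and the irrigation rule below are hypothetical, chosen only to illustrate the mechanism):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
temp = rng.uniform(0.0, 1.0, n)        # normalized temperature
irrigate = temp > 0.5                  # farmers' historical policy: irrigate when hot
# Hypothetical mechanism: irrigation raises moisture, heat itself dries the soil
moisture = 0.8 * irrigate - 0.3 * temp + rng.normal(0.0, 0.05, n)

# Observationally, heat and moisture look positively correlated...
overall = np.corrcoef(temp, moisture)[0, 1]
# ...but holding the irrigation decision fixed, heat lowers moisture
within = np.corrcoef(temp[irrigate], moisture[irrigate])[0, 1]
print(overall, within)   # overall is positive, within is negative
```

A model trained on the raw correlation would advise skipping irrigation on hot days, breaking the very policy that generated its data.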
Example:
The quality of an ad causes both how well it is laid out on the screen AND the probability of it being clicked
In the training data, quality and layout are confounded rather than sampled independently
Want a better dataset where they are independent
Causal ML: help humans add constraints to the data based on their domain knowledge
Assumptions are encoded by a causal graph: which edges are missing, and the direction of the edges that remain
Each edge represents a stable, independent mechanism
The graph cannot be learned from data alone
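A minimal sketch of encoding such assumptions for the ad example (variable names are hypothetical): the graph says quality causes both layout and click, and every *missing* edge is itself an assumption of no direct effect.

```python
# Causal graph for the ad example, as a parent map (hypothetical names)
parents = {
    "quality": [],                  # assumed exogenous
    "layout": ["quality"],          # quality -> layout
    "click": ["quality", "layout"], # quality -> click, layout -> click
}

def absent_edges(parents):
    """Edges NOT in the graph; each one encodes a no-direct-effect assumption."""
    nodes = list(parents)
    return sorted((a, b) for a in nodes for b in nodes
                  if a != b and a not in parents[b])

# e.g. ("layout", "quality") being absent asserts layout never affects quality
print(absent_edges(parents))
```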
Challenge:
Constraints come from external knowledge and datasets (often from experimental treatments)
Challenging to validate counterfactuals, since they’re not observed
Case Study: spread of influenza
Flu spreads in a particular spatial pattern
Want to understand drivers of epidemic spread
Data sources: clinical, immunity, movement, weather, census
Matched analysis: match flu-origin counties to socioeconomically similar counties, then compare them on the other candidate drivers
Important factors: local travel, weather before flu origin (a cold front hitting a humid front)
Not significant factors: school start dates, holidays
Observation:
Identification of potential confounding factors by experts was critical
Including weather and antigenic diversity was critical for finding the difference
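The matched analysis above can be sketched on synthetic data (all variables and effect sizes are invented; a planted weather difference of -1.0 between origin and non-origin counties stands in for the real drivers):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
ses = rng.uniform(0, 1, n)                   # socioeconomic score per county
origin = rng.random(n) < 0.2 + 0.6 * ses     # origin label, correlated with SES
# Planted driver: origin counties saw colder pre-outbreak weather (effect -1.0)
weather = -1.0 * origin + 0.5 * ses + rng.normal(0, 0.2, n)

# Match each origin county to the socioeconomically closest non-origin county
treated = np.where(origin)[0]
controls = np.where(~origin)[0]
gaps = np.abs(ses[treated][:, None] - ses[controls][None, :])
matches = controls[gaps.argmin(axis=1)]

# Compare the matched pairs on the candidate driver
matched_diff = (weather[treated] - weather[matches]).mean()
print(matched_diff)   # recovers roughly the planted -1.0 difference
```

An unmatched comparison of group means would also absorb the SES difference between origin and non-origin counties; matching removes it.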
Case Study: causal factors of effects of psychiatric medication use (from social media, not randomized controlled trial)
Initial observation: older drugs appeared more effective than newer ones, even though the newer drugs were supposed to be more effective
Possible explanations: the older drugs really are better; people switch between drugs; or common causes affect both a drug's effectiveness and whether people talk about it online
The outcome was unsatisfactory because there were too many confounders in this setting
Causal inference validation tools:
Techniques to refute various assumptions
Many are based on how sensitive the conclusions are to targeted perturbations of the dataset
e.g., testing unverifiable assumptions:
Create synthetic data where the assumptions are set differently
Take the causal graph provided by the experts
Add an extra synthetic confounder
Mutate the dataset to reflect the various ways the confounder could affect it
Run causal inference on each mutated dataset
Observe how sensitive the conclusions are to these differences
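The synthetic-confounder loop can be sketched in a few lines (the data-generating process and effect sizes below are invented for illustration; the true effect is fixed at 2.0):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
TRUE_EFFECT = 2.0

def naive_estimate(strength):
    """Re-estimate the effect after injecting a synthetic unobserved
    confounder of the given strength into both treatment and outcome."""
    u = rng.normal(size=n)                    # synthetic confounder
    t = strength * u + rng.normal(size=n)     # confounder shifts treatment
    y = TRUE_EFFECT * t + strength * u + rng.normal(size=n)
    return np.cov(t, y)[0, 1] / np.var(t)     # OLS slope, u left unadjusted

estimates = {s: naive_estimate(s) for s in (0.0, 0.5, 1.0, 2.0)}
# With no confounding the estimate recovers ~2.0; as the synthetic
# confounder strengthens, the naive estimate drifts away from the truth.
print(estimates)
```

Plotting how far the estimate drifts as the confounder strengthens shows how much unverified confounding the conclusion can tolerate.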
Limitations:
Causal inference relies on outside assumptions
So there’s no silver bullet for validation of these models
Domain knowledge is hard to capture
Experts need to be educated on how to encode it properly
Causal drivers are often not understood
Raw data are often not featurized into causally independent attributes
Reducing reliance on domain knowledge
Look for ways to standardize how it is captured
Reduce how much of it is needed (invariant risk minimization; leverage small experiments for targeted causal knowledge)
Seek new sources for knowledge
Case Study:
Combine small random experiments with large log data
Goal: improve click prediction for Bing ad system
Counterfactual prediction problem: what would happen if new configurations of ads were shown to users
Their system predicted well for small changes in the ads shown but performed poorly when the new configuration was too far from the training data
Challenges:
Ad position is confounded with ad quality
Full randomized experiments are very expensive, but A/B experiments are feasible on small sub-populations
Randomization breaks confounding
Resulting dataset is too small to train a good predictive model
They have a large confounded dataset and a small unconfounded dataset
Approach: Causal Transfer Random Forest
An instance of the honest decision tree idea: use part of the data to learn the tree structure, and the rest to learn the leaf regression functions
Their approach:
Decision tree structure learned from randomized sample
Leaves will have homogeneous causal dynamics
So train regression function in each leaf using the large confounded observational dataset
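A toy two-stage sketch of the idea, with a single split standing in for the forest (all data, thresholds, and effect sizes are hypothetical): the split location is learned only from the small randomized sample, then each leaf's prediction is fit from the large observational sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small randomized sample: ad position x assigned at random (hypothetical DGP:
# positions below 0.5 get a 0.4 click-rate boost)
n_rand = 300
x_r = rng.uniform(0, 1, n_rand)
y_r = 0.3 + 0.4 * (x_r < 0.5) + rng.normal(0, 0.1, n_rand)

# Stage 1: learn the split (tree structure) from the randomized data only
def split_cost(t):
    left, right = y_r[x_r < t], y_r[x_r >= t]
    return left.var() * left.size + right.var() * right.size
threshold = min(np.linspace(0.1, 0.9, 17), key=split_cost)

# Large confounded sample: position correlated with hidden ad quality
n_obs = 50_000
q = rng.uniform(0, 1, n_obs)                             # hidden quality
x_o = np.clip(1 - q + rng.normal(0, 0.1, n_obs), 0, 1)   # good ads get top slots
y_o = 0.3 + 0.4 * (x_o < 0.5) + rng.normal(0, 0.1, n_obs)

# Stage 2: fit each leaf's regression (here just a mean) from the large data
leaf_means = {
    "left": y_o[x_o < threshold].mean(),
    "right": y_o[x_o >= threshold].mean(),
}
def predict(x):
    return leaf_means["left"] if x < threshold else leaf_means["right"]
```

This only illustrates the two-stage mechanics; the method's actual claim is that randomization makes the learned structure unconfounded, so that within each leaf the causal dynamics are homogeneous enough for observational fitting.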
DoWhy library: Helps people capture and validate assumptions
Modeling: create causal graph to encode assumptions
Identification: formulate what to estimate
Estimation: estimate the effect using statistical methods
Refutation: validate assumptions