Challenges in Causal Machine Learning and Its Applications
Emre Kiciman @ Microsoft
Abstract:
Causal machine learning promises improved generalizability and robustness as compared to conventional ML, including robustness under interventional distribution shifts seen in many decision-making scenarios. Causal ML, however, faces key challenges in practical deployments. For example, causal approaches require making crucial assumptions about a system or data-generating process. Not only is eliciting these causal assumptions from domain experts difficult, but many of these assumptions are also unverifiable in the absence of active experiments. This talk describes our perspective and research efforts to address such challenges.
Bio:
Emre Kıcıman is a Senior Principal Researcher at Microsoft Research, where his research interests span causal inference, machine learning, and AI’s implications for people and society.
Summary:
Traditional ML is not robust to shifts in the data-generating process
This is a critical problem for decision-making applications, where actions themselves change the data-generating process
Example:
Decide whether to irrigate farms based on environmental conditions
The historical record shows that soil moisture is high whenever the weather is hot
A naive model therefore recommends not irrigating on hot days, which is wrong
This happened because farmers usually irrigate on hot days, creating a spurious correlation that the model learned
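The spurious correlation in this example can be reproduced in a small simulation (the numbers and the irrigation rule below are hypothetical, chosen only to illustrate the mechanism):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
temp = rng.uniform(0.0, 1.0, n)        # normalized temperature
irrigate = temp > 0.5                  # farmers' historical policy: irrigate when hot
# Hypothetical mechanism: irrigation raises moisture, heat itself dries the soil
moisture = 0.8 * irrigate - 0.3 * temp + rng.normal(0.0, 0.05, n)

# Observationally, heat and moisture look positively correlated...
overall = np.corrcoef(temp, moisture)[0, 1]
# ...but holding the irrigation decision fixed, heat lowers moisture
within = np.corrcoef(temp[irrigate], moisture[irrigate])[0, 1]
print(overall, within)   # overall is positive, within is negative
```

A model trained on the raw correlation would advise skipping irrigation on hot days, breaking the very policy that generated its data.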
Example:
The quality of an ad causes both how well it is laid out on the screen AND the probability of it being clicked
In the training data, quality and layout are confounded rather than sampled independently
Want a better dataset where they are independent
Causal ML: help humans add constraints to the data based on their domain knowledge
Assumptions are encoded by a causal graph: which edges are missing, and the direction of the edges that remain
Each edge represents a stable, independent mechanism
The graph cannot be learned from data alone
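A minimal sketch of encoding such assumptions for the ad example (variable names are hypothetical): the graph says quality causes both layout and click, and every *missing* edge is itself an assumption of no direct effect.

```python
# Causal graph for the ad example, as a parent map (hypothetical names)
parents = {
    "quality": [],                  # assumed exogenous
    "layout": ["quality"],          # quality -> layout
    "click": ["quality", "layout"], # quality -> click, layout -> click
}

def absent_edges(parents):
    """Edges NOT in the graph; each one encodes a no-direct-effect assumption."""
    nodes = list(parents)
    return sorted((a, b) for a in nodes for b in nodes
                  if a != b and a not in parents[b])

# e.g. ("layout", "quality") being absent asserts layout never affects quality
print(absent_edges(parents))
```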
Challenge:
Constraints come from external knowledge and datasets (often from experimental treatments)
Challenging to validate counterfactuals, since they’re not observed
Case Study: spread of influenza
Flu spreads in a particular spatial pattern
Want to understand drivers of epidemic spread
Data sources: clinical, immunity, movement, weather, census
Matched analysis: match flu-origin counties to socioeconomically similar counties, then compare them on the other candidate drivers
Important factors: local travel, weather before flu origin (a cold front hitting a humid front)
Not significant factors: school start dates, holidays
Observation:
Identification of potential confounding factors by experts was critical
Including weather and antigenic diversity was critical for finding the difference
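The matched analysis above can be sketched on synthetic data (all variables and effect sizes are invented; a planted weather difference of -1.0 between origin and non-origin counties stands in for the real drivers):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
ses = rng.uniform(0, 1, n)                   # socioeconomic score per county
origin = rng.random(n) < 0.2 + 0.6 * ses     # origin label, correlated with SES
# Planted driver: origin counties saw colder pre-outbreak weather (effect -1.0)
weather = -1.0 * origin + 0.5 * ses + rng.normal(0, 0.2, n)

# Match each origin county to the socioeconomically closest non-origin county
treated = np.where(origin)[0]
controls = np.where(~origin)[0]
gaps = np.abs(ses[treated][:, None] - ses[controls][None, :])
matches = controls[gaps.argmin(axis=1)]

# Compare the matched pairs on the candidate driver
matched_diff = (weather[treated] - weather[matches]).mean()
print(matched_diff)   # recovers roughly the planted -1.0 difference
```

An unmatched comparison of group means would also absorb the SES difference between origin and non-origin counties; matching removes it.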
Case Study: causal factors of effects of psychiatric medication use (from social media, not randomized controlled trial)
Initial observation: older drugs appeared more effective than newer ones, even though the newer drugs were supposed to be more effective
Possible explanations: the older drugs really are better; people switch between drugs; or common causes affect both a drug's effectiveness and whether people talk about it online
The outcome was unsatisfactory because there were too many confounders in this setting
Causal inference validation tools:
Techniques to refute various assumptions
Many are based on how sensitive the conclusions are to targeted perturbations of the dataset
e.g., testing unverifiable assumptions:
Create synthetic data where the assumptions are set differently
Take the causal graph provided by the experts
Add an extra synthetic confounder
Mutate the dataset to reflect the various ways the confounder could affect it
Run causal inference on each mutated dataset
Observe how sensitive the conclusions are to these differences
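The synthetic-confounder loop can be sketched in a few lines (the data-generating process and effect sizes below are invented for illustration; the true effect is fixed at 2.0):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
TRUE_EFFECT = 2.0

def naive_estimate(strength):
    """Re-estimate the effect after injecting a synthetic unobserved
    confounder of the given strength into both treatment and outcome."""
    u = rng.normal(size=n)                    # synthetic confounder
    t = strength * u + rng.normal(size=n)     # confounder shifts treatment
    y = TRUE_EFFECT * t + strength * u + rng.normal(size=n)
    return np.cov(t, y)[0, 1] / np.var(t)     # OLS slope, u left unadjusted

estimates = {s: naive_estimate(s) for s in (0.0, 0.5, 1.0, 2.0)}
# With no confounding the estimate recovers ~2.0; as the synthetic
# confounder strengthens, the naive estimate drifts away from the truth.
print(estimates)
```

Plotting how far the estimate drifts as the confounder strengthens shows how much unverified confounding the conclusion can tolerate.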
Limitations:
Causal inference relies on outside assumptions
So there’s no silver bullet for validation of these models
Domain knowledge is hard to capture
Experts need to be educated on how to encode it properly
Causal drivers are often not understood
Raw data are often not featurized into causally independent attributes
Reducing reliance on domain knowledge
Look for ways to standardize how it is captured
Reduce how much of it is needed (invariant risk minimization; leverage small experiments for targeted causal knowledge)
Seek new sources for knowledge
Case Study:
Combine small random experiments with large log data
Goal: improve click prediction for Bing ad system
Counterfactual prediction problem: what would happen if new configurations of ads were shown to users
Their system predicted well for small changes in the ads shown but performed poorly when the new configuration was too far from the training data
Challenges:
Ad position is confounded with ad quality
Full randomized experiments are very expensive, but A/B experiments are feasible on small sub-populations
Randomization breaks confounding
Resulting dataset is too small to train a good predictive model
They have a large confounded dataset and a small unconfounded dataset
Approach: Causal Transfer Random Forest
An instance of the honest decision tree idea: use part of the data to learn the tree structure, and the rest to learn the leaf regression functions
Their approach:
Decision tree structure learned from randomized sample
Leaves will have homogeneous causal dynamics
So train regression function in each leaf using the large confounded observational dataset
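A toy two-stage sketch of the idea, with a single split standing in for the forest (all data, thresholds, and effect sizes are hypothetical): the split location is learned only from the small randomized sample, then each leaf's prediction is fit from the large observational sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small randomized sample: ad position x assigned at random (hypothetical DGP:
# positions below 0.5 get a 0.4 click-rate boost)
n_rand = 300
x_r = rng.uniform(0, 1, n_rand)
y_r = 0.3 + 0.4 * (x_r < 0.5) + rng.normal(0, 0.1, n_rand)

# Stage 1: learn the split (tree structure) from the randomized data only
def split_cost(t):
    left, right = y_r[x_r < t], y_r[x_r >= t]
    return left.var() * left.size + right.var() * right.size
threshold = min(np.linspace(0.1, 0.9, 17), key=split_cost)

# Large confounded sample: position correlated with hidden ad quality
n_obs = 50_000
q = rng.uniform(0, 1, n_obs)                             # hidden quality
x_o = np.clip(1 - q + rng.normal(0, 0.1, n_obs), 0, 1)   # good ads get top slots
y_o = 0.3 + 0.4 * (x_o < 0.5) + rng.normal(0, 0.1, n_obs)

# Stage 2: fit each leaf's regression (here just a mean) from the large data
leaf_means = {
    "left": y_o[x_o < threshold].mean(),
    "right": y_o[x_o >= threshold].mean(),
}
def predict(x):
    return leaf_means["left"] if x < threshold else leaf_means["right"]
```

This only illustrates the two-stage mechanics; the method's actual claim is that randomization makes the learned structure unconfounded, so that within each leaf the causal dynamics are homogeneous enough for observational fitting.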
DoWhy library: Helps people capture and validate assumptions
Modeling: create causal graph to encode assumptions
Identification: formulate what to estimate
Estimation: estimate the effect using statistical methods
Refutation: validate assumptions