Abstract: Policy makers typically face the problem of wanting to estimate the long-term effects of novel treatments while having only historical data on older treatment options. We assume access to a long-term dataset in which only past treatments were administered and a short-term dataset in which novel treatments have been administered. We propose a surrogate-based approach in which we assume that the long-term effect is channeled through a multitude of available short-term proxies. Our work combines three major recent techniques in the causal machine learning literature: surrogate indices, dynamic treatment effect estimation, and double machine learning, in a unified pipeline. We show that our method is consistent and provides root-n asymptotically normal estimates under a Markovian assumption on the data and the observational policy. We use a dataset from a major corporation, covering customer investments over a three-year period, to create a semi-synthetic data distribution that preserves the major qualitative properties of the real data. We evaluate the performance of our method and discuss practical challenges of deploying our formal methodology, along with how to address them.
Bios:
Greg Lewis: I am an economist who works on industrial organization, market design, applied econometrics and machine learning. My work is unified by the twin goals of making better sense of microeconomic data, and using those insights to optimize firm decision making and improve market performance. My research has spanned a range of industries – online retailing, online advertising, procurement, electricity, education – and has been published in top economics and management journals.
Vasilis Syrgkanis: I am an Assistant Professor in the Management Science and Engineering Department, in the School of Engineering at Stanford University. I am an active member of the Stanford Operations Research Group, the Statistical Machine Learning Group, the CS Theory Group and affiliated with the Stanford AI Lab (SAIL). My research interests are in the areas of machine learning, causal inference, econometrics, game theory/mechanism design and algorithm design.
Until August 2022, I was a Principal Researcher at Microsoft Research, New England, where I was also co-leading the project on Automated Learning and Intelligence for Causation and Economics (ALICE) and was a member of the EconCS and StatsML groups. I received my Ph.D. in Computer Science from Cornell University, where I had the privilege to be advised by Eva Tardos and then spent two years as a postdoc researcher at Microsoft Research, NYC, as part of the Algorithmic Economics and the Machine Learning groups. I obtained my diploma in EECS at the National Technical University of Athens, Greece.
Summary:
Focus of EconML: https://github.com/microsoft/EconML
Estimation of conditional treatment effects
Given customer attributes, predict the effect of a treatment on outcomes
A common infrastructure and interface for estimation libraries
Traditional process: the decision maker goes to an economist, who sets up the inference problem and then codes up the estimation algorithm
New process:
Decision maker goes to a data scientist
Data scientist
Sets up the problem using DoWhy (https://microsoft.github.io/dowhy) and EconML
Estimates the treatment effect using EconML (a sketch follows below)
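A minimal sketch of that two-step workflow on toy data (the column names, data-generating process, and model choices below are illustrative, not from the talk):

```python
# Minimal sketch of the DoWhy + EconML workflow on toy data
# (column names, models, and data-generating process are illustrative).
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from dowhy import CausalModel
from econml.dml import LinearDML

rng = np.random.default_rng(0)
n = 2000
W = rng.normal(size=(n, 2))                        # confounders
X = rng.normal(size=(n, 1))                        # effect modifiers
T = W[:, 0] + rng.normal(size=n)                   # treatment depends on confounders
Y = (1 + X[:, 0]) * T + W[:, 0] + rng.normal(size=n)
df = pd.DataFrame({"Y": Y, "T": T, "X0": X[:, 0], "W0": W[:, 0], "W1": W[:, 1]})

# Step 1: set up the causal problem and identify the estimand with DoWhy.
model = CausalModel(data=df, treatment="T", outcome="Y",
                    common_causes=["W0", "W1"], effect_modifiers=["X0"])
estimand = model.identify_effect(proceed_when_unidentifiable=True)

# Step 2: estimate heterogeneous treatment effects with EconML.
est = LinearDML(model_y=GradientBoostingRegressor(), model_t=GradientBoostingRegressor())
est.fit(df["Y"], df["T"], X=df[["X0"]], W=df[["W0", "W1"]])
print(est.ate(df[["X0"]]))         # average treatment effect
print(est.effect(df[["X0"]])[:5])  # per-unit effects for the first few customers
```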
Challenge:
Automating the estimation procedure and the construction of confidence intervals
Post-estimation analyses
Canonical causal treatment model
outcome_i = f(effect-modifying attributes of unit i) * treatment_i + g(other features of unit i) + noise_i
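In more standard notation (the symbol names here are my own labels, not from the talk), this is the partially linear CATE model:

```latex
% Canonical causal treatment model in standard notation (symbol names are my own):
% Y_i = outcome, T_i = treatment, X_i = effect-modifying attributes,
% W_i = other features of unit i, \varepsilon_i = noise.
\[
  Y_i \;=\; \theta(X_i)\, T_i \;+\; g(W_i) \;+\; \varepsilon_i,
  \qquad \mathbb{E}\!\left[\varepsilon_i \mid X_i, W_i, T_i\right] = 0,
\]
% where \theta(x) is the conditional average treatment effect (CATE) for units with X_i = x.
```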
One application of EconML
Training a model for individual treatment effects indicates which attributes of units matter most for different sub-groups
E.g. train a tree whose branches correspond to different sub-groups (see the sketch at the end of this list)
Can’t tell us which data are missing (e.g. whether some unobserved attributes are affecting the treatment)
If we can assume all confounders are observed: good, control for them all
Make sure that if the confounders’ impact is non-linear, it’s modeled using a suitably flexible function
EconML can help with this since it includes many non-linear estimators
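A short sketch of the sub-group tree idea, continuing from the workflow sketch above (it assumes the fitted `est` and dataframe `df` from that sketch; the depth and leaf-size settings are illustrative):

```python
# Summarize heterogeneous effects with a shallow decision tree over sub-groups.
# Assumes `est` and `df` from the DoWhy + EconML sketch above; settings are illustrative.
from econml.cate_interpreter import SingleTreeCateInterpreter

intrp = SingleTreeCateInterpreter(max_depth=2, min_samples_leaf=50)
intrp.interpret(est, df[["X0"]])   # fit a shallow tree to the estimated per-unit effects
intrp.plot(feature_names=["X0"])   # each branch/leaf is a sub-group with its own effect estimate
```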
EconML includes various algorithms for causal treatment effect estimation, such as:
Double ML
Project treatments and outcomes onto confounders
Then work with the residuals of those models
i.e. train models to predict the treatment and the outcome from the confounders, then focus on the deviations of treatment/outcome from those predictions (see the sketch after this list)
Instrumental Variables
Randomly assigned
Affect treatment
Do not affect the outcome except through the treatment
Used to de-bias effect estimates when treatments are assigned in a biased (confounded) way
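A bare-bones sketch of the residualization idea behind Double ML on toy data (it skips the cross-fitting and inference machinery that a careful implementation needs):

```python
# Bare-bones sketch of the Double ML residualization idea on toy data
# (skips cross-fitting and inference; EconML's DML estimators handle those internally).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 5000
W = rng.normal(size=(n, 3))                                        # confounders
T = W @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)             # treatment assigned based on confounders
Y = 2.0 * T + W @ np.array([1.0, 0.0, 1.0]) + rng.normal(size=n)   # true effect is 2

# 1) Project treatment and outcome onto the confounders.
t_hat = GradientBoostingRegressor().fit(W, T).predict(W)
y_hat = GradientBoostingRegressor().fit(W, Y).predict(W)

# 2) Work with the residuals: regress outcome residuals on treatment residuals.
theta = LinearRegression(fit_intercept=False).fit((T - t_hat).reshape(-1, 1), Y - y_hat).coef_[0]
print(theta)  # should land near the true effect of 2
```

EconML's DML classes (e.g. LinearDML, CausalForestDML) wrap this recipe with cross-fitting and confidence intervals, and EconML also ships IV-based estimators for the instrumental-variables setting above.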
Recommended flow:
Use linear regression or other interpretable models as an initial step to help with debugging
Then train more flexible models to make sure you capture the system’s dynamics
Finally, post-process the model’s predictions using interpretable models to communicate with stakeholders (sketched below)
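A sketch of that flow on toy data (the specific estimators and data are illustrative; the final interpretable summary can reuse the tree interpreter sketched earlier in these notes):

```python
# Sketch of the recommended flow on toy data (estimator choices are illustrative).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from econml.dml import LinearDML, CausalForestDML

rng = np.random.default_rng(2)
n = 3000
X = rng.normal(size=(n, 1))                    # effect modifiers
W = rng.normal(size=(n, 2))                    # confounders
T = W[:, 0] + rng.normal(size=n)
Y = (1 + 0.5 * X[:, 0]) * T + W[:, 0] + rng.normal(size=n)

# 1) Start interpretable: a linear CATE model is easy to sanity-check and debug.
lin = LinearDML(model_y=GradientBoostingRegressor(), model_t=GradientBoostingRegressor())
lin.fit(Y, T, X=X, W=W)
print(lin.ate(X), lin.ate_interval(X))         # point estimate and confidence interval

# 2) Then go flexible, to capture non-linear effect heterogeneity.
cf = CausalForestDML(model_y=GradientBoostingRegressor(), model_t=GradientBoostingRegressor())
cf.fit(Y, T, X=X, W=W)
print(cf.effect(X[:5]))                        # flexible per-unit effect estimates

# 3) Finally, post-process the flexible model's predictions with an interpretable model,
#    e.g. the SingleTreeCateInterpreter sketched earlier in these notes.
```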
Use-case:
Estimation of long-term effects via surrogates
Find some short-term surrogates that indicate long-term behavior
E.g. use historical data to see how much (and which) short-term history is required to make long-term forecasts; that history is the surrogate
Challenge: the history includes many different interventions, which don’t correspond to the dynamics and treatments we care about
To handle this, they explicitly model how the treatment -> surrogate -> outcome process evolves over time
So they want to isolate the effect of a single treatment, netting out the effects of subsequent treatments, which may themselves depend on the current treatment
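A much-simplified sketch of the surrogate-index idea from this use-case (it ignores the dynamic-treatment complication discussed above and the double-ML corrections from the paper; all data and names here are synthetic and illustrative):

```python
# Simplified surrogate-index sketch: learn to predict the long-term outcome from
# short-term surrogates on historical data, then use that index to evaluate a new,
# randomized treatment observed only over the short term. Data is synthetic/illustrative.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(3)

# Historical dataset: short-term surrogates S_hist and the observed long-term outcome Y_long.
n_hist = 5000
S_hist = rng.normal(size=(n_hist, 3))
Y_long = S_hist @ np.array([1.0, 0.5, 0.25]) + rng.normal(size=n_hist)

# 1) Learn the surrogate index: predict the long-term outcome from the short-term surrogates.
index_model = GradientBoostingRegressor().fit(S_hist, Y_long)

# Short-term experimental dataset: novel treatment T_new is randomized; only surrogates are observed.
n_exp = 2000
T_new = rng.integers(0, 2, size=n_exp)
S_exp = rng.normal(size=(n_exp, 3)) + 0.3 * T_new[:, None]   # treatment shifts the surrogates

# 2) Impute the long-term outcome via the index and estimate the treatment effect on it.
y_imputed = index_model.predict(S_exp)
long_term_effect = y_imputed[T_new == 1].mean() - y_imputed[T_new == 0].mean()
print(long_term_effect)
```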