PO’d

In his excellent text, Causal Inference: The Mixtape, Scott Cunningham introduces the potential outcomes model. This model is the heart and soul of thinking about treatment effects and causality. While it has mainly been used to justify the average treatment effect estimand and randomized controlled trials, the model also allows us to think about richer estimands like Kolmogorov bounds and more realistic estimands like Manski bounds. It also allows us to think about how structural assumptions may affect the tightness of bounds and the joint distribution of potential outcomes.

Scott illustrates potential outcomes with a problem that turns out to be close to home. When I was diagnosed with stage 3 colon cancer, the question came up of whether or not I should get radiotherapy. I had two potential outcomes: the probability of 5-year survival with just chemotherapy, or the probability of 5-year survival with chemo and radiation. My surgeon was for radiotherapy. My oncologist was against. My radio oncologist was in an interesting position. My oncologist was his boss, but obviously he was pretty fond of radiotherapy.

I had one very bad day. My surgeon called and told me I was an idiot for listening to my oncologist. I called my oncologist, who said he would talk with the surgeon. Meanwhile the assistant of my second-opinion oncologist called to discuss my appointment. Boy. Did she get an earful! I told her I didn’t want my second opinion’s opinion. I wanted him to walk through the issues. My case eventually went to a “tumor board.” This is a group made up of oncologists, surgeons and radio oncologists who discuss cases and give recommendations. My oncologist reported back on the discussion of the board and the eventual vote - oncologists were generally against radiotherapy and radio oncologists were for, while surgeons were split.


Potential outcomes were not some theoretical construct. We are discussing life or death, my life or death.

Table of (made up) potential outcomes from Causal Inference: The Mixtape. The first column gives the patient "names." The second column is the potential outcome with surgery (in years of survival). The third is the potential outcome with chemo and the fourth is the difference.

The treatment effect is this last column. We see that it is positive for some people and negative for others. In general we cannot observe both potential outcomes for an individual; that’s why they are “potential.” In the magic of Scott's imagination we can write down the potential outcomes for a group of imaginary people. So in this imaginary world we know the treatment effect.

First off, let's assume that we have data from a randomized controlled trial. Let's assign patients 1-5 to surgery and patients 6-10 to chemo. So imagine we cannot observe the first five elements of the third column or the last five elements of the second column. Nor can we observe any elements of the fourth column. Now we are in the real world!


In this case we can estimate the average treatment effect. This is actually a pretty cool result. We cannot observe the treatment effect for any individual, but somehow we can determine the average. The average outcome is 28/5 for surgery and 32/5 for chemo. This gives an estimated average treatment effect of 0.8 years in favor of chemo. Note that in Scott's analysis he uses the full data set rather than the sample I created. This leads to somewhat different results.
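The arithmetic above can be sketched in a few lines of Python. The ten potential outcomes in the sketch are an assumption: they are meant to match the table in the Mixtape, and they do reproduce the 28/5 and 32/5 averages, but they should be checked against the book. The variable names are only for illustration.

```python
from fractions import Fraction

# ASSUMPTION: potential outcomes (years of survival) for the ten imaginary
# patients, intended to match the Mixtape table. Verify against the book.
surgery = [7, 5, 5, 7, 4, 10, 1, 5, 3, 9]
chemo = [1, 6, 1, 8, 2, 1, 10, 6, 7, 8]

# The "RCT": patients 1-5 get surgery, patients 6-10 get chemo,
# so only these outcomes are observed.
observed_surgery = surgery[:5]
observed_chemo = chemo[5:]

mean_surgery = Fraction(sum(observed_surgery), len(observed_surgery))
mean_chemo = Fraction(sum(observed_chemo), len(observed_chemo))

# Estimated average treatment effect, defined as surgery minus chemo.
ate_estimate = mean_surgery - mean_chemo

# Average of the individual differences over the full imaginary table,
# which is never available with real data.
true_ate = Fraction(sum(s - c for s, c in zip(surgery, chemo)), len(surgery))

print("mean surgery:", mean_surgery)
print("mean chemo:", mean_chemo)
print("estimated ATE (surgery - chemo):", ate_estimate)
print("ATE from the full table:", true_ate)
```

Under the assumed table, the estimate from the "RCT" sample is -4/5 (0.8 years in favor of chemo), while the full table gives 3/5 in favor of surgery. With five patients per arm, a gap like that is not surprising.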

We can also estimate the distribution of outcomes for each of the treatments. Here we see that the distributions of outcomes cross. I discuss this issue in a previous post. The crossing of these curves points to a couple of issues. It suggests that there are heterogeneous treatment effects. Now if we could observe all the imaginary data, we would know that the treatment effect varies. But even with our little "RCT" we know that the treatment effect is heterogeneous. The crossing excludes the possibility that the treatment effect is 0 or constant. Of course we are using 5 data points to estimate each curve, so they may not accurately reflect the true distributions.

The outcome is the number of years until death after treatment. Blue is surgery and red is chemo. Which would you choose? Note that the more the curve is pushed to the right, the greater the probability of living longer. In cancer, we usually present these in the negative: the probability of being alive at each point in time.
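A minimal sketch of how the two curves can be computed and checked for crossing. The two five-patient samples are assumptions (the observed halves of the Mixtape table as best it can be reconstructed), and `ecdf` is a small helper written for this sketch rather than a library function.

```python
from fractions import Fraction

# ASSUMPTION: observed outcomes in the "RCT" (years of survival).
observed_surgery = [7, 5, 5, 7, 4]   # patients 1-5
observed_chemo = [1, 10, 6, 7, 8]    # patients 6-10


def ecdf(sample, y):
    """Empirical distribution function: share of the sample with outcome <= y."""
    return Fraction(sum(1 for v in sample if v <= y), len(sample))


# Vertical distance between the curves (surgery minus chemo) at each year.
grid = range(1, 11)
diff = {y: ecdf(observed_surgery, y) - ecdf(observed_chemo, y) for y in grid}

# The curves cross if the distance is negative somewhere and positive elsewhere.
crosses = min(diff.values()) < 0 < max(diff.values())

for y in grid:
    print(y, ecdf(observed_surgery, y), ecdf(observed_chemo, y), diff[y])
print("curves cross:", crosses)
```

A lower curve at a given year means fewer deaths by that year, so the sign of the distance says which treatment looks better at that horizon.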

Kolmogorov Bounds

We know that we don't know the treatment effect from real data. We can never observe the fourth column. The Soviet mathematician Andrey Kolmogorov conjectured that we can bound the distribution of the treatment effect from the probability distributions above. It turns out his conjecture is correct. These bounds are simple to calculate, although the formula is a little intimidating. Let's say we want to know the probability that the treatment effect is 0 or less, that is, the distribution function of the treatment effect evaluated at 0. We may hypothesize that this probability is 1. To bound it, look at the chart and find the maximum vertical distance between the two curves, which is 0.4 (at 5 years), and the minimum vertical distance, which is -0.2 (at 2 years). The bounds on this probability are [0.4, 0.8], where the upper bound is 1 - 0.2 = 0.8.
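The chart-reading recipe above can be sketched in code. The sketch assumes the quantity being bounded is the distribution function of the treatment effect (surgery minus chemo) at a point delta, and uses the form of the bounds usually attributed to Makarov: the lower bound is the largest vertical gap between the surgery curve and the chemo curve shifted by delta (or 0 if that is negative), and the upper bound is 1 plus the smallest gap (or 1 if that gap is positive). The samples are assumed, the function names are made up for this sketch, and with discrete data a fully sharp version needs more care with jumps than is taken here.

```python
from fractions import Fraction

# ASSUMPTION: observed outcomes in the "RCT" (years of survival).
observed_surgery = [7, 5, 5, 7, 4]
observed_chemo = [1, 10, 6, 7, 8]


def ecdf(sample, y):
    """Share of the sample with outcome <= y."""
    return Fraction(sum(1 for v in sample if v <= y), len(sample))


def effect_cdf_bounds(delta, treated, control):
    """Bounds on P(treated outcome - control outcome <= delta).

    Assumes integer outcomes and an integer delta, so an integer grid
    covers every jump of both curves.
    """
    shifted = [c + delta for c in control]
    lo, hi = min(treated + shifted), max(treated + shifted)
    gaps = [ecdf(treated, y) - ecdf(control, y - delta)
            for y in range(lo - 1, hi + 2)]
    lower = max(max(gaps), Fraction(0))
    upper = 1 + min(min(gaps), Fraction(0))
    return lower, upper


# The case worked through in the text: delta = 0.
print(effect_cdf_bounds(0, observed_surgery, observed_chemo))

# The whole band, one delta at a time.
for delta in range(-9, 10):
    low, high = effect_cdf_bounds(delta, observed_surgery, observed_chemo)
    print(delta, low, high)
```

At delta = 0 this gives a lower bound of 2/5 and an upper bound of 4/5, the same [0.4, 0.8] read off the chart.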

The chart shows the bounds on the distribution of the treatment effect from the "RCT" data. We see that for some people there is a probability that the treatment effect is 0, but for some people it is definitely negative, while for others it is definitely positive.

We also see that the bounds can be quite tight for some cases.

Again. These bounds are based on ESTIMATED outcome distribution functions and so are themselves estimated.

Manski Bounds

In Scott’s write up he considers an example where treatment allocation is determined by an all-knowing physician. Each patient is allocated to the treatment with the higher outcome. This time we don’t know the distribution of outcomes for each treatment. While we, of course, observe outcomes for the chemo treatment, this distribution is not the one we would see if everyone received chemo.


It's not that we don’t know anything at all. Probabilities are between 0 and 1. This gives us bounds on the treatment outcome distribution. Judea Pearl calls these "natural bounds." Manski calls them "worst-case bounds."

We can then bound the treatment effect distribution. These bounds are wide and frankly not informative. But this reflects the information available in the data.
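A minimal sketch of the worst-case bounds on each outcome distribution. It assumes the Mixtape potential outcomes (to be checked against the book) and an all-knowing physician who gives each patient the treatment with the higher outcome. For patients who did not receive a treatment, the only thing used is that a probability lies between 0 and 1. The function name is made up for this sketch.

```python
from fractions import Fraction

# ASSUMPTION: potential outcomes intended to match the Mixtape table.
surgery = [7, 5, 5, 7, 4, 10, 1, 5, 3, 9]
chemo = [1, 6, 1, 8, 2, 1, 10, 6, 7, 8]

# The physician picks the better treatment for each patient (no ties here).
got_surgery = [s > c for s, c in zip(surgery, chemo)]
observed = [s if d else c for s, c, d in zip(surgery, chemo, got_surgery)]


def worst_case_cdf_bounds(y, arm):
    """Bounds on P(outcome under `arm` <= y); arm=True is surgery, False is chemo.

    Patients seen in this arm contribute their actual outcome. Patients in the
    other arm contribute 0 to the lower bound and 1 to the upper bound.
    """
    n = len(observed)
    seen_below = sum(1 for out, d in zip(observed, got_surgery)
                     if d == arm and out <= y)
    unseen = sum(1 for d in got_surgery if d != arm)
    return Fraction(seen_below, n), Fraction(seen_below + unseen, n)


for y in range(1, 11):
    print(y, worst_case_cdf_bounds(y, True), worst_case_cdf_bounds(y, False))
```

The width of each band is always the share of patients who did not get that treatment, here one half, which is why these bounds say so little.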

Bounds with Structural Assumptions

In Scott’s example we have that the treatment allocation is not random. In fact it is very specific. Each individual gets allocated to the treatment with the better outcome. What if we knew this? We still don’t know the counter-factual outcome, but we can now bound each individual outcome.

The chart presents the bounds on outcome distributions when we can actually bound the outcomes. In general, we don't know the counter-factual outcome. But here, if we know how the data is generated then we can bound the outcome for each patient. We know that the outcome for the treatment not received was worse. It led to fewer years of life.
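A sketch of how the structural assumption tightens the bands. It assumes the Mixtape potential outcomes (to be checked against the book) and reads "worse" weakly: the outcome under the treatment not received is no larger than the observed outcome. The function name is made up for this sketch.

```python
from fractions import Fraction

# ASSUMPTION: potential outcomes intended to match the Mixtape table.
surgery = [7, 5, 5, 7, 4, 10, 1, 5, 3, 9]
chemo = [1, 6, 1, 8, 2, 1, 10, 6, 7, 8]

# The physician picks the better treatment for each patient (no ties here).
got_surgery = [s > c for s, c in zip(surgery, chemo)]
observed = [s if d else c for s, c, d in zip(surgery, chemo, got_surgery)]


def structural_cdf_bounds(y, arm):
    """Bounds on P(outcome under `arm` <= y); arm=True is surgery, False is chemo.

    For a patient in the other arm, the unobserved outcome is at most the
    observed one. If the observed outcome is <= y, the unobserved one surely
    is too. If it is > y, the unobserved one may or may not be <= y.
    """
    n = len(observed)
    pairs = list(zip(observed, got_surgery))
    known = sum(1 for out, d in pairs if d == arm and out <= y)
    surely_below = sum(1 for out, d in pairs if d != arm and out <= y)
    maybe_below = sum(1 for out, d in pairs if d != arm and out > y)
    return (Fraction(known + surely_below, n),
            Fraction(known + surely_below + maybe_below, n))


for y in range(1, 11):
    print(y, structural_cdf_bounds(y, True), structural_cdf_bounds(y, False))
```

Under the assumed table, the band at 8 years is [7/10, 4/5] for surgery and [7/10, 9/10] for chemo. Only the lower bound of each distribution function moves relative to the worst case, so the bands narrow but do not collapse.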

Still not that informative. However, the chart suggests that the blue treatment may lead to a higher probability of living for 9 or more years. See how the two red lines are (weakly) below the two blue lines?

Joint Distribution

Our "RCT" data provides good information about the distribution of the outcome for each treatment. We can bound the joint distribution in a way similar to the Kolmogorov bounds; these are the Fréchet-Hoeffding bounds. But I want to talk about the "observational" data. This data provides a lot of information about the joint distribution. For each observation we learn two things. We learn the exact amount of the outcome for the treatment received. We also learn the bounds on the outcome for the treatment not received.
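Both ideas can be sketched side by side. The first function applies the Fréchet-Hoeffding bounds to the two estimated "RCT" curves. The second is one possible reading of the "observational" argument, not necessarily the calculation behind this post: each patient contributes the exact outcome for the treatment received and an interval from 0 up to that outcome for the treatment not received. The data are the assumed Mixtape values, outcomes are assumed non-negative, and the function names are made up for this sketch.

```python
from fractions import Fraction

# ASSUMPTION: potential outcomes intended to match the Mixtape table.
surgery = [7, 5, 5, 7, 4, 10, 1, 5, 3, 9]
chemo = [1, 6, 1, 8, 2, 1, 10, 6, 7, 8]

# "RCT" sample: patients 1-5 get surgery, patients 6-10 get chemo.
rct_surgery, rct_chemo = surgery[:5], chemo[5:]

# "Observational" sample: everyone gets their better treatment (no ties here).
got_surgery = [s > c for s, c in zip(surgery, chemo)]
observed = [s if d else c for s, c, d in zip(surgery, chemo, got_surgery)]


def ecdf(sample, y):
    """Share of the sample with outcome <= y."""
    return Fraction(sum(1 for v in sample if v <= y), len(sample))


def frechet_hoeffding(a, b):
    """Bounds on P(surgery outcome <= a, chemo outcome <= b) from the marginals."""
    f1, f0 = ecdf(rct_surgery, a), ecdf(rct_chemo, b)
    return max(f1 + f0 - 1, Fraction(0)), min(f1, f0)


def observational_joint_bounds(a, b):
    """Same probability, bounded patient by patient (requires a, b >= 0)."""
    sure = possible = 0
    for out, d in zip(observed, got_surgery):
        own_limit, other_limit = (a, b) if d else (b, a)
        if out <= own_limit:
            possible += 1          # the unobserved outcome could be as low as 0
            if out <= other_limit:
                sure += 1          # the unobserved outcome is <= out <= other_limit
    n = len(observed)
    return Fraction(sure, n), Fraction(possible, n)


print(frechet_hoeffding(5, 8))
print(observational_joint_bounds(5, 8))
```

For surgery at most 5 years and chemo at most 8 years, the assumed table gives [2/5, 3/5] from the "RCT" curves and [1/5, 3/5] from the "observational" data, so neither source is uniformly tighter at a single point.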

Again. None of this analysis accounts for the fact that Scott imagined a very small sample size. Nor does it account for any math errors made by me that have nothing at all to do with Scott's analysis.