DAGs are Cool

Dags are not cool

When I was growing up in the eastern suburbs of Melbourne in the 1970s, dags were definitely not cool. Not even close. In fact, dags were the exact opposite of cool. If you are not sure what a dag is, check out Magda Szubanski's character Sharon from Kath and Kim above.

Dags are also German

My German is limited to what I learned from Hogan's Heroes and the extremely useful pun, "donner dag es doner dag."

DAGs not dags

While I’m sure dags are still uncool in the eastern suburbs of Melbourne, DAGs, or directed acyclic graphs, are finally cool in economics. You know that they are cool when established econometricians like Guido Imbens publish a Journal of Economic Literature article dedicated to DAGs, and econometrics texts like Scott Cunningham’s Mixtape make DAGs central to explaining the topic.


I remember back in the 90s going to a psychology talk and the presenter putting up a “picture” of his model. I was nonplussed. It was not cool. At the end of the naughts I came across Judea Pearl’s Causality. I was intrigued but not convinced. The book is not cool. In the 10s, I read Pearl and Mackenzie's The Book of Why. I was convinced. DAGs are not only cool, they could improve econometrics and economics.


Using DAGs to explain IV is cool. Imbens has been doing this for years, but it's not really providing anything new, just a clearer explanation of the model. Can DAGs make a real dent in our understanding?


I think so.


I think there are two areas where DAGs could have a real impact in economics and econometrics: collider bias and omitted variable bias. I will briefly discuss the latter.

Dual path DAGs

Consider the problem facing Duke labor economist Peter Arcidiacono: trying to show that Harvard’s admissions policy discriminated against Asian-Americans (Peter's paper). A simple analysis of the data shows that Asian-Americans are substantially less likely to be admitted to Harvard. But why? There are two possible causal effects, a direct causal effect and an indirect causal effect. A direct effect means that Harvard explicitly uses the race of the applicant in deciding who is admitted. An indirect effect means that Harvard uses some other characteristics of the applicant, such as their SAT scores and extra-curricular activities, and these characteristics are determined by race. If Harvard is being racist, does it matter whether it is direct or indirect? I don’t know, but one may put Harvard in legal jeopardy while the other may not.


In the DAG below think of X as Race, Y as admissions and W as extra-curricular activities.

A DAG showing that X can causally affect Y in two different ways. There is a direct effect labeled "b" and an indirect effect whose two steps are labeled "d" (X to W) and "c" (W to Y). The indirect effect is "mediated" by W.

Regressing Y on X, we don't get b. Rather we get b + dc.
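To see where b + dc comes from, here is a minimal simulation sketch. It assumes the DAG is linear with independent normal noise, and the values of b, c and d are hypothetical ones picked for illustration; the helper names `slope` and `simulate` are mine.

```python
import random

def slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def simulate(b, c, d, n=20000, seed=1):
    """Draw data from the DAG: X -> W (d), W -> Y (c), X -> Y (b)."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    w = [d * xi + rng.gauss(0, 1) for xi in x]
    y = [b * xi + c * wi + rng.gauss(0, 1) for xi, wi in zip(x, w)]
    return x, w, y

b, c, d = 2.0, 4.0, 3.0  # hypothetical values, not taken from any data
x, w, y = simulate(b, c, d)
total = slope(x, y)  # short regression Y ~ X
print(f"slope of Y ~ X: {total:.2f}  (b + d*c = {b + d * c:.2f})")
```

The short regression picks up both paths from X to Y, so its slope lands near b + dc rather than near b.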

Imagine a problem where we need to determine if b = 0. This is equivalent to determining whether Harvard is using race in the admissions process. Regressing Y on X does not answer the question.

Just throw W in the regression!

The standard solution to the omitted variable problem is to think of our regression of Y ~ X as a short regression, where the true long regression is Y ~ X + W. If that is true, then we could solve our problems by putting W in the regression. In theory, this works. In practice, not so much. The problem is that X and W are correlated. The arrow from X to W means that X causes W, and we know that causation implies correlation. This correlation between X and W may make it difficult for the regression to distinguish the two effects and thus the long regression is not able to determine if b = 0.
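A rough sketch of that difficulty, again assuming a linear DAG with normal noise and hypothetical parameter values. The function `long_regression_b` computes the coefficient on X in Y ~ X + W from the two-regressor normal equations, and `spread_of_b` reports how much that estimate moves around across simulated samples.

```python
import random
import statistics

def long_regression_b(x, w, y):
    """Coefficient on x from OLS of y on x and w (with an intercept)."""
    n = len(x)
    mx, mw, my = sum(x) / n, sum(w) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sww = sum((wi - mw) ** 2 for wi in w)
    sxw = sum((xi - mx) * (wi - mw) for xi, wi in zip(x, w))
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    swy = sum((wi - mw) * (yi - my) for wi, yi in zip(w, y))
    return (sxy * sww - swy * sxw) / (sxx * sww - sxw ** 2)

def spread_of_b(noise_sd, reps=500, n=100, seed=2):
    """Std. dev. of the long-regression estimate of b across simulated samples."""
    rng = random.Random(seed)
    b, c, d = 0.0, 4.0, 3.0  # hypothetical values; the true b is 0
    estimates = []
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        w = [d * xi + rng.gauss(0, noise_sd) for xi in x]
        y = [b * xi + c * wi + rng.gauss(0, 1) for xi, wi in zip(x, w)]
        estimates.append(long_regression_b(x, w, y))
    return statistics.stdev(estimates)

loose = spread_of_b(noise_sd=1.0)  # W has a fair amount of its own noise
tight = spread_of_b(noise_sd=0.1)  # W is almost entirely determined by X
print(f"spread of b-hat, noisier W:             {loose:.2f}")
print(f"spread of b-hat, W nearly a copy of X:  {tight:.2f}")
```

The closer W is to being a multiple of X, the wider the spread of the estimates of b, even though the true b is 0 in every sample.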

DAGs to the rescue!

The DAG above suggests that there is more than one way to skin a cat. There is more than one way to estimate b. We already said that we can estimate the total relationship between X and Y, b + dc. If we could just estimate d and c, we would be golden. Estimating d is not a problem. Just run the regression W ~ X. Then we can regress Y ~ W to get c. Not so fast, young padawan.


The problem is what Pearl calls the "backdoor" relationship. Running the regression Y ~ W does not give c. It gives c plus a term that depends on b: c + b/d when W is an exact multiple of X, and more generally c + b·d·Var(X)/Var(W). The reason is that W and Y are related directly, through c, and also indirectly, through their common cause X. There is one case where this regression does give the right answer: when b = 0.
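The backdoor can be checked with a short sketch. It assumes a linear DAG in which W has its own noise term, in which case the extra term in the slope of Y ~ W is b·d·Var(X)/Var(W) (this reduces to b/d when W is an exact multiple of X). The parameter values and the helper name `slope_y_on_w` are hypothetical.

```python
import random

def slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def slope_y_on_w(b, c=4.0, d=3.0, n=20000, seed=3):
    """Simulate the DAG and return the slope from regressing Y on W alone."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]          # Var(X) = 1
    w = [d * xi + rng.gauss(0, 1) for xi in x]       # Var(W) = d**2 + 1
    y = [b * xi + c * wi + rng.gauss(0, 1) for xi, wi in zip(x, w)]
    return slope(w, y)

with_direct = slope_y_on_w(b=2.0)     # backdoor open: expect c + b*d/(d**2 + 1)
without_direct = slope_y_on_w(b=0.0)  # no direct effect: expect c
print(f"Y ~ W with b = 2: {with_direct:.2f}")
print(f"Y ~ W with b = 0: {without_direct:.2f}")
```

With these values the first slope should sit near 4 + 2·3/10 = 4.6 and the second near c = 4, so Y ~ W recovers c only when the direct effect is absent.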

Simulation

The simulation below considers the problem represented by the DAG. If you regress Y ~ X you get 12, but b is 0. If you regress Y ~ X + W, you do get b = 0, but only some of the time. You can also get that b is really not 0 (see the first column in the table below). In fact you can get a confidence interval that does not include 0 (see the second column)!

On the other hand, taking the DAG seriously and estimating the various parts, we can determine that b is really close to 0 (see the third column).
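One possible reading of "estimating the various parts" is sketched here; it is not necessarily the code behind the table and figure. It assumes a linear DAG with normal noise and hypothetical parameters chosen so that d·c = 12 and b = 0. The helper `dag_estimate_of_b` takes the total effect from Y ~ X and subtracts the product of the slopes from W ~ X and Y ~ W.

```python
import random
import statistics

def slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def dag_estimate_of_b(x, w, y):
    """Total effect minus the indirect path, each piece from its own regression."""
    total = slope(x, y)  # Y ~ X, estimates b + d*c
    d_hat = slope(x, w)  # W ~ X, estimates d
    c_hat = slope(w, y)  # Y ~ W, estimates c only when b = 0
    return total - d_hat * c_hat

rng = random.Random(4)
b, c, d = 0.0, 4.0, 3.0  # hypothetical values with d*c = 12 and b = 0
estimates = []
for _ in range(500):
    x = [rng.gauss(0, 1) for _ in range(100)]
    w = [d * xi + rng.gauss(0, 0.5) for xi in x]
    y = [b * xi + c * wi + rng.gauss(0, 1) for xi, wi in zip(x, w)]
    estimates.append(dag_estimate_of_b(x, w, y))

print(f"mean: {statistics.mean(estimates):.3f}")
print(f"min:  {min(estimates):.3f}   max: {max(estimates):.3f}")
```

Under these assumptions the combined estimate stays close to 0 when b = 0. When b is not 0, the Y ~ W step is contaminated by the backdoor, so the combination is a scaled-down version of b rather than b itself; it is best read as a check on whether b = 0.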

The figure presents the distributions of the estimates of b from the simulations and the two different methods. The true value of b is 0. The solid line is the density of estimates of b using the standard long regression approach, and the dotted lines mark the maximum and minimum of the estimates using the DAG.

The long regression gives estimates of b that are all over the place. The DAG-based approach gives estimates that are really close to the true value of 0.