This video discusses how we talk about causality in research – whether we think a study did a good job of demonstrating that one thing caused another or not. This is called the internal validity of a study. This video defines internal validity and discusses three main threats to it: measurement error, reverse causality, and omitted variable bias.
As a reminder, causality means that we see a relationship between two variables where:
1) They are correlated (when one variable changes, the other changes)
2) The cause comes before the effect, and
3) No third variable can explain their relationship
We say that a study is internally valid when it does a good job at showing that a correlation we observe is in fact a causal relationship. But there are lots of ways that can go wrong. These are called threats to the internal validity of a study, and there are many. In this video, I will discuss three important ones, and when we talk about statistics we’ll discuss a fourth – statistical error.
The first and most important part of doing good research is making sure you are studying what you say you are studying. Whether we are doing qualitative or quantitative research, we make assumptions and simplifications when we move from discussing a broad concept, like a leader’s legitimacy, to deciding how to measure it. Whether we use a qualitative measure, like the way a leader is portrayed visually on television, or a quantitative measure, like a poll about a leader’s popularity, when we move from defining an idea to finding ways to observe it, we lose part of the idea. Citizens may see a leader as legitimately elected even if they don’t like the leader, for example. As discussed in the video on operationalization, if you choose a measurement that consistently under- or overestimates your concept, you have introduced bias into your study. How well a measurement captures the concept it is supposed to capture is called construct validity, and when you do it poorly you have measurement bias or error. So measurement error, a threat to construct validity, is our first threat to internal validity, because you can’t say you have found that one thing causes another if you never measured those things correctly in the first place.
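To see why a measurement that is consistently off is a different problem from one that is just noisy, here is a minimal sketch in Python. Everything in it is made up for illustration: the names `true_value`, `noisy`, and `biased`, the 0 to 100 style scale, and the 8-point shortfall are all assumptions, not numbers from any real poll.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable
n = 10_000

def mean(values):
    return sum(values) / len(values)

# Hypothetical "true" legitimacy scores that we can never observe directly.
true_value = [random.gauss(50, 10) for _ in range(n)]

# Measure 1: random error only. Some scores come out too high, some too low.
noisy = [t + random.gauss(0, 5) for t in true_value]

# Measure 2: the same random error, plus a consistent 8-point underestimate
# (for example, a popularity poll standing in for legitimacy).
biased = [t - 8 + random.gauss(0, 5) for t in true_value]

noisy_gap = mean(noisy) - mean(true_value)
biased_gap = mean(biased) - mean(true_value)

print(f"average gap, random error only: {noisy_gap:.2f}")
print(f"average gap, systematic bias:   {biased_gap:.2f}")
```

In this toy setup the random errors roughly cancel out across many observations, while the consistent underestimate does not go away no matter how many observations you collect.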
The second threat to internal validity relates to the second requirement for causality: that the cause comes before the effect. When you can’t demonstrate this, you may have reverse causality – your effect might actually be a cause. For example, the United States holds elections in what are called single-member districts: we elect representatives from a geographic area and whoever gets the most votes wins. The U.S. also has a two-party system, which is the most likely outcome of this type of electoral system. But it’s not actually clear whether the institution causes the number of parties or whether the parties choose the institution most likely to keep them in power. Here’s what I mean by that. A common theory is that the district magnitude of the electoral system (the number of representatives each district elects) determines the number of parties (this is called Duverger’s law). When districts elect one person, you get two parties. When you have larger districts (Germany, for example, elects between 10 and 50 or so representatives from party lists in their equivalent of states) you get more parties. But it doesn’t always work like this: The UK has the same electoral system as the United States, but in recent years, we have seen many parties emerge. So it’s not clear that the institution is the causal factor in this case – the fact that the U.S. has had two parties for such a long time might be the reason why we keep our single-member majoritarian system, not the other way around. That is reverse causality, and it is why we have to show that a cause comes before an effect.
The final threat to internal validity I will talk about here is omitted variable bias. The third and most important part of demonstrating causality is ruling out alternate explanations. When you don’t do that you have omitted (or left out) a control variable. Let’s go back to the United States’ two-party system. If we want to prove that it is a result of our electoral system, then we don’t just have to show that the institution came first, but that nothing else explains our number of parties better. Another common theory of what determines the number of parties has to do with social cleavages. When a country’s population can be divided into many different categories (such as race/ethnicity, social class, religion, etc.), you are likely to get many parties to represent all of those categories. You might get parties based on social class (like a social democratic party or a labor party, representing the working class) or religion (such as India’s largest party, the BJP, which advocates for Hindu nationalism). When a country’s population is polarized into two groups, as the United States is right now, it tends to be represented by the two parties those two groups fall into. So to show that the electoral system is a cause, you need to rule out polarization as a cause of the two-party system in the U.S. That is ruling out alternate explanations. Along with measurement bias, omitted variable bias is one of the most common ways to critique someone’s research – you can say they didn’t take all the factors (or control variables) they needed to into consideration.
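To make omitted variable bias concrete, here is a minimal simulation sketch in Python. Everything in it is made up for illustration: the variable names `x`, `y`, and `z`, the helper functions `slope` and `residuals`, and all of the numbers are assumptions, not data from any real study. In this toy world a third variable `z` drives both `x` and `y`, and `x` has no effect on `y` at all.

```python
import random

def slope(x, y):
    """Slope of the best-fit line predicting y from x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

def residuals(x, y):
    """What is left of y after removing the part that x predicts."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = slope(x, y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]

random.seed(0)  # fixed seed so the sketch is repeatable
n = 5000

# z: hypothetical third variable (think of it as the alternate explanation).
z = [random.gauss(0, 1) for _ in range(n)]
# x depends on z plus noise.
x = [zi + random.gauss(0, 1) for zi in z]
# y depends on z plus noise, and NOT on x.
y = [2 * zi + random.gauss(0, 1) for zi in z]

# Omitting z: x looks like it affects y.
naive = slope(x, y)

# Controlling for z: strip z out of both x and y, then compare what remains.
controlled = slope(residuals(z, x), residuals(z, y))

print(f"slope of y on x, z omitted:    {naive:.2f}")
print(f"slope of y on x, z controlled: {controlled:.2f}")
```

If you run it, the slope with `z` left out sits well away from zero even though `x` does nothing, and it shrinks toward zero once `z` is taken into account. Stripping `z` out of both variables first is just one simple way to control for it; it is used here because it needs nothing beyond basic arithmetic.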
These last two threats to internal validity often get lumped together into my least favorite word in political science: endogeneity. I am not going to hold you responsible for knowing this word, but I am going to discuss it for two minutes so that you don’t worry if you see it in an article. I hate this word because it refers to two different ideas: omitted variable bias (aka you didn’t rule out alternate explanations) and reverse causality (aka your effect is actually the cause). You will see this word used as a noun and an adjective (endogenous) and it is the opposite of exogenous. Exogenous is good – if something is exogenous to an effect, it means it is an independent variable and can be a cause. Saying you have an endogeneity problem, or that something is endogenous to a variable, is bad – it means you can’t be sure you have a causal relationship. Again, I will never ask you to use these words, but I mention them because academic language is opaque, and this is one of its most difficult parts to understand.