Case Selection and External Validity

This video covers three topics: what a case is, how we choose which cases to study, and how that choice can bias the results of our study. To summarize, cases are the unit of analysis we choose to study. When we design our research, the most important thing to remember is to study cases that differ on the dependent variable. Without proper case selection we can’t demonstrate causality and may have difficulty applying the results of our study to other situations.

What is a case?

So to begin with, what is a case? Like variable, this is one of the harder research terms to define. A case can be an event (like an election), a geographic unit (like a state or a country), a period of time (like a year), a specific social movement, or a piece of legislation. It is the fundamental unit that you are trying to study. This is also a word that is typically used in two different ways in political science: Sometimes we mean case as observation – the unit of analysis in statistical research. But in qualitative research, historical analysis, and other fields like business, case tends to refer to a case study – a combination of observations about an event or policy or whatever that tell a story about what you are trying to research. In general, when I am talking about statistics, I will use the word observation, and when I say something like “comparing cases,” I’ll be talking about qualitative or experimental analysis of a handful of things (like comparing drug policy in two states).

Case selection

So how do you decide what cases or observations to include in your study? The level of analysis really depends on the individual project. You need to look at your outcome, your dependent variable, and see where there is the most variation. For example, if you want to study variation in the minimum voting age, you could look at differences across states (your unit of analysis would be state), you could look at differences over time in the federal voting age (your unit of analysis would be year), or you could compare laws in different countries (your unit of analysis would be country). Which one you choose depends on what question you are asking. Choose the one that is of greatest interest, the one you will be able to get the most information on, or the one where there is the greatest variation – the most difference in outcomes.

As you are making this decision, you need to take two things into consideration:

1) First, how many cases or observations do you want to study? If you are doing qualitative or historical research, you typically will not choose more than 4 cases to study. Statistical analysis and surveys are best done with at least 100 observations. Sometimes you want to study all cases or observations – this is called a census. And sometimes, if you are just trying to identify initial correlations or are dealing with large systems (like the weather), you may want tens or hundreds of thousands of pieces of data (this is what we mean by big data). The more observations you have for statistical analysis, the easier it will be to identify correlations, although showing one thing causes another still requires more work.

2) Second, which cases or observations do you want to study? The most fundamental rule of thumb is that you need variation on the dependent variable/the outcome. But beyond that, the cases you study will impact the results you get and you want to choose the cases that will do the best job of showing a causal relationship that is generally applicable (or that maximizes internal and external validity, in formal terms). There are many techniques for this – here, I will review three.

The most common way to select cases (and the best to choose if you have large numbers of observations) is to take a representative sample. The best example of this is sampling a population for a survey. To accurately represent the views of a large group of people, you can randomly select respondents to ask questions to. If the way you choose survey participants is truly random, then your results will apply to the full population, within a margin of error that shrinks as the sample gets larger.
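To make the idea of a random sample concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the population size, the 40 percent support rate, the sample of 1,000 people, and the function name simulate_poll. It only shows that a simple random sample tends to land close to the population value.

```python
import random


def simulate_poll(pop_size=100_000, support_rate=0.40, sample_size=1_000, seed=1):
    """Compare a simple random sample with the (hypothetical) full population."""
    rng = random.Random(seed)
    # Hypothetical population: 1 = supports the policy, 0 = does not.
    population = [1 if rng.random() < support_rate else 0 for _ in range(pop_size)]
    true_share = sum(population) / pop_size
    # Simple random sample without replacement: everyone is equally likely to be picked.
    sample = rng.sample(population, sample_size)
    sample_share = sum(sample) / sample_size
    return true_share, sample_share


true_share, sample_share = simulate_poll()
print(f"population share: {true_share:.3f}")
print(f"sample share:     {sample_share:.3f}")
```

If you rerun this with a larger sample_size, the two numbers should generally get closer together, which is the margin of error shrinking.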

Sometimes you don’t have large numbers of observations, though. If you want to study major wars, there (fortunately) haven’t been that many in modern history – at least, there haven’t been the thousands you would need to choose a random sample. When you have small numbers of cases, the two most common strategies are controlled comparison or single case studies. Controlled comparison (also called Most Similar Systems design) means that you take two cases that are similar across all the control variables you have identified, but differ in the cause and effect. For example, I wrote my senior thesis on why there was ethnic conflict in Bosnia, but not Macedonia. These are two similar countries – both parts of the former Yugoslavia, so they shared a history, and are both ethnically diverse (the most common reason people cited for civil war in the 1990s). But there was only civil war in one country, not the other. That leads us to look at the differences between the two countries (in this case, a federal political system in Bosnia, but unitary system in Macedonia) as the cause of conflict.

Finally, if you only have the resources to study one case, then think through whether you want to study a typical case – the one that best demonstrates the causal relationship you are studying – or an outlier – a case that is different from all the others, so you can figure out what makes it different.

Selection Bias

Case selection is so important because it is one of the biggest sources of bias in research. When case selection goes wrong, you have selection bias. I’m going to provide examples of how different kinds of selection bias can mess up your research, then define them in more detail.

The first problem is selecting on the dependent variable. Sometimes this just means forgetting to vary your outcome. If you want to study what causes ethnic conflict, but only look at cases where it happens, you are going to form the wrong conclusions. You might say ethnic diversity causes conflict, because all of your cases are, by definition, diverse. But there are plenty of diverse countries that didn’t have wars, so without studying negative cases – cases where the outcome didn’t happen – you will identify the wrong causes. But more often, you accidentally select on the DV. For example, let’s say you want to study the impact of taking accounting classes on your income. If you decide to study how many accounting classes MBA students take (represented by the dotted line), you are going to bias your research. The thing is that MBA grads make a lot of money – even without having studied accounting they are making around $100,000 a year. So taking additional accounting classes doesn’t make much of a difference – you make a relatively small amount of additional money. If you look at everyone working in a company, you see a larger effect (the solid line). Taking accounting classes can significantly increase your income in these cases. So by limiting the range of cases you looked at on the DV/outcome (only high incomes vs all incomes) you didn’t see a causal effect that was there.
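The accounting example can be sketched as a small simulation. This is not real data: the base income of $40,000, the $8,000 effect per class, the noise level, the $80,000 cutoff, and the helper names ols_slope and simulate_truncation are all assumptions chosen for illustration. The sketch fits a simple regression line twice, once on everyone and once only on high earners, to show how restricting the range of the outcome can shrink the estimated effect.

```python
import random


def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def simulate_truncation(n=20_000, true_effect=8_000.0, cutoff=80_000.0, seed=7):
    rng = random.Random(seed)
    # Hypothetical workers who took between 0 and 6 accounting classes.
    classes = [rng.randint(0, 6) for _ in range(n)]
    # Assumed model: income rises by true_effect per class, plus random noise.
    income = [40_000.0 + true_effect * c + rng.gauss(0, 20_000.0) for c in classes]
    slope_all = ols_slope(classes, income)
    # Selecting on the DV: keep only people whose income is at or above the cutoff.
    kept = [(c, y) for c, y in zip(classes, income) if y >= cutoff]
    slope_high = ols_slope([c for c, _ in kept], [y for _, y in kept])
    return slope_all, slope_high


slope_all, slope_high = simulate_truncation()
print(f"estimated effect per class, everyone:          {slope_all:,.0f}")
print(f"estimated effect per class, high earners only: {slope_high:,.0f}")
```

The effect built into the simulated data is the same for everyone, yet the estimate from the high-income group should come out noticeably smaller, because the cutoff removes most of the variation in the outcome.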

The second problem is selecting on the independent variable, which is just called selection effects. This is very easy to do wrong and can show you causal relationships that don’t exist at all. For example, let’s say you want to study the impact of labor law on economic growth, and you want to study whether limiting the ability of unions to organize helps growth (because it can keep wages low and therefore companies can sell more at lower prices, etc.). If you study the effect of repressing labor looking only at cases from Southeast Asia (the so-called Asian Tigers that had very high growth since the 1960s), you will see a relationship that looks like the graph on the left. The countries that repressed labor the most achieved the highest levels of growth. But if you look at all developing countries over the same time period, you get the graph on the right – there is no relationship between labor repression and economic growth. So the problem with limiting your cases is that unless you do it well, you might be accidentally ruling out some causes and over-emphasizing others. In this case, the Asian countries all had medium levels of repression, but they also shared other qualities that might explain growth, like active industrial policies (meaning the government actively chooses industries and companies to support).

There are many types of selection effect beyond sampling bias, which we just discussed. Sometimes the bias isn’t obvious or the result of a decision made by the researcher. For example, there may be attrition in the cases you study. Sometimes not all research subjects survive. This matters in longitudinal studies (those that examine change over time) such as one of the surveys done as part of the American National Election Study. That particular survey interviews the same people every few years to see how and whether their political behavior changes. But people die, and if people with certain opinions tend to die earlier than others (mortality rates are higher in the South, for example) your results will become more and more biased with every survey. Survivorship doesn’t just apply to people – in international relations, an important source of bias is state survival. There are many more countries in Europe today than there were in 1900. Some states disappeared entirely (like the Austro-Hungarian Empire) or were created and then disappeared (like East Germany or Yugoslavia). If you want to study the results of war or of state strength, you have to consider that only the strong states may have survived – that we are looking at a biased sample due to history.
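Attrition in a panel survey can be sketched the same way. The numbers here are invented: a panel that starts evenly split on some opinion, nobody ever changes their mind, and people holding opinion A are assumed to drop out at 15 percent per wave versus 5 percent for everyone else. The function name simulate_panel is hypothetical too.

```python
import random


def simulate_panel(n=50_000, waves=5, seed=3):
    """Share holding opinion A among remaining panel members at each wave."""
    rng = random.Random(seed)
    # Hypothetical panel: True = holds opinion A. Opinions never change.
    panel = [rng.random() < 0.5 for _ in range(n)]
    shares = []
    for _ in range(waves):
        shares.append(sum(panel) / len(panel))
        # Assumed dropout before the next wave: 15% for opinion A, 5% otherwise.
        panel = [
            holds_a
            for holds_a in panel
            if rng.random() >= (0.15 if holds_a else 0.05)
        ]
    return shares


shares = simulate_panel()
for wave, share in enumerate(shares, start=1):
    print(f"wave {wave}: {share:.3f} hold opinion A")
```

No individual changed their opinion, but the measured share should drift downward with every wave, simply because of who is left to interview.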

The last two types of selection bias I want to talk about both relate to why people choose to participate in a study. Non-response bias is bias that emerges when certain types of people choose not to answer survey questions or participate in a medical trial. So there are many reasons why the polling in advance of the 2016 elections was a bit misleading, but one contributing factor is the fact that few people respond to opinion polls, and Republicans are less likely to fill out a poll than Democrats. When a key group is underrepresented in a survey, it is hard to predict how they will vote, and you end up with a gap between the results expected from polling and the actual results of an election.
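Here is a rough sketch of how unequal response rates can produce that gap. The two groups, their voting patterns, and their response rates (10 percent versus 7 percent) are assumptions made up for the example, not estimates from any real poll, and simulate_nonresponse is a hypothetical name.

```python
import random


def simulate_nonresponse(n=200_000, seed=11):
    """Compare the true vote share with the share among poll respondents."""
    rng = random.Random(seed)
    votes_all = []
    votes_respondents = []
    for _ in range(n):
        # Hypothetical electorate split evenly into two groups.
        in_group_a = rng.random() < 0.5
        # Assumed: group A mostly backs candidate X, group B mostly does not.
        votes_x = rng.random() < (0.85 if in_group_a else 0.15)
        # Assumed response rates: 10% for group A, 7% for group B.
        responds = rng.random() < (0.10 if in_group_a else 0.07)
        votes_all.append(votes_x)
        if responds:
            votes_respondents.append(votes_x)
    true_share = sum(votes_all) / len(votes_all)
    poll_share = sum(votes_respondents) / len(votes_respondents)
    return true_share, poll_share


true_share, poll_share = simulate_nonresponse()
print(f"true share for candidate X:   {true_share:.3f}")
print(f"polled share for candidate X: {poll_share:.3f}")
```

With these assumed rates the election is essentially tied, but the poll should overstate candidate X by several points, because group A is overrepresented among the people who answered.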

The last type of bias is volunteer bias, which is very common in Internet polls. When anyone can choose to participate in a survey, only people really invested in a topic tend to respond – or, as in the case of a British survey on what to name a research vessel, people who find an answer funny. Volunteer bias also comes up in other ways. If you want to study the psychology of people who run for president, for example, you need to realize that standard psychological studies may not apply – people who run for president are different from the average American: more ambitious, more educated, more risk-acceptant, there could be many differences. So you can “select into” leadership just like you select into answering a poll.

So at the end of the day, external validity is about how good a job you did at case selection. Did the cases or observations you chose to include in your study represent the universe of cases you were interested in? If yes, then you are more likely to have internal validity (because you are more likely to have correctly identified a cause) and you can have external validity. The shortest definition of external validity is about extrapolation: can the conclusions you draw from your study apply to other situations? If you study a representative group of Americans, you may be able to draw conclusions about the whole US population. If you study Bosnia and Macedonia for your study of civil war, can your conclusions apply to other countries too? Under what conditions?

Selection bias is thus a threat to external validity – your ability to apply your conclusions in other contexts.
