The key distinction between observational and experimental research is in how you gather information. Observational research relies on information that already exists. It might be economic data, it might be opinions about health care reform, it might be images or texts, but it is out there already. If you want to show that one thing causes another, you need to eliminate alternate explanations or maybe show the process by which a cause leads to an effect – everything I talked about in my video on cause and effect.
Experimental research creates data, and does so in a specific way. By controlling and randomizing who is exposed to a cause, experimental researchers are better able to show causal relationships. In this video lecture, I am going to explain how experiments work in political science in order to clarify this distinction, then end with a discussion of the pros and cons of observational vs experimental research.
Okay, for the next few minutes I am going to describe a lot of technical terms, but bear with me – it will make more sense when I get to the example. We will also review experiments in detail in our final course module.
One thing I did not mention in my video on cause and effect is the idea of inference, in particular causal inference. This is just a technical term for what we do when we look for causal relationships. The challenge with observational research is that we are supposed to be observing what is going on in the world. But you can’t observe everything, and sometimes we have to infer conclusions from the information we have. There are many types of inference; here I will just explain the difference between descriptive and causal inference. Descriptive inference is making assessments about facts that are hard to observe. For example, what is Biden’s approval rating? Or, how many non-citizens vote? Causal inference is drawing conclusions about the causes of those facts, such as: does education influence voting for Biden? Or, do voter ID laws depress turnout?
But there is a fundamental problem with causal inference (and yes, it is always referred to this way). The problem is that, to determine causality in an ideal world, you would find two identical cases where one was exposed to a cause and the other was not. But those two identical cases don’t exist – no two people are exactly alike, and no two moments in time are either. And you can’t both expose the same person to a cause (this is called receiving a treatment in the language of experiments) and not expose them at the same time. This makes more sense with an example. Let’s say you wanted to see whether a vaccine works against the current strain of the flu. To test this, you want to compare whether someone who got the vaccine is healthier than someone who didn’t. But you can’t both give and not give the same person a vaccine, so it becomes hard to rule out alternate explanations – you can’t necessarily say that some other factor about the person who got the vaccine isn’t causing their health outcome.
So this is the fundamental problem of causal inference: You can never observe the effect of receiving two values of an independent variable on the same case at the same time.
Observational research is always subject to that problem, but experiments can resolve it by looking at average treatment effects. Instead of comparing the effect of a vaccine on two people, we test it on a large number of people. What is important is that when you give out the vaccine, you give it to a randomly selected subset of your study population. If your group is big enough and you are truly random in selecting who gets the vaccine, or your potential cause (again, the technical way to say this is assignment to treatment), then all the differences between people that might affect the outcome get averaged out. On average, those two groups will be equivalent on any key factor, like gender, race, income, prior health status, or any other characteristic that might impact the outcome (even those we can’t think of). Two people can’t be the same, but two groups can be (at least on average). When the only difference between two groups is exposure to the treatment (aka the cause – here, the vaccine), and we see a difference in outcomes (getting the flu or not) between those groups, then we can conclude that the vaccine worked – that it was a cause of better health.
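You can see this averaging-out at work in a quick simulation. Everything here is made up for illustration – a fictional study population where each person has a "prior health" score – but it shows how a simple coin flip produces two groups that match almost exactly on a characteristic the researcher never controlled:

```python
import random

random.seed(0)

# Hypothetical study population: each person has a prior-health score
# that could affect their flu outcome independent of any vaccine.
population = [{"prior_health": random.gauss(50, 10)} for _ in range(10_000)]

# Random assignment to treatment: a coin flip for each person.
treatment, control = [], []
for person in population:
    (treatment if random.random() < 0.5 else control).append(person)

def avg(group):
    return sum(p["prior_health"] for p in group) / len(group)

# No two individuals match, but with thousands of subjects the two
# groups are nearly identical on prior health, on average.
print(round(avg(treatment), 1), round(avg(control), 1))
```

The same logic applies to every background characteristic at once, including the ones you never thought to measure.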
Let’s look at a political example to see how this works in more detail. Back in the early 2000s, a scholar ran an experiment on whether negative campaign ads influence whether people vote. One way to study this would be to ask people in a survey whether they considered advertising when they decided whether to go vote, but it’s hard to know if they would tell the truth or not. So the academic got a list of all the voters in a certain area (this is publicly available info) and divided them at random into two groups. He sent one of those groups a negative campaign ad (about a mayoral election) in the mail. Then he looked at voter records after the election to see who voted and who didn’t (again, records of whether someone voted in an election are publicly available, although who they voted for is not).
Since he randomly divided the voters into two groups, all of the demographic characteristics that could influence turnout (what we might call observable variables) – and all of the things you can’t observe, like personal preferences – should be randomly distributed between the two groups. The math doesn’t work out perfectly when you look at just seven observations, but you would conduct this study with several hundred people. The more people you have in your study, the more likely you are to get equivalent groups. Then, to see what the effect of sending a negative ad was, you take the average turnout of the people who got the ad and subtract the average turnout of the people who didn’t.
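That subtraction is the whole calculation. With some made-up turnout records (1 = voted, 0 = didn’t), the average treatment effect looks like this:

```python
# Hypothetical turnout records: 1 = voted, 0 = did not vote.
got_ad = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]   # treatment group (received the ad)
no_ad  = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]   # control group (no ad)

# Average treatment effect: treated turnout minus control turnout.
ate = sum(got_ad) / len(got_ad) - sum(no_ad) / len(no_ad)
print(f"Average treatment effect: {ate:+.2f}")  # prints "Average treatment effect: -0.10"
```

Here the (invented) ad group turned out at 70% versus 80% for the control group, so the estimated effect of the ad is a 10-point drop in turnout.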
We’ll talk more about how to design an experiment later in the class, but for now I want to clarify two points. The first is random sampling vs random assignment. There’s a lot of randomness in political science. Random sampling is how you choose a representative group to study. You take the whole US population, or all the voters, and choose a smaller group from that population that you can actually survey. If you choose randomly from the population, all of its characteristics should be represented in the same proportion in the sample.
Random assignment is how you divide up that sample. You might flip a coin and assign some of the people in your sample to the treatment (they get the political ad or the vaccine) and some people to a control group (that does not get the ad or the vaccine). Doing this correctly will allow you to demonstrate causality – it establishes the internal validity of your study.
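The two kinds of randomness happen at different stages, which is easy to see in code. This sketch uses an invented voter list and arbitrary group sizes – the point is only that sampling draws the study group from the population, while assignment splits that group into treatment and control:

```python
import random

random.seed(42)

# Hypothetical population: a list of registered voters (names made up).
registered_voters = [f"voter_{i}" for i in range(100_000)]

# Random SAMPLING: draw a representative study group from the population.
sample = random.sample(registered_voters, 500)

# Random ASSIGNMENT: shuffle the sample and split it, so chance alone
# decides who gets the ad (treatment) and who does not (control).
random.shuffle(sample)
treatment, control = sample[:250], sample[250:]
```

Sampling determines who you can generalize to; assignment is what lets you claim causality within the sample.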
For an experiment to give you results that generalize beyond your sample, you need both random sampling and random assignment. For observational research, you only need random sampling (if that is the right case selection strategy for your question).
The other point of confusion is the difference between a control variable and an experimental control. An experimental control is a group of cases or observations – the ones that did not get exposed to the treatment (the independent variable). In this set-up about plant growth, the control is the plant that did not get exposed to sunlight.
Control variables are potential alternate explanations of the outcome you are studying. So in this case, they are the other things that might impact how a plant grows, like the soil, amount of water, type of plant, etc. In an experiment, your control group needs to hold all the control variables constant – they should have the same soil, amount of water, etc. – so that the only difference between the control and treatment groups is the independent variable (here, sunlight).
Similarly, in our political advertising setup, the control variables are demographics like gender, race, and employment status, while the control group are the people who did not get the negative ad (subjects 2, 5, and 6). With a large enough sample, your control variables should average out to be equal across the control and treatment groups.
So those are the basics behind how experiments work. Why would you use them? Basically, they are great at establishing internal validity, although they are weak on external validity. An experiment is the gold standard for establishing causality – you can be fairly certain that you have shown a cause leads to an effect. But you only discover a cause for the group of cases you studied. Just because you randomized within a sample doesn’t mean that this sample represents the group you wanted to study. In our negative advertising example, for ethical reasons, the study was conducted around a non-competitive mayoral election (it would be wrong to do something that could change the results of a competitive race just for research). So we have to consider whether this experiment tells us anything about the role advertising plays in competitive elections.
Similarly, experiments don’t explain how a cause leads to an effect. The diabetes drug Ozempic went through a series of clinical trials (medical experiments) to demonstrate it is safe and effective, but scientists haven’t figured out yet how it reduces cravings and promotes weight loss. We know it works, but we don’t know why.
The last set of disadvantages of experiments relate to the fact that you can’t always do one – not everything you want to know about politics can or should be tested with an experiment.
With these disadvantages, why do we do experiments? Basically, because showing causal relationships – establishing internal validity – is so hard. It’s great when we can conduct a study that does that well. And most observational research tries to mimic this experimental set-up. When we use control variables in statistics, or try to make sure two cases we study are similar in as many ways as possible, we are trying to set up a control and treatment group to study. The concepts are the same, even if we implement them in different ways.