Post date: Jan 24, 2014 9:05:22 AM
What is the likelihood a novel published finding will replicate in a high-powered replication study? And what does success or failure tell us about the researcher who performed the original experiment?
Obviously, we would love to know the probability that a novel finding will replicate (i.e., the probability that the effect is true in the population), but this probability is very difficult to quantify based on a single study. A formally correct answer is: “We won’t know until we try”. Even though we can never know for sure, the probability that a published finding will replicate differs depending on characteristics of the study and on the prior likelihood that the hypothesis is true. An important characteristic is the sample size of the original study. In general (but not necessarily!), the larger the sample, the more reliable the effect size estimate. This means that significant results from smaller studies are more likely to be false positives (see Ioannidis, 2005). This is a basic statistical fact.
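To make this concrete, here is a minimal sketch (in Python, with numbers I am assuming purely for the example) of the logic in Ioannidis (2005): the probability that a significant finding reflects a true effect – the positive predictive value (PPV) – depends on the alpha level, the statistical power of the study, and the prior probability that the hypothesis is true. Because smaller studies typically have lower power, a smaller share of their significant results are true positives.

```python
# Positive predictive value (PPV) of a significant result, following the
# logic in Ioannidis (2005). The alpha, power, and prior values below are
# assumptions chosen only to illustrate the point.

def ppv(power, alpha, prior):
    """P(effect is true | result is significant)."""
    true_positives = power * prior
    false_positives = alpha * (1 - prior)
    return true_positives / (true_positives + false_positives)

# Same alpha and prior, but a high-powered (large) vs. low-powered (small) study:
print(ppv(power=0.80, alpha=0.05, prior=0.25))  # ~0.84
print(ppv(power=0.30, alpha=0.05, prior=0.25))  # ~0.67
```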
Some people think that when researcher B fails to replicate a finding by researcher A, the reputation of researcher A is tarnished. How could this be true, if the likelihood that a finding replicates is a characteristic of the data, and not of the researcher? I have been pondering this question, and I can see two reasoning errors that might lead people to this belief.
First, people who think failed replications affect the reputation of the researcher fail to understand basic statistics. Not every finding that is significant should replicate (even though some people still believe that if a study is significant at the .05 level, it should replicate 95% of the time). If you live in a magical world where all significant findings should replicate, a failed replication must mean researcher A has done something that gave the data cooties, since we all know that only research that has cooties does not replicate, and giving your data cooties is bad for your reputation.
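A quick simulation (a sketch I am adding here, not part of the original argument) shows why the 95% belief is mistaken: even when the effect is real, the probability that an identical, same-sized replication of a significant result is again significant is roughly the power of the study, not 95%. Assuming a medium true effect (d = 0.5) and 20 participants per group, that is only about one in three.

```python
# Simulation: how often does a same-sized replication of a significant
# two-group study reach p < .05 again? Assumes a true effect of d = 0.5
# and n = 20 per group; these numbers are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2014)
d, n, sims = 0.5, 20, 20000

def significant():
    """One two-group study, tested two-sided at alpha = .05."""
    control = rng.normal(0.0, 1.0, n)
    treatment = rng.normal(d, 1.0, n)
    return stats.ttest_ind(treatment, control).pvalue < 0.05

originals = np.array([significant() for _ in range(sims)])
replications = np.array([significant() for _ in range(sims)])

# Among significant originals, the share of successful replications is
# roughly the power of the design (~0.34), nowhere near 0.95.
print(replications[originals].mean())
```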
Instead, we should accept and understand that in academia, researchers submit, reviewers recommend accepting, and editors decide to publish articles that from the very outset have a high probability of not replicating. (Wait – you said you couldn’t quantify the probability that something replicates?! I know – I’ll explain in a later blog post how you can get at least some idea of this probability.)
Second, people might think that a researcher who submits an article for publication has established the reliability of the finding in separate, unreported studies. This is an interesting perspective, but I think it is unlikely that published studies with tiny samples (e.g., 20 participants per between-subjects condition) were preceded by studies with very large samples, which the researcher nevertheless decided not to publish. So, for the published studies least likely to replicate, the assumption that researchers established the reliability of these findings in pilot studies with 4 or 5 times the sample size of the published study seems improbable.
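For what it’s worth, a rough power calculation (my own sketch, assuming a two-sided independent-samples t-test and a medium effect of d = 0.5) illustrates the numbers involved: 20 participants per condition gives roughly 34% power, while a hypothetical pilot study with 5 times the sample size would have roughly 94% power.

```python
# Power of a two-sided independent-samples t-test, computed from the
# noncentral t distribution. The effect size (d = 0.5) and sample sizes
# are assumptions used only to illustrate the argument above.
import numpy as np
from scipy import stats

def power_two_sample(d, n_per_group, alpha=0.05):
    """Power to detect effect size d with n_per_group participants per condition."""
    df = 2 * n_per_group - 2
    ncp = d * np.sqrt(n_per_group / 2)        # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)   # two-sided critical value
    return (1 - stats.nct.cdf(t_crit, df, ncp)) + stats.nct.cdf(-t_crit, df, ncp)

print(power_two_sample(0.5, 20))   # ~0.34: the published study
print(power_two_sample(0.5, 100))  # ~0.94: a pilot with 5 times the sample size
```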
If someone believes a failed replication tarnishes the reputation of an individual, then logically this person should believe that publishing a study that is relatively unlikely to replicate (e.g., with a medium effect size and 20 participants in each condition, testing an a priori unlikely idea) should tarnish a researcher’s reputation. After all, the probability that a finding will replicate is fixed at the moment the data are published. Therefore, I don’t understand why you would question a researcher’s reputation when a replication fails, but not when the replication happens to succeed, and I’m pretty sure academia would be a lot more fun, and slightly more professional, if we all related the success or failure of a replication to characteristics of the data, and not to characteristics of the researcher.