Figure 1. Replication effect size as a function of original effect size for 34 studies. The gray line indicates the results that would be expected if original and replication effect sizes were identical. Dots above the line indicate studies for which the replication effect size was greater than the original effect size, while dots below indicate studies for which the original effect size was greater than the replication effect size. Blue dots indicate successful replications, while red dots indicate unsuccessful replications (according to the Replication Teams’ assessment).

From a broad perspective, experimental philosophy can be understood as the use of empirical methods to address philosophical issues. More often than not, these empirical methods are those of cognitive and social psychology. However, in recent years, doubts have been cast on the general validity of results in cognitive and social psychology, as many of them have proven difficult to reproduce (for example, a large-scale replication project in psychology only succeeded in replicating 36 to 47% of selected studies). In this light, it is only fair to ask whether we should trust results in experimental philosophy, and to what extent they are likely to replicate.

This is the question the XPhi Replicability Project set out to address. We selected 40 studies in experimental philosophy and set out to replicate them to see whether similar results would be obtained. One third of the studies were selected for being the most cited for each year between 2003 and 2015, while the others were selected at random. Each study was then assigned at random to a replication team, which reproduced the original study.

To assess whether the results of each replication actually matched the results of the original study, we used three different criteria:

  • The replication teams’ personal assessment of whether their results supported the conclusion of the original study. According to this criterion, 31 replications out of 40 (77.5%) were judged as successful.
  • Whether the replication’s results were significant (p < .05). According to this criterion, which does not apply to studies that originally presented null results, 29 replications out of 37 (78.4%) were judged as successful.
  • Whether the replication effect size was not significantly lower than the original effect size (and also not significantly higher, for original studies presenting null results). According to this criterion, which only applies to studies for which we were able to compute the original effect size, 24 studies out of 34 (70.6%) were judged as successful.
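
The percentages reported for the three criteria can be checked with a few lines of arithmetic. The sketch below is only illustrative: the counts are the ones given above, while the helper name `replication_rate` and the short criterion labels are our own and do not come from the project's analysis scripts.

```python
# Counts taken from the three criteria listed above:
# (label, successful replications, studies the criterion applies to).
# The labels are shorthand chosen for this sketch.
criteria = [
    ("replication teams' assessment", 31, 40),
    ("significant replication result (p < .05)", 29, 37),
    ("effect size comparison", 24, 34),
]


def replication_rate(successes: int, total: int) -> float:
    """Return the share of successful replications as a percentage, rounded to one decimal."""
    return round(100 * successes / total, 1)


# Map each criterion label to its replication rate.
rates = {label: replication_rate(s, n) for label, s, n in criteria}

for label, rate in rates.items():
    print(f"{label}: {rate}%")
```

Note that each criterion has its own denominator (40, 37, and 34 studies), so the three rates are not computed over the same set of studies.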


Overall, the results of the XPhi Replicability Project suggest that results in experimental philosophy are highly replicable, with a replication rate between 70 and 78%. Moreover, this estimate is biased downward by the fact that we deliberately picked one third of our studies from among the most cited ones (which typically present more surprising results). For studies selected at random, the replication rate lies between 78 and 87%. This indicates that, even if experimental philosophy is not perfect, most of its results can be taken at face value and trusted. It also suggests that experimental philosophers are less likely than other researchers to fall prey to questionable research practices, such as selectively publishing significant results or running multiple statistical analyses and presenting only those that give significant results.

The final draft of our paper presenting the full results of the XPhi Replicability Project can be found at https://psyarxiv.com/sxdah.