Review of "Commentary: How Bayes factors change scientific practice" by J. Perezgonzalez

Post date: Sep 22, 2016 8:07:37 AM

I recently reviewed a paper by Jose Perezgonzalez for Frontiers in Psychology, which was written in response to another paper by Zoltan Dienes, "How Bayes factors change scientific practice" published this year in a special issue of the Journal of Mathematical Psychology. Encouraged by Jose Perezgonzalez's feedback, I thought I would share my review here, in case it triggers some thoughts on this highly controversial topic in experimental science.

I had to read Dienes' paper, which I was not previously aware of, and I must admit that I share your concern about his "reification of Bayes factors". A basic problem I have with Dienes' paper is that part of his case against significance testing rests on the fact that people tend to misuse it (e.g., by interpreting a p-value >= 0.05 as evidence for the null). That a method is not well understood is certainly a concern, but not a reason to dismiss it.

A second problem, which is only partly in line with your commentary, is that Dienes seems to assume that Bayes factors are applicable within the same wide scope as significance testing, while this is certainly not the case. To compute a Bayes factor, one needs a data-generating model under the alternative, which is unrealistic in many situations of practical interest. To return to the treatment assessment example, the problem is not only that we might not be interested in supporting the null; it is also that we are most likely unable to do so, because we don't know how the data are distributed when the treatment has an effect (or, if we define that distribution in a Bayesian sense, it would depend heavily on priors, not to mention computational issues).
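To make the requirement concrete, here is a minimal sketch (my own illustration, not an example from either paper) of the one favourable case: two fully specified point hypotheses, H0: N(0, 1) versus H1: N(1, 1). Only because both data-generating models are known exactly does the Bayes factor reduce to a simple likelihood ratio; this is precisely what is missing in the treatment example above.

```python
import math

def normal_pdf(x, mu, sigma=1.0):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def bayes_factor(data, mu0=0.0, mu1=1.0):
    """Likelihood ratio of H1: N(mu1, 1) over H0: N(mu0, 1) for i.i.d. data.

    This closed form only exists because both hypotheses are simple
    (fully specified) -- the favourable case discussed in the text.
    """
    like0 = math.prod(normal_pdf(x, mu0) for x in data)
    like1 = math.prod(normal_pdf(x, mu1) for x in data)
    return like1 / like0

data = [0.8, 1.2, 0.5, 1.1]  # hypothetical measurements
print(bayes_factor(data))    # ≈ 4.95, mild evidence for H1
```

As soon as the alternative is composite or unknown, `like1` must be replaced by a prior-weighted average over candidate models, which is where the prior sensitivity and computational issues mentioned above come in.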

Another example is testing the null hypothesis that a subject is healthy against the alternative that they are diseased, based on a set of biomarkers: one would really like to compute a Bayes factor in this case, but it is unrealistic unless we know the distribution of the biomarkers for each and every possible disease. Reality dictates that we be less ambitious and simply decide whether there is a suspicion of deviation from normality.

Therefore, the comparison between Bayes factors and significance testing should be restricted to situations where both the null and the alternative distributions are known. However, even in that case, the comparison still does not make much sense, because the two concepts serve distinct purposes. Bayes factors are meant to define binary decision rules (such as 'accept H1 if B > 3'), while p-values relate, in essence, to the specificity of such rules, which is nothing but their replicability under the null hypothesis.
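The distinction can be checked by simulation. In the sketch below (again my own toy setup, with simple hypotheses N(0, 1) vs N(1, 1)), the rule 'accept H1 if B > 3' is applied to repeated samples drawn under H0; the frequency with which it fires is exactly the frequentist quantity a p-value threshold would control, i.e., one minus the rule's specificity.

```python
import math
import random

random.seed(0)

def bayes_factor(data, mu0=0.0, mu1=1.0):
    # Likelihood ratio of H1: N(mu1, 1) over H0: N(mu0, 1); for unit-variance
    # normals this reduces to exp((mu1 - mu0) * sum(x) - n * (mu1^2 - mu0^2) / 2).
    n = len(data)
    return math.exp((mu1 - mu0) * sum(data) - n * (mu1**2 - mu0**2) / 2)

# Replicability of the rule "accept H1 if B > 3" under the null:
# how often does it (wrongly) fire when H0 is actually true?
n, reps = 10, 20000
false_accepts = sum(
    bayes_factor([random.gauss(0.0, 1.0) for _ in range(n)]) > 3
    for _ in range(reps)
)
print("estimated false-accept rate under H0:", false_accepts / reps)
```

The point is that this error rate is a property one computes *about* the decision rule under the null; the Bayes factor itself does not advertise it, which is why the two notions sit at different levels.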

This is an aspect I found a bit confusing in your discussion, as you state that Bayes factors are "at the same level as Fisher's p-values". It is not clear to me what you mean by this. While I fully agree that Bayes factors have no built-in interpretation in terms of replicability, the same cannot be said of p-values. They do relate to a measure of replicability, though arguably not the one we are interested in, since it does not account for sensitivity, i.e., replicability under the alternative. Luckily, in the case of binary classification, other replicability measures are available, such as the balanced accuracy (at least if our distributional assumptions hold). Therefore, in this ideal scenario, which ironically corresponds to the assumptions of the Neyman-Pearson lemma, significance testing is not the right thing to do, while Bayes factors by themselves can only be part of the solution.
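As a closing illustration (again my own toy setup, not from either paper), the balanced accuracy of a Bayes-factor rule can be estimated by simulating its replicability under both hypotheses, here N(0, 1) for H0 and N(1, 1) for H1, and averaging specificity and sensitivity:

```python
import math
import random

random.seed(1)

def bayes_factor(data, mu0=0.0, mu1=1.0):
    # Likelihood ratio of H1: N(mu1, 1) over H0: N(mu0, 1) for i.i.d. data.
    n = len(data)
    return math.exp((mu1 - mu0) * sum(data) - n * (mu1**2 - mu0**2) / 2)

def accept_h1(data, threshold=3.0):
    """The binary decision rule 'accept H1 if B > threshold'."""
    return bayes_factor(data) > threshold

n, reps = 10, 20000
# Specificity: replicability of the correct decision under H0.
spec = sum(not accept_h1([random.gauss(0.0, 1.0) for _ in range(n)])
           for _ in range(reps)) / reps
# Sensitivity: replicability of the correct decision under H1.
sens = sum(accept_h1([random.gauss(1.0, 1.0) for _ in range(n)])
           for _ in range(reps)) / reps
balanced_accuracy = (sens + spec) / 2
print(f"specificity={spec:.3f} sensitivity={sens:.3f} "
      f"balanced accuracy={balanced_accuracy:.3f}")
```

Unlike a p-value threshold alone, the balanced accuracy accounts for both error types, which is why it (or a similar measure) is needed to complete the picture once a Bayes-factor rule has been chosen.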