Good Science Depends on Good Peer-Review1
Darwin might not have anticipated just how widely applicable his theory of evolution would be. Some of his most important insights, buttressed by subsequent discoveries of the mechanisms of heredity, have elucidated form, function, and the diversification and extinction of species and lineages. An understanding of the more general explanatory power of evolutionary theory has only emerged during the lifetimes of the readers of this article, involving areas as diverse as cell biology, political science, economics and anthropology, just to name a few.
I would like to develop the idea that science itself is subject to evolution (Hull 2001), and that peer review is one of the key processes maintaining and advancing scientific quality (Riisgård et al. 2001; Ware 2008); otherwise put, peer review extends well beyond the improvements made to individual manuscripts.
Together with collaborators I have previously argued that high quality, external reviewers are increasingly at a premium, creating an effect analogous to the “tragedy of the commons” in social evolution theory (“the tragedy of the reviewer commons”; Hochberg et al. 2009; Hochberg 2010). It is well known that over the short term the “tragedy” can only be avoided either through an overarching justice mechanism, or through increased cognizance among community members of common interests. To the extent that the quality and reliability of peer review is in danger, so too is the very mechanism that maintains and augments scientific quality in the commons.
It is my view that scientific quality can diminish at the level of the scientific community for two principal reasons. First, in these times of fast and furious publishing, scientists may cut corners: failing to read or follow the relevant literature, putting short turn-around at a premium, reducing the costs of executing science, and not making the mental effort to think carefully about their methods. In the absence of self-questioning and critique from peers, a scientist can, simply put, go downhill. Second, the tragedy of the reviewer commons, by reducing critical scrutiny of scientific quality, will ultimately mean the publication of sub-standard work. It could be argued that low quality reviews can be compensated for by obtaining a larger number of reports; however, this only accelerates the tragedy.
Thus, we have a problem: selection for substandard work based on the “costs” of high standards, and a peer-review system designed to maintain those standards but threatened by overuse.
The objective of this opinion piece is to promote consciousness of the central role and importance of good peer review in maintaining scientific quality and promoting scientific progress. It is tempting to view published work as having attained a certain level of scientific quality as ensured by a professionally operated journal. Unfortunately, there exists no independent form of journal quality control in manuscript assessment. By first understanding why peer review is important, we can then consider the challenge of promoting its institution.
The basic ingredient of evolution is heritable trait differences among individuals. When these trait differences are linked to differential reproduction, then evolution by natural selection can occur. To the extent that heritable traits change in frequency, but not due to correlated differences in relative fitness, evolution occurs via drift. Thus, any application of evolutionary thought must identify the mechanism of heritability, resultant traits, and in the case of adaptation, how environments may favor one type of trait over another (that is, qualitative trait selection), or a given level of a trait over another (quantitative selection).
Does Darwinian evolution apply to the quality of science? In my opinion: yes. To see this, it is important to understand that science is communicated in units; that is, as studies published in journals, books, or on the internet, or as oral or film presentations, just to name a few. These units typically include the context for why the problem at hand is interesting, what part of the puzzle is missing, how the missing part was obtained, what the findings were, and what the findings mean. In a journal article, we usually call these sections the Introduction, Methods, Results, and Discussion, respectively. In most of the discussion below, I will present ideas in the context of publication in scientific journals.
For science to evolve, we need traits, trait variability, heritability, and differential “fitness” of alternative traits. A scientific article may have many traits, including scholarliness, experimental methods, statistical methods, etc. For illustration, let’s just say that the relevant traits are “methods”. Traits differ from article to article to some extent, and thus there will be trait variance within a population of published articles. Trait heritability is the employment of specific methods by readers (the vehicles of transmission) of the article in their own (to be) published work. Thus, the method is both the heritable unit (similar to genetic material in biological evolution) and the expressed trait. Author-driven changes in existing methods are possible, and this is similar to genetic mutation. Finally, higher relative fitness means that certain methodologies result in more papers using those methodologies; this necessitates more activity per vehicle (present readers and future authors) and/or more vehicles citing the methodology.
The goal of every scientist is to communicate her/his study to interested scientists. But, who or what determines the validity of the study? Can some components be “invalid” and others “valid”? Are there many valid alternatives? Given some level of subjectivity in determining validity, what keeps scientists “scientific” at all? Does the selection process occur before the study is presented for communication (e.g., submitted to a journal), during the communication, and/or once communicated?
So as not to get bogged down in the complexity of scientific traits, let’s treat them, for the sake of illustration, as Mendelian heritable traits. Although overly simple, let’s further assume that a trait has two alternatives: a valid method or a faulty method. How does a researcher decide which method to employ? Three possibilities are that (s)he (i) uses published studies or past experience, (ii) uses advice from other scientists, or (iii) makes a decision independent of the first two alternatives, that is, a guess. We quickly see that if experience is not the principal influence on method (i or ii), then the method will evolve randomly in a population of scientists. Although in itself not ensuring that the valid method will always be adopted, using either (i) or (ii) means that selection can potentially occur. If these selective options permit changes in information in a stepwise process, then at the population level we refine our views as to which is the better of the two alternative methods. Adaptation based on consensus can occur.
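The population-level consequences of these three options can be illustrated with a toy simulation (my own sketch, not part of the original argument; the population size, guess rate, and fitness advantage are arbitrary illustrative assumptions). Each scientist either guesses at random (option iii) or imitates a published method (options i and ii), with the valid method enjoying a slight advantage in being adopted:

```python
import random

def generation(pop, p_guess=0.05, advantage=0.1):
    """One generation of method transmission: each scientist either
    guesses at random (probability p_guess) or copies a method from the
    current population, with the valid method slightly more likely to
    be copied (its relative 'fitness' advantage)."""
    weights = [1 + advantage if m == "valid" else 1.0 for m in pop]
    new = []
    for _ in pop:
        if random.random() < p_guess:
            # option (iii): a guess, independent of the population
            new.append(random.choice(["valid", "faulty"]))
        else:
            # options (i)/(ii): weighted imitation of published work
            new.append(random.choices(pop, weights=weights)[0])
    return new

random.seed(1)
pop = ["valid"] * 50 + ["faulty"] * 50  # start at 50:50
for _ in range(200):
    pop = generation(pop)
print(pop.count("valid") / len(pop))  # valid-method frequency after 200 generations
```

With p_guess near 1 the two methods evolve randomly; with imitation and even a small adoption advantage, the valid method comes to predominate. This is the sense in which options (i) and (ii) permit selection to act.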
Authors, journals and readers
It’s hardly surprising that population consensus for one method or another would emerge from individual selection. More interesting are the mechanisms promoting the emergence of the valid method, and its subsequent protection from less valid alternatives. Remember, improvement on method will only occur if a subset of possible variants survives. The basic patterns produced by such selection are well known: stabilizing to an intermediate trait, directional to an extreme trait, destabilizing to two or more distinct traits, or fluctuating to different traits at different times. To see how different selection mechanisms work, consider three sources: (1) the authors of the scientific package, (2) the broadcasting support (e.g. journal), and (3) the scientific community (e.g., journal readers).
Independent of the journal and the reader, can scientists themselves ensure the scientific status quo or even improvement? Yes and no. A qualified “yes” for the status quo, because a careful scientist will tend to select methods that are more efficient and scientifically sound, but this process relies on the existence of knowledge benchmarks in the form of quality education and quality publications. A qualified “yes” for quality improvement through discovery, because this is a process akin to mutation rather than natural selection. Thus, the first mechanism relies on the existence of quality, its correct identification by the scientist, and the decision by the scientist to adopt perceived quality, even if it is more time consuming, expensive, etc. than lower-standard alternatives. The second mechanism provides the raw material for increased quality, but requires other mechanisms (journals or readers, see below) for it to be selected.
If we lived in a world where all articles were publishable without any kind of selection or correction, and high scientific quality were the norm, could we prevent drift to lower average quality? In the short term this would be possible, since a population of large effective size is shielded from such drift (for our purposes, random variation entering the scientific method). Evolutionary theory predicts that the effects of drift will increase as effective population size becomes small and as mutation rates (i.e., new methods or increased willingness to try alternative scientific methods) increase. Thus, in the simple example of a population with only two schools of thought on a scientific method, and in which individuals frequently and randomly experiment between the two, drift will eventually mean that one school comes to dominate. The take-home message is that without selection, scientific method will drift, possibly maintaining variants for a long time, and possibly fixing either the “valid” or the “less valid” alternative at random. The only way to instill selection (be it stabilizing, directional or destabilizing) is for “environments” to differentially favor one alternative or the other. Is the main selective environment the journal or the reader?
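The effect of effective population size on drift can likewise be sketched with a neutral simulation (again my own illustration; the community sizes are arbitrary assumptions): two equally good “schools of thought”, no selection, and each scientist copying a method uniformly at random each generation. One school always fixes eventually, and it tends to do so much sooner in small communities:

```python
import random

def time_to_fixation(n, seed=None):
    """Neutral drift between two equally good 'schools of thought':
    each generation, every one of n scientists copies a method
    uniformly at random from the current population. Returns the
    number of generations until one school fixes."""
    rng = random.Random(seed)
    pop = ["A"] * (n // 2) + ["B"] * (n - n // 2)
    t = 0
    while len(set(pop)) > 1:
        pop = [rng.choice(pop) for _ in range(n)]
        t += 1
    return t

# Smaller communities lose variation to drift faster on average:
print(time_to_fixation(10, seed=0), time_to_fixation(100, seed=0))
```

Standard drift theory puts the expected fixation time on the order of the effective population size, which is why a large, well-mixed scientific community is (temporarily) buffered against random loss of a method, while a small one is not.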
Let’s first briefly consider the reader. Can a reader create the selective environment that will cause the frequency of the “valid” method to increase? My argument is that, on their own, readers cannot. Take the following example. A paper is published and readers choose to copy the method and cite the article in their own work (hence promoting the “survival” of the specific method). What ensures that, on average, readers maintain the method status quo or even improve the scientific quality of the method? Nothing. To achieve this, that is, natural selection for method maintenance or improvement, we need an independent judge and jury (but see two paragraphs below for discussion of the catalytic effect of prestigious journals).
The independent judge and jury is typically the role of journals. Journals have two basic functions: ensuring method validation or improvement, and publishing only a subset of submitted manuscripts. But what keeps the judge=editor from making publication decisions based on information or criteria unrelated to peer review, such as the subject matter or author reputation? Few journals, if any, have independent checks on the objectivity of their publication decisions (by “objectivity” I mean whether they abide by a set of rules, with potential oversight). Specifically, given that many papers may be judged “scientifically valid”, journals with potentially strong selection (i.e., high rejection rates) may use many criteria, only one of which is scientific quality. Indeed, selection on scientific quality may be weak if journals estimate that only a baseline level is necessary for publication, and that other criteria outweigh improving scientific quality beyond this baseline. Other criteria include the perceived excitement of the subject, the communication quality of the manuscript, and the reputation of the authors. (Obviously, double- and triple-blind review alleviates this latter effect, but it does not prevent reputation from affecting subsequent citations, and thus the differential communication of methods and findings.)
There is an additional problem that emerges from high rejection rates and the multiple, arguably subjective, criteria required to justify rejection: a paper rejected from one journal may be published in another due to differing opinions and criteria between editorial boards (journals are, of course, not in contact with one another when making publication decisions). It is here that we encounter a subtle and, in my view, important effect. Since authors generally choose to submit their papers first to the journals that they view as the best for broadcasting their work, the journals with the highest impact factors have the greatest responsibility for ensuring that the papers they do publish are of the highest scientific standards. As such, the extent to which readers preferentially emulate and cite science published in the most esteemed journals will create a selective effect on the traits expressed by those cited articles. Even if this journal prestige effect plays a role in the evolution of science, the effect may be moderate or even weak since, given the sheer number of journals available, many if not most studies will find their way to publication and potentially influence readers.
The final mechanism presented here that could affect the evolution of scientific quality is the “jury”, that is, reviewers who are independent of both the authors and the journal. Peer reviewers serve two main functions. First, they provide an opinion to editors on whether a manuscript should be accepted, sent back to the authors for revision, or rejected. In and of itself, this does not substantially differ from the judgmental role of editors. More salient is that the reviewer’s recommendation is linked to a report, parts of which may or may not be released to the authors. The reviewer’s report comments on the study, and possibly criticizes it, whilst maintaining anonymity. Thus, insofar as the reviewer is scientifically qualified (Thurner and Hanel 2011), her critiques will either not affect or will increase the scientific quality of the manuscript. There is of course the possibility that the reviewer is not sufficiently qualified and recommends changes that lower the scientific standard of the manuscript, and that the author(s) either does not realize it, or does, but is unable to convince the editor of the faulty critique. The system is thus analogous to a legal system, with the laws=norms of scientific quality subject to improvement, based on altruistic acts by judges and juries in the “interest of science”. Editors, reviewers and authors work in an iterative, interactive fashion that defines and enforces what scientific quality is. Ideally, such a system has checks and balances to minimize conflicts of interest.
Thus, my argument is that an independent jury (reviewers with no vested interest in the journal, nor in their own work being published in the journal) is essential to the selection process maintaining or improving scientific standards; without them, science would ultimately suffer for the reasons explained above (i.e., drift or biased selection). The essential mechanism is wonderfully simple: jury members suggest improvements to a study, which are reviewed by judges, and a decision is made concerning possible acceptance and the modifications necessary for acceptance. These modifications could be viewed metaphorically as “beneficial mutations”, which become fixed upon modification of the package and publication in the target journal or, should the study not be accepted there, in another journal.
I argue that constraints encountered by scientists tend to push scientific quality downward. This is most efficiently counteracted by peer review, but this latter mechanism is threatened by the overexploitation of reviewers, that is, the tragedy of the reviewer commons.
Having an (apparently) independent jury does not obviate conflicts of interest, positive or negative, and issues with reviewer quality (Rothwell et al. 2000) and more generally the publication process (Statzner and Resh 2010), but it appears to be the best means we currently have of achieving scientific quality control. From an evolutionary perspective, external reviewers function to correct methodological shortcomings, and can even create “positive mutations”, that is, improvements in scientific quality that the authors themselves did not anticipate. It is the editor’s function to verify whether or not corrections and suggestions should be followed.
What can be done to conserve, augment and improve this institution? This is a very active area of debate (http://www.nature.com/nature/peerreview/debate/), and three themes are gaining some traction:
1. Online science, such as ResearchGate (researchgate.net) or arXiv (arxiv.org), where manuscripts can be continuously updated based on readers’ comments. In these and other venues, the process of scientific improvement still needs to be monitored and verified.
2. Peerage, where reviewers are rewarded for their efforts. Several solutions have been proposed to foster peer review (Fox and Petchey 2010; Lortie 2011) or penalize potential reviewers who do not comply (Davidoff 2006; Hauser and Fehr 2007).
3. Education. The transmission of good practice starts with mentors, be they professors, project directors, or experienced scientists. Of course, like the transmission of any practice, refinement is necessary, which ultimately means discussion and debate. This is akin to the selective effect of peer reviewers on journal articles: students and colleagues need to provide feedback to improve the quality of mentorship.
I would suggest that there is no “magic bullet” here. Two or all three of these approaches (and others not listed here) will probably be necessary to maintain the quality status quo and promote improvements in scientific quality. What is certain, however, is that solutions to this emergent problem, much like those to the tragedy of the reviewer commons, need to start with a recognition of its importance and dialogue amongst scientists as authors, scientists as editors, and scientists as readers. In conjunction with this, senior scientists need to take a stance as educators, editors, and steering committee members.
In closing, it is my opinion that to the extent peer review is diminished or sacrificed, scientific complacency will increasingly pervade. Science is a culture, and cultures are prone to trait selection (Danchin et al. 2011) and (at least theoretically) erosion (Hochberg 2004). We need to acknowledge the problem and the risks, and seek solutions that will not result in the loss of what generations of scientists have constructed, both through their own science and their improvement of others’ science.
Copyright 2012 Michael Hochberg
1 This is a dynamic article that will be updated from time to time. I welcome comments, with a view to integrating them into this article.
Acknowledgments. I thank Doyle McKey, Isabelle Olivieri and Stephen Stearns for comments on the original version of this article.
Davidoff F. 2006. Improving peer review: who's responsible? British Medical Journal 328: 657–658.
Fox J. and O.L. Petchey. 2010. Pubcredits: fixing the peer review process by ‘privatizing’ the reviewer commons. Bulletin of the Ecological Society of America 91: 325–333.
Hauser M. and E. Fehr. 2007. An incentive solution to the peer review problem. PLoS Biology 5: e107.
Hochberg M.E. 2004. A theory of modern cultural shifts and meltdowns. Proceedings of the Royal Society of London B 271: S313–S316.
Hochberg M.E. 2010. Youth and the tragedy of the reviewer commons. Ideas in Ecology and Evolution 3: 8–10.
Hochberg M.E., Chase J.M., Gotelli N.J., Hastings A. and S. Naeem. 2009. The tragedy of the reviewer commons. Ecology Letters 12: 2–4.
Hull D.L. 2001. The success of science and social norms. History and Philosophy of the Life Sciences 23: 341–360.
Lortie C.J. 2011. Money for nothing and your referees for free. Ideas in Ecology and Evolution 4: 43–47.
Riisgård H.U. et al. 2001. The peer-review system: time for re-assessment? Aquatic Microbial Ecology 26: 305A–313A.
Rothwell P.M. and C.N. Martyn. 2000. Reproducibility of peer review in clinical neuroscience. Brain 123: 1964–1969.
Statzner B. and V.H. Resh. 2010. Negative changes in the scientific publication process in ecology: potential causes and consequences. Freshwater Biology 55: 2639–2653.
Thurner S. and R. Hanel. 2011. Peer-review in a world with rational scientists: Toward selection of the average. The European Physical Journal B 84: 707–711.
Ware M. 2008. Peer Review: benefits, perceptions and alternatives. Publishing Research Consortium Summary Papers 4: 1–22.