Article Review Metrics (ARM)

Replacing Journal Impact Factors with prospective, article-based metrics derived from peer review.

The Problem:

Jobs, grants and scientific recognition are bought with the currency of high-profile papers. Young and even established scientists skirt the prestigious venues at their peril. Therefore, the status quo of closed review and Impact Factor-based evaluation of scientific work is “baked in” to the scientific publishing landscape by powerful forces external to the publishing process itself. Nevertheless, there is growing dissatisfaction with this model. Authors often feel that editorial decisions lack transparency and are biased or, at best, capricious. Eventual acceptance may require multiple rounds of revision and resubmission at several journals, often at significant cost in time and effort and with modest perceived benefit. Trainees, typically the lead authors of these manuscripts, worry that their careers are being held hostage. Although preprints enable rapid sharing of research, career advancement requires not only dissemination but also the imprimatur of publication in a respected scientific journal, and the gap between completing a project and final publication can stretch to years.

The current publishing model also takes a toll on scientific societies. In an increasingly competitive landscape, large commercial publishers like Elsevier and Springer Nature have established tiers of journals crowned by premier generalist journals like Cell and Nature, buttressed by highly respected specialty journals like Neuron and Nature Neuroscience, and joined, in the last few years, by larger, initially less competitive journals like Cell Reports, Nature Communications, iScience and Scientific Reports. By lowering barriers to manuscript transfer, for example from Cell to Neuron to Cell Reports, these publishers increase market share at the expense of Society publishers and elevate the prestige of their lower-tier offerings. For example, Cell Reports and Nature Communications were founded in 2012 and 2010, respectively, and now have Impact Factors (7.8 and 11.9) that exceed those of highly respected Society journals like the Journal of Neuroscience (6.1). To maintain relevance, Society journals face a choice: become more selective and serve a smaller constituency, or favor breadth at the expense of being perceived as lower “impact.”

In the current system, the likely impact and technical merit of a manuscript are judged by reviewers and editors at the time of journal acceptance, and are encapsulated for others in the journal’s Impact Factor. The realization that Impact Factor is a flawed metric of scientific merit is widespread (https://sfdora.org/resources/), but the system of apportioning scientific recognition and reward based on perceived journal “profile” has proven largely impervious to this criticism (https://peerj.com/preprints/27638/, https://www.nature.com/articles/d41586-018-01642-w). One reason for this is the absence of a viable alternative metric. Article-level citations, though increasingly available, are retrospective and so do not provide a prospective estimate of likely article impact. Citations also do not explicitly address technical merit, whereas the journal in which an article is published maintains its reputation, in part, by assuring technical merit through review and revision.

ARM: a prospective, article-based estimate of scientific merit.

Editors consult expert reviewers to answer a few simple questions about submitted manuscripts:

1) Who is the likely audience of the paper? Is it likely to be of broad interest, or primarily of interest to specialists in a particular subfield?

2) What is the likely impact of the paper? Does it fundamentally change our way of thinking about a problem or illuminate some more focused aspect of it?

3) How innovative are the questions asked and approaches taken? How novel are the results obtained?

4) Are the experiments and analyses described technically sound? Do they support the conclusions reached or are they flawed or misinterpreted in some way?

These questions, being quite general, can be easily standardized. The answers, coarsely quantified on a 3- to 5-point scale and supported by succinct yet detailed arguments, are the Article Review Metrics. An example assessment instrument is available at https://forms.gle/ESbNNFBv6avtpJz18.

These assessments, like the work being assessed, are subject to change and should be versioned. Good reviews provide authors and editors with useful suggestions about how to increase the impact and interest of the reported findings or how to improve the organization, scholarship and clarity of the figures and text. These suggestions often allow the reviewers to answer the question, “Would your assessment change with adequate revision?”
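To make the shape of such an assessment concrete, the sketch below shows one possible encoding of a versioned ARM record in Python. The field names, the 1-to-5 scale, and the example values are illustrative assumptions, not part of this proposal or of the form linked above.

```python
# A minimal sketch of one possible encoding of a versioned ARM record.
# Field names, the 1-5 scale, and the example values are assumptions.
from dataclasses import dataclass
from datetime import date


@dataclass
class Dimension:
    """One of the four ARM questions, scored coarsely and justified briefly."""
    score: int          # e.g. 1 (narrow/weak) to 5 (broad/strong)
    justification: str  # succinct yet detailed argument supporting the score


@dataclass
class ARMRecord:
    """A versioned Article Review Metric for a single manuscript."""
    manuscript_id: str            # e.g. a DOI or preprint identifier
    version: int                  # incremented with each round of revision
    assessed_on: date
    audience: Dimension           # broad interest vs. specialist subfield
    impact: Dimension             # changes thinking vs. illuminates a focused aspect
    innovation: Dimension         # novelty of questions, approaches, results
    technical_merit: Dimension    # soundness of experiments and analyses
    would_change_with_revision: bool = False  # "Would your assessment change?"


# Hypothetical example, for illustration only.
example = ARMRecord(
    manuscript_id="10.1101/2020.01.01.000000",
    version=1,
    assessed_on=date(2020, 6, 1),
    audience=Dimension(3, "Of interest mainly to specialists in this subfield."),
    impact=Dimension(4, "Reframes how a central question in the field is approached."),
    innovation=Dimension(4, "Novel combination of existing methods."),
    technical_merit=Dimension(2, "A key control experiment is missing."),
    would_change_with_revision=True,
)
```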

The assessment of the revised manuscript at the time of journal acceptance is an explicit, expert estimate of the expected audience, impact, innovation and technical merit of the article. Because it is article-based, serves as the basis for journal acceptance, and is backed by written justification, it is far more informative than the Journal Impact Factor and is therefore a better tool for prospective research assessment.

The payoffs:

Like open reviews, but better: The benefits of sharing Article Review Metrics and the justifications that support them overlap with those of publishing reviews and decision letters: improved transparency, increased fairness, and a platform on which to build a critical post-publication review of the work. But they go further. Take a look at the published reviews and decision letter for a paper well outside your own field in a “high-impact” journal like eLife, EMBO Journal, or the British Medical Journal and you will likely find it remarkably difficult to ascertain why the paper was considered worthy of publication in that venue rather than in a more specialized journal. Concerns about technical merit and their resolution can be followed in great detail, but too often the features that made the paper important and noteworthy are hard to judge without specialized knowledge. Reviewers and editors make these judgements all the time, but they are mostly implicit. Only by making these judgements explicit can we hope to separate our assessments of quality and impact from the journal brand.

Richer information for evaluators: Article Review Metrics (ARMs) provide hiring, promotion, and grant review committees with richer information about a paper than the single bit of information in whether or not it was published, plus the few additional, flawed bits implicit in the journal’s Impact Factor.

Community review: A standardized format also provides an easy on-ramp for post-publication review. The complexities involved are best tackled elsewhere, but a simple guiding principle is worth stating here: There is no need to choose between expert peer review, as currently organized by journals, and more inclusive review platforms that enable community engagement (http://www.reviewcommons.org, https://www.authorea.com/inst/14743-prereview). Making reviewer judgements about Audience, Impact, Innovation and Technical Merit explicit allows others to engage with them, learn from them, and agree or disagree with them. Evaluations from multiple sources can then be aggregated flexibly by an interface that enables any desired synthesis of expert, community, chartered, and volunteer reviews.
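As an illustration of the kind of flexible synthesis such an interface could support, the sketch below aggregates per-dimension scores from multiple sources using evaluator-chosen weights. The source labels, weights, and scale are hypothetical assumptions, not a specification.

```python
# A minimal sketch of flexible aggregation: each evaluation carries per-dimension
# scores and a source label; the evaluator chooses how much weight each source gets.
from collections import defaultdict

DIMENSIONS = ("audience", "impact", "innovation", "technical_merit")


def aggregate(evaluations, source_weights):
    """Weighted per-dimension mean over evaluations from multiple sources.

    evaluations: list of dicts like
        {"source": "expert", "scores": {"audience": 4, "impact": 3, ...}}
    source_weights: dict mapping source label to weight, e.g.
        {"expert": 1.0, "community": 0.5, "volunteer": 0.25}
    """
    totals = defaultdict(float)
    weights = defaultdict(float)
    for ev in evaluations:
        w = source_weights.get(ev["source"], 0.0)
        for dim in DIMENSIONS:
            if dim in ev["scores"]:
                totals[dim] += w * ev["scores"][dim]
                weights[dim] += w
    return {dim: totals[dim] / weights[dim] for dim in DIMENSIONS if weights[dim] > 0}


# Example: blend two expert reviews with one community review.
reviews = [
    {"source": "expert", "scores": {"audience": 4, "impact": 4, "innovation": 3, "technical_merit": 5}},
    {"source": "expert", "scores": {"audience": 3, "impact": 4, "innovation": 4, "technical_merit": 4}},
    {"source": "community", "scores": {"audience": 5, "impact": 3, "innovation": 3, "technical_merit": 4}},
]
print(aggregate(reviews, {"expert": 1.0, "community": 0.5}))
```

Other syntheses (medians, trimmed means, or simply displaying each source separately) fit the same structure; the point is only that explicit, comparable scores make such aggregation possible.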

The next step: Preprints have provided a standardized, centralized venue for archiving and distribution that is independent of journals and can be used to ensure broad access to the scientific literature (https://asapbio.org/preprint-info/preprint-faq). Archives have not replaced journals; they have augmented them. The next step in the evolution of the scientific literature is to bring the evaluative decisions made by reviewers and editors into the open by making them explicit, interoperable and accessible. By formulating these decisions in a broad, coarsely quantified fashion that is easily shared across reviewers and disciplines, we can return the focus of the evaluative process to the individual article, where it belongs.

The logistics:

Who does these assessments? How are they used? This proposal is intentionally orthogonal to questions about how review should be organized and how reviews should be used to make editorial decisions. ARMs filled out by journal reviewers and editors help make their judgements explicit, transparent and broadly comparable. Provided legal hazards can be avoided, existing reviews can be translated into ARMs to rapidly build a database of article evaluations. ARMs are also suitable for non-journal-based reviewing platforms as a way of explicitly encoding judgements about the most suitable venues for publishing reviewed manuscripts, and they offer a way of organizing community input. Approaches for balancing transparency with anonymity, inclusion with expertise, and open criticism with author response are needed, but these issues are common to every attempt to move beyond the current closed review system. The choices made with respect to these trade-offs are independent of the choice to embrace a common framework for encoding reviewer decisions.

How do you prevent bad actors from gaming the system? How do you ensure uniformity? These are problems for any review system. Ultimately, it is critical to review the reviewers; feedback is required to correct bias. But a more open system, which can be studied by anyone, is less susceptible to these abuses than the closed and more parochial review systems currently in place. As Justice Brandeis said, "Sunlight is the best disinfectant."

Why do we need more metrics? Aren't there too many already? Don't these just provide more opportunities for abuse? The key is that ARMs are independent of journal brand. Authors need opportunities for their best work to be recognized as having general interest and broad impact without having to publish in Nature. Alternative prospective metrics are needed to break the current vicious cycle. Of course, these explicit judgements must be supported by reasoned arguments. When considering dozens of papers, reading these arguments, or the papers themselves, suffices. But when faced with choosing which of hundreds to thousands of papers to read and evaluate, some coarse stratification is required. This is especially true when trying to evaluate science in fields outside your own.


Please send comments and suggestions to Sacha Nelson: nelson@brandeis.edu