Peer and expert review for submissions to academic journals, national research evaluation exercises, promotion/tenure boards, and funding bids consumes enormous amounts of expert time. This burden is exacerbated by the extensive informal pre-reviewing by colleagues seeking to help select or improve work before its eventual formal review. Whilst the peer review process is partly a learning experience for the reviewer, it absorbs a substantial amount of time annually that might otherwise be spent conducting research. In this context, the advent of GPTs capable of writing plausible academic reviews presents both an opportunity, to assess whether this new technology can help to reduce the reviewing workload, and a threat, that overburdened reviewers will exploit GPTs to partly or completely write their reports, potentially undermining the integrity of all academic decisions based on the results. It is therefore urgent to assess the capability of GPTs for academic review tasks.
Most prior research into the use of AI for research evaluation has used traditional machine learning approaches with a range of inputs (e.g., early citation rates, number of authors) to predict medium-term or long-term citation counts as a proxy for journal article quality, with a few studies directly predicting journal article quality (see the Approach section). GPTs represent the state of the art in general purpose AI and seem particularly suited to academic peer review, at least in the sense that they are effective at summarising documents and generating text, which is essentially the task of a peer reviewer. One small-scale prior study has assessed the capability of ChatGPT 4.0 for academic peer review, finding evidence of a moderate ability to differentiate between weak and strong work written by a single author (Thelwall, 2024). Larger scale, more systematic studies are needed, however. The primary practical obstacles to these are (a) peer review reports and article scores are usually private, and (b) most open access licences do not clearly permit documents to be uploaded to GPTs/LLMs that learn from them.
The current project exploits existing academic quality control scores and reports to assess the ability of GPTs to write peer review reports and give scores/recommendations. These existing sources are (a) internal private departmental review scores and reports from REF output selection procedures and (b) public journal article peer review reports and scores/recommendations from two large journals/platforms. Copyright issues are addressed by (a) obtaining explicit permission from copyright-holding authors to process their work with GPTs and (b) using copyright-compliant GPTs for CC BY licensed articles. The project will run experiments with GPTs to assess (a) their ability to make statistically valid/useful recommendations and (b) how GPTs construct plausible reports and choose scores/recommendations.
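As a purely illustrative sketch of the kind of statistical validity check envisaged here (the model name, prompt wording, and REF-style 1-4 scale below are assumptions for illustration, not the project's protocol), GPT-assigned quality scores could be compared against existing human review scores with a rank correlation:

```python
# Illustrative sketch only: model, prompt, and 1-4 quality scale are assumed, not prescribed.
from openai import OpenAI
from scipy.stats import spearmanr

client = OpenAI()  # assumes an API key is available in the environment


def gpt_quality_score(article_text: str) -> int:
    """Ask a GPT to rate an article on an assumed REF-style 1-4 quality scale."""
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model choice
        messages=[
            {"role": "system", "content": "You are an experienced academic peer reviewer."},
            {"role": "user", "content": (
                "Rate the quality of the following article on a scale of 1 (recognised "
                "nationally) to 4 (world leading). Reply with a single digit only.\n\n"
                + article_text
            )},
        ],
    )
    return int(response.choices[0].message.content.strip()[0])


def agreement_with_humans(articles: list[str], human_scores: list[int]) -> float:
    """Spearman rank correlation between GPT scores and existing human review scores."""
    gpt_scores = [gpt_quality_score(text) for text in articles]
    rho, _p_value = spearmanr(human_scores, gpt_scores)
    return rho
```

A positive and statistically significant correlation on a held-out set of articles with known human scores would be one simple indicator of useful GPT recommendations; the project's actual analyses would be more extensive.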
We will build community by creating a Generative AI in Scholarly Review International Advisory Committee (GAISRIAC) and associated events, drawing on our steering group of international experts, end users, and international bodies. This will directly inform those in charge of academic quality control processes, including journal editors, research managers, and the REF team, and provide advice on the ethical use of GPTs for academic reviewers.
The UK Metascience Unit was founded in 2024 to advise the Department for Science, Innovation and Technology (DSIT).