Kataoka, 2023

Appraisal of: Kataoka Y, Taito S, Yamamoto N, So R, Tsutsumi Y, Anan K, et al. An open competition involving thousands of competitors failed to construct useful abstract classifiers for new diagnostic test accuracy systematic reviews. Res Synth Methods. 2023;14(5):707-717

Reviewer(s): Caroline Higgins and Julie Glanville

Full Reference:

Kataoka Y, Taito S, Yamamoto N, So R, Tsutsumi Y, Anan K, et al. An open competition involving thousands of competitors failed to construct useful abstract classifiers for new diagnostic test accuracy systematic reviews. Res Synth Methods. 2023;14(5):707-717

Short description:

Drawing on machine learning (ML) techniques currently used for title and abstract screening when updating systematic reviews, the authors organized an open competition in which participants submitted ML algorithms, termed “abstract classifiers”, designed to accurately identify primary diagnostic test accuracy (DTA) studies in bibliographic databases.

Over three months in 2021, the authors received 13,774 submissions from 1429 teams. To evaluate the submitted abstract classifiers, the authors used the Fbeta score, an ML performance metric that combines a classifier’s precision and recall into a single value, with 0 the lowest possible value and 1 the highest. The winning team’s classifier achieved an Fbeta score of 0.4036 and a recall of 0.2352. Given this low recall, the authors concluded that abstract classifiers are not currently recommended for information retrieval for diagnostic test accuracy systematic reviews.
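For readers unfamiliar with the metric, the following is a minimal sketch of how an Fbeta score is generally computed from screening counts, using the standard definition Fbeta = (1 + beta^2) x precision x recall / (beta^2 x precision + recall). The function name `fbeta_score` and the example counts are hypothetical and chosen only for illustration; the value of beta used in the competition is not stated in this summary, so it is left as a parameter.

```python
def fbeta_score(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """Standard F-beta score from counts of true positives (tp),
    false positives (fp) and false negatives (fn).

    beta > 1 weights recall more heavily than precision;
    beta < 1 weights precision more heavily.
    """
    # Precision: share of records flagged as relevant that truly are relevant.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of truly relevant records that the classifier flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts (not taken from the study): 20 relevant records found,
# 20 irrelevant records flagged, 60 relevant records missed.
f1 = fbeta_score(tp=20, fp=20, fn=60, beta=1.0)
f2 = fbeta_score(tp=20, fp=20, fn=60, beta=2.0)
print(f"F1 = {f1:.4f}, F2 = {f2:.4f}")
```

In this illustration recall (0.25) is lower than precision (0.5), so the score falls as beta increases, which is why a recall-weighted Fbeta penalizes classifiers that miss relevant studies.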

Limitations stated by the author(s):

The authors identified several limitations to their study. First, the test sets may have included non-DTA records because, when the authors could not determine whether a record was a DTA study, they erred on the side of inclusion. Second, the authors acknowledged that more effective abstract classifiers might be developed using alternative approaches, such as natural language processing.

Limitations stated by the reviewer(s):

None

Study Type:

Primary study

Related Chapters:

None

Tags:

Diagnostic test accuracy studies

Machine learning

Classifiers
