Diagnostic accuracy


Julie Glanville

Caroline Higgins


Corinne Holubowich, René Spijker and Anita Fitzgerald contributed to earlier versions of this chapter.

Last updated: 7 December 2023

What's new in this update


We identified one relevant publication for this update, Kataoka et al. (2023), which adds to the evidence on the suboptimal performance of DTA study design filters, or machine learning derived “abstract classifiers” (31).


HTA may include assessment of new diagnostic technologies or techniques. These can involve the identification and review of diagnostic test accuracy (DTA) studies designed to differentiate between individuals with and without a target condition (1).  In 2008, Cochrane published an evidence-based guide to searching for DTA studies, which provided the original basis for this SuRe Info chapter (2). Subsequently the SuRe Info chapter has been updated by the authors listed above twice a year. The Cochrane Handbook for Systematic Reviews of DTA (version 2) chapter 7: Searching for and selecting studies was published in March 2022 and is available to Cochrane members (3). Key additional messages from the 2022 Handbook have been added to this SuRe Info chapter. Going forward, this SuRe Info chapter will continue to be updated every six months as new research is published.

DTA studies tend to be poorly reported and searching for them can be problematic due to this inadequate reporting and inconsistent terminology (3), the absence of appropriate indexing terms in some databases for this publication type, and inconsistent use of suitable indexing terms where they are available (2,4–5).

Sources to search

Relying only on searching MEDLINE is not recommended, as it is unlikely to be the most comprehensive source of diagnostic information and because diagnostic studies are not easy to retrieve efficiently in bibliographic databases (6). Relative recall analysis of systematic reviews has also suggested other databases might yield additional studies including Science Citation Index, BIOSIS and LILACS (6). Recent analyses have suggested that fewer databases might be adequate, but are weakened by their reliance on known-item searches (7–9). Review searches may not detect all the records in MEDLINE that might be relevant to a review, so searching other databases provides opportunities to pick up (MEDLINE indexed) studies by other routes.

An analysis of ten meta-analyses found that only using studies indexed in MEDLINE did not impact significantly on the sensitivity and specificity estimates of the meta-analyses in those reviews (7). A second analysis of 16 meta-analyses of diagnostic accuracy studies of depression screening tools found 94% (range: 83-100%) of the primary studies included in the meta-analyses were indexed in MEDLINE (8). The remaining non-MEDLINE indexed studies were located in Scopus, PsycINFO, and Embase, suggesting searching additional databases may reveal further relevant studies (8). The authors acknowledged that the quality of the majority of the original reviews could not be determined. A 2015 study of nine reviews performed by a single research group found that the reviewers’ original searches would have found 85% of their included studies from MEDLINE and Embase (range: 60-100%) (9). Adding reference checking to the process would have found 93% of the included studies.

There is evidence from one case study that searching regional databases, such as Chinese databases, may identify studies not identified from MEDLINE or Embase (10). A number of COVID-19 focused databases have recently been developed (see 2022 Cochrane DTA Handbook (3)).
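The database coverage figures quoted above are proportions of a review's included studies that can be found in a given database or combination of databases. As a minimal sketch of that calculation, assuming invented study identifiers and invented indexing data (none of it taken from the cited analyses), the share covered by different database combinations could be computed as follows. Note that a study being indexed in a database does not mean a review search would have retrieved it, which is the weakness of known-item analyses noted above.

```python
# Hypothetical data: which databases index each study included in a review.
# Study IDs and coverage are invented for illustration only.
included_studies = {
    "study_a": {"MEDLINE", "Embase"},
    "study_b": {"MEDLINE"},
    "study_c": {"Embase", "LILACS"},
    "study_d": {"MEDLINE", "Embase", "Science Citation Index"},
    "study_e": {"LILACS"},
}


def relative_recall(studies, databases):
    """Share of included studies indexed in at least one of `databases`."""
    if not studies:
        raise ValueError("no included studies supplied")
    wanted = set(databases)
    found = sum(1 for sources in studies.values() if sources & wanted)
    return found / len(studies)


medline_only = relative_recall(included_studies, ["MEDLINE"])
medline_embase = relative_recall(included_studies, ["MEDLINE", "Embase"])
all_three = relative_recall(included_studies, ["MEDLINE", "Embase", "LILACS"])

print(f"MEDLINE only: {medline_only:.0%}")
print(f"MEDLINE + Embase: {medline_embase:.0%}")
print(f"MEDLINE + Embase + LILACS: {all_three:.0%}")
```

Repeating the calculation across several reviews gives the kind of range reported in the studies above.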

There is no information on whether dissertations about test accuracy research are valuable for DTA reviews, but the 2022 Cochrane DTA Handbook suggests they should be considered and describes some specific collections as well as noting that CINAHL and PsycINFO contain dissertations (3).

As well as the major bibliographic databases typically searched for effects and safety evidence, the following databases might also be considered:

An extensive list of databases can be found in the Appendix to the Technical Supplement to the Cochrane Handbook chapter on searching for studies (13).

HTA agencies may also undertake assessments of diagnostic tests and so agency websites should also be explored, for example NICE diagnostic test guidance.

Although the proportion of ongoing studies investigating diagnostic test accuracy may still be relatively low (14-15), some are being recorded prospectively on trials registers such as ClinicalTrials.gov and the ICTRP portal (16). In the absence of study registration, it may be helpful to search for study protocols: a 2018 paper in the field of radiology found that protocols were available for 10% of studies (15). Searching for unpublished studies is important for reducing potential biases, and research has demonstrated that between 25% and 50% of DTA studies do not get published in peer-reviewed publications (17). A 2018 study in a large sample of 200 systematic reviews of DTA studies has demonstrated that searching for unpublished studies is not yet standard practice (17).

The evidence for the value of handsearching is currently sparse, with one study of one topic showing that handsearching contributed little (18). It is possible that the topic of the research was well defined and the database searches were exemplary, and therefore the handsearching contribution would be different in other topics (18). More evidence is required on the yield and value of handsearching. Where a topic is published in journals that are not indexed in bibliographic databases, handsearching can still serve a purpose, but this needs to be evaluated question by question.

The Cochrane DTA Handbook notes the lack of evidence on the value of citation searching (forwards and backwards), web searching and grey literature searching as part of DTA study identification (3). The Handbook notes that all of these methods have potential usefulness as supplemental approaches to searching to identify DTA studies (3). Searching preprints may be useful but should be undertaken with caution since they have not yet been peer reviewed (3).

Designing search strategies

Search strategies should be designed to be highly sensitive using a wide variety of search terms, both text words and subject indexing, to ensure that the many ways that a test may be described feature in the search (2). Information specialists should be aware of the weaknesses of reporting in titles and abstracts of diagnostic accuracy studies. Research into the quality of reporting in both primary DTA studies and systematic reviews of DTA studies (DTA SRs) found underreporting of critical items related to study design and methods, including the failure to identify publications as DTA studies (19-22). In addition, the application of available indexing may not be consistent in databases and should not be relied upon. One study reported that the sensitivity of each of three key Emtree headings, including the checktag ‘diagnostic test accuracy study’, was below 50% individually, reaching only 72.7% when the three were used together (23).

The search should reflect some, but not necessarily all, of the key concepts of the review (2,3). The search is likely to capture the index test being investigated and the target condition being diagnosed (2,3,19). A third set of terms can be considered to capture the patient description or the reference standard. The development of search strategies for DTA studies can be challenging and may involve several iterations to reach a strategy that captures the complex way records may present concepts of diagnosis (2,3). Cochrane Reviews of diagnostic test accuracy studies and the 2008 Cochrane Handbook provide examples of search approaches for these, often complex, topics. Strategies may include both general terms (such as the generic type of diagnostic method, for example dipsticks) and specific terms such as named dipstick tests (2).

Information specialists may be aware of the growing interest in automation software to improve efficiencies in evidence synthesis production. Text-mining tools can analyse a large amount of text in seconds to identify frequently used words and/or indexing terms, which can assist in the development of search strategies. Owing to the many challenges caused by incomplete reporting of DTA studies discussed above, there is emerging evidence to support the use of text-mining tools in the development of DTA searches. O’Keefe et al. (2022) reported a case study that evaluated 16 text-mining products in several domains, one being their contribution to identifying studies (24). The authors reported that 11 relevant, previously unidentified DTA publications were captured in search results because of text-mining applications. Of interest to searchers, two open access applications, Text Analyzer and Yale MeSH Analyzer, scored the highest for their ease of use and contribution to identifying relevant articles (24). Using text-mining tools to analyse a set of key studies that the search would be expected to capture may help to ensure that relevant terms and subject headings are identified.
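To illustrate the basic idea behind term-frequency analysis of a set of key studies, the following is a minimal sketch and not a reproduction of how Text Analyzer, Yale MeSH Analyzer or any other evaluated product works. The titles, the stop word list and the function name are all assumptions made for illustration. It counts how many records each word appears in, which can suggest candidate text words for a strategy.

```python
import re
from collections import Counter

# Hypothetical titles standing in for a set of known relevant records.
key_titles = [
    "Diagnostic accuracy of urine dipstick testing for urinary tract infection",
    "Sensitivity and specificity of dipstick urinalysis in children",
    "Accuracy of point-of-care dipstick tests: a validation study",
]

# Small illustrative stop list; a real project would use a fuller one.
STOPWORDS = {"a", "and", "for", "in", "of", "the", "to"}


def term_document_frequency(texts):
    """Count how many records each term appears in (not total occurrences)."""
    counts = Counter()
    for text in texts:
        # Lowercase words, keeping hyphenated terms such as "point-of-care" whole.
        tokens = set(re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower()))
        counts.update(tokens - STOPWORDS)
    return counts


frequencies = term_document_frequency(key_titles)
for term, n_records in frequencies.most_common(5):
    print(f"{term}: {n_records} of {len(key_titles)} records")
```

Terms that appear in most of the key records are candidates for the index test or target condition concepts, while terms that appear only once may point to synonyms worth testing.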

There are many published methodological search filters designed to capture studies of diagnostic test accuracy and that include test measurement terms such as sensitivity and accuracy (25). The evidence, however, on the performance of DTA search filters suggests that combining filters with a search for a population and an index test is likely to miss relevant studies  (26–29). Search filters for DTA studies do not seem to perform consistently and may result in unacceptable reductions in sensitivity (25–31). One small study in psychometric tests used to diagnose postpartum depression highlights another concern with DTA filters: psychometric tests are often associated with reliability and validity and not conventional diagnostic accuracy terms like sensitivity and specificity (32).

Some studies have found that there may be instances where these methodological filters could be used, but these are not within the context of information retrieval to produce health technology assessments (33-34). When all the research is considered together, current evidence suggests that, for search strategies designed to support systematic reviews of diagnostic accuracy, DTA filters should not be the only approach but may be useful as one component of a search strategy that involves several search approaches. Such a “multi-stranded” approach involves multiple queries run sequentially, using different combinations of concepts. Search filters can be identified from the InterTASC Information Specialists' Sub-Group (ISSG) Search Filter Resource (25).
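The logic of a multi-stranded search can be sketched as combining the results of several queries and checking what each one adds. The example below is a minimal illustration under stated assumptions: the strand labels and record identifiers are hypothetical, and the particular concept combinations are one possible choice, not a recommended set.

```python
# Hypothetical record IDs retrieved by three separate search strands.
strands = {
    "index test AND target condition": {101, 102, 103, 104},
    "index test AND DTA filter": {102, 105},
    "target condition AND DTA filter": {103, 106, 107},
}


def combine_strands(strand_results):
    """Return the deduplicated union of all strands."""
    combined = set()
    for records in strand_results.values():
        combined |= records
    return combined


def unique_contribution(strand_results):
    """Records found by one strand and by no other strand."""
    unique = {}
    for name, records in strand_results.items():
        others = set()
        for other_name, other_records in strand_results.items():
            if other_name != name:
                others |= other_records
        unique[name] = records - others
    return unique


all_records = combine_strands(strands)
only_here = unique_contribution(strands)
print(f"{len(all_records)} unique records across {len(strands)} strands")
for name, records in only_here.items():
    print(f"{name}: {sorted(records)} found by this strand alone")
```

A strand that contributes no unique records across several test topics may be a candidate for removal, while a filtered strand that does contribute unique records shows how a filter can add value without being the only approach.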

Subheadings (floating subheadings and subheadings attached to the index test or the target condition) may be a helpful component of the search strategy (2).

Attention to the proper translation of DTA search strategies into subsequent databases is important. To illustrate, a 2019 study identified suboptimal translation of MEDLINE DTA search strategies into the LILACS database and provides detailed guidance on how to search for DTA studies in LILACS (35).

Developing a search strategy can be iterative and complex. It can be helpful to have topic experts review samples of search results for relevance, and it is always helpful to be able to test retrieval against sets of known relevant records.
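Testing retrieval against a set of known relevant records amounts to checking what proportion of that set a draft strategy retrieves and which records it misses. The following is a minimal sketch of that check; the function name and all record identifiers are hypothetical, and in practice the identifiers would be accession numbers exported from the database.

```python
def evaluate_search(retrieved_ids, known_relevant_ids):
    """Compare one search result set against a set of known relevant records."""
    retrieved = set(retrieved_ids)
    relevant = set(known_relevant_ids)
    if not relevant:
        raise ValueError("the known relevant set is empty")
    found = retrieved & relevant
    return {
        # Proportion of the known relevant records that the search retrieved.
        "sensitivity": len(found) / len(relevant),
        # Known relevant records the search failed to retrieve.
        "missed": sorted(relevant - retrieved),
    }


# Hypothetical identifiers, for illustration only.
known_relevant = ["r1", "r2", "r3", "r4"]
search_results = ["r1", "r2", "r4", "x5", "x6", "x7", "x8", "x9"]

report = evaluate_search(search_results, known_relevant)
print(f"Sensitivity against known set: {report['sensitivity']:.0%}")
print(f"Missed records: {report['missed']}")
```

Inspecting the titles, abstracts and indexing of the missed records is usually the next step, since they show which terms the next iteration of the strategy needs.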

Although date limits are a decision for the overall review question and are simply implemented in the search if they are selected, researchers should note the paper by Furuya-Kanamori et al., which suggests that, in the context of rapid reviews, restricting the search date to the most recent 10–15 years does not harm the robustness of rapid reviews (36).

Reporting search strategy methods

The PRISMA-DTA Checklist and PRISMA-DTA for Abstracts Checklist were designed to capture critical design elements unique to DTA studies, with evidence supporting the use of PRISMA-DTA over PRISMA-2020 (22). Three studies assessing the "completeness" and quality of reporting of DTA SRs found that, although the information sources searched and the search date are often properly reported, search strategies are suboptimally described (22, 37-38). When reporting search methods in DTA SR abstracts, one study of DTA SRs in cardiovascular diseases recommended better adherence to the PRISMA-DTA for Abstracts Checklist when reporting the key databases searched and the last search date (20).


We have used the searching chapters of the two editions of the Cochrane DTA Handbook (2,3) as our baseline and SuRe Info appraisals have only been prepared for recently identified studies and studies identified between the two editions.

Reference list

(1) EUnetHTA. EUnetHTA Work Package 4. HTA Core Model® for Diagnostic Technologies v. 1.0r [Internet]. EUnetHTA; 2008.

(2) de Vet HCW, Eisinga A, Riphagen II, Aertgeerts B, Pewsner D. Searching for studies. In: Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy version 0.4 [Internet]. Cochrane; 2008.

(3) Spijker R, Dinnes J, Glanville J, Eisinga A. Searching for and selecting studies. In: Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy version 2 [Internet]. London: Cochrane; 2022.

(4) Olsen M, Zhelev Z, Hunt H, Peters JL, Bossuyt P, Hyde C. Use of test accuracy study design labels in NICE’s diagnostic guidance. Diagn Progn Res. 2019;3(1):17. [Publication appraisal]

(5) Cohen JF, Korevaar DA, Bossuyt PM. Diagnostic accuracy studies need more informative abstracts. Eur J Clin Microbiol Infect Dis. 2019;38(8):1383–5. [Publication appraisal]

(6) Whiting P, Westwood M, Burke M, Sterne J, Glanville J. Systematic reviews of test accuracy should search a range of databases to identify primary studies. J Clin Epidemiol. 2008;61(4):357.e1-357.e10.

(7) van Enst WA, Scholten RJPM, Whiting P, Zwinderman AH, Hooft L. Meta-epidemiologic analysis indicates that MEDLINE searches are sufficient for diagnostic test accuracy systematic reviews. J Clin Epidemiol. 2014;67(11):1192–9. [Publication appraisal]

(8) Rice DB, Kloda LA, Levis B, Qi B, Kingsland E, Thombs BD. Are MEDLINE searches sufficient for systematic reviews and meta-analyses of the diagnostic accuracy of depression screening tools? A review of meta-analyses. J Psychosom Res. 2016;87:7–13. [Publication appraisal]

(9) Preston L, Carroll C, Gardois P, Paisley S, Kaltenthaler E. Improving search efficiency for systematic reviews of diagnostic test accuracy: an exploratory study to assess the viability of limiting to MEDLINE, EMBASE and reference checking. Syst Rev. 2015;4(1):82. [Publication appraisal]

(10) Cohen JF, Korevaar DA, Wang J, Spijker R, Bossuyt PM. Should we search Chinese biomedical databases when performing systematic reviews? Syst Rev. 2015;4(1):23. [Publication appraisal]

(11) Kaizik MA, Hancock MJ, Herbert RD. DiTA: a database of diagnostic test accuracy studies for physiotherapists. J Physiother. 2019;65(3):119–20.

(12) Kaizik MA, Hancock MJ, Herbert RD. A description of the primary studies of diagnostic test accuracy indexed on the DiTA database. Physiother Res Int [Internet]. 2020 Oct [cited 2022 Oct 5];25(4). [Publication appraisal]

(13) Glanville J. Supplementary material: appendix of resources. In: Higgins JPT, Thomas J, Chandler J, Cumpston MS, Li T, Page MJ, Welch VA (eds). Cochrane Handbook for Systematic Reviews of Interventions Version 6.3 (updated February 2022). Cochrane, 2022. 

(14) Korevaar DA, Bossuyt PMM, Hooft L. Infrequent and incomplete registration of test accuracy studies: analysis of recent study reports. BMJ Open. 2014;4(1):e004596. [Publication appraisal]

(15) Zarei F, Zeinali-Rafsanjani B. Assessment of Adherence of Diagnostic Accuracy Studies Published in Radiology Journals to STARD Statement Indexed in Web of Science, PubMed & Scopus in 2015. J Biomed Phys Eng. 2018;8(3):311–24. [Publication appraisal]

(16) Korevaar DA, Hooft L, Askie LM, Barbour V, Faure H, Gatsonis CA, et al. Facilitating Prospective Registration of Diagnostic Accuracy Studies: A STARD Initiative. Clin Chem. 2017;63(8):1331–41. [Publication appraisal]

(17) Korevaar DA, Salameh J, Vali Y, Cohen JF, McInnes MDF, Spijker R, et al. Searching practices and inclusion of unpublished studies in systematic reviews of diagnostic accuracy. Res Synth Methods. 2020;11(3):343–53. [Publication appraisal]

(18) Glanville J, Cikalo M, Crawford F, Dozier M, McIntosh H. Handsearching did not yield additional unique FDG-PET diagnostic test accuracy studies compared with electronic searches: a preliminary investigation. Res Synth Methods. 2012;3(3):202–13. [Publication appraisal]

(19) Korevaar DA, Cohen JF, Hooft L, Bossuyt PMM. Literature survey of high-impact journals revealed reporting weaknesses in abstracts of diagnostic accuracy studies. J Clin Epidemiol. 2015;68(6):708–15. [Publication appraisal]

(20) Pagkalidou E, Anastasilakis DA, Kokkali S, Doundoulakis I, Tsapas A, Dardavessis T, et al. Reporting completeness in abstracts of systematic reviews of diagnostic test accuracy studies in cardiovascular diseases is suboptimal. Hellenic J Cardiol. 2022;65:25–34. [Publication appraisal]

(21) Thompson G, Zhelev Z, Hunt H, Hyde C. It was not easy to identify the study design from the title and abstract of articles indexed as diagnostic (test) accuracy studies in EMBASE in 2012 and 2019. J Clin Epidemiol. 2022;144:102–10. [Publication appraisal]

(22) Li Q, Hou W, Li L, et al. Measuring quality of reporting in systematic reviews of diagnostic test accuracy studies in medical imaging: comparison of PRISMA-DTA and PRISMA. Ultrasound Obstet Gynecol. 2023;61(2):257–66. [Publication appraisal]

(23) Gurung P, Makineli S, Spijker R, Leeflang MMG. The Emtree term “diagnostic test accuracy study” retrieved less than half of the diagnostic accuracy studies in Embase. J Clin Epidemiol. 2020;126:116–21. [Publication appraisal]

(24) O’Keefe H, Rankin J, Wallace S, Beyer F. Investigation of text-mining methodologies to aid the construction of search strategies in systematic reviews of diagnostic test accuracy-a case study. Res Synth Methods. 2022 Jul 15. [Publication appraisal]

(25) InterTASC Information Specialists’ SubGroup. ISSG Search Filters Resource [Internet]. 2022 [cited 2023 April 14].

(26) Leeflang MMG, Scholten RJPM, Rutjes AWS, Reitsma JB, Bossuyt PMM. Use of methodological search filters to identify diagnostic accuracy studies can lead to the omission of relevant studies. J Clin Epidemiol. 2006;59(3):234–40.

(27) Ritchie G, Glanville J, Lefebvre C. Do published search filters to identify diagnostic test accuracy studies perform adequately? Health Inf Libr J. 2007;24(3):188–92.

(28) Whiting P, Westwood M, Beynon R, Burke M, Sterne JA, Glanville J. Inclusion of methodological filters in searches for diagnostic test accuracy studies misses relevant studies. J Clin Epidemiol. 2011;64(6):602–7.

(29) Beynon R, Leeflang MMG, McDonald S, Eisinga A, Mitchell RL, Whiting P, et al. Search strategies to identify diagnostic accuracy studies in MEDLINE and EMBASE. Cochrane Database Syst Rev [Internet]. 2013 [cited 2022 Oct 5];(9). [Publication appraisal]

(30) Yao X, Vella E, Brouwers M. How to conduct a high-quality systematic review on diagnostic research topics. Surg Oncol. 2018;27(1):70–5. [Publication appraisal]

(31) Kataoka Y, Taito S, Yamamoto N, So R, Tsutsumi Y, Anan K, et al. An open competition involving thousands of competitors failed to construct useful abstract classifiers for new diagnostic test accuracy systematic reviews. Res Synth Methods. 2023;14(5):707–17.

(32) Mann R, Gilbody SM. Should methodological filters for diagnostic test accuracy studies be used in systematic reviews of psychometric instruments? a case study involving screening for postnatal depression. Syst Rev. 2012;1(1):9. [Publication appraisal]

(33) Rogerson TE, Ladhani M, Mitchell R, Craig JC, Webster AC. Efficient strategies to find diagnostic test accuracy studies in kidney journals: Finding nephrology diagnostic studies. Nephrology. 2015;20(8):513–8. [Publication appraisal]

(34) Huang Y, Yang Z, Wang J, Zhuo L, Li Z, Zhan S. Performance of search strategies to retrieve systematic reviews of diagnostic test accuracy from the Cochrane Library: Performance of search strategies. J Evid-Based Med. 2016;9(2):77–83. [Publication appraisal]

(35) Pereira RA, Puga ME dos S, Atallah ÁN, Macedo EC, Macedo CR. LILACS search strategy for systematic reviews of diagnostic test accuracy studies. Health Inf Libr J. 2019;36(3):223–43. [Publication appraisal]

(36) Furuya-Kanamori L, Lin L, Kostoulas P, Clark J, Xu C. Limits in the search date for rapid reviews of diagnostic test accuracy studies. Res Synth Methods. 2023;14(2):172–9. DOI: 10.1002/JRSM.1598.

(37) Salameh JP, McInnes MDF, Moher D, Thombs BD, McGrath TA, Frank R, et al. Completeness of Reporting of Systematic Reviews of Diagnostic Test Accuracy Based on the PRISMA-DTA Reporting Guideline. Clin Chem. 2019;65(2):291–301. [Publication appraisal]

(38) Kim W, Kim JH, Cha YK, Chong S, Kim TJ. Completeness of Reporting of Systematic Reviews and Meta-Analysis of Diagnostic Test Accuracy (DTA) of Radiological Articles Based on the PRISMA-DTA Reporting Guideline. Acad Radiol. 2022;S1076633222002057. [Publication appraisal]

How to cite this chapter:

Glanville J, Higgins C. Diagnostic accuracy.  Last updated 7 December 2023. In: SuRe Info: Summarized Research in Information Retrieval for HTA. Available from: https://www.sure-info.org//diagnostic-accuracy

Copyright: the authors