Appraisal of: Horton J, Kaunelis D, Rabb D, Smith A. What's beyond the core? Database coverage in qualitative information retrieval. J Med Libr Assoc. 2025;113(1):49-57.
Reviewer(s):
Andrew Booth
Full Reference:
Horton J, Kaunelis D, Rabb D, Smith A. What's beyond the core? Database coverage in qualitative information retrieval. J Med Libr Assoc. 2025;113(1):49-57.
Short description:
This study investigates the effectiveness of five commonly used bibliographic databases for retrieving qualitative studies in the context of Health Technology Assessment (HTA) research, with particular focus on rapid reviews. The databases examined were MEDLINE, CINAHL, PsycINFO, Scopus, and Web of Science Core Collection. The study employs a two-part methodology to address the question of database selection for qualitative evidence synthesis.
In the first part of the study, the authors compiled a comprehensive list of journal titles likely to publish qualitative health research using Clarivate's InCites Journal Citation Reports (JCR). They searched JCR for subject categories including Anthropology; Cultural Studies; Health Policy & Services; Social Sciences, Biomedical; and Social Issues, initially identifying 286 titles. This list was then validated and refined by qualitative researchers at Canada's Drug Agency (formerly CADTH), resulting in a final list of 191 relevant journal titles. The authors used Ulrichsweb to determine which databases indexed each of these 191 journals, calculating both the percentage of total titles held by each database and the number of unique titles (journals indexed in only one database).
The results of this journal coverage analysis revealed that the multidisciplinary citation indexes had substantially higher coverage than the subject-specific health databases. Web of Science had the highest coverage at 92% of the 191 titles (175 journals), followed by Scopus at 82% (157 journals). Among the subject-specific databases, MEDLINE performed best at 51% (98 journals), followed by CINAHL at 48% (91 journals) and PsycINFO at 38% (73 journals). Notably, none of MEDLINE, CINAHL, or PsycINFO had any unique holdings when compared against each other and the multidisciplinary indexes. When unique holdings were recalculated for each multidisciplinary index with the other excluded from the comparison, Scopus held 26% unique titles (49 journals) and Web of Science held 29% unique titles (55 journals), suggesting substantial potential value in searching these resources in addition to the traditional "core" health databases.
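Both metrics are easy to reproduce from a journal-by-database indexing matrix. Below is a minimal Python sketch of the arithmetic; the journal-to-database holdings shown are hypothetical toy data, whereas the study's actual matrix covered 191 titles whose indexing was checked in Ulrichsweb.

```python
# Sketch of the coverage and unique-holdings arithmetic described above.
# The holdings matrix is hypothetical toy data, not the study's dataset.

DATABASES = ["MEDLINE", "CINAHL", "PsycINFO", "Scopus", "Web of Science"]

# journal title -> set of databases that index it (illustrative only)
holdings = {
    "Qualitative Health Research":       {"MEDLINE", "CINAHL", "Scopus", "Web of Science"},
    "Sociology of Health & Illness":     {"MEDLINE", "Scopus", "Web of Science"},
    "Medical Anthropology Quarterly":    {"Scopus", "Web of Science"},
    "Journal of Mixed Methods Research": {"Web of Science"},
}

total = len(holdings)
for db in DATABASES:
    covered = sum(db in dbs for dbs in holdings.values())
    # "unique" = indexed in this database and in none of the other four
    unique = sum(dbs == {db} for dbs in holdings.values())
    print(f"{db}: {covered}/{total} covered ({100 * covered / total:.0f}%), {unique} unique")

# Variant reported in the paper: unique holdings of one multidisciplinary
# index when only the three core health databases are the comparators.
core = {"MEDLINE", "CINAHL", "PsycINFO"}
wos_unique = sum("Web of Science" in dbs and not (dbs & core)
                 for dbs in holdings.values())
print(f"Web of Science titles not indexed in any core database: {wos_unique}")
```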
The second part of the study sought to validate whether this theoretical journal coverage advantage would translate into practically useful results. The authors tested Scopus by translating and running search strategies from nine previously published qualitative rapid reviews conducted by CADTH. These covered diverse topics including adverse childhood experiences, prescription drug monitoring, breast cancer surgery, prostatectomy, skin cancer biopsy, robotic surgical systems, and anticoagulant testing. Search strategies from MEDLINE were translated to Scopus syntax, with MeSH headings converted to free-text terms where not already covered in title/abstract searches, since Scopus lacks hierarchical controlled vocabulary. MEDLINE and Embase records were excluded from Scopus results to evaluate Scopus on its own merits rather than as a redundant source. Date and language limits matching the original reviews were applied.
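To make the translation step concrete, here is a hypothetical single-concept example; neither query is taken from the paper, and the Scopus field codes (TITLE-ABS-KEY, INDEX) follow Scopus advanced-search conventions as the reviewer understands them, so the exclusion mechanism in particular should be read as an assumption rather than the authors' documented method.

```python
# Hypothetical illustration of translating one Ovid MEDLINE concept line
# to Scopus syntax. Neither query is taken from the paper.

# Ovid MEDLINE: an exploded MeSH heading plus title/abstract free text.
ovid_medline = 'exp Qualitative Research/ or (qualitative or "focus group*").ti,ab.'

# Scopus has no hierarchical thesaurus, so the exploded MeSH concept must
# be re-expressed entirely as free-text terms in TITLE-ABS-KEY.
scopus_query = (
    'TITLE-ABS-KEY(qualitative OR "focus group*" OR interview* '
    'OR ethnograph* OR "grounded theory")'
)

# The authors removed records already indexed in MEDLINE or Embase; the
# INDEX() field code is one way to approximate that in Scopus (an
# assumption; the paper does not specify the mechanism used).
scopus_query += " AND NOT INDEX(medline)"
print(scopus_query)
```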
Results from these practical tests revealed that Scopus contributed additional unique records amounting to between 2% and 36% of the original search yields, with a median contribution of 13-15%. However, when these unique Scopus records were screened for inclusion by an experienced qualitative researcher, very few proved relevant. Across all nine rapid reviews, only four studies from Scopus were selected for inclusion: one study from the review on adverse childhood experiences (0.33% of 300 Scopus results) and three studies from the review on anticoagulant testing (4.6% of 65 Scopus results). The remaining seven reviews included zero studies from Scopus despite retrieving between 6 and 283 unique records per review.
The authors acknowledge several important limitations: their decision to exclude Embase (informed by previous literature but representing an evidence gap); the potential underrepresentation of Scopus holdings, since the initial journal list was derived from JCR (a Web of Science product); the lack of date-range verification for journal holdings in each database; the limited scope of nine test reviews covering a restricted range of topics; the organization-specific context (rapid reviews for HTA); the direct translation approach for search strategies, which may not exploit Scopus's unique features; and the absence of hierarchical controlled vocabulary in Scopus, which limits precision compared to MeSH-based searching in MEDLINE.
The study concludes that while multidisciplinary citation indexes like Scopus and Web of Science contain unique journal titles and can retrieve additional unique records, the practical value of including these databases in rapid qualitative evidence synthesis remains question-dependent. The authors recommend that searchers consider including Scopus or Web of Science as supplemental sources but acknowledge the significant resource implications of screening potentially large numbers of additional records for minimal return. They suggest that further research is needed to determine optimal search strategies for multidisciplinary indexes and to better understand when their inclusion provides sufficient value to justify the additional screening burden.
Limitations stated by the author(s):
The authors acknowledge multiple important limitations. First, they excluded Embase from the analysis based on prior literature suggesting minimal unique contribution, but note there is no consensus on this and their study cannot definitively recommend for or against Embase use. Second, the journal title list, while validated by qualitative researchers, is not exhaustive. Third, because JCR is a Web of Science product that draws only from WoS holdings, Scopus holdings are likely underrepresented in the initial journal list, potentially biasing results against Scopus. Fourth, the use of Ulrichsweb to evaluate database holdings did not account for the date ranges of coverage for each journal in each database, meaning "indexed" was treated as binary without verifying completeness of holdings. Fifth, it is difficult to comprehensively determine where qualitative research is published, especially as qualitative studies increasingly appear in otherwise clinical publications. Sixth, the sample of nine qualitative rapid reviews provides limited insight given the restricted range of research topics covered, and database selection effectiveness is heavily dependent on research topic and question. Seventh, the study is biased toward the needs and resources of Canada's Drug Agency, characteristic of HTA organizations but limiting broader generalizability. Eighth, the study focused specifically on rapid reviews with more focused questions and shorter timelines, so findings may not apply to more comprehensive systematic or scoping reviews. Ninth, the decision to directly translate search strings from MEDLINE to Scopus influenced the number and quality of results retrieved, as MeSH terms could not be adequately translated into Scopus's much broader Subject Area controlled vocabulary.
Limitations stated by the reviewer(s):
1. Fundamental methodological limitation: No gold standard or validation set [Measurement Bias; Construct Validity]: The study lacks a true gold standard against which to evaluate database performance. Unlike the Shaw 2004 study, which at least attempted to create a proxy "population" by combining results from all strategies tested, this study evaluates databases against an arbitrary list of journals that may or may not represent the actual universe of journals publishing relevant qualitative health research. The 191-journal list was derived from JCR subject categories and validated by researchers at one institution, but there is no external validation that these are actually the "right" journals for qualitative health research. Journals publishing highly relevant qualitative health research might have been excluded because they don't fit the selected JCR categories or weren't known to the CADTH team. The study essentially measures "coverage of journals we think are important" rather than "coverage of journals that actually publish relevant qualitative research retrieved in systematic reviews."
2. Circular logic from using a Web of Science product to evaluate Web of Science [Selection Bias; Methodological Flaw]: The authors acknowledge this limitation, but it fundamentally undermines the validity of their journal coverage findings. By starting with JCR (which draws from Web of Science holdings) to create the journal list, they virtually guaranteed that Web of Science would show the highest coverage. This is circular reasoning: "We'll evaluate databases by seeing how well they cover journals that we identified using one of the databases we're evaluating." The finding that WoS has 92% coverage while Scopus has 82% is not meaningful evidence that WoS has better coverage; it is evidence that using a WoS product to create the sampling frame will make WoS look better. The authors' acknowledgment of this limitation doesn't address how severely it compromises the study's primary findings.
3. Binary indexing data without date ranges [Incomplete Data; Measurement Error]: The use of Ulrich's Web to determine if journals are "indexed" treats indexing as binary (yes/no) without considering the crucial dimension of temporal coverage. A journal might be indexed in MEDLINE from 2010-present but in Scopus from 1995-present, giving Scopus access to 15 additional years of content. Conversely, a journal might be indexed in Scopus but only for the most recent 3 years, making that "coverage" far less valuable than complete historical indexing. The authors acknowledge this limitation but don't address how it might systematically bias results. Multidisciplinary indexes may show higher journal counts but potentially shallower temporal coverage. Without date range data, the coverage percentages are nearly meaningless for understanding actual retrieval capability.
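To illustrate this point, coverage can be weighted by the years a title is actually indexed rather than by a binary flag. A minimal sketch follows, with entirely hypothetical journals and date ranges:

```python
# Coverage weighted by years of indexing rather than a binary
# indexed/not-indexed flag. All journals and date ranges are hypothetical.

CURRENT_YEAR = 2024

# journal -> {database: (first_year_indexed, last_year_indexed)}
ranges = {
    "Journal A": {"MEDLINE": (2010, CURRENT_YEAR), "Scopus": (1995, CURRENT_YEAR)},
    "Journal B": {"Scopus": (2021, CURRENT_YEAR)},
}

def journal_years(db: str) -> int:
    """Total journal-years of content searchable through one database."""
    return sum(last - first + 1
               for dbs in ranges.values()
               for d, (first, last) in dbs.items()
               if d == db)

# Binary coverage counts Journal B for Scopus the same as Journal A,
# even though only four years of its run are searchable there.
for db in ("MEDLINE", "Scopus"):
    titles = sum(db in dbs for dbs in ranges.values())
    print(f"{db}: {titles} titles indexed, {journal_years(db)} journal-years")
```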
4. Failure to validate journal list against actual retrieved studies [Ecological Fallacy]: The study assumes that coverage of journals equals ability to retrieve relevant studies, but this is an ecological fallacy. The authors never validated whether their 191 journals actually produced the studies included in qualitative systematic reviews. It's entirely possible that highly productive journals for qualitative health research aren't in their list, while journals on their list rarely publish relevant qualitative research. The disconnect between Part 1 (theoretical journal coverage) and Part 2 (practical retrieval) of the study—where Scopus contributed many unique records but almost no included studies—suggests that journal coverage is a poor proxy for relevant retrieval. The authors should have worked backwards from included studies in systematic reviews to determine which journals actually matter.
5. Extremely limited and potentially unrepresentative test set [External Validity; Sample Size]: Nine rapid reviews from a single HTA organization on primarily surgical/procedural topics constitute an inadequate sample for drawing general conclusions about database performance for qualitative research. All nine reviews were conducted by the same organization using similar methodologies, potentially introducing systematic biases in topic selection, search strategy construction, or relevance screening. The topics covered (breast cancer surgery, prostatectomy, robotic surgery, skin biopsy) are heavily weighted toward surgical interventions and procedures, which may not be representative of the broader landscape of qualitative health research. Qualitative research on topics like patient experiences of chronic illness, mental health services, public health interventions, or healthcare organization might show very different database performance. The findings cannot be generalized beyond "rapid reviews of surgical interventions conducted by Canadian HTA agencies."
6. Confounding between database coverage and search strategy optimization [Confounding Variable]: The poor performance of Scopus in the practical tests (contributing 4 relevant studies across 918 unique records screened, a yield of 0.4%) could be attributed to either poor database coverage or poor search strategy design. The authors directly translated MEDLINE searches to Scopus without optimization for Scopus's unique features, structure, or vocabulary. They acknowledge that MeSH terms were converted to free-text and that Scopus lacks hierarchical controlled vocabulary, both of which would predictably reduce precision. The study cannot determine whether Scopus genuinely has poor relevant content or whether the search strategies were simply inappropriate for the platform. A fair comparison would require search strategies specifically designed and optimized for each database platform, which would be methodologically challenging but necessary for valid conclusions.
7. No inter-rater reliability for relevance screening [Measurement Reliability]: The screening of Scopus results for relevance was conducted by "one author who is an experienced qualitative researcher" with no indication of reliability checking, second screening, or validation. Screening 918 unique records across nine topics is substantial work, and without verification that screening decisions were reliable, we cannot be confident in the finding that only 4 studies were relevant. Different screeners might have different thresholds for relevance, particularly for qualitative research where inclusion criteria may be more nuanced than for quantitative effectiveness reviews. The study would have been strengthened by having at least a sample of records double-screened to calculate inter-rater reliability (kappa statistic) and resolve disagreements.
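For reference, the kappa statistic mentioned here is straightforward to compute on a double-screened sample. A minimal sketch of Cohen's kappa, using hypothetical include/exclude decisions from two screeners:

```python
# Cohen's kappa on a hypothetical double-screened sample of records.
from collections import Counter

screener_1 = ["exclude", "exclude", "include", "exclude", "include", "exclude"]
screener_2 = ["exclude", "include", "include", "exclude", "exclude", "exclude"]

n = len(screener_1)
observed = sum(a == b for a, b in zip(screener_1, screener_2)) / n

# Chance agreement from each screener's marginal include/exclude rates.
c1, c2 = Counter(screener_1), Counter(screener_2)
expected = sum(c1[k] * c2[k] for k in c1) / n ** 2

kappa = (observed - expected) / (1 - expected)
print(f"observed={observed:.2f}, expected={expected:.2f}, kappa={kappa:.2f}")
```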
8. Incomplete reporting of search strategies and screening decisions [Transparency; Reproducibility]: While the authors state that search strategies are available in supplementary materials and on Open Science Framework, the main text does not provide sufficient detail for readers to evaluate the quality of the Scopus translations or understand screening decisions. For example, which MeSH concepts were added as free-text terms when not already covered in title/abstract searches? How were complex MeSH heading structures (with subheadings, explosions, and focus) translated? What specific inclusion/exclusion criteria were applied during screening? Were there patterns in why Scopus records were excluded (wrong methodology, wrong topic, duplicates not caught by deduplication)? This level of detail would help readers understand whether the low yield from Scopus reflected genuinely low utility or methodological artifacts.
9. Failure to consider cost-benefit analysis or decision threshold [Practical Significance]: The study demonstrates that Scopus adds unique records but very few relevant studies, yet provides no framework for decision-making about when this trade-off is worthwhile. From a practical perspective, is screening 300 additional records to find 1 relevant study (as in the adverse childhood experiences review) worth the time and cost? How should reviewers weigh comprehensive versus pragmatic approaches in rapid reviews? The study would benefit from explicit discussion of the opportunity cost: time spent screening low-yield Scopus results could potentially be spent on other evidence synthesis tasks (citation screening, grey literature searching, expert consultation) that might have higher yield. Without this analysis, readers cannot make informed decisions about resource allocation.
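One simple framing of this trade-off is the "number needed to read" (records screened per included study) combined with an estimated screening cost. The sketch below uses the counts reported in the paper; the per-record screening time is a hypothetical assumption for illustration only.

```python
# Number-needed-to-read framing of the screening trade-off. Record and
# inclusion counts are those reported in the paper; the per-record
# screening time is a hypothetical assumption.

records_screened = 918  # unique Scopus records across the nine reviews
studies_included = 4    # Scopus records ultimately included

nnr = records_screened / studies_included  # ~230 records per included study

minutes_per_record = 0.5  # assumed title/abstract triage speed
screener_hours = records_screened * minutes_per_record / 60

print(f"NNR = {nnr:.0f} records screened per included study")
print(f"~{screener_hours:.1f} screener-hours spent for {studies_included} inclusions")
```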
10. Assumption that unique records are desirable without assessing saturation [Conceptual Framework]: The study implicitly assumes that identifying unique records is always valuable, but qualitative evidence synthesis often aims for conceptual saturation rather than exhaustive identification. After including 18 studies in a rapid review (as in the prescription drug monitoring review), would finding additional unique studies from Scopus meaningfully change the findings, or would they simply provide redundant themes already captured? The study doesn't engage with qualitative sampling theory (purposeful sampling, theoretical sampling, saturation) which might suggest that missing some studies is acceptable if conceptual saturation is reached. This is particularly relevant for rapid reviews where comprehensive searching is explicitly traded off against feasibility.
11. Limited exploration of why Scopus records were excluded [Missed Learning Opportunity]: The study reports that very few Scopus records were relevant but doesn't systematically analyze why. Were the Scopus records primarily irrelevant to the topic, quantitative rather than qualitative, from disciplines too far removed from health, from lower-quality journals, non-English-language sources, or conference abstracts rather than full studies? Understanding the composition of the unique Scopus set would provide valuable insights for developing better search strategies or determining when Scopus is more likely to add value. A structured analysis of a sample of excluded Scopus records could have yielded recommendations for subject category limits or other filters to improve precision, as sketched below.
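Such a structured analysis could be as simple as assigning one reason code per excluded record and tallying the results. The categories and counts in this sketch are hypothetical illustrations only.

```python
# Sketch of the structured exclusion analysis suggested above.
# Reason codes and counts are hypothetical illustrations only.
from collections import Counter

# One reason code assigned per excluded Scopus record at screening time.
exclusion_reasons = [
    "off-topic", "quantitative design", "off-topic", "conference abstract",
    "non-health discipline", "quantitative design", "off-topic",
]

for reason, count in Counter(exclusion_reasons).most_common():
    share = 100 * count / len(exclusion_reasons)
    print(f"{reason}: {count} ({share:.0f}%)")
```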
12. Embase exclusion creates a significant evidence gap [Incomplete Evidence]: The authors' decision to exclude Embase because prior literature suggested "minimal unique results" means this study cannot contribute to current debates about Embase's value for qualitative searching. Embase has substantial unique content, particularly from European journals and conference abstracts, and its exclusion limits the study's ability to recommend a comprehensive database set. The study can only answer "Is Scopus useful supplementary to MEDLINE/CINAHL/PsycINFO?" but cannot answer the more practical question of "What is the optimal database combination?" Given that many organizations have Embase access, this is a significant limitation. Moreover, because Embase records were removed from the Scopus results, and Scopus's value may partly derive from its inclusion of Embase-covered content, that removal may artificially deflate Scopus's apparent contribution.
13. Single organizational context limits generalizability [External Validity]: All aspects of this study reflect the specific context of Canada's Drug Agency: their access to databases, their topic priorities (surgical interventions), their methodology (rapid reviews with ~9-month timelines), their search strategy conventions, their screening thresholds, and their resource constraints. Different organizations doing different types of reviews (comprehensive systematic reviews, scoping reviews, qualitative meta-syntheses) with different resources, timelines, and quality standards might reach very different conclusions. An organization with access to specialized social science databases might find MEDLINE/CINAHL add little value; an organization doing comprehensive Cochrane reviews might find Scopus more valuable when trying to achieve maximum sensitivity; an organization focused on patient experience research might find the disciplinary breadth of Scopus highly valuable. The findings are probably most applicable to similar HTA organizations doing rapid reviews on medical interventions.
14. Temporal factors not considered [Confounding]: The study doesn't account for temporal aspects of database performance. Have databases' qualitative content holdings changed over time? Scopus and Web of Science have expanded coverage substantially in recent years—would the journal coverage analysis look different if conducted in 2020 versus 2024? Were the nine rapid reviews all conducted around the same time, or spread over years during which database content was changing? The indexing lag (time from publication to database indexing) differs across databases and could affect retrieval for recent publications, particularly relevant in rapid reviews. MEDLINE's qualitative research subject heading was only introduced in 2003 and has evolved—older studies might be poorly indexed. These temporal dynamics could significantly affect database performance but aren't addressed.
15. Publication bias in the sample of tested reviews [Selection Bias]: The nine rapid reviews used for testing were all published CADTH reports. This creates potential selection bias if published reports differ systematically from unpublished reviews or reviews that struggled with evidence identification. Perhaps CADTH publishes reports when adequate evidence is found; reviews where evidence identification was problematic might not be published and thus not included in this study's sample. If Scopus is particularly valuable for topics with sparse literature (where any additional unique study matters more), excluding such reviews from the sample would underestimate Scopus's value. Alternatively, if Scopus is particularly problematic for topics with abundant literature (where unique records are more likely to be redundant), including primarily published reviews might overestimate its utility.
Strengths of the study:
Despite these limitations, the study makes important contributions. It is one of the first to explicitly evaluate multidisciplinary citation indexes for qualitative health research beyond theoretical speculation, providing empirical data on journal coverage. The validation of the journal list by qualitative researchers at an HTA organization adds content expertise to what could otherwise be a purely bibliometric exercise. The two-part design (theoretical journal coverage plus practical testing) is methodologically sophisticated, recognizing that database holdings must translate to practical retrieval. The transparency regarding data availability (Open Science Framework) and the detailed acknowledgment of limitations demonstrate scientific rigor. The sample size of nine reviews, while small, is larger than that of many database comparison studies and represents real-world searching conducted for actual decision-making needs. The focus on rapid reviews fills an important gap, as most database comparison research focuses on comprehensive systematic reviews. The finding that theoretical coverage advantages don't translate to practical value offers an important cautionary note. Finally, the recommendation for ongoing case-by-case testing and reflective practice represents a pragmatic, learning-oriented approach appropriate for an evolving evidence base.
Study Type:
Database evaluation study / Comparative study
Related Chapters:
• B. Designing strategies - general
• C. Choosing sources - databases and other tools
Tags:
• Database selection
• Database coverage
• Qualitative research
• Scopus
• Web of Science
• MEDLINE
• CINAHL
• PsycINFO
• Rapid reviews
• Health Technology Assessment
• Journal coverage analysis
• Search strategy testing
• Multidisciplinary databases
• Citation indexes
• Information retrieval