Appraisal of: Methley AM, Campbell S, Chew-Graham C, McNally R, Cheraghi-Sohi S. PICO, PICOS and SPIDER: a comparison study of specificity and sensitivity in three search tools for qualitative systematic reviews. BMC Health Services Research. 2014; 14:579.
Reviewer(s):
Andrew Booth
Full Reference:
Methley AM, Campbell S, Chew-Graham C, McNally R, Cheraghi-Sohi S. PICO, PICOS and SPIDER: a comparison study of specificity and sensitivity in three search tools for qualitative systematic reviews. BMC Health Services Research. 2014; 14:579. doi:10.1186/s12913-014-0579-0
Short description:
This comparative methodological study tested the effectiveness of three search tools for identifying qualitative research in systematic reviews. The authors compared PICO (Population, Intervention, Comparison, Outcome), PICOS (PICO plus Study design), and SPIDER (Sample, Phenomenon of Interest, Design, Evaluation, Research type) tools using identical search terms across three databases: Ovid MEDLINE, Ovid EMBASE, and EBSCO CINAHL Plus. The search focused on qualitative studies investigating healthcare experiences of people with Multiple Sclerosis.
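For readers unfamiliar with the frameworks, the sketch below shows one plausible way the review question could be decomposed into PICO and SPIDER elements. The element labels are taken from the article, but the concept phrasings are hypothetical illustrations, not the authors' published search strategy.

```python
# Illustrative mapping of the review question (healthcare experiences of
# people with Multiple Sclerosis) onto two of the compared frameworks.
# Element labels follow the article; the phrasings are hypothetical.
pico = {
    "Population":   "people with multiple sclerosis",
    "Intervention": "healthcare services",
    "Comparison":   None,  # often unused in qualitative questions
    "Outcome":      "experiences and views",
}

spider = {
    "Sample":                 "people with multiple sclerosis",
    "Phenomenon of Interest": "experiences of healthcare services",
    "Design":                 "interviews, focus groups",
    "Evaluation":             "experiences, views, attitudes",
    "Research type":          "qualitative studies",
}

for name, frame in {"PICO": pico, "SPIDER": spider}.items():
    print(name)
    for element, concept in frame.items():
        print(f"  {element}: {concept or '-'}")
```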
PICO searches generated 23,758 hits and identified 18 relevant articles, giving the highest sensitivity but the lowest specificity. SPIDER searches generated only 239 hits and identified 13 relevant articles, demonstrating the highest specificity but missing five relevant studies that PICO identified. PICOS generated 448 hits and identified 13 relevant articles, showing intermediate performance. The authors calculated sensitivity and specificity for each tool across the databases and found that CINAHL Plus consistently performed better than MEDLINE or EMBASE for identifying qualitative research. The study recommends using PICO for comprehensive searches where time allows and PICOS where resources are limited, while noting that SPIDER shows promise but is currently limited by its lower sensitivity.
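As a rough illustration of how the sensitivity figures follow from these counts, the minimal sketch below treats the 18 identified articles as the reference set. The authors' specificity values are not recomputed here because they also depend on the number of non-relevant records, which is not restated in this appraisal.

```python
# Minimal sketch: sensitivity of each tool against the 18-article
# reference set described above. Hit and relevant-article counts are
# those reported in the study.
TOTAL_RELEVANT = 18

results = {
    # tool: (hits retrieved, relevant articles retrieved)
    "PICO":   (23_758, 18),
    "PICOS":  (448, 13),
    "SPIDER": (239, 13),
}

for tool, (hits, relevant) in results.items():
    sensitivity = relevant / TOTAL_RELEVANT
    print(f"{tool:6s}: {hits:>6,} hits, sensitivity = {sensitivity:.2f}")
```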
Limitations stated by the author(s):
The authors acknowledge that while the study provides a real-world example of evidence searching, it only addresses one specific topic (Multiple Sclerosis healthcare experiences). They note this limits generalizability and recommend further research to test these search tools against a wider variety of narrative review and meta-synthesis topics to determine if findings hold across different subject areas.
Limitations stated by the reviewer(s):
Methodological Limitations:
The study has several methodological weaknesses that may affect the validity and reliability of findings. First, the initial screening of titles and abstracts was conducted by a single reviewer, introducing potential selection bias and reducing reproducibility. Without independent dual screening or inter-rater reliability testing, there is no way to assess consistency in applying inclusion criteria. This is particularly problematic given the subjective nature of determining whether qualitative studies meet criteria for "experiences of healthcare services."
Second, the inclusion criteria regarding studies with subthemes about healthcare (where only part of the paper addressed the research question) may have disadvantaged SPIDER more than PICO. The authors acknowledge this but do not adequately explore how this methodological decision affected comparative performance. A more refined analysis distinguishing between studies fully focused on the research question versus those with relevant subthemes would have strengthened the comparison.
Analysis and Interpretation Limitations:
The study lacks depth in analyzing why different tools missed specific articles. While Table 6 shows which articles were identified by which tools, there is insufficient exploration of the characteristics of the missed studies, their indexing patterns, or specific search term failures. Understanding why SPIDER missed five articles that PICO identified would provide valuable insights for improving search strategies. Similarly, the single relevant article held in CINAHL Plus yet retrieved by none of the three tools points to a gap in the methodology that warrants explanation.
The authors do not adequately address the trade-offs between comprehensiveness and efficiency. While they mention the time spent screening (weeks for PICO versus hours for PICOS/SPIDER), they provide no quantitative data on actual time investment, cost implications, or diminishing returns. For review teams with limited resources, a more detailed resource analysis would be valuable for decision-making.
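One way to quantify this trade-off from the figures already reported is the screening burden, i.e. the approximate number of retrieved records that must be screened to find one relevant article. The sketch below derives it from the hit counts given above; it is an illustration by this reviewer, not an analysis presented by the authors.

```python
# Illustrative screening-burden estimate ("number needed to read"):
# retrieved records screened, on average, per relevant article found.
# Counts are those reported in the study.
results = {
    "PICO":   (23_758, 18),
    "PICOS":  (448, 13),
    "SPIDER": (239, 13),
}

for tool, (hits, relevant) in results.items():
    print(f"{tool:6s}: ~{hits / relevant:,.0f} records screened per relevant article")
```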
Generalizability Concerns:
Beyond the single-topic limitation acknowledged by authors, the study's generalizability is further restricted by its focus on a well-defined clinical population (Multiple Sclerosis) with established qualitative literature. Performance of these tools may differ substantially for emerging topics, interdisciplinary research questions, or areas with less developed qualitative evidence bases. The study also exclusively examined healthcare experiences, which may not represent other types of qualitative research questions such as those exploring processes, meanings, or theories.
Reporting and Transparency Issues:
While the authors provide detailed search strategies in tables, the article lacks transparency regarding specific decisions made during screening. For instance, the criteria for determining whether a study section or subtheme on healthcare services was substantial enough for inclusion are not clearly operationalized. Additionally, although two reviewers independently reviewed full-text articles, the process for resolving disagreements and the extent of disagreement are not reported, making it difficult to assess the rigor of study selection.
Missing Considerations:
The study does not explore several important aspects of search strategy development. First, no consideration is given to supplementary search methods such as citation searching, hand searching, or grey literature, which are standard components of comprehensive systematic reviews. Second, the potential for combining elements of different tools or developing hybrid approaches is not explored. Third, the impact of database-specific indexing practices and controlled vocabulary on tool performance deserves more attention. Finally, the study does not consider how search strategies might be optimized iteratively based on initial results.
Statistical Limitations:
The sensitivity and specificity calculations, while appropriate, are based on relatively small numbers of relevant articles (18 total), limiting statistical power and precision. Confidence intervals around these estimates would have been helpful for interpretation. Additionally, the study does not employ any statistical tests to determine whether differences between tools are significant, relying instead on descriptive comparisons.
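To illustrate how imprecise estimates based on 18 relevant articles are, the sketch below computes a 95% Wilson score interval for a sensitivity of 13/18 (the proportion implied above for SPIDER or PICOS against the 18-article reference set). This is an illustration under that assumption, not a re-analysis of the study data.

```python
import math

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score confidence interval for a proportion."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width

# Sensitivity of 13 relevant articles retrieved out of 18 in the reference set
low, high = wilson_ci(13, 18)
print(f"point estimate {13/18:.2f}, 95% CI {low:.2f} to {high:.2f}")
```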
Overall Assessment:
Despite these limitations, the study makes a valuable contribution by providing an empirical comparison of search tools specifically for qualitative research. However, the single-reviewer screening, limited topic scope, and lack of detailed analysis of why the tools performed differently reduce confidence in the generalizability of the findings. The recommendation to use PICO for comprehensive searches appears sound, but the dismissal of SPIDER may be premature, given that the acknowledged problems with the indexing of qualitative studies may improve over time.
Study Type:
Methodological comparison study
Related Chapters:
• B. Designing strategies - general
Tags:
• Search tools
• PICO
• PICOS
• SPIDER
• Qualitative research
• Systematic reviews
• Literature searching
• Databases
• MEDLINE
• EMBASE
• CINAHL
• Sensitivity
• Specificity
• Multiple Sclerosis
• Healthcare experiences
• Search strategies
• Evidence synthesis
• Information retrieval