Wilczynski appraisal

This appraisal is for Wilczynski NL, Haynes RB. Developing optimal search strategies for detecting clinically sound prognostic studies in MEDLINE: an analytic survey. BMC Medicine 2004;2:23.

This appraisal was prepared by Anne Fry-Smith


A. Information

A.1 State the authors' objective


To determine how well various methodologic textwords, Medical Subject Headings and their Boolean combinations retrieve methodologically sound literature on the prognosis of health disorders in MEDLINE.

A.2 State the focus of the search

[ ] Sensitivity-maximising

[ ] Precision-maximising

[ ] Specificity-maximising

[x] Balance of sensitivity and specificity / precision

[ ] Other


A.3 Database(s) and search interface(s).


MEDLINE (Ovid).

A.4 Describe the methodological focus of the filter (e.g. RCTs).


Prognostic studies.

A.5 Describe any other topic that forms an additional focus of the filter (e.g. clinical topics such as breast cancer, geographic location such as Asia or population grouping such as paediatrics).


All topics

A.6 Other observations



B. Identification of a gold standard (GS) of known relevant records


B.1 Did the authors identify one or more gold standards (GSs)?

1


B.2 How did the authors identify the records in each GS?


Six research assistants hand-searched 161 journal titles for the year 2000.

B.3 Report the dates of the records in each GS.


2000

B.4 What are the inclusion criteria for each GS?


The methodological criteria applied to prognostic studies were:

An inception cohort of individuals, all initially free of the outcome of interest; follow-up of at least 80% until the occurrence of a major study end point or to the end of the study; and analysis consistent with the study design.

B.5 Describe the size of each GS and the authors’ justification, if provided (for example the size of the gold standard may have been determined by a power calculation).


1547 records.

B.6 Are there limitations to the gold standard(s)?

Yes

It contains only records published in a single year (2000).

B.7 How was each gold standard used?

[x] to identify potential search terms

[ ] to derive potential strategies (groups of terms)

[x] to test internal validity

[x] to test external validity

[ ] other, please specify


B.8 Other observations.



C. How did the researchers identify the search terms in their filter(s) (select all that apply)?


C.1 Adapted a published search strategy.

No

It is not clear how the authors identified the search terms. Terms may have been derived from the GS, but the paper does not say this explicitly.

C.2 Asked experts for suggestions of relevant terms.

Yes


C.3 Used a database thesaurus.

No


C.4 Statistical analysis of terms in a gold standard set of records (see B above).

No


C.5 Extracted terms from the gold standard set of records (see B above).

No


C.6 Extracted terms from some relevant records (but not a gold standard).

No


C.7 Tick all types of search terms tested.

[x] subject headings

[x] text words (e.g. in title, abstract)

[x] publication types

[x] subheadings

[ ] check tags

[ ] other, please specify


C.8 Include the citation of any adapted strategies.



C.9 How were the (final) combination(s) of search terms selected?


The authors present the single terms with the best sensitivity (with specificity 50% or greater), the best specificity (with sensitivity 50% or greater) and the best optimization of sensitivity and specificity. The paper does not describe how the multiple-term strategies were developed.

C.10 Were the search terms combined (using Boolean logic) in a way that is likely to retrieve the studies of interest?


Search terms were combined using OR.
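
As a minimal sketch of what an OR combination does (the terms and record IDs below are invented for illustration, not the authors' published filter): combining terms with Boolean OR takes the union of the record sets matched by each term, so each added term can only maintain or increase the number of records retrieved, which tends to raise sensitivity at the cost of precision.

```python
# Hedged illustration: Boolean OR as a set union over matched records.
# The search terms and record IDs are hypothetical, not the published filter.
records_matching = {
    "prognos*": {1, 2, 5},       # records matched by each hypothetical term
    "predict*": {2, 3},
    "exp mortality": {4, 5},
}

# OR-ing the terms retrieves any record matched by at least one term.
combined = set().union(*records_matching.values())
print(sorted(combined))  # [1, 2, 3, 4, 5]
```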

C.11 Other observations.



D. Internal validity testing (This type of testing is possible when the search filter terms were developed from a known gold standard set of records).

D.1 How many filters were tested for internal validity?



D.2 Was the performance of the search filter tested on the gold standard from which it was derived?



D.3 Report sensitivity data (a single value, a range, ‘Unclear’* or ‘not reported’, as appropriate). *Please describe.



D.4 Report precision data (a single value, a range, ‘Unclear’* or ‘not reported’ as appropriate). *Please describe.



D.5 Report specificity data (a single value, a range, ‘Unclear’* or ‘not reported’ as appropriate). *Please describe.



D.6 Other performance measures reported.



D.7 Other observations.



E. External validity testing (This section relates to testing the search filter on records that are different from the records used to identify the search terms).

E.1 How many filters were tested for external validity on records different from those used to identify the search terms?

4

A: Best single term for sensitivity, for specificity, and for optimization of sensitivity and specificity.

B: Combination of terms with the best sensitivity.

C: Combination of terms with the best specificity.

D: Combination of terms with the best optimization of sensitivity and specificity.

E.2 Describe the validation set(s) of records, including the interface.


40% of the gold standard.

For each filter report the following information.

E.3 On which validation set(s) was the filter tested?


On the 40% of the gold standard that formed the validation set.
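
A minimal sketch of such a division, assuming a simple random 60%/40% split (the paper reports the proportions; the randomisation procedure shown here is an assumption):

```python
# Hedged sketch: divide the gold-standard records into a 60% development set
# (used to derive the filters) and a 40% validation set (used for the
# external validity testing reported below). Record IDs are placeholders.
import random

gold_standard = list(range(1547))  # one entry per gold-standard record (B.5)
random.seed(0)                     # fixed seed so the split is reproducible
random.shuffle(gold_standard)

cut = int(len(gold_standard) * 0.6)
development_set = gold_standard[:cut]
validation_set = gold_standard[cut:]
print(len(development_set), len(validation_set))  # 928 619
```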

E.4 Report sensitivity data for each validation set (a single value, a range or ‘Unclear’ or ‘not reported’, as appropriate).


A: 73.4%

B: 82.3%

C: 48.1%

D: 73.4%

E.5 Report precision data for each validation set (report a single value, a range or ‘Unclear’ or ‘not reported’, as appropriate).


A: 1.4%

B: 1.6%

C: 3.2%

D: 1.8%

E.6 Report specificity data for each validation set (a single value, a range or ‘Unclear’ or ‘not reported’, as appropriate).


A: 79.1%

B: 79.7%

C: 94.2%

D: 84.1%

E.7 Other performance measures reported.


Accuracy

A: 79.1%

B: 79.7%

C: 94.0%

D: 84.0%
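
For reference, a minimal sketch of how these standard retrieval measures are defined (the counts in the example are invented, not taken from the paper):

```python
# Hedged sketch of the performance measures reported above, using the
# standard diagnostic-test definitions. tp/fp/fn/tn are counts of records:
#   tp = sound studies retrieved, fn = sound studies missed,
#   fp = other records retrieved, tn = other records excluded.
def filter_performance(tp, fp, fn, tn):
    sensitivity = tp / (tp + fn) * 100
    specificity = tn / (tn + fp) * 100
    precision = tp / (tp + fp) * 100
    accuracy = (tp + tn) / (tp + fp + fn + tn) * 100
    return sensitivity, specificity, precision, accuracy

# Illustrative counts only; they mimic the low-precision pattern above.
print(filter_performance(tp=80, fp=4000, fn=20, tn=20000))
# (80.0, 83.33..., 1.96..., 83.31...)
```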

E.8 Other observations


The differences between performance in the development and validation sets, and confidence intervals around those differences, are presented.

In most instances the differences were nonexistent or very small.

F. Limitations and Comparisons



F.1 Did the authors discuss any limitations to their research?


None mentioned.

F.2 Are there other potential limitations to this research that you have noticed?



F.3 Report any comparisons of the performance of the filter against other relevant published filters (sensitivity, precision, specificity or other measures).



F.4 Include the citations of any compared filters.



F.5 Other observations and / or comments.


At the time of publication the authors were testing the filters by combining them with disease-specific terms in mental health, infectious disease and TB.

G. Other comments. This section can be used to provide any other comments. Selected prompts for issues to bear in mind are given below.

G.1 Have you noticed any errors in the document that might impact on the usability of the filter?


No

G.2 Are there any published errata or comments (for example in the MEDLINE record)?


G.3 Is there public access to pre-publication history and / or correspondence?


G.4 Are further data available on a linked site or from the authors?


Yes. A list of the 161 journals hand-searched to obtain the gold standard is available from the authors.

Methodologic criteria were applied to each item in the 161 hand-searched journals to determine whether the article was methodologically sound for each of seven purpose categories. All category definitions and corresponding methodologic criteria are outlined in a previous paper (cited).

G.5 Include references to related papers and/or other relevant material.


Wilczynski NL et al. Enhancing retrieval of best evidence for health care from bibliographic databases: calibration of the hand search literature. Medinfo 2001;10(1):390-3.

G.6 Other comments