Boynton: appraisal

This appraisal is for Boynton J, Glanville J, McDaid D, Lefebvre C. Identifying systematic reviews in MEDLINE: developing an objective approach to search strategy design. Journal of Information Science 1998;24(3):137-54.

This appraisal was prepared by Sue Bayliss

Information and Methodological Issues

Categorisation Issues

Detailed information, as appropriate

A. Information

A.1 State the author's objective


To use word frequency analysis to design a highly sensitive search strategy to retrieve systematic reviews and meta-analyses from MEDLINE.

A.2 State the focus of the search

[x] Sensitivity-maximising

[ ] Precision-maximising

[ ] Specificity-maximising

[ ] Balance of sensitivity and specificity / precision

[ ] Other


A.3 Database(s) and search interface(s).


MEDLINE (Ovid).

A.4 Describe the methodological focus of the filter (e.g. RCTs).


Systematic reviews and meta-analyses.

A.5 Describe any other topic that forms an additional focus of the filter (e.g. clinical topics such as breast cancer, geographic location such as Asia or population grouping such as paediatrics).



A.6 Other observations



B. Identification of a gold standard (GS) of known relevant records


B.1 Did the authors identify one or more gold standards (GSs)?

One

Quasi-gold standard (p. 141).

B.2 How did the authors identify the records in each GS?


Handsearch of six high impact factor journals and an electronic search of MEDLINE (Ovid).


B.3 Report the dates of the records in each GS.


1992-1995

B.4 What are the inclusion criteria for each GS?


Used CRD’s definition of systematic reviews.

B.5 Describe the size of each GS and the authors’ justification, if provided (for example the size of the gold standard may have been determined by a power calculation).


288 papers: 90% from the handsearch and 10% from the MEDLINE search.

B.6 Are there limitations to the gold standard(s)?

Yes

Relatively small, and covers a limited time period.

B.7 How was each gold standard used?

[x] to identify potential search terms

[x] to derive potential strategies (groups of terms)

[x] to test internal validity

[ ] to test external validity

[ ] other, please specify


B.8 Other observations.



C. How did the researchers identify the search terms in their filter(s) (select all that apply)?


C.1 Adapted a published search strategy.

No


C.2 Asked experts for suggestions of relevant terms.

No


C.3 Used a database thesaurus.

No


C.4 Statistical analysis of terms in a gold standard set of records (see B above).

Yes

Word frequency analysis – frequency of appearance of words in specific fields (title (ti), abstract (ab) and subject indexing).
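
As an illustration only (not the authors' software), word frequency analysis of this kind can be sketched as below; the record structure and field names are hypothetical:

    # Minimal sketch of word frequency analysis over a set of records
    # (illustrative only; the record format and field names are hypothetical).
    from collections import Counter
    import re

    def word_frequencies(records, fields=("title", "abstract", "headings")):
        """Count the number of records in which each word appears."""
        counts = Counter()
        for record in records:
            words = set()
            for field in fields:
                words.update(re.findall(r"[a-z]+", record.get(field, "").lower()))
            counts.update(words)  # each word counted at most once per record
        return counts

    gold_standard = [
        {"title": "A systematic review of aspirin", "abstract": "We searched MEDLINE and EMBASE."},
        {"title": "Beta blockers: a meta-analysis", "abstract": "A systematic overview of trials."},
    ]
    for word, n in word_frequencies(gold_standard).most_common(10):
        print(f"{word}: appears in {n} of {len(gold_standard)} records")

Words that appear in a large proportion of gold standard records (e.g. "systematic", "review", "meta") become candidate search terms.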

C.5 Extracted terms from the gold standard set of records (see B above).

No


C.6 Extracted terms from some relevant records (but not a gold standard).

No


C.7 Tick all types of search terms tested.

[x] subject headings

[x] text words (e.g. in title, abstract)

[x] publication types

[ ] subheadings

[ ] check tags

[ ] other, please specify


C.8 Include the citation of any adapted strategies.



C.9 How were the (final) combination(s) of search terms selected?


The list of terms was sorted into two ranked tables giving the best results for sensitivity and precision respectively.
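
A rough sketch of how such ranked tables can be produced (illustrative only, not the authors' method; the data layout is hypothetical): each candidate term is scored for sensitivity and precision against the test records, then the list is sorted twice.

    # Illustrative sketch: rank candidate terms by sensitivity and by precision.
    def rank_terms(terms, records):
        """records: list of (text, is_relevant) pairs; returns two ranked tables."""
        total_relevant = sum(1 for _, relevant in records if relevant)
        rows = []
        for term in terms:
            retrieved = [(text, rel) for text, rel in records if term in text.lower()]
            hits = sum(1 for _, rel in retrieved if rel)
            sensitivity = hits / total_relevant if total_relevant else 0.0
            precision = hits / len(retrieved) if retrieved else 0.0
            rows.append((term, sensitivity, precision))
        by_sensitivity = sorted(rows, key=lambda row: row[1], reverse=True)
        by_precision = sorted(rows, key=lambda row: row[2], reverse=True)
        return by_sensitivity, by_precision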

C.10 Were the search terms combined (using Boolean logic) in a way that is likely to retrieve the studies of interest?

Yes

Although it is not explicitly stated how the terms were combined.
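
For illustration only (these lines are not the published strategies), terms in an Ovid MEDLINE filter of this kind are typically combined with OR, e.g.:

    1. meta-analysis.pt.
    2. meta-analys$.tw.
    3. systematic review$.tw.
    4. or/1-3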

C.11 Other observations.



D. Internal validity testing (This type of testing is possible when the search filter terms were developed from a known gold standard set of records).

D.1 How many filters were tested for internal validity?


Eleven strategies were tested:

Strategy A: Higher sensitivity, low precision (single term).

Strategy B: High sensitivity, low precision (10 terms).

Strategy C: High sensitivity, low precision (6 terms).

Strategy D: Low sensitivity, higher precision (3 terms).

Strategy E: “Busy searcher” strategy, high precision (single term).

Strategy F: Medium sensitivity, medium precision.

Strategy H: High sensitivity (single term).

Strategy J: “Thorough searcher” strategy (highly sensitive).

Further test strategies (not based on word frequency analysis):

Strategy K: Lower sensitivity, higher precision.

Strategy L: High sensitivity, lower precision.

Strategy M: Medium sensitivity, medium precision.

D.2 Was the performance of the search filter tested on the gold standard from which it was derived?

Yes, for all strategies.
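
For reference, the performance figures reported below are conventionally defined in search filter studies as follows (a standard gloss, not quoted from the paper):

\[
\text{sensitivity} = \frac{\text{relevant records retrieved}}{\text{relevant records in the gold standard}}, \qquad
\text{precision} = \frac{\text{relevant records retrieved}}{\text{total records retrieved}}
\]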


D.3 Report sensitivity data (a single value, a range, ‘Unclear’* or ‘not reported’, as appropriate). *Please describe.


Strategy A: 66%

Strategy B: 95%

Strategy C: 92%

Strategy D: 39%

Strategy E: 29%

Strategy F: 61%

Strategy H: 98%

Strategy J: 98%

Test strategies not based on word frequency analysis:

Strategy K: 55%

Strategy L: 89%

Strategy M: 58%

D.4 Report precision data (a single value, a range, ‘Unclear’* or ‘not reported’ as appropriate). *Please describe.


Strategy A: 26%

Strategy B: 12%

Strategy C: 23%

Strategy D: 49%

Strategy E: 79%

Strategy F: 42%

Strategy H: 19%

Strategy J: 20%

Test strategies not based on word frequency analysis:

Strategy K: 71%

Strategy L: 31%

Strategy M: 37%


D.5 Report specificity data (a single value, a range, ‘Unclear’* or ‘not reported’ as appropriate). *Please describe.

Not reported


D.6 Other performance measures reported.



D.7 Other observations.



E. External validity testing (This section relates to testing the search filter on records that are different from the records used to identify the search terms).

E.1 How many filters were tested for external validity on records different from those used to identify the search terms?


None: the filters were tested only on the same dataset used to derive them (see internal validity testing, section D).

E.2 Describe the validation set(s) of records, including the interface.



For each filter report the following information.

E.3 On which validation set(s) was the filter tested?



E.4 Report sensitivity data for each validation set (a single value, a range or ‘Unclear’ or ‘not reported’, as appropriate).



E.5 Report precision data for each validation set (report a single value, a range or ‘Unclear’ or ‘not reported’, as appropriate).



E.6 Report specificity data for each validation set (a single value, a range or ‘Unclear’ or ‘not reported’, as appropriate).



E.7 Other performance measures reported.



E.8 Other observations



F. Limitations and Comparisons



F.1 Did the authors discuss any limitations to their research?

Yes

Using high impact factor journals may not provide a representative sample; for example, these journals would usually contain papers with a better standard of study description.

Limited years chosen (1992-1995).

The CRD definition of a systematic review used for inclusion/exclusion may be too demanding.

The quick assessment process used during inclusion/exclusion may not be sufficiently detailed.

More sophisticated frequency analysis software than that used would have coped better with problems of hyphenation and phrases.

F.2 Are there other potential limitations to this research that you have noticed?

Yes

The same dataset was used to derive and validate the strategies. This could introduce bias, as strategies may tend to perform better on the set of records from which they were derived (White et al).

F.3 Report any comparisons of the performance of the filter against other relevant published filters (sensitivity, precision, specificity or other measures).


Hunt and McKibbon (full): sensitivity 41%, precision 75%.

Hunt and McKibbon (brief): sensitivity 40%, precision 75%.

CRD (full): sensitivity 84%, precision 31%.

CRD (brief): sensitivity 41%, precision 64%.

F.4 Include the citations of any compared filters.


Hunt DL, McKibbon KA. Locating and appraising systematic reviews. Annals of Internal Medicine 1997;126:532-538.

NHS Centre for Reviews and Dissemination. Undertaking systematic reviews of research on effectiveness: CRD guidelines for those carrying out or commissioning reviews. York: NHS CRD, 1996.

F.5 Other observations and / or comments.



G. Other comments. This section can be used to provide any other comments. Selected prompts for issues to bear in mind are given below.

G.1 Have you noticed any errors in the document that might impact on the usability of the filter?


NB: on p. 146 the reference given in the text as the source for the CRD filters is incorrectly cited as [3]; it should be [2] (NHS Centre for Reviews and Dissemination. Undertaking systematic reviews of research on effectiveness: CRD guidelines for those carrying out or commissioning reviews. York: NHS CRD, 1996).

G.2 Are there any published errata or comments (for example in the MEDLINE record)?



G.3 Is there public access to pre-publication history and / or correspondence?



G.4 Are further data available on a linked site or from the authors?



G.5 Include references to related papers and/or other relevant material.



G.6 Other comments