Ayiku, 2017 appraisal

This appraisal is for Ayiku, L., Levay, P., Hudson, T., Craven, J., Barrett, E., Finnegan, A. and Adams, R. The MEDLINE UK filter: development and validation of a geographic search filter to retrieve research about the UK from OVID MEDLINE. Health Info Libr J. 2017;34: 200-216.

This appraisal was prepared by Helen Fulbright and Claire Stansfield in December 2022.

Information and Methodological Issues

Categorisation Issues

Detailed information, as appropriate

A. Information

A.1 State the author's objective


To create and validate a geographic search filter that retrieves research about the United Kingdom in systematic literature searches.

A.2 State the focus of the search

[ ] Sensitivity-maximising

[ ] Precision-maximising

[ ] Specificity-maximising

[x] Balance of sensitivity and specificity / precision

[ ] Other


A.3 Database(s) and search interface(s).


Ovid MEDLINE

A.4 Describe the methodological focus of the filter (e.g. RCTs).


To retrieve research about the United Kingdom.

A.5 Describe any other topic that forms an additional focus of the filter (e.g. clinical topics such as breast cancer, geographic location such as Asia or population grouping such as paediatrics).


Not applicable

A.6 Other observations


None

B. Identification of a gold standard (GS) of known relevant records


B.1 Did the authors identify one or more gold standards (GSs)?

Five

Yes, there were three gold standard sets for testing recall. Two of these were used for developing and testing the filter and the third was used for external validity for recall.

An additional dataset was used to test and modify the precision of the search. Recall and precision were tested on a 'case study'.

B.2 How did the authors identify the records in each GS?


Studies about the UK which informed National Institute for Health and Care Excellence (NICE) guidance between 2013-2015 were used to create the three gold standard reference sets. In NICE guidance, there are evidence description sections which summarise the included publications and note the geographic setting of each publication. This allowed the authors to identify the records.

The additional dataset to test precision was from one clinical guideline search, and the 'case study' was from a systematic review on a different health-related topic.

B.3 Report the dates of the records in each GS.


The dates of the records for each gold standard set are not explicitly reported, although there is a brief mention that publication dates ranged from 1960-2015, with most references dating from the last 20 years. The publication dates of the NICE guidance used to identify records for each gold standard are as follows:

o GS1 = January to June 2015

o GS2 = January to June 2014

o GS3 = January to June 2013

o The additional dataset is from 2014.

o The case study was published in 2015 and contained research from 2003-2013 (based on the title in the reference)

B.4 What are the inclusion criteria for each GS?


The only inclusion criterion applied to each GS was that records had to be about research undertaken within the UK, have at least one UK site in the case of multi-centre trials, or be UK-focussed publications in the case of systematic reviews.

If a record was included in the original guidance or systematic review, it was assumed to have a UK setting. The project team did not verify the geographic setting details provided in the evidence description sections.

B.5 Describe the size of each GS and the authors’ justification, if provided (for example the size of the gold standard may have been determined by a power calculation)


GS1 = 418

GS2 = 285

GS3 = 266

The additional dataset contained 24 UK references from a search yield of 6,462 records without applying the filter

The case study contained 79 UK references from MEDLINE from a search yield of 5,281 records without applying the filter

The size of each gold standard was not determined by a power calculation. The numbers used were justified from previous research stating that a set of at least 100 publications is sufficient for search filter development (Sampson, Margaret et al. “An alternative to the hand searching gold standard: validating methodological search filters using relative recall.” BMC Medical Research Methodology. 2006;6: 33. doi:10.1186/1471-2288-6-33)

B.6 Are there limitations to the gold standard(s)?

Yes

The gold standards were created from articles included in a range of UK healthcare guidelines, which is a strength. The authors acknowledge, however, that the reference sets were created using the relative recall method, so the reference sets are only as good as the sum of the individual searches. Moreover, the methods for determining a record's relevance to the UK were not reported in the guidance.

B.7 How was each gold standard used?

[GS1 ] to identify potential search terms

[ GS1, GS2 and additional trial search] to derive potential strategies (groups of terms)

[GS1, GS2 and additional trial search ] to test internal validity

[GS3, case study] to test external validity

[ ] other, please specify


B.8 Other observations.


None

C. How did the researchers identify the search terms in their filter(s) (select all that apply)?


C.1 Adapted a published search strategy.

No


C.2 Asked experts for suggestions of relevant terms.

Unclear


C.3 Used a database thesaurus.

Yes


C.4 Statistical analysis of terms in a gold standard set of records (see B above).

Yes

A frequency counter was used to identify the occurrence of common UK terms, and these are presented in Appendix 3 of the paper.

C.5 Extracted terms from the gold standard set of records (see B above).

Yes


C.6 Extracted terms from some relevant records (but not a gold standard).

No


C.7 Tick all types of search terms tested.

[Yes ] subject headings

[Yes ] text words (e.g. in title, abstract)

[ ] publication types

[ ] subheadings

[No ] check tags

[ ] other, please specify

The strategy also includes text words in the following fields: country of publication, institutional field and journal words.

C.8 Include the citation of any adapted strategies.


Not applicable

C.9 How were the (final) combination(s) of search terms selected?


Search terms were selected by checking up-to-date MeSH terminology, by using the geographic setting of publications included in the evidence description sections of NICE guidance, and by drawing on knowledge of terms for the UK and of non-UK terms that contain related UK terms. Selection was also informed by sampling 163 records that were non-relevant results from the 'additional dataset', to identify common issues that affected precision.

C.10 Were the search terms combined (using Boolean logic) in a way that is likely to retrieve the studies of interest?


Yes

C.11 Other observations.


None

D. Internal validity testing (This type of testing is possible when the search filter terms were developed from a known gold standard set of records).

D.1 How many filters were tested for internal validity?


Three: the draft filter, the modified filter, and the final filter.

D.2 Was the performance of the search filter tested on the gold standard from which it was derived?

Yes


D.3 Report sensitivity data (a single value, a range, ‘Unclear’* or ‘not reported’, as appropriate). *Please describe.


The final filter retrieved all the references in GS1 and GS2 which were identifiable as UK references in the searched fields of their MEDLINE records (see Table 3 in the paper). Sensitivity (recall) was 189 out of 209 references in GS1 development set, 188 out of 209 references in GS1 testing set, and 256 out of 285 references in GS2. The references that were not retrieved did not have anything identifying them as UK references in the fields searched.
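The counts above can be converted to percentages for easier comparison with the relative recall figures reported for external validity. The following is a minimal sketch using only the counts quoted in this answer; the function and variable names are illustrative and do not come from the paper.

```python
# Recall counts as quoted in D.3: (records retrieved, records in the set).
reported = {
    "GS1 development set": (189, 209),
    "GS1 testing set": (188, 209),
    "GS2": (256, 285),
}


def recall_percent(retrieved: int, total: int) -> float:
    """Recall as a percentage: relevant records retrieved / all relevant records."""
    return 100.0 * retrieved / total


# Percentage recall for each reference set, rounded to one decimal place.
recall_by_set = {
    name: round(recall_percent(retrieved, total), 1)
    for name, (retrieved, total) in reported.items()
}

for name, value in recall_by_set.items():
    print(f"{name}: {value}%")
```

On these counts, recall is close to 90% in each of the three sets.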

D.4 Report precision data (a single value, a range, ‘Unclear’* or ‘not reported’ as appropriate). *Please describe.


Not reported. The additional trial search was used to test the precision of the draft filter and the modified filter, but precision was not reported for the final filter. Section E.5 of this appraisal describes precision data for the final filter based on a case study.

D.5 Report specificity data (a single value, a range, ‘Unclear’* or ‘not reported’ as appropriate). *Please describe.


Not reported

D.6 Other performance measures reported.


Not applicable

D.7 Other observations.


None

E. External validity testing (This section relates to testing the search filter on records that are different from the records used to identify the search terms).

E.1 How many filters were tested for external validity on records different from those used to identify the search terms?


One

E.2 Describe the validation set(s) of records, including the interface.


In external validity testing, 234 of the 266 publications in GS3 were identifiable as UK papers in their metadata on Ovid MEDLINE.

The case study validation set was taken from an EPPI-Centre systematic review on workplace-based learning for UK undergraduate and pre-registration health care professionals in which all the included MEDLINE references (117) were about the UK and were obtained without the use of geographic limits.

For each filter report the following information.

E.3 On which validation set(s) was the filter tested?


GS3 for external validity.

E.4 Report sensitivity data for each validation set (a single value, a range or ‘Unclear’ or ‘not reported’, as appropriate).


The validated filter achieved 87.6% relative recall against GS3. There was a 99.5% recall of identifiable references (Table 4 in the paper).

A later study by Ayiku et al (2020) assesses recall across 25 reviews. In this study, the MEDLINE filter achieved an average of 98.9% recall for the 25 reviews.
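The two GS3 recall figures differ because some GS3 records carry no UK-identifying terms in the searched fields, which caps the relative recall any term-based filter can reach. The sketch below works out that ceiling from the counts given in E.2 and compares it with the reported relative recall. It is an illustration only: the variable names are not from the paper, and the number of records actually retrieved is not stated in this appraisal, so it is not used.

```python
gs3_total = 266                   # publications in GS3 (from E.2)
gs3_identifiable = 234            # identifiable as UK papers in MEDLINE metadata (from E.2)
reported_relative_recall = 87.6   # percent, measured against all of GS3 (from E.4)

# Highest relative recall achievable if every identifiable record were retrieved.
ceiling = 100.0 * gs3_identifiable / gs3_total

# How far the reported relative recall falls below that ceiling, in percentage points.
gap_to_ceiling = ceiling - reported_relative_recall

print(f"ceiling on relative recall: {ceiling:.2f}%")
print(f"gap to ceiling: {gap_to_ceiling:.2f} percentage points")
```

The small gap is consistent with the filter retrieving almost all of the identifiable references.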

E.5 Report precision data for each validation set (report a single value, a range or ‘Unclear’ or ‘not reported’, as appropriate).


None of the validation sets were used to test precision. The case study was used to assess the precision of the final filter and demonstrated 11.4% precision at 100% recall compared with 1.5% precision when not applying the filter (Table 5 in the paper).

A later study by Ayiku et al (2020) assesses precision across 25 reviews. In this study, precision was increased by an average of 5.1 times.

E.6 Report specificity data for each validation set (a single value, a range or ‘Unclear’ or ‘not reported’, as appropriate).


Not reported

E.7 Other performance measures reported.


Reduction in NNR was reported for the case study. The MEDLINE filter demonstrated a NNR of 9.
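NNR (number needed to read) is the reciprocal of precision, so the reported value can be checked against the precision figure in E.5. The sketch below does this and also derives an unfiltered NNR from the case study counts given in B.5 (79 UK references in a yield of 5,281). That baseline is calculated here for illustration and is not quoted from the paper; the function name is likewise illustrative.

```python
def number_needed_to_read(precision: float) -> float:
    """NNR is the reciprocal of precision (precision given as a proportion)."""
    return 1.0 / precision


# With the filter: 11.4% precision at 100% recall (from E.5).
nnr_with_filter = number_needed_to_read(0.114)

# Without the filter: derived from the B.5 case study counts (an inference, not a quoted value).
relevant_records = 79
unfiltered_yield = 5281
nnr_without_filter = number_needed_to_read(relevant_records / unfiltered_yield)

print(f"NNR with filter: {nnr_with_filter:.1f} (rounds to {round(nnr_with_filter)})")
print(f"NNR without filter: {nnr_without_filter:.1f}")
```

On these figures a searcher would need to screen roughly 9 records per relevant record with the filter, compared with roughly 67 without it.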

E.8 Other observations


None

F. Limitations and Comparisons



F.1 Did the authors discuss any limitations to their research?

Yes

The UK setting of publications in the NICE guidance was not independently verified.

The search terms for the UK were based on team members’ knowledge.

The filter will not retrieve any record that lacks UK-related terms, and MEDLINE records tend to lack these data in the title, abstract and subject heading fields.

The filter could potentially exclude UK studies which only refer to UK regions, counties, towns, villages, or individual organisations, without mention of the UK or its constituent countries.

The filter could pick up irrelevant content where terms contain a geographic place name (e.g., Jack London or Glasgow Coma Scale) but the research was not conducted in the UK; or where geographic place names have another meaning (e.g., Bath and bath).
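The false-positive problem described above can be illustrated with a small sketch. It assumes case-insensitive whole-word matching. The three-term list and the example titles are invented for illustration; they are not the published filter or real MEDLINE records.

```python
import re

# Hypothetical, deliberately tiny term list; NOT the published MEDLINE UK filter.
place_terms = ["london", "glasgow", "bath"]
pattern = re.compile(r"\b(" + "|".join(place_terms) + r")\b", re.IGNORECASE)

# Invented example titles, one per situation described in F.1.
titles = [
    "Hospital admissions for asthma in London boroughs",      # genuinely about the UK
    "Glasgow Coma Scale scores after head injury in Brazil",  # eponym, not a UK study
    "Warm bath immersion during labour: a trial in Iran",     # common noun, not the city
]

# True where a place term matches, whether or not the study is about the UK.
matches = [bool(pattern.search(title)) for title in titles]

for title, matched in zip(titles, matches):
    print(f"{'retrieved' if matched else 'missed':9}  {title}")
```

All three titles match, although only the first describes UK research, which is why place-name terms lower precision unless such cases are handled explicitly.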

F.2 Are there other potential limitations to this research that you have noticed?


In its current form, the filter might not be transferable to searches where specific regions, counties, towns, villages, or individual organisations are specifically relevant to the population or intervention.

F.3 Report any comparisons of the performance of the filter against other relevant published filters (sensitivity, precision, specificity or other measures).


Not applicable.

F.4 Include the citations of any compared filters.


Other validated geographic search filters for various locations are referenced in the literature review (Valderas, Pienaar). None are compared directly. However, the filters for Spain and Africa had a similar issue to the UK filter in that they cannot retrieve references which do not have any terms for the country-of-interest in the searchable fields of their MEDLINE records.

Valderas JM, Mendivil J, Parada A, Losada-Yáñez M, Alonso J. Development of a Geographic Filter for PubMed to Identify Studies Performed in Spain. Revista Española de Cardiología (English Edition) 2006;59:1244-51.

Pienaar E, Grobler L, Busgeeth K, Eisinga A & Siegfried N. Developing a geographic search filter to identify randomised controlled trials in Africa: finding the optimal balance between sensitivity and precision. Health Information and Libraries Journal 2011;28:210–215.

F.5 Other observations and / or comments.


None

G. Other comments. This section can be used to provide any other comments. Selected prompts for issues to bear in mind are given below.

G.1 Have you noticed any errors in the document that might impact on the usability of the filter?

No


G.2 Are there any published errata or comments (for example in the MEDLINE record)?

No

Checked 21 Dec 2022

G.3 Is there public access to pre-publication history and / or correspondence?

No

Checked 21 Dec 2022

G.4 Are further data available on a linked site or from the authors?

No


G.5 Include references to related papers and/or other relevant material.


G.6. Other comments


None