"Generally, the content indexed in WoS and Scopus was also shown to be highly overlapping, with Scopus indexing a greater amount of unique sources not covered by WoS" Pranckute, 2021, p.7
Useful Scopus download video: the note at 12:50 gives upper limits for citation downloads.
"Thus, the main drawbacks of GS are the lack of transparency, stability, precision and control. Moreover, one of the main setbacks in trying to use GS for large-scale analysis is the unavailability of its data export [23,31–33]. By simple experiment, it was also shown that GS can be easily manipulated using falsified data [34]. Therefore, GS will not be discussed in this paper since the reliability of GS as an appropriate data source for bibliometric use is still heavily debated." Pranckute, 2021
"In 2018, Digital Sciences launched a new broad scope bibliographic DB—Dimensions. As WoS and Scopus, Dimensions also is a controlled subscription-based DB, but a free basic version is also available [35,36]. However, due to its recent introduction, relatively little is known regarding its comprehensiveness and validity as a reliable bibliographic data source, as the exploring of these features has only just began in the scientometric community [32,37–40]. Thus, Dimensions also will not be discussed in this work." Pranckute, 2021, p.4
From my initial searches (Mar 2024), there appear to be enough articles uniquely available in each database to warrant using all of them to gather a broad representation of the published literature.
Merging the data will be a challenge; however, Kumpulainen & Seppanen (2021) provide a rich protocol to achieve this.
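As a first pass at this merging problem (a minimal sketch only, not the Kumpulainen & Seppanen protocol itself; the field names and normalization rules are my assumptions), records from the different databases could be deduplicated on a normalized DOI, falling back to a normalized title when the DOI is missing:

```python
def norm_doi(doi):
    """Lowercase a DOI and strip common URL prefixes; None if missing."""
    if not doi:
        return None
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi or None

def norm_title(title):
    """Keep only alphanumerics so punctuation/spacing variants match."""
    return "".join(ch for ch in title.lower() if ch.isalnum())

def merge_records(*record_lists):
    """Merge lists of record dicts, keeping the first occurrence of each key."""
    seen, merged = set(), []
    for records in record_lists:
        for rec in records:
            key = norm_doi(rec.get("doi")) or norm_title(rec.get("title", ""))
            if key and key not in seen:
                seen.add(key)
                merged.append(rec)
    return merged

wos = [{"doi": "10.1000/abc", "title": "Presence in VR"}]
scopus = [{"doi": "https://doi.org/10.1000/ABC", "title": "Presence in VR."}]
print(len(merge_records(wos, scopus)))  # the two records collapse to 1
```

Real merging will also need fuzzy title matching and manual checks, which is where a published protocol earns its keep.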
Aim to gather as much literature as possible that relates to presence in VR to represent the most comprehensive account of global research publication
Exclude records with incomplete bibliographic data
Aim for a wide interdisciplinary representation
High-quality records and papers (conference presentations and peer reviewed sources)
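The exclusion criterion above can be screened mechanically; a minimal sketch (the required field names are my assumptions, not a fixed schema):

```python
# Drop records that lack any required bibliographic field.
REQUIRED = ("authors", "title", "year", "source")

def is_complete(record):
    return all(record.get(field) for field in REQUIRED)

records = [
    {"authors": "Slater, M.", "title": "Place illusion", "year": 2009,
     "source": "Phil. Trans. R. Soc. B"},
    {"authors": "", "title": "Untitled", "year": None, "source": ""},
]
kept = [r for r in records if is_complete(r)]
print(len(kept))  # 1
```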
Pranckutė, R. (2021). Web of Science (WoS) and Scopus: The Titans of Bibliographic Information in Today’s Academic World. Publications (Basel), 9(1), 12. https://doi.org/10.3390/publications9010012
"coverage in the DBs varies greatly across disciplines, with one discipline being covered more extensively than others. For instance, a large-scale comparison performed at the journal level has shown that WoS and Scopus are both biased toward Natural Sciences, Engineering, and Biomedical Research, with Scopus offering wider coverage of all investigated broad fields, especially of Biomedical Research. Meanwhile, Natural Sciences and Engineering appeared to be overrepresented to a greater extent in WoS [47]. These results were confirmed by later comparisons that were performed at the publication level, showing that both DBs offer the widest coverage of Natural, Medicine, Health Sciences, and Technology, while Social Sciences and Humanities (SSH) are underrepresented in both DBs [40,49,53,56]." Pranckute, 2021, p.7
"For example, it is well known that, in Computer sciences, the research findings tend to be published in conference papers [86,87], while books and text-books are more important sources in social sciences and even more so in humanities [26,46] " Pranckute, 2021, p.7
"the coverage of books and conference proceedings both in WoS and Scopus was generally determined to be insufficient" Pranckute, 2021, p.7
All of these gaps have been increasingly addressed by WoS and Scopus, yet coverage is still not wide (see end of Pranckute, 2021, p.7).
IMPORTANT: " The lists of all references (even ones that are not indexed in the DBs) and related documents can also be viewed " Pranckute, 2021, p.12
"An accurate disambiguation between authors is extremely important for performing bibliometric analyses, assessing research performance, evaluating research collaboration and mobility trends, or tracking personal careers [70,117]" Pranckute, 2021, p.13
"In Scopus, all of the authors listed in the indexed publications are automatically assigned with individual Author Identifiers (AUIDs) and personalized profiles of every author are created." but WoS does not do this. Pranckute, 2021, p.13
"Split identities" or merged identities are a problem in both databases, i.e. one author may be given two profiles, or two authors may be merged into one identity (Pranckute, 2021, p.14). "However, the disambiguation between authors using author profiles that were generated in the DBs may not be completely accurate. Apart from the fact that some surnames, and even both surnames and names, are very frequent, an accurate isolation of the exact author of interest is even more complicated by the presence of misspelled or by alternative, yet valid forms of name variants. Algorithms that are applied in the DBs often mistakenly recognize them as different ones, thus dividing the publications of one author into two or more separate author profiles with the same or slightly different names." Pranckute, 2021, p.13 -- this is why data cleaning is so important.
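For the data-cleaning step, one illustrative normalization (an assumption-laden sketch, not a full disambiguation algorithm) is to fold diacritics and reduce given names to initials, so that variants such as "Pranckutė, Raminta" and "Pranckute, R." collapse to one key:

```python
import unicodedata

def name_key(name):
    """Build a crude matching key from a 'Last, Given' author name."""
    # Fold diacritics; the DBs themselves omit ~95% of them anyway,
    # so folding improves cross-database matching.
    folded = unicodedata.normalize("NFKD", name)
    folded = "".join(c for c in folded if not unicodedata.combining(c))
    last, _, given = folded.partition(",")
    initials = "".join(w[0] for w in given.split() if w[0].isalpha())
    return f"{last.strip().lower()}_{initials.lower()}"

print(name_key("Pranckutė, Raminta"))  # pranckute_r
print(name_key("Pranckute, R."))       # pranckute_r
```

Note this key would also merge genuinely different authors who share a surname and initial, which is exactly the "merged identities" failure mode quoted above, so it can only feed a manual review, not replace one.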
That considered, "Despite that, Scopus AUIDs appear to be precise. When compared to the largest Japan funding database, the recall and precision of Scopus AUIDs were over 98% and 99%, respectively, with the majority of authors having only one AUID [122] " Pranckute, 2021, p.14
Institutional disambiguation accuracy in both WoS and Scopus is poor (especially compared to author disambiguation); analysis based on institution would therefore have limited validity.
"Generally, a high quality of the journal is perceived as its inclusion in WoS and Scopus, because these DBs allegedly only index the highest quality sources carefully selected according to strict selection procedures [134]" Pranckute, 2021, p.16
"These measures could be applied if the source fails to meet certain indexing criteria, e.g., when journal self-citation patterns drastically increase and exceed the established threshold, the irregularities in publishing are determined or unethical publishing practices are suspected [138,143]." Journals may therefore be discontinued. Pranckute, 2021, p.18
IMPORTANT: Database errors: Yes, I need to clean the data, but error rates are low, so the benefit of excessive data cleaning may not be worth the effort. Also, errors may be more likely for low-impact papers, so the issue may be of even less importance. This just needs to be discussed as a limitation of such bibliometric analyses (p.21ff): "A recent large scale analysis has shown that, although these mistakes are present in both DBs, but, in Scopus, frequencies are lower for almost all error types in author names," and "On the other hand, the incorrect assignment of part of the last name as a first name was relatively common in both DBs (12.61% in WoS and 10.19% in Scopus). Meanwhile, although, in both DBs, incomplete and mistyped last names were quite rare (rarely exceeding 1%), approximately two-times more of these mistakes were identified in WoS when compared to Scopus, while the frequency of omitted apostrophes in WoS was almost fifteen times higher than in Scopus. However, in both DBs in almost all last names (approx. 95%), containing diacritics (only approx. 1% of investigated names contained this feature), diacritics were omitted, but none of the diacritics were incorrectly imported [71]." Pranckute, 2021, p.20
DOIs are no guarantee of accurate matching, since errors in them are common. p.21-22
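Given that DOIs can be wrong, a cheap plausibility check before matching on them is worthwhile. This sketch uses a commonly seen heuristic pattern for DOI syntax (my assumption; it is not an authoritative validator and cannot detect a syntactically valid but incorrect DOI):

```python
import re

# Heuristic: a DOI starts with "10.", a 4-9 digit registrant code,
# a slash, and a non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

def plausible_doi(doi):
    return bool(doi) and bool(DOI_PATTERN.match(doi.strip().lower()))

print(plausible_doi("10.3390/publications9010012"))  # True
print(plausible_doi("10.3390/"))                     # False (truncated suffix)
```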
Subject classifications and document types are highly inconsistent between the databases.
"Thus, it should be kept in mind that all of the search results in WoS are only retrieved from editions and years available by the institution’s subscription package. " p.32
"large data sets usually cannot be directly analyzed at the DBs’ web interface due to the limited online analysis capabilities" p.33
"citation metrics are calculated only using data from the particular DB, they are dependent on DBs coverage width and depth. " p.34
In general, WoS and Scopus calculate these metrics quite differently, and both engender quite pronounced biases; they are not to be trusted at face value. Normalisation is an attempt to address these issues, but the challenges are vast.
WoS: Journal Impact Factor (JIF): often misunderstood and biased. p.34
"H-index is a hybrid metric that is provided in the majority of bibliographic DBs and other data sites. The h-index was introduced by Hirsch in 2005 to quantify an individual’s scientific research output [243]. Its numerical value denotes the amount of top papers (h) in the collection of evaluated papers, each being cited at least h times. The indicator is robust, objective, simple, and easily calculated. However, the biggest advantage of h-index is that it combines productivity and impact in a single measure. Moreover, it can be applied at different levels—individual researchers, journals, institutions, or other paper collections. Therefore, although relatively new, the h-index rapidly became one of the most prevalent metrics in research evaluation practices [2,221,244]. However, as all other impact indicators, along with the aforementioned advantages, the h-index also has its own limitations. Firstly, its value can only increase. Additionally, as h-index is based only on highly cited publications, it is insensitive to the actual number of citations. Therefore, two journals or researchers with the same h-index can have a significantly different number of total citations. Moreover, h-index strongly depends on the total number of publications and their lifetime, which is directly related to the number of citations, which makes it disadvantageous for new journals and young researchers. Differences in co-authorship are also not considered. Yet, probably the most important disadvantage of h-index is that it is not normalized across the subject field and, thus, cannot be used for comparisons between different disciplines [2,46,244,245]. Additionally, the h-index does not account for self-citations with an argument that, while self-citations may increase the h-index value, their effect on the index value is much smaller than on the total citation count [26,243].
However, a theoretical study has shown that the h-index, as well as its variants, are susceptible to possible manipulations by self-citations [246]. Scopus provides an opportunity to select and view the h-index value without self-citations by using “Analyze author output” tool in the author’s profile page. Meanwhile, this option is not available in WoS. Nevertheless, h-index can be a valuable tool for evaluations and comparisons, but only when used with an awareness of its limitations. " -- Pranckute, 2021, p.37-38
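The definition quoted above (the largest h such that at least h papers each have at least h citations) is simple to compute directly; a short sketch:

```python
def h_index(citations):
    """h-index of a list of per-paper citation counts."""
    cites = sorted(citations, reverse=True)
    h = 0
    # Walk down the ranked list; rank i qualifies while the i-th
    # most-cited paper has at least i citations.
    for i, c in enumerate(cites, start=1):
        if c >= i:
            h = i
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))  # 4
print(h_index([25, 8, 5, 3, 3]))  # 3
```

The two examples illustrate the insensitivity to total citations noted in the quote: the second author has more total citations but a lower h-index.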
IMPORTANT: "However, research quality is a multidimensional concept and, thus, cannot be only assessed by quantitative measures. Despite this, the scientific impact of publications, which is determined by citations, is now generally viewed as an indicator of the quality of research, since citations are considered to be proof that the knowledge encoded in the publication was used and, therefore, made an impact. Citations may be appropriate in assessing the scientific impact of the research, but they do not show the impact of the research outside the scientific community" -- Pranckute, 2021, p.42
IMPORTANT: "Moreover, because the rates of errors are not very high, they should not significantly affect the results of the analyses, if they are properly taken into account. " Pranckute, 2021 p.46
"In recent years there has been a lively debate concerning which of the three major bibliometric databases is the best: Web of Science (WoS), Scopus, or Google Scholar (Gusenbauer, 2019; Harzing and Alakangas, 2016; Moral-Muñoz et al., 2020). Web of Science (WoS), launched by the Institute for Scientific Information (ISI) and now maintained by Clarivate Analytics, is a curated collection of over 21.000 peer-reviewed, high-quality scholarly journals published worldwide (including Open Access journals) in over 250 science, social sciences, and humanities disciplines. Scopus is Elsevier’s curated abstract and citation database, launched in November 2004, with currently over 25.100 journal titles from more than 5.000 international publishers. Currently, Scopus covers more than 18 million records (Scopus, 2021) and delivers the most comprehensive overview of the world’s research output in the fields of science, technology, medicine, social science, and arts and humanities. Google Scholar (GS) is a freely available website, launched in 2004, that indexes the full text or metadata of the scientific literature from the most peer-reviewed online academic journals, books, conference papers, theses, preprints, abstracts, technical reports, court opinions and patents. Although Google does not provide the number of records, it was estimated in 2018 that Google Scholar, with 389 million records, is currently the most comprehensive academic search engine (Gusenbauer, 2019). However, the database of Google Scholar is currently not supported by major software tools for bibliometric and science mapping analysis (Moral-Muñoz et al., 2020), which makes it unattractive for bibliometric research."
St.Pierre, M., Grawe, P., Bergstrom, J., & Neuhaus, C. (2022). 20 years after To Err Is Human: A bibliometric analysis of ‘the IOM report’s’ impact on research on patient safety. Safety Science, 147, 105593. https://doi.org/10.1016/j.ssci.2021.105593