Also known as data wrangling
Import .bib files into Biblioshiny to exclude duplicates
Import .RIS files into EndNote records to detect duplicates and scan them visually
Files can also be cleaned using OpenRefine, which is recommended by the VOSviewer manual (p. 31)
OpenRefine is a powerful free, open source tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data.
VOSviewer manual, p. 31:
"When creating a map based on bibliographic data, a VOSviewer thesaurus file can be used to merge different variants of a source title, an author name, an organization name, a country name, or a cited reference [13]. This may for example be useful when the name of a researcher is written in different ways in different documents (e.g., with the first initial only or with all initials). A VOSviewer thesaurus file can then be used to indicate that different names in fact refer to the same researcher. When creating a map based on text data, a VOSviewer thesaurus file can be used to merge terms. This may be useful not only for merging synonyms (e.g., ‘h-index’ and ‘Hirsch index’), but also for correcting spelling differences (e.g., ‘behavior’ and ‘behaviour’). In addition, it may also be useful for merging abbreviated terms with full terms (e.g., ‘JIF’ and ‘journal impact factor’). A thesaurus file can also be used to ignore terms. For example, when working with titles and abstracts of scientific publications, one may want to ignore general terms such as ‘conclusion’, ‘method’, and ‘result’. We refer to Section 4.3 for a technical discussion of VOSviewer thesaurus files."
Footnote 13 of the manual: "An alternative approach to data cleaning is the use of the OpenRefine tool available at https://openrefine.org. A tutorial explaining the use of OpenRefine to clean Web of Science or Scopus data can be found at https://bit.ly/2H9l31z."
The aim is to gather as many articles as possible with the 'aboutness' of presence. This meant collecting far too many records in the database search and filtering them later. That required a lot of data cleaning, but it was far more inclusive than trying to filter within a database or via macros/algorithms. Human error increases with visual scanning; the trade-off was to include more literature in the analysis. Conditional formatting and macros created the contrast that made visual scanning quite quick, with a low error rate. If in doubt about an article, I left it in if it mentioned presence or immersion in the context of virtual reality. This may have included articles that did not refer to VR as immersive VR with an HMD; however, this may not be a bad thing, since this review is inclusive of these and they may provide useful insights. I do wonder whether leaving those articles in biases the data, but I am assuming (check outcomes) that the VR and presence articles with greater co-citation will rise to the top, and that less relevant articles will naturally be relegated to lower impact and will not feature much in this data analysis.
If 'presence of' was the only occurrence of 'presence', then the article was most often unrelated to this study. Colour contrasting for the additional words helped to isolate the articles that are relevant.
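The 'presence of' rule of thumb can be sketched in Python as a pre-filter that flags candidates for the visual scan. This is only an illustration, not the method used here (the screening was done by eye in Excel); the function name only_presence_of is hypothetical. A flagged record still needs checking, because a phrase such as "sense of presence of the user" would also be flagged.

```python
import re

# "presence" as a whole word, and the phrase "presence of"; both case-insensitive.
PRESENCE = re.compile(r"\bpresence\b", re.IGNORECASE)
PRESENCE_OF = re.compile(r"\bpresence\s+of\b", re.IGNORECASE)


def only_presence_of(text):
    """True when 'presence' occurs and every occurrence is part of 'presence of'."""
    total = len(PRESENCE.findall(text))
    in_phrase = len(PRESENCE_OF.findall(text))
    return total > 0 and total == in_phrase
```

For example, only_presence_of("The presence of white blood cells was measured.") returns True, so that record would be a candidate for removal.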
Use the "Conditional Formatting" feature to highlight cells that contain the desired strings (e.g. "presence", "presence of", or "sense of presence"):
Here's how: YOUTUBE: Highlight Cells that Contain a Specific Word - Excel Tip >> then sort by colour ("presence of")
Add this macro code to highlight specific words or strings within each cell:
YOUTUBE: How to Highlight Specific Text within a Cell In Excel ; ( YOUTUBE: add "Developer" tab to Excel )
[ code is provided here on Patreon ; must acknowledge ]
Notes:
Text searches are case sensitive, i.e. you need to search for both "Presence" and "presence"
Run the macro in this order:
"presence" and "Presence" -- change to lighter pink: 7
"presence of" and "Presence of", "presence/absence", "presence or absence", "Presence and absence" -- change to dark blue: 32
"sense of presence" and "Sense of presence". For ease, I also included "sensation of presence", "immersion", "copresence", "embodiment", "level of presence", "levels of presence", "feeling of presence", "feelings of presence", "co-presence", "presence questionnaire", "telepresence", "being there", "cybersickness", "virtual presence", "tele-presence", "psychological presence", "immersive presence", "spatial presence" and "social presence" -- change to yellow: 6
This will enable you to do a quick visual scan through abstracts and titles to determine whether "presence of" appears when presence is not mentioned per se
In a test on IEEE downloads (no citations available) I could mark 100 records in 5 minutes, or 10,000 in about 8 hours. This was not too difficult because "presence of" usually appears on its own or with a technical term (like white blood cells) and is easily identified and removed. Also, when the text is unrelated to VR presence, there are usually only one or two occurrences of 'presence', so it is easily identified.
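The run order matters because each later pass recolours text that an earlier pass already coloured. A minimal Python sketch of that logic follows; it is not the Excel macro itself. The names PASSES and colour_spans are hypothetical, the third term list is shortened for illustration, and the matching is case-insensitive, unlike the macro.

```python
import re

# (terms, colour code) in the run order given above; later passes overwrite earlier ones.
PASSES = [
    (["presence"], 7),  # lighter pink
    (["presence of", "presence/absence", "presence or absence",
      "presence and absence"], 32),  # dark blue
    (["sense of presence", "spatial presence", "social presence",
      "telepresence", "immersion"], 6),  # yellow (shortened list)
]


def colour_spans(text):
    """Return (start, end, colour) runs after applying each pass in order."""
    colours = [None] * len(text)
    for terms, colour in PASSES:
        for term in terms:
            for m in re.finditer(re.escape(term), text, re.IGNORECASE):
                colours[m.start():m.end()] = [colour] * (m.end() - m.start())
    # Compress the per-character colours into contiguous coloured runs.
    spans, start = [], 0
    for i in range(1, len(text) + 1):
        if i == len(text) or colours[i] != colours[start]:
            if colours[start] is not None:
                spans.append((start, i, colours[start]))
            start = i
    return spans
```

With this order, "sense of presence" ends up entirely yellow and "presence of" entirely dark blue, while a bare "presence" stays pink.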
Also need to delete "Retractions"
Remove duplicates: select "Title TI AND author AU AND year PY", then "Remove Duplicates"
Check for remaining duplicates: Home tab >> Conditional Formatting >> Highlight Cells Rules >> Duplicate Values. (This will highlight duplicates not deleted by the removal step above.)
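The same two steps can be sketched outside Excel: remove records that match on title, author and year, then list titles that still repeat so they can be checked by hand. This is a rough Python sketch, assuming each record is a dict with TI, AU and PY keys; the function names are hypothetical. Collapsing extra whitespace before comparing is an addition of this sketch and may not match what Excel's "Remove Duplicates" does.

```python
def normalise(value):
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(str(value).lower().split())


def remove_duplicates(records):
    """Keep the first record for each (TI, AU, PY) combination."""
    seen, kept = set(), []
    for rec in records:
        key = (normalise(rec["TI"]), normalise(rec["AU"]), normalise(rec["PY"]))
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept


def repeated_titles(records):
    """Titles that still occur more than once, for a manual check."""
    counts = {}
    for rec in records:
        title = normalise(rec["TI"])
        counts[title] = counts.get(title, 0) + 1
    return sorted(t for t, n in counts.items() if n > 1)
```

A title that survives remove_duplicates but still shows up in repeated_titles usually differs only in how the author or year was recorded, which is what the duplicate highlighting is meant to catch.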
Colour codes used in the macro:
Black: 1
White: 2
Red: 3
Green: 4
Blue: 5
Yellow: 6
Pink: 7
Dark green: 10
Aqua green: 14
Midnight blue: 11
Dark blue: 32
Orange: 46