By Jason Curtis Droboth
December 1, 2020
A short presentation for COMS 615: Research Methods
This study will attempt to uncover the large- and small-scale reasons and motives that lead people to access scientific information on Wikipedia.
This is what the vast majority of research on Wikipedia seems to be focused on (Okoli et al., 2014)
While there is still much research in this area, it represents less activity than production and content studies, even though 94% of users only consume the content (Okoli et al., 2014).
Most research on consumption (readership) dynamics focuses on (Okoli et al., 2014, p. 2393):
Ranking and popularity of Wikipedia
Wikipedia as a knowledge source for news, health information, and for scholars and librarians
Student readership
Perceptions of credibility
Grouped according to types of data sources
Methods used: Content analysis, page view data, and more
Sample Studies:
An analysis of the scientific accuracy of pages on controversial topics like climate change (Wilson & Likens, 2015).
A study that examined surges in page view counts for particular pages, correlating those surges with celebrity awareness campaigns for particular neurological diseases (Brigo et al., 2015).
Methods used: Online surveys, automatic surveys, questionnaires, and more.
Sample Studies:
One study, aiming to understand how much French students of different ages (11 to post-secondary) trust Wikipedia as an information source, deployed a questionnaire combining structured and unstructured questions (Borås & Vol, 2018).
Another study designed a widget that would pop up when people accessed Wikipedia pages and would ask them to fill out an online survey with questions to interrogate why they were accessing that page (Lemmerich et al., 2019).
Methods used: Comparative content analysis, quantitative rankings (site traffic, search result rankings), algorithmic audits, etc.
Sample Studies:
Many studies compare the rank and popularity of different websites using traffic tools and webpage ranking software (Okoli et al., 2014, p. 2395).
One study developed software to infer how the proprietary (private) algorithms of different search engines create different referral paths to Wikipedia (Höchstötter & Lewandowski, 2009).
Methods used: Controlling software, electronic experiments, etc.
Sample Studies:
One study designed a Chrome extension that manipulated search returns on Google, restricting how accessible Wikipedia articles were in search results and showing just how much people depend on Google searches to access Wikipedia content (McMahon et al., 2017).
Another looked at the effect that the presence or absence of authors, references, sponsors, awards, or the affiliated encyclopedia had on the participants' perception of information credibility (Kubiszewski et al., 2011).
Methods used: Counting software, click path software, and more
Sample Studies:
One study created a Chrome extension that logged participants' clicks and usage as they navigated from Google search results to Wikipedia, finding that Wikipedia traffic is extremely vulnerable to Google search results (McMahon et al., 2017).
Methods used: Open-ended, semi-structured, structured, and more.
Sample Studies:
Though the results are likely outdated, one study used semi-structured interviews with a very small sample (n=15) and found that young people use Wikipedia for an extremely limited and largely inconsequential number of tasks (Luyt et al., 2008).
Methods used: Eye tracking analysis, and more
Sample Studies:
One study aimed to better understand how people navigate, read, and make sense of Wikipedia by tracking participants' eye movements as they read certain articles (Kessler et al., 2020).
"Research on search, selection, and reception processes on Wikipedia is scarce" (Kessler et al., 2020).
Especially with an eye towards non-students, non-researchers, non-English speakers, and the less educated.
And especially within the field of science communication.
A focus on non-students and non-academics, using a mixed-methods approach that combines existing open-access metadata with self-reported user data to enhance our understanding of the extrinsic and intrinsic drivers of Wikipedia use.
Method: Time-correlated page view data analysis
This assumes more of a passive consumption influenced by external factors
Details:
While a referral path and click path analysis would be most helpful, access to the necessary data is difficult (Gyllstrom & Moens, 2012). Wikipedia makes much user data freely and openly available; path data is not included, but page view data is.
Example article:
Analyze the Chicxulub Crater article https://en.wikipedia.org/wiki/Chicxulub_crater
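A minimal sketch of how the time-correlated page view analysis might begin, assuming daily view counts for the article have already been downloaded. The function names, the trailing-median baseline, and the `window` and `factor` defaults are illustrative assumptions, not part of the proposed study design. The URL pattern follows the public Wikimedia Pageviews REST API as commonly documented and should be checked against the current documentation before use.

```python
from statistics import median
from urllib.parse import quote


def pageviews_url(article, start, end, project="en.wikipedia"):
    """Build a per-article daily page view URL (dates as YYYYMMDD strings).

    Assumed endpoint layout: project/access/agent/article/granularity/start/end.
    """
    base = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"
    title = quote(article, safe="")
    return f"{base}/{project}/all-access/user/{title}/daily/{start}/{end}"


def find_surges(daily_views, window=7, factor=3.0):
    """Return indices of days whose views exceed `factor` times the
    median of the preceding `window` days (a hypothetical surge rule)."""
    surges = []
    for i in range(window, len(daily_views)):
        baseline = median(daily_views[i - window:i])
        if baseline > 0 and daily_views[i] > factor * baseline:
            surges.append(i)
    return surges


# Synthetic example: a flat baseline with one spike on day 10.
sample = [100] * 10 + [900] + [100] * 5
print(find_surges(sample))  # prints [10]
```

Days flagged this way could then be compared by hand against dated external events (news coverage, documentaries, anniversaries) to look for time correlations. A flagged surge only marks a candidate date; it says nothing about the cause.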
Method: Semi-structured to non-structured surveys and interviews
This assumes more of an active consumption/readership/engagement governed by intrinsic factors
Details:
Conduct a broader platform-level inquiry, which will not focus on just one single article
Group participants into general, standard demographic categories
Seek to quantify their stated interest in science
Attempt to understand behaviour by asking:
Which scientific Wikipedia articles they might have accessed
Why they did so
What situations they believe would lead them to access scientific content on Wikipedia in the future
Other questions interrogating behaviour and motives
Participants: Non-scientists, non-students, the less educated
Surveys: Track Habits. Interviews: Outline Sentiments
Theory: Use Grounded Theory
Iterative: Conduct in progressive iterations
This study does not present any unusual ethical concerns, but the standard ones remain of great importance
It's unclear what use might come of any insights produced, but results should be of benefit to the participants, not just science communicators, Wikipedia, or other software developers
Sloppy or otherwise inconsiderate means of grouping or coding for demographic differences can result in careless representation of individuals. Coding participant data means speaking for the participants, so the utmost care should be taken when creating codes and coding the data. To do this effectively, conduct the coding in continual, reflexive iterations throughout the study.
Participant privacy and prior informed consent should remain a top priority
Though the mixed-methods approach will help round out the study, there are still major limitations
As this study will focus only on English Wikipedia, it ignores a large base of articles and users and will miss meaningful insights into geographically and culturally specific dynamics
Since the qualitative portion of this study would rely on self-reported data, the reported habits and sentiments might not represent actual ones. Some studies show that participants are reluctant to admit that they trust and use Wikipedia, yet do so in their actual usage habits (Okoli et al., 2014).
Even if time-correlations are found with anomalous click rates, it may be impossible to reliably infer their cause
Borås, P. Q. B. T. U. O., & Vol, S. (2018). How trust in Wikipedia evolves: a survey of students aged 11 to 25 (Vol. 23, Issue 1).
Brigo, F., Igwe, S. C., Nardone, R., Lochner, P., Tezzon, F., & Otte, W. M. (2015). Wikipedia and neurological disorders. Journal of Clinical Neuroscience, 22(7), 1170–1172. https://doi.org/10.1016/j.jocn.2015.02.006
Höchstötter, N., & Lewandowski, D. (2009). What users see - Structures in search engine results pages. Information Sciences, 179(12), 1796–1812. https://doi.org/10.1016/j.ins.2009.01.028
Kessler, S. H., Mede, N. G., & Schäfer, M. S. (2020). Eyeing CRISPR on Wikipedia: Using Eye Tracking to Assess What Lay Audiences Look for to Learn about CRISPR and Genetic Engineering. https://doi.org/10.1080/17524032.2020.1723668
Kubiszewski, I., Noordewier, T., & Costanza, R. (2011). Perceived credibility of Internet encyclopedias. Computers and Education, 56(3), 659–667. https://doi.org/10.1016/j.compedu.2010.10.008
Lemmerich, F., Sáez-Trumper, D., West, R., & Zia, L. (2019). Why the World Reads Wikipedia: Beyond English Speakers. https://doi.org/10.1145/3289600.3291021
Luyt, B., Zainal, C. Z. B. C., Mayo, O. V. P., & Yun, T. S. (2008). Young people’s perceptions and usage of Wikipedia. Information Research, 13(4).
McMahon, C., Johnson, I., & Hecht, B. (2017). The Substantial Interdependence of Wikipedia and Google: A Case Study on the Relationship Between Peer Production Communities and Information Technologies. www.aaai.org
Okoli, C., Mehdi, M., Mesgari, M., Nielsen, F. Å., & Lanamäki, A. (2014). Wikipedia in the eyes of its beholders: A systematic review of scholarly research on Wikipedia readers and readership. Journal of the Association for Information Science and Technology, 65(12), 2381–2403. https://doi.org/10.1002/asi.23162
Valde, K. S. (2018). Grounded theory. In M. Allen (Ed.), The SAGE Encyclopedia of Communication Research Methods (pp. 631–633). SAGE Publications, Inc. https://doi.org/10.1016/j.rcp.2018.08.002
Wilson, A. M., & Likens, G. E. (2015). Content volatility of scientific topics in Wikipedia: A cautionary tale. PLoS ONE, 10(8), 10–14. https://doi.org/10.1371/journal.pone.0134454