By Jason Curtis Droboth
November 3, 2020
A short paper for COMS 615: Research Methods
This is an exploratory paper written in the initial stages of research design. It gave me a chance to briefly sketch the questions I would like to explore, the methods that might be most fruitful, and their limitations.
Reminder: It's always essential to start with a good research question first, then choose the methods, never the other way around.
Where do people get their scientific information from and how do they make sense of it? Much recent Science Communication research considers this question by examining internet media channels, mostly online news media and social media (Gregory, 2014). However, there’s one media platform that seems grossly overlooked by Science Communication research: Wikipedia. In fact, when searching through the three “quasi-flagship” Science Communication journals[1], very few articles (an average of twelve per journal) even mention Wikipedia, with only around half of those considering Wikipedia in even the most basic of detail, nearly all of which focus on the dynamics of content production. But what about the consumption of scientific information on Wikipedia?
What role does Wikipedia play in the public dissemination and explanation of scientific information? From this question, I believe three important questions result:
How are science and scientific information presented on Wikipedia?
How do people make sense of scientific information on Wikipedia?
What drives people to Wikipedia as a source of scientific information?
These three questions are very broad, to be sure. So, to begin exploring them, I propose to focus each question on a single sample: the Wikipedia article entitled “Chicxulub crater” (2020)[2]. There are a few reasons for choosing this article. First, Dr. Alan Hildebrand, the scientist who identified this impact crater as the one associated with the asteroid which killed the dinosaurs, is a professor at the University of Calgary. Second, it’s directly related to the dinosaur extinction, which is one of the most well-known and romanticized events from ancient history. Third, the article is of a length that most readers could reasonably read through in one sitting. Fourth, it’s representative of the type of article most overlooked by research: relatively stabilized articles on generally uncontroversial scientific topics, unlike, for example, the obsessively researched and highly political topic of climate change (Metcalfe, 2019). Finally, for the purpose of this study, it’s assumed that any Wikipedia article on a scientific topic is generally representative of many of the dominant dynamics present in most Wikipedia articles on scientific topics.
What follows are three radically different methods which might best shed light on each question. It’s suggested that the first question is best explored through a thorough Content Analysis. The second question can benefit from gathering data through Unstructured Participant Interviews, leading to Grounded Theories. The final question requires a Mixed-methods approach, analyzing a combination of Qualitative and Quantitative existing open-access metadata and data collected from Surveys.
This is a question primarily of content and form, asking what the “Chicxulub crater” (2020) article says and how it says it. Here I would like to interrogate the characteristics which may most contribute to the article’s presentation of themes of coherence and authority, and uncover other themes, through Thematic Analysis. Most of the content in the article is in textual form, broken into various sections and subsections, supplemented with a few images and GIFs. For example, a study merely of the text and form of the first sentence of the article yields much information.
The Chicxulub crater (/ˈtʃiːkʃʊluːb/; Mayan: [tʃʼikʃuluɓ]) is an impact crater buried underneath the Yucatán Peninsula in Mexico.[4]
Coding for an analysis of the syntax of this sentence alone might begin to highlight general structure and meaning. Or, coding for an examination of the use of references and hyperlinks might begin to trace the ways in which the article presents itself as authoritative and trustworthy. Initial themes will be most obviously signaled by the headings and subheadings. However, a Careful Reading, over multiple readings, will help to develop themes and their functions that might otherwise go unnoticed (Hawkins, 2018). For example, since the authors of the article are uncredited, how does the article present anonymity and objectivity?
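As a rough illustration of what coding for references might look like in practice, the sketch below counts numeric footnote markers in the quoted first sentence. The coding rule (a bracketed number counts as one reference) and the function names are assumptions made for illustration only; a real codebook would need to be developed from the article itself.

```python
import re

# First sentence of the article, as quoted above.
sentence = (
    "The Chicxulub crater (/ˈtʃiːkʃʊluːb/; Mayan: [tʃʼikʃuluɓ]) is an impact "
    "crater buried underneath the Yucatán Peninsula in Mexico.[4]"
)


def count_reference_markers(text: str) -> int:
    """Count numeric footnote markers such as [4].

    Hypothetical coding rule: only brackets containing digits count,
    so the bracketed Mayan pronunciation is not treated as a reference.
    """
    return len(re.findall(r"\[\d+\]", text))


def references_per_100_words(text: str) -> float:
    """Reference density, using whitespace-separated tokens as 'words'."""
    words = text.split()
    if not words:
        return 0.0
    return 100 * count_reference_markers(text) / len(words)


print(count_reference_markers(sentence))
print(references_per_100_words(sentence))
```

Applied section by section, a count like this could only hint at where the article leans most heavily on citation; what those citations do for the reader would still need the careful reading described above.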
There are two main potential pitfalls with this method as it stands. The first is the choice to use a single Wikipedia article. If, during the early stages of theme development, it seems that there is not enough content or cultural context with which to build, recalibrate, and analyze meaningful themes, a broader source of data should be considered. This will likely mean an expansion to more articles or past edits of the same.
The second issue is the unresolved choice between an inductive and a deductive approach. Before moving forward, more thought should be put into this. Will it be assumed even before reading the text that coherence and authority are indeed two important themes? If so, this deductive approach should rest on a clear theoretical foundation. If not, then no themes should be assumed before a careful reading of the text; themes should instead be inductively inferred from the data. However, there may still be use for a middle approach. Even if the greatest of effort is made to be rid of any preconceived themes, the researcher will carry an unconscious bias towards certain expected themes. A thorough reflexive interrogation into and description of such biases might help to inform a more conscious and effective research design before data collection.
Research on the public understanding of science is often too quick to limit an analysis of “understanding”. The bulk of the research is obsessed with research methods that test for “comprehension”. To a slightly lesser extent, science communication research is also saturated with methods of opinion polling, in an attempt to understand public “attitudes” towards science, in the hopes that public sentiment and approval for science can remain high. Both these methods hope that communicating science can inform the public as a means of persuasion. This is a superficial exploration of the sense-making process.
To explore in a more holistic way how people make sense of scientific information on Wikipedia, openness to reflexive processes and radical methodological adaptability are key. Research into this question will aim towards Grounded Theories (Valde, 2018). The goal here is to open new lines of inquiry into how people experience Wikipedia. Thus, as much agency as possible should remain with the participant, allowing them to offer new directions at every stage. This will occur through Participant Interviews, in which the participant will elaborate on how they experienced their reading of the “Chicxulub crater” (2020) article. There will be four stages of data collection and analysis.
In stage one, the researcher will initiate a Close Reading of the “Chicxulub crater” article. This is merely to become well acquainted with the article, begin thinking about coding categories, and identify themes and topics of discussion for interviews.
In stage two, insights from stage one will be used to generate a list of loose topics or areas of inquiry to serve as very gentle nudges during Unstructured Interviews (Cramer, 2018). Participants should be recruited randomly, as far as resources permit. Given certain pandemic restrictions, interviews will occur online. A consent form will be distributed before the interview along with a survey to collect various demographic and attitudinal data (this survey protocol should not change throughout the research). The interview will begin with an assurance of confidentiality and the collection of signed consent. Details of compensation, which will depend on specific IRB protocols and suggestions, will be shared before starting, and the researcher will provide an overview of what to expect, including the duration of the interview. To begin, the interviewer will provide the article and ask the participant to read through it, making notes as they read. Once the participant has finished, an open, unstructured, recorded interview will proceed, using some of the nudges outlined in the interview protocol when necessary and remaining mindful of time. Immediately after the interview, the researcher will make personal reflective notes, including things that stood out most. Then the interview will be transcribed, and the data coded and analyzed.
The third stage will follow the same process as stage two, but will integrate the codes and analysis of all previous interviews, allowing them to guide all steps in the interview process, including further recruitment, so that the sample can be adjusted to satisfy standards of representation. This process will continue until data saturation is reached.
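The stopping rule “until data saturation is reached” can be made more concrete. One possible operationalization, offered here only as an assumption and not as a standard from the grounded theory literature, is to declare saturation once a set number of consecutive interviews produce no codes that have not already appeared. The function name, the `patience` parameter, and the example codes below are all hypothetical.

```python
def interviews_until_saturation(codes_per_interview, patience=2):
    """Return the 1-based interview number at which saturation is declared.

    Saturation is assumed to mean: `patience` consecutive interviews
    yielded no previously unseen codes. Returns None if that never happens.
    """
    seen = set()
    quiet_streak = 0  # consecutive interviews with no new codes
    for number, codes in enumerate(codes_per_interview, start=1):
        new_codes = set(codes) - seen
        seen |= new_codes
        quiet_streak = 0 if new_codes else quiet_streak + 1
        if quiet_streak >= patience:
            return number
    return None


# Made-up codes from four interviews, for illustration only.
coded_interviews = [
    {"authority", "jargon"},
    {"authority", "images"},
    {"jargon"},
    {"images", "authority"},
]

print(interviews_until_saturation(coded_interviews))
```

A rule like this is deliberately crude. Because codes are merged and renamed as analysis proceeds, the researcher's judgment would still decide whether a “new” code is genuinely new.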
In the fourth and final stage, once data saturation is reached, final synthesis and analysis can begin. Since methods may have changed at every stage with each participant, the method of synthesis is impossible to lay out here. However, through repeated iterations of unitizing and coding, many of the codes and directions will have already been developed by this point. All participants should remain involved in conversations on the progress of the research, the nature of the results, and the avenues for public release of results.
A mixed-methods approach can help to uncover the societal and personal causes which motivate people to seek out scientific information on Wikipedia. Assuming that users must actively seek to engage with Wikipedia, a Uses and Gratifications theoretical foundation might best help to describe these primary driving factors, especially within the socio-cognitive model outlined by LaRose et al. (2001), which distinguishes between gratifications sought and gratifications obtained. An inquiry into this question should involve the following methods: 1) analysis of Existing Open-access Metadata, and 2) Survey Research.
While it would be most helpful to do a referral path and click path analysis, access to the necessary data is difficult (Gyllstrom & Moens, 2012). Though Wikipedia makes much user data freely and openly available, path data is not included; pageview data, however, is. Significant surges in view counts over the lifetime of the “Chicxulub crater” (2020) article, compiled and available on sites like https://www.wikishark.com/, might correlate with other notable media events, providing a potential source from which to infer possible causes. For example, there may be time correlations between surges in pageviews of the article, Google searches related to the Chicxulub crater, and news coverage of the topic, suggesting a link between user searches and news media coverage.
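A minimal sketch of how surges and time correlations might be examined is given below. It assumes daily pageview counts and daily counts of news stories have already been collected into two equal-length lists; the numbers are invented for illustration, and the surge rule (a day more than three standard deviations above the preceding week) is only one of many possible definitions.

```python
from statistics import mean, pstdev


def find_surges(views, window=7, threshold=3.0):
    """Return indices of days whose views sit far above the preceding window.

    A day is flagged when its z-score against the previous `window` days
    exceeds `threshold` (an assumed, adjustable definition of a surge).
    """
    surges = []
    for i in range(window, len(views)):
        baseline = views[i - window:i]
        mu, sigma = mean(baseline), pstdev(baseline)
        if sigma > 0 and (views[i] - mu) / sigma > threshold:
            surges.append(i)
    return surges


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Invented daily counts, for illustration only.
pageviews = [100, 104, 98, 101, 99, 103, 100, 400, 250, 120]
news_stories = [1, 0, 1, 1, 0, 1, 1, 9, 5, 2]

print(find_surges(pageviews))
print(round(pearson(pageviews, news_stories), 2))
```

Even a strong correlation here would only suggest that pageviews and coverage move together. It would not show which drives the other, which is why the survey data discussed next is needed.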
Self-reported data, via surveys, specifically questionnaires, could greatly enhance the coarse digital data by describing what factors drive an individual to access Wikipedia for scientific content. However, using a single article presents a major issue with this method. How could the researcher discover who has accessed or read the “Chicxulub crater” article, and how could they get in contact with them? This means that a single-article approach will simply not work. Thus, in this case, the starting point will have to change from the single article to participant responses in a broader platform-level inquiry. Random surveys should group participants into general standard demographics, seek to quantify their stated interest in science, and then attempt to understand behaviour by asking which scientific Wikipedia articles they might have accessed, why they did so, which “needs” it did or did not “gratify”, what situations they believe would lead them to access scientific content on Wikipedia again, and other questions interrogating behaviour and motives. This survey should be distributed in progressive iterations to test general success, interrogate methods and ethics of demographic categorization, and discover opportunities for other data and lines of questioning before being distributed en masse.
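To show how responses about “needs” might be summarized, the sketch below compares gratifications sought with gratifications obtained for each stated need. The response format (a 1 to 5 rating for each), the need labels, and the data are all hypothetical; an actual questionnaire would define its own items and scales.

```python
from collections import defaultdict
from statistics import mean

# Invented survey rows: each rates, on an assumed 1-5 scale, how strongly a
# need was sought and how well Wikipedia was felt to have met it.
responses = [
    {"need": "quick fact check", "sought": 5, "obtained": 4},
    {"need": "quick fact check", "sought": 4, "obtained": 4},
    {"need": "deeper understanding", "sought": 4, "obtained": 2},
    {"need": "deeper understanding", "sought": 5, "obtained": 3},
]


def gratification_gaps(rows):
    """Mean (obtained - sought) per need; negative values mean unmet needs."""
    differences = defaultdict(list)
    for row in rows:
        differences[row["need"]].append(row["obtained"] - row["sought"])
    return {need: mean(values) for need, values in differences.items()}


gaps = gratification_gaps(responses)
for need, gap in gaps.items():
    print(need, gap)
```

Grouping the same gaps by demographic category would be a natural next step, though the categories themselves are among the things the iterative survey design is meant to interrogate.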
Given the nature of the data, it’s likely that any inferred conclusions will offer only very general, superficial insights, depending for example on the number and categorization of participants, depth of responses, and methods of coding. However, when the open-access metadata and the collected survey data are combined, the large societal trends and individual habits might work to mutually contextualize each other. At a minimum, these insights could provide inspiration for new lines of inquiry and methods via in-depth interviews, controlled experiments, and robust referral and click path data.
This overview has broadly explored a few ways in which the very general question, “What role does Wikipedia play in the public dissemination and explanation of scientific information?” can be examined. This can occur through an inquiry into any of the three questions, by limiting the sample to a single text, and by utilizing a combination of Textual Analysis, Surveys, Participant Interviews, and Existing Open-access Metadata. Any of these approaches can offer new insights and spark new modes of inquiry into the main question of interest.
To be sure, the use of a single article, the broad scope of the questions, and the openness of the research methods do present obstacles. If it appears that other articles are needed in order to provide better data, then one or more articles should be added to the sample. An article on another topic might help to infer better generalizability across Wikipedia, whereas using the same article but different versions over time might provide longitudinal insights into historical trends. If necessary, the questions should be more focused; however, this must happen very early on. A decision, informed by deeper research, also needs to be made early on about the extent to which the methods will be adaptable or structured, inductive or deductive, theory-driving or theory-driven.
Chicxulub crater. (2020, October 25). In Wikipedia. https://en.wikipedia.org/wiki/Chicxulub_crater
Cramer, E. (2018). Interviews for Data Gathering. In M. Allen (Ed.), The SAGE Encyclopedia of Communication Research Methods (pp. 804–807). SAGE Publications Ltd. https://doi.org/10.4135/9781483381411.n276
Gyllstrom, K., & Moens, M. F. (2012). Surfin’ Wikipedia: An analysis of the Wikipedia (non-random) surfer’s behavior from aggregate access data. IIiX 2012 - Proceedings 4th Information Interaction in Context Symposium: Behaviors, Interactions, Interfaces, Systems, 155–163. https://doi.org/10.1145/2362724.2362752
Hawkins, J. M. (2018). Thematic Analysis. In M. Allen (Ed.), The SAGE Encyclopedia of Communication Research Methods (pp. 1757–1760). SAGE Publications, Inc. https://doi.org/10.4135/9781483381411.n624
LaRose, R., Mastro, D., & Eastin, M. S. (2001). Understanding Internet Usage. Social Science Computer Review, 19(4), 395–413. https://doi.org/10.1177/089443930101900401
Metcalfe, J. E. (2019). Rethinking science communication models in practice. The Australian National University.
Valde, K. S. (2018). Grounded theory. In M. Allen (Ed.), The SAGE Encyclopedia of Communication Research Methods (pp. 631–633). SAGE Publications, Inc. https://doi.org/10.1016/j.rcp.2018.08.002