Data Literacy is what helps readers understand the charts, graphs, and visual aids often found in academic research. This skill is important as we engage in an increasingly data-filled world. Many of the articles and research studies found within the library's resources and databases will contain data that you will need to translate and understand.
Review the examples and visual aids that explain the process.
Answer the self-check questions at the end.
Go to the final step in this guide, the Quiz.
Why is Data Literacy important?
Reading and understanding the math and data visualization tools, such as the charts and graphs often found in academic research and on the internet, is a vital step in making research-informed decisions in your academic work as well as data-informed choices in your life.
How do I read and understand data?
Comprehending the data presented in academic research involves reading, analyzing, and interpreting the research behind the numbers. Data visualization tools can be helpful, but sometimes they are an obstacle to drawing correct conclusions from research studies. When you see a chart or a graph, don't take it at face value. Stop, read the numbers behind the visual, and consider how the data is presented, what the sample size of the study is, and what argument the study authors might be trying to make with the data they chose to include or leave out.
What are some definitions to remember in Data Literacy?
Each of these terms describes a common pitfall that researchers can fall into when they do not slow down to read and understand how studies are structured and how the research is presented.
Sample Size: This term refers to the number of individuals, individual samples measured, or observations used in a survey or experiment. On rare occasions researchers have access to only a small dataset and have to work with what they have.
Misleading Data Visualization: This term encompasses many different techniques that misrepresent trends in data, including implying that correlation equals causation, omitting the baseline, manipulating the x or y axis, using the wrong chart type, and omitting or cherry-picking data.
Statistically Significant: A result is statistically significant when it is unlikely to have occurred by chance alone. If a study doesn't report whether its findings were statistically significant, treat the results with caution, since they may be inconclusive or difficult to reproduce.
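To make the idea of statistical significance more concrete, here is a minimal Python sketch. The scenario is hypothetical and not taken from any study in this guide: a coin is flipped 10 times and lands heads 9 times, and we ask how likely a result at least that lopsided would be if the coin were fair. The function name `two_sided_binomial_p` and the 0.05 cutoff are assumptions for illustration; 0.05 is a common convention, not a universal rule.

```python
import math

def two_sided_binomial_p(successes, trials):
    """Exact two-sided p-value for a fair 50/50 process.

    Adds up the probability of every outcome that is at least as far
    from the expected count (trials / 2) as the observed one.
    """
    expected = trials / 2
    observed_distance = abs(successes - expected)
    total = 0.0
    for k in range(trials + 1):
        if abs(k - expected) >= observed_distance:
            total += math.comb(trials, k) * 0.5 ** trials
    return total

# Hypothetical data: 9 heads in 10 flips.
p_value = two_sided_binomial_p(9, 10)
ALPHA = 0.05  # conventional threshold, assumed for this example

print(f"p-value: {p_value:.4f}")
print("statistically significant" if p_value < ALPHA else "not statistically significant")
```

A small p-value only says that chance alone is an unlikely explanation. It does not say the effect is large, important, or free of bias in how the data was collected.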
Review the examples below, which explain several scenarios and define many terms that are common in analyzing and interpreting data. Then continue on to the "Self-Check" section.
Whenever you are examining research, it is important to read what methods the study used and how it was structured. Consider three factors within sample size: number, composition, and selection. If an instructor creates a questionnaire and interviews ten of their students about how much they each enjoy the class, the instructor could easily skew how they present that data. If 9 out of 10 students state that they enjoy the instructor's class, the instructor could state "90% of surveyed participants love my class" and technically the instructor would not be lying. However, the instructor could have randomly selected the 10 students, chosen only students who had good grades, or chosen only students who were failing and might feel pressure to express enthusiasm for their instructor in exchange for a good grade. The sample size also raises the question of which population the instructor chose from: does the instructor usually teach small classes of fewer than 10 students, or do they usually have more than 50 students in the classroom? If the classes are large, the instructor surveyed only a small portion of the population, so anyone who tried to analyze the data could be misled and the results would not be easily reproduced. This can happen on a large scale in society as well, with many populations left out of research or surveys skewed to include data that supports the desired outcome.
Misleading Sample Size Example
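The "9 out of 10" survey above can be put into numbers with a short Python sketch. It uses the Wilson score interval, one common way to estimate the range a true percentage might fall in, and compares 9 of 10 responses with a hypothetical 900 of 1,000. The function name `wilson_interval`, the larger survey, and the 95% level (z = 1.96) are assumptions for illustration. The calculation also assumes the students were chosen at random, so it shows only the effect of the number surveyed, not of biased selection.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion.

    Assumes a random sample; it cannot correct for biased selection.
    """
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width

small = wilson_interval(9, 10)      # the instructor's survey
large = wilson_interval(900, 1000)  # hypothetical larger survey, same 90%

for label, (low, high) in (("9 of 10", small), ("900 of 1000", large)):
    print(f"{label}: plausible range {low:.0%} to {high:.0%}")
```

Both surveys report "90%", but the range for the ten-student survey is far wider, which is one reason a headline percentage should always be read alongside the sample size.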
In many data visualizations, two charts are presented together in a way that suggests a relationship between two different data sets. However, the relationship between the two data sets could be spurious (false or misleading), despite the appearance of similar spikes and falls. It is a best practice to examine the actual numbers behind the data and to consider external factors such as location, weather, and time before deciding whether there is actually a relationship between the two data sets.
Plaue, M. (2020, February 25). A correlation measure based on Theil-Sen regression. Towards Data Science.
Misleading Correlation Example
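A spurious relationship of this kind can be simulated with a few lines of Python. In the sketch below, two made-up data sets (`series_a` and `series_b`, both hypothetical) are generated independently, but each grows steadily over time. Their raw values appear strongly correlated simply because both rise with time. Comparing the week-to-week changes instead is one simple check, among several possible ones, for whether the two actually move together.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def differences(values):
    """Change from each observation to the next."""
    return [later - earlier for earlier, later in zip(values, values[1:])]

random.seed(0)  # fixed seed so the made-up data is repeatable
weeks = range(100)
# Two unrelated series that both happen to grow over time.
series_a = [1.0 * t + random.gauss(0, 3) for t in weeks]
series_b = [2.0 * t + random.gauss(0, 3) for t in weeks]

raw_r = pearson(series_a, series_b)
change_r = pearson(differences(series_a), differences(series_b))

print(f"correlation of raw values:     {raw_r:.2f}")
print(f"correlation of weekly changes: {change_r:.2f}")
```

The raw correlation comes out close to 1 even though neither series influences the other; the shared time trend is the external factor doing the work.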
The staff of a local restaurant want to convince the owners that they should close on Fridays. They created the chart below to indicate that business is slow that day, so the restaurant's profits will not suffer. At first glance, this chart may appear to prove the staff's point that, since Friday is the least busy day, they should be able to close. However, the chart is misleading because the scale does not start at 0; it begins at 40. If the scale does not begin at zero, there should be a break in the axis to signal the jump to viewers. The graph also uses color to mislead the viewer. Every weekday is represented by a bar in a shade of blue except Friday, calling further attention to Friday's seemingly low number of customers.
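The effect of starting the scale at 40 can be worked out with simple arithmetic. The Python sketch below uses made-up customer counts, since the chart's actual values are not listed in the text; only the baseline of 40 comes from the example. The function name `apparent_ratio` is also hypothetical. It compares how tall Friday's bar looks next to the busiest day's bar when the axis starts at 40 versus when it starts at 0.

```python
def apparent_ratio(value, reference, baseline=0):
    """How tall one bar looks relative to another when the axis starts at `baseline`."""
    return (value - baseline) / (reference - baseline)

# Hypothetical daily customer counts, invented for illustration.
customers = {"Mon": 52, "Tue": 55, "Wed": 54, "Thu": 58, "Fri": 45}

busiest = max(customers.values())
friday = customers["Fri"]

truncated = apparent_ratio(friday, busiest, baseline=40)  # axis starts at 40
honest = apparent_ratio(friday, busiest, baseline=0)      # axis starts at 0

print(f"Axis starting at 40: Friday's bar looks {truncated:.0%} as tall as the busiest day's")
print(f"Axis starting at 0:  Friday's bar is {honest:.0%} as tall as the busiest day's")
```

With these invented numbers, the truncated axis makes Friday look like a small fraction of the busiest day even though the actual customer counts are much closer together.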