In this class, we will discuss the kinds of biases that might be involved in our data collection, data visualization, and data interpretation processes. These include issues related to how human attentional, perceptual, and cognitive processing systems work, and the ways in which these might differ across groups. This page also includes supplemental resources for the cognitive biases activity we will do in class, as well as "big picture" dashboard guidelines and a list of well-known books on data visualization.
Preattentive attributes are basic features of a visualization that your brain can perceive without more complicated processing (e.g., linguistic or other kinds of symbolic processing, in which the brain interprets images using other kinds of mental representations as mediators). Below is a summary of these attributes, as put forward in Nussbaumer Knaflic's book. You can also watch her talk at Google.
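To see what this means in practice, here is a minimal sketch of the classic color "pop-out" effect, written in Python with matplotlib (one illustrative toolset, not one the course requires):

```python
# One differently colored point among gray distractors "pops out" before any
# deliberate search begins: color is processed preattentively.
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
x, y = rng.random(50), rng.random(50)

fig, ax = plt.subplots()
ax.scatter(x, y, color="gray")               # distractors: uniform gray
ax.scatter(x[0], y[0], color="red", s=80)    # target: the hue difference is perceived instantly
ax.set_title("Color pop-out: a preattentive attribute")
plt.show()
```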
Many parts of the human perceptual system respond logarithmically, rather than linearly, to changes in stimulus intensity.
You will be using Oxford's Catalogue of Cognitive Biases for at least one exercise in this class, but you may also want to consult other resources, including these:
YouTube Series on Cognitive Biases: https://www.youtube.com/playlist?list=PL0oKoJ8t-87x9EiDevqWe8rMDIYc10nvq
Impact of Cognitive Bias on User Interaction Sequences in Visualizations: https://www.youtube.com/watch?v=7HDnnmhWhvA
Please also be aware of these important concepts:
Anchoring Effect: Anchoring effects occur when a person uses one set of data (or other context clues) as a reference point for interpreting others. For example, if a student sees data showing they are performing at 80% on a particular task, they might interpret that information differently depending on whether the data shows the rest of the class performing at 95%, 80%, or 30%.
Distinction Effect: Occurs when someone is presented with two stimuli simultaneously, which can inflate their perception of how large the differences between those stimuli are (compared to situations where they encounter each stimulus separately).
Dunning-Kruger Effect: Describes a known pattern in which under-qualified individuals overestimate their own ability and knowledge (because they don't understand how difficult/large the knowledge space is), while experts tend to underestimate their own ability (because they are aware of how large the knowledge space is).
Fechner's Law and Weber's Law: These are laws of human perception, based on Ernst Weber's observations and formalized by Gustav Fechner in 1860. They describe a phenomenon in human perception where differences between two stimuli become noticeable in a logarithmic fashion: the stronger the baseline stimulus, the larger a change must be before it is noticed.
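As a toy illustration (the numbers below are made up, not empirical constants), a fixed Weber fraction makes the just-noticeable difference (JND) grow in proportion to the baseline stimulus, which is what produces a logarithmic sensation scale:

```python
# With a fixed Weber fraction, the JND grows in proportion to the baseline
# stimulus, so perceived intensity grows only logarithmically.
import math

k = 0.1    # hypothetical Weber fraction: delta_I / I = k
I0 = 1.0   # hypothetical detection threshold

for I in (1, 10, 100, 1000):
    jnd = k * I                    # Weber's law: a stronger stimulus needs a bigger change
    sensation = math.log(I / I0)   # Fechner's law: S = c * ln(I / I0), with c = 1 here
    print(f"intensity={I:>4}  JND={jnd:>5.1f}  perceived={sensation:.2f}")
```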
Goodhart's Law: Named after economist Charles Goodhart, this law is related to the observer effect. It refers to the way people optimize for the metric on which they are being evaluated. For example, if students figured out that they were being rewarded for spending more time reading a page in their learning software, and that we were operationalizing "reading time" as how long the page was open, they might leave pages open longer even though they were not reading them.
Golem Effect: This is the opposite of the Rosenthal/Pygmalion effect. It occurs when teachers' low expectations induce poor performance by their students.
Halo Effect: Halo effects occur when a positive evaluation of a person in one context/domain is applied to them in another domain. For example, a teacher might expect a student who is very good at math to also be good at reading or science.
Hawthorne Effect: This is also known as the observer effect. It occurs when research subjects change their behavior because they know they are being observed. This effect may or may not be useful to us in designing dashboards, depending on the students' reactions.
Matthew Effect: The Matthew effect was first identified by sociologist Robert Merton in 1968 (see the citation in the reading list on your syllabus), and is used to describe how inequalities perpetuate themselves. It is sometimes summarized by the phrase "the rich get richer and the poor get poorer," but it can also apply to learning measurements.
Model Drift: Model drift occurs when the concept you are modeling, or the data you are using to model it, changes. For example, temperatures shift seasonally; if you only included winter temperatures in your original model of "normal temperatures," it wouldn't take long for your model to become inaccurate. Likewise, you might see drift if your upstream data changes: if your meteorologists started measuring temperature in Celsius instead of Fahrenheit, it would affect your modeling.
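A deliberately naive drift check, sketched in Python with fabricated temperature data (the 3-sigma rule here is an arbitrary choice for illustration), might compare recent observations against the training period:

```python
# Flag drift when the mean of recent observations moves far from the
# training-period mean.
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(loc=30, scale=5, size=500)    # "normal temperatures" learned in winter (deg F)
recent = rng.normal(loc=75, scale=5, size=500)   # summer arrives: the concept has drifted

# Hypothetical rule: more than 3 training standard deviations counts as drift.
shift = abs(recent.mean() - train.mean()) / train.std()
if shift > 3:
    print(f"Possible drift: recent mean is {shift:.1f} sigma from the training mean")
```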
Overfitting & Underfitting: Overfitting occurs when your model describes the data it was trained on very well, but it does not generalize to new contexts/populations. Underfitting occurs when your model does not describe your data very well. It is also unlikely to transfer to new contexts/populations.
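The sketch below (synthetic data, illustrative polynomial degrees) shows both failure modes at once: a degree that is too low underfits, while a degree that is too high fits the training noise and generalizes poorly.

```python
# Fit polynomials of increasing degree to noisy sine data and compare the
# error on the training set vs. a held-out set.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

x_train, y_train = x[::2], y[::2]    # even indices: training set
x_test, y_test = x[1::2], y[1::2]    # odd indices: held-out set

for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    # degree 1 underfits (both errors high); degree 9 overfits (train error
    # low, test error higher); degree 3 strikes a reasonable balance.
    print(f"degree={degree}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```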
Pygmalion/Rosenthal Effect: This effect was popularized by Rosenthal and Jacobson's (1968) book Pygmalion in the Classroom, which claims that raising teacher expectations raises student performance. It is the opposite of the Golem effect.
Serial Position Effect: When a person is given a list, they are usually best able to recall the first and last items, while the middle items are more difficult to remember.
Simpson's Paradox: This is a statistical phenomenon in which combining groups washes out, or even reverses, trends that are present in subpopulations. (You might check TowardsDataScience's explanation and animation of this process.)
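Here is a minimal sketch with fabricated toy numbers: each class section's average score rises between semesters, yet the combined average falls because enrollment shifts toward the lower-scoring section.

```python
# Simpson's paradox: subgroup trends reverse when groups are combined.
import pandas as pd

df = pd.DataFrame({
    "semester":   ["fall", "fall", "spring", "spring"],
    "section":    ["A", "B", "A", "B"],
    "mean_score": [90, 60, 92, 62],    # both sections improve in spring
    "n_students": [80, 20, 20, 80],    # but the enrollment mix flips
})

# Enrollment-weighted overall average for each semester
totals = df.assign(points=df["mean_score"] * df["n_students"]) \
           .groupby("semester")[["points", "n_students"]].sum()
print(totals["points"] / totals["n_students"])  # fall: 84.0, spring: 68.0
```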
Survivorship Bias: Occurs when you only examine the subjects that are performing relatively well. A classic example is the analysis of WWII warplanes, in which the US military added extra armor where they found bullet holes on planes returning from battle. They were initially surprised that this did not improve survival rates, until someone pointed out that the planes that were shot down had likely been hit in different locations.
Focus on the most important metrics/Tailor these to your audience: Choosing which metrics are most important can be challenging, but presenting your audience with too much information is not effective either. Focus on metrics that meet the needs of the audience. This may include considering which metrics are most actionable, although some audiences/stakeholders may not be able to directly affect your data. To decide which metrics are most important, start by making a short list of the benefits/risks involved if your audience does not have (and understand) this data.
Choose the right type of graph: Different types of data require different types of graphs. Here are a few examples, but we will cover more over the course of the semester. Bar graphs are good for comparing across categories, while line graphs and area graphs are useful for showing trends across time. Scatter plots are good for exploring the relationship between two variables, but may require additional information if there is a third variable mediating that relationship. Pie charts are popular for showing part/whole relationships, but can distort information, especially if too many categories are present. Heat maps are useful for understanding geographic distribution, while histograms show the distribution of frequencies across a range of values. You might also check out one of the graph selection ("pick a chart") tools that we have provided for you.
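As a small illustration (hypothetical scores; matplotlib is assumed as the toolset), here are two of these pairings side by side:

```python
# Pairing data shape with chart type: categories get a bar graph; change
# over time gets a line graph.
import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

# Comparison across categories -> bar graph
ax1.bar(["Unit 1", "Unit 2", "Unit 3"], [72, 85, 64])
ax1.set_title("Mean quiz score by unit")

# Trend across time -> line graph
ax2.plot(["Wk 1", "Wk 2", "Wk 3", "Wk 4"], [55, 61, 70, 74], marker="o")
ax2.set_title("Class average over the term")

plt.tight_layout()
plt.show()
```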
Provide context and comparisons: Your audience needs labels, titles, and descriptions, but you should also consider how anchoring effects (see above) can be used to improve your audience's understanding of the meaning and significance of your data.
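For instance, here is a minimal sketch (invented numbers) of using a class-average reference line as a deliberate anchor:

```python
# A dashed class-average line gives the viewer an anchor for judging one
# student's weekly scores.
import matplotlib.pyplot as plt

weeks = [1, 2, 3, 4]
student_scores = [62, 70, 74, 81]
class_avg = 75   # hypothetical comparison point

fig, ax = plt.subplots()
ax.plot(weeks, student_scores, marker="o", label="Student score")
ax.axhline(class_avg, linestyle="--", color="gray", label="Class average")
ax.set_xlabel("Week")
ax.set_ylabel("Quiz score (%)")
ax.legend()
plt.show()
```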
Avoid misrepresenting the data: Use appropriate scales, axes, coloring, etc., and avoid choices that might contribute to distortion. We will cover this issue in more detail over the course of the semester, but also consider some of the concepts related to visualization science and cognitive biases (e.g., Anchoring Effect, Distinction Effect, preattentive attributes, JNDs, Weber's/Fechner's Law, etc.). However, you should also consider the construct labels you have for your data. If your audience does not understand, for example, that you have defined "time reading" as the time in which the student has a particular page open, they may not act in an appropriate fashion when presented with that data.
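A quick sketch of one common distortion (made-up values): the same two bars drawn with and without a zero baseline.

```python
# A truncated y-axis makes a 4-point gap look enormous.
import matplotlib.pyplot as plt

groups = ["Group A", "Group B"]
scores = [78, 82]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

ax1.bar(groups, scores)
ax1.set_ylim(0, 100)        # honest: baseline at zero
ax1.set_title("Zero baseline")

ax2.bar(groups, scores)
ax2.set_ylim(77, 83)        # truncated: the difference is visually exaggerated
ax2.set_title("Truncated axis")

plt.tight_layout()
plt.show()
```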
Make it interactive: Interactive features (e.g., hover-over effects, clickable elements, and drill-down capabilities) allow users to explore the data in more detail, but use them appropriately. They can also allow you to present additional data that stakeholders might need, but would be too messy to include in the initial visualization.
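A minimal sketch of this idea, assuming the plotly library (one of several options) and fabricated data:

```python
# Hovering over a point reveals the student ID, detail that would clutter a
# static chart.
import pandas as pd
import plotly.express as px

df = pd.DataFrame({
    "minutes_on_page": [5, 12, 3, 25, 8],
    "quiz_score":      [60, 78, 55, 90, 70],
    "student":         ["S1", "S2", "S3", "S4", "S5"],   # hypothetical IDs
})

fig = px.scatter(df, x="minutes_on_page", y="quiz_score",
                 hover_data=["student"])   # extra detail appears only on hover
fig.show()
```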
Test and iterate: Test the visualization with your target audience (or, in this class, your peers) and iterate your design accordingly. You should not assume that everyone will approach this data in the same way or will have the same level of data literacy, so it is important to connect with your audience to understand their needs. Look for indications that you need either to declutter (remove info) or to focus (add info), and keep track of what that implies about your audience's data literacy as you move on to your next design.
You might also consider Stephen Few's guidelines for balancing information vs. aesthetics in your visualizations.
This class will focus on research articles, drawing largely from the most recent work in learning analytics. But we are in an emerging and interdisciplinary field. Some important works you might want to know about are included in this non-exhaustive list.
Agresti, A. (2012). Categorical data analysis (Vol. 792). John Wiley & Sons.
Bertin, J. (1983). Semiology of graphics. University of Wisconsin Press.
Blasius, J., & Greenacre, M. (1998). Visualization of categorical data. Academic Press.
Cairo, A. (2019). How charts lie: Getting smarter about visual information. WW Norton & Company.
Evergreen, S. D. (2019). Effective data visualization: The right chart for the right data. SAGE Publications.
Evergreen, S. D. H. (2018). Presenting data effectively: Communicating your findings for maximum impact (2nd ed.). Thousand Oaks, CA: Sage.
Few, S. (2007). Data visualization: Past, present, and future. IBM Cognos Innovation Center, 1-12.
Few, S. (2009). Now you see it: Simple visualization techniques for quantitative analysis. Oakland, CA: Analytics Press.
Huff, D. (2023). How to lie with statistics. Penguin UK.
Jones, B. (2014). Communicating data with Tableau: Designing, developing, and delivering data visualizations. Sebastopol, CA: O'Reilly Media.
Knaflic, C. N. (2015). Storytelling with data: A data visualization guide for business professionals. Hoboken, NJ: John Wiley & Sons.
Kress, G., & Van Leeuwen, T. (2020). Reading images: The grammar of visual design. Routledge.
Riche, N. H., Hurter, C., Diakopoulos, N., & Carpendale, S. (Eds.). (2018). Data-driven storytelling. CRC Press.
Sahin, M., & Ifenthaler, D. (2021). Visualizations and dashboards for learning analytics. Springer.
Schwabish, J., Popkin, S. J., & Feng, A. (2022). Do no harm guide: Centering accessibility in data visualization. Urban Institute.
Sleeper, R. (2018). Practical Tableau: 100 tips, tutorials, and strategies from a Tableau Zen Master. O'Reilly Media. [Ebook]
Tufte, E. R. (2001). The visual display of quantitative information (2nd ed.). Cheshire, CT: Graphics Press.
Tufte, E. R. (2006). Beautiful evidence. Cheshire, CT: Graphics Press.
Tufte, E. R. (1997). Visual explanations: Images and quantities, evidence and narrative. Cheshire, CT: Graphics Press.
Wexler, S. (2021). The big picture: How to use data visualization to make better decisions--faster. McGraw-Hill Education.
Wexler, S., Shaffer, J., & Cotgreave, A. (2017). The big book of dashboards: visualizing your data using real-world business scenarios. John Wiley & Sons.
Yau, N. (2013). Data points: Visualization that means something. Indianapolis, IN: John Wiley & Sons.