Module 03

Visualizations

Introduction

  • Before conducting data analysis, it is necessary to inspect the data for errors. For example, a data-entry error (such as a response of "5" on a Likert scale mis-entered as "55") will distort the results of the analysis, which is mainly based on composite scores of scales. However, checking the raw data cell by cell can be tedious when the dataset is large (e.g., national data).

  • Data visualisation is good practice for preliminary data inspection, as it provides information about

    • the general tendency (such as trends and patterns)

    • outliers

  • After pre-processing the data (covered in Module 01 and Module 02), we have obtained more meaningful measurements (e.g., the openness subscale of personality) for each participant. However, with tens or even hundreds of participants, we need a concise way to get an overview of a measure across participants. In this module, we will focus on how to visualise our data using jamovi.
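To see why a single mis-entered response matters, here is a minimal Python sketch (outside jamovi) of the "5" entered as "55" error mentioned above. The item responses, the 1-5 scale limits, and the helper names `composite` and `out_of_range` are all assumptions made up for illustration; they are not taken from the PROCRAST data.

```python
# Hypothetical item responses of one participant on a 5-item subscale,
# each item answered on a 1-5 Likert scale. The fourth value, 55, is a
# data-entry error for 5.
entered = [4, 5, 3, 55, 4]
intended = [4, 5, 3, 5, 4]

SCALE_MIN, SCALE_MAX = 1, 5  # assumed valid range of the Likert scale


def composite(items):
    """Composite score computed as the mean of the item responses."""
    return sum(items) / len(items)


def out_of_range(items, low=SCALE_MIN, high=SCALE_MAX):
    """Return (position, value) pairs for responses outside the valid range."""
    return [(i, v) for i, v in enumerate(items) if not low <= v <= high]


print(composite(entered))     # 14.2, impossible on a 1-5 scale
print(composite(intended))    # 4.2
print(out_of_range(entered))  # [(3, 55)]
```

A simple range check like this catches the error, but it has to be repeated for every variable, which is one reason a quick plot is often the more convenient first look.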

Recap of levels of measurement

    • Please refer to the codebook of the PROCRAST questionnaire for the level of measurement of each variable. Here are some examples:

      • Nominal - Gender, Hostel

      • Ordinal - Likert Scale

      • Scale - GPA, Hours of sleep

    • Different visualisation methods are recommended for each level of measurement. This module only provides examples of visualising a single variable. Visualising more than one variable, such as two scale variables using a scatterplot, will be covered in a later module.

1. Nominal data visualization

As nominal data (also commonly known as categorical data) do not provide any quantitative value, they can neither be ordered nor measured. Therefore, when we look at the descriptives of nominal data, we usually look at the categories of the variable and the count of observations in each category, along with the total count.

Given this information, we can select an appropriate type of graph to visualize the result. Bar charts and pie charts are some common graphical representations for nominal data. Here, we will look at how to generate a bar chart for a nominal variable.

For example, suppose we want to present the following data in graphs: which faculty students belong to (the variable “Faculty”), which hostel they live in (the variable “Hostel”) and what relationship status they're in (the variable “RelStatus”).

Q: How do we visualize the results of students’ faculty, hostel and relationship status?

A: We use “Descriptives” under “Exploration” in jamovi.


Example 3.1 Descriptive_visualize_nomial.mp4
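For readers who want to see what a bar chart of a nominal variable summarises, the following is a minimal Python sketch of the underlying frequency count. It is not how jamovi works internally, and the "Faculty" values are invented for illustration.

```python
from collections import Counter

# Hypothetical values of the nominal variable "Faculty" (invented for illustration).
faculty = ["Arts", "Science", "Arts", "Business", "Science", "Arts"]

# Frequency of each category: this is what the height of each bar represents.
counts = Counter(faculty)

# Print one text "bar" per category, most frequent first.
for category, n in counts.most_common():
    print(f"{category:<10}{'#' * n} ({n})")
```

Each bar is simply the count of one category, so the bars can be placed in any order without changing their meaning, which reflects the fact that nominal categories have no inherent order.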

2. Non-nominal data visualization

Non-nominal data (ordinal, interval, and ratio data), in contrast, provide quantitative values that can be ordered and measured. So, in addition to frequencies (described above), N, and Missing, we can also look at the following information in the descriptives:

      • Mean

      • Standard deviation

      • Median

      • Range
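As a rough illustration of what these descriptives mean, the sketch below computes them in Python for a made-up set of sleep hours. The data are hypothetical, `None` is used here to stand for a missing response, and the standard deviation is the sample version (n - 1 in the denominator), which is assumed to be the version reported in the descriptives table.

```python
import statistics

# Hypothetical hours of sleep; None marks a missing response.
sleep = [7.0, 6.5, 8.0, None, 5.5, 7.5, 6.0, None, 8.5]

# Keep only the observed (non-missing) values for the calculations.
observed = [x for x in sleep if x is not None]

n = len(observed)                     # N: number of valid responses
missing = len(sleep) - n              # Missing: number of missing responses
mean = statistics.mean(observed)
sd = statistics.stdev(observed)       # sample standard deviation (n - 1)
median = statistics.median(observed)
value_range = max(observed) - min(observed)

print(f"N={n}, Missing={missing}")
print(f"Mean={mean}, SD={sd:.2f}, Median={median}, Range={value_range}")
```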

With this information, we can also select some appropriate graphs to visualize the result. Histograms, density plots, box and whisker plots, and line graphs are some common graphical representations for non-nominal data. Here, we will look at how to generate a histogram and a box and whisker plot for a numerical variable.

For example, suppose we want to present the “Sleep” and “GPA” data of students in graphs instead of as numeric values. To do so, we need to visualize them.

Q: How do we visualize the results of students’ sleep and GPA?

A: We use “Descriptives” under “Exploration” in jamovi.


[Typo: In the Box Plot labels shown during 0:41 - 0:45 of the following video, the labels for "25th percentile" and "75th percentile" should be swapped, i.e., the correct order of the labels (from top to bottom) should be: Max. > 75th percentile > Median (50th percentile) > 25th percentile > Min.]

Example 3.2 Descriptive_visualize_scale.mp4
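The five values that a box and whisker plot is built from can also be computed by hand. The following Python sketch lists them from top to bottom for some invented GPA values. Note that programs define quartiles in slightly different ways; the "inclusive" method used here is only one common choice and may not match jamovi's output exactly.

```python
import statistics

# Hypothetical GPA values (invented for illustration).
gpa = [2.1, 2.8, 3.0, 3.2, 3.3, 3.5, 3.6, 3.9, 4.0]

# Cut the data into four equal parts; the three cut points are the
# 25th percentile, the median (50th percentile) and the 75th percentile.
q1, median, q3 = statistics.quantiles(gpa, n=4, method="inclusive")

# Listed in the order they appear on a vertical box plot, from top to bottom.
summary = [
    ("Max.", max(gpa)),
    ("75th percentile", q3),
    ("Median (50th percentile)", median),
    ("25th percentile", q1),
    ("Min.", min(gpa)),
]

for label, value in summary:
    print(f"{label:<26}{value}")
```

The box spans the 25th to the 75th percentile, the line inside the box marks the median, and the whiskers reach Max. and Min. when the data contain no outliers.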

3. Outliers

Outliers are data points that are very different from the rest of the data. There are many possible reasons for outliers to exist, e.g., there was some error during data collection, data entry, or data pre-processing, or it could be a "real outlier" among all observations (e.g., the exam score of a student who had been absent for the entire course including the exam). Regardless of the reason for its existence, an outlier could have a huge influence on our interpretation and analysis of the data (e.g., the mean score of the whole class could be dragged down by the absent student). Therefore, before conducting any data analysis, it is important to identify outliers.

One of the advantages of visualizing data is that it helps identify outliers (if any) in a sample.

For example, we can visualize the perceived intelligence of students with a box plot and observe whether there are any outliers. There could be a few students who viewed themselves as super intelligent (e.g., giving themselves a score of 100), or super unintelligent (e.g., giving themselves a score of 0).

Example 3.3 Outlier.mp4
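A widely used convention for marking outliers on a box plot is the 1.5 × IQR rule: a point is flagged if it lies more than 1.5 times the interquartile range below the 25th percentile or above the 75th percentile. The Python sketch below applies this rule to invented perceived-intelligence scores. It assumes this convention and the "inclusive" quartile method; jamovi's exact calculation may differ slightly.

```python
import statistics

# Hypothetical perceived-intelligence scores on a 0-100 scale.
scores = [65, 70, 0, 62, 75, 68, 100, 60, 72]

# 25th and 75th percentiles (the middle cut point, the median, is not needed).
q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range: the height of the box

# Fences of the 1.5 x IQR rule; points beyond them are flagged as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [s for s in scores if s < lower_fence or s > upper_fence]

print(lower_fence, upper_fence)  # 47.0 87.0
print(outliers)                  # [0, 100]
```

Flagging a point this way does not tell us why it is an outlier. Whether a score of 0 or 100 is a data-entry error or a real response still has to be checked against the raw data.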

Module Exercise

Complete the exercise!

    • Now, if you think you're ready for the exercise, you can check your email for the link.

    • Remember to submit your answers before the deadline in order to earn the credits!