Data Preparation

Steps

  1. Gather all the raw data, whether on paper or in electronic format.

  2. Input or import all the data into SPSS. Save the data sets to SPSS data files.

  3. Create a data specification table / coding scheme to list the level of measurement and valid values for each of the variables.

  4. Specify all the required variable information in the SPSS data editor, especially the levels of measurement.

  5. Check for issues in the data and fix all of them. Suggested tasks:

    • Create case numbers if needed.

    • Rename the variables (e.g., using Q1, Q2, Q3, etc.).

    • Recode answers as numbers and add value labels (for nominal and ordinal variables) if necessary. Then change the variable type to numeric.

    • Set the correct levels of measurement.

    • Recode multiple response items.

    • Check all the missing values.

    • Look for invalid values.

    • Check for any other possible issues in the data.
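Several of the tasks above can also be done in SPSS syntax. The following is a minimal sketch, not a complete cleaning script. The variable names (id, VAR00001, VAR00002, Q1, Q2), the value labels, and the missing-value code 99 are all assumptions made for illustration.

```spss
* Hypothetical names and codes throughout: adapt them to your own data.
* Create a case number from the current row order.
compute id = $casenum.
formats id (F8.0).
execute.
* Rename the default variable names to question numbers.
rename variables (VAR00001 VAR00002 = Q1 Q2).
* Label the numeric codes of a nominal and an ordinal variable.
value labels Q1 1 'Male' 2 'Female'
  /Q2 1 'Disagree' 2 'Neutral' 3 'Agree'.
* Set the levels of measurement.
variable level Q1 (nominal) /Q2 (ordinal).
* Treat 99 as a user-missing code for Q2.
missing values Q2 (99).
* List every value that occurs, to spot invalid or missing entries.
frequencies variables=Q1 Q2.
```

The frequency tables at the end are a quick way to see whether any value falls outside the coding scheme created in step 3.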

FAQs

1. How to code multiple response items?

Multiple response items as recorded in Google Forms and some other online survey tools look like "A,B,C", "A,C", "B,C", etc., where the appearance of an option (A, B, or C here) indicates that the option has been selected by the respondent. For example, "A,C" means that the respondent has selected both A and C. This kind of data cannot be analyzed directly and must be recoded first.

You should create one variable for each option and use 1 or 0 (or leave it blank) to denote whether the option is chosen or not. For example, if there are three options (A, B, C) under Q5, you can create the variables Q5.A, Q5.B, Q5.C. If someone answers A and C, then you can put Q5.A=1, Q5.B=0, and Q5.C=1. (For simplicity, you can also ignore the 0 and leave it as missing.)

In practice, you may use a spreadsheet program such as Excel or Google Sheets to help you with the recoding. Alternatively, if you feel comfortable with SPSS syntax, you can use the char.index function in a syntax window to speed up the process. For example, the following commands check whether the characters A, B, and C appear in the variable Q5. If a character is found, the corresponding variable Q5.A, Q5.B, or Q5.C is set to 1. Otherwise, it is set to 0. Note that the search is case-sensitive and matches the character anywhere in the answer, so it works only when each option code is a distinct character.

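* char.index gives the position of the letter in Q5, or 0 if it is absent, so the test >0 yields 1 (chosen) or 0 (not chosen).

compute Q5.A = char.index(Q5,'A')>0.

compute Q5.B = char.index(Q5,'B')>0.

compute Q5.C = char.index(Q5,'C')>0.

execute.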

After the recoding, you can group the variables and analyze the frequency distribution. See: https://www.dummies.com/education/math/statistics/creating-and-using-a-multiple-response-set-in-spss/ for details.
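The grouping and frequency table can also be requested in syntax with the MULT RESPONSE command. This is a minimal sketch that assumes Q5.A, Q5.B, and Q5.C already exist and are coded 1 for "chosen"; the group name Q5set and its label are hypothetical.

```spss
* Assumes Q5.A, Q5.B and Q5.C already exist and are coded 1 = chosen.
* Q5set is a hypothetical name for the multiple response group.
mult response groups=Q5set 'Q5 options chosen' (Q5.A Q5.B Q5.C (1))
  /frequencies=Q5set.
```

The (1) after the variable list tells SPSS which value counts as a selection, so it makes no difference whether unselected options were coded 0 or left missing.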

2. How to code ranked items?

Ranked items can be coded in a similar way to multiple response items, except that you record the rank of each option (e.g. 1, 2, 3, 4…) instead of whether it is selected or not (1, 0). Alternatively, you can use one variable for each rank and record the option in that rank. For example, if the data under Q4 is A, C, B, D then you can create four variables Q4.1, Q4.2, Q4.3, Q4.4 and put A, C, B, D in them respectively.
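Both layouts can be produced in syntax when the raw answers follow a fixed pattern. The sketch below assumes that Q4 is a string such as "A,C,B,D": exactly four single-letter options, separated by commas with no spaces, and every option ranked. If your export uses a different separator or longer option texts, the positions used here will not apply.

```spss
* Assumes Q4 is a string such as A,C,B,D with single letters, commas and no spaces.
* Approach 1: one variable per option, holding the rank of that option.
* With this layout the letters sit at positions 1, 3, 5, 7 and have ranks 1, 2, 3, 4.
compute Q4.A = (char.index(Q4,'A') + 1) / 2.
compute Q4.B = (char.index(Q4,'B') + 1) / 2.
compute Q4.C = (char.index(Q4,'C') + 1) / 2.
compute Q4.D = (char.index(Q4,'D') + 1) / 2.
* Approach 2: one variable per rank, holding the option in that rank.
string Q4.1 Q4.2 Q4.3 Q4.4 (A1).
compute Q4.1 = char.substr(Q4,1,1).
compute Q4.2 = char.substr(Q4,3,1).
compute Q4.3 = char.substr(Q4,5,1).
compute Q4.4 = char.substr(Q4,7,1).
execute.
```

In Approach 1, an option that does not appear in the answer gets the value 0.5, which is not a valid rank, so check the results before analysis.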

3. How to code open-ended items?

Open-ended items can be entered as is. Just remember to use string as variable type, and make sure that the width of the string is long enough to hold your data.
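If a string variable turns out to be too narrow, its width can be changed in syntax as well as in the Variable View. A minimal sketch, assuming a hypothetical open-ended item named Q6 and an assumed width of 500 characters:

```spss
* Q6 is a hypothetical open-ended item and 500 is an assumed width.
alter type Q6 (A500).
```

Widen the variable before entering or pasting long answers. Text that was already cut off when it was first entered is not recovered by widening the variable afterwards.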

4. How to detect and fix abnormalities in the data?

Try using Data -> Validation to quickly check the distribution and range of the data. You can also define rules for SPSS to apply when checking. Also look for suspicious response patterns, e.g., the same option chosen for every item, which probably means that the respondent was not giving their true responses.
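The same two checks can be sketched in syntax. This example assumes ten adjacent numeric items named Q1 to Q10 (hypothetical names); the new variable same_answer is also hypothetical.

```spss
* Assumes Q1 to Q10 are adjacent numeric variables in the data file (hypothetical).
* Show the minimum and maximum only, to reveal out-of-range codes.
frequencies variables=Q1 to Q10
  /format=notable
  /statistics=minimum maximum.
* Flag cases whose answers do not vary at all across the ten items.
compute same_answer = (sd(Q1 to Q10) = 0).
frequencies variables=same_answer.
```

A flagged case is only a candidate for closer inspection. Identical answers can be genuine, especially on a short scale.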

5. Shall I delete the cases with missing data?

Usually not. SPSS handles missing data well, and in most procedures it simply excludes the missing values from the analysis. However, on some occasions it may be better to discard the incomplete cases (cases with missing data). For example, if most of the answers in a case are missing, then the remaining answers may not be reliable either.
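To see how incomplete each case is before deciding, you can count the missing answers per case. The sketch below assumes ten adjacent numeric items named Q1 to Q10; the variable names n_missing and mostly_complete and the cut-off of 5 are assumptions for illustration, not a standard rule.

```spss
* Assumes Q1 to Q10 are adjacent numeric variables in the data file (hypothetical).
* Count how many of the ten answers are missing in each case.
compute n_missing = nmiss(Q1 to Q10).
frequencies variables=n_missing.
* Assumed rule for illustration only: keep cases with at most 5 missing answers.
compute mostly_complete = (n_missing <= 5).
filter by mostly_complete.
```

Filtering keeps the incomplete cases in the file and only excludes them from analysis; running filter off brings them back.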

6. How to import data from Google Forms into SPSS?

You can download your data in CSV format directly from your Google Form, or you can create a spreadsheet from it and then download the spreadsheet in Excel format. Either way, you can import the downloaded file into SPSS.
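If you prefer syntax to the import dialog, an Excel file can be read with GET DATA. This is a minimal sketch; both file paths are hypothetical, and it assumes that the responses are on the first worksheet with the variable names in the first row.

```spss
* The file paths are hypothetical: replace them with your own.
get data
  /type=xlsx
  /file='C:\survey\responses.xlsx'
  /sheet=index 1
  /cellrange=full
  /readnames=on.
execute.
* Save the imported data as an SPSS data file.
save outfile='C:\survey\responses.sav'.
```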

7. What should I do if SPSS says the worksheet is empty when I import data from Excel?

This is usually caused by illegal characters (often Chinese characters in a non-UTF-8 encoding) or other problems in the Excel file. In that case, try CSV instead. If that does not resolve the problem, try copying and pasting the data from Excel. The drawback is that the variable names are not copied, so you have to enter them one by one.

8. What should I do if the Chinese characters are not shown properly on import into SPSS?

This is usually because the Chinese characters are stored in a non-UTF-8 encoding. You can open the data file in Excel and save it as another file, changing the character encoding setting (under Web Options in the Save As dialog box) to UTF-8 when you save.

Alternatively, you can tell SPSS to read those non-UTF-8 encoded characters. To do so, close all your data sets and restart SPSS. Then, with an empty data set, choose Edit -> Options -> Language and then change the Locale setting in the Locale's writing system section.

9. What level of measurement should I use for Likert scale data?

Likert-scale items are ordinal, although some people treat them as scale level.

10. Should I use string or numeric when coding nominal and ordinal data?

Numeric is more desirable. First, you don’t have to worry about misspellings if you input your data by hand. Second, some procedures (e.g. ANOVA) require that the grouping variable be numeric.

11. How to recode string data to numeric?

First, recode the data with Automatic Recode, Recode into Same Variables, or Recode into Different Variables. Automatic Recode and Recode into Different Variables can write the codes to a new numeric variable directly. If you used Recode into Same Variables, the variable is still a string, so afterwards change its type from string to numeric in the Variable View. (Warning: Do not reverse the order of these two steps. If you change the variable type to numeric first, SPSS will convert all non-numeric string values to missing. You will end up losing your data!)
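The two recoding routes that create a new numeric variable look like this in syntax. This is a sketch under assumptions: gender is a hypothetical string variable holding 'M' or 'F', and gender_auto and gender_num are hypothetical names for the new variables.

```spss
* gender is a hypothetical string variable holding 'M' or 'F'.
* Option 1: let SPSS assign consecutive codes in alphabetical order.
autorecode variables=gender
  /into gender_auto
  /print.
* Option 2: choose the codes yourself and label them.
recode gender ('M'=1) ('F'=2) into gender_num.
value labels gender_num 1 'Male' 2 'Female'.
execute.
```

Both options leave the original string variable untouched, so you can compare old and new values before deleting anything. With Option 2, any value not listed in the recode (including misspellings or different letter case) becomes system-missing in the new variable.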

12. How can I regroup some of the categories in categorical data (ordinal or nominal)?

You can use recode. For example, if your original data have three categories (1, 2, 3) and you want to combine 2 and 3 into one group, you can use Transform -> Recode into Different Variables to recode 1 to 1 (1->1) and both 2 and 3 to 2 (2->2, 3->2). The new variable will then contain the regrouped categories, and the original variable stays intact. (Recode into Same Variables also works, but it overwrites the original values.)
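In syntax, the regrouping described above can be sketched as follows. The names Q3 and Q3_grp and the value labels are assumptions; the result is written to a new variable so that the original codes are kept.

```spss
* Q3 and Q3_grp are hypothetical names and the labels are assumptions.
recode Q3 (1=1) (2,3=2) into Q3_grp.
value labels Q3_grp 1 'Category 1' 2 'Categories 2 and 3'.
execute.
* Cross-check the old codes against the new ones.
crosstabs /tables=Q3 by Q3_grp.
```

The crosstab should show old code 1 only under new code 1, and old codes 2 and 3 only under new code 2.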

13. Shall I remove the cases that are screened out by my screening questions?

I would suggest keeping them in the data set so that at least you know how many of the cases are screened out. To skip these cases in your analysis, you can use Data -> Select cases to select only the cases you want to analyze.
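The equivalent of Data -> Select cases in syntax is a filter variable. In this sketch, S1 is a hypothetical screening question in which the value 1 is assumed to mean that the respondent is eligible; eligible and Q1 are also hypothetical names.

```spss
* S1 is a hypothetical screening question and 1 is assumed to mean eligible.
compute eligible = (S1 = 1).
filter by eligible.
* Procedures run here use only the eligible cases.
frequencies variables=Q1.
* Switch the filter off to use all cases again.
filter off.
```

A frequency table of S1 itself, run with the filter off, tells you how many cases were screened out.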

14. What is the actual difference between nominal and ordinal data in SPSS?

Ordinal data have ordered categories, which is essential if you want SPSS to follow a particular order when displaying these data in tables and charts. If you set ordinal data as nominal, SPSS may get the ordering wrong.

15. Where can I download SPSS for use at home?

SPSS is proprietary software and you need to pay to use it.

However, you can apply for a 14-day free trial at https://www.ibm.com/hk-en/analytics/spss-trials.

16. What are some good SPSS alternatives?

If you do not want to use SPSS, you may install Jamovi (https://www.jamovi.org/) as an alternative. It is similar to SPSS but open source. It is not as powerful as SPSS but can serve as a backup when SPSS is unavailable. Please install version 1.2.0 or above, as earlier versions do not support importing from Excel.

Some students use 32-bit Windows, which is not supported by Jamovi. In that case, you may also try the 32-bit version of JASP (https://jasp-stats.org), a closely related project (Jamovi was started by former JASP developers). However, JASP's user interface is not as intuitive as Jamovi's and is therefore more difficult to learn.