NSCC 328 - STATISTICAL SOFTWARE
Lesson 3. Running a Test of Relationship on SPSS
Introduction:
Testing the degree of correlation between two variables is one of the most commonly used statistical methods. A test of correlation establishes whether there is a linear relationship between two variables. The two variables are usually designated as Y, the dependent (outcome or response) variable, and X, the independent (predictor or explanatory) variable. Pearson's correlation coefficient (r) is one of the most widely used of all statistics, but it has a number of limitations. Most importantly, in most applications r tests only for a linear relationship between the two variables: if a considerable degree of nonlinearity is present, a nonsignificant r could result even though there is in reality a more complex relationship between X and Y.
[1] PEARSON R CORRELATION COEFFICIENT
In this quick SPSS tutorial, we’ll look at how to calculate the Pearson correlation coefficient in SPSS, and how to interpret the result.
For the purposes of this class, we’re using a data set below:
We’re interested in two variables, Score and Time.
Score is the number of questions that people get right. Time is the amount of time in seconds it takes them to complete the test. We want to find out if these two things are correlated. Put simply, do people get more questions right if they take longer answering each question?
Pearson’s correlation coefficient will help us to answer this question.
Assumptions of the Test
To use Pearson correlation, your data must meet the following requirements:
Two or more continuous variables (i.e., interval or ratio level)
Cases must have non-missing values on both variables
Linear relationship between the variables
Independent cases (i.e., independence of observations)
There should be no relationship between the values of variables across cases. This means that:
the values for all variables across cases are unrelated
for any case, the value of any variable cannot influence the value of any variable for other cases
no case can influence another case on any variable
The bivariate Pearson correlation coefficient and corresponding significance test are not robust when independence is violated.
Bivariate normality
Each pair of variables is bivariate normally distributed (and, when more than two variables are analyzed, bivariate normally distributed at all levels of the other variables)
This assumption ensures that the variables are linearly related; violations of this assumption may indicate that non-linear relationships among variables exist. Linearity can be assessed visually using a scatterplot of the data.
Random sample of data from the population
No outliers
Data Set-Up
Your dataset should include two or more continuous numeric variables, each defined as scale, which will be used in the analysis.
Each row in the dataset should represent one unique subject, person, or unit. All of the measurements taken on that person or unit should appear in that row. If measurements for one subject appear on multiple rows -- for example, if you have measurements from different time points on separate rows -- you should reshape your data to "wide" format before you compute the correlations.
Run a Bivariate Pearson Correlation
To run a bivariate Pearson Correlation in SPSS, click Analyze > Correlate > Bivariate.
The Bivariate Correlations window opens, where you will specify the variables to be used in the analysis. All of the variables in your dataset appear in the list on the left side. To select variables for the analysis, select the variables in the list on the left and click the blue arrow button to move them to the right, in the Variables field.
A. Variables: The variables to be used in the bivariate Pearson Correlation. You must select at least two continuous variables, but may select more than two. The test will produce correlation coefficients for each pair of variables in this list.
B. Correlation Coefficients: There are multiple types of correlation coefficients. By default, Pearson is selected. Selecting Pearson will produce the test statistics for a bivariate Pearson Correlation.
C. Test of Significance: Click Two-tailed or One-tailed, depending on your desired significance test. SPSS uses a two-tailed test by default.
D. Flag significant correlations: Checking this option will mark statistically significant correlations in the output with asterisks. By default, SPSS flags significance at the alpha = 0.05 level (*) and the alpha = 0.01 level (**); there is no separate flag for the alpha = 0.001 level (such correlations are simply marked **).
E. Options: Clicking Options will open a window where you can specify which Statistics to include (i.e., Means and standard deviations, Cross-product deviations and covariances) and how to address Missing Values (i.e., Exclude cases pairwise or Exclude cases listwise). Note that the pairwise/listwise setting does not affect your computations if you are only entering two variables, but can make a very large difference if you are entering three or more variables into the correlation procedure.
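The pairwise/listwise distinction can be illustrated outside SPSS. The sketch below uses hypothetical data (None marks a missing value) and is only a minimal illustration of the two deletion rules, not SPSS's implementation:

```python
# Hypothetical data for three variables; None marks a missing value.
x = [1.0, 2.0, 3.0, None, 5.0]
y = [2.0, None, 6.0, 8.0, 10.0]
z = [1.0, 1.0, 2.0, 2.0, 3.0]

def complete_pairs(a, b):
    """Pairwise deletion: keep cases with non-missing values on THIS pair only."""
    return [(ai, bi) for ai, bi in zip(a, b) if ai is not None and bi is not None]

def listwise(rows):
    """Listwise deletion: keep only cases complete on ALL variables."""
    return [row for row in rows if all(v is not None for v in row)]

# Pairwise: each correlation uses as many cases as that pair allows.
print(len(complete_pairs(x, z)))   # 4 cases for the x-z pair
print(len(complete_pairs(x, y)))   # 3 cases for the x-y pair

# Listwise: every pair is computed on the same complete cases.
rows = list(zip(x, y, z))
print(len(listwise(rows)))         # 3 complete cases
```

With only two variables the two rules keep the same cases, which is why the setting only matters once three or more variables enter the procedure.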
Example:
PROBLEM STATEMENT
Score is the number of questions that people get right. Time is the amount of time in seconds it takes them to complete the test. We want to find out if these two things are correlated. Put simply, do people get more questions right if they take longer answering each question?
You can use a bivariate Pearson Correlation to test whether there is a statistically significant linear relationship between Time and Score, and to determine the strength and direction of the association.
BEFORE THE TEST
In the sample data, we will use two variables: “Time” and “Score.” The variable “Time” is a continuous measure and exhibits a range of values from 160 to 974 seconds (Analyze > Descriptive Statistics > Descriptives). The variable “Score” is a continuous measure of test performance and exhibits a range of values from 8.00 to 15.00.
Before we look at the Pearson correlations, we should look at the scatterplots of our variables to get an idea of what to expect. In particular, we need to determine if it's reasonable to assume that our variables have linear relationships. Click Graphs > Legacy Dialogs > Scatter/Dot. In the Scatter/Dot window, click Simple Scatter. Move variable Score to the X Axis box, and move variable Time to the Y Axis box. When finished, click OK.
From the scatterplot, we cannot see any clear linear relationship. Ordinarily, when the scatterplot looks like this, we would not proceed with a bivariate Pearson Correlation; for our lecture purposes, however, and to confirm whether this is true or not, we run one anyway. To do that, click Analyze > Correlate > Bivariate. Output for the analysis will display in the Output Viewer.
Syntax
CORRELATIONS
/VARIABLES=score time
/PRINT=TWOTAIL NOSIG
/MISSING=PAIRWISE.
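The coefficient SPSS reports can be cross-checked by computing r directly from its definition (the covariance of the two variables divided by the product of their standard deviations). A minimal pure-Python sketch, using small hypothetical data rather than the lesson's Score/Time dataset:

```python
import math

def pearson_r(x, y):
    """Pearson's r: sum of cross-deviations divided by the square root of
    the product of the sums of squared deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data for illustration only (not the lesson's dataset).
x = [8, 10, 12, 13, 15]
y = [160, 350, 420, 700, 974]
r = pearson_r(x, y)
print(round(r, 3))
```

A perfectly linear pair, such as y = 2x, gives r = 1.0 exactly, which is a quick sanity check on any implementation.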
OUTPUT
Tables. The results will display the correlations in a table, labeled Correlations.
DECISION AND CONCLUSIONS
Based on the results, we can state the following:
Score and Time have no statistically significant linear relationship (r = .133, p > .05).
The magnitude, or strength, of the association is negligible (0.0 < | r | < .2).
Example 2:
Test if there exists a significant relationship between Age and Income. Use the data below:
Note: Use alpha = 0.05
[2] Spearman Rank Correlation
Assumptions of the Test
Steps on how to do the test
Introduction:
The Spearman rank-order correlation coefficient (Spearman’s correlation, for short) is a nonparametric measure of the strength and direction of association that exists between two variables measured on at least an ordinal scale. It is denoted by the symbol rs (or the Greek letter ρ, pronounced rho). The test is used either for ordinal variables or for continuous data that has failed the assumptions necessary for conducting Pearson's product-moment correlation. For example, you could use a Spearman’s correlation to understand whether there is an association between exam performance and time spent revising, or whether there is an association between depression and length of unemployment. Possible alternative tests to Spearman's correlation are Kendall's tau-b or Goodman and Kruskal's gamma.
This "quick start" guide shows you how to carry out a Spearman’s correlation using SPSS Statistics. We show you the main procedure to carry out a Spearman’s correlation in the Procedure section. First, we introduce you to the assumptions that you must consider when carrying out a Spearman’s correlation.
Assumptions
When you choose to analyze your data using Spearman’s correlation, part of the process involves checking to make sure that the data you want to analyze can actually be analyzed using a Spearman’s correlation. You need to do this because it is only appropriate to use a Spearman’s correlation if your data "passes" three assumptions that are required for Spearman’s correlation to give you a valid result. In practice, checking these three assumptions just adds a little more time to your analysis, requiring you to click a few more buttons in SPSS Statistics when performing your analysis, as well as to think a little more about your data, but it is not a difficult task. These three assumptions are:
Assumption #1:
Your two variables should be measured on an ordinal, interval or ratio scale. Examples of ordinal variables include Likert scales (e.g., a 7-point scale from "strongly agree" through to "strongly disagree"), amongst other ways of ranking categories (e.g., a 3-point scale explaining how much a customer liked a product, ranging from "Not very much", to "It is OK", to "Yes, a lot"). Examples of interval/ratio variables include revision time (measured in hours), intelligence (measured using IQ score), exam performance (measured from 0 to 100), weight (measured in kg), and so forth.
Assumption #2:
Your two variables represent paired observations. For example, imagine that you were interested in the relationship between daily cigarette consumption and amount of exercise performed each week. A single paired observation reflects the score on each variable for a single participant (e.g., the daily cigarette consumption of "Participant 1" and the amount of exercise performed each week by "Participant 1"). With 30 participants in the study, this means that there would be 30 paired observations.
Assumption #3:
There is a monotonic relationship between the two variables. A monotonic relationship exists when either the variables increase in value together, or as one variable value increases, the other variable value decreases. Whilst there are a number of ways to check whether a monotonic relationship exists between your two variables, we suggest creating a scatterplot using SPSS Statistics, where you can plot one variable against the other, and then visually inspect the scatterplot to check for monotonicity. Your scatterplot may look something like one of the following:
Just remember that if you do not test these assumptions correctly, the results you get when running a Spearman's correlation might not be valid.
Note: Spearman's correlation determines the degree to which a relationship is monotonic. Put another way, it determines whether there is a monotonic component of association between two continuous or ordinal variables. As such, monotonicity is not actually an assumption of Spearman's correlation. However, you would not normally want to pursue a Spearman's correlation to determine the strength and direction of a monotonic relationship when you already know the relationship between your two variables is not monotonic. Instead, the relationship between your two variables might be better described by another statistical measure of association. For this reason, it is not uncommon to view the relationship between your two variables in a scatterplot to see if running a Spearman's correlation is the best choice as a measure of association or whether another measure would be better.
It is also worth noting that a Spearman’s correlation can be used when your two variables are not normally distributed. It is also not very sensitive to outliers (observations within your data that do not follow the usual pattern), which means that you can still obtain a valid result when you have outliers in your data.
Example
A teacher is interested in whether those who do better at English also do better in Math. To test whether this is the case, the teacher records the scores of her 10 students in their end-of-year examinations for both English and Math. Therefore, one variable records the English scores and the second variable records the Math scores for the 10 pupils.
Setup in SPSS Statistics
In SPSS Statistics, we created two variables so that we could enter our data: English_Mark (i.e., English scores) and Maths_Mark (i.e., Maths scores). Each row holds one pupil's pair of marks.
Test Procedure in SPSS Statistics
The four steps below show you how to analyze your data using Spearman’s correlation in SPSS Statistics when none of the three assumptions in the previous section, Assumptions, have been violated. At the end of these four steps, we show you how to interpret the results from this test.
Click Analyze > Correlate > Bivariate... on the main menu as shown below:
You will be presented with the following Bivariate Correlations dialogue box:
Transfer the variables English_Mark and Maths_Mark into the Variables: box by dragging-and-dropping the variables or by clicking each variable and then clicking on the Right arrow button. You will end up with a screen similar to the one below:
Make sure that you uncheck the Pearson checkbox (it is selected by default in SPSS Statistics) and select the Spearman checkbox in the –Correlation Coefficients– area. You will end up with a screen similar to below:
Click on the OK button. This will generate the results.
Output
SPSS Statistics generates a single table following the Spearman’s correlation procedure that you ran in the previous section. If your data passed assumption #3 (i.e., there is a monotonic relationship between your two variables), you will only need to interpret this one table. However, since you should have tested your data for monotonicity, you will also need to interpret the SPSS Statistics output that was produced when you tested for it (i.e., your scatterplot results). Remember that if your data failed this assumption, the output that you get from the Spearman’s correlation procedure (i.e., the table we discuss below) might be misleading.
However, in this "quick start" guide, we focus on the results from the Spearman’s correlation procedure only. Therefore, after running the Spearman’s correlation procedure, you will be presented with the Correlations table, as shown below:
The results are presented in a matrix such that, as can be seen above, the correlations are replicated. Nevertheless, the table presents Spearman's correlation, its significance value and the sample size that the calculation was based on. In this example, we can see that Spearman's correlation coefficient, rs, is 0.669, and that this is statistically significant (p = .035).
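Spearman's rs can be reproduced outside SPSS by ranking both variables and then computing Pearson's r on the ranks. The sketch below uses hypothetical marks for five pupils, not the guide's ten-student dataset, so the coefficient it prints will differ from the 0.669 reported above:

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of positions i..j, 1-based
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman_rs(x, y):
    """Spearman's rs = Pearson's r computed on the ranks of x and y."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) *
           sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

# Hypothetical marks for 5 pupils (illustrative data only).
english = [56, 75, 45, 71, 61]
maths = [66, 70, 40, 60, 65]
print(round(spearman_rs(english, maths), 3))
```

With no tied ranks, this agrees exactly with the textbook shortcut rs = 1 − 6Σd²/(n(n² − 1)), where d is the difference between each case's pair of ranks.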
Reporting the Output
In our example, you might present the results as follows:
General
A Spearman's rank-order correlation was run to determine the relationship between 10 students' English and maths exam marks. There was a strong, positive correlation between English and maths marks, which was statistically significant (rs(8) = .669, p = .035).
Example 2. Test if there exists a significant relationship between the number of cigarettes used and a person’s life expectancy at the 0.05 level of significance.
[3] Kendall’s Tau Correlation
Assumptions of the Test
Steps on how to do the test
Kendall's tau-b (τb) correlation coefficient (Kendall's tau-b, for short) is a nonparametric measure of the strength and direction of association that exists between two variables measured on at least an ordinal scale. It is considered a nonparametric alternative to the Pearson’s product-moment correlation when your data has failed one or more of the assumptions of this test. It is also considered an alternative to the nonparametric Spearman rank-order correlation coefficient (especially when you have a small sample size with many tied ranks).
For example, you could use Kendall's tau-b to understand whether there is an association between exam grade and time spent revising (i.e., where there were six possible exam grades – A, B, C, D, E and F – and revision time was split into five categories: less than 5 hours, 5-9 hours, 10-14 hours, 15-19 hours, and 20 hours or more). Alternately, you could use Kendall's tau-b to understand whether there is an association between customer satisfaction and delivery time (i.e., where delivery time had four categories – next day, 2 working days, 3-5 working days, and more than 5 working days – and customer satisfaction was measured in terms of the level of agreement customers had with the following statement, "I am satisfied with the time it took for my parcel to be delivered", where the level of agreement had five categories: strongly agree, agree, neither agree nor disagree, disagree and strongly disagree).
Assumptions
When you choose to analyze your data using Kendall's tau-b, part of the process involves checking to make sure that the data you want to analyze can actually be analyzed using Kendall's tau-b. You need to do this because it is only appropriate to use Kendall's tau-b if your data "passes" the assumptions that are required for Kendall's tau-b to give you a valid result. In practice, checking these assumptions just adds a little more time to your analysis, requiring you to click a few more buttons in SPSS Statistics when performing your analysis, as well as to think a little more about your data, but it is not a difficult task. These two assumptions are:
Assumption #1:
Your two variables should be measured on an ordinal or continuous scale. Examples of ordinal variables include Likert scales (e.g., a 7-point scale from strongly agree through to strongly disagree), amongst other ways of ranking categories (e.g., a 5-point scale explaining how much a customer liked a product, ranging from "Not very much" to "Yes, a lot"). Examples of continuous variables (i.e., interval or ratio variables) include revision time (measured in hours), intelligence (measured using IQ score), exam performance (measured from 0 to 100), weight (measured in kg), and so forth.
Assumption #2:
Kendall's tau-b determines whether there is a monotonic relationship between your two variables. As such, it is desirable if your data would appear to follow a monotonic relationship, so that formally testing for such an association makes sense, but it is not a strict assumption or one that you are often able to assess.
Example & Data Setup in SPSS Statistics
Taxes have the ability to elicit strong responses in many people, with some thinking they are too high, whilst others think they should be higher. A researcher conducted a simple study where they presented participants with the statement: "Tax is too high in this country", and asked them how much they agreed with this statement. They had four options how to respond: "Strongly Disagree", "Disagree", "Agree" or "Strongly Agree". These ordered responses were the categories of the dependent variable, tax_too_high. The researcher also asked participants to state whether they had a "low", "middle" or "high" income, where each of these categories had specified income ranges (e.g., a low income was any income under £18,000 per annum). The income level of participants was recorded in the variable, income.
Therefore, in the Variable View of SPSS Statistics two ordinal variables were created so that the data collected could be entered: income and tax_too_high. Next, the data from 24 participants was entered into the Data View of SPSS Statistics.
Test Procedure in SPSS Statistics
The Correlate > Bivariate... procedure below shows you how to analyse your data using Kendall's tau-b in SPSS Statistics when neither of the two assumptions in the previous section, Assumptions, has been violated. At the end of these steps, we show you how to interpret the results from this test.
[1] Click Analyze > Correlate > Bivariate... on the top menu, as shown below:
You will be presented with the Bivariate Correlations dialogue box, as shown below:
[2] Transfer the variables income and tax_too_high into the Variables: box by clicking on the Right arrow button. You will end up with the following screen:
[3] Deselect the Pearson checkbox and select the Kendall's tau-b checkbox in the –Correlation Coefficients– area, as shown below:
[4] Select the Show only the lower triangle checkbox and then deselect the Show diagonal checkbox, as shown below:
[5] Click on the OK button.
Interpreting the Results for Kendall's Tau-b
SPSS Statistics generates one main table for the Kendall's tau-b procedure that you ran in the previous section. If your data passed assumption #2 (i.e., there is a monotonic relationship between your two variables), which we explained earlier in the Assumptions section, you will only need to interpret this one table. However, since you should have tested your data for this assumption, you will also need to interpret the SPSS Statistics output that was produced when you tested for it (i.e., your scatterplot results). Remember that if your data failed this assumption, the output that you get from the Kendall's tau-b procedure (i.e., the table we discuss below) will no longer be correct.
In this "quick start" guide we focus on the results from the Kendall's tau-b procedure only, assuming that your data met this assumption. Therefore, if you ran the Kendall's tau-b procedure in the previous section using SPSS Statistics versions 27 or 28 (or the subscription version of SPSS Statistics), you will be presented with the Correlations table below:
Note: If you ran the Kendall's tau-b procedure using SPSS Statistics version 26 or an earlier version of SPSS Statistics, the Correlations table will look like the one below:
The results in this table are identical to those produced in versions 27 and 28 (and the subscription version of SPSS Statistics), but are simply displayed using a different layout (i.e., the results are displayed in a matrix where the correlations are replicated).
The Correlations table presents Kendall's tau-b correlation, its significance value and the sample size that the calculation was based on. In this example, we can see that Kendall's tau-b correlation coefficient, τb, is 0.535, and that this is statistically significant (p = 0.003).
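Kendall's tau-b is built from counts of concordant and discordant pairs of cases, with a correction for ties on each variable. A minimal pure-Python sketch, using hypothetical ordinal data (income group 1-3 vs. agreement 1-4), not the guide's 24-participant dataset:

```python
def kendall_tau_b(x, y):
    """Kendall's tau-b = (C - D) / sqrt((n0 - tx)(n0 - ty)), where C and D
    are concordant/discordant pair counts, n0 = n(n-1)/2, and tx/ty count
    pairs tied on x and on y respectively."""
    n = len(x)
    c = d = tx = ty = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0:
                tx += 1
                if dy == 0:
                    ty += 1       # tied on both counts toward both corrections
            elif dy == 0:
                ty += 1
            elif dx * dy > 0:
                c += 1            # concordant: both differences same direction
            else:
                d += 1            # discordant: differences in opposite directions
    n0 = n * (n - 1) // 2
    return (c - d) / ((n0 - tx) * (n0 - ty)) ** 0.5

# Hypothetical ordinal data for 8 respondents (illustrative only).
income = [1, 1, 2, 2, 2, 3, 3, 3]
agree  = [1, 2, 2, 3, 3, 3, 4, 4]
print(round(kendall_tau_b(income, agree), 3))
```

Because the denominator discounts tied pairs, tau-b is the variant SPSS reports here and the one best suited to ordinal tables like income by agreement, where ties are common.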
Reporting the Results for Kendall's Tau-b
In our example, you might present the results as follows:
General
A Kendall's tau-b correlation was run to determine the relationship between income level and views towards income taxes amongst 24 participants. There was a strong, positive correlation between income level and the view that taxes were too high, which was statistically significant (τb = .535, p = .003).
Example 2.
Two interviewers ranked 12 candidates (A through L) for a position. The results from most preferred to least preferred are:
Interviewer 1: ABCDEFGHIJKL.
Interviewer 2: ABDCFEHGJILK.
Calculate the Kendall Tau correlation and test if there is an association between the choices of the two interviewers.
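One way to check a hand computation for Example 2 is to convert each interviewer's preference order into ranks for candidates A-L and count discordant pairs. Since both interviewers produce complete orderings with no ties, the simpler tau-a formula applies; the significance test itself would still be looked up against critical values or run in SPSS. A Python sketch:

```python
def kendall_tau(rank1, rank2):
    """Kendall's tau-a for two rankings without ties:
    (concordant - discordant) / total number of pairs."""
    n = len(rank1)
    c = d = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (rank1[i] - rank1[j]) * (rank2[i] - rank2[j])
            if s > 0:
                c += 1
            else:
                d += 1
    return (c - d) / (n * (n - 1) // 2)

candidates = "ABCDEFGHIJKL"
order1 = "ABCDEFGHIJKL"   # Interviewer 1, most to least preferred
order2 = "ABDCFEHGJILK"   # Interviewer 2

# Rank of each candidate under each interviewer (1 = most preferred).
rank1 = [order1.index(cand) + 1 for cand in candidates]
rank2 = [order2.index(cand) + 1 for cand in candidates]
print(round(kendall_tau(rank1, rank2), 3))
```

The two orderings differ only by five adjacent swaps (C/D, E/F, G/H, I/J, K/L), so 5 of the 66 candidate pairs are discordant and tau = (61 − 5)/66 ≈ 0.848, a strong positive agreement.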