NSCC 328 - STATISTICAL SOFTWARE
Lesson 3. Running a Test of Difference on SPSS
One-sample T-test
Assumptions of the Test
Steps on how to do the test
The one-sample t-test is used to determine whether a sample comes from a population with a specific mean. This population mean is not always known, but is sometimes hypothesized. For example, you want to show that a new teaching method for pupils struggling to learn English grammar can improve their grammar skills to the national average. Your sample would be pupils who received the new teaching method and your population mean would be the national average score. Alternatively, you might believe that doctors who work in Accident and Emergency (A & E) departments work 100 hours per week despite the dangers (e.g., tiredness) of working such long hours. You sample 1,000 doctors in A & E departments and see if their hours differ from 100 hours.
This "quick start" guide shows you how to carry out a one-sample t-test using SPSS Statistics, as well as interpret and report the results from this test. However, before we introduce you to this procedure, you need to understand the different assumptions that your data must meet in order for a one-sample t-test to give you a valid result. We discuss these assumptions next.
When you choose to analyze your data using a one-sample t-test, part of the process involves checking to make sure that the data you want to analyze can actually be analyzed using a one-sample t-test. You need to do this because it is only appropriate to use a one-sample t-test if your data "passes" four assumptions that are required for a one-sample t-test to give you a valid result. In practice, checking for these four assumptions just adds a little bit more time to your analysis, requiring you to click a few more buttons in SPSS Statistics when performing your analysis, as well as think a little bit more about your data, but it is not a difficult task.
Before we introduce you to these four assumptions, do not be surprised if, when analyzing your own data using SPSS Statistics, one or more of these assumptions is violated (i.e., is not met). This is not uncommon when working with real-world data rather than textbook examples, which often only show you how to carry out a one-sample t-test when everything goes well! However, don’t worry. Even when your data fails certain assumptions, there is often a solution to overcome this. First, let’s take a look at these four assumptions:
Assumption #1: Your dependent variable should be measured at the interval or ratio level (i.e., continuous). Examples of variables that meet this criterion include revision time (measured in hours), intelligence (measured using IQ score), exam performance (measured from 0 to 100), weight (measured in kg), and so forth. You can learn more about interval and ratio variables in our article: Types of Variable.
Assumption #2: The data are independent (i.e., not correlated/related), which means that there is no relationship between the observations. This is more of a study design issue than something you can test for, but it is an important assumption of the one-sample t-test.
Assumption #3: There should be no significant outliers. Outliers are data points within your data that do not follow the usual pattern (e.g., in a study of 100 students' IQ scores, where the mean score was 108 with only a small variation between students, one student had a score of 156, which is very unusual, and may even put her in the top 1% of IQ scores globally). The problem with outliers is that they can have a negative effect on the one-sample t-test, reducing the accuracy of your results. Fortunately, when using SPSS Statistics to run a one-sample t-test on your data, you can easily detect possible outliers. In our enhanced one-sample t-test guide, we: (a) show you how to detect outliers using SPSS Statistics; and (b) discuss some of the options you have in order to deal with outliers.
Assumption #4: Your dependent variable should be approximately normally distributed. We talk about the one-sample t-test only requiring approximately normal data because it is quite "robust" to violations of normality, meaning that the assumption can be a little violated and still provide valid results. You can test for normality using the Shapiro-Wilk test of normality, which is easily tested for using SPSS Statistics. In addition to showing you how to do this in our enhanced one-sample t-test guide, we also explain what you can do if your data fails this assumption (i.e., if it fails it more than a little bit).
You can check assumptions #3 and #4 using SPSS Statistics. Before doing this, you should make sure that your data meets assumptions #1 and #2, although you don't need SPSS Statistics to do this. When moving on to assumptions #3 and #4, we suggest testing them in this order because it represents an order where, if a violation to the assumption is not correctable, you will no longer be able to use a one-sample t-test. Just remember that if you do not run the statistical tests on these assumptions correctly, the results you get when running a one-sample t-test might not be valid.
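Although this guide uses SPSS Statistics, assumptions #3 and #4 can also be checked outside SPSS. The sketch below shows one minimal way to do this in Python with scipy, assuming simulated depression scores (the `dep_score` values here are made up for illustration, not the data from this lesson's example):

```python
# Checking assumptions #3 (no significant outliers) and #4 (approximate
# normality) in Python. The data below are simulated stand-ins.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
dep_score = rng.normal(loc=3.7, scale=0.7, size=40)

# Assumption #3: flag outliers the same way a boxplot does
# (points beyond 1.5 * IQR from the quartiles).
q1, q3 = np.percentile(dep_score, [25, 75])
iqr = q3 - q1
outliers = dep_score[(dep_score < q1 - 1.5 * iqr) | (dep_score > q3 + 1.5 * iqr)]
print(f"Outliers flagged: {len(outliers)}")

# Assumption #4: Shapiro-Wilk test of normality.
# p > .05 suggests the data are consistent with a normal distribution.
w, p = stats.shapiro(dep_score)
print(f"Shapiro-Wilk: W = {w:.3f}, p = {p:.3f}")
```

This mirrors what SPSS Statistics produces via the Explore procedure (a boxplot plus the Shapiro-Wilk table).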
A researcher is planning a psychological intervention study, but before he proceeds he wants to characterize his participants' depression levels. He tests each participant on a particular depression index, where anyone who achieves a score of 4.0 is deemed to have 'normal' levels of depression. Lower scores indicate less depression and higher scores indicate greater depression. He has recruited 40 participants to take part in the study. Depression scores are recorded in the variable dep_score. He wants to know whether his sample is representative of the normal population (i.e., do they score statistically significantly differently from 4.0).
For a one-sample t-test, there will only be one variable's data to be entered into SPSS Statistics: the dependent variable, dep_score, which is the depression score.
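Before walking through the SPSS steps, it can help to see the same analysis sketched in Python's scipy as a cross-check. The `dep_score` values below are simulated stand-ins for the lesson's variable, not the real study data:

```python
# A minimal one-sample t-test in Python, testing whether the sample
# mean differs from the hypothesized population mean of 4.0.
# The data are simulated for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
dep_score = rng.normal(loc=3.7, scale=0.75, size=40)

# Two-tailed test against the test value 4.0 (the 'normal' score).
t_stat, p_value = stats.ttest_1samp(dep_score, popmean=4.0)
df = len(dep_score) - 1  # degrees of freedom = n - 1 = 39

print(f"t({df}) = {t_stat:.3f}, p = {p_value:.3f}")
```

SPSS Statistics reports the same statistic, degrees of freedom, and two-tailed p-value in its One-Sample Test table.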
The 5-step Compare Means > One-Sample T Test... procedure below shows you how to analyze your data using a one-sample t-test in SPSS Statistics when the four assumptions in the previous section, Assumptions, have not been violated. At the end of these five steps, we show you how to interpret the results from this test. If you are looking for help to make sure your data meets assumptions #3 and #4, which are required when using a one-sample t-test and can be tested using SPSS Statistics, you can learn more in our enhanced one-sample t-test guide.
1. Click Analyze > Compare Means > One-Sample T Test... on the main menu:
You will be presented with the One-Sample T Test dialogue box, as shown below:
2. Transfer the dependent variable, dep_score, into the Test Variable(s): box by selecting it (by clicking on it) and then clicking on the arrow button. Enter the population mean you are comparing the sample against in the Test Value: box, by changing the current value of "0" to "4". Keep Estimate effect sizes selected. You will end up with the following screen:
3. Click on the Options button. You will be presented with the One-Sample T Test: Options dialogue box, as shown below:
For this example, keep the default 95% confidence intervals and Exclude cases analysis by analysis in the –Missing Values– area.
Note 1: By default, SPSS Statistics uses 95% confidence intervals (labelled as the Confidence Interval Percentage in SPSS Statistics). This equates to declaring statistical significance at the p < .05 level. If you wish to change this you can enter any value from 1 to 99. For example, entering "99" into this box would result in a 99% confidence interval and equate to declaring statistical significance at the p < .01 level. For this example, keep the default 95% confidence intervals.
Note 2: If you are testing more than one dependent variable and you have any missing values in your data, you need to think carefully about whether to select Exclude cases analysis by analysis or Exclude cases listwise in the –Missing Values– area. Selecting the incorrect option could mean that SPSS Statistics removes data from your analysis that you wanted to include. We discuss this further and what options to select in our enhanced one-sample t-test guide.
4. Click on the Continue button. You will be returned to the One-Sample T Test dialogue box.
5. Click the OK button to generate the output.
SPSS Statistics generates two main tables of output for the one-sample t-test, which contain all the information you require to interpret the results.
If your data passed assumption #3 (i.e., there were no significant outliers) and assumption #4 (i.e., your dependent variable was approximately normally distributed), which we explained earlier in the Assumptions section, you will only need to interpret these two main tables. However, since you should have tested your data for these assumptions, you will also need to interpret the SPSS Statistics output that was produced when you tested for them (i.e., you will have to interpret: (a) the boxplots you used to check if there were any significant outliers; and (b) the output SPSS Statistics produces for your Shapiro-Wilk test of normality). If you do not know how to do this, we show you in our enhanced one-sample t-test guide. Remember that if your data failed any of these assumptions, the output that you get from the one-sample t-test procedure (i.e., the tables we discuss below) will no longer be relevant, and you will need to interpret these tables differently.
However, in this "quick start" guide, we take you through each of the two main tables in turn, assuming that your data met all the relevant assumptions:
You can make an initial interpretation of the data using the One-Sample Statistics table, which presents relevant descriptive statistics:
It is more common than not to present your descriptive statistics using the mean and standard deviation ("Std. Deviation" column) rather than the standard error of the mean ("Std. Error Mean" column), although both are acceptable. You could report the results, using the standard deviation, as follows:
GENERAL:
Mean depression score (3.73 ± 0.74) was lower than the population 'normal' depression score of 4.0.
APA:
Mean depression score (M = 3.73, SD = 0.74) was lower than the population 'normal' depression score of 4.0.
However, by running a one-sample t-test, you are really interested in knowing whether the sample you have (dep_score) comes from a 'normal' population (which has a mean of 4.0). This is discussed in the next section.
The One-Sample Test table reports the result of the one-sample t-test. The top row provides the value of the known or hypothesized population mean you are comparing your sample data to, as highlighted below:
In this example, you can see the 'normal' depression score value of "4" that you entered in earlier. You now need to consult the first three columns of the One-Sample Test table, which provides information on whether the sample is from a population with a mean of 4 (i.e., are the means statistically significantly different), as highlighted below:
Moving from left-to-right, you are presented with the observed t-value ("t" column), the degrees of freedom ("df" column), and the statistical significance (p-value) ("Sig. (2-tailed)" column) of the one-sample t-test. In this example, p < .05 (specifically, p = .022). Therefore, it can be concluded that the population means are statistically significantly different. If p > .05, the difference between the sample-estimated population mean and the comparison population mean would not be statistically significant.
Note: If you see SPSS Statistics state that the "Sig. (2-tailed)" value is ".000", this actually means that p < .0005. It does not mean that the significance level is actually zero.
SPSS Statistics also reports that t = -2.381 ("t" column) and that there are 39 degrees of freedom ("df" column). You need to know these values in order to report your results, which you could do as follows:
GENERAL:
Depression score was statistically significantly lower than the population normal depression score, t(39) = -2.381, p = .022.
APA:
Depression score was statistically significantly lower than the population normal depression score, t(39) = -2.381, p = .022.
The breakdown of the last part (i.e., t(39) = -2.381, p = .022) is as follows: t indicates that a t-test was run; 39 is the degrees of freedom (n - 1); -2.381 is the obtained t-value; and .022 is the probability (p-value) of obtaining a t-value at least this extreme if the null hypothesis were true.
You can also include measures of the difference between the two population means in your written report. This information is included in the columns on the far-right of the One-Sample Test table, as highlighted below:
This section of the table shows that the mean difference in the population means is -0.28 ("Mean Difference" column) and the 95% confidence intervals (95% CI) of the difference are -0.51 to -0.04 ("Lower" to "Upper" columns). For the measures used, it will be sufficient to report the values to 2 decimal places. You could write these results as:
GENERAL:
Depression score was statistically significantly lower by 0.28 (95% CI, 0.04 to 0.51) than a normal depression score of 4.0, t(39) = -2.381, p = .022.
APA:
Depression score was statistically significantly lower by a mean of 0.28, 95% CI [0.04 to 0.51], than a normal depression score of 4.0, t(39) = -2.381, p = .022.
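The mean difference and its confidence interval can be reconstructed from the summary statistics by hand. The sketch below does this in Python using the rounded values from the report (M = 3.72, SD = 0.74, n = 40, test value 4.0); because the inputs are rounded, the results only approximate SPSS's t = -2.381 and 95% CI of -0.51 to -0.04:

```python
# Rough cross-check of the reported t-value and 95% CI from rounded
# summary statistics. Small discrepancies versus SPSS are rounding noise.
import math
from scipy import stats

mean, sd, n, test_value = 3.72, 0.74, 40, 4.0

se = sd / math.sqrt(n)             # standard error of the mean
t_stat = (mean - test_value) / se  # one-sample t statistic
df = n - 1

# 95% CI of the mean difference: difference +/- t_crit * SE,
# where t_crit is the .975 quantile of the t distribution on df = 39.
t_crit = stats.t.ppf(0.975, df)
diff = mean - test_value
ci_lower = diff - t_crit * se
ci_upper = diff + t_crit * se

print(f"t({df}) = {t_stat:.3f}, 95% CI [{ci_lower:.2f}, {ci_upper:.2f}]")
```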
You can report the findings, without the tests of assumptions, as follows:
GENERAL:
Mean depression score (3.73 ± 0.74) was lower than the normal depression score of 4.0, a statistically significant difference of 0.28 (95% CI, 0.04 to 0.51), t(39) = -2.381, p = .022.
APA:
Mean depression score (M = 3.73, SD = 0.74) was lower than the normal depression score of 4.0, a statistically significant mean difference of 0.28, 95% CI [0.04 to 0.51], t(39) = -2.381, p = .022.
Adding in the information about the statistical test you ran, including the assumptions, you have:
GENERAL:
A one-sample t-test was run to determine whether depression score in recruited subjects was different to normal, defined as a depression score of 4.0. Depression scores were normally distributed, as assessed by Shapiro-Wilk's test (p > .05) and there were no outliers in the data, as assessed by inspection of a boxplot. Mean depression score (3.73 ± 0.74) was lower than the normal depression score of 4.0, a statistically significant difference of 0.28 (95% CI, 0.04 to 0.51), t(39) = -2.381, p = .022.
APA:
A one-sample t-test was run to determine whether depression score in recruited subjects was different to normal, defined as a depression score of 4.0. Depression scores were normally distributed, as assessed by Shapiro-Wilk's test (p > .05) and there were no outliers in the data, as assessed by inspection of a boxplot. Mean depression score (M = 3.73, SD = 0.74) was lower than the normal depression score of 4.0, a statistically significant mean difference of 0.28, 95% CI [0.04 to 0.51], t(39) = -2.381, p = .022.
You can write the result in respect of your null and alternative hypothesis as:
GENERAL:
There was a statistically significant difference between means (p < .05). Therefore, we can reject the null hypothesis and accept the alternative hypothesis.
APA:
There was a statistically significant difference between means (p < .05). Therefore, we can reject the null hypothesis and accept the alternative hypothesis.
The independent-samples t-test (or independent t-test, for short) compares the means between two unrelated groups on the same continuous, dependent variable. For example, you could use an independent t-test to understand whether first year graduate salaries differed based on gender (i.e., your dependent variable would be "first year graduate salaries" and your independent variable would be "gender", which has two groups: "male" and "female"). Alternatively, you could use an independent t-test to understand whether there is a difference in test anxiety based on educational level (i.e., your dependent variable would be "test anxiety" and your independent variable would be "educational level", which has two groups: "undergraduates" and "postgraduates").
When you choose to analyze your data using an independent t-test, part of the process involves checking to make sure that the data you want to analyze can actually be analyzed using an independent t-test. You need to do this because it is only appropriate to use an independent t-test if your data "passes" six assumptions that are required for an independent t-test to give you a valid result. In practice, checking for these six assumptions just adds a little bit more time to your analysis, requiring you to click a few more buttons in SPSS Statistics when performing your analysis, as well as think a little bit more about your data, but it is not a difficult task.
Before we introduce you to these six assumptions, do not be surprised if, when analyzing your own data using SPSS Statistics, one or more of these assumptions is violated (i.e., is not met). This is not uncommon when working with real-world data rather than textbook examples, which often only show you how to carry out an independent t-test when everything goes well! However, don't worry. Even when your data fails certain assumptions, there is often a solution to overcome this. First, let's take a look at these six assumptions:
Assumption #1: Your dependent variable should be measured on a continuous scale (i.e., it is measured at the interval or ratio level). Examples of variables that meet this criterion include revision time (measured in hours), intelligence (measured using IQ score), exam performance (measured from 0 to 100), weight (measured in kg), and so forth. You can learn more about continuous variables in our article: Types of Variable.
Assumption #2: Your independent variable should consist of two categorical, independent groups. Example independent variables that meet this criterion include gender (2 groups: male or female), employment status (2 groups: employed or unemployed), smoker (2 groups: yes or no), and so forth.
Assumption #3: You should have independence of observations, which means that there is no relationship between the observations in each group or between the groups themselves. For example, there must be different participants in each group with no participant being in more than one group. This is more of a study design issue than something you can test for, but it is an important assumption of the independent t-test. If your study fails this assumption, you will need to use another statistical test instead of the independent t-test (e.g., a paired-samples t-test). If you are unsure whether your study meets this assumption, you can use our Statistical Test Selector, which is part of our enhanced content.
Assumption #4: There should be no significant outliers. Outliers are simply single data points within your data that do not follow the usual pattern (e.g., in a study of 100 students' IQ scores, where the mean score was 108 with only a small variation between students, one student had a score of 156, which is very unusual, and may even put her in the top 1% of IQ scores globally). The problem with outliers is that they can have a negative effect on the independent t-test, reducing the validity of your results. Fortunately, when using SPSS Statistics to run an independent t-test on your data, you can easily detect possible outliers. In our enhanced independent t-test guide, we: (a) show you how to detect outliers using SPSS Statistics; and (b) discuss some of the options you have in order to deal with outliers. You can learn more about our enhanced independent t-test guide here.
Assumption #5: Your dependent variable should be approximately normally distributed for each group of the independent variable. We talk about the independent t-test only requiring approximately normal data because it is quite "robust" to violations of normality, meaning that this assumption can be a little violated and still provide valid results. You can test for normality using the Shapiro-Wilk test of normality, which is easily tested for using SPSS Statistics. In addition to showing you how to do this in our enhanced independent t-test guide, we also explain what you can do if your data fails this assumption (i.e., if it fails it more than a little bit). Again, you can learn more here.
Assumption #6: There needs to be homogeneity of variances. You can test this assumption in SPSS Statistics using Levene’s test for homogeneity of variances. In our enhanced independent t-test guide, we (a) show you how to perform Levene’s test for homogeneity of variances in SPSS Statistics, (b) explain some of the things you will need to consider when interpreting your data, and (c) present possible ways to continue with your analysis if your data fails to meet this assumption.
You can check assumptions #4, #5 and #6 using SPSS Statistics. Before doing this, you should make sure that your data meets assumptions #1, #2 and #3, although you don't need SPSS Statistics to do this. When moving on to assumptions #4, #5 and #6, we suggest testing them in this order because it represents an order where, if a violation to the assumption is not correctable, you will no longer be able to use an independent t-test (although you may be able to run another statistical test on your data instead). Just remember that if you do not run the statistical tests on these assumptions correctly, the results you get when running an independent t-test might not be valid.
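Of these, assumption #6 (homogeneity of variances) is the one unique to the independent t-test. A minimal sketch of Levene's test in Python's scipy is shown below; SPSS Statistics runs the equivalent test automatically as part of its independent t-test output. The two groups here are simulated for illustration:

```python
# Levene's test for homogeneity of variances on two simulated groups.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
diet = rng.normal(loc=6.15, scale=0.52, size=20)
exercise = rng.normal(loc=5.80, scale=0.38, size=20)

# p > .05 suggests the two group variances are not significantly
# different, i.e. the homogeneity-of-variances assumption holds.
w, p = stats.levene(diet, exercise)
print(f"Levene's test: W = {w:.3f}, p = {p:.3f}")
```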
The concentration of cholesterol (a type of fat) in the blood is associated with the risk of developing heart disease, such that higher concentrations of cholesterol indicate a higher level of risk, and lower concentrations indicate a lower level of risk. If you lower the concentration of cholesterol in the blood, your risk of developing heart disease can be reduced. Being overweight and/or physically inactive increases the concentration of cholesterol in your blood. Both exercise and weight loss can reduce cholesterol concentration. However, it is not known whether exercise or weight loss is best for lowering cholesterol concentration. Therefore, a researcher decided to investigate whether an exercise or weight loss intervention is more effective in lowering cholesterol levels. To this end, the researcher recruited a random sample of inactive males that were classified as overweight. This sample was then randomly split into two groups: Group 1 underwent a calorie-controlled diet and Group 2 undertook the exercise-training programme. In order to determine which treatment programme was more effective, the mean cholesterol concentrations were compared between the two groups at the end of the treatment programmes.
In SPSS Statistics, we separated the groups for analysis by creating a grouping variable called Treatment (i.e., the independent variable), and gave the "diet group" a value of "1" and the "exercise group" a value of "2" (i.e., the two groups of the independent variable). Cholesterol concentrations were entered under the variable name Cholesterol (i.e., the dependent variable). In our enhanced independent t-test guide, we show you how to correctly enter data in SPSS Statistics to run an independent t-test (see here).
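The same Cholesterol-by-Treatment analysis can be sketched in Python's scipy as a cross-check on the SPSS output. The cholesterol values below are simulated, not the study's data:

```python
# A minimal independent t-test mirroring the Cholesterol-by-Treatment
# setup (simulated data; group 1 = diet, group 2 = exercise, mmol/L).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
diet = rng.normal(loc=6.15, scale=0.52, size=20)
exercise = rng.normal(loc=5.80, scale=0.38, size=20)

# equal_var=True gives the classic pooled-variance (Student's) t-test,
# the "Equal variances assumed" row of the SPSS output.
t_stat, p_value = stats.ttest_ind(diet, exercise, equal_var=True)
df = len(diet) + len(exercise) - 2  # pooled df = n1 + n2 - 2 = 38

print(f"t({df}) = {t_stat:.3f}, p = {p_value:.3f}")
```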
The eight steps below show you how to analyze your data using an independent t-test in SPSS Statistics when the six assumptions in the previous section, Assumptions, have not been violated. At the end of these eight steps, we show you how to interpret the results from this test. If you are looking for help to make sure your data meets assumptions #4, #5 and #6, which are required when using an independent t-test, and can be tested using SPSS Statistics, you can learn more here.
1. Click Analyze > Compare Means > Independent-Samples T Test... on the top menu, as shown below:
You will be presented with the Independent-Samples T Test dialogue box, as shown below:
2. Transfer the dependent variable, Cholesterol, into the Test Variable(s): box, and transfer the independent variable, Treatment, into the Grouping Variable: box, by highlighting the relevant variables and pressing the blue arrow buttons. You will end up with the following screen.
3. You then need to define the groups (treatments). Click on the Define Groups button. You will be presented with the Define Groups dialogue box, as shown below:
4. Enter "1" into the Group 1: box and "2" into the Group 2: box. Remember that we labelled the Diet Treatment group as 1 and the Exercise Treatment group as 2.
Note: If you have more than two treatment groups in your study (e.g., three groups: diet, exercise and drug treatment groups), but only wanted to compare two (e.g., the diet and drug treatment groups), you could type "1" into the Group 1: box and "3" into the Group 2: box (i.e., if you wished to compare the diet with the drug treatment).
5. Click the Continue button.
6. If you need to change the confidence level limits or change how to exclude cases, click the Options button. You will be presented with the following:
7. Click the Continue button. You will be returned to the Independent-Samples T Test dialogue box.
8. Click the OK button to generate the output.
SPSS Statistics generates two main tables of output for the independent t-test. If your data passed assumption #4 (i.e., there were no significant outliers), assumption #5 (i.e., your dependent variable was approximately normally distributed for each group of the independent variable) and assumption #6 (i.e., there was homogeneity of variances), which we explained earlier in the Assumptions section, you will only need to interpret these two main tables. However, since you should have tested your data for these assumptions, you will also need to interpret the SPSS Statistics output that was produced when you tested for them (i.e., you will have to interpret: (a) the boxplots you used to check if there were any significant outliers; (b) the output SPSS Statistics produces for your Shapiro-Wilk test of normality to determine normality; and (c) the output SPSS Statistics produces for Levene's test for homogeneity of variances). If you do not know how to do this, we show you in our enhanced independent t-test guide here. Remember that if your data failed any of these assumptions, the output that you get from the independent t-test procedure (i.e., the tables we discuss below) might not be valid and you might need to interpret these tables differently.
This table provides useful descriptive statistics for the two groups that you compared, including the mean and standard deviation.
Unless you have other reasons to do so, it would be considered normal to present information on the mean and standard deviation for this data. You might also state the number of participants that you had in each of the two groups. This can be useful when you have missing values and the number of recruited participants is larger than the number of participants that could be analyzed.
A diagram can also be used to visually present your results. For example, you could use a bar chart with error bars (e.g., where the error bars could use the standard deviation, standard error or 95% confidence intervals). This can make it easier for others to understand your results. Again, we show you how to do this in our enhanced independent t-test guide.
This table provides the actual results from the independent t-test.
You can see that the group means are statistically significantly different because the value in the "Sig. (2-tailed)" column is less than 0.05. Looking at the Group Statistics table, we can see that those people who undertook the exercise trial had lower cholesterol levels at the end of the programme than those who underwent a calorie-controlled diet.
Based on the results above, you could report the results of the study as follows (N.B., this does not include the results from your assumptions tests or effect size calculations):
GENERAL:
This study found that overweight, physically inactive male participants had statistically significantly lower cholesterol concentrations (5.80 ± 0.38 mmol/L) at the end of an exercise-training programme compared to after a calorie-controlled diet (6.15 ± 0.52 mmol/L), t(38)=2.428, p=0.020.
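The reported t-value can be roughly reconstructed from the summary statistics alone. The sketch below uses scipy's `ttest_ind_from_stats` with the values from the write-up (diet: 6.15 ± 0.52; exercise: 5.80 ± 0.38; df = 38 implies n = 20 per group). Because the inputs are rounded, the result only approximates the reported t(38) = 2.428:

```python
# Cross-check of the reported independent t-test from summary statistics.
from scipy import stats

result = stats.ttest_ind_from_stats(
    mean1=6.15, std1=0.52, nobs1=20,  # diet group
    mean2=5.80, std2=0.38, nobs2=20,  # exercise group
    equal_var=True,                   # pooled-variance (Student's) t-test
)

print(f"t = {result.statistic:.3f}, p = {result.pvalue:.3f}")
```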
Introduction
The dependent t-test (called the paired-samples t-test in SPSS Statistics) compares the means between two related groups on the same continuous, dependent variable. For example, you could use a dependent t-test to understand whether there was a difference in smokers' daily cigarette consumption before and after a 6 week hypnotherapy programme (i.e., your dependent variable would be "daily cigarette consumption", and your two related groups would be the cigarette consumption values "before" and "after" the hypnotherapy programme). If your dependent variable is dichotomous, you should instead use McNemar's test.
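Before the SPSS procedure, here is a minimal sketch of the dependent (paired-samples) t-test in Python's scipy, using made-up before/after cigarette counts for illustration:

```python
# A minimal paired-samples t-test on simulated before/after data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
before = rng.normal(loc=20, scale=4, size=30)         # cigarettes/day before
after = before - rng.normal(loc=3, scale=2, size=30)  # after hypnotherapy

# ttest_rel pairs each "before" value with its "after" value,
# testing whether the mean of the paired differences is zero.
t_stat, p_value = stats.ttest_rel(before, after)
df = len(before) - 1  # df = number of pairs - 1

print(f"t({df}) = {t_stat:.3f}, p = {p_value:.3f}")
```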
This "quick start" guide shows you how to carry out a dependent t-test using SPSS Statistics, as well as interpret and report the results from this test. However, before we introduce you to this procedure, you need to understand the different assumptions that your data must meet in order for a dependent t-test to give you a valid result. We discuss these assumptions next.