17.1 Uses and Requirements
From the preceding chapters, recall that both the z-Test and the one-sample t-Test are used to compare a sample mean to a known or assumed population mean, in order to determine whether that sample mean differs significantly from that value. In both cases there was no independent variable being manipulated; rather, subjects in a sample were selected because they formed a unique group within a population, and that unique group was compared to the general population from which it came.
The limitation of the z-Test and the one-sample t-Test is that they cannot be used to compare performance between levels of an independent variable. To do this you must use a bivariate test. In this chapter I will cover examples where an independent variable is manipulated between groups of individuals. The type of statistical analysis used in such cases is an independent-groups t-Test. As you will see, the procedures, assumptions, and test are not all that different from those of the one-sample t-Test.
17.2 Mean Differences and Variance between Independent Populations
The independent groups t-Test is used to determine whether two sample means, drawn from different populations, are statistically different. Before getting into the independent groups t-Test, I introduce a few basic assumptions about the population distributions that each sample came from and the sampling distributions of the means of those populations.
First, in an independent groups design there are different samples (groups) of subjects (hence the name, 'independent groups'). It is assumed the subjects in each group are identical except for the level of the independent variable that the subjects in each group are exposed to. Subjects in each sample are assumed to come from a different population; that is, subjects in “Sample/Group A” are assumed to have come from a “Population A” and subjects in “Sample/Group B” are assumed to have come from a “Population B.” These populations are assumed to differ only in the independent variable. In short, it is assumed that you have two independent samples, each coming from its own independent population, and these populations differ only in the levels of some independent variable. Thus, any difference found between the sample means should also exist between the population means, and any difference between the population means must be due to the difference in the independent variable.
You can visualize this scenario in the figure below, which presents two population distributions. Subjects in each population are assumed to be equal, except for the level of some independent variable they are being exposed to. The distance between the peaks (means) of the two populations is assumed to be due to the difference in the independent variable. Thus, the amount that the distributions overlap reflects how much (or little) of an effect the independent variable has on the dependent variable: The more overlap, the less of an effect the independent variable has.
Recall from Chapter 9 that if you were to generate all samples of a specific size (n) from a population and calculate the mean of each of those samples, you would create a 'sampling distribution of the mean'. This sampling distribution of the mean approximates a normal distribution as n → ∞, as per the central limit theorem. The same is true if you have two populations and you sample every possible combination of n scores from population A and every possible combination of n scores from population B: You end up with a sampling distribution of means for population A and a second sampling distribution of means for population B. Each of these sampling distributions of the mean has a mean equal to the μ of its parent population, and has a standard deviation equal to σ/√n, the standard error of the mean. If we assume the sample mean is the best approximation to μ for each population, then any difference between sample means should reflect the true difference between population means. Thus, any difference in the sample means can be thought of as the effect of the independent variable!
Also, recall that inferential tests are ratios of some measure of effect to some measure of error. The difference in sample means can be thought of as the effect of the independent variable. So what is the error? Maybe we could use the standard error? Nice guess...but, the standard error of the mean for each population represents only one population. Because we have two populations we need to account for the variance in both populations, or the variance in the difference between the populations.
Consider this: If you were to generate every sample of a certain size (n) from a population you end up with a sampling distribution of means, right? So, if you did this for both populations, but calculated the difference between every possible pair of sample means of a given sample size (n) you would end up with a sampling distribution of mean differences between independent groups. That is, if you sampled every possible combination of n scores from population A and every possible combination of n scores from population B, then calculated the difference between every possible pair of those sample means, you end up with a single distribution. This single distribution would be a distribution of mean differences (not sample means) between samples of size n.
Say that you generate a sample from population A and find it has a sample mean of 10, and you generate a sample of the same size from population B and find it has a sample mean of 11. The differences (subtracting B from A and A from B) are 10 – 11 = -1 and 11 – 10 = +1. If you continue to do this for every possible combination of n scores from population A and population B, you will find that some differences are greater than zero and some differences are less than zero, but overall the positive differences would cancel out the negative differences and vice versa. Also, after calculating every possible difference between sample means you would find that the resulting sampling distribution of mean differences between independent groups has a mean difference equal to the difference between the population means (μA - μB). The standard deviation of this sampling distribution of mean differences between independent groups is equal to the standard error of the mean difference between independent groups (or simply the standard error of the difference):
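In standard notation, the standard error of the difference (with known population variances) is:

```latex
\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}
```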
Each subscript (1 and 2) refers to a different population (I used A and B above). The standard error of the difference basically combines the variance from each independent population into one measure. Conceptually, the standard error of the difference is the 'average difference between sample means of a given sample size (n)'. Note that the standard error of the difference is actually just the summed standard error of the mean for each distribution:
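That is, because the standard error of the mean for each sampling distribution is σ/√n, squaring and adding the two standard errors under the radical yields the same value:

```latex
\sigma_{\bar{X}_1 - \bar{X}_2}
    = \sqrt{\sigma_{\bar{X}_1}^2 + \sigma_{\bar{X}_2}^2}
    = \sqrt{\left(\frac{\sigma_1}{\sqrt{n_1}}\right)^2
          + \left(\frac{\sigma_2}{\sqrt{n_2}}\right)^2}
```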
However, this formula is only useful when several conditions are met: (1) The population variances are known for both populations, which is almost never the case. (2) The variances in the two populations are equal, which, even if they are known, is almost never the case. (3) The sample sizes (n) are equal. If just one of these three conditions is not met you must estimate the standard error of the mean difference between independent groups (or simply estimate the standard error of the difference). Because you will almost never know σ2 for the populations, and because the variances will almost never be equal, you will most certainly need to estimate the standard error of the difference.
17.3 Pooled Variance and the Standard Error of the Difference
Calculating the estimated standard error of the difference is a two-step process. First, you calculate the pooled variance estimate, which is the combined average estimated variance for both populations based on samples of size n1 and n2:
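In standard notation, the pooled variance estimate is:

```latex
s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)}
```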
The subscripts 1 and 2 reference different samples. It does not matter which sample is labeled '1' and which sample is labeled '2', so long as you are consistent. The pooled variance estimate is the average estimated variability across the two independent samples. Because the product of an estimated population variance from a sample and the degrees of freedom of that sample is equal to the sum of squares for that sample, the pooled variance estimate can also be calculated as follows:
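Substituting the sum of squares of each sample, SS = (n − 1)s², gives the equivalent form:

```latex
s_p^2 = \frac{SS_1 + SS_2}{(n_1 - 1) + (n_2 - 1)} = \frac{SS_1 + SS_2}{n_1 + n_2 - 2}
```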
The pooled variance estimate is then used in calculating the standard error of the difference:
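In standard notation:

```latex
s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}
```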
This is the error for the test statistic in an independent group t-Test. It represents the estimated average difference between sample means of size n1 and n2 that were selected from independent populations.
17.4 Hypotheses in the Independent Groups Design
Recall from the preceding section that the sampling distribution of mean differences between independent samples will have a mean equal to the difference between the population means (μA - μB). Usually, it is assumed that this difference between the population means is equal to zero, which would indicate there is no effect of the independent variable (i.e., μA - μB = 0). Any deviation from zero in the difference between the two sample means is assumed to be due to the independent variable. Thus, under the null hypothesis the difference between means from two independent groups is generally expected to be zero:
H0: μ1 - μ2 = 0 which is the same as H0: μ1 = μ2
In short, if under the null hypothesis the two population means are expected to be equal, then the alternative hypothesis is:
H1: μ1 - μ2 ≠ 0 which is the same as H1: μ1 ≠ μ2
Thus, under the alternative hypothesis the difference between the two population means is not expected to be zero. The null and alternative hypotheses above reflect non-directional (two-tailed) hypotheses; that is, the alternative hypothesis states there would be some difference between the two means, but does not state specifically whether the mean of population A would be greater/less than the mean of population B. It is also possible to generate directional (one-tailed) hypotheses for an independent groups t-Test. For example, say I predict the mean of group 1 to be greater than the mean of group 2. The null and alternative hypotheses are:
H0: μ1 - μ2 ≤ 0 which is the same as H0: μ1 ≤ μ2
H1: μ1 - μ2 > 0 which is the same as H1: μ1 > μ2
The alternative hypothesis is predicting the mean of group 1 will be greater than the mean of group 2; hence, the difference between mean 1 and mean 2 will be greater than zero. The null hypothesis is stating that mean 1 will be less than or equal to mean 2; that is, the difference between mean 1 and mean 2 will be less than or equal to zero. Note that these hypotheses will be set up the same way for the correlated sample t-Test covered in the next chapter.
17.5 Degrees of Freedom in the Independent Groups t-Test
Why did I make a section dedicated to degrees of freedom? Because, to put it bluntly, this is the one thing I specifically cover in every statistics class and ~50% of students still manage to get it wrong. PAY ATTENTION MAN!
Recall that in Chapter 9 I said degrees of freedom are equal to n – 1 per independent sample. In an independent groups design you have two independent samples of subjects, and each group has its own n – 1 degrees of freedom. For example, say group/sample A has n = 10 people and group/sample B has n = 12 people. In group A, df = 10 – 1 = 9, and in group B, df = 12 – 1 = 11. When dealing with an independent groups t-Test (and when determining the critical t-Value) we need to account for the total degrees of freedom (dfT). Most students assume the total degrees of freedom are equal to the total number of subjects across both groups/samples minus one. Thus, most students guess that the degrees of freedom in this example would be df = nT – 1 = 22 – 1 = 21, because there are nT = 22 people. But that guess over-counts: there are only 20 total degrees of freedom!
Why are there 20 degrees of freedom and not 21? Remember, the total degrees of freedom are equal to n – 1 for each independent sample/group. Because there are two groups we need to account for the degrees of freedom in each group. Thus, there are 9 degrees of freedom in group A and 11 degrees of freedom in group B, so the total degrees of freedom are df = 9 + 11 = 20. An easier way to calculate the degrees of freedom in an independent groups design is df = nT – 2, where nT is the total number of subjects tested across both groups (22). Hence, dfT = 22 – 2 = 20.
17.6 Example of the Independent Groups t-Test
Okay, now we get to an example of the independent groups t-Test. Many colleges and universities require students to take a “Freshman Seminar” course, where new students are acclimated to college life and taught study skills and time-management skills. Let’s say at Faber College, all freshmen are required to take such a course and this course has always covered basic study skills. One year, a psychologist who conducts research in learning and cognition develops a new study technique where students acquire study skills working in groups, rather than studying alone. This new system should increase GPA, and the psychologist wants to examine whether this is the case.
The researcher randomly samples n = 10 freshmen and puts them into a traditional seminar course (Basic Study) and randomly samples n = 10 different freshmen and puts them into the new seminar course (Group Study). All students complete this course and at the end of their freshman year the GPAs from all 20 students are collected and the means compared between groups. The researcher predicts that the mean GPA in the Group Study condition will be greater than the mean GPA in the Basic Study condition. Thus, the hypotheses are:
H0: μGroup ≤ μBasic
H1: μGroup > μBasic
The data (GPAs) for both groups at the end of the semester are presented in the table below. You can see there are n = 10 different students in each condition and each GPA is measured to the nearest thousandth.
Before calculating any means and performing the t-Test, we first need to determine our critical t-Value. To determine tα you need the alpha level (α), whether you have a directional or non-directional hypothesis, and the degrees of freedom. Above, I stated that the researcher was predicting the Group Study condition to have a greater mean GPA than the Basic Study condition; thus, this is a directional (one-tailed) hypothesis. Say that the psychologist selects an alpha-level of α = .01. The total degrees of freedom are equal to the total number of subjects in this study minus two; thus, dfT = 20 – 2 = 18. From the t-Tables, the critical t-Value is tα = +2.552.
The steps for conducting the independent groups t-Test, which are going to be similar to those for the correlated samples t-test in the next chapter, are as follows:
Steps 1 – 3 have been alluded to earlier and the formulas displayed in previous sections, so they will not be elaborated on here. First, we need to calculate the means and sums of squares of each group. This is done in the table below, using the data from above:
Using the sums of squares, we can calculate the pooled variance estimate:
Now that we have the pooled variance estimate we can estimate the standard error of the mean difference between independent groups:
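With n = 10 per group and a pooled variance estimate of approximately 0.400 (a value consistent with the standard error reported in this section), the calculation works out as:

```latex
s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}
    = \sqrt{\frac{0.400}{10} + \frac{0.400}{10}} = \sqrt{0.080} \approx 0.283
```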
This value (0.283) is the estimated standard error of the difference for the t-Test; that is, it is the estimated average difference between sample means from the Basic Study and Group Study populations. The next step is to perform the independent groups t-Test and calculate our obtained t-Value:
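In standard notation, the independent groups t-Test takes the general form:

```latex
t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_{\bar{X}_1 - \bar{X}_2}}
```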
The numerator includes the actual difference between the two sample means and the hypothesized difference between the two population means. The difference between the two population means is usually assumed to be zero under the null hypothesis. In such cases, the term μ1 - μ2 is equal to zero and can be dropped from the t-Test:
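With the μ1 - μ2 term dropped, the test reduces to:

```latex
t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}
```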
The next step is substituting in the sample means and performing the t-Test. Now, this is very important: If you have a non-directional (two-tailed) hypothesis, then it does not matter which sample mean you subtract from the other. This is because, with a non-directional hypothesis, you are predicting 'some' difference, and that difference could be positive or negative. In contrast, when you have a directional (one-tailed) hypothesis, which we do in this example, it does matter which sample mean you subtract from the other! Follow these rules if you have a directional (one-tailed) hypothesis: (1) If you predict one sample mean to be greater than the other sample mean, then you are predicting a positive difference between the sample means and a positive obtained t-Value. The mean that you predict to be greater/larger than the other should be the sample mean with the subscript '1' (the one that comes first) in the numerator of the expression for the t-Test above. (2) If you predict one sample mean to be less than the other sample mean, then you are predicting a negative difference between the sample means and a negative obtained t-Value. The mean that you predict to be smaller/less than the other should be the sample mean with the subscript '1' (the one that comes first) in the numerator of the expression for the t-Test above. Please note that for both of these rules, it does not matter what you actually find in the means; the placement of the sample means into the t-Test is based on what you predict to find.
Substituting in our sample means and the estimated standard error of the mean difference calculated earlier, we have:
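With the Group Study mean in the first position (it was predicted to be larger), t = (2.930 − 2.700)/0.283 ≈ 0.813. As a quick arithmetic check, here is the substitution sketched in a few lines of Python, using only the summary values reported in this section:

```python
# Summary values reported in this section (not the raw GPA data).
mean_group = 2.930  # Group Study mean; predicted larger, so it comes first
mean_basic = 2.700  # Basic Study mean
se_diff = 0.283     # estimated standard error of the difference

# Directional (one-tailed) t-Test: predicted-larger mean minus the other.
t_obtained = (mean_group - mean_basic) / se_diff
print(round(t_obtained, 3))  # 0.813
```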
This obtained t-Value (t = 0.813) is our test statistic. We compare this value to our critical t-Value, which was tα = 2.552. Because the absolute value of our obtained value is smaller than the absolute value of our critical value, we conclude there is not enough evidence to claim a significant difference in GPA between the Basic Study condition and the Group Study condition. That is, the mean difference is not statistically significant. Therefore we retain the null hypothesis and make no decision regarding the alternative hypothesis. In layman’s terms, we might conclude that exposing freshman students to a new form of Group Study technique resulted in a small, non-significant increase in end-of-year GPAs compared to students taking a Basic Study skills course.
17.7 Reporting in the Literature
Several parameters need to be reported with the independent groups t-Test: (a) either both sample means or the mean difference between the sample means, (b) the obtained t-Value, (c) the estimated standard error of the mean difference between independent samples, (d) the total degrees of freedom, and (e) the alpha-level. Below, I present a generic example of how the results from Section 17.6 might be reported:
Twenty Faber College students were randomly sampled and randomly assigned into a Basic Study condition or into a Group Study condition, with n = 10 students per independent group. At the end of these students’ freshman year, the GPA of each student was recorded and the mean GPA in the Basic Study condition was compared to the mean GPA in the Group Study condition. An independent groups t-Test revealed a non-significant difference between the mean of the Basic Study condition (M = 2.700) and the Group Study condition (M = 2.930), t(18) = 0.813, SE = 0.283, p > .05 (one-tailed). Thus, students learning to study as a group had a small and non-significant increase in GPA.
17.8 Confidence Intervals Around the Mean Difference
Confidence intervals can be calculated around the difference between the sample means. In an independent groups t-Test, most researchers calculate a confidence interval around each sample mean. To do this, you estimate the standard error of the mean for each sample and, using the critical t-Value based on the degrees of freedom for that sample, multiply the estimated standard error by tα to get the confidence interval around each sample mean. By doing this you can see whether each sample mean lies within the confidence interval of the other sample mean. Others like to calculate a confidence interval around the mean difference between the sample means. To do this, you use the standard error of the difference and tα (based on the total degrees of freedom) from the independent groups t-Test. Thus, for the example from Section 17.6, we have (note, because α = .01, this would be the 99% confidence interval around the mean difference):
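As a sketch of that calculation in Python, using the mean difference (0.230), the standard error of the difference (0.283), and the critical t-Value from the test (tα = 2.552), per the approach described above:

```python
# Values from the worked example: mean difference, standard error of the
# difference, and the critical t-Value (df = 18, alpha = .01).
mean_diff = 2.930 - 2.700
se_diff = 0.283
t_crit = 2.552

# Confidence interval around the mean difference.
margin = t_crit * se_diff
lower, upper = mean_diff - margin, mean_diff + margin
print(round(lower, 3), round(upper, 3))  # -0.492 0.952

# Zero falls inside the interval, so the mean difference is not
# significantly different from the null-hypothesis difference of zero.
```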
Recall that the null hypothesis predicts that the difference between means will be equal to zero. If zero falls within this confidence interval we can assume that the mean difference is not significantly different from the expected mean difference of zero, which is the case in the example here.
17.9 Effect Size
One question that should arise is how much of an effect the independent variable had on the dependent variable; that is, from the example above, is a mean difference of 0.230 large or small? To know how much of an effect the independent variable had on the dependent variable (to know how different the two means really are from each other), we must calculate a measure of effect size. Normally, measures of effect size are more meaningful when there is a significant difference, but they can still be calculated for a non-significant difference, as in our example. The effect size measure called eta-squared (η2) is basically the same as r2, the coefficient of determination: the proportion of variance in the dependent variable attributable to the effect of the independent variable. The effect size is the ratio of total treatment variance ('effect') to total variance in a set of data:
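In symbols:

```latex
\eta^2 = \frac{SS_{Effect}}{SS_{Total}}
```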
To find the total sum of squares you need to calculate the grand mean over all scores in both groups. Thus, the grand mean is the sum of all scores divided by the total number of scores. In the example from Section 17.6 the sum of all the scores (all GPAs from both groups) was ΣX = 56.300 and the total number of scores (subjects) was nT = 20. Thus, the grand mean is 56.3/20 = 2.815. Note, because the two groups have equal n, this is the same value you would get by adding the mean of each condition and dividing by two.
The sum of squares total is found by subtracting the grand mean from each of the individual scores in the data set, squaring these differences, and adding the squared differences together. This is done in the table below. The sum of squares effect is calculated by subtracting the grand mean from the sample mean associated with each individual. For example, the student Stew was in the Group Study condition, and that condition had a mean of 2.930. The grand mean (2.815) is subtracted from the mean of Stew’s group (2.930), which results in a treatment effect of .115 for Stew. This is done for each individual in Stew’s group, and the same is done for each individual in the other group, with the grand mean being subtracted from the mean of that other group (2.700). Thus, Bob and all of the others in the Basic Study condition have a treatment effect of 2.700 - 2.815 = -.115. This is shown in the table below:
In the table above, I have also calculated the sum of squares error (SSError), which is the total variation in the dependent variable that cannot be attributed to the independent variable. This is calculated by subtracting the sample mean of a group from each individual’s score in that group. The sums of squares total, effect, and error are presented in the table above. Notice that SSTotal = SSEffect + SSError, which will always be true, within rounding error. Using these values, the effect size is:
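The sums of squares below are reconstructed approximately from the summary statistics in this section (each of the 20 students contributes a squared treatment effect of 0.115² = 0.013225, so SSEffect = 20(0.013225) = 0.2645, and SSError ≈ 7.208 is consistent with the pooled variance estimate):

```latex
\eta^2 = \frac{SS_{Effect}}{SS_{Total}} \approx \frac{0.2645}{7.4725} \approx .035
```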
Thus, about 3.5% of the variance in the dependent variable (GPA) can be accounted for by the effect of the independent variable (Study Condition). The proportion of variance that cannot be explained (error) is:
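That is:

```latex
1 - \eta^2 = 1 - .035 = .965
```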
Thus, about 96.5% of the variance cannot be explained by the effect of the independent variable. Of course, this process takes some time, and there is a simpler method for calculating eta-squared, which makes use of your total degrees of freedom and your obtained t-Value:
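The shortcut formula is:

```latex
\eta^2 = \frac{t^2}{t^2 + df_T}
```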
Using the total degrees of freedom and the obtained t-Value from earlier, eta-squared is:
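Substituting t = 0.813 and dfT = 18:

```latex
\eta^2 = \frac{0.813^2}{0.813^2 + 18} = \frac{0.661}{18.661} \approx .035
```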
Another popular index of effect size is Cohen’s d. There are several methods for calculating Cohen’s d, but the most appropriate is to divide the difference in the sample means by the estimated pooled standard deviation (square root of the pooled variance estimate). What is nice about Cohen’s d is that it provides a standardized measure that can be used to compare across different studies. Specifically, Cohen’s d is a measure of the standardized difference between sample means; that is, it is a distance between two sample means in standard deviations.
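In symbols:

```latex
d = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_p^2}}
```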
From our example, the estimated pooled standard deviation is:
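Taking the square root of the pooled variance estimate (approximately 0.400, consistent with the standard error calculated earlier):

```latex
s_p = \sqrt{s_p^2} \approx \sqrt{0.400} \approx 0.633
```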
Cohen’s d is:
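Substituting the sample means and the pooled standard deviation:

```latex
d = \frac{2.930 - 2.700}{0.633} = \frac{0.230}{0.633} \approx 0.363
```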
Thus, the two sample means in our example are separated by only 0.363 standard deviations.