From the preceding chapters, recall that the z-test and the one-sample t-Test are both used to compare a sample mean to a known or assumed population mean, with the intention of determining whether that sample mean differs significantly from that value. In both cases no independent variable was manipulated; rather, subjects in a sample were selected because they formed a unique group within a population, and that unique group was compared to the general population from which it came.
The limitation of the z-Test and one-sample t-Test is that they cannot be used to compare performance between levels of an independent variable. To do this you must use a bivariate test. In this chapter I will cover examples where an independent variable is manipulated between groups of individuals. The type of statistical analysis used in such cases is an independent groups t-Test.
The independent groups t-Test is used to determine whether two sample means, drawn from different populations, are statistically different. Before getting into the independent groups t-Test, I introduce a few basic assumptions about the population distributions that each sample came from and the sampling distributions of the means of those populations.
First, in an independent groups design there are different samples (groups) of subjects (hence the name, 'independent groups'), and each subject contributes data to only one group. You can visualize this scenario in the figure below, which presents two population distributions. Subjects in each population are assumed to be equivalent except for the level of the independent variable they receive.
Recall from Chapter 9 that if you were to generate all samples of a specific size (n) from a population and calculate the mean of each sample, those sample means would form the sampling distribution of the mean, which has a mean equal to the population mean (μ) and a standard deviation equal to the standard error of the mean.
Also, recall that inferential tests are ratios of some measure of effect to some measure of error. The difference in sample means can be thought of as the effect of the independent variable. So what is the error? Maybe we could use the standard error? Nice guess...but the standard error of the mean for each population represents only one population. Because we have two populations, we need to account for the variance in both populations, or the variance in the difference between the populations.
Consider this: If you were to generate every possible combination of n scores from population A and every possible combination of n scores from population B, then calculated the difference between every possible pair of those sample means, you would end up with a single distribution. This single distribution would be a distribution of mean differences (not sample means) between samples of size n.
Say that you generate a sample from population A and find it has a sample mean of 10, and you generate a sample of the same size from population B and find it has a sample mean of 11. The differences (subtracting B from A and A from B) are 10 – 11 = –1 and 11 – 10 = +1. If you continue to do this for every possible combination of sample means, you create the sampling distribution of mean differences. The standard deviation of that sampling distribution is called the standard error of the difference (σ_{X̄_{1} – X̄_{2}}).
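The process of building a distribution of mean differences can be sketched in a few lines of code. This is an illustrative toy example; the populations and sample size here are hypothetical, not from the text:

```python
from itertools import combinations

# Hypothetical miniature populations (real populations would be far larger)
pop_a = [8, 10, 12]
pop_b = [9, 11, 13]
n = 2  # sample size drawn from each population

# Every possible sample mean of size n from each population
means_a = [sum(s) / n for s in combinations(pop_a, n)]
means_b = [sum(s) / n for s in combinations(pop_b, n)]

# Every possible difference between a sample mean from A and one from B
mean_diffs = [ma - mb for ma in means_a for mb in means_b]

# The mean of this distribution of mean differences equals the
# difference between the population means (10 - 11 = -1)
print(sum(mean_diffs) / len(mean_diffs))  # -1.0
```

Note that the mean of the distribution of mean differences lands exactly on the difference between the population means, which is the property the t-Test relies on.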
Each subscript (1 and 2) refers to a different population (I used A and B above). The standard error of the difference basically combines the variance from each independent population into one measure. Conceptually, the standard error of the difference is the average difference between sample means of a given sample size (n) that is expected due to sampling error alone. It can be written as:

σ_{X̄_{1} – X̄_{2}} = √(σ²_{X̄_{1}} + σ²_{X̄_{2}})

or

σ_{X̄_{1} – X̄_{2}} = √(σ²_{1}/n_{1} + σ²_{2}/n_{2})
However, this formula is only useful when several conditions are met: (1) The population variances are known for both populations, which is almost never the case. (2) The variances in each population are equal, which, even if known, is almost never the case. (3) The n's are equal. If just one of these three conditions is not met, you must estimate the standard error of the difference. Because you will almost never know σ^{2} for the populations and because the n's will almost never be equal, you will most certainly need to estimate the standard error of the difference.
Calculating the estimated standard error of the difference is a two-step process. First, you calculate the pooled variance estimate from the two samples of sizes n_{1} and n_{2}:

ŝ²_{p} = [(n_{1} – 1)ŝ²_{1} + (n_{2} – 1)ŝ²_{2}] / [(n_{1} – 1) + (n_{2} – 1)]
The subscripts 1 and 2 reference different samples. It does not matter which sample is labeled '1' and which sample is labeled '2', so long as you are consistent. The pooled variance estimate is the average estimated variability across the two independent samples. Because the product of an estimated population variance from a sample and the degrees of freedom of that sample is equal to the sum of squares for that sample, the pooled variance estimate can also be calculated as follows:

ŝ²_{p} = (SS_{1} + SS_{2}) / (n_{1} + n_{2} – 2)
The pooled variance estimate is then used in calculating the estimated standard error of the difference:

ŝ_{X̄_{1} – X̄_{2}} = √(ŝ²_{p}/n_{1} + ŝ²_{p}/n_{2})
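The two-step calculation can be sketched as follows; the sums of squares and group sizes below are placeholder values chosen for illustration, not the textbook's data:

```python
import math

def pooled_variance(ss1, ss2, n1, n2):
    """Pooled variance estimate: (SS1 + SS2) / (n1 + n2 - 2)."""
    return (ss1 + ss2) / (n1 + n2 - 2)

def se_difference(sp2, n1, n2):
    """Estimated standard error of the difference between independent means."""
    return math.sqrt(sp2 / n1 + sp2 / n2)

# Step 1: pool the variability from both samples (hypothetical SS values)
sp2 = pooled_variance(ss1=3.6, ss2=3.6, n1=10, n2=10)  # 0.4

# Step 2: convert the pooled variance into a standard error
se = se_difference(sp2, n1=10, n2=10)  # about 0.283
```

With equal n's, pooling simply averages the two variance estimates; with unequal n's, the larger sample gets proportionally more weight.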
This is the error term for the test statistic in an independent groups t-Test. It represents the estimated average difference between sample means of sizes n_{1} and n_{2} that is expected from sampling error alone.
Recall from the preceding section that the sampling distribution of the mean differences between independent samples will have a mean equal to the difference between the population means (μ_{1} – μ_{2}). Under the null hypothesis, the two population means are expected to be equal, so this difference is expected to be zero:

H_{0}: μ_{1} – μ_{2} = 0, or equivalently, H_{0}: μ_{1} = μ_{2}
In short, if under the null hypothesis the two population means are expected to be equal, then the alternative hypothesis is:

H_{1}: μ_{1} ≠ μ_{2}
Thus, under the alternative hypothesis the difference between the two population means is non-zero: H_{1}: μ_{1} – μ_{2} ≠ 0. If a directional (one-tailed) prediction is made, the hypotheses might be: H_{1}: μ_{1} – μ_{2} > 0 and H_{0}: μ_{1} – μ_{2} ≤ 0. The alternative hypothesis is predicting the mean of group 1 will be greater than the mean of group 2; hence, the difference between mean 1 and mean 2 will be greater than zero. The null hypothesis is stating that mean 1 will be less than or equal to mean 2; that is, the difference between mean 1 and mean 2 will be less than or equal to zero. Note that these hypotheses will be set up the same way for the correlated samples t-Test covered in the next chapter.
Why did I make a section dedicated to degrees of freedom? Because, to put it bluntly, this is the one thing I specifically cover in every statistics class and ~50% of students still manage to get it wrong.
Recall in Chapter 9 I said that degrees of freedom are equal to n – 1. You might therefore assume the degrees of freedom here are n_{T} – 1 = 22 – 1 = 21, because there are n_{T} = 22 people. But in this case there are only 20 total degrees of freedom, so you would have counted one too many!
Why are there 20 degrees of freedom and not 21? Remember, the total degrees of freedom are equal to the sum of the degrees of freedom in each group: df = (n_{1} – 1) + (n_{2} – 1) = n_{1} + n_{2} – 2. With 11 subjects per group, df = (11 – 1) + (11 – 1) = 20. One degree of freedom is lost for each sample mean that must be estimated, and there are two sample means in an independent groups design.
Okay, now we get to an example of the independent groups t-Test. Many colleges and universities require students to take a “Freshman Seminar” course, where new students are acclimated to college life and taught study skills and time-management skills. Let’s say at Faber College, all freshmen are required to take such a course and this course has always covered basic study skills. One year, a psychologist who conducts research in learning and cognition develops a new study technique where students acquire study skills working in groups, rather than studying alone. This new system should increase GPA and the psychologist wants to examine whether this is the case.
The researcher randomly samples 20 incoming freshmen and randomly assigns half to a Basic Study condition and half to a Group Study condition. Because the psychologist predicts the new technique will increase GPA, the hypotheses are directional: H_{1}: μ_{Group} – μ_{Basic} > 0 and H_{0}: μ_{Group} – μ_{Basic} ≤ 0.
The data (GPAs) for both groups at the end of the semester are presented in the table below. You can see there are n = 10 students in each independent group.
Before calculating any means and performing the t-Test, we first need to determine our critical t-Value. To determine t_{crit}, we need the total degrees of freedom (df = n_{1} + n_{2} – 2 = 10 + 10 – 2 = 18) and an alpha level (α = .05). For a one-tailed test with df = 18 and α = .05, the critical t-Value from the t-tables is 1.734.
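If you prefer software to t-tables, the same critical value can be obtained from a t-distribution function; this sketch assumes SciPy is installed:

```python
from scipy.stats import t

df = 18       # n1 + n2 - 2 = 10 + 10 - 2
alpha = 0.05  # one-tailed

# Critical t: the value with (1 - alpha) of the distribution below it
t_crit = t.ppf(1 - alpha, df)
print(round(t_crit, 3))  # 1.734
```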
The steps for conducting the independent groups t-Test, which are going to be similar to those for the correlated samples t-test in the next chapter, are as follows: - Determine the mean and sum of squares for each sample
- Using the sums of squares, calculate the pooled variance estimate
- Estimate the standard error of the mean difference between independent groups
- Conduct the independent groups t-Test
- Determine the significance of the t-Test and make decisions about hypotheses.
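The five steps above can be collected into a single function. The GPA lists below are hypothetical stand-ins for illustration (the textbook's data table is not reproduced here):

```python
import math

def independent_t(group1, group2):
    """Independent groups t-Test following the five steps (stdlib only)."""
    n1, n2 = len(group1), len(group2)
    m1 = sum(group1) / n1
    m2 = sum(group2) / n2
    # Step 1: mean and sum of squares for each sample
    ss1 = sum((x - m1) ** 2 for x in group1)
    ss2 = sum((x - m2) ** 2 for x in group2)
    # Step 2: pooled variance estimate
    sp2 = (ss1 + ss2) / (n1 + n2 - 2)
    # Step 3: estimated standard error of the mean difference
    se = math.sqrt(sp2 / n1 + sp2 / n2)
    # Step 4: obtained t-Value (null-hypothesis difference assumed to be 0)
    t_obt = (m1 - m2) / se
    # Step 5: compare t_obt to the critical t at these degrees of freedom
    return t_obt, n1 + n2 - 2

# Hypothetical GPA data, n = 5 per group for brevity
basic = [2.5, 2.7, 2.9, 2.6, 2.8]
group = [2.8, 3.0, 2.9, 3.1, 2.7]
t_obt, df = independent_t(group, basic)
```

The function returns the obtained t-Value and the total degrees of freedom, which is everything needed for the significance decision in Step 5.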
Steps 1 – 3 have been alluded to earlier and the formulas displayed in previous sections, so they will not be elaborated on here. First, we need to calculate the means and sums of squares of each group. This is done in the table below, using the data from above:
Using the sums of squares, we can calculate the pooled variance estimate:
Now that we have the pooled variance estimate we can estimate the standard error of the mean difference between independent groups:
This value (0.283) is the estimated standard error of the difference for the t-Test; that is, it is the estimated average difference, due to sampling error, between sample means for the Basic Study condition and Group Study condition. The next step is to perform the independent groups t-Test and calculate our obtained t-Value:

t = [(X̄_{1} – X̄_{2}) – (μ_{1} – μ_{2})] / ŝ_{X̄_{1} – X̄_{2}}
The numerator includes the actual difference between the two sample means and the hypothesized difference between the two population means. The difference between the two population means is usually assumed to be zero under the null hypothesis. In such cases, the term (μ_{1} – μ_{2}) drops out of the numerator, and the formula reduces to:

t = (X̄_{1} – X̄_{2}) / ŝ_{X̄_{1} – X̄_{2}}
The next step is substituting in the sample means and performing the t-Test. Substituting our sample means and the estimated standard error of the mean difference calculated earlier, we have:

t = (2.930 – 2.700) / 0.283 = 0.813
This obtained t-Value (0.813) is less than the critical t-Value (1.734); thus, we fail to reject the null hypothesis and conclude the mean difference between conditions is not statistically significant.
Several parameters need to be reported with the independent groups t-Test: (a) either both sample means or the mean difference between the sample means, (b) the obtained t-Value, (c) the estimated standard error of the mean difference between independent samples, (d) the total degrees of freedom, and (e) the alpha-level. Below, I present a generic example of how the results from Section 17.5 would be written up: Results Twenty Faber College students were randomly sampled and randomly assigned into a Basic Study condition or into a Group Study condition, with n = 10 students per independent group. At the end of these students’ freshman year, the GPA of each student was recorded and the mean GPA in the Basic Study condition was compared to the mean GPA in the Group Study condition. An independent groups t-Test revealed a non-significant difference between the mean of the Basic Study condition (M = 2.700) and the mean of the Group Study condition (M = 2.930), t(18) = 0.813, SE = 0.283, p > .05.
Confidence intervals can be calculated around the difference between the sample means. In an independent groups t-Test, most researchers calculate a confidence interval around each sample mean. To do this, you estimate the standard error of the mean for each sample and, using the critical t-Value based on the degrees of freedom for that sample, multiply that estimated standard error by the critical t-Value; the product is then added to and subtracted from the sample mean to obtain the interval limits. A confidence interval can also be placed around the mean difference itself, using the estimated standard error of the difference: CI = (X̄_{1} – X̄_{2}) ± t_{crit}(ŝ_{X̄_{1} – X̄_{2}}).
Recall that the null hypothesis predicts that the difference between means will be equal to zero. If zero falls within this confidence interval we can assume that the mean difference is not significantly different from the expected mean difference of zero, which is the case in the example here.
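A sketch of the confidence interval around the mean difference, using the example's mean difference (0.230) and standard error (0.283); the two-tailed critical t for df = 18 at α = .05 is about 2.101:

```python
def ci_mean_difference(mean_diff, se_diff, t_crit):
    """Confidence interval around the difference between two sample means."""
    margin = t_crit * se_diff
    return (mean_diff - margin, mean_diff + margin)

lo, hi = ci_mean_difference(mean_diff=0.230, se_diff=0.283, t_crit=2.101)

# Zero falls inside the interval, so the difference is not significant
print(lo < 0 < hi)  # True
```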
One question that should arise is how much of an effect the independent variable had on the dependent variable; that is, from the example above, is a mean difference of 0.230 large or small? To know how much of an effect the independent variable had on the dependent variable, we calculate an effect size. Eta-squared (η^{2}) is basically the same as r^{2}, the coefficient of determination: the proportion of variance in the dependent variable attributable to the effect of the independent variable. The effect size is the ratio of total treatment variance ('effect') to total variance in a set of data:

η^{2} = SS_{Effect} / SS_{Total}
To find the total sum of squares you first need to calculate the grand mean, which is the mean of all n_{T} = 20 scores in the data set. The sum of all 20 GPAs is 56.3; thus, the grand mean is 56.3/20 = 2.815. Note, this is the same value you would get by adding the mean of each condition and dividing by two, because the group sizes are equal.
The sum of squares total is found by subtracting the grand mean from each of the individual scores in the data set, squaring these differences, and adding the squared differences. This is done in the table below. The sum of squares effect is calculated by subtracting the grand mean from the sample mean associated with each individual, squaring these differences, and summing them. For example, the student Stew was in the Group Study condition, and that condition had a mean of 2.930. The grand mean (2.815) is subtracted from the mean of Stew’s group (2.930), which results in a difference of 0.115; this value is squared and contributes to SS_{Effect}.
In the table above, I have also calculated the sum of squares error (SS_{Error}), which is the total variation in the dependent variable that cannot be attributed to the independent variable. This is calculated by subtracting the sample mean of a group from each individual’s score in that group, squaring these differences, and summing them. The sums of squares total, effect, and error are presented in the table above. Notice that SS_{Total} = SS_{Effect} + SS_{Error}, which will be true all the time, within rounding error. Using these values, the effect size is:

η^{2} = SS_{Effect} / SS_{Total} ≈ .035
Thus, about 3.5% of the variance in the dependent variable (GPA) can be accounted for by the effect of the independent variable (Study Condition). The proportion of variance that cannot be accounted for by the independent variable is equal to 1 – η^{2}:
Thus, about 96.5% of the variance in the dependent variable cannot be attributed to the independent variable; this is error variance.
Using the total degrees of freedom and the obtained t-Value, eta-squared can also be calculated directly:

η^{2} = t^{2} / (t^{2} + df) = 0.813^{2} / (0.813^{2} + 18) ≈ .035
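Both routes to eta-squared can be sketched as small functions; the t-Value of 0.813 below is the one implied by the example's mean difference (0.230) and standard error (0.283):

```python
def eta_squared_from_ss(ss_effect, ss_total):
    """Eta-squared as the ratio of effect variability to total variability."""
    return ss_effect / ss_total

def eta_squared_from_t(t_obt, df):
    """Equivalent shortcut using the obtained t-Value and total df."""
    return t_obt ** 2 / (t_obt ** 2 + df)

eta2 = eta_squared_from_t(0.813, 18)
print(round(eta2, 3))  # 0.035
```

The shortcut is convenient when you have the t-Test output but not the full sums-of-squares table.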
Another popular index of effect size is Cohen’s d, which expresses the difference between the two sample means in units of the pooled standard deviation: d = (X̄_{1} – X̄_{2}) / ŝ_{p}, where ŝ_{p} is the square root of the pooled variance estimate. From our example, the estimated pooled standard deviation is: ŝ_{p} ≈ 0.633.
Cohen’s d is then: d = 0.230 / 0.633 ≈ 0.363. Thus, the two sample means in our example are separated by only 0.363 standard deviations.
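Cohen's d from summary values can be sketched as below; the pooled variance of 0.40 is the value implied by the example's standard error (0.283) with n = 10 per group, so treat it as an approximation:

```python
import math

def cohens_d(mean_diff, pooled_variance):
    """Cohen's d: the mean difference in pooled-standard-deviation units."""
    return mean_diff / math.sqrt(pooled_variance)

d = cohens_d(mean_diff=0.230, pooled_variance=0.40)
print(round(d, 2))  # 0.36
```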