Independent samples t-test assumptions

  1. The two independent samples are simple random samples from two distinct populations.

  2. For the two distinct populations:

    • if the sample sizes are small, the distributions are important (should be normal)

    • if the sample sizes are large, the distributions are not important (need not be normal)


Note: The test comparing two independent population means with unknown and possibly unequal population standard deviations is called the Aspin-Welch t-test. The degrees of freedom formula was developed by Aspin and Welch.


The comparison of two population means is very common. A difference between the two samples depends on both the means and the standard deviations. Very different means can occur by chance if there is great variation among the individual samples. In order to account for the variation, we take the difference of the sample means, X1 - X2, and divide by the standard error in order to standardize the difference. The result is a t-score test statistic.

Because we do not know the population standard deviations, we estimate them using the two sample standard deviations from our independent samples. For the hypothesis test, we calculate the estimated standard deviation, or standard error, of the difference in sample means, X1 - X2.

The standard error is: √( s1²/n1 + s2²/n2 )

The test statistic (t-score) is calculated as follows:

t = ( (X1 - X2) - (μ1 - μ2) ) / √( s1²/n1 + s2²/n2 )

Where: s1 and s2, the sample standard deviations, are estimates of σ1 and σ2, respectively. σ1 and σ2 are the unknown population standard deviations. X1 and X2 are the sample means. μ1 and μ2 are the population means. n1 and n2 are the sample sizes.
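As a minimal sketch of the standardization described above, the following Python computes the unpooled standard error and the resulting t-score from summary statistics. The function names are illustrative, not part of any library, and the inputs are the girls' and boys' summary values from the example later in this section.

```python
import math

def standard_error(s1, n1, s2, n2):
    """Unpooled standard error of the difference in sample means."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

def t_statistic(mean1, s1, n1, mean2, s2, n2, hypothesized_diff=0.0):
    """Difference in sample means, standardized by the standard error."""
    se = standard_error(s1, n1, s2, n2)
    return ((mean1 - mean2) - hypothesized_diff) / se

# Summary values from the example in this section:
# girls: mean 2, s = 0.866, n = 9; boys: mean 3.2, s = 1, n = 16.
se = standard_error(0.866, 9, 1.0, 16)
t = t_statistic(2.0, 0.866, 9, 3.2, 1.0, 16)
print(f"standard error = {se:.4f}, t = {t:.2f}")
```

The t value should come out near –3.14, in line with the test statistic reported in the example.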

The number of degrees of freedom (df) requires a somewhat complicated calculation. However, a computer or calculator calculates it easily. The df is not always a whole number. The test statistic calculated previously is approximated by the Student’s t-distribution with df as follows:

df = ( s1²/n1 + s2²/n2 )² / [ (1/(n1 - 1))(s1²/n1)² + (1/(n2 - 1))(s2²/n2)² ]

When both sample sizes n1 and n2 are five or larger, the Student’s t approximation is very good. Notice that the sample variances s1² and s2² are not pooled. (If the question comes up, do not pool the variances.)
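The degrees of freedom calculation can be sketched in a few lines of Python. This assumes the df formula meant here is the usual unpooled (Welch-Satterthwaite) approximation; the function name `welch_df` is illustrative, and the inputs are the summary values from the example later in this section.

```python
def welch_df(s1, n1, s2, n2):
    """Approximate degrees of freedom for the unpooled two-sample t-test."""
    v1 = s1**2 / n1  # estimated variance of the first sample mean
    v2 = s2**2 / n2  # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# girls: s = 0.866, n = 9; boys: s = 1, n = 16
df = welch_df(0.866, 9, 1.0, 16)
print(f"df = {df:.2f}")
```

The result is not a whole number (about 18.85 for these inputs), which is expected for this approximation.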


Note: It is not necessary to compute the degrees of freedom by hand. A calculator or computer easily computes it.

__________________________________

EXAMPLE

The average amount of time boys and girls aged seven to 11 spend playing sports each day is believed to be the same. A study is done and data are collected, resulting in the data in the table below. Each population has a normal distribution.

          Sample size   Average hours playing sports per day   Sample standard deviation
  Girls   9             2                                      0.866
  Boys    16            3.2                                    1.00


Is there a difference in the mean amount of time boys and girls aged seven to 11 play sports each day? Test at the 5% level of significance.

Solution:

The population standard deviations are not known. Let g be the subscript for girls and b be the subscript for boys. Then, μg is the population mean for girls and μb is the population mean for boys. This is a test of two independent groups, two population means.

Random variable: Xg - Xb = difference in the sample mean amount of time girls and boys play sports each day.


H0: μg = μb; equivalently, H0: μg − μb = 0

Ha: μg ≠ μb; equivalently, Ha: μg − μb ≠ 0

The words “the same” tell you H0 has an equal sign. Since there are no other words to indicate Ha, assume it says “is different.” This is a two-tailed test.

Distribution for the test: Use a Student’s t-distribution with df degrees of freedom, where df is calculated using the df formula for independent groups, two population means. Using a calculator, df is approximately 18.8462. Do not pool the variances.

Calculate the p-value using a Student’s t-distribution: p-value = 0.0054

sg = 0.866

sb = 1

So, Xg - Xb = 2 - 3.2 = -1.2
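For reference, the arithmetic behind the test statistic can be written out as follows. This is a worked sketch using the sample values above together with the sample sizes used in this example (9 girls and 16 boys), with the hypothesized difference under H0 equal to 0.

```latex
t = \frac{(\bar{x}_g - \bar{x}_b) - 0}{\sqrt{\dfrac{s_g^2}{n_g} + \dfrac{s_b^2}{n_b}}}
  = \frac{2 - 3.2}{\sqrt{\dfrac{0.866^2}{9} + \dfrac{1^2}{16}}}
  \approx \frac{-1.2}{0.3819}
  \approx -3.14
```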

Half the p-value is below –1.2 and half is above 1.2.

Make a decision: Since α > p-value, reject H0. This means you reject μg = μb. The means are different.
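The p-value and decision can be checked without a calculator. The sketch below uses only the Python standard library and approximates the area under the Student's t density by numerical integration (Simpson's rule); that integration approach and the helper names `t_pdf` and `two_tailed_p` are assumptions made for illustration, not the method a calculator necessarily uses.

```python
import math

def t_pdf(x, df):
    """Density of the Student's t-distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=10_000):
    """Two-tailed p-value: 1 minus the area between -|t| and |t|."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3  # doubled, since the density is symmetric
    return 1 - central_area

# Summary values from this example (girls, then boys).
mean_g, s_g, n_g = 2.0, 0.866, 9
mean_b, s_b, n_b = 3.2, 1.0, 16
alpha = 0.05

v_g, v_b = s_g**2 / n_g, s_b**2 / n_b
t = (mean_g - mean_b) / math.sqrt(v_g + v_b)  # unpooled t statistic
df = (v_g + v_b) ** 2 / (v_g**2 / (n_g - 1) + v_b**2 / (n_b - 1))
p_value = two_tailed_p(t, df)

decision = "reject H0" if p_value < alpha else "do not reject H0"
print(f"t = {t:.2f}, df = {df:.2f}, p-value = {p_value:.4f}: {decision}")
```

The printed p-value should agree with the 0.0054 reported above to about four decimal places, which leads to the same decision at α = 0.05.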

  • Press STAT.

  • Arrow over to TESTS and press 4:2-SampTTest.

  • Arrow over to Stats and press ENTER.

  • Arrow down and enter 2 for the first sample mean, 0.866 for Sx1, 9 for n1, 3.2 for the second sample mean, 1 for Sx2, and 16 for n2.

  • Arrow down to μ1: and arrow to does not equal μ2.

  • Press ENTER.

  • Arrow down to Pooled: and select No.

  • Press ENTER.

  • Arrow down to Calculate and press ENTER.

The p-value is p = 0.0054, the df is approximately 18.8462, and the test statistic is –3.14.

Do the procedure again but instead of Calculate do Draw.

Conclusion: At the 5% level of significance, the sample data show there is sufficient evidence to conclude that the mean number of hours that girls and boys aged seven to 11 play sports per day is different (mean number of hours boys aged seven to 11 play sports per day is greater than the mean number of hours played by girls OR the mean number of hours girls aged seven to 11 play sports per day is greater than the mean number of hours played by boys).

__________________________________

References:

  1. https://courses.lumenlearning.com/introstats1/chapter/two-population-means-with-unknown-standard-deviations/
