1. Concepts & Definitions
1.1. A Review on Parametric Statistics
1.2. Parametric tests for Hypothesis Testing
1.3. Parametric vs. Non-Parametric Test
1.5. One-sample z-test and its relation to the two-sample z-test
1.6. One-sample t-test and its relation to the two-sample t-test
1.6. Welch's two-sample t-test: two populations with different variances
1.7. Non-Parametric test for Hypothesis Testing: Mann-Whitney U Test
1.8. Non-Parametric test for Hypothesis Testing: Wilcoxon Sign-Rank Test
1.9. Non-Parametric test for Hypothesis Testing: Wilcoxon Sign Test
1.10. Non-Parametric test for Hypothesis Testing: Chi-Square Goodness-of-Fit
1.11. Non-Parametric test for Hypothesis Testing: Kolmogorov-Smirnov
1.12. Non-Parametric tests for comparing machine learning models
2. Problem & Solution
2.1. Using Wilcoxon Sign Test to compare clustering methods
2.2. Using Wilcoxon Sign-Rank Test to compare clustering methods
2.3. What is A/B testing and how to combine with hypothesis testing?
2.4. Using the Chi-Square Goodness-of-Fit test to check whether Benford's Law holds
2.5. Using the Kolmogorov-Smirnov test to check whether the Pareto principle holds
What is a Welch's t-test?
In statistics, Welch's t-test, or unequal variances t-test, is a two-sample location test used to test the (null) hypothesis that two populations have equal means. Named after its creator, Bernard Lewis Welch, it is an adaptation of Student's t-test that is more reliable when the two samples have unequal variances and possibly unequal sample sizes. These tests are often referred to as "unpaired" or "independent samples" t-tests, as they are typically applied when the statistical units underlying the two samples being compared are non-overlapping.
Comparing Student's versus Welch's Test
When we want to compare the means of two independent groups, where both groups of data are sampled from populations that follow a normal distribution, we can choose between two different tests [1]:
Student’s t-test: this test assumes that both populations have the same variance.
Welch’s t-test: this test does not assume that those two populations have the same variance.
The main difference between the two tests is the way the degrees of freedom (df) are computed, as shown in the next subsection.
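To illustrate the difference in practice, SciPy's stats.ttest_ind exposes both tests through its equal_var flag. Below is a minimal sketch on synthetic data; the sample sizes, means, and spreads are illustrative assumptions, not data from this article:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Two independent normal samples with unequal variances and sizes (illustrative)
a = rng.normal(loc=10.0, scale=1.0, size=21)
b = rng.normal(loc=9.5, scale=2.0, size=25)

# Student's t-test: pools the two variances, df = n1 + n2 - 2
t_student, p_student = stats.ttest_ind(a, b, equal_var=True)
# Welch's t-test: no pooling, Welch-Satterthwaite degrees of freedom
t_welch, p_welch = stats.ttest_ind(a, b, equal_var=False)

print(f"Student: t = {t_student:.3f}, p = {p_student:.4f}")
print(f"Welch:   t = {t_welch:.3f}, p = {p_welch:.4f}")
```

When the variances may differ, Welch's version (equal_var=False) is generally the safer default.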
Welch's t-test
Suppose we are given two independent, normally distributed populations, and we have drawn a random sample from each. Here, we consider μ1 and μ2 to be the population means, and x̄1 and x̄2 to be the observed sample means. Our hypotheses are:
Null hypothesis (Ho): There is no difference between the means, i.e., μ1 - μ2 = 0.
Alternate hypothesis (Ha): The means differ, i.e., μ1 - μ2 ≠ 0.
And the formula for calculating the t-test score tobs is given by [2, 3]:
tobs = (x̄1 – x̄2) / √(s1²/n1 + s2²/n2)
Where x̄1 = first sample mean, x̄2 = second sample mean, s1 = first sample standard deviation, s2 = second sample standard deviation, n1 = first sample size, n2 = second sample size, and df (degrees of freedom) = (s1²/n1 + s2²/n2)² / { [ (s1²/n1)² / (n1 – 1) ] + [ (s2²/n2)² / (n2 – 1) ] }.
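As a sanity check on this df formula, the Welch–Satterthwaite degrees of freedom always lie between min(n1, n2) − 1 and n1 + n2 − 2 (the pooled df of Student's test). The sketch below re-implements the formula above and verifies these bounds numerically on random inputs; the helper name welch_df and the random ranges are assumptions for illustration:

```python
import numpy as np

def welch_df(s1_sq, s2_sq, n1, n2):
    # Welch-Satterthwaite degrees of freedom; s1_sq and s2_sq are sample variances
    num = (s1_sq / n1 + s2_sq / n2) ** 2
    den = (s1_sq / n1) ** 2 / (n1 - 1) + (s2_sq / n2) ** 2 / (n2 - 1)
    return num / den

rng = np.random.default_rng(0)
for _ in range(1000):
    n1, n2 = (int(n) for n in rng.integers(2, 50, size=2))
    v1, v2 = rng.uniform(0.1, 10.0, size=2)
    df = welch_df(v1, v2, n1, n2)
    # df is bounded by the smaller single-sample df and the pooled df
    assert min(n1, n2) - 1 <= df + 1e-9
    assert df <= n1 + n2 - 2 + 1e-9
print("df bounds verified on 1000 random cases")
```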
A numerical example of a two-sample Welch's t-test
Let’s consider that the first factory shares 21 samples of ball bearings where the mean diameter of the sample comes out to be 10.5 cm. On the other hand, the second factory shares 25 samples with a mean diameter of 9.5 cm. The first sample has a standard deviation of 1 cm, but the second has 2 cm:
Step 1: Pose the research question and determine the proper statistical test.
The company wants to determine whether the mean diameter of the ball bearings produced by Factory 1 differs from that of Factory 2. To do this, we will use a two-sample Welch's t-test for means.
Step 2: Obtain the samples statistics from the two factories (populations).
Factory 1: x̄1 = 10.5, s1 = 1.
Factory 2: x̄2 = 9.5, s2 = 2.
Step 3: Formulate the null and alternate hypotheses and set the level of significance for the test.
Null hypothesis (Ho): There is no difference between the mean diameters of the ball bearings produced by the two factories, i.e., μ1 - μ2 = 0.
Alternate hypothesis (Ha): The mean diameters differ, i.e., μ1 - μ2 ≠ 0.
We will perform a two-tailed test using a significance level α = 5%.
df (degrees of freedom) = (s1²/n1 + s2²/n2)² / { [ (s1²/n1)² / (n1 – 1) ] + [ (s2²/n2)² / (n2 – 1) ] } = (1/21 + 4/25)² / { [ (1/21)² / 20 ] + [ (4/25)² / 24 ] } = 36.529, which we round down to 36. (Note that the variances s1² = 1 and s2² = 4 enter the formula, not the standard deviations.)
Step 4: Use the formula for the two-sample t-test for means to calculate the t-test statistic tobs.
tobs = (x̄1 – x̄2) / √(s1²/n1 + s2²/n2)
tobs = (10.5 – 9.5) / √((1)²/21 + (2)²/25)
tobs = 2.195
Step 5: Compare tobs with the critical value t α/2 from a t-table [4]. For df = 36 and α = 5% (two-tailed), t α/2 = 2.028.
Step 6: Since tobs = 2.195 is greater than the critical value t α/2 = 2.028, we can reject the Null hypothesis.
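Equivalently, instead of comparing tobs against a tabulated critical value, we can compute a two-tailed p-value directly from the sample statistics. A minimal sketch with SciPy follows; note that the squared standard deviations (the variances) enter the df formula:

```python
from scipy import stats

# Sample statistics from the factory example above
x1, s1, n1 = 10.5, 1.0, 21
x2, s2, n2 = 9.5, 2.0, 25

se_sq = s1**2 / n1 + s2**2 / n2   # squared standard error of the mean difference
t_obs = (x1 - x2) / se_sq**0.5
df = se_sq**2 / ((s1**2 / n1)**2 / (n1 - 1) + (s2**2 / n2)**2 / (n2 - 1))

# Two-tailed p-value: P(|T| >= |t_obs|) under the null hypothesis
p_value = 2 * stats.t.sf(abs(t_obs), df)
print(f"t = {t_obs:.3f}, df = {df:.2f}, p = {p_value:.4f}")
# Reject H0 at alpha = 0.05 whenever p_value < 0.05
```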
A numerical example of a two-sample Welch's t-test in Python
First, let's create a helper function to compute the Welch's t-test degrees of freedom (the Welch–Satterthwaite equation). Note that the function takes the sample variances (s², not the standard deviations s) as its first two arguments.
import numpy as np

def welchs_degrees_of_freedom(s1_sq, s2_sq, n1, n2):
    numerator = (s1_sq/n1 + s2_sq/n2)**2
    denominator = ((s1_sq/n1)**2 / (n1 - 1)) + ((s2_sq/n2)**2 / (n2 - 1))
    df = numerator / denominator
    return df

# Example of use:
#s1_sq = np.var([1, 2, 3, 4, 5], ddof=1)  # variance of sample 1 from raw data
#s2_sq = np.var([2, 4, 6, 8, 10], ddof=1)  # variance of sample 2 from raw data
s1_sq = 1**2  # variance of sample 1 (s1 = 1)
s2_sq = 2**2  # variance of sample 2 (s2 = 2)
n1 = 21  # sample size 1
n2 = 25  # sample size 2
df = welchs_degrees_of_freedom(s1_sq, s2_sq, n1, n2)
print(f"Degrees of freedom: {df:.3f}")
Degrees of freedom: 36.529
Now we can reuse the Python workflow employed earlier for a two-sample t-test, changing only how the degrees of freedom are computed. Since welchs_degrees_of_freedom expects variances, we pass the squared standard deviations:
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

# Step 2: Sample statistics
x1 = 10.5  # Sample mean of Factory 1
s1 = 1     # Sample standard deviation of Factory 1
n1 = 21    # Sample size of Factory 1
x2 = 9.5   # Sample mean of Factory 2
s2 = 2     # Sample standard deviation of Factory 2
n2 = 25    # Sample size of Factory 2

# Degrees of freedom (the function expects variances, i.e., s**2)
df = welchs_degrees_of_freedom(s1**2, s2**2, n1, n2)

# Step 4: Calculate the t-test statistic
t_obs = (x1 - x2) / np.sqrt((s1 ** 2) / n1 + (s2 ** 2) / n2)
print(f"Tobs: {t_obs:.3f}")

# Step 5: Critical values for a two-tailed test at alpha = 0.05
alpha = 0.05
t_critical = stats.t.ppf(1 - alpha/2, df)
print(f"Critical value (t_alpha/2): ±{t_critical:.3f}")

# Step 6: Conclusion
if abs(t_obs) > t_critical:
    conclusion = "Reject the null hypothesis"
else:
    conclusion = "Fail to reject the null hypothesis"
print(conclusion)

# Plotting the results
x = np.linspace(-4, 4, 1000)
y = stats.t.pdf(x, df)  # t distribution with Welch's degrees of freedom
plt.figure(figsize=(10, 6))
plt.plot(x, y, label=f't Distribution (df = {df:.1f})')
# Critical regions
plt.fill_between(x, y, where=(x < -t_critical) | (x > t_critical), color='red', alpha=0.3, label='Critical regions')
# Tobs line
plt.axvline(t_obs, color='blue', linestyle='--', label=f'Tobs = {t_obs:.3f}')
plt.text(t_obs - 0.1, max(y)*0.5, f'Tobs = {t_obs:.3f}', color='blue', ha='right')
# Critical region text
plt.text(-t_critical, max(y)*0.1, f'Critical region: -{t_critical:.3f}', color='red', ha='center')
plt.text(t_critical, max(y)*0.1, f'Critical region: {t_critical:.3f}', color='red', ha='center')
# Formatting the plot
plt.title("Two-Tailed Welch's t-Test")
plt.xlabel('t-value')
plt.ylabel('Probability Density')
plt.legend()
plt.grid(True)
plt.show()
Tobs: 2.195
Critical value (t_alpha/2): ±2.027
Reject the null hypothesis
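As a cross-check, SciPy can run Welch's test directly from the summary statistics via stats.ttest_ind_from_stats, selecting Welch's version with equal_var=False:

```python
from scipy import stats

# Welch's t-test from the summary statistics of the two factories
result = stats.ttest_ind_from_stats(
    mean1=10.5, std1=1.0, nobs1=21,
    mean2=9.5, std2=2.0, nobs2=25,
    equal_var=False,  # Welch's test (no pooled variance)
)
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}")
```

The statistic matches the manually computed tobs, and the p-value below 0.05 agrees with rejecting the null hypothesis at α = 5%.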
The Python code, with the data and the detailed computation employing the two-sample Welch's t-test to verify whether the two factories have the same mean, is available at:
https://colab.research.google.com/drive/15NUdabuGffqYGDkIXwTGFOBEpwIzQhoS?usp=sharing