1. Concepts & Definitions
1.1. A Review on Parametric Statistics
1.2. Parametric tests for Hypothesis Testing
1.3. Parametric vs. Non-Parametric Test
1.5. One-sample z-test and its relation to the two-sample z-test
1.6. One-sample t-test and its relation to the two-sample t-test
1.6. Welch's two-sample t-test: two populations with different variances
1.7. Non-Parametric test for Hypothesis Testing: Mann-Whitney U Test
1.8. Non-Parametric test for Hypothesis Testing: Wilcoxon Sign-Rank Test
1.9. Non-Parametric test for Hypothesis Testing: Wilcoxon Sign Test
1.10. Non-Parametric test for Hypothesis Testing: Chi-Square Goodness-of-Fit
1.11. Non-Parametric test for Hypothesis Testing: Kolmogorov-Smirnov
1.12. Non-Parametric tests for comparing machine learning models
2. Problem & Solution
2.1. Using Wilcoxon Sign Test to compare clustering methods
2.2. Using Wilcoxon Sign-Rank Test to compare clustering methods
2.3. What is A/B testing and how to combine with hypothesis testing?
2.4. Using the Chi-Square Goodness-of-Fit test to check whether Benford's Law holds
2.5. Using the Kolmogorov-Smirnov test to check whether the Pareto principle holds
What is a Welch's t-test?
In statistics, Welch's t-test, or unequal variances t-test, is a two-sample location test used to test the (null) hypothesis that two populations have equal means. Named after its creator, Bernard Lewis Welch, it is an adaptation of Student's t-test that is more reliable when the two samples have unequal variances and possibly unequal sample sizes. These tests are often referred to as "unpaired" or "independent samples" t-tests, as they are typically applied when the statistical units underlying the two samples being compared are non-overlapping.
Comparing Student's versus Welch's Test
When we want to compare the means of two independent groups, where both groups of data are sampled from populations that follow a normal distribution, we can choose between two different tests [1]:
Student’s t-test: this test assumes that both populations have the same variance.
Welch’s t-test: this test does not assume that those two populations have the same variance.
The main difference between the two tests is the way the degrees of freedom (df) are computed, as shown in the next subsection.
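To illustrate the difference in practice, SciPy's stats.ttest_ind exposes both tests through its equal_var flag. Below is a minimal sketch on synthetic data; the sample sizes, means, and spreads are illustrative assumptions, not data from this article:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Two independent normal samples with unequal variances and sizes (illustrative)
a = rng.normal(loc=10.0, scale=1.0, size=21)
b = rng.normal(loc=9.5, scale=2.0, size=25)

# Student's t-test: pools the two variances, df = n1 + n2 - 2
t_student, p_student = stats.ttest_ind(a, b, equal_var=True)
# Welch's t-test: no pooling, Welch-Satterthwaite degrees of freedom
t_welch, p_welch = stats.ttest_ind(a, b, equal_var=False)

print(f"Student: t = {t_student:.3f}, p = {p_student:.4f}")
print(f"Welch:   t = {t_welch:.3f}, p = {p_welch:.4f}")
```

When the variances may differ, Welch's version (equal_var=False) is generally the safer default.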
Welch's t-test
Suppose we are given two independent, normally distributed populations, and we have drawn a random sample from each. Here, we consider μ1 and μ2 to be the population means, and x̄1 and x̄2 to be the observed sample means. Our hypotheses are:
Null hypothesis (Ho): There is no difference between the means, i.e., μ1 - μ2 = 0.
Alternate hypothesis (Ha): The means differ, i.e., μ1 - μ2 ≠ 0.
And the formula for calculating the t-test score tobs is given by [2, 3]:
tobs = (x̄1 – x̄2) / √(s1²/n1 + s2²/n2)
Where x̄1 = first sample mean, x̄2 = second sample mean, s1 = first sample standard deviation, s2 = second sample standard deviation, n1 = first sample size, n2 = second sample size, and df (degrees of freedom) = (s1²/n1 + s2²/n2)² / { [ (s1²/n1)² / (n1 – 1) ] + [ (s2²/n2)² / (n2 – 1) ] }.
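As a sanity check on this df formula, the Welch–Satterthwaite degrees of freedom always lie between min(n1, n2) − 1 and n1 + n2 − 2 (the pooled df of Student's test). The sketch below re-implements the formula above and verifies these bounds numerically on random inputs; the helper name welch_df and the random ranges are assumptions for illustration:

```python
import numpy as np

def welch_df(s1_sq, s2_sq, n1, n2):
    # Welch-Satterthwaite degrees of freedom; s1_sq and s2_sq are sample variances
    num = (s1_sq / n1 + s2_sq / n2) ** 2
    den = (s1_sq / n1) ** 2 / (n1 - 1) + (s2_sq / n2) ** 2 / (n2 - 1)
    return num / den

rng = np.random.default_rng(0)
for _ in range(1000):
    n1, n2 = (int(n) for n in rng.integers(2, 50, size=2))
    v1, v2 = rng.uniform(0.1, 10.0, size=2)
    df = welch_df(v1, v2, n1, n2)
    # df is bounded by the smaller single-sample df and the pooled df
    assert min(n1, n2) - 1 <= df + 1e-9
    assert df <= n1 + n2 - 2 + 1e-9
print("df bounds verified on 1000 random cases")
```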
A numerical example of a two-sample Welch's t-test
Let’s consider that the first factory shares 21 samples of ball bearings where the mean diameter of the sample comes out to be 10.5 cm. On the other hand, the second factory shares 25 samples with a mean diameter of 9.5 cm. The first sample has a standard deviation of 1 cm, but the second has 2 cm:
Step 1: Pose the research question and determine the proper statistical test.
The company wants to determine whether the mean diameter of the ball bearings produced by Factory 1 differs from that of Factory 2. To do this, we will use a two-sample Welch's t-test for means.
Step 2: Obtain the samples statistics from the two factories (populations).
Factory 1: x̄1 = 10.5, s1 = 1.
Factory 2: x̄2 = 9.5, s2 = 2.
Step 3: Formulate the null and alternate hypotheses and set the level of significance for the test.
Null hypothesis (Ho): There is no difference between the mean diameters of the ball bearings produced by the two factories, i.e., μ1 - μ2 = 0.
Alternate hypothesis (Ha): The mean diameters differ, i.e., μ1 - μ2 ≠ 0.
We will perform a two-tailed test using a significance level α = 5%.
df (degrees of freedom) = (s1²/n1 + s2²/n2)² / { [ (s1²/n1)² / (n1 – 1) ] + [ (s2²/n2)² / (n2 – 1) ] } = (1/21 + 4/25)² / { [ (1/21)² / 20 ] + [ (4/25)² / 24 ] } = 36.529, which we round down to 36. (Note that the variances s1² = 1 and s2² = 4 enter the formula, not the standard deviations.)
Step 4: Use the formula for the two-sample t-test for means to calculate the t-test statistic tobs.
tobs = (x̄1 – x̄2) / √(s1²/n1 + s2²/n2)
tobs = (10.5 – 9.5) / √((1)²/21 + (2)²/25)
tobs = 2.195
Step 5: Compare tobs with the critical value t α/2 from a t-table [4]. For df = 36 and α = 5% (two-tailed), t α/2 = 2.028.
Step 6: Since tobs = 2.195 is greater than the critical value t α/2 = 2.028, we can reject the Null hypothesis.
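Equivalently, instead of comparing tobs against a tabulated critical value, we can compute a two-tailed p-value directly from the sample statistics. A minimal sketch with SciPy follows; note that the squared standard deviations (the variances) enter the df formula:

```python
from scipy import stats

# Sample statistics from the factory example above
x1, s1, n1 = 10.5, 1.0, 21
x2, s2, n2 = 9.5, 2.0, 25

se_sq = s1**2 / n1 + s2**2 / n2   # squared standard error of the mean difference
t_obs = (x1 - x2) / se_sq**0.5
df = se_sq**2 / ((s1**2 / n1)**2 / (n1 - 1) + (s2**2 / n2)**2 / (n2 - 1))

# Two-tailed p-value: P(|T| >= |t_obs|) under the null hypothesis
p_value = 2 * stats.t.sf(abs(t_obs), df)
print(f"t = {t_obs:.3f}, df = {df:.2f}, p = {p_value:.4f}")
# Reject H0 at alpha = 0.05 whenever p_value < 0.05
```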
A numerical example of a two-sample Welch's t-test in Python
First, let's create a helper function to compute the Welch's t-test degrees of freedom (the Welch–Satterthwaite equation). Note that the function takes the sample variances (s², not the standard deviations s) as its first two arguments.
import numpy as np

def welchs_degrees_of_freedom(s1_sq, s2_sq, n1, n2):
    numerator = (s1_sq/n1 + s2_sq/n2)**2
    denominator = ((s1_sq/n1)**2 / (n1 - 1)) + ((s2_sq/n2)**2 / (n2 - 1))
    df = numerator / denominator
    return df

# Example of use:
#s1_sq = np.var([1, 2, 3, 4, 5], ddof=1)  # variance of sample 1 from raw data
#s2_sq = np.var([2, 4, 6, 8, 10], ddof=1)  # variance of sample 2 from raw data
s1_sq = 1**2  # variance of sample 1 (s1 = 1)
s2_sq = 2**2  # variance of sample 2 (s2 = 2)
n1 = 21  # sample size 1
n2 = 25  # sample size 2
df = welchs_degrees_of_freedom(s1_sq, s2_sq, n1, n2)
print(f"Degrees of freedom: {df:.3f}")
Degrees of freedom: 36.529
Now we can reuse the Python workflow employed earlier for a two-sample t-test, changing only how the degrees of freedom are computed. Since welchs_degrees_of_freedom expects variances, we pass the squared standard deviations:
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

# Step 2: Sample statistics
x1 = 10.5  # Sample mean of Factory 1
s1 = 1     # Sample standard deviation of Factory 1
n1 = 21    # Sample size of Factory 1
x2 = 9.5   # Sample mean of Factory 2
s2 = 2     # Sample standard deviation of Factory 2
n2 = 25    # Sample size of Factory 2

# Degrees of freedom (the function expects variances, i.e., s**2)
df = welchs_degrees_of_freedom(s1**2, s2**2, n1, n2)

# Step 4: Calculate the t-test statistic
t_obs = (x1 - x2) / np.sqrt((s1 ** 2) / n1 + (s2 ** 2) / n2)
print(f"Tobs: {t_obs:.3f}")

# Step 5: Critical values for a two-tailed test at alpha = 0.05
alpha = 0.05
t_critical = stats.t.ppf(1 - alpha/2, df)
print(f"Critical value (t_alpha/2): ±{t_critical:.3f}")

# Step 6: Conclusion
if abs(t_obs) > t_critical:
    conclusion = "Reject the null hypothesis"
else:
    conclusion = "Fail to reject the null hypothesis"
print(conclusion)

# Plotting the results
x = np.linspace(-4, 4, 1000)
y = stats.t.pdf(x, df)  # t distribution with Welch's degrees of freedom
plt.figure(figsize=(10, 6))
plt.plot(x, y, label=f't Distribution (df = {df:.1f})')
# Critical regions
plt.fill_between(x, y, where=(x < -t_critical) | (x > t_critical), color='red', alpha=0.3, label='Critical regions')
# Tobs line
plt.axvline(t_obs, color='blue', linestyle='--', label=f'Tobs = {t_obs:.3f}')
plt.text(t_obs - 0.1, max(y)*0.5, f'Tobs = {t_obs:.3f}', color='blue', ha='right')
# Critical region text
plt.text(-t_critical, max(y)*0.1, f'Critical region: -{t_critical:.3f}', color='red', ha='center')
plt.text(t_critical, max(y)*0.1, f'Critical region: {t_critical:.3f}', color='red', ha='center')
# Formatting the plot
plt.title("Two-Tailed Welch's t-Test")
plt.xlabel('t-value')
plt.ylabel('Probability Density')
plt.legend()
plt.grid(True)
plt.show()
Tobs: 2.195
Critical value (t_alpha/2): ±2.027
Reject the null hypothesis
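As a cross-check, SciPy can run Welch's test directly from the summary statistics via stats.ttest_ind_from_stats, selecting Welch's version with equal_var=False:

```python
from scipy import stats

# Welch's t-test from the summary statistics of the two factories
result = stats.ttest_ind_from_stats(
    mean1=10.5, std1=1.0, nobs1=21,
    mean2=9.5, std2=2.0, nobs2=25,
    equal_var=False,  # Welch's test (no pooled variance)
)
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}")
```

The statistic matches the manually computed tobs, and the p-value below 0.05 agrees with rejecting the null hypothesis at α = 5%.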
The Python code, with the data and the detailed computation employing the two-sample Welch's t-test to verify whether the two factories have the same mean, is available at:
https://colab.research.google.com/drive/15NUdabuGffqYGDkIXwTGFOBEpwIzQhoS?usp=sharing