1. Concepts & Definitions
1.1. A Review on Parametric Statistics
1.2. Parametric tests for Hypothesis Testing
1.3. Parametric vs. Non-Parametric Test
1.4. One-sample z-test and its relation to the two-sample z-test
1.5. One-sample t-test and its relation to the two-sample t-test
1.6. Welch's two-sample t-test: two populations with different variances
1.7. Non-Parametric test for Hypothesis Testing: Mann-Whitney U Test
1.8. Non-Parametric test for Hypothesis Testing: Wilcoxon Sign-Rank Test
1.9. Non-Parametric test for Hypothesis Testing: Wilcoxon Sign Test
1.10. Non-Parametric test for Hypothesis Testing: Chi-Square Goodness-of-Fit
1.11. Non-Parametric test for Hypothesis Testing: Kolmogorov-Smirnov
1.12. Non-Parametric tests for comparing machine learning models
2. Problem & Solution
2.1. Using Wilcoxon Sign Test to compare clustering methods
2.2. Using Wilcoxon Sign-Rank Test to compare clustering methods
2.3. What is A/B testing and how to combine it with hypothesis testing?
2.4. Using the Chi-Square Goodness-of-Fit test to check whether Benford's Law holds
2.5. Using the Kolmogorov-Smirnov test to check whether the Pareto principle holds
How to combine clustering methods and the Wilcoxon Sign Test?
The concept of using a Gaussian Mixture model to separate data was extended to the INTTRA-based data set to create a three-class distribution in Track 06, section 2.6:
A comparison of the Gaussian Mixture model and K-means on the INTTRA-based data set was made in Track 10, section 2.1:
Let's start from the results obtained in the following Google Colab notebook (click on the link):
https://colab.research.google.com/drive/1J2IgGQbgCmvyxgDZHw3bkMNS79gikelb?usp=sharing
At the end of that notebook, the following result was obtained:
Classification using GMM:
[0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
2 2 2]
Classification using Kmeans:
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2
2 2 2]
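For reference, the sketch below shows how two such label vectors can be produced with scikit-learn. It is only a minimal illustration on a synthetic 1-D array (the actual INTTRA-based data and preprocessing live in the linked notebook), and the component indices returned by each method are arbitrary.
# Minimal sketch: how label vectors like the ones above can be produced.
# Assumption: a synthetic 1-D feature array stands in for the INTTRA-based
# data set used in the linked notebook.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0, 1, 12),
                    rng.normal(5, 1, 14),
                    rng.normal(10, 1, 14)]).reshape(-1, 1)

labels_gmm = GaussianMixture(n_components=3, random_state=0).fit_predict(X)
labels_km = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Note: cluster/component indices are arbitrary (0/1/2 may be permuted),
# so labels usually need to be aligned before a paired comparison.
print("Classification using GMM:   ", labels_gmm)
print("Classification using Kmeans:", labels_km)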
Now, suppose that:
ygmm = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
ykme = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
One interesting question is: are the label assignments produced by the two clustering methods different enough to say that the methods behave differently?
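Before applying a formal test, it helps to simply count how many of the 40 observations receive different labels. A minimal sketch, using ygmm and ykme as defined above:
import numpy as np

diff = np.array(ygmm) - np.array(ykme)
print("Labels that differ:", int(np.count_nonzero(diff)), "out of", len(diff))
# With the values above this prints 3 out of 40 (positions 12, 13 and 26),
# which is exactly the number of non-zero differences the sign test will use.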
Consider the development made in Track 11, section 1.6, about the Wilcoxon Sign Test.
Since the Sign Test only considers the direction of the difference between two related variables, the hypotheses are posed as:
The null hypothesis (H0): the median difference is zero.
The alternative hypothesis (H1): the median difference is positive.
Here, if we see a similar number of positive and negative differences, we fail to reject the null hypothesis; if one sign clearly dominates, we reject it (a small numerical sketch of the underlying binomial calculation follows the reformulated hypotheses below).
For the clustering example, the hypotheses are reformulated as:
The null hypothesis (H0): there is no significant difference between the models.
The alternative hypothesis (H1): there is a significant difference between the models.
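Under H0 each non-zero difference is equally likely to be positive or negative, so the count of the rarer sign follows a Binomial(n, 0.5) distribution, where n is the number of non-zero differences. A minimal sketch of the lower-tail probability, written without SciPy so the arithmetic is explicit:
from math import comb

def sign_test_lower_tail(k, n):
    # P(X <= k) for X ~ Binomial(n, 0.5): probability of observing at most
    # k occurrences of the rarer sign among n non-zero differences
    return sum(comb(n, i) for i in range(k + 1)) * 0.5 ** n

# With the counts obtained later in this section (n = 3, k = 0):
print(sign_test_lower_tail(0, 3))   # 0.125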
The previous developments are summarized in the following code:
import numpy as np
import pandas as pd
from scipy.stats import binom
# Data: the label vectors produced by the two clustering methods
model_a = ygmm
model_b = ykme
# Create DataFrame
df = pd.DataFrame({
    'Model A': model_a,
    'Model B': model_b
})
# Compute Diff, Sign(Diff), |Diff|
df['Diff'] = df['Model A'] - df['Model B']
df['Sign(Diff)'] = np.sign(df['Diff'])
df['|Diff|'] = df['Diff'].abs()
# Rank the absolute differences (shown for reference only; the sign test
# uses the signs, not the ranks)
df['Rank'] = df['|Diff|'].rank()
df['Sign(Diff)*Rank'] = df['Sign(Diff)'] * df['Rank']
# Display the DataFrame
print(df)
# Perform the sign test
# Count the number of positive and negative differences
positive_diff_count = (df['Sign(Diff)'] > 0).sum()
negative_diff_count = (df['Sign(Diff)'] < 0).sum()
# Under H0 each non-zero difference is positive or negative with probability 0.5
n = positive_diff_count + negative_diff_count
test_statistic = min(positive_diff_count, negative_diff_count)
# Lower-tail binomial probability (one-sided p-value of the sign test)
test_critical = binom.cdf(k=test_statistic, n=n, p=0.5)
# Display results
print(f"Number of positive differences: {positive_diff_count}")
print(f"Number of negative differences: {negative_diff_count}")
print(f"Test Statistic (minimum of positive and negative differences): {test_statistic}")
print(f"Test critical: {test_critical}")
# Interpretation of results
alpha = 0.05
if test_critical < alpha:
    print("Reject the null hypothesis: There is a significant difference between the two models.")
else:
    print("Fail to reject the null hypothesis: There is no significant difference between the two models.")
Model A Model B Diff Sign(Diff) |Diff| Rank Sign(Diff)*Rank
0 0 0 0 0 0 19.0 0.0
1 0 0 0 0 0 19.0 0.0
2 0 0 0 0 0 19.0 0.0
3 0 0 0 0 0 19.0 0.0
4 0 0 0 0 0 19.0 0.0
5 0 0 0 0 0 19.0 0.0
6 0 0 0 0 0 19.0 0.0
7 0 0 0 0 0 19.0 0.0
8 0 0 0 0 0 19.0 0.0
9 0 0 0 0 0 19.0 0.0
10 0 0 0 0 0 19.0 0.0
11 0 0 0 0 0 19.0 0.0
12 1 0 1 1 1 39.0 39.0
13 1 0 1 1 1 39.0 39.0
14 1 1 0 0 0 19.0 0.0
15 1 1 0 0 0 19.0 0.0
16 1 1 0 0 0 19.0 0.0
17 1 1 0 0 0 19.0 0.0
18 1 1 0 0 0 19.0 0.0
19 1 1 0 0 0 19.0 0.0
20 1 1 0 0 0 19.0 0.0
21 1 1 0 0 0 19.0 0.0
22 1 1 0 0 0 19.0 0.0
23 1 1 0 0 0 19.0 0.0
24 1 1 0 0 0 19.0 0.0
25 1 1 0 0 0 19.0 0.0
26 2 1 1 1 1 39.0 39.0
27 2 2 0 0 0 19.0 0.0
28 2 2 0 0 0 19.0 0.0
29 2 2 0 0 0 19.0 0.0
30 2 2 0 0 0 19.0 0.0
31 2 2 0 0 0 19.0 0.0
32 2 2 0 0 0 19.0 0.0
33 2 2 0 0 0 19.0 0.0
34 2 2 0 0 0 19.0 0.0
35 2 2 0 0 0 19.0 0.0
36 2 2 0 0 0 19.0 0.0
37 2 2 0 0 0 19.0 0.0
38 2 2 0 0 0 19.0 0.0
39 2 2 0 0 0 19.0 0.0
Number of positive differences: 3
Number of negative differences: 0
Test Statistic (minimum of positive and negative differences): 0
Test critical: 0.125
Fail to reject the null hypothesis: There is no significant difference between the two models.
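The reported value 0.125 can be reproduced by hand: with n = 3 non-zero differences, all positive, the smaller sign count is 0 and binom.cdf(0, 3, 0.5) = 0.5**3 = 0.125, which is above alpha = 0.05, so the null hypothesis is not rejected. As a cross-check, the same tail probability can be obtained from scipy.stats.binomtest (a sketch assuming SciPy 1.7 or newer, where binomtest is available):
from scipy.stats import binomtest

# k = occurrences of the rarer sign, n = number of non-zero differences
result = binomtest(k=0, n=3, p=0.5, alternative='less')
print(result.pvalue)   # 0.125, matching the test_critical value above
# A two-sided sign test would double this tail probability to 0.25,
# which is still above alpha = 0.05, so the conclusion is unchanged.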
The Python code with all the steps is available in the following Google Colab notebook (click on the link):
https://colab.research.google.com/drive/1Y7FK0TR88eIrsaji848uHyPrjAUIXIns?usp=sharing