1. Concepts & Definitions
1.1. A Review on Parametric Statistics
1.2. Parametric tests for Hypothesis Testing
1.3. Parametric vs. Non-Parametric Test
1.4. One-sample z-test and its relation to the two-sample z-test
1.5. One-sample t-test and its relation to the two-sample t-test
1.6. Welch's two-sample t-test: two populations with different variances
1.7. Non-Parametric test for Hypothesis Testing: Mann-Whitney U Test
1.8. Non-Parametric test for Hypothesis Testing: Wilcoxon Sign-Rank Test
1.9. Non-Parametric test for Hypothesis Testing: Wilcoxon Sign Test
1.10. Non-Parametric test for Hypothesis Testing: Chi-Square Goodness-of-Fit
1.11. Non-Parametric test for Hypothesis Testing: Kolmogorov-Smirnov
1.12. Non-Parametric tests for comparing machine learning models
2. Problem & Solution
2.1. Using Wilcoxon Sign Test to compare clustering methods
2.2. Using Wilcoxon Sign-Rank Test to compare clustering methods
2.3. What is A/B testing and how to combine it with hypothesis testing?
2.4. Using the Chi-Square Goodness-of-Fit test to check whether Benford's Law holds
2.5. Using the Kolmogorov-Smirnov test to check whether the Pareto principle holds
How to combine clustering methods and the Wilcoxon Sign Test?
The concept of using a Gaussian Mixture model to separate data was extended to the INTTRA-based data set to create a three-class distribution in Track 06, section 2.6:
A comparison of the Gaussian Mixture model and K-means on the INTTRA-based data set was made in Track 10, section 2.1:
Let's start from the results obtained in the following Google Colab notebook (click on the link):
https://colab.research.google.com/drive/1J2IgGQbgCmvyxgDZHw3bkMNS79gikelb?usp=sharing
At the end of that notebook, the following result was obtained:
Classification using GMM:
[0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
2 2 2]
Classification using Kmeans:
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2
2 2 2]
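For reference, the sketch below shows how two such label vectors can be produced with scikit-learn. It is only a minimal illustration on a synthetic 1-D array (the actual INTTRA-based data and preprocessing live in the linked notebook), and the component indices returned by each method are arbitrary.
# Minimal sketch: how label vectors like the ones above can be produced.
# Assumption: a synthetic 1-D feature array stands in for the INTTRA-based
# data set used in the linked notebook.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0, 1, 12),
                    rng.normal(5, 1, 14),
                    rng.normal(10, 1, 14)]).reshape(-1, 1)

labels_gmm = GaussianMixture(n_components=3, random_state=0).fit_predict(X)
labels_km = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Note: cluster/component indices are arbitrary (0/1/2 may be permuted),
# so labels usually need to be aligned before a paired comparison.
print("Classification using GMM:   ", labels_gmm)
print("Classification using Kmeans:", labels_km)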
Now, suppose that:
ygmm = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
ykme = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
One interesting question is: are the label assignments produced by the two clustering methods different enough to say that the methods behave differently?
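Before applying a formal test, it helps to simply count how many of the 40 observations receive different labels. A minimal sketch, using ygmm and ykme as defined above:
import numpy as np

diff = np.array(ygmm) - np.array(ykme)
print("Labels that differ:", int(np.count_nonzero(diff)), "out of", len(diff))
# With the values above this prints 3 out of 40 (positions 12, 13 and 26),
# which is exactly the number of non-zero differences the sign test will use.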
Consider the development made in Track 11, section 1.6, about the Wilcoxon Sign Test.
Since the Sign Test only considers the direction of the difference between two related variables, the hypotheses are posed as:
The null hypothesis (H0): the median difference is zero.
The alternative hypothesis (H1): the median difference is positive.
Here, if we see a similar number of positive and negative differences, we fail to reject the null hypothesis; if one sign clearly dominates, we reject it (a small numerical sketch of the underlying binomial calculation follows the reformulated hypotheses below).
For the clustering example, the hypotheses are reformulated as:
The null hypothesis (H0): there is no significant difference between the models.
The alternative hypothesis (H1): there is a significant difference between the models.
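Under H0 each non-zero difference is equally likely to be positive or negative, so the count of the rarer sign follows a Binomial(n, 0.5) distribution, where n is the number of non-zero differences. A minimal sketch of the lower-tail probability, written without SciPy so the arithmetic is explicit:
from math import comb

def sign_test_lower_tail(k, n):
    # P(X <= k) for X ~ Binomial(n, 0.5): probability of observing at most
    # k occurrences of the rarer sign among n non-zero differences
    return sum(comb(n, i) for i in range(k + 1)) * 0.5 ** n

# With the counts obtained later in this section (n = 3, k = 0):
print(sign_test_lower_tail(0, 3))   # 0.125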
The previous developments are summarized in the following code:
import numpy as np
import pandas as pd
from scipy.stats import binom
# Data: the label vectors produced by the two clustering methods
model_a = ygmm
model_b = ykme
# Create DataFrame
df = pd.DataFrame({
    'Model A': model_a,
    'Model B': model_b
})
# Compute Diff, Sign(Diff), |Diff|
df['Diff'] = df['Model A'] - df['Model B']
df['Sign(Diff)'] = np.sign(df['Diff'])
df['|Diff|'] = df['Diff'].abs()
# Rank the absolute differences (shown for reference only; the sign test
# uses the signs, not the ranks)
df['Rank'] = df['|Diff|'].rank()
df['Sign(Diff)*Rank'] = df['Sign(Diff)'] * df['Rank']
# Display the DataFrame
print(df)
# Perform the sign test
# Count the number of positive and negative differences
positive_diff_count = (df['Sign(Diff)'] > 0).sum()
negative_diff_count = (df['Sign(Diff)'] < 0).sum()
# Under H0 each non-zero difference is positive or negative with probability 0.5
n = positive_diff_count + negative_diff_count
test_statistic = min(positive_diff_count, negative_diff_count)
# Lower-tail binomial probability (one-sided p-value of the sign test)
test_critical = binom.cdf(k=test_statistic, n=n, p=0.5)
# Display results
print(f"Number of positive differences: {positive_diff_count}")
print(f"Number of negative differences: {negative_diff_count}")
print(f"Test Statistic (minimum of positive and negative differences): {test_statistic}")
print(f"Test critical: {test_critical}")
# Interpretation of results
alpha = 0.05
if test_critical < alpha:
    print("Reject the null hypothesis: There is a significant difference between the two models.")
else:
    print("Fail to reject the null hypothesis: There is no significant difference between the two models.")
Model A Model B Diff Sign(Diff) |Diff| Rank Sign(Diff)*Rank
0 0 0 0 0 0 19.0 0.0
1 0 0 0 0 0 19.0 0.0
2 0 0 0 0 0 19.0 0.0
3 0 0 0 0 0 19.0 0.0
4 0 0 0 0 0 19.0 0.0
5 0 0 0 0 0 19.0 0.0
6 0 0 0 0 0 19.0 0.0
7 0 0 0 0 0 19.0 0.0
8 0 0 0 0 0 19.0 0.0
9 0 0 0 0 0 19.0 0.0
10 0 0 0 0 0 19.0 0.0
11 0 0 0 0 0 19.0 0.0
12 1 0 1 1 1 39.0 39.0
13 1 0 1 1 1 39.0 39.0
14 1 1 0 0 0 19.0 0.0
15 1 1 0 0 0 19.0 0.0
16 1 1 0 0 0 19.0 0.0
17 1 1 0 0 0 19.0 0.0
18 1 1 0 0 0 19.0 0.0
19 1 1 0 0 0 19.0 0.0
20 1 1 0 0 0 19.0 0.0
21 1 1 0 0 0 19.0 0.0
22 1 1 0 0 0 19.0 0.0
23 1 1 0 0 0 19.0 0.0
24 1 1 0 0 0 19.0 0.0
25 1 1 0 0 0 19.0 0.0
26 2 1 1 1 1 39.0 39.0
27 2 2 0 0 0 19.0 0.0
28 2 2 0 0 0 19.0 0.0
29 2 2 0 0 0 19.0 0.0
30 2 2 0 0 0 19.0 0.0
31 2 2 0 0 0 19.0 0.0
32 2 2 0 0 0 19.0 0.0
33 2 2 0 0 0 19.0 0.0
34 2 2 0 0 0 19.0 0.0
35 2 2 0 0 0 19.0 0.0
36 2 2 0 0 0 19.0 0.0
37 2 2 0 0 0 19.0 0.0
38 2 2 0 0 0 19.0 0.0
39 2 2 0 0 0 19.0 0.0
Number of positive differences: 3
Number of negative differences: 0
Test Statistic (minimum of positive and negative differences): 0
Test critical: 0.125
Fail to reject the null hypothesis: There is no significant difference between the two models.
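The reported value 0.125 can be reproduced by hand: with n = 3 non-zero differences, all positive, the smaller sign count is 0 and binom.cdf(0, 3, 0.5) = 0.5**3 = 0.125, which is above alpha = 0.05, so the null hypothesis is not rejected. As a cross-check, the same tail probability can be obtained from scipy.stats.binomtest (a sketch assuming SciPy 1.7 or newer, where binomtest is available):
from scipy.stats import binomtest

# k = occurrences of the rarer sign, n = number of non-zero differences
result = binomtest(k=0, n=3, p=0.5, alternative='less')
print(result.pvalue)   # 0.125, matching the test_critical value above
# A two-sided sign test would double this tail probability to 0.25,
# which is still above alpha = 0.05, so the conclusion is unchanged.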
The Python code with all the steps is available in the following Google Colab notebook (click on the link):
https://colab.research.google.com/drive/1Y7FK0TR88eIrsaji848uHyPrjAUIXIns?usp=sharing