1. Concepts & Definitions
1.1. A Review on Parametric Statistics
1.2. Parametric tests for Hypothesis Testing
1.3. Parametric vs. Non-Parametric Test
1.4. One sample z-test and its relation with the two-sample z-test
1.5. One sample t-test and its relation with the two-sample t-test
1.6. Welch's two-sample t-test: two populations with different variances
1.7. Non-Parametric test for Hypothesis Testing: Mann-Whitney U Test
1.8. Non-Parametric test for Hypothesis Testing: Wilcoxon Sign-Rank Test
1.9. Non-Parametric test for Hypothesis Testing: Wilcoxon Sign Test
1.10. Non-Parametric test for Hypothesis Testing: Chi-Square Goodness-of-Fit
1.11. Non-Parametric test for Hypothesis Testing: Kolmogorov-Smirnov
1.12. Non-Parametric tests for comparing machine learning models
2. Problem & Solution
2.1. Using Wilcoxon Sign Test to compare clustering methods
2.2. Using Wilcoxon Sign-Rank Test to compare clustering methods
2.3. What is A/B testing and how to combine with hypothesis testing?
2.4. Using Chi-Square fit to check if Benford's Law holds or not
2.5. Using Kolmogorov-Smirnov fit to check if the Pareto principle holds or not
What is the Kolmogorov-Smirnov Test?
The Kolmogorov–Smirnov test is an efficient way to determine whether two samples differ significantly from each other. It is commonly used to check the uniformity of random numbers. Uniformity is one of the most important properties of any random number generator, and the Kolmogorov–Smirnov test can be used to verify it [1].
The Kolmogorov–Smirnov test is versatile and can be employed to evaluate whether two underlying one-dimensional probability distributions vary. It serves as an effective tool to determine the statistical significance of differences between two sets of data.
Kolmogorov Distribution
The Kolmogorov distribution describes the behavior of the test statistic, often denoted as D: the maximum absolute difference between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution.
The probability density function (PDF) of the Kolmogorov distribution is not expressed in a simple analytical form; tables or statistical software are commonly used to obtain critical values for the test. The distribution is influenced by the sample size, and the critical values depend on the significance level chosen for the test. Asymptotically, its cumulative distribution function can be written as the series

P(\sqrt{n}\, D_n \leq x) \approx 1 - 2 \sum_{k=1}^{\infty} (-1)^{k-1} e^{-2 k^2 x^2}

where:
n is the sample size.
x is the normalized Kolmogorov-Smirnov statistic.
k is the index of summation in the series.
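As a quick sanity check, the series above can be evaluated numerically and compared with scipy.stats.kstwobign, which implements this limiting distribution of sqrt(n)·Dn; the helper kolmogorov_cdf below is a minimal sketch, not a library function.
import numpy as np
from scipy.stats import kstwobign

def kolmogorov_cdf(x, terms=100):
    """Partial sum of the asymptotic Kolmogorov series CDF."""
    k = np.arange(1, terms + 1)
    return 1 - 2 * np.sum((-1) ** (k - 1) * np.exp(-2 * k**2 * x**2))

for x in [0.5, 1.0, 1.5]:
    # Both values should agree closely for moderate x
    print(f"x={x}: series={kolmogorov_cdf(x):.6f}, kstwobign={kstwobign.cdf(x):.6f}")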
The next code generates and plots the probability density function (PDF) of the Kolmogorov-Smirnov distribution for a specified sample size n. The key points of the code are:
1. Import Libraries: The code imports necessary libraries: numpy for numerical operations, matplotlib.pyplot and seaborn for plotting, and scipy.stats for statistical functions.
2. Set Sample Size: A variable n is set to 10, indicating the sample size for which the Kolmogorov-Smirnov distribution will be evaluated.
3. Generate Uniformly Distributed Random Values: An array x of 1000 random values is generated from a uniform distribution between 0 and 1 using np.random.uniform.
4. Calculate PDF of Kolmogorov-Smirnov Distribution: The PDF of the Kolmogorov-Smirnov distribution for sample size n is calculated at the points specified in x using stats.kstwo.pdf.
5. Plot the Results:
A line plot is created using seaborn to visualize the PDF of the Kolmogorov-Smirnov distribution. The x-axis represents the uniformly distributed random values, and the y-axis represents the corresponding PDF values.
The plot includes a title indicating the sample size n, and labels for the x and y axes.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
# Calculate samples
n = 10
x = np.random.uniform(0, 1, 1000)
y = stats.kstwo.pdf(x, n=n)
plt.figure(figsize=(8, 5))
sns.lineplot(x=x, y=y)
plt.title(f"Kolmogorov-Smirnov Distribution for n={n}")
plt.xlabel('x')
plt.ylabel('PDF')
This next code will create a plot that shows the cumulative distribution function (CDF) of the Kolmogorov-Smirnov distribution.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
# Calculate samples
n = 10
x = np.random.uniform(0, 1, 1000)
y_cdf = stats.kstwo.cdf(x, n=n)
# Set seaborn style
sns.set(style="whitegrid")
# Create the figure and axes
plt.figure(figsize=(10, 6))
# Plot the CDF line
sns.lineplot(x=x, y=y_cdf, color='green', label='CDF')
# Title and labels
plt.title(f"Kolmogorov-Smirnov Cumulative Distribution for n={n}", fontsize=16)
plt.xlabel('x', fontsize=14)
plt.ylabel('Cumulative Probability', fontsize=14)
# Show the legend
plt.legend()
# Show the plot
plt.show()
The Python code, with the data and detailed computations to apply the Goodness-of-Fit test, is given at:
https://colab.research.google.com/drive/1FHK7ICgAZVQCRd4_e5G76hFNIwzfvf5V?usp=sharing
When to use the Kolmogorov-Smirnov Test?
The main idea behind the Kolmogorov-Smirnov test is to check whether [1]:
One Sample Kolmogorov-Smirnov Test: determine whether a sample comes from a specific distribution.
Two-Sample Kolmogorov–Smirnov Test: compare two independent samples to assess whether they come from the same distribution.
In short, the one-sample test compares a sample's empirical CDF against a theoretical reference CDF, while the two-sample test compares the empirical CDFs of two independent samples.
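As a minimal, hedged illustration of the two variants (both functions are from scipy.stats; the samples here are synthetic):
import numpy as np
from scipy.stats import kstest, ks_2samp

rng = np.random.default_rng(1)
x = rng.normal(0, 1, 50)
y = rng.normal(0, 1, 60)

# One-sample: compare x against a fully specified reference distribution
print(kstest(x, 'norm'))
# Two-sample: compare the two samples against each other
print(ks_2samp(x, y))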
Let’s detail the situations in which the Kolmogorov-Smirnov test can be applied and the expected output:
Comparison of Probability Distributions: The test is used to evaluate whether two samples exhibit the same probability distribution.
Compare the shape of the distributions: If we assume that the shapes of the probability distributions of the two samples are similar, the test assesses the maximum absolute difference between the cumulative distributions of the two samples.
Check Distributional Differences: The test quantifies the maximum difference between the cumulative probability distributions, and a higher value indicates greater dissimilarity in the shape of the distributions.
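To make the "maximum difference" idea concrete, here is a minimal sketch, assuming two synthetic samples, that computes the KS distance directly from the empirical CDFs:
import numpy as np

# Evaluate both right-continuous ECDFs over the pooled sample values and
# take the largest absolute gap between them
rng = np.random.default_rng(0)
a = rng.normal(0, 1, 200)
b = rng.normal(0.3, 1, 200)
pooled = np.sort(np.concatenate([a, b]))
ecdf_a = np.searchsorted(np.sort(a), pooled, side='right') / len(a)
ecdf_b = np.searchsorted(np.sort(b), pooled, side='right') / len(b)
print("KS distance:", np.max(np.abs(ecdf_a - ecdf_b)))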
How does the one-sample Kolmogorov-Smirnov (KS) Test work?
Below are the steps for how the Kolmogorov-Smirnov (KS) test works [1]:
Hypotheses Formulation:
Null Hypothesis : The sample follows a specified distribution.
Alternative Hypothesis: The sample does not follow the specified distribution.
Selection of a Reference Distribution: A theoretical distribution (e.g., normal, exponential) is decided against which you want to test the sample distribution. This distribution is usually based on theoretical expectations or prior knowledge.
Calculation of the Test Statistic (D): For a one-sample Kolmogorov-Smirnov test, the test statistic (Dn) is the maximum vertical deviation between the empirical distribution function (EDF) of the sample and the cumulative distribution function (CDF) of the reference distribution: Dn = sup_x |Fn(x) - F(x)|. For a two-sample Kolmogorov-Smirnov test, the test statistic compares the EDFs of the two independent samples.
Determination of Critical Value or P-value: The test statistic (D) is compared to a critical value from the Kolmogorov-Smirnov distribution table or, more commonly, a p-value is calculated. If the p-value is less than the significance level (commonly 0.05), the null hypothesis is rejected, suggesting that the sample distribution does not match the specified distribution.
Interpretation of Results: If the null hypothesis is rejected, it indicates that there is evidence to suggest that the sample does not follow the specified distribution. The alternative hypothesis, suggesting a difference, is accepted.
The next Python code provides a visual understanding of how the one-sample Kolmogorov-Smirnov test can be applied to verify whether a given data set follows a normal distribution.
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
# Generate a small sample size (e.g., 25 samples) from a normal distribution
np.random.seed(0) # For reproducibility
sample_size = 25
sample = np.random.normal(loc=0, scale=1, size=sample_size)
# Perform the Kolmogorov-Smirnov test for normality
d_statistic, p_value = stats.kstest(sample, 'norm')
# Print the KS test result
print(f"KS Statistic: {d_statistic}")
print(f"P-Value: {p_value}")
# Plot the empirical distribution function (EDF) of the sample
sorted_sample = np.sort(sample)
y_vals = np.arange(1, sample_size + 1) / sample_size
# Plot the cumulative distribution function (CDF) of the reference normal distribution
x_vals = np.linspace(min(sample), max(sample), 100)
cdf_vals = stats.norm.cdf(x_vals)
# Plotting
plt.figure(figsize=(10, 6))
plt.step(sorted_sample, y_vals, where='post', label='Empirical CDF')
plt.plot(x_vals, cdf_vals, label='Reference Normal CDF', color='red')
# Highlight the KS statistic on the plot
# Find the point of maximum difference
d_max_index = np.argmax(np.abs(y_vals - stats.norm.cdf(sorted_sample)))
d_max = np.abs(y_vals[d_max_index] - stats.norm.cdf(sorted_sample[d_max_index]))
plt.plot([sorted_sample[d_max_index], sorted_sample[d_max_index]],
         [stats.norm.cdf(sorted_sample[d_max_index]), y_vals[d_max_index]],
         'k--', label=f'KS Statistic = {d_statistic:.3f}')
# Adding labels and legend
plt.xlabel('Sample Values')
plt.ylabel('Cumulative Probability')
plt.title('Kolmogorov-Smirnov Test for Normality')
plt.legend()
plt.grid()
# Show plot
plt.show()
KS Statistic: 0.26842179992563575
P-Value: 0.044235532757121
The Python code, with the data and detailed computations to apply the Goodness-of-Fit test, is given at:
https://colab.research.google.com/drive/1FHK7ICgAZVQCRd4_e5G76hFNIwzfvf5V?usp=sharing
One Sample Kolmogorov-Smirnov Test applied to sample data
The next Python code is useful to better understand step 4, which is about:
4. Determination of Critical Value or P-value: The test statistic (Dn) is compared to a critical value from the Kolmogorov-Smirnov distribution table or, more commonly, a p-value is calculated. If the p-value is less than the significance level (commonly 0.05), the null hypothesis is rejected, suggesting that the sample distribution does not match the specified distribution. This could be translated as the following rules:
4.1. Using the determination of critical value (CV):
Reject H0: Dn > CV
Do not reject H0: Dn <= CV
4.2. Using p-value and significance level alpha:
Reject H0: p-value < alpha
Do not reject H0: p-value >= alpha
It is important to remember the meaning of the Null and Alternative Hypothesis:
Null Hypothesis H0: The sample follows a specified distribution.
Alternative Hypothesis Ha: The sample does not follow the specified distribution.
And the meaning of rejecting or not the Null Hypothesis:
Reject H0: The sample does not follow the specified distribution.
Do not reject H0: There is no evidence that the sample deviates from the specified distribution.
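The rules above can be condensed into a small helper; this is an illustrative sketch (the name ks_decision is not from any library):
# Hedged helper translating rules 4.1 and 4.2 into code
def ks_decision(d_stat, critical_value=None, p_value=None, alpha=0.05):
    if critical_value is not None and d_stat > critical_value:
        return "Reject H0"
    if p_value is not None and p_value < alpha:
        return "Reject H0"
    return "Do not reject H0"

print(ks_decision(0.30, critical_value=0.24))  # Reject H0
print(ks_decision(0.10, p_value=0.40))         # Do not reject H0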
The next Python code illustrates the previous rules by verifying whether a given sample follows a normal distribution.
import numpy as np
import matplotlib.pyplot as plt
from numpy.random import seed, poisson
from scipy.stats import kstest, kstwobign, norm
# Set seed (e.g., make this example reproducible)
seed(0)
# Generate a sample dataset of 100 values that follow a Poisson distribution with mean=5
sample = poisson(5, 100)
# Perform the Kolmogorov-Smirnov test against a normal distribution
ks_statistic, ks_p_value = kstest(sample, 'norm')
# Step 5: Comparing
alpha = 0.05
# Obtain the critical value: kstwobign is the limiting distribution of
# sqrt(n) * Dn, so the critical value for Dn itself is scaled by sqrt(n)
critical_value = kstwobign.ppf(1 - alpha) / np.sqrt(len(sample))
print(f"Kolmogorov-Smirnov Statistic: {ks_statistic}")
print(f"Critical value: {critical_value}")
print(f"Alpha: {alpha}")
print(f"P-value: {ks_p_value}")
if ks_statistic > critical_value or ks_p_value < alpha:
    print("Reject the null hypothesis. The sample does not come from the specified distribution.")
else:
    print("Fail to reject the null hypothesis. The sample comes from the specified distribution.")
Kolmogorov-Smirnov Statistic: 0.9072498680518208
Critical value: 0.13580986393225505
Alpha: 0.05
P-value: 1.0908062873170218e-103
Reject the null hypothesis. The sample does not come from the specified distribution.
The Python code, with the data and detailed computations to apply the Goodness-of-Fit test, is given at:
https://colab.research.google.com/drive/1FHK7ICgAZVQCRd4_e5G76hFNIwzfvf5V?usp=sharing
One Sample Kolmogorov-Smirnov Test applied to raw data
Suppose we have a data set of the demand for a product over 30 days. The objective is to determine whether the data follow a normal distribution with a mean of 50 and a standard deviation of 10 [3]:
data = [67, 63, 33, 69, 53, 51, 49, 78, 48, 42, 72, 52, 47, 66, 58, 44, 44, 56, 28, 25, 36, 32, 61, 57, 38, 35, 76, 58, 48, 59]
The next structured approach ensures a thorough examination of the data's adherence to normality, leveraging statistical methods and visual aids to elucidate the results:
1. Data Initialization: The provided data, representing the demand of a product over 30 days, is encapsulated into a pandas DataFrame.
2. Data Ordering: The data is sorted in ascending order based on the demand values to facilitate subsequent calculations.
3. Frequency Calculation: A frequency column is appended to the DataFrame, indicating the count of occurrences for each demand value using the cumcount() method.
4. Observed Relative Cumulative Frequency: An observed relative cumulative frequency column is computed by calculating the cumulative count of occurrences and normalizing it by the total number of observations.
5. Expected Relative Cumulative Frequency: The expected cumulative frequency for each demand value is derived using the cumulative distribution function (CDF) of the normal distribution N(50,10). This calculation leverages the statistics.NormalDist class.
6. Difference Calculation: A new column is created to store the absolute differences between the observed and expected cumulative frequencies, quantifying the deviation for each demand value.
7. Kolmogorov-Smirnov Statistic (Dn): The maximum difference (Dn) is identified from the difference column, representing the Kolmogorov-Smirnov statistic.
8. Critical Value Comparison: The calculated Dn is compared against a predefined critical value (0.24 for a sample size of 30; see the short check after this list). Based on this comparison, the script concludes whether the data conform to the hypothesized normal distribution N(50,10).
9. Visualization: A graphical representation is generated to visually juxtapose the empirical cumulative distribution function (CDF) against the theoretical normal CDF. Additionally, the maximum difference is highlighted on the plot to illustrate the KS statistic.
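Before the full script, a brief hedged check on where the 0.24 critical value comes from: for moderate-to-large n, the 5% critical value for Dn is approximately the asymptotic value kstwobign.ppf(0.95) / sqrt(n); exact tables give a slightly smaller number for n = 30.
import numpy as np
from scipy.stats import kstwobign

# Asymptotic 5% critical value for Dn at n = 30 (tables give ~0.24)
n = 30
print(round(kstwobign.ppf(0.95) / np.sqrt(n), 3))  # ~0.248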
import pandas as pd
import numpy as np
from scipy import stats
import statistics
import matplotlib.pyplot as plt
# Given data
data = [67, 63, 33, 69, 53, 51, 49, 78, 48, 42, 72, 52, 47, 66, 58, 44, 44, 56, 28, 25, 36, 32, 61, 57, 38, 35, 76, 58, 48, 59]
# Step 1 - Create DataFrame and Order the data
df = pd.DataFrame(data, columns=["Demand of a Product"])
df_1 = df.sort_values(by="Demand of a Product").reset_index(drop=True)
# Step 2 - Add a frequency column indicating how many times each number appears
df_1["Frequency"] = df_1.groupby('Demand of a Product', sort=False).cumcount() + 1
df_2 = df_1
# Step 3 - Add Observed relative cumulative frequency column
df_2["Count"] = np.arange(1, len(df_2) + 1)
df_2["Obs. % Cum. Freq."] = df_2["Count"] / len(df_2)
df_3 = df_2
print(df_3)
# Step 4 - Add expected relative cumulative frequency column
normal = statistics.NormalDist(50, 10)
df_3["Exp. % Cum. Freq."] = df_3["Demand of a Product"].apply(lambda x: normal.cdf(x))
df_4 = df_3
# Step 5 - Add difference column
df_4["Difference"] = abs(df_4["Obs. % Cum. Freq."] - df_4["Exp. % Cum. Freq."])
df_5 = df_4
print(df_5.head())
# Step 6 - Get the max of the difference
Dn = max(df_5["Difference"])
print(f"Dn: {Dn}")
# Step 7 - Compare Critical Value of K-S vs Dn Value
cv = 0.24 # Critical value for Kolmogorov-Smirnov test with sample size of 30
if Dn <= cv:
    print("Your data fits with a normal distribution N(50,10)")
else:
    print("Your data DO NOT fit with a normal distribution N(50,10)")
# Plotting the results
plt.figure(figsize=(10, 6))
plt.step(df_5["Demand of a Product"], df_5["Obs. % Cum. Freq."], where='post', label='Empirical CDF')
plt.plot(df_5["Demand of a Product"], df_5["Exp. % Cum. Freq."], label='Reference Normal CDF', color='red')
# Highlight the KS statistic on the plot
d_max_index = df_5["Difference"].idxmax()
plt.plot([df_5.at[d_max_index, "Demand of a Product"], df_5.at[d_max_index, "Demand of a Product"]],
         [df_5.at[d_max_index, "Obs. % Cum. Freq."], df_5.at[d_max_index, "Exp. % Cum. Freq."]],
         'k--', label=f'KS Statistic = {Dn:.3f}')
# Adding labels and legend
plt.xlabel('Demand of a Product')
plt.ylabel('Cumulative Probability')
plt.title('Kolmogorov-Smirnov Test for Normality')
plt.legend()
plt.grid()
# Show plot
plt.show()
Demand of a Product Frequency Count Obs. % Cum. Freq.
0 25 1 1 0.033333
1 28 1 2 0.066667
2 32 1 3 0.100000
3 33 1 4 0.133333
4 35 1 5 0.166667
5 36 1 6 0.200000
6 38 1 7 0.233333
7 42 1 8 0.266667
8 44 1 9 0.300000
9 44 2 10 0.333333
10 47 1 11 0.366667
11 48 1 12 0.400000
12 48 2 13 0.433333
13 49 1 14 0.466667
14 51 1 15 0.500000
15 52 1 16 0.533333
16 53 1 17 0.566667
17 56 1 18 0.600000
18 57 1 19 0.633333
19 58 1 20 0.666667
20 58 2 21 0.700000
21 59 1 22 0.733333
22 61 1 23 0.766667
23 63 1 24 0.800000
24 66 1 25 0.833333
25 67 1 26 0.866667
26 69 1 27 0.900000
27 72 1 28 0.933333
28 76 1 29 0.966667
29 78 1 30 1.000000
Demand of a Product Frequency Count Obs. % Cum. Freq. \
0 25 1 1 0.033333
1 28 1 2 0.066667
2 32 1 3 0.100000
3 33 1 4 0.133333
4 35 1 5 0.166667
Exp. % Cum. Freq. Difference
0 0.006210 0.027124
1 0.013903 0.052763
2 0.035930 0.064070
3 0.044565 0.088768
4 0.066807 0.099859
Dn: 0.12574688224992647
Your data fits with a normal distribution N(50,10)
The Python code, with the data and detailed computations to apply the Goodness-of-Fit test, is given at:
https://colab.research.google.com/drive/1FHK7ICgAZVQCRd4_e5G76hFNIwzfvf5V?usp=sharing
Two Sample Kolmogorov-Smirnov Test
This is a two-sided test for the null hypothesis that 2 independent samples are drawn from the same continuous distribution. If the K-S statistic is small or the p-value is high, then we cannot reject the hypothesis that the distributions of the two samples are the same [2]. This leads to the following modification of the Null and Alternative hypothesis:
Null Hypothesis H0: The two samples come from the same distribution.
Alternative Hypothesis Ha: The two samples do not come from the same distribution.
In terms of numerical comparisons, this translates into:
4.1. Using the determination of critical value (CV):
Reject H0: Dn > CV
Do not reject H0: Dn <= CV
4.2. Using p-value and significance level alpha:
Reject H0: p-value < alpha
Do not reject H0: p-value >= alpha
Remember that the decision could be based on comparing the p-value with a chosen significance level (e.g., 0.05). If the p-value is less than the significance level, reject the null hypothesis, indicating that the two samples come from different distributions.
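For completeness, a small hedged sketch of rule 4.1 in the two-sample case: the asymptotic critical value is c(alpha) * sqrt((n + m) / (n * m)), where c(alpha) comes from the same limiting distribution used earlier.
import numpy as np
from scipy.stats import kstwobign

# Asymptotic two-sample critical value for sample sizes n and m
n, m = 100, 120
alpha = 0.05
c_alpha = kstwobign.ppf(1 - alpha)  # c(0.05) ~ 1.358
d_crit = c_alpha * np.sqrt((n + m) / (n * m))
print(f"Critical value for n={n}, m={m}: {d_crit:.3f}")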
Two Sample Kolmogorov-Smirnov Test - Testing two normal distributions - Python code
The next code shows how the two-sample Kolmogorov-Smirnov test can be applied to compare two normal distributions with different parameters [1].
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import ks_2samp
# Set the seed for reproducibility
np.random.seed(42)
# Generate two sample datasets
sample1 = np.random.normal(0, 1, 100)
sample2 = np.random.normal(0.5, 1.5, 120)
# Perform the Kolmogorov-Smirnov test
ks_statistic, p_value = ks_2samp(sample1, sample2)
# Print the results
print(f"Kolmogorov–Smirnov Statistic: {ks_statistic}")
print(f"P-value: {p_value}")
# Decision based on p-value
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis. The two samples come from different distributions.")
else:
    print("Fail to reject the null hypothesis. There is not enough evidence to suggest different distributions.")
# Plot the histograms with KDE
plt.figure(figsize=(12, 8))
sns.histplot(sample1, bins=20, kde=True, color='b', label='Sample 1')
sns.histplot(sample2, bins=20, kde=True, color='g', label='Sample 2')
plt.legend()
plt.title('Histogram and KDE of Sample Distributions')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()
# Calculate ECDF for both samples
def ecdf(data):
    """Compute ECDF for a one-dimensional array of measurements."""
    n = len(data)
    x = np.sort(data)
    y = np.arange(1, n + 1) / n
    return x, y
# Get ECDFs
x1, y1 = ecdf(sample1)
x2, y2 = ecdf(sample2)
# Plot the ECDFs
plt.figure(figsize=(12, 8))
plt.step(x1, y1, where='post', label='ECDF Sample 1', color='b')
plt.step(x2, y2, where='post', label='ECDF Sample 2', color='g')
# Highlight the KS statistic: interpolate ECDF 2 onto the x-values of ECDF 1
# and locate the maximum absolute gap
y2_on_x1 = np.interp(x1, x2, y2)
idx = np.argmax(np.abs(y2_on_x1 - y1))
plt.plot([x1[idx], x1[idx]], [y1[idx], y2_on_x1[idx]],
         'k--', label=f'KS Statistic = {ks_statistic:.3f}')
# Adding labels, title, and legend
plt.xlabel('Sample Values')
plt.ylabel('Cumulative Probability')
plt.title('Empirical Cumulative Distribution Functions (ECDF)')
plt.legend()
plt.grid()
plt.show()
Kolmogorov–Smirnov Statistic: 0.35833333333333334
P-value: 9.93895980740741e-07
Reject the null hypothesis. The two samples come from different distributions.
The Python code, with the data and detailed computations to apply the Goodness-of-Fit test, is given at:
https://colab.research.google.com/drive/1FHK7ICgAZVQCRd4_e5G76hFNIwzfvf5V?usp=sharing
Two Sample Kolmogorov-Smirnov Test - Testing two different distributions - Python code
The next code shows how the two-sample Kolmogorov-Smirnov test can be applied to compare two different distributions with different parameters [4].
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import ks_2samp
# Set the seed for reproducibility
np.random.seed(42)
# Generate two sample datasets
data1 = np.random.normal(7, 2, 100) # Normal distribution
data2 = np.random.lognormal(2, 0.2, 100) # Log-normal distribution
# Perform the Kolmogorov-Smirnov test
ks_statistic, p_value = ks_2samp(data1, data2)
# Print the results
print(f"Kolmogorov–Smirnov Statistic: {ks_statistic}")
print(f"P-value: {p_value}")
# Decision based on p-value
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis. The two samples come from different distributions.")
else:
    print("Fail to reject the null hypothesis. There is not enough evidence to suggest different distributions.")
# Plot the histograms with KDE
plt.figure(figsize=(12, 8))
sns.histplot(data1, bins=20, kde=True, color='b', label='Data 1 (Normal)')
sns.histplot(data2, bins=20, kde=True, color='g', label='Data 2 (Log-normal)')
plt.legend()
plt.title('Histogram and KDE of Sample Distributions')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()
# Calculate ECDF for both samples
def ecdf(data):
    """Compute ECDF for a one-dimensional array of measurements."""
    n = len(data)
    x = np.sort(data)
    y = np.arange(1, n + 1) / n
    return x, y
# Get ECDFs
x1, y1 = ecdf(data1)
x2, y2 = ecdf(data2)
# Plot the ECDFs
plt.figure(figsize=(12, 8))
plt.step(x1, y1, where='post', label='ECDF Data 1 (Normal)', color='b')
plt.step(x2, y2, where='post', label='ECDF Data 2 (Log-normal)', color='g')
# Highlight the KS statistic: interpolate ECDF 2 onto the x-values of ECDF 1
# and locate the maximum absolute gap
y2_on_x1 = np.interp(x1, x2, y2)
idx = np.argmax(np.abs(y2_on_x1 - y1))
plt.plot([x1[idx], x1[idx]], [y1[idx], y2_on_x1[idx]],
         'k--', label=f'KS Statistic = {ks_statistic:.3f}')
# Adding labels, title, and legend
plt.xlabel('Sample Values')
plt.ylabel('Cumulative Probability')
plt.title('Empirical Cumulative Distribution Functions (ECDF)')
plt.legend()
plt.grid()
plt.show()
Kolmogorov–Smirnov Statistic: 0.2
P-value: 0.03638428787491733
Reject the null hypothesis. The two samples come from different distributions.
The Python code, with the data and detailed computations to apply the Goodness-of-Fit test, is given at:
https://colab.research.google.com/drive/1FHK7ICgAZVQCRd4_e5G76hFNIwzfvf5V?usp=sharing
An alternative for checking the manual computations of the Kolmogorov-Smirnov test is given in [5].
References:
[1] https://www.geeksforgeeks.org/kolmogorov-smirnov-test-ks-test/
[2] https://towardsdatascience.com/non-parametric-tests-in-hypothesis-testing-138d585c3548
[4] https://www.statology.org/kolmogorov-smirnov-test-python/
[5] https://python.plainenglish.io/test-of-normality-kolmogorov-smirnov-test-d047a76f5efe