1. Concepts & Definitions
1.1. A Review on Parametric Statistics
1.2. Parametric tests for Hypothesis Testing
1.3. Parametric vs. Non-Parametric Test
1.5. One sample z-test and its relation with the two-sample z-test
1.6. One sample t-test and its relation with the two-sample t-test
1.6. Welch's two-sample t-test: two populations with different variances
1.7. Non-Parametric test for Hypothesis Testing: Mann-Whitney U Test
1.8. Non-Parametric test for Hypothesis Testing: Wilcoxon Sign-Rank Test
1.9. Non-Parametric test for Hypothesis Testing: Wilcoxon Sign Test
1.10. Non-Parametric test for Hypothesis Testing: Chi-Square Goodness-of-Fit
1.11. Non-Parametric test for Hypothesis Testing: Kolmogorov-Smirnov
1.12. Non-Parametric tests for comparing machine learning models
2. Problem & Solution
2.1. Using Wilcoxon Sign Test to compare clustering methods
2.2. Using Wilcoxon Sign-Rank Test to compare clustering methods
2.3. What is A/B testing and how to combine with hypothesis testing?
2.4. Using Chi-Square fit to check if Benford-Law holds or not
2.5. Using Kolmogorov-Smirnov fit to check if Pareto principle holds or not
Mann-Whitney U Test Concept
The Mann-Whitney U Test, also referred to as the Wilcoxon Rank Sum Test, is a non-parametric statistical method used to compare two samples or groups.
This test evaluates whether the two sampled groups are likely to come from the same population, essentially questioning if these two populations have the same data distribution. In other words, it seeks evidence to determine whether the groups originate from populations with different levels of a variable of interest. Consequently, the hypotheses in a Mann-Whitney U Test are as follows [1]:
The null hypothesis (H0) posits that the two populations are equal.
The alternative hypothesis (H1) suggests that the two populations are not equal.
Some researchers view this as a comparison of the medians between the two populations, while parametric tests compare the means between two independent groups. In specific cases, where the data have similar shapes (as per the assumptions), this interpretation is valid. However, it's important to note that medians are not directly involved in the calculation of the Mann-Whitney U test statistic. Two groups could have the same median and still show significant differences according to the Mann-Whitney U test.
To illustrate the application of the test, two sets of sample data (old_drug and new_drug) were generated from normal distributions with different means; this creates datasets well suited to demonstrating the Mann-Whitney U test.
The Python code that creates the previous graphics with two distributions from two sets of sample data is given at:
https://colab.research.google.com/drive/1DjJ6-Su7yPxyaKYnFPdG6aPpO4vvngQE?usp=sharing
Mann-Whitney U Test assumptions
Some key assumptions for the Mann-Whitney U Test are detailed below [1, 2]:
The variable being compared between the two groups must be continuous (able to take any number in a range – for example, age, weight, height, or heart rate). This is because the test is based on ranking the observations in each group.
The data are assumed to take a non-normal, or skewed, distribution. If your data are normally distributed, the unpaired Student’s t-test should be used to compare the two groups instead. The next figure illustrates this point.
While the data in both groups are not assumed to be Normal, the data are assumed to be similar in shape across the two groups.
The data should be two randomly selected independent samples, meaning the groups have no relationship to each other. If samples are paired (for example, two measurements from the same group of participants), then the Wilcoxon signed-rank test should be used instead.
A sufficient sample size is needed for a valid test, usually more than 5 observations in each group.
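As a practical illustration of the second assumption, the sketch below (not part of the original example) checks normality with the Shapiro-Wilk test before choosing between the unpaired t-test and the Mann-Whitney U test. The data and the 0.05 cutoff are assumptions made for illustration only:

```python
import numpy as np
from scipy.stats import shapiro, ttest_ind, mannwhitneyu

rng = np.random.default_rng(42)
group_a = rng.normal(10, 2, 30)        # roughly normal sample (illustrative)
group_b = rng.exponential(2, 30) + 5   # skewed sample (illustrative)

# Shapiro-Wilk: H0 = "the data come from a normal distribution"
normal_a = shapiro(group_a).pvalue > 0.05
normal_b = shapiro(group_b).pvalue > 0.05

if normal_a and normal_b:
    stat, p = ttest_ind(group_a, group_b)
    print(f"Both samples look normal -> t-test: p = {p:.4f}")
else:
    stat, p = mannwhitneyu(group_a, group_b, alternative='two-sided')
    print(f"Normality doubtful -> Mann-Whitney U: p = {p:.4f}")
```

This is only a rough decision rule; in practice the shape of the data should also be inspected visually (histograms, Q-Q plots).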
Mann-Whitney U Test numerical example with a discrete-valued variable
Now, you can see that the frequency of sleepwalking cases is lower when a person takes the new drug. Let's see how the Mann-Whitney U test works here. We are interested in knowing whether the two groups, taking different drugs, report the same number of sleepwalking cases or not. The hypotheses are given below [3]:
H0: The two groups report the same number of cases
H1: The two groups report different numbers of cases
I am selecting a 5% level of significance for this test. The next step is to set a test statistic. For the Mann-Whitney U test, the test statistic is denoted by U, which is the minimum of U1 and U2:
U1 = n1 * n2 + (n1 * (n1 + 1)) / 2 - R1
U2 = n1 * n2 + (n2 * (n2 + 1)) / 2 - R2
where R1 is the sum of ranks of group 1, R2 is the sum of ranks of group 2, n1 is the size of group 1, and n2 is the size of group 2.
Now, we will compute the ranks by combining the two groups. The question is "How to assign ranks?"
How to assign ranks?
Ranks are a very important component of non-parametric tests, so learning how to assign them to a sample is essential. Let's learn how to assign ranks.
1. We will combine the two samples and arrange them in ascending order. I am using OD and ND for Old Drug and New Drug respectively.
The lowest value is assigned rank 1, the second-lowest value rank 2, and so on.
But notice that the numbers 1, 4 and 8 each appear more than once in the combined sample, so these simple consecutive ranks are not yet correct.
How to assign ranks when there are ties in the sample?
A tie occurs when a value appears more than once in a sample. Look at the position of the number 1 in the sorted, combined data: it occupies the 1st and 2nd positions. In such a case, we take the mean of 1 and 2 and assign that mean rank (1.5) to both occurrences of the number 1, as shown below. We follow the same steps for the numbers 4 and 8. The number 4 appears at the 5th and 6th positions, and the mean of 5 and 6 is 5.5, so we assign rank 5.5 to both occurrences of 4. Calculate the rank for the number 8 along these lines.
We assign the mean rank when there are ties so that the total sum of ranks over the combined sample of size n is preserved: the ranks always sum to n(n+1)/2.
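As a quick check of this rule, scipy's rankdata reproduces the mean-rank assignment for ties on the combined sample from this example:

```python
from scipy.stats import rankdata

# Combined sample (OD + ND): the values 1, 4 and 8 each appear twice
combined = [7, 8, 4, 9, 8, 3, 4, 2, 1, 1]
ranks = rankdata(combined, method='average')
print(ranks)  # the two 1s share rank 1.5, the two 4s share 5.5, the two 8s share 8.5

# Mean ranks preserve the total: the ranks always sum to n(n+1)/2
n = len(combined)
print(ranks.sum(), n * (n + 1) / 2)  # both are 55.0
```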
2. The next step is to compute the sum of ranks for group 1 and group 2.
R1 = 15.5
R2 = 39.5
3. Using the formula of U1 & U2, compute their values.
U1 = 24.5
U2 = 0.5
Now, U = min(U1, U2) = 0.5.
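The hand computation above can be verified in a few lines. Here the rank sums are labelled per drug to avoid ambiguity about which group is "group 1"; U = min(U1, U2) is the same either way:

```python
from scipy.stats import rankdata

old_drug = [7, 8, 4, 9, 8]
new_drug = [3, 4, 2, 1, 1]
ranks = rankdata(old_drug + new_drug)  # average ranks handle the ties

R_old = ranks[:len(old_drug)].sum()    # 39.5
R_new = ranks[len(old_drug):].sum()    # 15.5
n1, n2 = len(old_drug), len(new_drug)

U_old = n1 * n2 + n1 * (n1 + 1) / 2 - R_old   # 0.5
U_new = n1 * n2 + n2 * (n2 + 1) / 2 - R_new   # 24.5
U = min(U_old, U_new)
print(U)               # 0.5
print(U_old + U_new)   # U1 + U2 always equals n1 * n2 = 25
```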
Note: For the Mann-Whitney U test, the value of U lies in the range [0, n1*n2], where values near 0 indicate strong separation between the two groups and larger values indicate greater overlap between them. Also, U1 + U2 always equals n1*n2. Notice that the value of U here is 0.5, which is very close to 0.
Now, we determine a critical value from the table of critical values for the Mann-Whitney U test; it depends on the level of significance and the sample sizes, and it is used to reject or fail to reject the null hypothesis. In the Mann-Whitney U test, the test criteria are [4, 5]:
Reject H0 if U <= critical value
Fail to reject H0 if U > critical value
Since n1 = n2 = 5 and the significance level is α = 0.05, the critical value here is 2 (obtained from a table or with Python code).
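The critical value itself can be reproduced in Python. The sketch below enumerates the exact null distribution of U (under H0, and assuming no ties, every split of the ranks between the two groups is equally likely) and finds the largest U that can still be rejected at the two-sided 5% level:

```python
from itertools import combinations

# n1 = n2 = 5 as in the example above; two-sided alpha = 0.05
n1 = n2 = 5
alpha = 0.05
all_ranks = range(1, n1 + n2 + 1)

# Enumerate every way of assigning n1 of the ranks to group 1
u_values = []
for group1_ranks in combinations(all_ranks, n1):
    R1 = sum(group1_ranks)
    U1 = n1 * n2 + n1 * (n1 + 1) / 2 - R1
    u_values.append(min(U1, n1 * n2 - U1))   # U2 = n1*n2 - U1

total = len(u_values)   # C(10, 5) = 252 equally likely splits
crit = max(u for u in range(n1 * n2 + 1)
           if sum(v <= u for v in u_values) / total <= alpha)
print(crit)   # 2, matching the tabulated critical value
```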
Since U = 0.5 <= 2, we reject the null hypothesis and conclude that there is significant evidence that the two groups report different numbers of sleepwalking cases.
Mann-Whitney U Test numerical example with a discrete-valued variable - Python code
The next Python code solves the previous numerical example, employing the pandas library to compute the ranks and scipy's mannwhitneyu function to obtain the corresponding U statistic and p-value for the data. It also produces the bar graphics for each data set.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import mannwhitneyu

# Data for old and new drugs
old_drug = [7, 8, 4, 9, 8]
new_drug = [3, 4, 2, 1, 1]

# Step 1: Plot frequency bars for each drug
def plot_frequency_bars(old_drug, new_drug):
    labels_old, counts_old = np.unique(old_drug, return_counts=True)
    labels_new, counts_new = np.unique(new_drug, return_counts=True)
    fig, ax = plt.subplots(1, 2, figsize=(12, 6))
    ax[0].bar(labels_old, counts_old, color='blue', alpha=0.7)
    ax[0].set_title('Old Drug')
    ax[0].set_xlabel('Number of Sleepwalking Cases')
    ax[0].set_ylabel('Frequency')
    ax[1].bar(labels_new, counts_new, color='green', alpha=0.7)
    ax[1].set_title('New Drug')
    ax[1].set_xlabel('Number of Sleepwalking Cases')
    ax[1].set_ylabel('Frequency')
    plt.tight_layout()
    plt.show()

plot_frequency_bars(old_drug, new_drug)

# Combine the samples and create a DataFrame
data = pd.DataFrame({
    'Sleepwalking Cases': old_drug + new_drug,
    'Group': ['OD']*len(old_drug) + ['ND']*len(new_drug)
})

# Step 2: Rank the combined data
data['Rank'] = data['Sleepwalking Cases'].rank()

# Separate the ranks by group
rank_sum_old = data[data['Group'] == 'OD']['Rank'].sum()
rank_sum_new = data[data['Group'] == 'ND']['Rank'].sum()

# Step 3: Compute U1 and U2 using the ranks
n1 = len(old_drug)
n2 = len(new_drug)
R1 = rank_sum_old
R2 = rank_sum_new
U1 = n1 * n2 + (n1 * (n1 + 1)) / 2 - R1
U2 = n1 * n2 + (n2 * (n2 + 1)) / 2 - R2
print(f"Rank sum for old drug (R1): {R1}")
print(f"Rank sum for new drug (R2): {R2}")
print(f"U1: {U1}")
print(f"U2: {U2}")

# Step 4: Use scipy to compute the p-value
u_statistic, p_value = mannwhitneyu(old_drug, new_drug, alternative='two-sided')
print(f"Mann-Whitney U statistic: {u_statistic}")
print(f"P-value: {p_value}")

# Display the ranks table
print("\nData with ranks:")
print(data)

# Conclusion based on p-value
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant difference between the two drugs.")
else:
    print("Fail to reject the null hypothesis: There is no significant difference between the two drugs.")
Rank sum for old drug (R1): 39.5
Rank sum for new drug (R2): 15.5
U1: 0.5
U2: 24.5
Mann-Whitney U statistic: 24.5
P-value: 0.015333162113602824
Data with ranks:
Sleepwalking Cases Group Rank
0 7 OD 7.0
1 8 OD 8.5
2 4 OD 5.5
3 9 OD 10.0
4 8 OD 8.5
5 3 ND 4.0
6 4 ND 5.5
7 2 ND 3.0
8 1 ND 1.5
9 1 ND 1.5
Reject the null hypothesis: There is a significant difference between the two drugs.
The Python code with the data, graphics, and detailed computation to obtain the Mann-Whitney U statistic is given at:
https://colab.research.google.com/drive/1DjJ6-Su7yPxyaKYnFPdG6aPpO4vvngQE?usp=sharing
Mann-Whitney U Test numerical example with a continuous-valued variable - Python code
The next Python code applies the same procedure to continuous data, employing the pandas library to compute the ranks and scipy's mannwhitneyu function to obtain the corresponding U statistic and p-value for the data. It also produces the curve and bar graphics for each data set.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import mannwhitneyu
# Seaborn style for visualization
sns.set(style="whitegrid")
# Generate sample data
np.random.seed(42)
old_drug = np.random.normal(loc=10, scale=2, size=30) # Old drug data
new_drug = np.random.normal(loc=8, scale=2, size=30) # New drug data
# Combine the samples and create a DataFrame
data = pd.DataFrame({
    'Value': np.concatenate([old_drug, new_drug]),
    'Group': ['Old Drug']*len(old_drug) + ['New Drug']*len(new_drug)
})
# Step 1: Plot frequency bars for each drug
# Plot frequency bars and continuous curves
fig, ax1 = plt.subplots(figsize=(12, 6))
# Frequency bars
sns.histplot(old_drug, bins=10, color='blue', alpha=0.6, label='Old Drug', kde=False, ax=ax1)
sns.histplot(new_drug, bins=10, color='green', alpha=0.6, label='New Drug', kde=False, ax=ax1)
# Secondary y-axis for KDE plots
ax2 = ax1.twinx()
# Continuous curves
sns.kdeplot(old_drug, color='blue', label='Old Drug KDE', ax=ax2)
sns.kdeplot(new_drug, color='green', label='New Drug KDE', ax=ax2)
# Adjust y-axis limits of ax2 to match ax1
ax2.set_ylim(0, ax1.get_ylim()[1] / len(old_drug))
ax1.set_title('Distribution of Old Drug and New Drug')
ax1.set_xlabel('Value')
ax1.set_ylabel('Frequency')
ax2.set_ylabel('Density')
# Combine legends from both axes
handles1, labels1 = ax1.get_legend_handles_labels()
handles2, labels2 = ax2.get_legend_handles_labels()
ax1.legend(handles1 + handles2, labels1 + labels2, loc='upper right')
plt.show()
# Step 2: Rank the combined data
# Combine the samples and rank them
data['Rank'] = data['Value'].rank()
# Separate the ranks by group
rank_sum_old = data[data['Group'] == 'Old Drug']['Rank'].sum()
rank_sum_new = data[data['Group'] == 'New Drug']['Rank'].sum()
# Step 3: Compute U1 and U2 using the ranks
n1 = len(old_drug)
n2 = len(new_drug)
R1 = rank_sum_old
R2 = rank_sum_new
U1 = n1 * n2 + (n1 * (n1 + 1)) / 2 - R1
U2 = n1 * n2 + (n2 * (n2 + 1)) / 2 - R2
print(f"Rank sum for old drug (R1): {R1}")
print(f"Rank sum for new drug (R2): {R2}")
print(f"U1: {U1}")
print(f"U2: {U2}")
# Display the ranks table
print("\nData with ranks:")
print(data)
# Step 4: Use scipy to compute the p-value
# Mann-Whitney U Test
u_statistic, p_value = mannwhitneyu(old_drug, new_drug, alternative='two-sided')
print(f"Mann-Whitney U statistic: {u_statistic}")
print(f"P-value: {p_value}")
# Conclusion based on p-value
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant difference between the two drugs.")
else:
    print("Fail to reject the null hypothesis: There is no significant difference between the two drugs.")
Rank sum for old drug (R1): 1149.0
Rank sum for new drug (R2): 681.0
U1: 216.0
U2: 684.0
Data with ranks:
Value Group Rank
0 10.993428 Old Drug 53.0
1 9.723471 Old Drug 43.0
2 11.295377 Old Drug 55.0
3 13.046060 Old Drug 59.0
4 9.531693 Old Drug 39.0
5 9.531726 Old Drug 40.0
6 13.158426 Old Drug 60.0
7 11.534869 Old Drug 56.0
8 9.061051 Old Drug 33.0
9 11.085120 Old Drug 54.0
10 9.073165 Old Drug 35.0
11 9.068540 Old Drug 34.0
12 10.483925 Old Drug 50.0
13 6.173440 Old Drug 7.0
14 6.550164 Old Drug 9.0
15 8.875425 Old Drug 31.0
16 7.974338 Old Drug 22.0
17 10.628495 Old Drug 51.0
18 8.183952 Old Drug 23.0
19 7.175393 Old Drug 15.0
20 12.931298 Old Drug 58.0
21 9.548447 Old Drug 41.0
22 10.135056 Old Drug 48.0
23 7.150504 Old Drug 14.0
24 8.911235 Old Drug 32.0
25 10.221845 Old Drug 49.0
26 7.698013 Old Drug 19.0
27 10.751396 Old Drug 52.0
28 8.798723 Old Drug 30.0
29 9.416613 Old Drug 37.0
30 6.796587 New Drug 12.0
31 11.704556 New Drug 57.0
32 7.973006 New Drug 21.0
33 5.884578 New Drug 6.0
34 9.645090 New Drug 42.0
35 5.558313 New Drug 5.0
36 8.417727 New Drug 26.0
37 4.080660 New Drug 1.0
38 5.343628 New Drug 4.0
39 8.393722 New Drug 25.0
40 9.476933 New Drug 38.0
41 8.342737 New Drug 24.0
42 7.768703 New Drug 20.0
43 7.397793 New Drug 18.0
44 5.042956 New Drug 3.0
45 6.560312 New Drug 10.0
46 7.078722 New Drug 13.0
47 10.114244 New Drug 47.0
48 8.687237 New Drug 29.0
49 4.473920 New Drug 2.0
50 8.648168 New Drug 27.0
51 7.229835 New Drug 16.0
52 6.646156 New Drug 11.0
53 9.223353 New Drug 36.0
54 10.061999 New Drug 46.0
55 9.862560 New Drug 44.0
56 6.321565 New Drug 8.0
57 7.381575 New Drug 17.0
58 8.662527 New Drug 28.0
59 9.951090 New Drug 45.0
Mann-Whitney U statistic: 684.0
P-value: 0.0005561109783724005
Reject the null hypothesis: There is a significant difference between the two drugs.
The Python code that creates the previous graphics with two distributions from two sets of sample data is given at:
https://colab.research.google.com/drive/1DjJ6-Su7yPxyaKYnFPdG6aPpO4vvngQE?usp=sharing
Mann-Whitney U Test numerical example with a continuous-valued variable - Annotated observations
It is possible to apply the Mann-Whitney U test to two samples that come from normal distributions. However, while the test can technically be applied, there are certain drawbacks and considerations to keep in mind:
Applicability
Appropriateness: The Mann-Whitney U test is a non-parametric test designed for situations where the data may not follow a normal distribution. When the data is known to follow a normal distribution, the parametric equivalent (the independent samples t-test) is generally more appropriate because it is more powerful (i.e., it has a higher chance of detecting a true effect when one exists).
Drawbacks of Using Mann-Whitney U Test on Normally Distributed Data
1. Power: The Mann-Whitney U test is less powerful than the t-test when the data is normally distributed. This means that the Mann-Whitney U test is less likely to detect a true difference between the groups if one exists.
2. Efficiency: For normally distributed data, the Mann-Whitney U test is slightly less efficient than the t-test (its asymptotic relative efficiency is about 95%), so the t-test will often yield more precise results with narrower confidence intervals.
3. Assumptions: The Mann-Whitney U test does not assume normality, but it does assume that the distributions of the two groups are identical in shape (though they can have different medians). If the samples are known to be normally distributed, this assumption is unnecessarily restrictive.
When to Use Mann-Whitney U Test
Non-Normal Data: When the data is ordinal, not normally distributed, or when the sample sizes are very small and normality cannot be assumed.
Presence of Outliers: When the data contains significant outliers that cannot be removed and are likely to affect the results of a t-test.
When to Use Independent Samples t-Test
Normal Data: When the data is approximately normally distributed.
Equal Variances: When the assumption of equal variances is met (or you can use Welch’s t-test if variances are unequal).
Continuous Data: When the data is continuous and interval or ratio scaled.
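As a sketch of the variance point above, scipy's ttest_ind covers both the equal-variance and the Welch case via its equal_var parameter. The data here are invented for illustration, with deliberately unequal variances:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)
a = rng.normal(loc=10, scale=1, size=30)   # small variance (illustrative)
b = rng.normal(loc=11, scale=4, size=30)   # much larger variance (illustrative)

# Student's t-test assumes equal variances; Welch's relaxes that assumption
t_student, p_student = ttest_ind(a, b)                # equal_var=True by default
t_welch, p_welch = ttest_ind(a, b, equal_var=False)   # Welch's t-test
print(f"Student: p = {p_student:.4f}   Welch: p = {p_welch:.4f}")
```

When variances differ noticeably, Welch's version is generally the safer default.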
Results Interpretation
If both tests are applied to normally distributed data, you may find that the t-test gives a more precise p-value with a smaller margin of error.
The Mann-Whitney U test might give a similar result, but with less power and efficiency.
Conclusion
While the Mann-Whitney U test can be used on data from normal distributions, it is not the most efficient choice. The independent samples t-test is more suitable for normally distributed data due to its higher power and efficiency. The Mann-Whitney U test should be reserved for non-normal data or when the assumptions of the t-test cannot be met.
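The power claim can be checked empirically. The following sketch (the simulation settings are assumptions, not from the original text) repeatedly draws two normal samples with a true mean shift and records how often each test rejects at the 5% level; with normal data the t-test typically rejects at least as often as the Mann-Whitney U test:

```python
import numpy as np
from scipy.stats import mannwhitneyu, ttest_ind

rng = np.random.default_rng(0)
n_sims, n, alpha = 2000, 30, 0.05
reject_t = reject_u = 0

for _ in range(n_sims):
    a = rng.normal(loc=10, scale=2, size=n)
    b = rng.normal(loc=11, scale=2, size=n)   # true mean shift of 1
    if ttest_ind(a, b).pvalue < alpha:
        reject_t += 1
    if mannwhitneyu(a, b, alternative='two-sided').pvalue < alpha:
        reject_u += 1

# Empirical power = fraction of simulations in which H0 was rejected
print(f"t-test power:       {reject_t / n_sims:.3f}")
print(f"Mann-Whitney power: {reject_u / n_sims:.3f}")
```

The gap is small, which is consistent with the roughly 95% relative efficiency of the Mann-Whitney U test under normality.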
Mann-Whitney U Test numerical example with a continuous-valued variable - normal distributions as input
The objective of the next Python code is to provide a better understanding of the remarks in the previous subsection.
import numpy as np
from scipy.stats import mannwhitneyu, ttest_ind
# Generate normally distributed sample data
np.random.seed(42)
group1 = np.random.normal(loc=10, scale=2, size=30)
group2 = np.random.normal(loc=12, scale=2, size=30)
# Mann-Whitney U Test
u_statistic, p_value_u = mannwhitneyu(group1, group2, alternative='two-sided')
print(f"Mann-Whitney U test: U statistic = {u_statistic}, p-value = {p_value_u}")
# Independent Samples t-Test
t_statistic, p_value_t = ttest_ind(group1, group2)
print(f"Independent samples t-test: t statistic = {t_statistic}, p-value = {p_value_t}")
Mann-Whitney U test: U statistic = 186.0, p-value = 9.791710196799422e-05
Independent samples t-test: t statistic = -4.512913234547555, p-value = 3.176506547470154e-05
The Python code with the data and the computations above is given at:
https://colab.research.google.com/drive/1DjJ6-Su7yPxyaKYnFPdG6aPpO4vvngQE?usp=sharing
Mann-Whitney U Test numerical example with a continuous-valued variable - normal and exponential distributions as input
The next code compares two samples drawn from two different distributions: a normal and an exponential distribution.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import mannwhitneyu
# Seaborn style for visualization
sns.set(style="whitegrid")
# Generate sample data
np.random.seed(42)
normal_data = np.random.normal(loc=10, scale=2, size=30) # Normal distribution data
skewed_data = np.random.exponential(scale=2, size=30) + 5 # Skewed distribution data
# Combine the samples and create a DataFrame
data_normal = pd.DataFrame({
    'Value': normal_data,
    'Group': ['Normal']*len(normal_data)
})
data_skewed = pd.DataFrame({
    'Value': skewed_data,
    'Group': ['Skewed']*len(skewed_data)
})
# Plot frequency bars and continuous curves side-by-side
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))
# Frequency bars for normal data
sns.histplot(normal_data, bins=10, color='blue', alpha=0.6, label='Normal', kde=False, ax=ax1)
# Secondary y-axis for KDE plots for normal data
ax1_2 = ax1.twinx()
sns.kdeplot(normal_data, color='blue', label='Normal KDE', ax=ax1_2)
ax1_2.set_ylim(0, ax1.get_ylim()[1] / len(normal_data))
ax1.set_title('Normal Distribution')
ax1.set_xlabel('Value')
ax1.set_ylabel('Frequency')
ax1_2.set_ylabel('Density')
# Frequency bars for skewed data
sns.histplot(skewed_data, bins=10, color='green', alpha=0.6, label='Skewed', kde=False, ax=ax2)
# Secondary y-axis for KDE plots for skewed data
ax2_2 = ax2.twinx()
sns.kdeplot(skewed_data, color='green', label='Skewed KDE', ax=ax2_2)
ax2_2.set_ylim(0, ax2.get_ylim()[1] / len(skewed_data))
ax2.set_title('Skewed Distribution')
ax2.set_xlabel('Value')
ax2.set_ylabel('Frequency')
ax2_2.set_ylabel('Density')
# Combine legends from both axes
handles1, labels1 = ax1.get_legend_handles_labels()
handles1_2, labels1_2 = ax1_2.get_legend_handles_labels()
ax1.legend(handles1 + handles1_2, labels1 + labels1_2, loc='upper right')
handles2, labels2 = ax2.get_legend_handles_labels()
handles2_2, labels2_2 = ax2_2.get_legend_handles_labels()
ax2.legend(handles2 + handles2_2, labels2 + labels2_2, loc='upper right')
fig.suptitle('Normal vs. Skewed Distribution')
plt.tight_layout(rect=[0, 0, 1, 0.96])
plt.show()
# Combine the samples and rank them
combined_data = pd.concat([data_normal, data_skewed])
combined_data['Rank'] = combined_data['Value'].rank()
# Separate the ranks by group
rank_sum_normal = combined_data[combined_data['Group'] == 'Normal']['Rank'].sum()
rank_sum_skewed = combined_data[combined_data['Group'] == 'Skewed']['Rank'].sum()
n1 = len(normal_data)
n2 = len(skewed_data)
R1 = rank_sum_normal
R2 = rank_sum_skewed
U1 = n1 * n2 + (n1 * (n1 + 1)) / 2 - R1
U2 = n1 * n2 + (n2 * (n2 + 1)) / 2 - R2
print(f"Rank sum for normal data (R1): {R1}")
print(f"Rank sum for skewed data (R2): {R2}")
print(f"U1: {U1}")
print(f"U2: {U2}")
# Display the ranks table
print("\nData with ranks:")
print(combined_data)
# Mann-Whitney U Test
u_statistic, p_value = mannwhitneyu(normal_data, skewed_data, alternative='two-sided')
print(f"Mann-Whitney U statistic: {u_statistic}")
print(f"P-value: {p_value}")
# Conclusion based on p-value
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant difference between the two distributions.")
else:
    print("Fail to reject the null hypothesis: There is no significant difference between the two distributions.")
Rank sum for normal data (R1): 1232.0
Rank sum for skewed data (R2): 598.0
U1: 133.0
U2: 767.0
Data with ranks:
Value Group Rank
0 10.993428 Normal 53.0
1 9.723471 Normal 44.0
2 11.295377 Normal 55.0
3 13.046060 Normal 59.0
4 9.531693 Normal 41.0
5 9.531726 Normal 42.0
6 13.158426 Normal 60.0
7 11.534869 Normal 56.0
8 9.061051 Normal 36.0
9 11.085120 Normal 54.0
10 9.073165 Normal 38.0
11 9.068540 Normal 37.0
12 10.483925 Normal 49.0
13 6.173440 Normal 16.0
14 6.550164 Normal 19.0
15 8.875425 Normal 34.0
16 7.974338 Normal 28.0
17 10.628495 Normal 51.0
18 8.183952 Normal 30.0
19 7.175393 Normal 25.0
20 12.931298 Normal 58.0
21 9.548447 Normal 43.0
22 10.135056 Normal 47.0
23 7.150504 Normal 23.0
24 8.911235 Normal 35.0
25 10.221845 Normal 48.0
26 7.698013 Normal 27.0
27 10.751396 Normal 52.0
28 8.798723 Normal 33.0
29 9.416613 Normal 39.0
0 7.305502 Skewed 26.0
1 6.160182 Skewed 15.0
2 5.260304 Skewed 4.0
3 6.367094 Skewed 17.0
4 5.069987 Skewed 1.0
5 9.800846 Skewed 45.0
6 5.598916 Skewed 8.0
7 7.172512 Skewed 24.0
8 5.747093 Skewed 11.0
9 6.468222 Skewed 18.0
10 6.582448 Skewed 21.0
11 5.408777 Skewed 6.0
12 11.985614 Skewed 57.0
13 7.984491 Skewed 29.0
14 10.610189 Skewed 50.0
15 9.504304 Skewed 40.0
16 6.822109 Skewed 22.0
17 10.098871 Skewed 46.0
18 5.185311 Skewed 3.0
19 5.436269 Skewed 7.0
20 5.092564 Skewed 2.0
21 5.787064 Skewed 12.0
22 5.984261 Skewed 14.0
23 5.633121 Skewed 9.0
24 8.529116 Skewed 32.0
25 5.882454 Skewed 13.0
26 5.659606 Skewed 10.0
27 6.564814 Skewed 20.0
28 5.303796 Skewed 5.0
29 8.240967 Skewed 31.0
Mann-Whitney U statistic: 767.0
P-value: 2.87897208006701e-06
Reject the null hypothesis: There is a significant difference between the two distributions.
The Python code that creates the previous graphics with two distributions from two sets of sample data is given at:
https://colab.research.google.com/drive/1DjJ6-Su7yPxyaKYnFPdG6aPpO4vvngQE?usp=sharing