1. Concepts & Definitions
1.1. A Review on Parametric Statistics
1.2. Parametric tests for Hypothesis Testing
1.3. Parametric vs. Non-Parametric Test
1.4. One-sample z-test and its relation to the two-sample z-test
1.5. One-sample t-test and its relation to the two-sample t-test
1.6. Welch's two-sample t-test: two populations with different variances
1.7. Non-Parametric test for Hypothesis Testing: Mann-Whitney U Test
1.8. Non-Parametric test for Hypothesis Testing: Wilcoxon Sign-Rank Test
1.9. Non-Parametric test for Hypothesis Testing: Wilcoxon Sign Test
1.10. Non-Parametric test for Hypothesis Testing: Chi-Square Goodness-of-Fit
1.11. Non-Parametric test for Hypothesis Testing: Kolmogorov-Smirnov
1.12. Non-Parametric tests for comparing machine learning methods
2. Problem & Solution
2.1. Using Wilcoxon Sign Test to compare clustering methods
2.2. Using Wilcoxon Sign-Rank Test to compare clustering methods
2.3. What is A/B testing and how to combine with hypothesis testing?
2.4. Using Chi-Square fit to check if Benford-Law holds or not
2.5. Using Kolmogorov-Smirnov fit to check if Pareto principle holds or not
Parametric Test Definition
A parametric test is a type of hypothesis test that assumes the population follows a distribution described by a fixed set of parameters. Parametric hypothesis testing is the most common way of inferring the characteristics of a population from a sample.
While there are many types of parametric tests, and they differ in certain respects, a few properties are shared by all of them and make them ‘parametric tests’. These properties include [1]:
1. The population should be normally distributed (at least approximately). The results of such tests cannot be relied upon if the population deviates significantly from this assumption.
2. A sufficiently large sample is required to run such tests. As a rule of thumb, the sample size should be greater than 30 so that the central limit theorem applies, making the sampling distribution of the mean approximately normal.
3. These tests are only helpful with continuous/quantitative variables.
4. Measurement of the central tendency (i.e., the central value of data) is typically done using the mean.
5. Such tests are more powerful than their non-parametric counterparts for the same sample size, provided their assumptions hold.
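The role of the central limit theorem in property 2 can be illustrated with a small simulation. This is only a sketch, assuming NumPy and SciPy are available; the exponential population, the seed, and the helper name skewness_of_sample_means are illustrative choices, not part of the original material. It shows that the means of samples drawn from a strongly skewed population become much more symmetric as the sample size grows.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary fixed seed, for reproducibility only


def skewness_of_sample_means(n, n_repeats=5000):
    """Draw n_repeats samples of size n from Exp(1) and return the
    skewness of their means (0 would indicate a symmetric distribution)."""
    samples = rng.exponential(scale=1.0, size=(n_repeats, n))
    return stats.skew(samples.mean(axis=1))


skew_small = skewness_of_sample_means(n=2)   # means of very small samples
skew_large = skewness_of_sample_means(n=30)  # means of samples of size 30

print(f"skewness of sample means, n=2:  {skew_small:.2f}")
print(f"skewness of sample means, n=30: {skew_large:.2f}")
```

The individual observations remain skewed in both cases; only the distribution of the sample mean approaches normality.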
Parametric Test Assumptions
Parametric tests usually assume three things [2]:
Independence of cases: samples are independent observations.
Normality: sample data come from a normal distribution (or at least a symmetric one).
Homogeneity of variances: the samples come from populations with the same variance.
However, in real life these assumptions are rarely met exactly. Non-parametric tests have much more relaxed assumptions: they are either distribution-free or assume a specified distribution whose parameters are left unspecified.
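The normality and homogeneity assumptions listed above can be screened before choosing a test. The sketch below assumes SciPy is available and uses the Shapiro-Wilk test for normality and Levene's test for equal variances; the helper check_assumptions, the 0.05 threshold, and the simulated data are hypothetical choices made for illustration. Independence of cases usually follows from the study design and is not checked here.

```python
import numpy as np
from scipy import stats


def check_assumptions(sample_a, sample_b, alpha=0.05):
    """Screen two samples for normality and equal variances.

    A flag is True when the corresponding test does NOT reject the
    assumption at level alpha (which is not proof that it holds).
    """
    _, p_normal_a = stats.shapiro(sample_a)        # H0: sample_a is normal
    _, p_normal_b = stats.shapiro(sample_b)        # H0: sample_b is normal
    _, p_equal_var = stats.levene(sample_a, sample_b)  # H0: equal variances
    return {
        "normal_a": bool(p_normal_a > alpha),
        "normal_b": bool(p_normal_b > alpha),
        "equal_var": bool(p_equal_var > alpha),
    }


# Illustrative data: two simulated groups (not real measurements).
rng = np.random.default_rng(42)
group_a = rng.normal(loc=50.0, scale=5.0, size=40)
group_b = rng.normal(loc=52.0, scale=5.0, size=40)
print(check_assumptions(group_a, group_b))
```

If any flag is False, a non-parametric alternative (or Welch's t-test, when only the variances differ) may be the safer choice.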
Non-parametric Test Assumptions
To summarize, non-parametric tests can be applied to situations when [3]:
The data does not follow a known or assumed probability distribution.
The data consists of ordinal values or ranks.
There are outliers in the data.
The data has a limit of detection.
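The effect of outliers can be seen by running a parametric and a non-parametric test on the same data. The sketch below assumes SciPy is available; the two groups are made-up numbers in which every value of group_b is shifted upward relative to group_a, except that one value of group_a has been replaced by an extreme outlier.

```python
import numpy as np
from scipy import stats

# Made-up data: group_b is clearly shifted upward, but group_a
# contains a single extreme outlier (1000).
group_a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 1000], dtype=float)
group_b = np.arange(1, 11, dtype=float) + 5.5   # 6.5, 7.5, ..., 15.5

# Parametric: two-sample t-test compares means, so the outlier
# inflates both the mean and the variance of group_a.
t_result = stats.ttest_ind(group_a, group_b)

# Non-parametric: Mann-Whitney U works on ranks, so the outlier
# counts only as "the largest value", whatever its magnitude.
u_result = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

print(f"t-test p-value:         {t_result.pvalue:.3f}")
print(f"Mann-Whitney U p-value: {u_result.pvalue:.3f}")
```

With these numbers the t-test does not detect a difference at the 0.05 level, while the rank-based test does.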
Comparing parametric and non-parametric tests
Because of these relaxed assumptions, non-parametric tests are also referred to as distribution-free tests. Non-parametric tests are gaining popularity; some of the reasons are [4]:
1. The main reason is that non-parametric tests impose no strict requirements on the data.
2. The second reason is that we do not need to make assumptions about the population being analyzed.
3. Most of the available non-parametric tests are easy to apply and to understand, i.e. their complexity is low.
The next table details a comparison between parametric and non-parametric tests.
The following table can help you understand when and where you should use the parametric tests or their non-parametric counterparts and their advantages and disadvantages [1].
The next figure from [5] gives a clearer view of the methods available under parametric and non-parametric testing.
The next graphic from [2] provides a flowchart for selecting a parametric or non-parametric test.
Adapting the previous figure to provide a guide in terms of sections for Track 11 gives the following figure.
Reference [6] provides a detailed classification of non-parametric tests into three classes:
Goodness of Fit Tests: These tests are used to determine if sample data matches a hypothetical distribution. They check how well the observed data fit the expected distribution.
Example: Chi-Square Goodness of Fit Test: This test compares the observed frequencies of categories of a categorical variable with the expected frequencies. For instance, if a die is fair, the probability of each face (1 through 6) is 1/6. By rolling the die many times, you can use the Chi-Square Goodness of Fit Test to check if the observed frequencies of each face match the expected frequencies.
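The die example can be sketched with scipy.stats.chisquare. The observed counts below are invented for illustration (120 hypothetical rolls per die); under the fair-die hypothesis each face is expected 120/6 = 20 times.

```python
import numpy as np
from scipy import stats

n_rolls = 120
expected = np.full(6, n_rolls / 6)   # fair die: 20 rolls expected per face

# Hypothetical observed counts for faces 1..6 (each row sums to 120).
fair_looking = np.array([18, 22, 20, 19, 21, 20])
loaded_looking = np.array([5, 10, 15, 20, 25, 45])

# H0: the observed counts follow the expected (uniform) distribution.
fair_stat, fair_p = stats.chisquare(f_obs=fair_looking, f_exp=expected)
loaded_stat, loaded_p = stats.chisquare(f_obs=loaded_looking, f_exp=expected)

print(f"fair-looking die:   chi2 = {fair_stat:.2f}, p = {fair_p:.3f}")
print(f"loaded-looking die: chi2 = {loaded_stat:.2f}, p = {loaded_p:.3g}")
```

The statistic is the sum of (observed - expected)^2 / expected over the six faces, compared against a chi-square distribution with 6 - 1 = 5 degrees of freedom.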
Tests for Independence: These tests determine whether two variables are independent or not. They assess if the occurrence of one variable affects the occurrence of another variable.
Example: Chi-Square Test for Independence: This test is used to examine if there is a significant association between two categorical variables. For example, you might want to test if gender (male/female) is independent of voting preference (yes/no). The test will determine if the distribution of voting preference is different for males and females.
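A sketch of the gender versus voting preference example, assuming SciPy is available. The contingency table holds invented counts, and correction=False is an explicit choice here to disable Yates' continuity correction, which SciPy otherwise applies to 2x2 tables.

```python
import numpy as np
from scipy import stats

# Hypothetical counts from a single sample of 120 respondents.
# Rows: gender (male, female); columns: voting preference (yes, no).
observed = np.array([
    [45, 15],
    [20, 40],
])

# H0: gender and voting preference are independent.
stat, p_value, dof, expected = stats.chi2_contingency(observed, correction=False)

print(f"chi2 = {stat:.2f}, dof = {dof}, p = {p_value:.3g}")
print("expected counts under independence:")
print(expected)
```

The expected counts are computed from the row and column totals alone, so they show what the table would look like if the two variables were unrelated.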
Tests for Homogeneity: Tests for homogeneity compare the distribution of a categorical variable across different populations. They assess if different populations have the same distribution for a particular variable.
Example: Chi-Square Test for Homogeneity: This test is similar to the Chi-Square Test for Independence but is used when you have two or more groups and want to compare the distribution of a single categorical variable across these groups. For instance, you might want to test if the distribution of blood types (A, B, AB, O) is the same across different ethnic groups. This test will determine if there are significant differences in the distribution of blood types among these groups.
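The homogeneity test uses the same arithmetic as the independence test; what differs is the sampling design (separate samples per group rather than one sample cross-classified). The sketch below computes the statistic by hand with NumPy and compares it with scipy.stats.chi2_contingency. The blood-type counts and the three groups are invented for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical counts: one row per group (100 people sampled from each),
# one column per blood type (A, B, AB, O).
observed = np.array([
    [40, 10, 5, 45],
    [35, 15, 6, 44],
    [38, 12, 4, 46],
], dtype=float)

# Expected count for each cell: row total * column total / grand total.
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()

# Chi-square statistic and degrees of freedom (rows - 1) * (columns - 1).
manual_stat = ((observed - expected) ** 2 / expected).sum()
manual_dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
manual_p = stats.chi2.sf(manual_stat, manual_dof)

# Same test via SciPy, for comparison.
scipy_stat, scipy_p, scipy_dof, _ = stats.chi2_contingency(observed)

print(f"manual: chi2 = {manual_stat:.3f}, dof = {manual_dof}, p = {manual_p:.3f}")
print(f"scipy:  chi2 = {scipy_stat:.3f}, dof = {scipy_dof}, p = {scipy_p:.3f}")
```

A large p-value here means the data give no evidence that the blood-type distribution differs between the groups.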
Reference [3] provides a summary of the equations employed in some parametric and non-parametric tests.
References:
[1] https://www.analytixlabs.co.in/blog/parametric-and-non-parametric-test/
[2] https://towardsdatascience.com/non-parametric-tests-in-hypothesis-testing-138d585c3548
[3] https://www.analyticsvidhya.com/blog/2017/11/a-guide-to-conduct-analysis-using-non-parametric-tests/
[6] https://kindsonthegenius.com/blog/what-are-non-parametric-tests/