1. Concepts & Definitions
1.1. Defining statistical test of hypothesis
1.2. Numerical example of test of hypothesis for mean
1.3. Code for test of hypothesis for mean
1.4. Code for right-tailed test of hypothesis for mean
1.5. Code for left-tailed test of hypothesis for mean
1.6. Code for small-sample test of hypothesis for mean
1.7. P-Value and test of hypothesis
1.8. Statistical power and power analysis
1.9. Shapiro-Wilk for normality test
2. Problem & Solution
2.1. Shapiro-Wilk to verify CLT Simulator
Making a statistical decision always involves uncertainties, so the risks of making these errors are unavoidable in hypothesis testing [1].
All statistical hypothesis tests have a chance of making either of the following types of errors [1, 2]:
Type I Error: Incorrect rejection of a true null hypothesis or a false positive. The probability of making a Type I error is the significance level, or alpha (α).
Type II Error: Incorrect acceptance of a false null hypothesis or a false negative. The probability of making a Type II error is beta (β).
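The significance level α can be verified empirically: when the null hypothesis is true by construction, a test at α = 0.05 should reject it in roughly 5% of repeated experiments. A minimal simulation sketch (my own illustration, using SciPy's one-sample t-test; the sample size and number of trials are arbitrary choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha = 0.05      # significance level = probability of a Type I error
n_trials = 5000
rejections = 0

for _ in range(n_trials):
    # Draw from a population whose true mean IS 0, so H0: mu = 0 is true.
    sample = rng.normal(loc=0.0, scale=1.0, size=30)
    _, p_value = stats.ttest_1samp(sample, popmean=0.0)
    if p_value < alpha:
        rejections += 1   # a false positive (Type I error)

# The observed rejection rate should be close to alpha = 0.05.
print(f"Observed Type I error rate: {rejections / n_trials:.3f}")
```

Because every null hypothesis here is true, each rejection is a Type I error, and the long-run rejection rate converges to α.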
The next figure summarizes the relationship between the hypothesis test decision, the two types of error, and the probability of making each.
Statistical power, or the power of a hypothesis test, is the probability that the test correctly rejects the null hypothesis (1-β); that is, the probability of a true positive result. Power is only meaningful when the null hypothesis is actually false [4].
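To make the relation power = 1-β concrete, the power of a right-tailed one-sample z-test can be computed analytically: H0 is rejected when the z statistic exceeds the critical value, and power is the probability of that event under Ha. A sketch with illustrative parameter values of my own choosing (known σ, true mean under Ha):

```python
from scipy.stats import norm

alpha = 0.05          # Type I error rate
mu0, mu1 = 0.0, 0.5   # mean under H0, and true mean under Ha (illustrative values)
sigma, n = 1.0, 30    # known population sd and sample size (assumed for a z-test)

# Critical value: reject H0 when the z statistic exceeds z_crit.
z_crit = norm.ppf(1 - alpha)

# Under Ha, the z statistic is shifted by the standardized effect times sqrt(n).
shift = (mu1 - mu0) / sigma * n ** 0.5
power = 1 - norm.cdf(z_crit - shift)   # P(reject H0 | Ha true) = 1 - beta
beta = 1 - power                       # Type II error rate

print(f"power = {power:.3f}, beta = {beta:.3f}")
```

Increasing n, increasing the effect size, or raising α all enlarge the shift past the critical value, and the computed power rises accordingly.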
Another possible interpretation of an alternative hypothesis (Ha) is that there is a difference in the mean score of the tests. In other words, each group's mean score belongs to a distinct sampling distribution. The next figures help to illustrate this aspect [3].
The blue curve is the distribution of the data if the alternative hypothesis (Ha) is true.
The rejection region (α, the red area) and the dark blue overlap (β) are mutually exclusive: when a test statistic falls inside the rejection region, the null hypothesis is rejected, so a Type II error cannot occur. Therefore, as α increases, β (the dark blue overlap) must shrink [3].
The higher the statistical power, the lower the probability of a Type II error; in other words, there is a higher chance of detecting an effect when there actually is an effect to measure [5].
In practice, experiments with low statistical power can lead to wrong conclusions and eventually have a negative impact on decision-making. Statistical power is commonly set at 80% to ensure that tests or experiments yield accurate and reliable results [5].
Power analysis is built from the following variables [6]:
Effect Size: The quantified magnitude of a result present in the population. Effect size is calculated using a specific statistical measure, such as Pearson’s correlation coefficient for the relationship between variables, or Cohen’s d [7].
Sample Size: The number of observations in the sample.
Significance: The significance level used in the statistical test, e.g. alpha. Often set to 5% or 0.05.
Statistical Power: The probability of rejecting the null hypothesis when the alternative hypothesis is true, i.e. of detecting a true effect.
A power analysis involves estimating one of these four parameters given values for three other parameters. This is a powerful tool in both the design and the analysis of experiments that we wish to interpret using statistical hypothesis tests.
As a beginner, we can start with sensible defaults for some parameters, such as a significance level of 0.05 and a power level of 0.80. We can then estimate a desirable minimum effect size, specific to the experiment being performed. A power analysis can then be used to estimate the minimum sample size required.
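This workflow can be sketched with statsmodels: fixing the significance level, desired power, and an assumed effect size (Cohen's d = 0.8 here is an illustrative choice, not a recommendation), `solve_power` estimates the fourth quantity, the minimum sample size per group for a two-sample t-test:

```python
from statsmodels.stats.power import TTestIndPower

effect_size = 0.8   # Cohen's d -- assumed minimum effect of interest
alpha = 0.05        # significance level
power = 0.80        # desired statistical power (1 - beta)

analysis = TTestIndPower()
# Leaving nobs1 unspecified tells solve_power to estimate it
# from the three parameters supplied.
n = analysis.solve_power(effect_size=effect_size, alpha=alpha, power=power)
print(f"Minimum sample size per group: {n:.1f}")
```

The same call can solve for any one of the four parameters by leaving exactly that argument unspecified, which is what makes power analysis useful both before an experiment (sizing it) and after (assessing what it could have detected).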
A detailed description of how to apply this on A/B testing can be found at [8, 9, 10].
References:
[1] https://www.scribbr.com/statistics/type-i-and-type-ii-errors/
[2] https://www.geeksforgeeks.org/introduction-to-power-analysis-in-python/
[3] https://medium.com/analytics-vidhya/statistical-power-c6356b63b75
[4] https://machinelearningmastery.com/statistical-power-and-power-analysis-in-python/
[5] https://www.nickmccullum.com/power-analysis-in-python/
[6] https://medium.com/data-science-community-srm/statistical-power-and-power-analysis-98cf4e10b064
[7] https://en.wikipedia.org/wiki/Effect_size#Cohen%27s_d
[8] https://blog.statsig.com/calculating-sample-sizes-for-a-b-tests-7854d56c2646
[9] https://blog.statsig.com/you-dont-need-large-sample-sizes-to-run-a-b-tests-6044823e9992
[10] https://www.geeksforgeeks.org/introduction-to-power-analysis-in-python/