1. Concepts & Definitions
1.1. Defining statistical test of hypothesis
1.2. Numerical example of test of hypothesis for mean
1.3. Code for test of hypothesis for mean
1.4. Code for right tailed test of hypothesis for mean
1.5. Code for left tailed test of hypothesis for mean
1.6. Code for small sample hypothesis for mean
1.7. P-Value and test of hypothesis
1.8. Statistical power and power analysis
1.9. Shapiro Wilk for normality test
2. Problem & Solution
2.1. Shapiro Wilk to verify CLT Simulator
Statistics is the science of analyzing large amounts of data. In the real world, it is nearly impossible to compute statistics for an entire population, yet the data still needs interpretation to draw meaningful conclusions. Hence, we take random samples from the population, derive statistical measures (e.g. mean, standard deviation, variance), and draw conclusions about relationships from the data collected [1].
Data can be interpreted by assuming a specific outcome and using statistical methods to reject, or fail to reject, that assumption. The assumption is called a hypothesis, and the statistical procedure used for this purpose is called hypothesis testing.
In statistics, a hypothesis is a statement about a population that we want to verify based on information contained in the sample data.
Hypothesis testing quantifies an observation or outcome of an experiment under a given assumption. The result of the test lets us judge whether the observed data are consistent with the assumption. In other words, it indicates whether or not the hypothesis should be rejected given the observation made.
The building blocks of hypothesis testing are [2]:
Create a hypothesis statement.
Formulate Null and Alternate Hypotheses.
Choose a statistical test: one-tailed or two-tailed.
Choose a probability distribution based on the sample size.
Compute the test statistic.
Obtain critical values and make a decision.
A good hypothesis statement should [3]:
Include an “if” and “then” statement.
Include both the independent and dependent variables.
Be testable by experiment, survey, or other scientifically sound technique.
Be based on information in prior research (either yours or someone else’s).
Have design criteria (for engineering or programming projects).
Your statement will look like this:
“If I…(do this to an independent variable)….then (this will happen to the dependent variable).”
For example:
If I (decrease the amount of water given to herbs) then (the herbs will increase in size).
If I (give patients counseling in addition to medication) then (their overall depression scale will decrease).
If I (give exams at noon instead of 7) then (student test scores will improve).
If I (look in this certain location) then (I am more likely to find new species).
After developing your initial research hypothesis (the prediction that you want to investigate), it is important to restate it as a null (Ho) and alternate (Ha) hypothesis so that you can test it mathematically [4].
The alternate hypothesis is usually your initial hypothesis that predicts a relationship between variables. The null hypothesis is a prediction of no relationship between the variables you are interested in.
For example, imagine the adoption of a new procedure or machine that employs artificial intelligence (AI) techniques for inspection. The following question will emerge: is this new configuration more efficient than the previous one? The next figure summarizes this issue.
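One possible formalization of this question is sketched below. It assumes that efficiency is measured by a population mean and that a larger mean is better; the symbols \mu_{\text{new}} and \mu_{\text{old}} are introduced here for illustration only.

```latex
\begin{aligned}
H_o &: \mu_{\text{new}} = \mu_{\text{old}} \quad \text{(no difference in efficiency)} \\
H_a &: \mu_{\text{new}} > \mu_{\text{old}} \quad \text{(the new configuration is more efficient)}
\end{aligned}
```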
Depending on how the hypothesis test is formulated, the null and alternative hypotheses can assume three possible configurations concerning their respective signs. For each of the three cases, the region indicating evidence favoring the null or alternative hypothesis will have a different corresponding configuration. These three cases are summarized in the following table.
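As a minimal sketch of the three configurations, the snippet below computes the boundaries of the rejection region for a two-tailed, right-tailed, and left-tailed test. It assumes a standard normal test statistic; the function name critical_region and the tail labels are hypothetical, chosen here for illustration.

```python
from scipy.stats import norm


def critical_region(alpha, tail):
    """Return (lower, upper) z limits of the rejection region.

    tail = "two"   -> Ha: mu != mu0, reject if z_obs < lower or z_obs > upper
    tail = "right" -> Ha: mu >  mu0, reject if z_obs > upper
    tail = "left"  -> Ha: mu <  mu0, reject if z_obs < lower
    A limit of None means the test has no rejection region on that side.
    """
    if tail == "two":
        z = float(norm.ppf(1 - alpha / 2))  # alpha is split between both tails
        return (-z, z)
    if tail == "right":
        return (None, float(norm.ppf(1 - alpha)))  # all of alpha in the upper tail
    if tail == "left":
        return (float(norm.ppf(alpha)), None)  # all of alpha in the lower tail
    raise ValueError("tail must be 'two', 'right' or 'left'")


for tail in ("two", "right", "left"):
    print(tail, critical_region(0.05, tail))
```

At a significance level of 0.05, the two-tailed limits are approximately ±1.96, while the one-tailed limit is approximately 1.645 in absolute value.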
To draw the region for each hypothesis case (as done in the 'Graphic' column), the developments made in Track 06 - Section 2.3 are useful. The next section will show how to adapt and employ these developments.
The sample size determines the choice between the normal and Student's T distributions to compute the critical regions. The next figure compares Student's T distributions for different sample sizes with the normal distribution.
More details on the code developed to build this figure can be found in Track 07 - Section 1.7.
Depending on the distribution and the significance level, it is possible to determine Zcrit or Tcrit, the limits of the critical region. A useful analogy is the definition of confidence interval limits, as shown in the next figure.
The procedure to determine the confidence interval limits for a given confidence level is detailed in Track 07 - Section 1.5.
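The effect of the sample size can also be checked numerically. The sketch below compares two-tailed critical values of the Student's T distribution with the normal one. The significance level of 0.05 and the sample sizes are illustrative choices, not values from the original figure.

```python
from scipy.stats import norm, t

alpha = 0.05  # illustrative significance level (assumption)

# Two-tailed critical value of the standard normal distribution.
z_crit = float(norm.ppf(1 - alpha / 2))

# Two-tailed critical values of Student's T with n - 1 degrees of freedom.
t_crits = {n: float(t.ppf(1 - alpha / 2, df=n - 1)) for n in (5, 10, 30, 100)}

for n, t_crit in t_crits.items():
    print(f"n={n:>3}: Tcrit={t_crit:.3f}  Zcrit={z_crit:.3f}")
```

The T critical value is larger than the normal one for small samples and approaches it as the sample size grows, which is why the normal distribution is commonly used for large samples.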
Use the observed data to determine Zobs or Tobs. Then compare Zobs/Tobs with Zcrit/Tcrit to determine whether the observed statistic is in the critical region (reject Ho) or not (do not reject Ho), and make a decision. The next figure gives an example of a Zobs that falls in a critical region, which implies that the sample data provide evidence to reject the null hypothesis.
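A minimal sketch of this decision step for a two-tailed test of the mean follows. It assumes a known population standard deviation, so the normal distribution applies. The function name z_test_mean and all numeric values are hypothetical, chosen only for illustration.

```python
import math

from scipy.stats import norm


def z_test_mean(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed z test for a mean with known population sigma.

    Returns (z_obs, z_crit, reject), where reject is True when z_obs
    falls in the critical region.
    """
    z_obs = (sample_mean - mu0) / (sigma / math.sqrt(n))  # standardized sample mean
    z_crit = float(norm.ppf(1 - alpha / 2))               # limit of the critical region
    reject = bool(abs(z_obs) > z_crit)
    return z_obs, z_crit, reject


# Hypothetical data: Ho: mu = 50, sample of 36 observations with mean 52.
z_obs, z_crit, reject = z_test_mean(sample_mean=52.0, mu0=50.0, sigma=5.0, n=36)
print(f"Zobs={z_obs:.2f}, Zcrit={z_crit:.2f}, reject Ho: {reject}")
```

With these illustrative numbers, Zobs is 2.4, which exceeds Zcrit of about 1.96, so Ho would be rejected at the 0.05 level.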
The developments made in Track 07 - Section 1.6 cover how to convert values that follow a non-standard normal distribution to the standard normal, and how to compute the lower and upper limits of a confidence interval, which correspond to the critical values that define the critical regions.
The next figure summarizes all the hypothesis test steps described above.
The complete code is available at the following link:
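The relation between standardized critical values and limits in the original units can be sketched as follows. The values of mu0, sigma, n, and alpha are hypothetical, and the helper standardize is introduced here for illustration; it is not the code from Track 07.

```python
import math

from scipy.stats import norm

mu0, sigma, n, alpha = 50.0, 5.0, 36, 0.05  # hypothetical values (assumption)

std_error = sigma / math.sqrt(n)          # standard deviation of the sample mean
z_crit = float(norm.ppf(1 - alpha / 2))   # two-tailed critical value

# Limits of the non-rejection interval expressed in the units of the data.
lower = mu0 - z_crit * std_error
upper = mu0 + z_crit * std_error


def standardize(sample_mean):
    """Convert a sample mean to the standard normal scale under Ho."""
    return (sample_mean - mu0) / std_error


print(f"lower={lower:.3f}, upper={upper:.3f}")
print(f"standardized upper limit={standardize(upper):.3f}")
```

Standardizing the upper limit recovers Zcrit, which shows that comparing the sample mean with the limits in the original units is equivalent to comparing Zobs with Zcrit.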
https://colab.research.google.com/drive/1bxq2WrqKRTOEZT4Q0pPDbb_526txzOAE?usp=sharing
References:
[1] https://www.analyticsvidhya.com/blog/2021/07/a-simple-guide-to-hypothesis-testing-for-dummies/
[2] https://towardsdatascience.com/understanding-hypothesis-testing-65f9b3e9ab1f
[3] https://www.statisticshowto.com/probability-and-statistics/hypothesis-testing/
[4] https://www.scribbr.com/statistics/hypothesis-testing/