For ML training, sample collection is very important. But how do you know whether your sample is representative of the whole population? A representative sample is one drawn without bias from the population of interest. If you want to know how to validate sample data, this document will help.
Hypothesis testing is the process of drawing inferences or conclusions about the overall population by conducting statistical tests on a sample.
Null hypothesis
The null hypothesis is the one to be tested. For example: a person (say, Amit) is innocent (did not commit the crime).
Alternate hypothesis
The alternate hypothesis is the complement of the null hypothesis. In the example above, the alternate hypothesis is that Amit is not innocent.
Set the Hypothesis
Set the Significance Level, Criteria for a decision
Compute the test statistics
Make a decision
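The four steps above can be sketched as a one-sample z-test in Python. This is only a minimal illustration: the sample values, the hypothesised mean `mu0`, the assumed-known population standard deviation `sigma`, and the function name `run_z_test` are all invented for the example.

```python
from math import sqrt
from statistics import NormalDist, mean


def run_z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided one-sample z-test, written to mirror the four steps."""
    # Step 1: set the hypotheses.
    #   H0: the population mean equals mu0.  H1: it does not.
    # Step 2: set the significance level and the decision criterion.
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 3: compute the test statistic.
    z = (mean(sample) - mu0) / (sigma / sqrt(len(sample)))
    # Step 4: make a decision.
    reject = abs(z) > z_critical
    return z, z_critical, reject


# Hypothetical measurements; sigma = 5 is assumed to be known in advance.
sample = [52, 55, 49, 58, 54, 53, 57, 56, 51, 55]
z, z_critical, reject = run_z_test(sample, mu0=50, sigma=5)
print(f"z = {z:.3f}, critical value = {z_critical:.3f}, reject H0: {reject}")
```

With these made-up numbers the statistic exceeds the critical value, so the null hypothesis would be rejected at the 5% level.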
A Type I error is the rejection of a null hypothesis that is true in reality. For example, an innocent person (Amit) is convicted.
A Type II error is the non-rejection of a false null hypothesis. For example, a guilty person is not convicted.
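Both kinds of error can be observed in a small simulation. The sketch below assumes a two-sided z-test of "the mean is 0" on normally distributed data with a known standard deviation of 1; the function name `rejection_rate`, the sample size, the number of trials, and the shifted mean of 0.5 are illustrative choices, not values from the text.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def rejection_rate(true_mean, trials=2000, n=30, alpha=0.05, seed=0):
    """Fraction of simulated samples for which H0 (mean == 0) is rejected."""
    rng = random.Random(seed)
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        z = mean(sample) / (1.0 / sqrt(n))  # H0 claims the mean is 0
        if abs(z) > z_critical:
            rejections += 1
    return rejections / trials


# H0 is true here, so every rejection is a Type I error (rate close to alpha).
type_i = rejection_rate(true_mean=0.0)
# H0 is false here, so every non-rejection is a Type II error.
type_ii = 1 - rejection_rate(true_mean=0.5)
print(f"Type I rate: {type_i:.3f}, Type II rate: {type_ii:.3f}")
```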
The z-test and t-test are used to compare quantitative data to check whether they came from the same population.
The z-statistic is a measure of how much an observed statistic differs from an expected statistic put forward by the null hypothesis.
Here, the sigma (σ) used is the population standard deviation, not the standard deviation computed from the observed data; the standard error of the mean is derived from it as σ/√n.
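For reference, the usual one-sample form of the z-statistic can be written as follows. The symbols are assumptions introduced here, since the text does not define them: \bar{x} is the sample mean, \mu_0 is the mean stated by the null hypothesis, \sigma is the population standard deviation, and n is the sample size.

```latex
z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}
```

The denominator \sigma / \sqrt{n} is the standard error of the sample mean.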
The t-test is a modified version of the z-test in which the mean and standard deviation are computed from the sample. So, the t-test doesn't need the population variance as input.
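A minimal sketch of the one-sample t-statistic shows that only the sample itself and the hypothesised mean are needed. The data, `mu0`, and the function name `t_statistic` are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(sample, mu0):
    """One-sample t-statistic and its degrees of freedom."""
    n = len(sample)
    s = stdev(sample)  # sample standard deviation (n - 1 in the denominator)
    t = (mean(sample) - mu0) / (s / sqrt(n))
    return t, n - 1


sample = [52, 55, 49, 58, 54, 53, 57, 56, 51, 55]  # hypothetical measurements
t, dof = t_statistic(sample, mu0=50)
print(f"t = {t:.3f} with {dof} degrees of freedom")
```

The resulting statistic is then compared with a critical value from the t-distribution with the same degrees of freedom, for example from a t-table.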
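The job of the p-value (refer to the diagram above) is to decide whether we should reject our null hypothesis or fail to reject it. The p-value is the probability, assuming the null hypothesis is true, of observing a result at least as extreme as the one obtained; if it is smaller than the significance level, the null hypothesis is rejected.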
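As an illustration, a two-sided p-value can be derived from a z-statistic with the standard normal distribution. The function name `two_sided_p_value`, the example z values, and the 5% significance level are assumptions made for this sketch.

```python
from statistics import NormalDist


def two_sided_p_value(z):
    """Probability, under H0, of a z-statistic at least as extreme as z."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


alpha = 0.05  # significance level chosen before looking at the data
for z in (0.5, 1.96, 3.0):
    p = two_sided_p_value(z)
    decision = "reject H0" if p < alpha else "fail to reject H0"
    print(f"z = {z:4.2f}  p = {p:.4f}  ->  {decision}")
```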
The chi-square test is used to compare categorical variables from a single population.
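A chi-square test of independence between two categorical variables can be sketched for the simplest case, a 2x2 table of counts. The counts and the function name `chi_square_2x2` are hypothetical, and no continuity correction is applied. The p-value shortcut relies on the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable, so it only holds for 2x2 tables.

```python
from math import sqrt
from statistics import NormalDist


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    # One degree of freedom: P(chi2_1 > x) = 2 * (1 - Phi(sqrt(x))).
    p_value = 2 * (1 - NormalDist().cdf(sqrt(chi2)))
    return chi2, p_value


# Hypothetical counts: rows are two groups, columns are yes/no answers.
table = [[30, 20], [20, 30]]
chi2, p_value = chi_square_2x2(table)
print(f"chi-square = {chi2:.3f}, p = {p_value:.4f}")
```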
The F-test for linear regression tests whether any of the independent variables in a multiple linear regression model are significant.
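The sketch below computes the overall F-statistic for the simplest case, a regression with one independent variable, by comparing the fitted model with an intercept-only model. The data and the function name `regression_f_statistic` are made up for illustration; with more independent variables the same ratio is used with a larger `p`.

```python
from statistics import mean


def regression_f_statistic(x, y):
    """Overall F-statistic for the simple linear regression y = a + b * x."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares of the fitted model.
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # Total sum of squares, i.e. the RSS of the intercept-only model.
    tss = sum((yi - y_bar) ** 2 for yi in y)
    p = 1  # number of independent variables
    f = ((tss - rss) / p) / (rss / (n - p - 1))
    return f, (p, n - p - 1)


x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
f, dof = regression_f_statistic(x, y)
print(f"F = {f:.3f} with degrees of freedom {dof}")
```

The statistic is compared with a critical value of the F-distribution with the returned degrees of freedom; a large F suggests that the independent variable explains a significant share of the variation.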
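According to the central limit theorem, the distribution of the sample mean approaches a normal distribution as the sample size grows, whatever the distribution of the population (provided its variance is finite). The t-test relies on this property. The z-test uses the relation this theorem gives between the population variance and the variance of the sample mean: the latter is the population variance divided by the sample size.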
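The variance relation can be checked with a short simulation. The sketch assumes an exponential population with mean 1 and standard deviation 1, chosen only because it is clearly not normal; the function name `sample_mean_spread`, the sample size, and the number of trials are arbitrary.

```python
import random
from math import sqrt
from statistics import mean, stdev


def sample_mean_spread(n, trials=5000, seed=0):
    """Centre and spread of the means of many samples of size n."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        # Population: exponential with rate 1 (mean 1, standard deviation 1).
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(mean(sample))
    return mean(means), stdev(means)


n = 40
centre, spread = sample_mean_spread(n)
print(f"mean of sample means: {centre:.3f} (population mean is 1)")
print(f"std of sample means:  {spread:.3f} (sigma / sqrt(n) = {1 / sqrt(n):.3f})")
```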
A t-test or z-test can be used to check whether two samples are drawn from different populations.
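For two samples, one common variant is Welch's t-test, which does not assume equal variances in the two populations. Choosing Welch's form is an assumption of this sketch rather than something the text prescribes, and the data and the function name `welch_t` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's two-sample t-statistic and approximate degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard error contributed by each sample mean.
    va = variance(sample_a) / na
    vb = variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    dof = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, dof


a = [20, 22, 19, 24, 25]  # hypothetical sample A
b = [28, 30, 27, 31, 29]  # hypothetical sample B
t, dof = welch_t(a, b)
print(f"t = {t:.3f}, degrees of freedom = {dof:.2f}")
```

A statistic far from zero, relative to the t-distribution with these degrees of freedom, is evidence that the two samples come from populations with different means.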
The F-test can be used to improve your linear regression model by making it more complex, i.e. by adding more regression variables to it and checking whether the added variables significantly improve the fit.
Before launching a new feature, a hypothesis test can be used to estimate its possible impact on customers. For example, Netflix can analyse whether a new feature increases user view time.
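A feature experiment of this kind could be analysed as a one-sided two-sample z-test on summary statistics, which is reasonable when both groups are large. Everything in the sketch is hypothetical: the `GroupSummary` class, the function name, and the view-time figures are invented and do not describe any real experiment.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import NormalDist


@dataclass
class GroupSummary:
    mean: float  # average view time in minutes
    std: float   # sample standard deviation in minutes
    n: int       # number of users in the group


def one_sided_z_test(control, treatment):
    """Test H0: the feature does not increase mean view time."""
    se = sqrt(control.std ** 2 / control.n + treatment.std ** 2 / treatment.n)
    z = (treatment.mean - control.mean) / se
    p_value = 1 - NormalDist().cdf(z)  # only an increase counts as evidence
    return z, p_value


control = GroupSummary(mean=60.0, std=20.0, n=1000)    # without the feature
treatment = GroupSummary(mean=62.5, std=21.0, n=1000)  # with the feature
z, p_value = one_sided_z_test(control, treatment)
print(f"z = {z:.3f}, one-sided p = {p_value:.4f}")
```

The test is one-sided because the question is specifically whether view time increases, not whether it merely changes.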
The aim of hypothesis testing is to reject the null hypothesis. If it can't be rejected, that doesn't mean the null hypothesis has been proven true; it only means the sample does not provide enough evidence against it.
The t-test and z-test can be used only if the population follows a normal distribution, or if the sample is large enough for the central limit theorem to apply. Below are other criteria