Hypothesis Testing

In this exercise, you will practice using spreadsheet formula to do statistical tests on sets of data. For the first part of this exercise, you will use a synthetic dataset so that you can manipulate the parameters to see how this affects the output of the statistical tests. Note that the test scores are randomly generated based on the parameters, and will constantly recalculate.

Please make a copy of the following spreadsheet in your own Google Drive:

3b-1 Hypothesis testing

This spreadsheet contains the following sections:

1. Columns A to M contain simulated test scores for two cohorts of students for a pre-test and a post-test.

2. Rows 1 to 15 in column O to R contain the parameters used to simulate the test scores. You can manipulate the test scores by changing the parameters in the yellow boxes.

3. Two histograms on the right provide visualisation of the data.

4. Space is provided for you to do your working

Investigate the questions below using the spreadsheet.

Q1: Student X is a member of Cohort 1. Evaluate whether Student X has a significantly higher pre-test score for Cohort 1.

Because we are comparing one student to a cohort, we should use a Z-test. The Z-test compares a single datapoint against the population.

There are two metrics that you get from a Z-test:

The test statistics Z. This is equal to the number of significant differences away from the mean.
For example, for a class with a mean score of 50 and a standard deviation of 10, a score of 65 is 1.5 standard deviations from the mean, i.e. Z=1.5
The p-value. This is the probability that a randomly-selected value from the population will be at least this far from the mean.

For human data, we are able to calculate the p-value because we assume the data follows a known distribution (usually modelled as a normal distribution). For example, a Gaussian/normal curve is known to have the following distribution:

Image source: https://en.wikipedia.org/wiki/File:Standard_deviation_diagram.svg

Offhand, this model tells us that 95% of the data is less than 2 standard deviations away.

Intuitively this can be understood as that there is only a 5% chance of finding data at least 2 standard deviations away, so a data point two standard deviations away from the mean is likely to be an outlier or from a different dataset.

Implementation in a spreadsheet

First, we want to calculate the test statistic Z to get an intuitive sense for the data.

You will need to do the following steps:

1. Calculate the mean of the data (click to show solution)

The formula for mean is:

=average(...)

The data we want to understand is cohort 1's pre-test score. Google Sheets is smart enough to ignore non-numerical cells, so we can simply use

=average(C:C)

2. Calculate the standard deviation of the data (click to show solution)

The formula for standard deviation is:

=stdev(...)

The data we want to understand is cohort 1's pre-test score. Google Sheets is smart enough to ignore non-numerical cells, so we can simply use

=stdev(C:C)

3. Put these together with Student X's pre-test score (click to show solution)

We want to know how many standard deviations away from the mean Student X's pre-test score is.

Student X's pre-test score is in cell P15, so the appropriate formula to full into cell P21 is:

=( P16 - average(C:C) ) / stdev(C:C)

Note that since we took Student X's score minus the average score, we are calculating how much higher student X is. In this case it does not really matter whether we take Student X minus cohort or vice versa.

Based on the number of standard deviations, do you think Student X is likely to be significantly different from Cohort 1?

To quantitatively answer that question, we need to perform a Z-test.

The syntax for the formula for Z-test is =ZTEST(data, value, standard deviation):

The first argument is the data we want to test. In this case, this will be Student X's score in cell P15.
The second argument is the population mean.
The third argument is the population standard deviation.

Based on the p-value obtained, evaluate the following hypothesis:

Null Hypothesis H0: Student X's pre-test score is not different from the cohort average.

Alternate Hypothesis H1: Student X's pre-test score is different from the cohort average.

Now, try manipulating the parameters in the yellow boxes and see how this affects the test. What happens if...

The cohort standard deviation is bigger or smaller... (click to show)

The bigger the standard deviation, the broader the spread of data. This can be seen on the histograms.

If the spread is larger, but the value of (Student X - Cohort Mean) remains unchanged, then Z becomes smaller and the p-value is larger.

Intuitively, we can understand this to mean that the larger the spread, the more likely it is to find values away from the mean.

The cohort mean is bigger or smaller... (click to show)

This set of data is set up such that Student X is always a specified value away from the cohort mean (in cell P5).

You will notice that the value of the mean does not affect the calculation of Z or p. All that matters is the value of (Student X - mean).

Q2: Did cohort 1 and 2 perform differently on the pre-test?

Because we are comparing two sets of data against each other, we should use a T-test. Just like the Z-test, we eventually obtain a p-value. In this case, the p-value describes the probability that the two datasets are different.

Implementation in a spreadsheet

The formula for T-test is fairly straightforward:

=TTEST(Dataset 1, Dataset 2, No. of 'tails', 'type')

The first two arguments are the two datasets. In this case, the two pre-test scores are in columns C and J. The order does not matter.
For the third argument, testing is always either '1-tailed' or '2-tailed'. This refers to the number of 'tails' on the bell-curve that we consider:
- If we are only testing if one dataset is 'greater' or 'smaller', we only consider 1 side, so it is 1-tailed.
- If we are testing whether one dataset is 'different', we consider both sides, so it is 2-tailed.
For the last argument, the possible values are:
- 1: Paired test: Each value in dataset 1 corresponds to a value in dataset 2. This is usually used for relating two variables describing the same group of participants, e.g. pre test and post test scores.
- 2 and 3: Two-sample test. 2 and 3 are equal and unequal variance respectively. We usually use 3 by default, unless we have good reason to justify that the two datasets have the same variance.

For this question, we are testing whether the two groups have 'different' scores on the same variable, therefore... (click to show answer)

The appropriate equation is:

=ttest( C:C, J:J, 2, 3 )

since we use a 2 tailed test and are using test type 3.

Based on the p-value obtained, evaluate the following hypothesis:

Null Hypothesis H0: The pre-test score of Cohort 1 is not different from Cohort 2.

Alternate Hypothesis H1: The pre-test score of Cohort 1 is different from Cohort 2.

Now, try manipulating the parameters in the yellow boxes and see how this affects the test.

The cohort mean and standard deviation are bigger or smaller... (click to show)

You should expect similar results as the earlier z-test!

The cohort sizes are bigger or smaller... (click to show)

The t-test uses a Standard Error instead of a Standard Distribution. Generally speaking, SE = SD/√n where n is the sample size.

As the cohort size increases, for the same distribution, we have a smaller Standard Error. You will notice that as the cohort size increases, it has an effect on the p-value.

We can think of it this way: The Standard Error is our metric for how sure we are that the dataset's mean is close to the 'real' mean of the population. Therefore, smaller the standard error, the more confident we are. Intuitively, the more datapoints you have following the same pattern, the more confidence you have in the pattern. Therefore, something that might only be a plausible possible outlier for a small dataset could be justified as a clear definite outlier as you collect more data.

Q3: Did cohort 1 perform significantly better than cohort 2 on the pre-test?

(Click to show answer)

You should have a similar solution as the previous question, but instead of a 2-tailed test we are now using a 1-tailed test:

=ttest( C:C, J:J, 1, 3 )

Q4: Has either cohort shown a significant change between pre and post?

(Click to show answer)

Again, we use a T-test, but since we are looking at two scores for the same person, we should use a paired t-test.

For Cohort 1, the formula is

=ttest( C:C, E:E, 2, 1 )

since we have a 2-tailed test and perform test type 1.

Similarly, for Cohort 2, we have

=ttest( H:H, J:J, 2, 1 )

Q5: Has either cohort shown a significant improvement between pre and post?

(Click to show answer)

Your solution should be the same as earlier, but now performing a 1-tailed test instead of a 2-tailed test.

Q6: Is there a significant difference between the change in pre-test and post-test scores for cohort 1 vs cohort 2?

(Click to show answer)

To perform this evaluation, you should use columns F and M to calculate the change in test scores for each student, then perform a t-test to compare these two columns.

Bonus task: Plot a scatterplot of post-test scores vs pre-test scores with both cohorts on the same set of axes. How would you interpret this data? Can you relate the shape and spread of the data to the parameters used to generate the data?

Click here to go to the next section (Precision, Accuracy, Reliability and Validity)

Page updated

Google Sites

Report abuse