Using Averages

When two data sets are independent but have averages that are close together, their distributions overlap. The figure shows a background well whose data have an average of 100 and an SD of 5, together with a compliance well whose data have an average of 110 and an SD of 5. The two wells are statistically different, but the distributions overlap considerably. This overlap is a gray zone in which we cannot tell whether a data point came from one well or the other. In this example, the gray zone runs from about 95 to 115 (a range of 20, which is 4 SDs). If we take the same two data sets, average the data 4 points at a time, and use the averages as our models, the two populations separate much more clearly, because the SD of a 4-point average is half the SD of the individual points (5/√4 = 2.5). This is what is done in the second figure. The gray zone now runs from about 103 to 108 (a range of 5).
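The gray-zone figures above can be reproduced with a short calculation. The sketch below assumes that each population is taken to span its average plus or minus 3 SDs; the text does not state that rule, but it is consistent with the quoted 95 to 115 and 103 to 108 ranges. The function name gray_zone and its parameters are illustrative only, and the function assumes the compliance average is the higher of the two.

```python
import math

def gray_zone(bg_mean, comp_mean, sd, n=1, k=3.0):
    """Return (low, high) of the overlap between two wells, or None.

    Each well is assumed to span mean +/- k * sd / sqrt(n), where n is the
    number of points averaged together. k = 3 is an assumption, and
    comp_mean is assumed to be at least bg_mean.
    """
    sd_of_average = sd / math.sqrt(n)      # SD shrinks with the square root of n
    low = comp_mean - k * sd_of_average    # lower edge of the compliance well
    high = bg_mean + k * sd_of_average     # upper edge of the background well
    return (low, high) if low < high else None

print(gray_zone(100, 110, 5, n=1))  # (95.0, 115.0)
print(gray_zone(100, 110, 5, n=4))  # (102.5, 107.5)
```

With individual points the overlap is 20 units wide; with 4-point averages it is 5 units wide, which matches the approximate 103 to 108 range read from the second figure.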

Advantages

  • Fewer false positives and negatives

  • Fewer issues with data points that fall at the tails of the data sets

  • Fewer issues with seasonal changes

  • Greater power to the statistical plan because we are doing ¼ as many statistical tests (see Increase the Statistical Power — Reduce the number of Tests)

Disadvantages

  • We need four times as much data in the background data set. Note: you could instead average only two points at a time, in which case you would need only twice as much data

  • Longer period between statistical reviews of the data
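The trade-off between the amount of data needed and the size of the gray zone can be tabulated for different averaging sizes. This is a rough sketch using the example wells from this section (SD of 5, averages 10 units apart) and the same assumed plus or minus 3 SD rule for the edges of each population; the function name averaging_tradeoff and the dictionary keys are hypothetical.

```python
import math

def averaging_tradeoff(sd, mean_gap, n, k=3.0):
    """Summarize the cost and benefit of averaging n points at a time.

    Assumes both wells share the same SD and that each population spans
    its mean +/- k * sd / sqrt(n) (k = 3 is an assumption).
    """
    sd_of_average = sd / math.sqrt(n)
    # Overlap width: two half-widths of k SDs minus the gap between the means.
    width = max(0.0, 2 * k * sd_of_average - mean_gap)
    return {
        "points_averaged": n,
        "background_data_multiple": n,  # n times as much background data
        "sd_of_average": sd_of_average,
        "gray_zone_width": width,
    }

for n in (1, 2, 4):
    row = averaging_tradeoff(sd=5, mean_gap=10, n=n)
    print(row["points_averaged"], row["background_data_multiple"],
          round(row["sd_of_average"], 2), round(row["gray_zone_width"], 2))
```

Under these assumptions, averaging two points at a time sits between the two cases discussed above: it needs twice the background data and narrows the gray zone, though not as much as 4-point averaging does.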