S4 Swamping vs Balancing

There is a very common and natural misconception about the LLN that must be cleared up. Understanding this distinction separates professional statisticians from amateurs. The natural mistake is to think that the LLN works by BALANCING:

BALANCING: The average of the observed random variables has to converge to EX. Suppose it is ABOVE EX; then it has to go down in the long run. Suppose it is BELOW EX; then it has to go up in the long run. This means that when it is ABOVE, it is likely to move down, and when it is BELOW, it is likely to move up.

APPLICATION: You are flipping a fair coin. You observe a sequence of NINE Heads and ONE Tail -- this is very unbalanced in favor of Heads, with an observed frequency of 90% Heads. The EXPECTED frequency of Heads is 50%, so you know that the 90% will go down towards 50% in the long run. SO YOU CONCLUDE that the COIN is MORE LIKELY to come up TAILS, because too many HEADS have already appeared. The COIN must BALANCE the EXCESS of HEADS by producing an EXCESS of TAILS in the future.

THIS REASONING IS WRONG, but very common, and also very plausible. Here is the PROBLEM with it:

The coin flips are INDEPENDENT. They have no memory. They do not know what happened in the past. They do not know if the average of the past 10 trials is ABOVE normal or BELOW normal. So it is IMPOSSIBLE for the next trial or sequence of trials to BALANCE what happened in the past.
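
The "no memory" claim can be illustrated with a small simulation. This is only a sketch: the function name, the number of sequences, and the seed are hypothetical choices, and the simulated coin is independent by construction, so the code shows what independence implies rather than proving anything about a real coin. It looks only at sequences that start with exactly NINE Heads in ten flips and checks how often the ELEVENTH flip is Heads.

```python
import random

def next_flip_after_nine_heads(num_sequences=200_000, seed=1):
    """Among simulated sequences with exactly 9 heads in the first 10 flips,
    return how many there were and the share whose 11th flip was heads."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    qualifying = 0   # sequences with exactly 9 heads in the first 10 flips
    next_heads = 0   # of those, how many had heads on flip 11
    for _ in range(num_sequences):
        first_ten = sum(rng.random() < 0.5 for _ in range(10))
        eleventh = rng.random() < 0.5  # drawn without looking at the past
        if first_ten == 9:
            qualifying += 1
            next_heads += eleventh
    return qualifying, next_heads / qualifying

qualifying, share = next_flip_after_nine_heads()
print(f"{qualifying} sequences began with 9 heads in 10 flips")
print(f"the next flip was heads in {share:.1%} of them")
```

If the coin were BALANCING, the share of Heads on the eleventh flip would sit clearly below 50%. With independent flips it should stay close to 50%, up to sampling noise.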

But this leads to a PARADOX: The subsequent trials DO in fact balance the previous ones in the sense that they will move the previous average towards the mean -- this means that high averages will be brought down, and low averages will be brought up. HOW CAN THIS HAPPEN, when the later trials DO NOT KNOW what happened in the earlier trials?

The answer is a bit subtle and complex. Let us continue to consider the case of coins discussed above. Suppose that there are exactly 50 Heads in the NEXT 100 trials. What happens when we add these 100 trials to the initial 10? The 9/10 Heads will now become 59/110 Heads, so 90% Heads will move to about 54% Heads. On the other hand, suppose there was ONLY 1 Head and 9 Tails in the first ten trials, for 10% Heads. Then 1/10 Heads will change to 51/110, which is about 46% Heads.
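
The arithmetic above can be checked with a few lines of Python. This is a minimal sketch; the helper name `combined_frequency` is a hypothetical one introduced only for this illustration.

```python
from fractions import Fraction

def combined_frequency(prior_heads, prior_trials, new_heads, new_trials):
    """Exact fraction of heads after pooling the earlier and later flips."""
    return Fraction(prior_heads + new_heads, prior_trials + new_trials)

# 9 heads in the first 10 flips, then exactly 50 heads in the next 100.
high_start = combined_frequency(9, 10, 50, 100)   # 59/110
# 1 head in the first 10 flips, then exactly 50 heads in the next 100.
low_start = combined_frequency(1, 10, 50, 100)    # 51/110

print(f"{float(high_start):.1%}")  # 53.6%
print(f"{float(low_start):.1%}")   # 46.4%
```

The same 100 later flips are used in both cases. They do nothing different for the high start and the low start, yet both pooled frequencies end up much closer to 50%.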

SO WE SEE THE LLN IN ACTION. The next 100 trials do not know anything about what happened in the first 10 trials. They just achieve a balanced outcome of 50 Heads and 50 Tails. This has the automatic effect of moving 90% DOWN to 54% and moving 10% UP to 46%.

The QUESTION remains HOW does this happen, and the answer is SWAMPING. The 100 trials are a LARGE number, an OCEAN compared to a DROP. What happened in the first 10 trials carries LITTLE WEIGHT in the overall average for 110 trials, and less still as more trials are added. Similarly, what happened in the first 1000 trials will not have much effect on the average of the first 100,000 trials. This is why it is called the Law of LARGE NUMBERS. It requires LARGE NUMBERS of trials to work.
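
SWAMPING can also be watched in a simulation. The sketch below is illustrative only: the function name, the checkpoints, and the seed are hypothetical choices, not part of the argument above. It starts from 9 Heads in 10 flips, keeps flipping a simulated fair coin that ignores the past, and reports the pooled frequency of Heads at a few total trial counts.

```python
import random

def pooled_frequencies(prior_heads, prior_trials, checkpoints, seed=0):
    """Continue flipping a fair coin after a given start and return the
    pooled frequency of heads at each checkpoint (total number of trials)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    heads, trials = prior_heads, prior_trials
    results = {}
    for target in sorted(checkpoints):
        while trials < target:
            heads += rng.random() < 0.5  # each new flip ignores the past
            trials += 1
        results[target] = heads / trials
    return results

# Start from 9 heads in 10 flips, then look at the pooled frequency later on.
freqs = pooled_frequencies(9, 10, [10, 110, 1_010, 100_010])
for total, freq in freqs.items():
    print(f"{total:>7} trials: {freq:.3f} heads")
```

The first row is the 90% start. The later rows should drift towards 0.5, not because the new flips produce extra Tails to cancel the early Heads, but because the early excess is divided by an ever larger number of trials.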