S4 Verifying Elementary Properties

It is useful to get an intuitive, hands-on understanding of the properties of standard normal random variables. Without computers this was difficult to do; nowadays it is very easy. Generate 1000 standard normal random variables in EXCEL, using the Data Analysis method to create STATIC random variables (values that do not change when the sheet recalculates). Put these numbers in cells A1:A1000. Answer the following questions, and then check your spreadsheet to see whether these properties hold for the 1000 numbers you have generated.

2. Symmetry: About how many negative numbers ____ and how many positive numbers do you EXPECT to see within your sample of 1000 normals in EXCEL? Count them using the COUNTIF function, and write down how many there actually are. Do you see a rough correspondence between the theoretical, PRE-EXPERIMENTAL values and the actual, POST-EXPERIMENTAL values?

3. Interquartile Range: About how many numbers should lie between -0.6745 and +0.6745? Count them to see whether there is a rough match between the observed outcomes and the theoretical probabilities of these outcomes.

4. What is the theoretical, pre-experimental probability of outcomes within the range -1 to +1? What is the observed, actual proportion of outcomes within this interval?

5. Repeat exercise 4 for the ranges (-2,+2) and (-3,+3). About how many numbers out of 1000 do you expect to see OUTSIDE the interval (-3,+3)? How many are there in your sample?

6. How many values do you expect to see which are OUTSIDE the range (-5,+5)? How many are there?
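
As a cross-check outside the spreadsheet, the PRE-EXPERIMENTAL and POST-EXPERIMENTAL counts in the exercises above can be sketched in Python. This is only an illustration, not part of the EXCEL procedure: the seed, the variable names, and the helper functions expected_inside and observed_inside are assumptions introduced here, and the list sample merely plays the role of cells A1:A1000.

```python
import random
from statistics import NormalDist

N = 1000
rng = random.Random(0)  # arbitrary fixed seed (an assumption) so the sample stays static
sample = [rng.gauss(0.0, 1.0) for _ in range(N)]  # plays the role of cells A1:A1000

STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def expected_inside(a, b, n=N):
    """Theoretical (pre-experimental) number of n standard normals inside (a, b)."""
    return n * (STD_NORMAL.cdf(b) - STD_NORMAL.cdf(a))


def observed_inside(values, a, b):
    """Actual (post-experimental) number of values strictly between a and b."""
    return sum(1 for v in values if a < v < b)


# Symmetry: expected versus observed number of negative values.
expected_negative = N * STD_NORMAL.cdf(0.0)
observed_negative = sum(1 for v in sample if v < 0)
print(f"negative: expected {expected_negative:.1f}, observed {observed_negative}")

# Symmetric intervals (-h, +h) used in the exercises.
for h in (0.6745, 1.0, 2.0, 3.0, 5.0):
    exp_in = expected_inside(-h, h)
    obs_in = observed_inside(sample, -h, h)
    print(
        f"(-{h}, +{h}): expected inside {exp_in:.2f}, observed inside {obs_in}; "
        f"expected outside {N - exp_in:.4f}, observed outside {N - obs_in}"
    )
```

Changing the seed gives a different sample of 1000 numbers, so the observed counts move around a little while the expected counts stay fixed. That is the same rough, not exact, correspondence the spreadsheet should show.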

The above exercises should show you that the normal is a very well-behaved distribution. It produces virtually no OUTLIERS, that is, wild and large numbers. Normal models are suitable for situations where no outliers are likely to emerge. However, many real-world situations do generate outliers, and analysis using a normal distribution in such situations can be extremely misleading. We provide one example to illustrate the difference.

UNLIKE the standard normal, the CAUCHY distribution is a wild and wacky distribution which generates a lot of outliers. An easy way to generate a Cauchy random variable is to take the RATIO of two independent standard normals. To do this, generate A SECOND column of 1000 standard normals in cells B1:B1000. Then take the ratio of the first two columns by putting =A1/B1 in cell C1 and copying it down to C1000. Now cells C1:C1000 contain Cauchy random variables. Take the maximum and minimum of these numbers to see what huge values the Cauchy distribution generates.

Another unpleasant feature of the Cauchy is unpredictably large changes. Just read the sequence of numbers as you go down C1, C2, C3 and so on. Fairly soon you will get to a huge number, which is far outside the range of your earlier numbers. If you were watching someone generate these numbers one at a time, you might use the range of the values observed so far to form an expectation of what would come next; with the Cauchy, you would be VERY SURPRISED. For example, on one spreadsheet, the first 11 numbers generated all lie within the range -3 to +3, but the 12th number is 9.96, which is more than 3 times the previous maximum. The Cauchy distribution keeps on generating surprises like this, and requires rather different tools for analysis from the standard statistical tools designed for the normal distribution.
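
The ratio construction and the "surprise" effect can also be sketched in Python, for readers who want to repeat the experiment many times. This is a rough illustration under stated assumptions: the seed, the list names col_a, col_b and cauchy, and the helper record_jumps (with its factor of 3, borrowed from the 9.96 example) are all introduced here and do not come from the spreadsheet.

```python
import random

N = 1000
rng = random.Random(1)  # arbitrary fixed seed (an assumption)

col_a = [rng.gauss(0.0, 1.0) for _ in range(N)]  # plays the role of A1:A1000
col_b = [rng.gauss(0.0, 1.0) for _ in range(N)]  # plays the role of B1:B1000

# Ratio of two independent standard normals, like =A1/B1 copied down column C.
# An exact zero in col_b is practically impossible, but is skipped to be safe.
cauchy = [a / b for a, b in zip(col_a, col_b) if b != 0.0]


def record_jumps(values, factor=3.0):
    """Find values whose magnitude exceeds `factor` times the largest magnitude seen earlier.

    Returns a list of (index, value, previous_max_magnitude) tuples.
    """
    jumps = []
    running_max = abs(values[0])
    for i, v in enumerate(values[1:], start=1):
        if running_max > 0 and abs(v) > factor * running_max:
            jumps.append((i, v, running_max))
        running_max = max(running_max, abs(v))
    return jumps


print("normal column: min", min(col_a), "max", max(col_a))
print("Cauchy column: min", min(cauchy), "max", max(cauchy))

# Each line is a point where the new value dwarfs everything seen before it.
for index, value, previous_max in record_jumps(cauchy):
    print(f"row {index + 1}: {value:.2f} (previous largest magnitude {previous_max:.2f})")
```

Running record_jumps on the normal column col_a as well gives a direct comparison between the two distributions.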