P&S L1 C5 S5

Intro Stats: Islamic Approach -- Part 2 Prob & Stat

Lecture 1: Random Samples -- Concept 5: Law of Large Numbers

S5: Using LLN to get approximate probability

We show how to use EXCEL to replicate an event, and obtain an estimate of the probability of an event, based on the observed frequency.

How often would a classroom of 25 students contain two students with matching birthdays?

We use a MODEL for the situation to obtain a rough and approximate answer to this question.

Step 1: Generate 25 Birthdays, chosen at random.

HOW: Write =RANDBETWEEN(1,365) in cell A1, and copy this to Y1. Then the cells A1 to Y1 contain 25 numbers, Each number is randomly chosen with equal probability between 1 and 365. So now have a SIMULATED set of 25 birthdays which have been generated according to our model of equal probabilities for birth on all days of the year.

Step 2: Check for duplicates.

HOW to check for duplicates of A1:

Write =COUNTIF($A$1:$Y$1:A1) in cell AA1. This checks for the presence of the value in cell A1 in all the cells in the array A1:Y1, and returns the COUNT of how many matches there are. If there are no duplicates of A1 then the count will 1 -- the value in A1 is present ONLY in the cell A1. If this value is also present in some other cell, the count will be 2 or more.

HOW to check for duplicates of B1...Y1

The above checks to see whether or not the value in cell A1 is duplicated in some other cell. The same method can be used to check all the other cells. Just highlight the cell AA1, grab the lower right corner with the mouse and DRAG to AY1. Now each of the cells A?1 tests whether or not the entry in cell ?1 has been duplicated. Any entry of 2 or more means that the corresponding cell occurs two or more times.

Automating the Check

We can visually inspect the numbers to see if anyone is bigger than 1, but this is inconvenient, and not suitable for many replications. To automated the process, write in cell Z1: =IF(SUM(AA1:AY1)>25,1,0) . This EXCEL statement takes the sum of the entries in cells AA1 to AY1. Each of these cells contain a 1 if there is no dupliction and 2 or more if there is duplication. If none of the cells is duplicated, it mean that no two birthdays match. In this case the sum of all 25 entries will be exactly equal to 25. If any one of the cells is duplicated, the sum of the entries will be 26 or more. The IF statement will produce a 1 if there is a duplication and and a 0 if there is not.

STEP 3: Check Frequency Manually

EXCEL re-computes random numbers every time any entry in any cell is changed. Put ANY value into cell Z2, and watch what happens. All the birthdays will be re-computed. All the duplication checks will be recomputed, and the cell Z1 will display whether or not there is a match. Keep track of what happens, and repeat the experiment 10 times. This is like taking 10 different classrooms of 25 students, and checking to see if two birthdays match. On the average you will see 1,2, or 3 matches within 10 trials, reflecting the underlying probability. This will give you a ROUGH idea of what the probability of the event is.

STEP 4: Replicate 100 Times for a larger sample

Instead of manually replications, we can have EXCEL do large number of replications. SELECT the cells from A1 to AY1 and place cursor at bottom right corner of AY1 where you get a + sign. DRAG the cursor down to row 100. After you let go, the first row will be copied down to row 100. Now you have 100 trials, or 100 different classrooms of 20 students. The column Z records the matches. If you write in ANY vacant cell the following entry: =SUM(Z1:Z100), you will get the number of classrooms which had matching birthdays for two or more students. This number is an APPROXIMATION to the true underlying probability of the event.