S1 Estimating Probabilities

For any Random Variable X with density f(x), we can estimate the probability P(a<X<b) using simulations -- this holds for both continuous and discrete random variables. The method is to generate a large number of IID random variables X1, X2, ... X(N) and then look at the proportion of these which fall within the desired region. The sample proportion will be close to the true probability of the event by the Law of Large Numbers. The ERROR of approximation can be estimated using the Central Limit Theorem. We provide an example to illustrate this concept.

Estimating a Binomial Probability: Suppose X is Bi(10,0.75) and we want to estimate P(X=5). We can actually compute this exactly by using the formula for Binomial probabilities, or using the built in EXCEL function BINOMDIST. But suppose these options are not available, but we have a process to generate IID Binomial Random Variables Bi(10,0.75). Then we can generate 1000 such variables X(1),X(2),...,X(1000). Now COUNT the number of these which are equal to 5; suppose this is B. Then B/1000 is an estimate of the P(X=5). With confidence 95%, we can say that the true probability p is within 2SE=2Sqrt(p(1-p))/31.6, since 31.6=sqrt(1000).

MORE DETAILS: We carry out this process in EXCEL as follows. FIRST we have to decide HOW to generate the IID sequence of Binomials X(1),...X(1000). Note that 1000 is arbitrary -- the actual number should be chosen in accordance with how accurately we want to estimate the probability. A sample size of 400 will give us 95% confidence of accuracy within 5% while 1600 will give us accuracy within 2.5% even in the worst case of p=50%. For other values of p, the accuracy would be even higher. There are MANY methods to generate IID Binomials in EXCEL.

DATA ANALYSIS TOOLS: This is an EXCEL ADD-IN. Go into the Analysis Menu, and choose Random Number Generation. Within this menu, on the third line, there is box for DISTRIBUTION, which has a default set at DISCRETE -- change this to BINOMIAL. After you make this change, Parameters will appear, and you have to fill in p Value = 0.75 and Number of Trials = 10. This tells EXCEL to generate Bi(N,p) random variables with N=10 and p=0.75, which is what the question asks for. Next you specify the number of variables as 1, and the number of Random Numbers as 1000 (or 1600, if you want to get 2.5% or better accuracy). Next you have to specify the Output options, which tell EXCEL where you want these random numbers. If you CHOOSE Output Range and then specify A1, EXCEL will generate these random numbers in the cells A1:A1000.

COUNT the number of 5's: Now use the COUNTIF command to count how many 5's occur in your random sample of 1000. Call this B. Now we are in a position to ANSWER the question. Our estimate of the desired probability is P= B/1000, and an estimate of the 95% confidence interval is SQRT(P*(1-P))/16.3 -- this is the approximate error (Maximum error with 95% confidence) in our estimate.

EXERCISE: Compare this with the EXACT value of the probability computed used BINOMDIST. Check to see if the difference between the two lies within 2 SE's or not

EXTENSION: Do this for ALL the Binomial Probabilities. Compute the Sample Estimates, the SE's and the 95% confidence interval, the true probabilities, and check if the true probabilities lie within the 95% confidence interval. Note that about 1 time in 20 the true p will lie OUTSIDE the confidence interval. That means that if two students do this exercise for 10 outcomes, then one of them will find one case in which the true p falls outside the 95% confidence interval.