S5 RWE(2): Treatments

A population of 200,000 children is given a Polio Vaccine. After one year, the children are tested for Polio. It is found that 100 children developed Polio. ANALYZE this situation using a probability model.

ANALYSIS: In order to apply a probability model, we have to make some assumptions. Let us assume that each child has some probability "q" of developing polio WITHOUT the vaccine. We cannot learn about this number from this data, because all of the children were given the vaccine. Let us suppose that each child has probability "p" of developing polio after being given the vaccine. It is our hope that p is much less than q -- the difference between p and q is a measure of the effectiveness of the vaccine. From this sample we can approximately measure p under certain assumptions. From a DIFFERENT sample of children who did not take the vaccine, we could measure q, the probability of developing polio without the vaccine. If q were much larger than p, then we would consider the vaccine to be effective.

PROBABILITY MODEL to MEASURE p. Let us assume that each child has a probability p of developing polio after he/she takes the polio vaccine. Let us assume that all children have the SAME probability of developing polio after taking the vaccine. Let us assume that the children are INDEPENDENT -- that is, after taking the vaccine, the probability that one child develops polio does not depend on whether or not the other children develop polio. ALL of these assumptions are DOUBTFUL or FALSE. Different children may have different probabilities. If one child develops polio, this will increase the probability that nearby children develop polio, because polio is contagious. So this probability model for the polio vaccine and its effect on polio is FALSE. Nonetheless, it may be useful in making some calculations. So we pretend that the model is TRUE and make calculations under this assumption.
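To see what this pretend model implies, here is a minimal simulation sketch in Python. The function name simulate_cases, the seed, and the value p = 0.0005 are illustrative assumptions, not part of the data; in reality p is unknown. The sketch only shows what "same probability for every child" and "independent children" mean when turned into a calculation.

```python
import random


def simulate_cases(n_children, p, seed=0):
    """Count polio cases among n_children under the idealized model:
    every child has the SAME probability p, and children are INDEPENDENT."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    # Each child is one draw: a case occurs when the uniform draw falls below p.
    return sum(1 for _ in range(n_children) if rng.random() < p)


if __name__ == "__main__":
    # p = 0.0005 is a hypothetical value chosen only for illustration.
    cases = simulate_cases(200_000, 0.0005)
    print("simulated number of cases:", cases)
```

Running this with different seeds gives different counts, which is exactly the sense in which the number of cases is random BEFORE the year has passed. A model with contagion would need dependent draws, which this sketch deliberately leaves out.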

UNDER THESE ASSUMPTIONS, each child is like an independent Bernoulli trial, with probability p of developing polio (=1), and probability 1-p of not developing polio (=0). Let X be the number of children who will develop polio after one year. Then X is Binomial with number of trials N=200,000 and SOME UNKNOWN probability p of developing polio. After one year, we find that 100 children develop polio. Then 100 is the OUTCOME of X -- it is NOT the random variable X. The random variable EXISTS only BEFORE we find out whether or not a child has polio, while all the children still have the chance p of developing polio. AFTER one year, either a child has developed polio or he/she has not developed polio. This is NO LONGER a random variable. It is now a fixed and deterministic outcome. If we knew the value of p, then we could compute P(X=100) using the Binomial probability formula. Unfortunately, p is an unknown, hidden variable, and so we cannot compute P(X=100), because the formula for this probability involves p. HOWEVER, according to the law of large numbers, the observed proportion of successes in a large number of trials converges to p. Since 200,000 is fairly large, we can conclude that p should be close to 100/200,000 = 1/2000 = 0.05%, that is, one twentieth of a percentage point.
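The arithmetic above can be sketched in Python. The names estimate_p and binomial_pmf are illustrative, not from any library. The second function evaluates the Binomial probability formula on the log scale, because the binomial coefficient for N=200,000 is far too large for ordinary floating-point numbers. Plugging the estimate back in as if it were the true p is itself an assumption, made here only to show how the formula would be used if p were known.

```python
import math


def estimate_p(cases, n_children):
    """Observed proportion of cases, used as an estimate of the unknown p."""
    return cases / n_children


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed via logarithms.
    Assumes 0 < p < 1 so that both logarithms are defined."""
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    log_prob = log_coeff + k * math.log(p) + (n - k) * math.log1p(-p)
    return math.exp(log_prob)


if __name__ == "__main__":
    p_hat = estimate_p(100, 200_000)
    print("estimated p:", p_hat)  # 0.0005, i.e. 0.05%
    # ASSUMPTION: treat the estimate as if it were the true p.
    print("P(X=100) if p equals the estimate:", binomial_pmf(100, 200_000, p_hat))
```

Even when p is set equal to the estimate, the probability of seeing exactly 100 cases is only about 0.04, because nearby counts such as 95 or 105 are almost as likely. This is why the estimate should be read as "p is close to 0.05%", not "p is exactly 0.05%".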