A probability is the likelihood of an event or outcome. Probabilities are specified mathematically by a number between 0 and 1, including 0 and 1.
0 is no likelihood an event will occur.
1 is absolute certainty an event will occur.
0.5 is an equal likelihood of occurrence or non-occurrence.
Any value between 0 and 1 can occur.
We use the notation P(eventLabel) = probability to report a probability.
There are three ways to assign probabilities.
Intuition or subjective estimate
Equally likely outcomes
Relative Frequencies
Intuition/subjective measure. An educated best guess. Using available information to make a best estimate of a probability. Could be anything from a wild guess to an educated and informed estimate by experts in the field.
Equally Likely Events: Probabilities from mathematical formulas
In the following the word "event" and the word "outcome" are taken to have the same meaning.
Probabilities versus Statistics
The study of problems with equally likely outcomes is termed the study of probabilities. This is the realm of the mathematics of probability. Using the mathematics of probability, the outcomes can be determined ahead of time. Mathematical formulas determine the probability of a particular outcome. All measures are population parameters. The mathematics of probability determines the probabilities for coin tosses, dice, cards, lotteries, bingo, and other games of chance.
This course focuses not on probability but rather on statistics. In statistics, measurements are made on a sample taken from the population and used to estimate the population's parameters. The full set of possible outcomes is usually not known and might not be knowable. Relative frequencies will be used to estimate population parameters.
Calculating Probabilities
Where each and every event is equally likely, the probability of an event occurring can be determined from
probability = ways to get the desired event ÷ total possible events
or
probability = ways to get the particular outcome ÷ total possible outcomes
Dice and Coins
Binary probabilities: yes or no, up or down, heads or tails
P(head on a penny) = one way to get a head ÷ two sides = 1/2 = 0.5 or 50%
That probability, 0.5, is the probability of getting a head (or, equally, a tail) prior to the toss. Once the toss is done, the coin is either a head or a tail, 1 or 0, all or nothing. There is no 0.5 probability anymore.
Over any ten tosses there is no guarantee of five heads and five tails: probability does not work like that. Over any small sample the observed ratios of outcomes can differ from the mathematically calculated ratios.
Over thousands of tosses, however, the ratio of outcomes, such as the number of heads to the number of tails, will approach the mathematically predicted value. We refer to this as the law of large numbers.
In effect, a few tosses is a sample from a population that consists, theoretically, of an infinite number of tosses. Thus we can speak about a population mean μ for an infinite number of tosses. That population mean μ is the mathematically predicted probability.
Population mean μ = (number of ways to get a desired outcome) ÷ (total possible outcomes)
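The law of large numbers can be seen in a small simulation. The following is only a sketch: the seed value, the function name heads_proportion, and the choice of sample sizes are assumptions made for illustration, and a pseudorandom number generator stands in for a real penny.

```python
import random

def heads_proportion(tosses, rng):
    """Return the fraction of simulated fair-coin tosses that land heads."""
    heads = sum(1 for _ in range(tosses) if rng.random() < 0.5)
    return heads / tosses

# A fixed seed (an arbitrary choice) makes the run repeatable.
rng = random.Random(2024)

# Small samples can stray from 0.5; large samples should sit close to it.
proportions = {n: heads_proportion(n, rng) for n in (10, 100, 10_000, 100_000)}
for n, p in proportions.items():
    print(f"{n:>7} tosses: proportion of heads = {p:.4f}")
```

The small samples may land well away from 0.5, while the largest sample should land close to the population mean μ of 0.5.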
Dice: Six-sided
A six-sided die. Six sides. Each side equally likely to appear. Six total possible outcomes. Only one way to roll a one: the side with a single pip must face up. 1 way to get a one ÷ 6 possible outcomes = 1/6 ≈ 0.1667 or about 17%
P(1) = 0.17
Dice: Four, eight, twelve, and twenty sided
The formula remains the same: the number of possible ways to get a particular roll divided by the number of possible outcomes (that is, the number of sides!).
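As a quick illustration, the formula can be applied to each of these dice. This is a minimal sketch that assumes every die is fair; the function name probability is chosen here for illustration.

```python
from fractions import Fraction

def probability(ways, total):
    """Equally likely outcomes: ways to get the outcome / total possible outcomes."""
    return Fraction(ways, total)

# There is one way to roll a one on any fair die, whatever the number of sides.
for sides in (4, 6, 8, 12, 20):
    p = probability(1, sides)
    print(f"{sides:>2}-sided die: P(1) = {p} = {float(p):.4f}")
```

Using Fraction keeps the exact ratio (such as 1/6) alongside the rounded decimal.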
Think about this: what would a three sided die look like? How about a two-sided die? What about a one sided die? What shape would that be? Is there such a thing?
Two dice
Ways to get a five on two dice: 1 + 4 = 5, 2 + 3 = 5, 3 + 2 = 5, 4 + 1 = 5 (each die is unique).
P(5) = four ways to get a five ÷ 36 total possibilities = 4/36 = 0.11 or 11%
Note that there are 36 total possibilities, which is six squared. There is a formula for coins and dice to obtain the total number of possibilities: (number of sides)^(number of coins or dice). The ^ symbol is exponentiation. 6^2 is 36, so there are 36 unique combinations. I know this is confusing, but try using a white die and a black die and you can see that ⚀⚁ is different from ⚁⚀.
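The count of 36 combinations and the four ways to roll a five can be checked by listing every pair. The sketch below assumes two fair six-sided dice and treats the first number in each pair as the white die and the second as the black die.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Every (white die, black die) pair: 6^2 = 36 equally likely outcomes.
rolls = list(product(faces, repeat=2))

# Keep only the pairs whose pips add up to five.
fives = [roll for roll in rolls if sum(roll) == 5]

p_five = Fraction(len(fives), len(rolls))
print(f"total outcomes: {len(rolls)}")
print(f"ways to roll a five: {fives}")
print(f"P(5) = {p_five} = {float(p_five):.2f}")
```

Because (1, 4) and (4, 1) are listed separately, the code makes the same point as the white die and black die: order matters when counting outcomes.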
In the medical fields probabilities arise in the discussion of risk. Risk can be categorized as one's relative risk and one's absolute risk. Risks are probabilities.
Relative risk and absolute risk difference are two ways to describe how a specific factor (like a habit or treatment) influences the likelihood of an outcome. While they use the same data, they provide very different perspectives on the impact of a risk factor.
Relative Risk (RR)
Relative risk is a ratio that compares the risk of an event occurring in one group to the risk in another. It highlights the strength of an association between a factor and an outcome.
How it is calculated: You divide the risk of the "exposed" group by the risk of the "unexposed" group.
Interpretation: It tells you how many times more (or less) likely an event is to happen in one group compared to another. For example, a relative risk of 6 means the exposed group is six times more likely to experience the outcome.
For example, Taiwan has an estimated two million regular users of betel nut. Taiwan also has roughly 8000 cases of oral cancer per year, which are primarily linked to betel nut consumption. This represents on the order of 0.4% of the chewing population being diagnosed each year (8000 ÷ 2,000,000 = 0.004). Splitting that 0.4% by habit, 0.00325% are chewing betel nut without tobacco and 0.39675% are chewing betel nut with tobacco. The relative risk that adding tobacco presents to a chewer is 0.39675 ÷ 0.00325 ≈ 122, that is, 122 times the rate of those who chew without tobacco. Tobacco supercharges the carcinogenicity of chewing betel nut.
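The arithmetic behind the Taiwan figures can be sketched as follows. The function name relative_risk is an assumption made for illustration, and the two percentages are treated, as in the text, as the annual risks for the group chewing with tobacco (exposed) and the group chewing without tobacco (unexposed).

```python
def relative_risk(risk_exposed, risk_unexposed):
    """Relative risk: risk in the exposed group divided by risk in the unexposed group."""
    return risk_exposed / risk_unexposed

# Figures quoted in the text.
annual_cases = 8000
chewers = 2_000_000
overall_percent = annual_cases / chewers * 100   # share of chewers diagnosed per year

with_tobacco_percent = 0.39675
without_tobacco_percent = 0.00325

rr = relative_risk(with_tobacco_percent, without_tobacco_percent)
print(f"overall annual rate: {overall_percent:.1f}%")
print(f"relative risk of adding tobacco: {rr:.0f}")
```

Because relative risk is a ratio, it does not matter whether the two risks are entered as percentages or as decimals, as long as both use the same units.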
A study in India characterized the relative risk of oral cancer as a result of chewing betel nut versus not chewing betel nut:
Areca nut chewing is one of the major risk factors for oral cancer, with large-magnitude risks reported in studies comparing betel quid chewers and never users, and it has been evaluated as a group 1 carcinogen by the International Agency for Research on Cancer. Data from a high-quality meta-analysis examining risk estimates are presented in summary form with additional information from more recent studies (pooled adjusted relative risk, 7.9; 95% CI, 7.1 to 8.7).
Areca Nut and Oral Cancer: Evidence from Studies Conducted in Humans
The study quotes a relative risk of 7.9 with a 95% confidence interval (chapter nine) from 7.1 to 8.7. A betel nut chewer in India is roughly eight times as likely to develop oral cancer as someone who does not chew.
Absolute Risk Difference (ARD)
Absolute risk difference (also called attributable risk) is the subtraction of one risk from another. It measures the actual change in probability or the size of the effect in the population.
How it is calculated: You subtract the risk of the unexposed group from the risk of the exposed group.
Interpretation: It tells you the actual percentage point increase or decrease in risk. For example, if one group has a 15% risk and the other has a 2.5% risk, the absolute risk difference is 12.5 percentage points.
For the Taiwan betel nut data, the absolute risk difference is:
0.39675% - 0.00325% = 0.3935%, or about 0.39 percentage points
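The subtraction can be sketched in a few lines. The function name absolute_risk_difference is an assumption made for illustration, and Decimal is used here only so that the subtraction of the quoted percentages is exact rather than subject to floating point rounding.

```python
from decimal import Decimal

def absolute_risk_difference(risk_exposed, risk_unexposed):
    """Absolute risk difference: exposed risk minus unexposed risk, in the same units."""
    return risk_exposed - risk_unexposed

# Taiwan betel nut figures from the text, in percent.
taiwan_ard = absolute_risk_difference(Decimal("0.39675"), Decimal("0.00325"))

# The 15% versus 2.5% illustration from the interpretation above.
example_ard = absolute_risk_difference(Decimal("15"), Decimal("2.5"))

print(f"Taiwan absolute risk difference: {taiwan_ard} percentage points")
print(f"15% versus 2.5%: {example_ard} percentage points")
```

The result is in percentage points because both inputs are percentages.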
Key Differences in Interpretation
These two metrics can lead to very different perceptions of the same data:
Perception: Relative risk can look dramatic even when the actual change in risk is small. For instance, a "200% increase in risk" (Relative Risk) sounds more alarming than a "2% increase in total population risk" (Absolute Risk Difference).
Clinical Use: Relative risk is often used to show how strongly a factor (like sugary drinks) is linked to a disease (like diabetes). Absolute risk difference is more useful for understanding the practical impact of an intervention or risk factor on a specific group.
Another factor to consider is whether the risk is being calculated on a lifetime basis or an annual basis. The example using betel nut is based on annual rates. Over the course of a lifetime of chewing, a chewer faces that risk level annually. So while 0.4% looks like a small risk on an annual basis, over a twenty year time frame that rises to roughly 8%, or about a one in thirteen chance of developing oral cancer, assuming the annual risk stays the same each year.
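One way to turn an annual risk into a multi-year risk is sketched below. It rests on two assumptions that are not established in the text: the annual risk stays constant, and each year is independent of the others. Under those assumptions the chance of escaping diagnosis for n years is (1 - annual risk)^n, and the cumulative risk is one minus that value. The function name cumulative_risk is chosen for illustration.

```python
def cumulative_risk(annual_risk, years):
    """Chance of at least one occurrence over the given years.

    Assumes a constant annual risk and independence between years.
    """
    return 1 - (1 - annual_risk) ** years

annual_risk = 0.004          # 0.4% per year, from the betel nut example
risk_20_years = cumulative_risk(annual_risk, 20)

print(f"risk over 20 years: {risk_20_years:.1%}")
print(f"roughly one in {1 / risk_20_years:.0f}")
```

Simply multiplying 0.4% by twenty gives a similar figure here because the annual risk is small; the two approaches drift apart as the annual risk or the number of years grows.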
The sample space is the set of all possible outcomes in an experiment or system.
Bear in mind that the following is an oversimplification of the complex biogenetics of achromatopsia for the sake of a statistics example. Achromatopsia is controlled by a pair of genes, one from the mother and one from the father. A child is born an achromat when the child inherits a recessive gene from both the mother and father.
A is the dominant gene
a is the recessive gene
A person with the combination AA is "double dominant" and has "normal" vision.
A person with the combination Aa is termed a carrier and has "normal" vision.
A person with the combination aa has achromatopsia.
Suppose two carriers, Aa, marry and have children. The sample space for this situation is as follows, with the gene from one parent written first and the gene from the other parent written second:
AA   Aa   aA   aa
The above listing of all four possible outcomes represents the sample space for this exercise. Note that for each and every child there is only one possible outcome. The outcomes are mutually exclusive, and the outcome for one child is independent of the outcome for any other child. Each outcome is as likely as any other individual outcome. All possible outcomes can be calculated. The sample space is completely known. Therefore the above involves probability and not statistics.
The probability of these two parents bearing a child with achromatopsia is:
P(achromat) = one way for the child to inherit aa ÷ four possible combinations = 1/4 = 0.25 or 25%
This does NOT mean one in every four children will necessarily be an achromat. Suppose they have eight children. While it could turn out that exactly two children (25%) would have achromatopsia, other likely results are a single child with achromatopsia or three children with achromatopsia. Less likely, but possible, would be results of no achromat children or four achromat children. If we decide to work from actual results and build a frequency table, then we would be dealing with statistics.
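The claim about eight children can be checked with the same equally likely outcomes approach by listing every possible family. This is a sketch under the simplified genetics used here: each child is one of four equally likely combinations, independent of the other children, which gives 4^8 = 65,536 equally likely families.

```python
from collections import Counter
from itertools import product

outcomes = ("AA", "Aa", "aA", "aa")   # four equally likely combinations per child
children = 8

# Count, for every possible family of eight, how many children are achromats (aa).
counts = Counter(
    family.count("aa") for family in product(outcomes, repeat=children)
)
total = len(outcomes) ** children

for achromats in range(children + 1):
    share = counts[achromats] / total
    print(f"{achromats} achromat children: {counts[achromats]:>5} families, P = {share:.4f}")
```

Exactly two achromat children is the single most likely result, yet it accounts for fewer than a third of the possible families, with one and three achromat children close behind.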
The probability of bearing a carrier is:
P(carrier) = two ways for the child to inherit Aa ÷ four possible combinations = 2/4 = 0.50
Note that while each outcome is equally likely, there are TWO ways to get a carrier, which results in a 50% probability of a child being a carrier.
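The sample space and both probabilities can be generated rather than written out by hand. The sketch below uses the simplified one-gene-from-each-parent model of this example; the helper name prob is an assumption made for illustration.

```python
from fractions import Fraction
from itertools import product

# One gene from each carrier parent: every pairing of A or a with A or a.
sample_space = ["".join(pair) for pair in product("Aa", repeat=2)]

def prob(event):
    """Ways the event can happen divided by the size of the sample space."""
    ways = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(ways, len(sample_space))

p_achromat = prob(lambda outcome: outcome == "aa")
p_carrier = prob(lambda outcome: set(outcome) == {"A", "a"})   # Aa or aA

print(f"sample space: {sample_space}")
print(f"P(achromat) = {p_achromat}")
print(f"P(carrier)  = {p_carrier}")
```

The carrier test accepts both Aa and aA, which is where the two ways to get a carrier come from.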
Note to instructors: This book is unapologetically frequentist in its approach. Hard core long term frequency is the probability of an event. The author is keenly aware of the Bayesian objection to this definition and approach to statistics. This is an introductory first statistics course for students who have not studied any statistics prior to this class. The intent is to bring them up to speed in frequentist statistics leading into confidence intervals and two-tailed hypothesis testing and then have them use these tools in data exploration exercises. This text deliberately synonymizes relative frequency with probability.
The third way to assign probabilities is from relative frequencies. Each relative frequency represents a probability of that event occurring for that sample space. Body fat percentage data was gathered from 59 females. The data had the following characteristics:
sample size n (count): 59
mean: 28.7
standard deviation sx: 7.1
min: 15.6
max: 50.1
Note that the classes are not equal width in this example.
This means there is a...
0.05 (five percent) probability of a female student in the sample having a body fat percentage between 12 and 20 (athletically fit)
0.25 (25%) probability of a female student in the sample having a body fat percentage between 20.1 (the Tanita unit only measured to the nearest tenth) and 24 (physically fit)
0.41 (41%) probability of a female student in the sample having a body fat percentage between 24.1 and 31 (acceptable but not fit level of fat)
0.20 (20%) probability of a female student in the sample having a body fat percentage between 31.1 and 39 (on the borderline between acceptable and obese)
0.08 (8%) probability of a female student in the sample having a body fat percentage between 39.1 and 51 (medically obese)
The most probable result (most likely) is a body fat measurement between 24.1 and 31, with a 41% probability of a student being in that interval.
Remember that...
The sum of the frequencies is the sample size n.
The sum of the relative frequencies is always one: probabilities add to one, which is also 100%.
Checking that the frequencies sum to the sample size and that the relative frequencies sum to one are two ways to check the accuracy of the frequency table.
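The check on the relative frequencies can be sketched using the five rounded values listed above. Because each value has been rounded to two decimal places, the sum can miss one by a small amount, so the check below allows for rounding. The tolerance of half a hundredth per class is an assumption made for this sketch, not a rule from the text.

```python
from decimal import Decimal

# Body fat classes and rounded relative frequencies as listed in the text.
classes = [
    ("12 to 20", Decimal("0.05")),
    ("20.1 to 24", Decimal("0.25")),
    ("24.1 to 31", Decimal("0.41")),
    ("31.1 to 39", Decimal("0.20")),
    ("39.1 to 51", Decimal("0.08")),
]

total = sum(rf for _, rf in classes)

# Each rounded value can be off by up to 0.005, so allow that much per class.
rounding_tolerance = Decimal("0.005") * len(classes)
table_ok = abs(total - 1) <= rounding_tolerance

most_likely = max(classes, key=lambda item: item[1])

print(f"sum of relative frequencies: {total}")
print(f"within rounding of one: {table_ok}")
print(f"most probable class: {most_likely[0]} (P = {most_likely[1]})")
```

With unrounded relative frequencies the sum would be exactly one; a sum that misses one by more than rounding can explain points to an error in the table.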
For relative frequency probability calculations, as the sample size increases the probabilities tend to get closer and closer to the true population parameter (the actual probability for the population). Bigger samples generally give more accurate estimates.