Quoted (with some modifications) from: Pinker, Steven, Rationality (2021), pp. 152–153.
The great insight of the Reverend Thomas Bayes (1701–1761) was that the degree of belief in a hypothesis may be quantified as a probability. (This is a subjectivist meaning of “probability.”) Call it prob(Hypothesis), the probability of a hypothesis, or the degree to which we believe it is true. For example, in the case of medical diagnosis, the hypothesis is that the patient has the disease.
Clearly our credence in any idea should depend on the evidence. In statistical language, our acceptance of a hypothesis should be conditional on the evidence. What we want to know is the probability of a hypothesis given the data, or prob(Hypothesis | Data). That’s called the posterior probability: our credence in an idea after we’ve examined the evidence.
This is the basic idea behind Bayesian reasoning. It leads to a formula for conditional probability, applied to belief and evidence. Remember that the probability of A given B is the probability of A and B divided by the probability of B. So the probability of a hypothesis given the data (what we are seeking) is the probability of the hypothesis and the data (say, the patient has the disease and the test result comes out positive) divided by the probability of the data (the total proportion of patients who test positive, healthy and sick).
Stated as an equation: prob(Hypothesis | Data) = prob(Hypothesis and Data) / prob(Data). The probability of A and B is the probability of A times the probability of B given A. Make that simple substitution and you get Bayes’s rule:

prob(Hypothesis | Data) = prob(Hypothesis) × prob(Data | Hypothesis) / prob(Data)
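As a sanity check, the conditional-probability identity can be verified by counting outcomes directly; the dice scenario below and all its numbers are my own illustration, not from the text:

```python
from fractions import Fraction

# Check prob(A | B) = prob(A and B) / prob(B) by counting outcomes.
# Illustrative scenario: A = "roll is a six", B = "roll is even".
rolls = [1, 2, 3, 4, 5, 6]  # a fair six-sided die
p_b = Fraction(sum(1 for r in rolls if r % 2 == 0), len(rolls))                    # 1/2
p_a_and_b = Fraction(sum(1 for r in rolls if r == 6 and r % 2 == 0), len(rolls))   # 1/6

# Among the three even outcomes, exactly one is a six, so the answer is 1/3.
print(p_a_and_b / p_b)  # 1/3
```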
Recall that prob(Hypothesis | Data), the expression on the left, is the posterior probability: our updated credence in the hypothesis after we’ve looked at the evidence. This could be our confidence in a disease diagnosis after we’ve seen the test results. Prob(Hypothesis) on the right means the prior probability or “priors,” our credence in the hypothesis before we looked at the data: how plausible or well established it was, what we would be forced to guess if we had no knowledge of the data at hand. In the case of a disease, this could be its prevalence in the population, the base rate. Prob(Data | Hypothesis) is called the likelihood. In the world of Bayes, “likelihood” is not a synonym for “probability,” but refers to how likely it is that the data would turn up if the hypothesis is true. If someone does have the disease, how likely is it that they would show a given symptom or get a positive test result?
And prob(Data) is the probability of the data turning up across the board, whether the hypothesis is true or false. It’s sometimes called the “marginal” probability, not in the sense of “minor” but in the sense of adding up the totals for each row (or each column) along the margin of the table—the probability of getting those data when the hypothesis is true plus the probability of getting those data when the hypothesis is false. A more mnemonic term is the commonness or ordinariness of the data. In the case of medical diagnosis, it refers to the proportion of all the patients who have a symptom or get a positive result, healthy and sick. Substituting the mnemonics for the algebra, Bayes’s rule becomes:

posterior = prior × likelihood / commonness of the data
In a simplified version: “Our credence in a hypothesis after looking at the evidence should be our prior credence in the hypothesis, multiplied by how likely the evidence would be if the hypothesis is true, scaled by how common that evidence is across the board.”
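That simplified version can be sketched as a one-line function; the name `bayes_rule` and the example numbers are my own, chosen only for illustration:

```python
def bayes_rule(prior, likelihood, commonness):
    """Posterior credence: prior times likelihood, scaled by how common the data are."""
    return prior * likelihood / commonness

# Illustrative numbers: a moderately plausible hypothesis (prior 0.2) whose
# predicted evidence (likelihood 0.5) is fairly rare overall (commonness 0.25).
print(bayes_rule(prior=0.2, likelihood=0.5, commonness=0.25))  # 0.4
```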
And translated into common sense, it works like this: Now that you’ve seen the evidence, how much should you believe the idea? First, believe it more if the idea was well supported, credible, or plausible to start with, that is, if it has a high prior, the first term in the numerator. As they say to medical students, if you hear hoofbeats outside the window, it’s probably a horse, not a zebra.
Second, believe the idea more if the evidence is especially likely to occur when the idea is true—namely if it has a high likelihood, the second term in the numerator. It’s reasonable to take seriously the possibility of methemoglobinemia, also known as blue skin disorder, if a patient shows up with blue skin, or Rocky Mountain spotted fever if a patient from the Rocky Mountains presents with spots and fever.
And third, believe it less if the evidence is commonplace—if it has a high marginal probability, the denominator of the fraction. That’s why we laugh at Irwin the hypochondriac, convinced of his liver disease because of the characteristic lack of discomfort. True, his symptomlessness has a high likelihood given the disease, edging up the numerator, but it also has a massive marginal probability (since most people have no discomfort most of the time), blowing up the denominator and thus shrinking the posterior, our credence in Irwin’s self-diagnosis.

How does this work with numbers? Let’s go back to the cancer example. The prevalence of the disease in the population, 1 percent, is how we set our priors: prob(Hypothesis) = .01. The sensitivity of the test is the likelihood of getting a positive result given that the patient has the disease: prob(Data | Hypothesis) = .9. The marginal probability of a positive test result across the board is the sum of the probabilities of a hit for the sick patients (90 percent of the 1 percent, or .009) and of a false alarm for the healthy ones (9 percent of the 99 percent, or .0891), or .0981, which begs to be rounded up to .1. Plug the three numbers into Bayes’s rule, and you get .01 times .9 divided by .1, or .09.
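The arithmetic in the cancer example can be replayed directly; only the variable names are mine:

```python
# Numbers from the text: 1 percent prevalence, 90 percent sensitivity,
# 9 percent false-alarm rate among the healthy.
prior = 0.01                # prob(Hypothesis): the base rate
sensitivity = 0.9           # prob(Data | Hypothesis): the likelihood
false_alarm_rate = 0.09     # positive tests among the healthy

# Marginal probability of a positive test: hits plus false alarms.
marginal = prior * sensitivity + (1 - prior) * false_alarm_rate  # .009 + .0891 = .0981

posterior = prior * sensitivity / marginal
print(round(posterior, 2))  # 0.09
```

Keeping the exact marginal (.0981) rather than the rounded .1 gives a posterior of about .092, which still rounds to the .09 figure in the text.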
So where do the doctors, and most of us, go wrong? Why do we think the patient almost certainly has the disease, when she almost certainly doesn’t?