<<Scott asks Sebastien: Perhaps we can make this website visible to everyone on the web while we work on it, rather than allowing access to it only by subscription. Would you mind that?>>
<<Sebastien to Scott: no.>>
See also https://sites.google.com/site/winklerwrinkle/extending-bayes-rule
Possible titles:
Bayes' rule without independence assumptions
Bayes' rule without dependence assumptions
Do we have to assume independence to use Bayes' rule?
Do we have to assume samples are independent to use Bayes' rule?
Does Bayes' rule require independent data?
Can dependence assumptions in Bayes' rule be relaxed?
Can dependence assumptions in Bayes' rule be completely relaxed?
Abstract:
Likelihood functions from different pieces of evidence are combined by multiplication under the assumption that the pieces of evidence are independent. But such an assumption is not always tenable, especially when the different data values are collected under similar conditions, or by the same protocol or operator, or when they emerge from a single stochastic process. What should analysts do if they know or suspect that the data are not independent of one another? In principle, the multiplication of likelihood functions can be generalized according to the classical Fréchet inequality governing the probability of a conjunction of events, but we show that fully or even partially relaxing the independence assumption tends to make the results of a Bayesian analysis trivial. Our conclusions concern all methods based on statistical likelihood.
Outline:
Not just priors, but likelihoods too should be handled with robust methods. Imprecision about the likelihood arises in essentially two ways: mensurational uncertainty about empirical values, or uncertainty about dependencies among samples or components of the evidence.
Events A and B are by definition independent if and only if their joint probability factors into the product of their marginal probabilities, P(A∩B) = P(A) P(B). Likewise, random variables are independent if and only if their joint probability distribution is the product of their marginal probability distributions.
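The contrast with the classical Fréchet inequality mentioned in the abstract can be made concrete with a minimal Python sketch (the marginal probabilities here are arbitrary illustration values): independence pins the conjunction probability to the product, while dropping all dependence assumptions only brackets it between the Fréchet bounds.

```python
def frechet_conjunction(p_a, p_b):
    """Frechet bounds on P(A and B) when nothing is assumed about dependence."""
    lower = max(0.0, p_a + p_b - 1.0)
    upper = min(p_a, p_b)
    return lower, upper

p_a, p_b = 0.8, 0.7                            # arbitrary example marginals
product = p_a * p_b                            # 0.56; valid only under independence
lower, upper = frechet_conjunction(p_a, p_b)   # roughly (0.5, 0.7)
```

Note that the independence product always lies inside the Fréchet interval, so relaxing the assumption can only widen, never contradict, the independent answer.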
We consider two kinds of independence assumptions commonly used in applications of Bayes' rule. The first is the independence assumption used in Bayesian inferences from sample data where the samples are assumed to be IID. This is the assumption of sample independence. The second is the assumption of independence among components of a probabilistic model that employs Bayes' rule. We call these assumptions of component independence. We consider both kinds of independence in each of the following settings:
Simplest application of Bayes' rule to an estimation problem (using non-distributional probabilities)
Distributional parameter estimation problem (using traditional zipper combination)
Positive predictive value after multiple but non-independent tests for a disease (non-distributional)
Gang's derivations
PPV distributional example (Mossman−Berger convolution)
PPV distributional example (Winkler−Smith convolution)
Sources yet to be cannibalized:
<<https://sites.google.com/site/winklerwrinkle/extending-bayes-rule>>
<<bubble-cap model of interval likelihood>>
<<p-box triviality>>
<<ARRA needs text, including Gang's derivations>>
<<Glymour's paper>>
Introduction:
In recent years, shortcomings in engineering design have dramatically and sometimes tragically revealed that independence assumptions are not always reasonable.
Indeed, independence assumptions can be dangerous when they are used to compute misleadingly low probabilities of adverse outcomes.
The disaster at Japan's Fukushima Daiichi Nuclear Power Station, for example, showed that designing multiple backup generator subsystems to pump the all-important cooling water did not in fact substantively improve the reliability of the system by creating functional redundancy, because the failure risks of these generators were not in fact independent. The placement of most of the backup generators and all of the switching stations in the same vulnerable location was sufficient to erase the intended redundancy advantage that would have otherwise prevented the disaster (<<citation used by Wikipedia that, if the switching stations had not been destroyed by the tsunami, the plant would not have experienced the meltdown>>).
A more deadly tragedy resulted from the same misplaced reliance on illusory redundancy in the New Orleans hospital flooded by Hurricane Katrina. All <<#>> of the hospital's backup electricity generators were in the building's basement and were knocked out immediately by the flood waters. Even years after the failures in New Orleans, there have been several, often deadly, repetitions of this kind of failure at other major hospitals, where the situation has been dubbed "Katrina-esque" (Ornstein 2012).
Consider the classical application of Bayes' rule to medical diagnosis.
Is it always reasonable to assume that the second test is actually independent of the first?
Of course the second test is made on the same patient, so if a test result is due to some peculiarity in, say, the blood chemistry of that patient rather than his disease status, then the tests will not be independent, as they both depend on the happenstance of testing one unusual patient.
But even if we neglect this possibility, as we perhaps must to use Bayes' rule in this context, there are a variety of other reasons that the test results may fail to be independent.
For instance, the two tests may be the same test, perhaps made by the same manufacturer.
Perhaps the test kits were manufactured at the same time, or shipped together, or stored in the same refrigerator, so they are the same age or have experienced similar temperature <<variations...what's the word they use for thermal insults?>>.
Perhaps the tests were administered by the same health care professional, or read by the same technician, or even in the same laboratory using fixed protocols.
Even if the tests are manufactured by different companies from different regions, administered by different professionals, and read by independent laboratories, the tests themselves may both be based on the same biological reaction, employing similar reagents.
In most cases, analysts are probably not worried about the possibility of dependence emerging among the results of multiple medical tests, except in suspicious cases.
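To see how this plays out numerically, here is a minimal Python sketch of the two-test diagnosis setting (the prevalence, sensitivity, and specificity are hypothetical illustration values, not drawn from any real test): multiplying the likelihoods of two positive results presumes the tests are independent given disease status, while Fréchet bounds on the joint likelihoods relax that assumption, and the resulting interval for the positive predictive value balloons.

```python
def ppv(prior, like_pos_diseased, like_pos_healthy):
    """Bayes' rule: P(diseased | evidence) from the prior and the two likelihoods."""
    num = like_pos_diseased * prior
    return num / (num + like_pos_healthy * (1.0 - prior))

def frechet(p, q):
    """Frechet bounds on the conjunction of two events with marginals p and q."""
    return max(0.0, p + q - 1.0), min(p, q)

prior = 0.1        # hypothetical disease prevalence
se, sp = 0.95, 0.90  # hypothetical sensitivity and specificity

# Two positive results, assuming the tests are independent given disease status:
ppv_indep = ppv(prior, se * se, (1 - sp) * (1 - sp))

# Relaxing independence: Frechet bounds on the two joint likelihoods.
lo_d, hi_d = frechet(se, se)            # P(both positive | diseased)
lo_h, hi_h = frechet(1 - sp, 1 - sp)    # P(both positive | healthy)

# PPV increases with the diseased likelihood and decreases with the healthy one,
# so the extreme combinations give the bounds:
ppv_lo = ppv(prior, lo_d, hi_h)
ppv_hi = ppv(prior, hi_d, lo_h)         # lo_h is 0 here, so this bound is 1
```

With these numbers the independence assumption yields a PPV of about 0.91, but the relaxed interval stretches from about 0.5 all the way up to 1, illustrating the kind of vacuity the abstract warns about.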
Questions left to answer:
1) Is there some reasonable way to model evidence jointly, rather than pushing each datum through its own likelihood function separately and then trying to combine them? I suppose the answer is no, but perhaps there's another way to think about this.
2) Can independence be relaxed in other logical inference problems such as cond() or imply() as distinguished by Wagner?
References:
Glymour, C. (1985). "Independence Assumptions and Bayesian Updating," Artificial Intelligence 25: 95-99.
Ornstein, C. (2012). "Why do hospital generators keep failing?", ProPublica, http://www.propublica.org/article/why-do-hospitals-generators-keep-failing
Woodpile:
A variously attributed quip perhaps relevant to the vacuity findings is "You should keep an open mind, but not so open that your brains fall out."