See also https://sites.google.com/site/mixinggooddatawithbad/say/dependentp-valuesinmeta-analysis
See also https://sites.google.com/site/mixinggooddatawithbad/say/meta-analysiswithp-values
See also https://sites.google.com/site/computingwithconfidence/say-1/multiplecomparisonsissue
The Wikipedia article on unsolved problems in statistics includes this: "Meta-analysis: Though independent p-values can be combined using Fisher's method, techniques are still being developed to handle the case of dependent p-values."
We are aware of Jost's claim in his on-line manuscript that Fisher's method of combining p-values is itself underdeveloped. Perhaps it would be possible to generalize, or rather relax, Jost's method to handle the Fréchet case in which no assumption at all is made about the dependence among the p-values. The Fréchet inequalities, which for a conjunction bound the joint probability by max(0, a+b−1) ≤ P(A & B) ≤ min(a, b), are described here. We would expect such a Fréchet-style generalization to yield an interval, rather than a scalar, for the resultant combined p-value when nothing is assumed about how the initial p-values might be interdependent.
By the way, Jost's manuscript does not say it clearly, but I think he assumes that the tests to be combined are based on independent sets of data, meaning, at a minimum, that the sets of objects in the data do not overlap across the statistical tests. The manuscript confuses the issue by suggesting that the Bonferroni correction is related to the Jost-Fisher combination of p-values. I don't think it really is. Bonferroni and similar corrections for multiple comparisons apply in situations where, for instance, we compare teaching techniques for writing using two groups of students, one taught with the old technique and one with the new, and we consider differences in grammar, spelling, organization, etc. The more attributes of the students' skills we compare, the more likely we are to find a significant difference by chance alone, so we must control the experiment-wise error rate using one of the correction methods. But it would be wrong to apply the Jost-Fisher combination to the various p-values from the tests of grammar, spelling, etc., because those tests all rest on the same students. The sketch just below illustrates the difference between the two operations.
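As a concrete (made-up) illustration: Bonferroni inflates each p-value so that the several tests can each still be judged on their own with the family-wise error rate controlled, whereas a Fisher-style combination (computed here via its standard chi-square form) collapses independent tests into one overall p-value. They answer different questions.
p <- c(0.04, 0.03, 0.20) # hypothetical p-values from three tests
# multiple-comparison correction: each test is still judged separately
p.adjust(p, method='bonferroni') # 0.12 0.09 0.60
# combination of independent tests: a single overall p-value (Fisher's method)
pchisq(-2*sum(log(p)), df=2*length(p), lower.tail=FALSE) # about 0.0106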
Below is our generalized implementation of Jost's algorithm in R, along with some numerical examples exercising it.
# Jost, Lou. "Combining Significance Levels from Multiple Experiments or Analyses"
# http://www.loujost.com/Statistics%20and%20Physics/Significance%20Levels/CombiningPValues.htm
# For two p-values, this function is pq(1-log(pq))
jost <- function(p=NULL, n=length(p), k=prod(p)) { # accepts an array of p-values, or their count and product
  i = 0:(n-1)
  k * sum((-log(k))^i / factorial(i))
}
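Incidentally, Jost's series is exactly the closed form of Fisher's chi-square formulation: the combined p-value equals the upper tail of a chi-square distribution with 2n degrees of freedom evaluated at -2*log(k). A quick check:
p <- c(0.073, 0.086, 0.10, 0.080)
jost(p) # 0.01112776
pchisq(-2*sum(log(p)), df=2*length(p), lower.tail=FALSE) # 0.01112776 again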
# numerical examples
jost(c(0.073, 0.086, 0.10, 0.080)) # 0.01112776, in contradiction with Box 18.1 of Sokal and Rohlf even though they are "quite firm"
jost(rep(0.051,2)) # 0.01808179
jost(rep(0.051,5)) # 0.0009378262
jost(c(0.1, 0.56, 0.3)) # 0.2257183
jost(c(0.1, 0.056, 0.03)) # 0.00797379
jost(runif(10)) # 0.1047377
jost(runif(10)) # 0.5367387
jost(runif(10)) # 0.1928351
jost(runif(35)) # 0.6367861
jost(runif(35)) # 0.0498436
jost(runif(35)) # 0.7071325
jost(runif(1000)) # NaN There were 50 or more warnings (use warnings() to see the first 50)
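That NaN is a numerical artifact rather than a flaw in the formula: prod(p) underflows to zero for a thousand p-values, and factorial(i) overflows to Inf beyond i = 170, so the summands degenerate to Inf/Inf. The chi-square form noted above works on the log scale and sidesteps both problems; a sketch:
# stable variant: sum the logs rather than multiplying the p-values
jost.stable <- function(p) pchisq(-2*sum(log(p)), df=2*length(p), lower.tail=FALSE)
jost.stable(runif(1000)) # a proper p-value, with no warnings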
The input p-values may not always be precise scalar values. They could sometimes be intervals. Conveniently, the Jost formula intervalizes easily because it is monotone (increasing) in both of its parameters n and k:
jnk <- function(n,k) { # same core formula as jost(), parameterized by count n and product k
  i = 0:(n-1)
  k * sum((-log(k))^i / factorial(i))
}
# evaluate the combination on a grid of counts n and products k
nn = 1:10
ii = 1:100
kk = rev(0.3/ii) # products k from 0.003 up to 0.3, in increasing order
J = matrix(nrow=length(nn), ncol=length(ii))
for (i in ii) for (n in nn) J[n,i] <- jnk(n, kk[i])
persp(nn, log(kk), J, phi=30, theta=-50, ticktype='detailed')
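The surface rises in both directions, and a direct check on the J matrix computed above confirms the monotonicity:
all(apply(J, 2, diff) > 0) # TRUE: increasing in n for fixed k
all(apply(J, 1, diff) > 0) # TRUE: increasing in k for fixed n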
Here is the intervalized function for combining interval (or scalar) p-values encoded into an interval k:
jost <- function(p=NULL, n=length(p), k=prod(p)) { # accepts an array of interval p-values, or their count and their (interval) product
  i = 0:(n-1)
  # the formula is monotone increasing in k, so evaluating at the endpoints suffices
  if (is.interval(k)) return(interval(jost(n=n, k=left(k)), jost(n=n, k=right(k))))
  k * sum((-log(k))^i / factorial(i))
}
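This version relies on interval(), is.interval(), left() and right() from the interval library used elsewhere on these pages. For readers without that library, a minimal stand-in (hypothetical, not the site's actual implementation) could be:
interval <- function(lo, hi) structure(list(lo=lo, hi=hi), class='interval') # hypothetical stand-in
is.interval <- function(x) inherits(x, 'interval')
left <- function(x) x$lo
right <- function(x) x$hi
print.interval <- function(x, ...) cat(sprintf('Interval: [%g, %g]\n', x$lo, x$hi))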
jost(n=2, k=interval(0.04, 0.11)) # Interval: [0.168755, 0.3528002]
It is interesting that the Jost combination of p-values can yield results that are outside of the Fréchet bounds:
jost <- function(p=NULL, n=length(p), k=prod(p)) {
  i = 0:(n-1)
  k * sum((-log(k))^i / factorial(i))
}
and.frechet = function(a,b) return(c(max(0, a+b-1), min(a, b))) # Frechet bounds on P(A & B)
doit = function(p1,p2) {
  j = jost(c(p1,p2))
  f = and.frechet(p1,p2) # f[1] and f[2] are the lower and upper Frechet bounds
  if ((f[1] < j) & (j < f[2])) cat('Okay\n') else cat('Eek!\n')
  cat('Jost: ',j,'\n')
  cat('Frechet: ',f,'\n')
}
p1 = p2 = 0.08
doit(p1,p2)
Okay
Jost: 0.03872933
Frechet: 0 0.08
p1 = 0.08
p2 = 0.28
doit(p1,p2)
Eek!
Jost: 0.1074908
Frechet: 0 0.08
p1 = 0.08
p2 = 0.8
doit(p1,p2)
Eek!
Jost: 0.2399278
Frechet: 0 0.08
p1 = p2 = 0.8
doit(p1,p2)
Eek!
Jost: 0.9256237
Frechet: 0.6 0.8
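To gauge how common such excursions are, we can sweep a grid of p-value pairs and count how often the Jost combination escapes the Fréchet conjunction bounds, reusing the functions above:
ps <- seq(0.05, 0.95, by=0.05)
outside <- outer(ps, ps, Vectorize(function(a, b) {
  j <- jost(c(a, b))
  f <- and.frechet(a, b)
  j < f[1] || j > f[2]
}))
mean(outside) # fraction of the grid on which doit() would print 'Eek!'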