ISIPTA reviews

The first draft of the ISIPTA paper had several problems, many of which were identified by careful and conscientious reviewers.  The remedies for these problems yielded the current draft of the ISIPTA paper.  The "revision report" to the ISIPTA editors explaining the revisions (or lack of revision) in response to the reviewers' individual comments is archived below.  The reviewer comments appear in red.

We thank all three reviewers for their thoughtful opinions.  We have made efforts to adopt all of their recommendations, with details explained below interstitially among their comments.

----------------------- REVIEW 1 ---------------------

TITLE: Computing with Confidence

AUTHORS: Scott Ferson, Michael Balch, Kari Sentz and Jack Siegrist

OVERALL EVALUATION: 0 (borderline paper)

----------- REVIEW -----------

The authors need to clarify what is the contribution of their paper. My impression is that they just provide some means of computing through MC simulation several quantities in relation with c-boxes.

We’ve added text to the last paragraph of the introduction to clarify the contribution of the paper.  We intend its contribution to be to bring c-boxes and a new way to empirically characterize parameters and parameterized distributions to the attention of ISIPTA participants, specifically users of Walley’s imprecise beta model (IBM), p-boxes and other credal sets.

The authors are good at stirring interest in provoking discussions with various quotes on confidence intervals and credible intervals and in this their paper might be of interest in the conference. For a journal paper they would need much more content.

Okay.  We’ll look for more content if this conference paper eventually appears as a journal paper.

Most of the analysis they do in relation with a c-box and the binomial model, may be embedded in a standard Bayesian analysis with 'improper' beta(0,1) and beta (1,0) priors (and the distributions 'in between'). This should be pointed out. Note that at least one of the posteriors could be improper.

We’ve added language to our introduction of the IBM that mentions this fact and explains that, if k=0 or k=n, one of the posteriors will also be one of the limiting distributions. 
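
For concreteness, the equivalence is easy to exhibit numerically.  The sketch below (written in Python rather than the paper's R; the function names are ours, chosen for illustration) builds Monte Carlo deputies of the two bounding distributions of the binomial c-box, which are exactly the posteriors under the improper beta(0,1) and beta(1,0) priors, and reads a confidence interval off their quantiles.

```python
import random

def cbox_binomial(k, n, m=100_000, seed=1):
    """Monte Carlo samples from the two bounding distributions of the
    binomial c-box for k successes in n trials: beta(k, n-k+1) and
    beta(k+1, n-k), i.e. the posteriors under the improper beta(0,1)
    and beta(1,0) priors.  When k == 0 (or k == n) one edge collapses
    to a point mass at 0 (or 1), which is the limiting distribution."""
    rng = random.Random(seed)
    left = sorted(0.0 if k == 0 else rng.betavariate(k, n - k + 1)
                  for _ in range(m))
    right = sorted(1.0 if k == n else rng.betavariate(k + 1, n - k)
                   for _ in range(m))
    return left, right

def confidence_interval(k, n, level=0.95, m=100_000):
    """Two-sided confidence interval read off the c-box edges; it
    coincides with the Clopper-Pearson interval."""
    left, right = cbox_binomial(k, n, m)
    tail = (1 - level) / 2
    return left[int(tail * m)], right[int((1 - tail) * m) - 1]
```

For example, with k = 2 successes in n = 10 trials, the 95% interval so obtained is approximately (0.025, 0.556), the familiar Clopper-Pearson answer.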

Some things that should be mentioned are pivotal quantities (in relation with confidence distributions) and Bonferroni's inequality (in relation with multiple confidence intervals).

It is not clear what the reviewer particularly wants us to say about either pivotal quantities or the Bonferroni inequalities.  The former are mentioned in section 3 as one of multiple methods by which c-boxes can be derived.  In section 4 on computing with confidence, we mention that the c-box approach obviates some of the uses of Bonferroni and Šidák corrections.  We expect that the c-box approach yields better (tighter) results than would be possible with either of the traditional approaches, but a comparison to justify such a claim is beyond the scope of the present paper.

Some pieces of the text may be removed (like the excerpts from Bayes paper or the well-known info from the binomial distribution).

The paper is well under the page limit, and the text suggested for removal is quite short in any case, so we have retained both passages.  The Bayes quotation provides an interesting historical connection and motivation, and we expect the synopsis about the binomial distribution might be helpful to some readers, especially in understanding why there is no confidence distribution for its parameter.

----------------------- REVIEW 2 ---------------------

TITLE: Computing with Confidence

AUTHORS: Scott Ferson, Michael Balch, Kari Sentz and Jack Siegrist

OVERALL EVALUATION: -2 (reject)

----------- REVIEW -----------

The paper considers the foundations of statistical inference and the use of imprecise probabilities in this context. This is certainly a very interesting topic for the ISIPTA community.

We are encouraged that the reviewer thinks so.

Unfortunately, the paper is rather confused and superficial. In particular, its goal is not very clear: if the goal is to illustrate c-boxes, then how is it possible that these are not clearly introduced?

This was indeed the goal.  We agree with the reviewer that the section introducing c-boxes as generalizations of confidence distributions, especially its first paragraph, was poorly constructed.  We have revised it to make the ideas clearer and simpler.

In fact, the whole paper consists of a text with very few formulas and almost no clear definitions. Section 3, which should be central if the goal were to introduce c-boxes, is particularly unclear: for instance, what do "nominally continuous but interval-uncertain data", "mensurational uncertainty", "significance-value function" mean?

We disagree with the suggestion, perhaps not intended by the reviewer, that a paper at ISIPTA should have lots of formulas.  We intend this paper to bypass the mathematical derivation of c-boxes as this is complex, already published elsewhere, and not necessary for the purposes of the paper.  We have, however, tried to improve the clarity of definitions wherever possible.

We apologize to the reviewer for the confusion, coming fast and furious in the several phrases (s)he points out.  We were trying to be terse.  We have expanded the text to better explain the phrases the reviewer found confusing.  The phrase “significance-value function” refers to the function (of parameters and data) that produces significance values, i.e., p-values in a significance test.  The phrase “nominally continuous but interval-uncertain data” refers to empirical data that come in the form of intervals about measurements of a continuously varying variable.  We use the phrase “mensurational uncertainty” to refer to non-probabilistic uncertainty associated with measurements, that is, the plus-minus part of an individual measurement implied by the precision of gradations on a ruler or scale, the finite decimal places of a digital readout, or other observability limitations of the measurement protocol.  We find that the phrase “measurement uncertainty” too quickly evokes a different meaning in the reader’s mind, and that preconceived meaning is usually not the kind of uncertainty we intend.

As a consequence, it is impossible to separate (unjustified) opinions from facts. For example, the following strong opinions do not seem to be justified:

.confidence intervals cannot "express inferential uncertainty that may arise from small sample size, from mensurational uncertainty due to non-negligible interval-censoring, and from demographic uncertainty arising under discrete sampling models"

What we were trying to say is that c-boxes, because they have the form of p-boxes, generalize distributional, interval and point estimators.  We’ve rewritten this passage to make the claim clearer and remove the unintended implication.

.propagating c-boxes is "much more efficient than propagating individual confidence intervals" (how is it possible? c-boxes describe particular kinds of confidence intervals)

There has heretofore been no practical way to project confidence intervals of parameters to get confidence intervals for mathematical functions of those parameters, although some clunky strategies involving Bonferroni or Šidák corrections have been possible.  C-boxes describe an infinite family of confidence intervals, and the calculations deliver all of these confidence intervals at once.  We retain the assertion “much more efficient…” but mention that the calculations do not require Bonferroni or Šidák corrections.
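
For readers unfamiliar with the traditional corrections alluded to here, the following sketch (in Python; the function names are ours, for illustration) shows how steeply the per-interval confidence level must be inflated to guarantee a joint level the old-fashioned way:

```python
def bonferroni_level(joint, k):
    """Per-interval confidence level so that k simultaneous intervals
    jointly cover with probability at least `joint`, with no
    assumption about the dependence among the intervals."""
    return 1 - (1 - joint) / k

def sidak_level(joint, k):
    """Same target, but assuming the k intervals are independent,
    so the joint coverage is the product of the marginal coverages."""
    return joint ** (1 / k)
```

For seven intervals and a joint level of 95%, each individual interval must be computed at roughly the 99.3% level under either correction; the c-box calculations sidestep this inflation.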

."confidence distributions are in fact widely used throughout modern statistics" (they are at least not often explicitly used; and by the way, calling Student's t distributions "confidence distributions" is very audacious...)

It is true that they are not often used under the name confidence distribution, and we have indicated this fact.  We have rewritten the passage to make clear that it is Efron’s assertion that bootstrap distributions are confidence distributions that justifies the claim that confidence distributions are widely used in statistics, albeit under a different name.

We no longer say that Student’s t-distributions are confidence distributions, but we do mention their “intimate connection” with t-distributions.  We make this change even though we feel we are trading audaciousness for pedantry;  the distribution of the random variable Tn-1 (from the t-distribution with n-1 degrees of freedom) is not a confidence distribution, but the scaled and shifted distribution of xbar + s Tn-1/√n is of course a confidence distribution.
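
The scaled and shifted construction is easy to check by simulation.  The sketch below (Python, standard library only; all names are ours, for illustration) draws from the distribution of xbar + s T/√n, with T a Student t variate built from a standard normal and a chi-square draw, and confirms that its central 95% range reproduces the textbook t-based confidence interval:

```python
import math
import random

def normal_mean_confdist(xbar, s, n, m=200_000, seed=1):
    """Monte Carlo deputy of the confidence distribution for a normal
    mean: samples of xbar + s*T/sqrt(n) with T ~ Student t(n-1).
    A t variate is z / sqrt(chi2/df), where chi2 ~ gamma(df/2, 2)."""
    rng = random.Random(seed)
    df = n - 1
    out = []
    for _ in range(m):
        z = rng.gauss(0.0, 1.0)
        chi2 = rng.gammavariate(df / 2.0, 2.0)
        out.append(xbar + s * (z / math.sqrt(chi2 / df)) / math.sqrt(n))
    return sorted(out)

samples = normal_mean_confdist(xbar=10.0, s=2.0, n=16)
lo = samples[int(0.025 * len(samples))]
hi = samples[int(0.975 * len(samples))]
```

With these inputs the 2.5th and 97.5th percentiles land near xbar ± t(0.975, 15) s/√n = 10 ± 2.131 × 2/4, i.e. about (8.93, 11.07).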

."there is no confidence distribution for the binomial probability" (few lines above you cite Efron, 1988, who consider a confidence distribution for the binomial probability)

A page citation would be helpful if the reviewer is referring to a particular claim by Efron.  There are many related schemes that may come close to realizing this ideal in some circumstances, especially when data are numerous, but modifiers such as “approximate” or “asymptotic” are essential in characterizing these schemes.  There is no single, precise confidence distribution for the binomial parameter.  One cannot get the necessary coverage with any single distribution.  The problem arises fundamentally from trying to estimate a continuous parameter from discrete observations.  We use the phrase “demographic uncertainty” in reference to this effect.

Furthermore, the interpretation of confidence statements is superficial: relative frequency and (theoretical) probability are often confused, the "correctness" of the statistical model is never addressed, and in Section 5 confidence distributions are simply interpreted as a kind of posterior probability.

Doubt about the correctness of the probability model is certainly of interest, although a broadside attack on this problem is clearly beyond the scope of this paper.  One thing that is emphasized in the paper is that the c-box approach relaxes the assumption about beta distributions that is embodied in Walley’s imprecise beta model.  The correct model may fail to be the conjugate beta, and yet the c-box approach will still work, which we take as an advantage of c-boxes over the IBM, and a first step in the direction of accounting for doubt about the correctness of the probability model.  Further steps such as relaxing assumptions about the independence of samples or the constancy of the binomial parameter, or the normal shape of the generating distribution, are all of interest to us and are part of our future work, but they seem well beyond what should be the focus of this paper.  Perhaps they are content for enriching this conference paper into a journal paper.

We are trying to suggest that confidence distributions and c-boxes are frequentist analogs of the Bayesian posterior distribution, though we would not say they are “simply interpreted as” Bayesian posteriors because, of course, the interpretation is entirely different, because confidence entails coverage which is not necessarily part of the Bayesian paradigm.

We are not sure where in the text the reviewer feels we have confused relative frequency and theoretical probability, or whether that might be problematic in any way.

The use of R code instead of formulas is also questionable: many readers do not know R (and those who do should be able to calculate a confidence interval for the mean of a normal distribution...), and R formulas are certainly not clearer than usual mathematical ones.

We’re not sure that the class of readers who will find the R code snippets useful is larger than the class of readers who would much prefer mathematical expositions, but the barriers to entry are certainly smaller for the former class than for the latter.  As anyone who programs will understand, presenting the ideas as mathematical equations would also harbor ambiguities that the R code does not have.  The example R code snippets, including ancillary routines like the alphabeta function, can be taken directly from the paper and run without modification or need for implementation.  We believe this will indeed be convenient, even for readers who “should be able to calculate a confidence interval for the mean of a normal distribution”.  The R code functions can be used not only to construct c-boxes but also to demonstrate their coverage properties—and convey a sense for the usefulness of c-boxes—even to readers who find the mathematical expositions hard to follow (as many have complained about Balch’s original paper).  Finally, the total space devoted to the R code in the paper is rather short, amounting to a little more than half of one column of one page, so complaints about it are a bit hard to fathom.  We elect to retain the R code.

Moreover, the "didactical" Monte Carlo simulations are nonsense: do we really need to check statements from the theory of probability? (using simulations based on the same theory, by the way)

As we did not use the word “didactic” in our description of the R code snippets, we presume that the reviewer’s use of quotes around the word indicates sarcasm.  We reject the assertion that the simulations are nonsense.  First, they provide a simple argument that c-boxes do what is claimed about them, and this argument is compelling even to people who do not follow the mathematical arguments of Balch (2012).  Secondly, and perhaps more importantly to the reviewer, the simulations also allow users to explore numerically how conservative the resulting confidence intervals obtained from the c-box approach will be in practice.  This conservatism is difficult to study analytically.  Simulations that can reveal it are the most effective way to determine the usefulness of the approach.  We have emphasized this purpose of the simulations in the introduction.
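
To illustrate the kind of simulation we have in mind, here is a minimal coverage check (written in Python for self-containedness rather than the paper's R; all function names are ours).  It computes the interval read off the binomial c-box, which coincides with the Clopper-Pearson interval, by bisection on binomial tail probabilities, and then estimates the empirical coverage, which comes out at or above the nominal 95%; the excess over 95% is exactly the conservatism mentioned above:

```python
import math
import random

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k + 1))

def cbox_interval(k, n, level=0.95):
    """Confidence interval read off the binomial c-box (it coincides
    with the Clopper-Pearson interval), found by bisection on the
    binomial tail probabilities."""
    tail = (1 - level) / 2

    def solve(pred):
        # pred(p) is true for small p, false for large p; find the flip.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if pred(mid):
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    lower = 0.0 if k == 0 else solve(lambda p: 1 - binom_cdf(k - 1, n, p) < tail)
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p) > tail)
    return lower, upper

def coverage(p_true, n, trials=2000, seed=1):
    """Fraction of simulated data sets whose c-box interval contains
    the true parameter; at least the nominal 95% is expected."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        k = sum(rng.random() < p_true for _ in range(n))
        lo, hi = cbox_interval(k, n)
        hits += lo <= p_true <= hi
    return hits / trials
```

Running coverage(0.3, n=10), for instance, yields an empirical coverage comfortably above 0.95, demonstrating both the guarantee and the conservatism at this small sample size.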

A critical point with confidence statements is the choice of "at least alpha" or "exactly alpha" for the coverage probabilities: you choose "at least alpha", but then the "obvious" implication at the end of the second paragraph of the first column of page 2 is wrong.

We’ve added the word “exact” to the passage defining confidence distributions.  This is the very limitation of confidence distributions—the one that precludes there being one for the binomial parameter—that motivates c-boxes.

You briefly comment on the non-uniqueness issue of confidence statements, but sentences like "the c-box of the normal mean", or "Users of the c-box approach do not need to choose such a value" give the (wrong) impression of uniqueness.

We understand the reviewer’s comment to be that using the definite article “the” before the word “c-box” misleadingly implies uniqueness.  We have changed the first instance of “the” to “this”, but have retained the second instance of “the” because it modifies the phrase “c-box approach”.  There is only one c-box approach described in the paper.

Minor Comments:

.page 1, column 2, lines 4-6: so-called objective Bayesian methods often "ensure statistical performance over the long run" (e.g. in Section 4 your "result is the same as the 95% credible interval that would be obtained using Bayesian inference with a Jeffreys prior")

Yes, but so what?  Objective Bayesians, so far as we know, do not traffic in confidence.  Their posteriors are not defined or required to have any particular coverage properties, although they may have some incidentally.  We suspect there are important and deep connections among the various approaches.  Indeed, Efron and others have suggested that confidence distributions are objective Bayes posteriors (see, for example, http://www.fisheriesstockassessment.com/TikiWiki/tiki-index.php?page=Confidence%20Distributions%20as%20Objective%20Bayes%20Posteriors).  We are not qualified to judge whether this is a reasonable identification, nor are we inclined to do so given the scope of this paper.  In any case, it makes sense to us to keep the lines between Bayes and frequentist methods as clean as possible for pedagogical reasons.  We think the connections and opportunities for cross-fertilization will be apparent to readers from each side.

.page 1, column 2, end of paragraph 1: why is it "not clear how knowledge of confidence intervals for parameters can be translated into a confidence interval for an arbitrary function of those parameters"? (e.g. in Section 4 you do such computations)

Because c-boxes are the first approach that can do so!  We have added the phrase “using traditional methods”.

.page 2, column 1, paragraph 1: "in principle can be checked empirically": how is it possible if parameters are not known?

They can be checked empirically by simulations, such as those described in this paper that you said were “nonsense”, in which hypothetical values for the parameters are used as the “true values” that parameterize distributions from which artificial random samples are drawn.  Such simulations are common in frequentist statistics.  We’ve appended the phrase “with Monte Carlo simulations”.

.page 2, column 1, paragraph 2: "space of possible parameter values" might be confusing, since it is a subset of the real line

Why might this be confusing?  How should it be fixed?  And who said it was a subset of the real line?  The text is unchanged.

.page 3, column 2, paragraph 3: what are the assumptions about "interval-censored data"? (e.g. informative/uninformative)

There are no assumptions other than that an imperfectly observable value X is censored such that we know only the interval [L, R] and that L ≤ X ≤ R.  This is the standard definition of interval censoring.  No text was changed.

.page 3, column 2, 3rd last line: what do you mean by "To be conservative"? ("conservative" has a clear meaning in connection with confidence intervals, but with this meaning it is not clear why one would want to be conservative)

The word “conservative” seems to have a strict and pejorative sense in this community that we did not intend.  We have omitted the phrase in which it appears.

.page 4, column 1, paragraph 1: what should "Statistical confidence intervals are not rigorous intervals" mean?

A rigorous interval (estimate) is one that is guaranteed to enclose the value it estimates; statistical intervals like confidence intervals are not so guaranteed.  No text was changed, as a common-English reading of the wording conveys this idea, and it would be understood by anyone familiar with interval analysis.

.page 4, column 1, paragraph 2: "the level could fall lower than 65%": how is it possible?

The Fréchet inequalities (http://en.wikipedia.org/wiki/Fr%C3%A9chet_inequalities)

max(0, P(A1) + ... + P(An) − (n − 1)) ≤ P(A1 & ... & An) ≤ min(P(A1),..., P(An))

give best-possible bounds on the probability of a conjunction.  If P(Ai) = 0.95 for each of seven events, the lower bound is 7 × 0.95 − (7 − 1) = 0.65.  Any higher presumption about this lower bound requires particular assumptions about the dependence among the different confidence intervals.  We altered the phrase "lower than" to "as low as".
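
A two-line computation makes the arithmetic explicit (in Python; the function name is ours, for illustration):

```python
def frechet_conjunction(probs):
    """Best-possible bounds on P(A1 & ... & An) from the marginal
    probabilities alone, with no assumption about dependence."""
    n = len(probs)
    lower = max(0.0, sum(probs) - (n - 1))
    upper = min(probs)
    return lower, upper

lo, hi = frechet_conjunction([0.95] * 7)
```

For seven events each with probability 0.95, this gives the lower bound 0.65 and the upper bound 0.95.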

.page 6, column 2, paragraph 2: "the only sensible inference" is a strong statement (most Bayesians would disagree)

We’ve added the adverb “arguably”.

.page 6, column 2, paragraph 2: from where comes the idea that the IBM posterior "is roughly what one might expect to see across a community of competent Bayesians"?

Well, we’re talking about the case where there is no particular prior information.  This case has been studied by many authors, and many solutions have been offered, including solutions associated with the names Haldane, Jeffreys, Zellner, Bayes-Laplace.  All of these solutions, which differ appreciably, are contained within the IBM solution.  A discussion of this diversity is out of the scope of the present paper, so no text was changed, but we have added Walley (1991) as a reference for the idea that IBM encloses disparate Bayesian solutions.
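
The enclosure can be illustrated numerically: in the sense of stochastic ordering, the Haldane, Jeffreys and Bayes-Laplace posteriors all lie between the two bounding beta distributions of the binomial c-box.  The sketch below (Python standard library; an illustration of ours, not part of the paper) compares Monte Carlo quantiles for k = 3 successes in n = 10 trials:

```python
import random

def beta_quantiles(a, b, qs, m=100_000, seed=1):
    """Monte Carlo quantiles of a beta(a, b) distribution."""
    rng = random.Random(seed)
    xs = sorted(rng.betavariate(a, b) for _ in range(m))
    return [xs[int(q * m)] for q in qs]

k, n = 3, 10                 # 3 successes in 10 trials
qs = [0.25, 0.5, 0.75]       # quartiles suffice for the comparison

# Bounding distributions of the binomial c-box:
left = beta_quantiles(k, n - k + 1, qs)
right = beta_quantiles(k + 1, n - k, qs)

# Posteriors under three disparate "non-informative" priors:
haldane = beta_quantiles(k, n - k, qs)               # beta(0, 0) prior
jeffreys = beta_quantiles(k + 0.5, n - k + 0.5, qs)  # beta(1/2, 1/2) prior
laplace = beta_quantiles(k + 1, n - k + 1, qs)       # beta(1, 1) prior
```

At every quantile, each of the three posteriors falls strictly between the left and right edges, which is the enclosure we have in mind.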


----------------------- REVIEW 3 ---------------------

TITLE: Computing with Confidence

AUTHORS: Scott Ferson, Michael Balch, Kari Sentz and Jack Siegrist

OVERALL EVALUATION: 3 (strong accept)

----------- REVIEW -----------

This is a great paper, on a very interesting subject, opening the door to a new well-founded frequentist approach to imprecise probability.


We thank the reviewer for this heartening comment.

My main gripe with the paper is that it is at times:

1. a bit sloppy and careless with notation (abuses standard statistical notation),

We have tried to be more conventional in our use of notation, although this is sometimes difficult simply because this notation was not designed for imprecise probability structures.

2. the usual distinctions one makes between random and non-random quantities in statistical analysis is not always clear, and

Part of the problem may well be that we are using distributions to characterize the uncertainty of fixed quantities, so such lack of clarity may be an occupational hazard.  We have adopted the convention of capitalizing random quantities and using lower case for fixed quantities.

3. there is insufficient clarity about what is being conditioned on, which is particularly important when comparing frequentist and Bayesian methods in the same paper; even in a frequentist setting, it makes sense to clarify the null hypothesis through conditional probability notation.

We do not understand what the reviewer is suggesting that we do.  See further responses to detailed comments below.

That said, these issues are easy to overcome in revision, and moreover the paper will certainly generate very good discussion at ISIPTA, so therefore I strongly recommend this paper for acceptance.

Details:

section 1 last sentence: "c-boxes characterizing variables" - by variables, do you mean parameters? If so, perhaps write "c-boxes on parameters"

Yes, we mean parameters and have amended the text.

section 2: A rather important element in comparing confidence intervals to credible intervals is missing: confidence intervals are random intervals conditional on some null hypothesis, whereas credible intervals condition directly on the data; an important consequence of this is that confidence intervals are prone to the so-called prosecutors fallacy (i.e. if there's a strong prior bias towards or against the null hypothesis, confidence intervals will lead you astray when you try to use them for decision analysis).

We find this comment confusing.  We have no null hypothesis that we are aware of.  Confidence intervals are closely related to, but different from, statistical significance tests.  This paper is only about the estimation problem, not the hypothesis testing problem.  The confidence intervals and the c-boxes are functions of the data, and they are conditioned on certain assumptions such as iid sampling, known distribution shapes, and stopping rules, but these details are typically described in the surrounding text, not in the notation, in a frequentist analysis.  No text was changed.

page 3 first equation: What exactly does this equation mean? Clearly it is not the usual tilde known from statistics, so please use a different notation, and even more importantly, explain it with great care e.g. what is t? why do you multiply with n-1? (it looks like that, even that's probably not the intent...)

We have changed the text to carefully explain the meaning of the equation.

section 3 first paragraph: It is stated that a c-box is a random set, but I do not quite immediately see how that can be. I can see how a confidence interval at a fixed level of confidence is a random set (i.e. it is a set, and it is randomly generated as it is a function of the data), but that's quite different from a c-box (where, as far as I understand, the data is considered fixed, but the confidence level is varied; there is however no randomness in the confidence level).

We have rewritten this paragraph and have omitted the reference to random sets entirely which is likely to be confusing to all kinds of readers.

page 5, first (full) paragraph: a crucial detail here is that, in the treatment of the paper, p1 and p2 are sought to be functions of the data and probabilities are conditional on some hypothetical (but unknown) value of p. In contrast, Bayes explicitly conditions on the data, and asks about the probability of p as a latent variable. Please clarify that you are asking a very different question from what Bayes was asking: you are asking about coverage for a fixed value of p, whereas Bayes is asking about the probability of p as a latent random variable.

We have adopted the reviewer’s exposition of this difference almost verbatim, although we have placed the added language further down in the text.

Just below that paragraph: again many readers will be at best very confused, and at worst completely misunderstand, the way in which you use tilde. Really, please don't use existing notation to mean something completely different. Give it a star, or a subscript c (for confidence), anything but a plain tilde. Also, the way the tilde is used here, with an interval on the right hand side, is different from how it is used earlier (no interval... but I guess this was an exact confidence distribution so you did not need one earlier?).

We apologize for straining sensibilities, but we cannot justify using unnecessarily complex notation when a simpler choice aptly generalizes from prior usage in a natural way.  We’ve taken pains to explain the meaning of the tilde, and we think any initial confusion will be minor and short-lived, and will be outweighed by the benefit of using unified notation.  We really do think the tilde is the right symbol here.  We’ve added text to make this clear.

The “interval” on the right-hand side mentioned by the reviewer is actually a c-box consisting of a left and right edge.  We’ve added text to make this clear.

In any case, it would be good to have one formal precise mathematical definition of whatever notation you need for confidence functions, and more generally, c-boxes, and then to stick to it.

We have endeavored to be consistent throughout the paper.

Perhaps it would also be useful to adopt standard practice and denote random variables with capital letters, and fixed values with lower case letters (e.g. \bar{X} for random variable vs. \bar{x} for a specific value, e.g. after conditioning on specific data); and to use conditional probability notation to clarify what is being conditioned on: it makes things a lot easier to understand.

We have adopted the recommendation and denote random variables with capital letters and fixed values with lowercase letters.

We do not entirely understand what changes in notation the reviewer is suggesting with respect to conditional probabilities.  Frequentist calculations of confidence intervals are not ordinarily expressed as explicitly conditional probabilities.  Using such notation might inadvertently suggest a Bayesian interpretation, which would of course be profoundly misleading.  No text or equations have been changed on this issue.

Section 5.1: Walley introduced the imprecise beta model already in his 1991 book, it seems reasonable to add it as a reference.

The reference has been added.