COLLABORATION
Please review and make contributions to this paper. For now, we can edit the paper here, as a Google site. To make a change or add text, just click on the pen icon above (on the right of the screen). When you're done, be sure to save your edits by clicking the blue Save button. As the paper matures we may elect to migrate to a Google Doc, but there will always be access to it from this website. To see previous versions and compare revisions, select Revision History from the More menu (on the right above). Details that must eventually be addressed are indicated in double angle brackets <<like this>>. You can use color to highlight passages or questions for our joint review. More enduring or 'meta' commentary can be appended to this page as a comment at the bottom of the page.
Factoring out bias and overconfidence: advanced bias correction
Jack Siegrist1, Scott Ferson1, Adam Finkel2 (order tentative)
1Applied Biomathematics, 100 North Country Road, Setauket, New York 11733 USA
2University of Pennsylvania Law School, 3501 Sansom Street, Philadelphia, Pennsylvania 19104 USA
Numerical estimates produced by experts and lay people alike are commonly biased as a result of self-interest on the part of the persons making the estimates. There is also empirical evidence that expressions of uncertainty are much smaller than justified. Simple scaling, shifting or inflating corrections are widely used to account for such biases and overconfidence, but better distributional information is usually available, and fully using this information can yield corrected estimates that properly express uncertainty. Corrections can be made in two distinct ways. First, predictions can be convolved with an empirical distribution or p-box of observed errors (from data quality or validation studies) to add uncertainty about predictions associated with model error. Second, predictions can be deconvolved to remove some of the uncertainty about predictions associated with the measurement protocol. In both of these cases, the structure of errors can be characterized as a distribution or p-box with arbitrary complexity. We illustrate the requisite calculations to make these corrections with numerical examples. We conclude (1) the notion of 'bias' should be understood more generally in risk analysis to reflect both location and uncertainty width, (2) self-interest bias and understatement of uncertainty are common, large in magnitude, and should not be neglected, (3) convolution can be used to inflate uncertainty to counteract human psychology, and (4) deconvolution can be used to remove some of the uncertainty associated with measurement errors.
For the abstract, I think these are the main issues: <<Scott's suggestions from Jack's document>>
1) Bias needs to be conceived more generally in risk analysis
2) Empirical approaches to recognizing uncertainty about point estimates
Self-interest bias
More general validation studies
3) Empirical approach to inflating uncertainty to counteract human psychology
4) Removing inflated uncertainty from non-negligible measurement errors
Keywords
bias; self interest; jacketing; deconvolution; measurement error; validation study; difference between forecast and realization
Bias is traditionally conceived as a systematic error in an estimate that can be characterized as a simple, scalar signed value that can be removed from the estimate by subtraction.
There are several problems:
misestimation
self-interested bias
overconfidence
unacknowledged imprecision
nakedness (either extreme overconfidence, or lack of appreciation of uncertainty)
Bias can be distributional, or p-boxy
We shall use the phrase "naked estimate" to refer to an estimate of some quantity that has been offered as a scalar value without a quantitative characterization of the uncertainty. <<If the clothing metaphor is too much, we could use Adam's phrase "arrogant estimate".>>
There are several strategies that can be used to characterize the unstated uncertainty about a naked estimate, including
1) significant digit conventions,
2) interpretation of modifying hedge words, and
3) validation study of previous estimates.
The third way involves assuming that the uncertainty that should be ascribed to the current naked estimate can be estimated by a validation study in which an observed distribution of errors (differences) is computed from comparisons of historical forecasts of quantities against their eventually realized values.
Ways to account for and remove from estimate distributions the uncertainty due to a confounding measurement process
<<See Adam's responses to Scott's notes from the August 2012 meeting at Adam say.>>
Cost-benefit analysis using only point estimates is worse than useless. The estimates that a cost-benefit analysis depends on have been documented to be biased most of the time, in a number of different fields. This consistent bias entails that cost-benefit analyses based on point estimates are guaranteed to result in average net losses over time, having exactly the opposite effect of their intended purpose!
Some of this bias in estimates is just a problem of negligence concerning model assumptions, which would be relatively simple to fix with more correct assumptions, or more honest contractors. For example, ignoring the fact that costs of materials increase significantly over the time that is required to complete a large public sector project has been found to explain 20 - 25% of bias in estimated costs of these projects (Morris 1990). Most of the bias, however, tends to be related to much more complicated psychological and sociological phenomena, such as self-interest bias (or a lack of incentives for accurate estimates, e.g. Pickrell 1992), undue optimism and risk aversion (Kahneman and Lovallo 1993, Lovallo and Kahneman 2003), poor management, poor communication, and bureaucratic fecklessness (Morris 1990), and many other reasons (e.g. Cantarelli et al. 2010).
Although simple scaling, shifting or inflating corrections are widely used to account for such biases and generic overconfidence, much better distributional information is usually available to the analyst, and fully using this information can yield corrected estimates that properly express uncertainty and make them more suitable for use in risk analysis and decision making. These advanced corrections express biases as distributions or bounds on sets of possible distributions (probability boxes or p-boxes) rather than simple scalar values.
Corrections can be made in two distinct ways that will be useful in different analytical settings. In the first way, an empirical distribution or p-box of errors (established in a prior data-quality study or an ancillary validation study) is convolved with each observed value. This calculation acknowledges uncertainties associated with the estimation process and puts them into the estimated value. The second way involves a deconvolution that extracts, insofar as is possible, the blurring of the data by measurement error associated with the measurement protocol. The result is often a reduction in the variance of a distributional estimate because the deconvolution removes the confounding uncertainty that contaminated the measurement process. In both of these cases, the structure of errors can be characterized as a distribution or p-box with arbitrary complexity. For instance, the errors may be zero-centered or directional, symmetric or asymmetric, balanced or skewed, and precisely or imprecisely specified. We illustrate the requisite calculations to make these corrections with numerical examples.
Traditionally, quantitative bias or error is characterized by a simple scalar magnitude, that is, a single number, and the correction of bias consists of untangling a model of how this error and the underlying true value collided in the observed measurement. The model is usually additive, so the correction is a subtraction, or multiplicative, so the correction is a division, although a more general error model is sometimes needed. In the context of risk analysis, however, this traditional concept of bias correction is insufficient because we are often estimating distributions, rather than merely scalar values. Consequently, the notion of bias can be considerably more complicated. For instance, in risk analysis bias might no longer be simply a leftward or rightward displacement of the value, but could also be an under- or overestimate of the variance of a distribution of values. For the present discussion, we consider bias to denote any sort of error or deficiency in an estimate that needs correcting, including both directional shifts in location as well as inflation or underestimation of dispersion. This view also generalizes the statistical conception of bias, which is usually understood to be a systematic distortion of a statistic as a result of imperfections in observation. Our notion includes both systematic and random components of the error. Advanced bias correction is, then, any operation that removes any such errors from an estimate.
Misestimation
Validation studies of cost estimates of environmental regulations have shown that costs are consistently overestimated, often grossly so. This indicates that economic models of environmental regulatory costs are poorly calibrated, so that these models are significantly biased. The resulting systematic error in cost estimates can and should be corrected by estimating the magnitude of the error using data from validation studies.
Misestimation bias has many possible causes. For example, total uncertainty almost always increases when a measuring device is used at the extremes of its range, but this commonly introduces bias in measurements as well. A thermometer with a maximum reading of 100 degrees, for instance, will report 100 degrees for every temperature at or above that limit, so readings at the top of its range are biased toward values smaller than the actual temperatures. Similarly, some assay used to measure the concentration of a toxicant in the environment might have a lower threshold of detection of 1 part per billion. Any actual environmental concentrations lower than 1 part per billion might be reported as 1 part per billion, resulting in an upward bias, or reported as 0, resulting in a downward bias (other methods are also available that do not result in bias).
The context in which measurements are made can also lead to bias. The properties of any physical measuring instrument will change in response to physical changes in the environment. For example, the length of a metal ruler will become longer in warm temperatures and shorter in cooler temperatures. A ruler that is manufactured in a factory at 30 degrees will result in length readings that are consistently longer than the true length when used in cooler weather, and consistently shorter than the true length when used in warmer weather.
Improper application of statistical methods also commonly leads to biased estimates. It has been common in the past to perform linear least squares regression on log transformed variables for power relationships (log-x and log-y) and for exponential relationships (linear x and log-y). This is valid if the error structure becomes normal with transformation, but the method has typically been used as a more convenient alternative to nonlinear regression. When the error structure is normal in the untransformed data then linear regression on the transformed variables results in biased parameter estimates when these estimates are back-transformed to get the power model or exponential model. This phenomenon, and the solution for correcting the bias (e.g. Beauchamp and Olson 1973, Sprugel 1983), is well known, but the problem has been common (e.g. Newman 1993).
Psychological factors can also cause bias. For example, a preference for particular digits can be found in data by studying the distribution of final digits in the reported data. Under certain conditions that are commonly met, the distribution of final digits would be uniform, but in data collected in many different fields there is commonly a non-uniform distribution of the final digits (e.g. Yule 1927, Preece 1981). This preference for particular digits can lead to bias depending on the range of the true values. For example, if the length of bolts in a box varies uniformly between 0.8 mm and 1.0 mm, then a preference for round number lengths of 1.0 mm would cause an upward bias in the reported lengths.
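A minimal sketch of how such a digit-preference check might be carried out (our own illustration, not code from the studies cited; reported is a hypothetical vector of recorded measurements):
final_digit <- round(reported * 10) %% 10            # last recorded digit (here, the tenths place)
counts <- table(factor(final_digit, levels = 0:9))   # frequency of each final digit 0 through 9
chisq.test(counts, p = rep(1/10, 10))                # chi-squared test against the uniform distribution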
Another psychological factor, self-interest bias...
Given that analysts and decision makers start to account for the self-interested biases by partially discounting statements, the interests on both sides may create an escalating war of discounting and further exaggeration. A similar situation arises in traffic safety risk analysis. Green lights at traffic intersections are sometimes delayed by traffic engineers because some drivers do not respect yellow signals to clear the intersection. However, once aggressive drivers notice that their red lights are not associated with moving cross traffic, they tend to expand their transgressions and proceed through recently changed red lights. Indeed, the longer the time during which signals in both directions are red, the more likely it is that some driver will violate a red signal. So how can such escalation be prevented? Validation is the answer.
Overconfidence
Although estimates are generally biased in either direction, depending on the self-interests or other proclivities of the parties involved, quantitative characterizations of uncertainty about estimates are generally too small in magnitude. This tends to be true for both lay persons and experts alike, and is even true of the experts who study the overconfidence of expressions of uncertainty.
Measurement uncertainty can be significantly underestimated, even for seemingly straightforward tasks such as the measurement of physical constants. For example, Shewhart (1939) documented overconfidence in the uncertainty statements for estimates of the speed of light, which is apparent throughout the history of this measurement. The figure below illustrates the uncertainty about historical measurements of the speed of light in a vacuum, as expressed by the measurement teams themselves. The precise meanings of the specified intervals were not always clear in the published reports, but, giving them the benefit of the doubt, we might take them to represent plus or minus one standard deviation (rather than 95% confidence intervals as is the standard format today). The nonlinear square-root scale of the velocity axis is centered on the current definition of the speed of light so that the individual ranges can more easily be distinguished as they progressively tighten through time. A side effect of this scale is that it obscures how much the ranges professed by the different measurement teams tightened over the years.
These self-reported uncertainties decreased over time for individual teams, suggesting that they were becoming more confident in their measurements, despite the fact that the measurements did not actually become any more accurate over time. Henrion and Fischhoff (1986) noted that about 70% of well calibrated measurements would be expected to enclose the true value, but fewer than half (13 out of 27) of these ranges include the currently accepted* value of the speed of light. Youden (1972; Stigler 1996) similarly described systematic underestimation of uncertainty in the measurements of both the speed of light and the astronomical unit (mean distance from the earth to the sun). Henrion and Fischhoff (1986) also examined the “performance” of physicists in assessing uncertainty due to possible systematic errors in measurements of physical quantities by comparing historical measurements against the currently accepted values (i.e., values recommended by standards agencies) for a variety of fundamental physical constants. They observed consistent underestimation of uncertainties. Morgan and Henrion (1990, page 59) asserted that the overconfidence “has been found to be almost universal in all measurements of physical quantities that have been looked at.” This overconfidence is apparently pervasive in science generally and may reflect a systemic feature of human judgment. It is well known in the psychometric literature that lay people and even experts are routinely and strongly overconfident about their estimates (Plous 1993). When experts are asked to provide 90% confidence intervals for their judgments (which ought to enclose the true value 90% of the time on average), their ranges will enclose the truth typically only about 30 to 50% of the time.
[SPEED OF LIGHT FIGURE HERE]
Figure. Measurement uncertainty for the speed of light at different years. Note the nonlinear (square root) scale of the ordinate. Redrawn from Cutler (2001a).
Shlyakhter (1993; 1994) suggested that this propensity for overconfidence is so pervasive in science that we might introduce an automatic inflation factor to all uncertainty statements to account for it. Errors are expected to be zero-centered and normally distributed, which would imply a Student multiplier of 1.96 to reach 95% coverage. But the evidence suggests that errors are in fact <<>> distributed so that the Student multiplier needs to be 3.8 to achieve this 95% coverage. To account for this understatement of uncertainty Shlyakhter suggests that 95% confidence intervals should be wider by a factor of 3.8/1.96 ≈ 2 (without changing the mean values). Doubling the widths of confidence intervals should then give them the intended coverage performance.
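For concreteness, a minimal sketch of this inflation in R, assuming the reported interval is symmetric about its midpoint (the function name and arguments are ours):
inflate_ci <- function(lo, hi, factor = 3.8 / 1.96) {
  mid  <- (lo + hi) / 2                 # leave the central value unchanged
  half <- factor * (hi - lo) / 2        # widen the half-width by Shlyakhter's factor
  c(lower = mid - half, upper = mid + half)
}
inflate_ci(9, 11)                       # a reported interval of 10 ± 1 becomes roughly 10 ± 2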
Unacknowledged imprecision
Nakedness
Point estimates represent a total neglect of uncertainty and variability. In the discipline of risk analysis, this is surely a most egregious bias.
CliffsNotes starts its discussion of point estimates with the statements “The sample mean xbar is an unbiased estimate of the population mean μ. Another way to say this is that xbar is the best point estimate of the true value of μ.” (http://www.cliffsnotes.com/study_guide/Point-Estimates-and-Confidence-Intervals.topicArticleId-25951,articleId-25932.html). But these claims are not true without qualification. The implicit assumption behind these claims is that the measurements that created the sample are themselves unbiased. If the measurement error is tilted somehow, the sample mean won’t be the best estimate at all.
Inflating uncertainty: convolution
A jacket is defined as the bounds on all cumulative probability distributions that are d units away from the distributions of predictions, in which d is some measure of error.
One measure of error is the area between the predicted and observed distributions
Wasserstein metric
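A rough sketch of this area measure for one-dimensional samples (our own illustration; pred and obs are hypothetical vectors of predicted and realized values). For samples on the real line this area coincides with the Wasserstein (earth mover's) distance between the two empirical distributions.
cdf_area <- function(pred, obs) {
  grid <- sort(unique(c(pred, obs)))               # evaluation points covering both samples
  Fp <- ecdf(pred)(grid)                           # empirical CDF of the predictions
  Fo <- ecdf(obs)(grid)                            # empirical CDF of the observations
  sum(abs(Fp - Fo)[-length(grid)] * diff(grid))    # area between the two step functions
}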
Example: Physician survival predictions
Optimism in physician prognoses is normal and common (Glare et al. 2003). In this example, we will illustrate bias correction using the data on prognostic errors from Christakis and Lamont (2000) and Alexander and Christakis (2008), who studied the accuracy of physician predictions about time of survival for terminally ill patients. Physicians overestimate time of survival by a factor of about 5.3 on average, with about 20% of predictions considered accurate (defined as being within 33% of the actual survival) and 63% of predictions being overestimates. Interestingly, although more experienced physicians tended to be more accurate, accuracy was lower for more specialized physicians and for physicians who had worked with the patients for longer time periods.
Figure. Observed days of survival cannot be predicted from physician predicted days of survival in any simple way.
Figure. The distribution of prognostic errors appears leptokurtic, with a mean bias of 42.5203 days.
Figure. The normal quantile-quantile plot of the prognostic errors suggests significant leptokurtosis, and, consequently, deviation from normality.
The errors in these data sets illustrate the need for the methods presented here. One possible alternative method that may be more familiar would be to perform a regression (linear or otherwise) of observations on predictions, and then use the best-fitting model to produce prediction intervals for any further prediction values. The difficulties in this approach, however, are obvious from the complicated relationship between observations and predictions (first figure), and from the non-normality in the distribution of the errors (illustrated in the second and third figures), which is a significant deviation from normality according to the Shapiro-Wilk test of normality (p < 2.2 × 10^-16, Shapiro and Wilk 1965).
It would be difficult, if possible at all, to specify an adequate predictive model from these errors. By using the empirical cumulative probability distribution function of the prediction errors, as illustrated in the next figure, however, the uncertainty represented by this distribution of errors can be incorporated directly into the previously naked point-estimate predictions. <<Should we describe the construction of an ECDF?>>
Figure. The empirical cumulative probability distribution of the prognostic errors is the foundation for the bias correction methods described.
On its own, however, the empirical CDF is still inadequate, because this distribution represents only a sample of the population of all possible errors. One method for adding sampling uncertainty to the empirical CDF is by placing Kolmogorov-Smirnov bounds around the distribution (see next figure). Kolmogorov-Smirnov (KS) confidence limits (Kolmogorov 1941; Smirnov 1939; Feller 1948; Miller 1956) are distribution-free bounds around an empirical cumulative distribution function. These bounds are analogous to simple confidence intervals around a single number, but instead represent sampling uncertainty concerning an entire statistical distribution. These bounds represent a pair of CDFs that together promise to enclose the true distribution of errors with some specified level of confidence (here 95%). These confidence limits converge to the empirical distribution function as the sample size increases, but the convergence is rather slow. Probability bounds analysis and Dempster-Shafer theory (Ferson et al. 2003) provide methods for propagating KS confidence limits through calculations.
Figure. Kolmogorov-Smirnov 95% confidence bounds on the distribution of physician prognosis errors.
Construction of the KS confidence limits does not require any assumption about the distribution of the population, requiring only that the samples are independent and identically distributed. The bounds are computed with the expression min(1, max(0, DF(x) ± D(a, n))), where DF denotes the best estimate of the distribution function, and D(a, n) is the one sample Kolmogorov-Smirnov critical statistic for intrinsic hypotheses for confidence level 100(1 - a)% and sample size n. The proof that there is such a number D that can be used to create confidence limits for entire distributions was given originally by Kolmogorov (1941), and the values for D(a, n) were tabled by Miller (1956).
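A minimal sketch of this construction in R (names are ours; errors is a hypothetical vector of validation errors, and the critical value uses the standard large-sample approximation rather than Miller's exact tables):
ks_bounds <- function(errors, alpha = 0.05) {
  n <- length(errors)
  D <- sqrt(-log(alpha / 2) / (2 * n))    # approximate two-sided KS critical value D(a, n)
  x <- sort(errors)
  Fhat <- ecdf(errors)(x)                 # best estimate DF(x)
  list(x = x,
       lower = pmax(0, Fhat - D),         # max(0, DF(x) - D(a, n))
       upper = pmin(1, Fhat + D))         # min(1, DF(x) + D(a, n))
}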
The KS bounds give theoretically infinite tails for the distribution, but it would also be valid to truncate this distribution based on practical biological considerations. For example, Weon and Je (2009) have provided an estimate of a theoretical maximum lifespan of 126 based on the statistical properties of life expectancy data. This seems reasonable given that the longest documented human life was Jeanne Calment of France, who lived 122 years (Robine and Allard 1999). The distribution could then be truncated by considering that the absolute value of the error in prognosis should, under no circumstance, exceed the theoretical maximum number of days in a human life.
These KS-bounds on the distribution of errors can be used to characterize the uncertainty inherent, but ignored, in a naked point estimate. Imagine that a new terminally ill patient not represented in the sample has been predicted to survive 365 days. Error correction for this point estimate is completed by subtracting every point in each of the two bounding CDFs from this prediction of 365 days, resulting in the pair of CDFs illustrated in the next figure. This structure provides 95% confidence intervals around the possible cumulative probability distributions describing the uncertainty about survival for this new patient.
Figure. Bias correction for a new patient predicted to survive one year.
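Continuing the sketch above, the correction for the hypothetical 365-day prediction might look like the following, where errors holds predicted minus observed days of survival (this sketch ignores the fine details of open versus closed steps in the bounding CDFs):
b <- ks_bounds(errors)                # KS bounds on the error distribution (sketch above)
s <- 365 - rev(b$x)                   # candidate survival times, in increasing order
surv_lower <- rev(1 - b$upper)        # lower bound on the CDF of corrected survival
surv_upper <- rev(1 - b$lower)        # upper bound on the CDF of corrected survival
# plot(s, surv_lower, type = 's'); lines(s, surv_upper, type = 's')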
There is evidence that survival predictions may hasten death through various mechanisms (Benor 2003). If this is true, then bias correction should not be used in patient communication because it may simply decrease survival further! Obviously, however, the corrections made could be useful for treatment planning, suggestions for entering hospice care, and other decisions for which the patient is not required to be aware of the prognosis of the physician in a quantitative way.
Example: Cost overruns for transportation infrastructure projects
Costs of transportation projects are well known to be frequently and largely underestimated (Cantarelli et al. 2010). On the other hand, the estimated benefits of transportation projects, which largely consist of estimated traffic, are consistently overestimated (Wachs 1987, 1989). If the costs of potential transportation projects are generally underestimated and the benefits are generally overestimated, then the use of cost-benefit analyses for the purpose of choosing projects to fund will result in mostly choosing projects that result in net losses. In this case, it would actually benefit society to choose projects at random, with the further benefit of saving the money wasted on estimating costs and benefits and performing cost-benefit analyses!
In this example we use published data on construction cost estimates for large transportation infrastructure projects in 20 nations (Flyvbjerg et al. 2003) and traffic demand estimates for rail projects (Flyvbjerg et al. 2005) to construct a distribution of errors that could be used to correct a cost-benefit analysis for a new proposed rail project. The first figure shows that there does not appear to be much of any trend over time toward more accurate cost estimates, and the second figure shows that as a whole cost estimates are skewed towards very large overruns. The third figure illustrates 95% Kolmogorov-Smirnov bounds on the possible distributions of errors given the observed percent errors. These bounds show that (among other things) with 95% confidence up to 20% of projects can be expected to cost at least twice as much as initial estimates (cost overrun greater than 100%), and with 95% confidence at least 75%, and possibly up to 90%, of cost estimates will be underestimates (cost overrun greater than 0%).
Figure. Cost overruns as percent of initial cost estimate for land-based transportation infrastructure projects (data from Flyvbjerg et al. 2003).
Figure. Histogram of transportation infrastructure cost overruns shows that the errors are skewed towards large positive overruns, or gross underestimates.
Figure. Kolmogorov-Smirnov bounds on the distribution of transportation infrastructure cost overruns.
The next three figures describe the benefits data. The first figure shows again that there does not appear to be much of any trend over time toward more accurate rail traffic demand estimates, and the second figure shows that as a whole benefit estimates tend to have negative errors, or overestimates of demand. The third figure illustrates 95% Kolmogorov-Smirnov bounds on the possible distributions of errors given the observed percent errors. The width of these bounds for the benefits is noticeably larger than for the costs because the sample size for the costs is an order of magnitude larger than that for the benefits.
Figure. Errors for demand estimates as percent of initial demand estimate for rail projects (data from Flyvbjerg et al. 2005).
Figure. Histogram of rail project demand prediction errors shows that the errors are skewed towards significant overestimates.
Figure. Kolmogorov-Smirnov bounds on the distribution of rail project demand prediction errors.
Now consider a hypothetical new rail project that is estimated to cost 100 million dollars and, based on traffic demand forecasts, to benefit society by 125 million dollars. A traditional cost-benefit analysis would take these estimates at face value and conclude that investment in this project should benefit society by 25 million dollars, suggesting that the project should be executed. Advanced bias correction, however, leads to a different conclusion. The above error data are given in terms of percent errors. The first step is to convert the bounds on the distributions of percent errors to bounds on distributions of dollar errors, by convolving each estimate (100 million for costs and 125 million for benefits) with its respective distribution of percent errors, producing the following two figures. Next, the p-box for the costs is "subtracted" using convolution from the p-box for the benefits, resulting in the 95% confidence bounds on the distribution of net benefit for the new rail project, which is illustrated in the third figure. The expected value of the project after bias correction is between -20.6 and -18.8 million dollars with 95% confidence. This is surprisingly different from the net value of +25 million dollars using uncorrected point estimates.
Figure. Kolmogorov-Smirnov bounds on the distribution of costs for a new rail project estimated to cost 100 million dollars.
Figure. Kolmogorov-Smirnov bounds on the distribution of predicted benefits of a new rail project estimated to bring in 125 million dollars.
Figure. Kolmogorov-Smirnov bounds on the distribution of benefits - costs of a new rail project.
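As a rough illustration of the mechanics (not the p-box arithmetic used for the figures above, which tracks dependence and sampling uncertainty explicitly), a Monte Carlo resampling analogue under an assumed independence between cost and demand errors might look like this, where cost_overrun_pct and demand_error_pct are hypothetical vectors of percent errors from validation data such as Flyvbjerg et al. (2003; 2005):
set.seed(1)
n <- 10000
cost    <- 100 * (1 + sample(cost_overrun_pct, n, replace = TRUE) / 100)    # millions of dollars
benefit <- 125 * (1 + sample(demand_error_pct, n, replace = TRUE) / 100)    # millions of dollars
net <- benefit - cost
mean(net)                            # bias-corrected expected net benefit
quantile(net, c(0.05, 0.5, 0.95))    # spread of plausible outcomes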
Deflating uncertainty: deconvolution
The previous sections discussed the problem of overconfidence. This section discusses the opposite problem of underconfidence, that is, the overstatement of uncertainty that can result from the presence of non-negligible measurement errors.
Most protocols for estimating statistical distributions assume that the measurement uncertainty associated with individual value observations is negligible compared to its sampling uncertainty arising from limitations on the sample size. In cases where the measurement uncertainty should not be neglected, deconvolution can be used to correct the observed distribution for the measurement uncertainty associated with the data used to form it. Like ordinary bias correction, such a deconvolution uses information about the sign and magnitude of the error to produce the improved estimate, but it also makes use of information about the distributional shape and dependence of the measurement error to improve the estimate of the variable.
For instance, suppose X is a random variable that cannot be measured exactly because of imperfections in the measurement process. A common model for observation with measurement error is
Y = X + ε
where the actual value X of the measurand is masked by the presence of unavoidable measurement error ε that blurs it into the observed value Y. In risk analysis, these quantities are distributions rather than point values. In principle, we can improve the estimate of the distribution for X by untangling this error model. This untangling cannot generally be accomplished by solving the equation as X = Y − ε. Instead, we must solve for X using deconvolution. The reason that this special operation is required is because the quantities are distributions rather than real-valued, so the observed distribution Y is the result of the convolution of the distributions X and ε. As a result, Y and ε are not independent, but rather have a dependency that owes to Y’s being a function of ε.
Suppose that measurement error ε is a random variable independent of X but that sample information is available that allows us to estimate the distribution of Y, and that an ancillary data quality study allows us to independently estimate the distribution of ε. Then we can infer some things about the underlying distribution of X.
In particular, its mean can be inferred from the fact that the mean of Y is the sum of the means of X and ε. Because ε and X are independent, we can also infer its variance from the fact that the variance of Y is likewise the sum of the variances of X and ε. If we further assume normality, then the mean and variance are enough to completely specify the distribution of X with the deconvolution formula
X ~ N(E(Y) − E(ε), sqrt(V(Y) − V(ε)))
where E and V denote the mean and variance respectively, and the N function specifies a normal distribution with a given mean and standard deviation.
Suppose the observed values Y have a normal distribution with mean 20 and standard deviation 7, and the measurement errors ε also have a normal distribution with mean of 0 but a standard deviation of 5. These distributions are depicted in cumulative form on the left graph of Figure <<dec1>>. Because the errors are zero-centered, the estimate Y would traditionally be called ‘unbiased’, but it can nevertheless be improved to
X ~ N(20 − 0, sqrt(7^2 − 5^2))
which is about N(20, 4.9).
This result of the deconvolution operation is depicted in the right graph of Figure 2. Also shown in the right graph is the original Y distribution for comparison. It is clear that the deconvolution has achieved a substantial improvement in the estimation over the originally observed distribution. For instance, the tail risk of being smaller than 12 decreases from almost 13% for the Y values to about 5% for the computed X values. The larger the dispersion of the measurement error, the more substantial is the possible improvement.
When the measurement error distribution is combined with this distribution for X via ordinary Monte Carlo simulation according to the error model Y = X + ε, the result is identical to the original distribution of observations Y as expected. Thus, given the measurement errors, the deconvolution result yields the underlying distribution of X that must exist for the distribution Y to have been observed.
Figure 2. On left, the cumulative distributions for observed values Y ~ N(20, 7) and measurement errors ε ~ N(0, 5) (shown in gray). On right, the resulting deconvolution X such that X + ε = Y, superimposed over the original Y distribution (shown in light gray).
To make this figure, use R with the following script:
# Helper: draw the CDF of a normal distribution with mean m and standard deviation s,
# labeling the curve with `name` at plot coordinates (xx, yy).
shown <- function(m, s, xx, yy, name='', col='black') {
  x = 2.85*s*(-100):100 / 100 + m   # grid spanning roughly +/- 2.85 standard deviations around m
  p = pnorm(x, m, s)                # normal CDF on that grid
  lines(x, p, col=col)
  text(xx, yy, name, vfont=c('serif','italic'), col=col)
}
# Left graph: the observed values Y ~ N(20, 7) and the measurement errors epsilon ~ N(0, 5)
plot(c(-19,39), c(0,1), xlab='', ylab='Cumulative probability', col='white')   # empty frame
shown(20, 7, 20, 0.62, 'Y')
shown(0, 5, 0, 0.62, expression(epsilon), 'gray50')
# Right graph: the deconvolved X ~ N(20, sqrt(7^2 - 5^2)) superimposed over the original Y
plot(c(1,39), c(0,1), xlab='', ylab='Cumulative probability', col='white')
shown(20, 7, 8, 0.1, 'Y', 'gray75')
shown(20, sqrt(7^2-5^2), 16, 0.1, 'X')
# Tail risks of a value below 12, before and after deconvolution
pnorm(12, 20, 7)                   # 0.1265490
pnorm(12, 20, sqrt(7^2-5^2))       # 0.05123522
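A quick simulation check of the claim above, that recombining the deconvolved X with independent measurement error reproduces the observed distribution Y (our own sketch, not part of the figure script):
set.seed(1)
x   <- rnorm(1e6, 20, sqrt(7^2 - 5^2))   # draws from the deconvolved X
eps <- rnorm(1e6, 0, 5)                  # independent measurement errors
c(mean(x + eps), sd(x + eps))            # approximately 20 and 7, matching the observed Y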
The variables X and ε may not be independent. The observability of an X-value may depend on its magnitude somehow. One common situation is that measurement error increases with the size of X. In such cases, an assumption of perfect dependence (Hoeffding 1940; Whitt 1976; Ferson et al. 2004) may be preferable to one of independence. If X and ε are perfectly dependent (i.e., comonotonic, or maximally dependent), then the standard deviation of Y is the sum of the standard deviations of X and ε. This means the deconvolution formula for the normal case changes to
X ~ N(E(Y) − E(ε), sqrt(V(Y)) − sqrt(V(ε)))
The numerical example introduced above would then be X ~ N(20 − 0, 7 − 5) ~ N(20, 2), which produces a massively tighter improvement for the distribution of the underlying variable. Figure 3 depicts the result of this deconvolution, superimposed over the original observed distribution (shown in light gray). The tail risk of being less than 12 falls almost four orders of magnitude, from almost 13% to about 0.003%.
Figure 3. Cumulative distributions for observed values Y and the deconvolution X such that X + ε = Y, given that X and ε are perfectly dependent.
To make this figure, use R with the following script:
# The helper `shown` is the same plotting function used for the previous figure.
shown <- function(m, s, xx, yy, name='', col='black') {
  x = 2.85*s*(-100):100 / 100 + m   # grid spanning roughly +/- 2.85 standard deviations around m
  p = pnorm(x, m, s)                # normal CDF on that grid
  lines(x, p, col=col)
  text(xx, yy, name, vfont=c('serif','italic'), col=col)
}
# The observed Y ~ N(20, 7) and the deconvolved X ~ N(20, 7 - 5) under perfect dependence
plot(c(1,39), c(0,1), xlab='', ylab='Cumulative probability', col='white')   # empty frame
shown(20, 7, 8, 0.1, 'Y', 'gray75')
shown(20, 7-5, 16, 0.1, 'X')
# Tail risks of a value below 12, before and after deconvolution
pnorm(12, 20, 7)   # 0.1265490
pnorm(12, 20, 2)   # 3.167124e-05 = 0.00003
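A corresponding simulation check for the perfectly dependent case, generating X and ε as comonotonic by driving both with the same uniform variate (our own sketch):
set.seed(1)
u   <- runif(1e6)
x   <- qnorm(u, 20, 2)     # deconvolved X under perfect dependence
eps <- qnorm(u, 0, 5)      # measurement error, comonotonic with X
sd(x + eps)                # approximately 7, the standard deviation of the observed Y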
When the variables X and ε are neither independent nor perfectly dependent, there are many possible dependencies they might have. Under a normality assumption (which includes the dependencies as well as the marginal distribution shapes), the dependence between the two variables could in principle be any correlation between −1 and +1. In general, the variance of a sum of two random variables is the sum of their respective variances plus twice their covariance, from which we can deduce the variance of X and derive the deconvolution as
X ~ N(E(Y) − E(ε), −r sqrt(V(ε)) + sqrt(r^2 V(ε) + V(Y) − V(ε)))
where r denotes the Pearson correlation between X and ε. This formula includes the previous cases. If we set r to zero, it degenerates to the first deconvolution formula given above for the independence case. And if we set r to one, it becomes the formula for the perfect dependence case. It is interesting to note that, if r is more negative than about −0.3, then the variance of X is actually larger than that of Y, so the deconvolution becomes a way to acknowledge greater uncertainty in the estimate than originally perceived.
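A small sketch of this relationship, obtained by solving V(Y) = V(X) + V(ε) + 2 r sqrt(V(X)) sqrt(V(ε)) for the standard deviation of X and using the numerical example above (the function name is ours):
sd_x <- function(r, sd_y = 7, sd_eps = 5)
  -r * sd_eps + sqrt(r^2 * sd_eps^2 + sd_y^2 - sd_eps^2)
sd_x(0)       # 4.899..., the independence case
sd_x(1)       # 2, the perfect-dependence case
sd_x(-0.36)   # about 7; more negative correlations make V(X) exceed V(Y) in this example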
When the correlation between X and ε is unknown, then the deconvolution yields an imprecise result. It is always reasonable to correct the distribution of Y by subtracting the mean of ε, but the best that can be said about the standard deviation of the underlying X distribution is that it is somewhere in the interval
[sqrt(V(Y)) − sqrt(V(ε)), sqrt(V(Y)) + sqrt(V(ε))] (truncated below at zero if the lower endpoint is negative). If we continue to assume that the dependence comes from the normal family of dependence functions or ‘copulas’ (Clemen and Reilly 1999), then the deconvolution yields a p-box of normal distributions that envelops all of the possible results of the displayed formula above with r taking on any value in [−1, +1].
Of course, there are many other possible patterns of dependency between X and ε beyond these normal dependencies, including nonlinear and complex relationships that may emerge from differences in the observability of X-values at different magnitudes. Using an assumption of independence, perfect dependence, or any of the correlated dependencies above represents an assertion that such complexities do not exist in the measurement protocol. The appropriate estimate for X when its dependence function with ε is unknown is a p-box that represents the subtraction Y−ε under any possible dependence, which is an even wider p-box. This fact implies that, given some non-negligible measurement error ε, not being able to specify the dependence between it and the true underlying distribution makes our estimate of X weaker than Y suggests.
<<arbitrary distribution shapes>>
<<see Scott’s early deconvolution manuscripts>>
Combed p-boxes (only normal distributions inside)
Distributional p-box
<<this section may not be necessary>>
<<suppose Y = X × ε>>
moment formulas are available for the independent case
The multiplicative convolution of normals is no longer normal.
Lognormals might work.
There a
Is there a Bayesian approach to excising inflation of uncertainty due to measurement error?
Making use of deconvolution to improve an estimate depends on identifying the error model and assumptions regarding the shape of the distribution of errors and the dependence between errors and the underlying variable.
It thus requires an ancillary study of the measurement error distribution itself.
It is not very useful without a precise assumption about the dependence between the underlying variable and its measurement error.
Several studies have shown that numerical estimates produced by experts and lay people alike are commonly biased as a result of self-interest on the part of the persons making the estimates. For example, bids made by contractors under cost-plus-fee contracts regularly underestimate the actual costs of a project<<ref>>. Likewise, economic estimates of compliance costs of industrial regulation commonly overestimate the eventual true costs<<ref>>. Industrial safety reports routinely understate the failures<<ref>>. Some hospitals do not enjoy full reporting of morbidity statistics<<ref>>. The effect of self-interest is generally consistent in direction and, although not always consistent in magnitude, it is often large enough that post hoc numerical corrections are warranted and important. It has also been empirically well established that, when numerical estimates include expressions of uncertainty, they are also usually overconfident, that is, the uncertainties are smaller than they ought to be. Although simple scaling, shifting or inflating corrections are widely used to account for such biases and generic overconfidence, much better distributional information is usually available to the analyst, and fully using this information can yield corrected estimates that properly express uncertainty and make them more suitable for use in risk analysis and decision making. To account for the other information, these advanced corrections express biases as distributions or probability boxes rather than simple scalar values. Corrections can be made in two distinct ways that will be useful in different analytical settings. In the first way, an empirical distribution or p-box of errors (established in a prior data-quality study or an ancillary validation study) is convolved with each observed value. This calculation acknowledges uncertainties associated with the estimation process and puts them into the estimated value. The second way involves a deconvolution that extracts, insofar as is possible, the blurring of the data by measurement error associated with the measurement protocol. The result is often a reduction in the variance of a distributional estimate because the deconvolution removes the confounding uncertainty that contaminated the measurement process. In both of these cases, the structure of errors can be characterized as a distribution or p-box with arbitrary complexity. For instance, the errors may be zero-centered or directional, symmetric or asymmetric, balanced or skewed, and precisely or imprecisely specified. We illustrate the requisite calculations to make these corrections with numerical examples.
The authors gratefully acknowledge helpful discussions with Vladik Kreinovich of University of Texas at El Paso, Dale Hattis of Clark University, Andrea Wiencierz of Ludwig-Maximilians-Universität München, and Lev Ginzburg, Janos Hajagos and Resit Akçakaya of Stony Brook University. We thank Nicholas A. Christakis of Harvard Medical School and Elizabeth Bernier Lamont of Massachusetts General Hospital and Harvard Medical School for providing access to their data on physician prognostic errors. This work was undertaken with support from the National Science Foundation (grant #<<insert number here>> to Adam Finkel) and the National Institutes of Health, National Library of Medicine (grant RC3LM010794 to Applied Biomathematics). The views and opinions expressed herein are solely those of the authors and do not necessarily represent the views of the University of Pennsylvania, Applied Biomathematics, the National Science Foundation, the National Library of Medicine, the National Institutes of Health, or other sponsors or affiliates.
Brian Borchers of New Mexico Tech
Alexander, M. and Christakis, N.A. 2008. Bias and Asymmetric Loss in Expert Forecasts: A Study of Physician Prognostic Behavior with Respect to Patient Survival. Journal of Health Economics 27:1095–1108.
Beauchamp, J. J. and Olson, J.J. 1973. Corrections for bias in regression estimates after logarithmic transformation. Ecology 54:1403–1407.
Benor, D.J. 2003. Survival predictions may hasten death. BMJ 327:1048–1049.
Bier, V. <<>>
Cantarelli, C.C., Flyvbjerg, B., Molin, E.J.E., Wee, B. 2010. Cost overruns in large-scale transportation infrastructure projects: explanations and their theoretical embeddedness. European Journal of Transportation and Infrastructure Research 10: 5-18.
Clemen, R. and T. Reilly. 1999. Correlations and copulas for decision and risk analysis. Management Science 45: 208-224.
Christakis, N.A., Lamont, A.B. 2000. Extent and determinants of error in physicians' prognoses in terminally ill patients. Western Journal of Medicine 172:310–313.
Cutler, A.N. 2001a. “A history of the speed of light”. http://www.sigmaengineering.co.uk/light/lightindex.shtml
Cutler, A.N. 2001b. “Data from Michelson, Pease and Pearson” (1935). http://www.sigmaengineering.co.uk/light/series.htm
Feller, W. 1948. On the Kolmogorov-Smirnov limit theorems for empirical distributions. Annals of Mathematical Statistics 19: 177–189.
Ferson, S. 1995. Using approximate deconvolution to estimate cleanup targets in probabilistic risk analyses, pages 239–248 in Hydrocarbon Contaminated Soils, P. Kostecki (ed). Amherst Scientific Press, Amherst, Massachusetts.
Ferson, S. 1996. Automated quality assurance checks on model structure in ecological risk assessments. Human and Environmental Risk Assessment 2:558-569.
Ferson, S. 2002. RAMAS Risk Calc 4.0 Software: Risk Assessment with Uncertain Numbers. Lewis Publishers, Boca Raton, Florida.
Ferson, S., and T.F. Long. 1998. Deconvolution can reduce uncertainty in risk analyses. Risk Assessment: Measurement and Logic, M. Newman and C. Strojan (eds.), Ann Arbor Press, Ann Arbor, Michigan.
Ferson, S., V. Kreinovich, L. Ginzburg, K. Sentz and D.S. Myers. 2003. Constructing probability boxes and Dempster-Shafer structures. Sandia National Laboratories, Technical Report SAND2002-4015, Albuquerque, New Mexico, 2002. Available at http://www.sandia.gov/epistemic/Reports/SAND2002-4015.pdf and http://www.ramas.com/unabridged.zip
Ferson, S., R.B. Nelsen, J. Hajagos, D.J. Berleant, J. Zhang, W.T. Tucker, L.R. Ginzburg and W.L. Oberkampf. 2004. Dependence in Probabilistic Modeling, Dempster-Shafer Theory, and Probability Bounds Analysis. SAND2004-3072, Sandia National Laboratories, Albuquerque, New Mexico. http://www.ramas.com/depend.zip
Flyvbjerg, B., Skamris Holm, M.K., and Buhl, S.L. 2003. How common and how large are cost overruns in transport infrastructure projects? Transport Reviews 23: 71-88.
Flyvbjerg, B., Skamris Holm, M.K., and Buhl, S.L. 2005. How (in)accurate are demand forecasts in public works projects? Journal of the American Planning Association 71:131-146.
Gilbert, R.O. 1987. Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, New York.
Glare, P., Virik, K., Jones, M., Hudson, M., Eychmuller, S. Simes, J. Christakis, N. 2003. A systematic review of physicians' survival predictions in terminally ill cancer patients. BMJ 327:195–198.
Hoeffding, W. 1940. Masstabinvariante Korrelationstheorie. Schriften des Matematischen Instituts und des Instituts für Angewandte Mathematik der Universität Berlin 5 (Heft 3): 179-233 [translated as “Scale-invariant correlation theory” in Collected works of Wassily Hoeffding, N.I. Fisher and P.K. Sen (eds.), Springer-Verlag, New York, 1994].
Helsel, D.R. 1990. Less than obvious: Statistical treatment of data below the detection limit. Environmental Science and Technology 24: 1766-1774.
Helsel, D.R. 2005. Nondetects and Data Analysis: Statistics for Censored Environmental Data. Wiley, New York.
Henrion, M., and B. Fischhoff. 1986. Assessing uncertainty in physical constants. American Journal of Physics 54(9): 791-798.
Kahneman, D. and Lovallo, C. 1993. Timid choices and bold forecasts: A cognitive perspective on risk taking. Management Science 39: 17-31.
Killeen, P.R. 2005. Replicability, confidence, and priors. Psychological Science 16: 1009-1012.
Kolmogorov [Kolmogoroff], A. 1941. Confidence limits for an unknown distribution function. Annals of Mathematical Statistics 12: 461–463.
Lovallo, D. and Kahneman, D. 2003. Delusions of success: How optimism undermines executives’ decision. Harvard Business Review 81: 56-63.
Michelson, A.A., F.G. Pease and F. Pearson. 1935. Measurement of the velocity of light in a partial vacuum. Astrophysical Journal 82: 26. See Cutler (2001b).
Miller, L.H. 1956. Table of percentage points of Kolmogorov statistics. Journal of the American Statistical Association 51: 111–121.
Moore, R.E. 1966. Interval Analysis. Prentice Hall, Englewood Cliffs, New Jersey.
Moore, R.E. 1979. Methods and Applications of Interval Analysis. SIAM, Philadelphia.
Morgan, M.G., and M. Henrion. 1990. Uncertainty: A Guide to Dealing with Uncertainty in Quantitative Risk and Policy Analysis. Cambridge University Press, Cambridge.
Morris, P. and Hough, G.H. 1987. The Anatomy of Major Projects: A Study of the Reality of Project Management. New York, John Wiley and Sons.
Newman, M.C. 1993. Regression analysis of log-transformed data: statistical bias and its correction. Environmental Toxicology and Chemistry 12:1129–1133.
Pickrell, D. 1992. A Desire named streetcar: fantasy and fact in rail transit planning. Journal of the American Planning Association 58: 158-176.
Plous, S. 1993. The Psychology of Judgment and Decision Making. McGraw-Hill, New York.
Rabinovich, S. 1993. Measurement Errors: Theory and Practice. American Institute of Physics, New York.
Robine, J.-M., Allard, M. 1999. Jeanne Calment: Validation of the Duration of Her Life, in Jeune, B., Vaupel, J.W. (eds): Validation of Exceptional Longevity. Odense University Press.
Shapiro, S.S., and Wilk, M.B. 1965. An analysis of variance test for normality (complete samples). Biometrika 52:591–611.
Shewhart, W.A. 1939. Statistical Method from the Viewpoint of Quality Control. Graduate School, Department of Agriculture, Washington, DC.
Shlyakhter, A.I. 1993. Statistics of past errors as a source of safety factors for current models. Model Uncertainty: Its Characterization and Quantification, A. Mosleh, N. Siu, C. Smidts, and C. Lui (eds.). Center for Reliability Engineering, University of Maryland College Park, Maryland.
Shlyakhter, A.I. 1994. Uncertainty estimates in scientific models: lessons from trends in physical measurements, population and energy projections. Uncertainty Modelling and Analysis: Theory and Applications, B.M. Ayyub and M.M. Gupta (eds.), North-Holland-Elsevier Scientific Publishers. Chapter available at http://people.csail.mit.edu/ilya_shl/alex/94c_uncertainty_scientific_models_physical_measurements_projections.pdf.
Smirnov [Smirnoff], N. 1939. On the estimation of the discrepancy between empirical curves of distribution for two independent samples. Bulletin de l’Université de Moscou, Série internationale (Mathématiques) 2: (fasc. 2).
Soll, J.B., and Klayman, J. 2004. Overconfidence in interval estimates. Journal of Experimental Psychology: Learning, Memory, and Cognition 30: 299–314.
Springer, M.D. 1979. The Algebra of Random Variables. Wiley, New York.
Sprugel, D.G. 1983. Correcting for bias in log-transformed allometric equations. Ecology 64:209–210.
Stigler, S.M. 1996. Statistics and the question of standards. Journal of Research of the National Institute of Standards and Technology 101: 779-789.
Stuart, A., and Ord, J.K. 1994. Kendall’s Advanced Theory of Statistics. Edward Arnold, London.
Wachs, M. 1987. Forecasts in urban transportation planning: uses, methods and dilemmas. Climatic Change 11: 61-80.
Wachs, M. 1989. When Planners Lie with Numbers. Journal of the American Planning Association 55: 476-479.
Weon, B. M. and Je, J. H. 2009. Theoretical estimation of maximum human lifespan. Biogerontology 10: 65–71.
Whitt, W. 1976. Bivariate distributions with given marginals. The Annals of Statistics 4: 1280-1289.
Youden, W.J. 1972. Enduring values. Technometrics 14:1-11.
Yule, G.U. 1927. On reading a scale. Journal of the Royal Statistical Society 90:570–587.
Corrections can take several forms, depending on the intended use of the information and the information that is available. The most rudimentary correction shifts a point estimate according to the estimated magnitude of some systematic error. Such an algebraic correction is the negative of the estimated systematic error. For example, if in an economic model the estimate of a parameter is found to be incorrect resulting in cost estimates that are always $5 million too high, then all cost estimates can simply be reduced by $5 million to compensate for the misspecification of the model. A correction factor can be used when error is multiplicative instead of additive. For example, if cost estimates are on average 150% of the realized cost, then cost estimates can be adjusted downward by multiplying the estimate by a factor of 2/3, so that estimates will be unbiased on average.
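In R, these two rudimentary corrections amount to nothing more than the following (a sketch using the numbers in the text; the function names are ours):
additive_correction <- function(estimate, systematic_error = 5) estimate - systematic_error
multiplicative_correction <- function(estimate, ratio = 1.5) estimate / ratio   # equivalently, times 2/3
additive_correction(105)         # a $105 million estimate corrected for the known +$5 million error
multiplicative_correction(150)   # a $150 million estimate corrected for 150% average overestimation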
More advanced bias correction is also possible. These methods actually add uncertainty to estimates, providing a more realistic and information rich context for decision making. This requires more sophisticated mathematics but is easy to implement using available software.
NIST snippets:
2.1 In general, the result of a measurement is only an approximation or estimate of the value of the specific quantity subject to measurement, that is, the measurand, and thus the result is complete only when accompanied by a quantitative statement of its uncertainty.
NOTE - The difference between error and uncertainty should always be borne in mind. For example, the result of a measurement after correction (see subsection 5.2) can unknowably be very close to the unknown value of the measurand, and thus have negligible error, even though it may have a large uncertainty (see the Guide [2]).
The Hurricane Track Cone of Uncertainty
The National Hurricane Center of the National Oceanic and Atmospheric Administration in the United States Department of Commerce produces one of the most familiar examples of using validation studies to add uncertainty to deterministic model predictions. Each forecast of the position of the center of a tropical cyclone is compared with the later realized position and the distance between the two positions provides a measurement of the track forecast error.
Uncertainty is incorporated into future forecasts by surrounding each point estimate of the position of a tropical cyclone with a circle that covers two-thirds of past track forecast errors. Errors are only used if both the predicted and observed positions represent tropical or subtropical cyclones at the time of comparison. In the past a ten year window was used for track forecast errors, but recently this has been decreased to a five year window in order to reflect the recently improved accuracy of forecasts.
For a given hurricane season, there is a circle diameter for each forecast interval, and there are separate diameters for Atlantic cyclones and Pacific cyclones, which reflects the lower accuracy of model predictions of the more complicated weather patterns in the Atlantic (see Table). These circles, constructed to enclose two-thirds of historical errors, do appear to achieve roughly this coverage for future predictions in a given year, at least for the shorter forecast lead times (Majumdar and Finocchio 2010).
Radii of NHC forecast cone circles for 2010, based on error statistics from 2005-2009:
http://www.nhc.noaa.gov/aboutcone.shtml, accessed September 2, 2010.
http://www.nhc.noaa.gov/verification/verify4.shtml, accessed September 2, 2010.
Majumdar, S.J., and Finocchio, P.M. 2010. On the Ability of Global Ensemble Prediction Systems to Predict Tropical Cyclone Track Probabilities. Weather and Forecasting 25: 659–680.
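A sketch of the underlying calculation (our own illustration; track_errors_48h is a hypothetical vector of 48-hour forecast position errors, in nautical miles, from the preceding five seasons):
cone_radius_48h <- quantile(track_errors_48h, probs = 2/3)   # circle radius covering two-thirds of past track errors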
In some cases, the data are available to construct a distribution
The appropriate correction that would associate a distribution of some kind with a point estimate is not always obvious.
<<Point estimation should be contrasted with general Bayesian methods of estimation, where the goal is usually to compute (perhaps to an approximation) the posterior distributions of parameters and other quantities of interest. The contrast here is between estimating a single point (point estimation), versus estimating a weighted set of points (a probability density function). However, where appropriate, Bayesian methodology can include the calculation of point estimates, either as the expectation or median of the posterior distribution or as the mode of this distribution.>>
‘tare’ is the mass of a container deducted from gross mass to obtain net mass
Metaphors à gogo
A game of pin the uncertainty on the naked estimate.
These are imperial estimates, pronounced with hubris by economists.
But an imperial estimate has no clothes; it is a fraud of which we should be ashamed.
Vestments of uncertainty
Risk analysis is more than simply an elaboration of statistics and probability theory. The focus on sampling uncertainty to the exclusion of other possible uncertainties is inappropriate in risk analysis. In some situations, sampling uncertainty is not even the most important source of uncertainty. Such situations become more and more common as automated measurement devices, mechanized sampling, remote sensing technologies, and networks of sensors dramatically reduce the costs of sampling and increase sample sizes almost ad libitum. When sample sizes are very large, sampling uncertainty declines and other sources of uncertainty become relatively more important.
Neyman (1937) identified interval estimation ("estimation by interval") as distinct from point estimation ("estimation by unique estimate"). Neyman, J. (1937) "Outline of a Theory of Statistical Estimation Based on the Classical Theory of Probability" Philosophical Transactions of the Royal Society of London A, 236, 333–380.
<<I do not know whether it is technically legitimate, under the model Y ~ X + e, to infer the normality of X from the normality of Y and e and the independence of X and e. It seems like Flury or someone could come up with a counterexample. Asked on math.stat.sci: Assuming Y ~ X + e, where Y and e are normally distributed random numbers, and X and e are independent, is it necessarily true that X has a normal distribution whenever its variance is positive? Note: Cramér's decomposition theorem appears to settle this affirmatively; if the sum of two independent random variables is normally distributed, then each summand is itself normally distributed (possibly degenerate), so X must be normal whenever its variance is positive.>>
*The uncertainty about the speed of light officially became zero in 1983 when the SI unit meter was redefined in terms of c, the speed of light in a vacuum, making the speed of light a defined value of exactly 299,792,458 meters per second, with no empirical uncertainty.
Several studies have established that estimates produced by experts and lay people alike are commonly biased as a result of self-interest on the part of the persons making the estimates. For example, bids made by contractors under cost-plus-fee contracts regularly underestimate the actual costs of a project. Likewise, economic estimates of compliance costs of industrial regulation commonly overestimate the eventual true costs.
The effect of self-interest is generally consistent in direction and, although not always consistent in magnitude, it is often large enough that post hoc numerical corrections are warranted and important.
When the estimates include expressions of uncertainty, they are also usually overconfident, that is, the uncertainties are smaller than they ought to be.
Although simple scaling or shifting corrections are widely used, much better information is commonly available, and fully using this information yields corrected estimates that properly express uncertainty and are more suitable for use in risk analysis and decision making.
To account for the other information, these advanced corrections express biases as distributions or probability boxes rather than simple scalar values.
Corrections can be made in two distinct ways.
In the first way, an empirical distribution or p-box of errors (collected by an ancillary study) is convolved with each observed value.
In the second way, a deconvolution extracts, insofar as is possible, the blurring of the data by measurement error associated with the measurement protocol.
The first way acknowledges uncertainties and puts them into the estimates. The second way removes confounding uncertainty that contaminates the measurement process.
In both cases, the structure of errors can be characterized distributionally with arbitrary complexity. For instance, they may be zero-centered or directional, symmetric or asymmetric, balanced or skewed, and precisely or imprecisely specified.
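As a minimal illustration of the first way (a sketch only, with hypothetical numbers rather than results from an actual validation study), a naked point estimate can be convolved with an empirical error distribution by resampling:
import numpy as np
rng = np.random.default_rng(1)
point_estimate = 100.0                     # a deterministic prediction
# hypothetical realized-minus-predicted errors from an ancillary validation study
observed_errors = np.array([-8.0, -3.5, -1.0, 0.5, 2.0, 4.5, 6.0, 9.5, 12.0, 20.0])
# convolve the point estimate with the empirical error distribution
corrected = point_estimate + rng.choice(observed_errors, size=10000, replace=True)
print(np.percentile(corrected, [2.5, 50, 97.5]))   # an uncertainty interval for the estimate
The corrected estimate is no longer a single number but a distribution whose spread is inherited from the observed errors.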
Traditionally, quantitative bias or error is characterized by a scalar magnitude, that is, a simple, single number, and the correction of bias consists of untangling a model of how this error and the underlying true value collided in the observed measurement. The model is usually additive, so the correction is a subtraction, as with a tare weight, or multiplicative, so the correction is a division, as when the intensity of greenhouse gases is expressed in CO2-equivalents, although sometimes a more general error model is needed. In the context of risk analysis, however, this traditional conception of bias correction is insufficient because we are often estimating distributions rather than merely scalar values. The notion of bias is consequently potentially considerably more complicated. For instance, it might no longer be simply a leftward or rightward displacement of the value, but could also be an under- or overestimate of the variance of the distribution of values. For the present discussion, we consider bias in this broader sense, encompassing both the location of an estimate and the width of the uncertainty it expresses.
This view also generalizes the statistical conception of bias, which is usually understood to be a systematic (as opposed to random) distortion of a statistic arising from the sampling or estimation procedure.
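For contrast, the traditional scalar corrections mentioned above amount to a single subtraction or division; a trivial sketch (with made-up numbers) is:
gross_mass = 12.7      # observed measurement (kg), container included
tare = 0.4             # known additive bias: the mass of the container
net_mass = gross_mass - tare              # additive correction: 12.3 kg
co2_equivalent = 50.0  # reported emission in CO2-equivalent units
gwp = 25.0             # multiplicative factor (a global warming potential)
gas_mass = co2_equivalent / gwp           # multiplicative correction: 2.0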
<<Scott announces on 17 August: Adam says he should be able to locate several papers with comparisons between forecasts of regulatory costs and actualized regulatory costs. He thought he could find 12 maybe, or 6 meta analyses...there's an OTA review, a report by the consulting firm Putnam, Hayes & Bartlett, something by Winston Harrington at RFF entitled "On the accuracy of regulatory cost estimates. He offered the files now accessible at Manuscripts from Adam which happened to be on his memory stick, and the email with link below.
From: Branden Johnson <brandenjohnson59@gmail.com>
To: Adam Finkel <afinkel@law.upenn.edu>
In case you haven't seen it, the draft EPA report on whether ex post regulatory costs might be higher or lower than ex ante estimates, submitted to SAB for comment (http://yosemite.epa.gov/sab/sabproduct.nsf/fedrgstr_activites/3A2CA322F56386FA852577BD0068C654/$File/Retrospective+Cost+Study+3-30-12.pdf). Mostly hedging, but FYI (citation for later article? ideas for follow-on research?)>>
<< The following email from Adam gives excerpts from Baruch Fischhoff that may be extremely important for suggesting why scientists do not fully express their authentic uncertainties.
I just sent off this invited commentary to Issues in Science and Tech (NAS magazine)-- comments welcome although I know I'll be asked to shorten, not lengthen it. But I thought Baruch's article has some great lines both for the piece Dale and I are going to write and for some of the Apppl Biomath writings-- see esp these excerpts (emph added):
***************
In other cases, experts are willing to share their uncertainty but see no need because it seems to go without saying: Why waste decisionmakers’ valuable time by stating the obvious? Such reticence reflects the normal human tendency to exaggerate how much of one’s knowledge goes without saying. Much of cognitive social psychology’s stock in trade comes from documenting variants of that tendency. For example, the “common knowledge effect” is the tendency to believe that others share one’s beliefs. It creates unpleasant surprises when others fail to read between the lines of one’s unwittingly incomplete messages. The “false consensus effect” is the tendency to believe that others share one’s attitudes. It creates unpleasant surprises when those others make different choices, even when they see the same facts, because they have different objectives.
*******************
Even when their organization provides proper incentives, some experts might fear their colleagues’ censure should their full disclosure of uncertainty reveal their field’s “trade secrets.” Table 2 shows some boundary conditions on results from the experiments that underpin much decisionmaking research, cast in terms of how features of those studies affect the quality of the performance that they reveal. Knowing them is essential to applying that science in the right places and with the appropriate confidence. However, declaring them acknowledges limits to science that has often had to fight for recognition against more formal analyses (such as economics or operations research), even though the latter necessarily neglect phenomena that are not readily quantified. Decisionmakers need equal disclosure from all disciplines, lest they be unduly influenced by those that oversell their wares.
***********************
discussion about Adam's email between Scott Ferson and Helen Regan
Hey, Helen:
When I got the email below from Adam Finkel and read Baruch Fischoff's comment about trade secrets of course I thought of you and your experience with "speaking uncertainty to overconfidence", to turn a phrase. Wasn't it Lovejoy and the rates of extinction?
Could you tell me what your experience was? If you really did feel censured, then maybe you wouldn't want us to refer to your experience directly or by name--and I wouldn't do so unless you'd be comfortable with it. But I'm interested to know whether you think Baruch is correct, and in what you feel today about that whole situation.
I trust you're well and had a good summer. Are there Facebook pictures of the little one? I don't think I've seen any.
I went to Australia for the world congress on risk. Saw Mark Burgman for, literally, almost 15 minutes. It was fun nevertheless (apart from the plane ride itself). Do you make that schlep very often?
Scott
---------- Forwarded message ----------
[Fischhoff excerpts repeated as quoted above]
Helen May Regan <helen.regan@ucr.edu>
Thu, Aug 30, 2012 at 1:44 PM
To: SandP8 <sandp8@gmail.com>
Hi Scott,
Good to hear from you. The story you are thinking of is an anecdote I mentioned in my talk on fuzzy arithmetic and extinction rates at the Risk Analysis conference in Atlanta. I heard a talk by Robert May (now Baron May of Oxford!!) who presented some estimates of modern extinction rates (as if they were precise), concluding that we are in a mass extinction event, and said that it wasn't "bullshit, it's physics." I repeated the anecdote with the word "bullshit" and was criticized because I swore at a professional meeting, I was a girl and I wasn't wearing a suit. So I wasn't censured because I was suggesting that May's comment in itself was bullshit (because of the uncertain nature of the estimate and the complete lack of acknowledgement that it was highly uncertain), I was censured because I quoted the word bullshit (which to my mind was a bunch of bullshit).
I have been severely criticised for suggesting that Monte Carlo methods are insufficient to capture the full extent of uncertainty in models with uncertain parameters. In these cases opponents have argued that standard statistics captures the uncertainty very well. I think they totally miss the point that the data points in themselves are uncertain - they seem to think that the only errors are sampling errors. And I have also been subject to the regular Bayesian dogma but you have had way more experience than I have on that front.
But in quickly reading through the documents you attached, I don't think any of these experiences are what you want. But feel free to use them as you wish if they are at all useful.
My own cynical view is that under uncertainty decision makers appeal to preferences. Even when scientific information can inform probabilities it is the preferences that hold sway. I am starting to believe that the public and decision makers view science as only being useful in shedding light on what the potential outcomes of an action are, rather than providing probabilities of events. I have also seen instances where the probabilities are contaminated by the perceived severity of the outcome (e.g. one might argue that the high probabilities of an avian flu outbreak in Fischoff's table could be a confounding of the awfulness of an outbreak and the chance that it will occur). But then, we are in an election year and I am fed up with all the bullshit.
One thing Fischhoff doesn't do justice to is the great difficulty inherent in reporting uncertainty quantitatively and the resistance of scientists to use anything but their favorite method - as a result they only see uncertainty where they want to see it. In one of my awful experiences in presenting a p-bounds talk to mathematicians someone said (and there were lots of heads nodding) "if you start going down the path of considering uncertainty in data points then everything is uncertain and why bother - you can't say anything." I think that sums up the attitude of a lot of scientists. I am not suggesting that we don't try to report uncertainty but as someone who has tried I see how difficult it is in many of the models I currently use. I can't even imagine how to report it in the models we are currently developing that link many different models and many different data sources. All I can say is that, yeah, it's highly uncertain and I am unsure what the real value of any of it is except for exploring possible outcomes that we can't put a probability on.
Alright, you weren't asking for any of this so I'll stop.
I will send a separate email with photos of Gabriel. We have had to stop going to Australia for a while because the last time we went (Christmas 2011) Gabriel slept for 2 hours and for the remaining 12 hours he jumped, dived, kicked seatbacks, grabbed people's hair in the row in front, threw toys and books, screamed, squealed, ran down the aisles, struggled and generally behaved like a lunatic on speed. We were the worst family on the plane and I wanted to kill myself. We even tried to drug him with benadryl but it had no effect.
Yeah Burgman is a really high flyer now. I saw him in Santa Barbara in April this year which was good because he was forced to sit still for 2 days. Listening to Hugh Possingham and Mark talk about all their millions of dollars and gazillions of students/postdocs they have to manage and all the papers and attention they are getting made me feel exhausted but really glad that I have very little funding and a low profile. Incidentally, I also saw Manolo's wife Maria at the same meeting. I couldn't remember who she was which was really embarrassing. But she and her kids are doing really well in Ohio.
Okay, I hope you are doing well.
cheers,
Helen.
Dr Helen Regan
Associate Professor
Department of Biology
University of California
Riverside, CA 92521
USA
email: helen.regan@ucr.edu
phone: +1-951-827-3961
fax: +1-951-827-4286
office #: Speith 3358
Web: www.biology.ucr.edu/people/faculty/Regan.html
>>
Deconvolution
<<Adam requests: Can we make the examples about costs to make the paper more relevant to the project?>>
Most protocols for estimating statistical distributions assume that the measurement uncertainty associated with individual observations is negligible compared to the sampling uncertainty associated with the prescribed sample size. In cases where the measurement uncertainty should not be neglected, deconvolution can be used to correct the observed distribution for the measurement uncertainty in the data used to form it. Like ordinary bias correction, such a deconvolution uses information about the sign and magnitude of the error to produce the improved estimate, but it also makes use of the distributional shape and the dependence of the measurement error to improve the estimate of the variable.
For instance, suppose X is a random variable that cannot be measured perfectly because of imperfections in the measurement process. Let Y denote the observed value. The error model might be Y = X + e, where e denotes the measurement error.
Suppose further that sample information is available indicating that the variables are normally distributed and that X and e are independent.
If X and e are independent, σY² = σX² + σe²
• The correction accounts for mean, variance, and shape
• If e is a distribution, then X is always steeper than Y and possibly shifted from it
Example:
y ~ N(19, 7);
e ~ N(0, 5)
x ~ N(mean(y)-mean(e), sqrt(var(y)-var(e)))
x ~ N(19, √24)
y ~ N(19, 7)
e ~ N(0, 5)
clear; show y; show e in gray
x ~ N(mean(y)-mean(e), sqrt(var(y)-var(e)))
clear; show y; show x in blue
Y = x |+| e
clear; show y; show Y in red
x
~normal(range=[6.38106,31.6189], mean=19, var=24)
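The same calculation can be checked outside the p-box software. The sketch below (Python; our own verification, not part of the software session above) recovers x by moment matching and confirms that reconvolving it with e reproduces y.
import numpy as np
mean_y, sd_y = 19.0, 7.0        # observed y ~ N(19, 7)
mean_e, sd_e = 0.0, 5.0         # measurement error e ~ N(0, 5), independent of x
mean_x = mean_y - mean_e                        # 19
sd_x = np.sqrt(sd_y**2 - sd_e**2)               # sqrt(49 - 25) = sqrt(24) ≈ 4.90
print(mean_x, sd_x)
# Monte Carlo check that x |+| e reproduces y
rng = np.random.default_rng(2)
x = rng.normal(mean_x, sd_x, 100_000)
e = rng.normal(mean_e, sd_e, 100_000)
print(np.mean(x + e), np.std(x + e))            # close to 19 and 7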
<<Resit's paper (Akçakaya 2002) on taking demographic uncertainty out of sample variance to estimate environmental stochasticity>>
What is the numerical example for when the dependence between x and e is perfect? (A tentative sketch follows these questions.)
How can we implement deconvolution for EDFs so we don’t need to use normal theory?
How can we also account for non-sampling error?
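For the first question above, a tentative answer (an assumption on our part, not verified against the software) is that under perfect positive dependence the standard deviations, rather than the variances, add, so the deconvolution subtracts standard deviations:
mean_y, sd_y = 19.0, 7.0    # observed y as in the example above
mean_e, sd_e = 0.0, 5.0     # measurement error
# if x and e are perfectly positively dependent normals, sd(y) = sd(x) + sd(e)
mean_x = mean_y - mean_e    # 19
sd_x = sd_y - sd_e          # 2, instead of sqrt(24) under independence
print(mean_x, sd_x)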
P-box example
• Y = X + e
• Y, X, e ~ normal
• X ~ N(μY - μe, sqrt(max(0, σY² - σe²)))
• Y ~ N([18, 20], [1, 8]); e ~ N([-1, +1], [4, 5])
• X ~ N([18,20]-[-1,1], √(max(0, [1,8]²-[4,5]²)))
• X ~ N([17, 21], [0, 6.93])
y = N( [26,30], [6,9])
e = N( [-1,1], [4,5])
clear; show y; show e in gray
x = N( decon(mean(e), mean(y)), sqrt(decon(var(e), var(y))))
clear; show y; show x in blue
Y = x |+| e
clear; show y; show Y in red
x = N( mean(y) - mean(e), sqrt(var(y) - var(e)))
clear; show y; show x in red
Y = x |+| e
clear; show y; show Y in red
y = N( [18,122], [6,9])
e = N( [-10,10], [4,5])
clear; show y; show e in gray
x = N( decon(mean(e), mean(y)), sqrt(decon(var(e), var(y))))
clear; show y; show x in blue
Y = x |+| e
clear; show y; show Y in red
x = N( mean(y) - mean(e), sqrt(var(y) - var(e)))
clear; show y; show x in red
Y = x |+| e
clear; show y; show Y in red
Y ~ N([18, 20], [1, 8]);
e ~ N([-1, 1], [4, 5])
X ~ N([18,20]-[-1,1], sqrt(max(0, [1,8]^2-[4,5]^2)))
X ~ N([17, 21], [0, 6.93])
X
~normal(range=[-0.850497,38.8505], mean=[17,21], var=[0,48])
If e is a p-box, X may be shifted, steeper, and wider.
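The interval arithmetic behind this p-box example can be reproduced directly. The sketch below (Python; a re-derivation rather than output from the p-box software) recovers the bounds quoted above.
import math
mean_y = (18.0, 20.0); sd_y = (1.0, 8.0)    # Y ~ N([18,20], [1,8])
mean_e = (-1.0, 1.0);  sd_e = (4.0, 5.0)    # e ~ N([-1,1], [4,5])
# interval subtraction for the mean: [a,b] - [c,d] = [a-d, b-c]
mean_x = (mean_y[0] - mean_e[1], mean_y[1] - mean_e[0])        # (17, 21)
# variance bounds truncated at zero, as in the formula above
var_x = (max(0.0, sd_y[0]**2 - sd_e[1]**2),                    # max(0, 1 - 25) = 0
         max(0.0, sd_y[1]**2 - sd_e[0]**2))                    # 64 - 16 = 48
sd_x = (math.sqrt(var_x[0]), math.sqrt(var_x[1]))              # (0, 6.93)
print(mean_x, sd_x)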
Advanced bias correction
• Deconvolve an arbitrary p-box e (measurement uncertainty) out of a collection of measurements
• Distribution for e can be
– Centered off zero
– Asymmetric
– Imprecisely characterized
<<>>
The paper was written with support from the National Science Foundation (NSF), through a grant (<<#?>>) to Adam Finkel at the University of Pennsylvania Law School, and a subcontract to Applied Biomathematics, and also with support from the National Library of Medicine, a component of the National Institutes of Health (NIH), through a Small Business Innovation Research grant (award number RC3LM010794) to Applied Biomathematics funded under the American Recovery and Reinvestment Act. Thanks are due to the NSF <<project manager>> Robert O'Connor, and also Janos Hajagos, Bill Oberkampf, Lev Ginzburg, and Michael Balch<<unless he is an author>>. The opinions expressed herein are solely those of the authors, and not those of the University of Pennsylvania, Applied Biomathematics, the National Science Foundation, the National Library of Medicine, or the National Institutes of Health.
Akçakaya, H.R. (2002). Estimating the variance of survival rates and fecundities. Animal Conservation 5: 333–336.
<<our old deconvolution papers if any are relevant>>
<<any Resit papers on extracting measurement uncertainty>>
<<Oberkampf and Roy>>
ORIGINAL INTRODUCTION FOLLOWS
Several studies have shown that numerical estimates produced by experts and lay people alike are commonly biased as a result of self-interest on the part of the persons making the estimates. For example, bids made by contractors under cost-plus-fee contracts regularly underestimate the actual costs of a project<<ref>>. Likewise, economic estimates of compliance costs of industrial regulation commonly overestimate the eventual true costs<<ref>>. Industrial safety reports routinely understate the failures<<ref>>. Some hospitals do not enjoy full reporting of morbidity statistics<<ref>>. The effect of self-interest is generally consistent in direction and, although not always consistent in magnitude, it is often large enough that post hoc numerical corrections are warranted and important. It has also been empirically well established that, when numerical estimates include expressions of uncertainty, they are also usually overconfident, that is, the uncertainties are smaller than they ought to be. Although simple scaling, shifting or inflating corrections are widely used to account for such biases and generic overconfidence, much better distributional information is usually available to the analyst, and fully using this information can yield corrected estimates that properly express uncertainty and make them more suitable for use in risk analysis and decision making. To account for the other information, these advanced corrections express biases as distributions or probability boxes rather than simple scalar values. Corrections can be made in two distinct ways that will be useful in different analytical settings. In the first way, an empirical distribution or p-box of errors (established in a prior data-quality study or an ancillary validation study) is convolved with each observed value. This calculation acknowledges uncertainties associated with the estimation process and puts them into the estimated value. The second way involves a deconvolution that extracts, insofar as is possible, the blurring of the data by measurement error associated with the measurement protocol. The result is often a reduction in the variance of a distributional estimate because the deconvolution removes the confounding uncertainty that contaminated the measurement process. In both of these cases, the structure of errors can be characterized as a distribution or p-box with arbitrary complexity. For instance, the errors may be zero-centered or directional, symmetric or asymmetric, balanced or skewed, and precisely or imprecisely specified. We illustrate the requisite calculations to make these corrections with numerical examples.