The questions and tasks below are needed for Applied Biomathematics' NIH ARRA project "Compensating for Uncertainty Biases in Health Risk Judgments" (1RC3LM010794-01). For each need, we’ve tried to explain the problem and give a summary of any thoughts or progress toward its solution. It is possible that some of these problems have already been solved in the literature, and we simply need to be made aware of the solutions. In any case, we’d be happy to hear your thoughts on any of the topics. However, if a question or task is wrongly worded, contains mistakes, depends on falsehoods, or otherwise betrays confusion, please feel free to offer a correction, improvement or advice. And if you’re not sure there’s a problem, but you think there might be, please let us know rather than presuming we know what we’re doing. Please direct answers, ideas, and advice to Scott Ferson, scott@ramas.com, 1-631-751-4350.
The best place to read the list of research needs is still the original Microsoft Word document, but we're trying to transfer the needs to this collaborative Google Site in the various links below. Please contribute your answers, ideas, comments, and concerns directly on the Google Sites pages through the following links:
Quantiles of the medical test posterior given inputs are independent betas
Make a software library for robust Bayes analysis with p-boxes
Confidence bands for a distribution using an assumption such as normality
Cartesian product algorithm for products and quotients when inputs straddle zero
How do we get traditional confidence intervals for logistic regression?
Bayes’ rule for multiple tests without independence assumptions
Bayes’ rule with epistemic uncertainty and without independence assumptions
Sums, differences, etc., of p-boxes that are positively or negatively dependent
Generalize perfect and opposite convolutions for arbitrary operands
Derive and implement tail-emphasizing algorithms for convolutions
Facebook app to collect data to quantitatively define the ‘about’ modifiers
<<ideas after Bill Oberkampf left Sandia>>
1) Development of a workable method for sensitivity analysis for use within a Monte Carlo simulation in a risk assessment. Methods in current use are very expensive computationally and require the analyst to specify a “delta” range. Making a sensitivity analysis comprehensive also requires an enormous effort from the analyst to construct it and considerable effort just to understand the results. What is needed is a method with modest computational costs that can be used to characterize the sensitivity of risks (i.e., probabilities) to changes in model inputs. One method that is especially promising was proposed by Stanislav Uryasev. This method requires the derivation of a kernel function for each new kind of problem, but once the kernel has been defined, the method can cheaply compute true (infinitesimal) sensitivity estimates for many parameters and does not force the analyst to specify delta ranges in advance. The approach theoretically can be applied to arbitrary model parameters, notwithstanding the mathematical complexity of the model, nonlinearities, or correlation structure among the random variables. Remarkably, the calculation can be incorporated into an existing Monte Carlo simulation in a way that requires no further iterations of the Monte Carlo simulation beyond those needed to estimate the probability (risk) itself, and can often be accomplished with negligible additional computation. Indeed, because the sensitivities of any number of parameters can be computed simultaneously in this way, this approach offers vast savings of computational effort compared with the traditional approach to Monte Carlo sensitivity analysis.
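As a rough illustration of the kind of estimator involved (a minimal sketch of the score-function, or likelihood-ratio, form of the derivative of a probability, not necessarily Uryasev's exact kernel formulation), the sensitivity of P(f(X) > t) with respect to a distribution parameter can be estimated from the very same Monte Carlo samples used to estimate the probability itself. The model f, the input distributions and the threshold below are all invented for illustration.

```python
import numpy as np

# Hypothetical risk model: the risk is P(f(X, Y) > threshold)
def f(x, y):
    return x * y + 0.5 * x

rng = np.random.default_rng(1)
n = 100_000
mu_x, sd_x = 1.0, 0.3          # assumed (illustrative) input distributions
mu_y, sd_y = 2.0, 0.5
threshold = 3.5

x = rng.normal(mu_x, sd_x, n)
y = rng.normal(mu_y, sd_y, n)
exceed = (f(x, y) > threshold).astype(float)

risk = exceed.mean()           # the probability (risk) estimate itself

# Score-function sensitivities from the SAME samples, with no extra model runs:
# d/d(mu) P(f > t) = E[ 1{f > t} * d log p(X; mu, sd) / d(mu) ]
dRisk_dmu_x = (exceed * (x - mu_x) / sd_x**2).mean()
dRisk_dmu_y = (exceed * (y - mu_y) / sd_y**2).mean()

print("risk =", risk)
print("d risk / d mu_x =", dRisk_dmu_x)
print("d risk / d mu_y =", dRisk_dmu_y)
```

Any number of such sensitivities can be accumulated in the same loop that estimates the risk, which is the source of the computational savings described above.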
Customized week-long workshop on wrangling uncertainty in numerical models
The workshop will provide an introduction to the most important and useful methods and strategies for expressing and propagating aleatory and epistemic uncertainty through numerical models, focusing on imprecise probabilities and the modern distinction between kinds of uncertainties. The workshop will address the basic parameter specification problems of characterizing uncertainty for variables about which little empirical information is available, and the effects of inter-variable dependencies and how they (or uncertainty about them) can be incorporated into models. It will review and critique the common strategies for addressing model uncertainty. The workshop will include hands-on numerical exercises in which participants can build their experience and familiarity with the methods. The numerical problems will be designed for solution either by hand calculation or on multiple computing platforms, including R and Matlab. The workshop will also review the scalability of methods to large computational problems and emphasize methods that can be applied to complex numerical simulations. The outline of ancillary topics to be covered will be customized according to the interests of the organizers and participants. These topics might include decision making, sensitivity analysis, validation, solving equations, acceleration strategies for sampling, etc.
($40K)
Review of measures of compliance
The current two-part strategy of computing the k-factor for aleatory uncertainty and the confidence ratio for epistemic uncertainty is likely to be misleading for several compounding reasons. The main problem is that it divorces the two kinds of uncertainty and therefore cannot express the implications of their interaction. On top of this, the k-factor, a vestige of normal theory, relies on assumptions about the underlying distributions that may not be appropriate. For instance, if the distributions are asymmetric or heavier-tailed than normal distributions, the k-factor will poorly express the true risk of non-compliance. The confidence ratio is analogous to the k-factor and may also inappropriately assume symmetry. More generally, the current strategy will be hard to generalize for situations that may soon arise in which the threshold is itself uncertain or varying. What is needed is a more synthetic measure (or perhaps a set of related measures) of compliance that properly integrates aleatory and epistemic uncertainty and allows decision makers to be conservative in the face of the latter when they need to be. Such a measure should offer a natural interpretation of the riskiness of the performance and of the margin of safety so it can be useful in design and management strategies requiring defense in depth. The intuition the measure provides should be reliable in the sense that it does not depend on unstated assumptions. We will prepare a review that illustrates the weaknesses of the current strategies, outlines a more flexible synthetic measure of compliance, and applies both to an example problem.
($60K)
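As a small numerical illustration of the point above about the k-factor, the sketch below compares the exceedance probability beyond a mean + k·(standard deviation) threshold for a normal distribution and for a right-skewed distribution with exactly the same mean and standard deviation. The numbers are invented; the point is only that the same k corresponds to quite different risks of non-compliance once the normal assumption fails.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
k = 2.0                                  # illustrative k-factor

# Two performance distributions with the SAME mean and standard deviation
mean, sd = 10.0, 4.0
normal = rng.normal(mean, sd, n)

# Lognormal matched to the same first two moments
sigma2 = np.log(1.0 + (sd / mean) ** 2)
mu = np.log(mean) - sigma2 / 2.0
skewed = rng.lognormal(mu, np.sqrt(sigma2), n)

threshold = mean + k * sd                # the k-factor "compliance" boundary

print("P(exceed), normal distribution:        ", (normal > threshold).mean())
print("P(exceed), skewed same-moment analogue:", (skewed > threshold).mean())
```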
Feasibility study for an uncertainty compiler/analyzer for crystal box codes
A crystal box is a computation stream, usually embodied as source code for a computer language such as Fortran or C, which cannot be changed but whose inner workings are fully known to the compiler. An uncertainty compiler is a processor that translates source code into a form that facilitates the machine calculations needed to propagate parametric uncertainty through the mathematical operations expressed in the code. In practice, this will also require the analysis of the code to identify input variables which an analyst might nominate to have uncertainty. An interface is needed to list these variables and allow the analyst to characterize their uncertainty. Such characterizations could be as simple as intervals, but might also be probability distributions, Dempster-Shafer structures or p-boxes. The interface will be built on the previously developed Constructor software. A separate uncertainty compiler will translate the code into a language supported with a library of uncertainty operations. The compiler will track functional dependencies among variables in sequential calculations. It will introduce code that generates appropriate structures in place of inputs designated to be uncertain, and, when necessary, it will replace mathematical expressions that involve function calls or infix operators such as +, -, *, /, ^ or ** with procedure calls that effect the analogous convolutions with uncertainty in the arguments. This project will require a multi-year effort, but after the first year’s feasibility study we will have a working uncertainty compiler/analyzer for a class of straightforward implementations. Within the first year, we can demonstrate the feasibility of an uncertainty compiler for crystal box codes with a case study based on the spring-mass-damper problem (using the same uncertain inputs as those recently used by Jon Helton). A successful case study will help to design a plan for further development of the compiler to handle more intricate codes with intervariable dependence or cases when program flow control variables are themselves uncertain.
($50K)
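To make the operator-replacement idea concrete, here is a minimal sketch (in Python, standing in for whatever language the compiled code would actually target) of the kind of object the compiler would substitute for an input nominated as uncertain. Only intervals and the operators +, - and * are shown; the real uncertainty library would also supply probability distributions, Dempster-Shafer structures, p-boxes and dependence-aware convolutions.

```python
# Minimal stand-in for the uncertainty library the compiled code would link against.
class Interval:
    def __init__(self, lo, hi):
        self.lo, self.hi = float(lo), float(hi)

    def __add__(self, other):
        other = _as_interval(other)
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        other = _as_interval(other)
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        other = _as_interval(other)
        ps = [self.lo * other.lo, self.lo * other.hi,
              self.hi * other.lo, self.hi * other.hi]
        return Interval(min(ps), max(ps))

    __radd__, __rmul__ = __add__, __mul__

    def __repr__(self):
        return f"[{self.lo}, {self.hi}]"

def _as_interval(x):
    return x if isinstance(x, Interval) else Interval(x, x)

# Original (crystal box) statement:   y = a * x + b
# Compiled statement, with a and b designated uncertain by the analyst:
a = Interval(0.9, 1.1)
b = Interval(-0.2, 0.2)
x = 3.0
y = a * x + b      # the infix operators now dispatch to interval convolutions
print(y)           # approximately [2.5, 3.5]
```

Once such a class is in place, the compiler's job for the simplest cases reduces to declaring the nominated inputs as uncertain objects and leaving the rest of the crystal box code untouched, since the infix operators dispatch automatically.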
Relationship with Donald Estep’s ideas
We reviewed Donald Estep’s presentation at the CSRI’s validation workshop in August, although, without the benefit of the oral exposition, our understanding of it is probably incomplete. Of course, we generally agree with Estep’s thesis that one must combine statistical, probabilistic and deterministic analytical methods to quantify uncertainty in computations, but we might tend to go further and suggest that still other special methods are needed to address epistemic uncertainty. The methods he discusses can capture aleatory uncertainty, but they would be limited in a context where epistemic uncertainty is also important, at least assuming one needs to continue to distinguish between the aleatory and epistemic forms.
Estep considers numerical error as a source of uncertainty, and we agree that in many situations it is. But we tend to want to distinguish authentic uncertainty from artifactual uncertainty. Authentic uncertainty is the genuine unpredictability due to acknowledged input (and perhaps model) uncertainty. It generally cannot be reduced by a better analysis, although it can be better estimated. Artifactual uncertainty, in contrast, is due to deficiencies in the analyses or simulations applied to study a system. Such deficiencies might include, for instance, too few polynomial terms in a regression model, an overly coarse discretization or mesh size, or numerical instabilities of many kinds. Artifactual uncertainty can in principle be reduced by a better analysis. The challenge for uncertainty propagation methods is to faithfully project authentic uncertainty while being as insensitive as possible to the artifactual uncertainty. We think that a useful way to get a handle on this difference is to use ‘automatically verified’ computational methods that rigorously bound the true result of a calculation rather than approximate it. Of course we acknowledge the usefulness of approximation approaches generally, as well as the particular ones that Estep proposes, but we suggest that it might also be useful to employ verified methods that include guaranteed interval bounds on all results. The accumulation of iteration error that Estep describes on screen 66 of 90 (on the 50th of 73 slides) is a source of artifactual uncertainty. But, of course, without knowing the true solution (in red) in advance, it can be difficult to recognize that one is following an errant solution because of the accumulating error. The advantage of verified methods is that one can see right away when such divergence occurs because the outputs characterize their own reliability.
Estep’s fast adaptive parameter sampling (FAPS) uses error estimates to guide sampling interactively. This seems quite clever to us, and it should yield useful strategies in many cases. It is reminiscent of other recent ideas such as Uryasev’s method of computing estimates of infinitesimal derivatives for sensitivity studies involving Monte Carlo simulations. Both are smarter ways to use the limited sampling we have available by making use of analysis of the model being studied. Brute-force sampling approaches depend on an ability to evaluate samples ad libitum to achieve asymptotically good results. When we cannot make as many samples as we want because of computational constraints, it makes a lot of sense to make use of any extraneous knowledge that may be available about the system to improve the calculation. We would definitely want to use Estep’s approaches (as well as Uryasev’s by the way) in the uncertainty compiler we described above. They could be useful in evaluations whenever the inputs are pure probability distributions.
<<ideas before Bill Oberkampf left Sandia>>
Scott Ferson, Applied Biomathematics, scott@ramas.com, 1-631-751-4350
We’re actively interested in these topics. We’d be happy to discuss our ideas on any of them with you. In most cases, we already have some explanatory text that would give you a better idea of what we're thinking of.
Model uncertainty (why model averaging is bad, jets, and epsilon models)
Optimization with uncertain parameters
Decision theory under epistemic and aleatory uncertainty
Validation metrics and predictive capability
Extreme value theory under distributional uncertainty
Strategies and pitfalls of combining good data with poor data
Finding risk factors through a veil: logistic regression for imprecise inputs
Software for descriptive interval statistics
Software for robust Bayes analyses
Software implementing probability bounds analysis in R
Software for visualizing uncertainty and variability
Extend the Constructor software
Visualizing uncertainty and variability, especially in maps
Interval dependence (using dependence information within interval analysis)
Tree-lining likelihood (rather than maximizing likelihood)
Confidence limits for distributions (generalizing KS with shape information)
Propagating extreme tail risks
Assessing internal (non-tail) risks
Generalizing the Cauchy deviate method
Avoiding the inflation from repeated uncertain parameters
Rough calculations (computing with words, coarse uncertainty)
Propagating p-boxes through nonlinear ordinary differential equations
Establishing regulatory or specification compliance under uncertainty (QMU)
True infinitesimal sensitivity analysis within Monte Carlo simulation
Reliability analysis with missing data (Alison’s problem)
Uncertainty of the third kind (vagueness/fuzziness)
Uncertainty logic
Modus tollens in the imprecise probability context
Estimating the risk of N-out-of-K events from marginal risks
Order statistics for p-boxes
Sums of random variables with uncertain distributions or dependencies
Axiomatics for uncertainty algebras
Solving equations involving uncertain numbers (engineering design)
Statistics on data that may include intervals, sets or p-boxes
Sensitivity analysis under interacting epistemic and aleatory uncertainty
Model uncertainty. Quantitatively, perhaps the most important source of uncertainty in risk assessments is doubt about the structure of the model or the form of the risk expression. Yet this uncertainty is rarely even acknowledged, much less accounted for in a comprehensive way. If model uncertainty is addressed at all, one of two approaches is usually employed. The simpler is Apostolakis’ method of introducing a discrete random variable and letting its (randomly varying) value decide the model to use for the current simulation. The more complicated approach is Bayesian model averaging, which also computes a weighted stochastic mixture of the competing models. These applications are problematic because it seems likely that model uncertainty is almost always epistemic rather than aleatory in nature, and these methods for handling model uncertainty treat it as though it were aleatory. Confusing the two forms of uncertainty was explicitly criticized in the seminal report on risk assessment by the National Research Council, but few analysts know alternative methods for handling model uncertainty.
Another form of model uncertainty is related to the “surrogacy problem” in which we want data about a variable X but only have data on the related variable Y. Using the available data without modification obviously assumes that X = Y. A more respectable use would assume X = f (Y) for some unknown function f and account for the uncertainty about f. Even with very modest assumptions on the function f it is sometimes possible to circumscribe X tightly enough to allow a useful uncertainty propagation. Such an approach generalizes and legitimizes the use by some engineers of “model parameters” intended to represent model uncertainty associated with the use of data from surrogate variables.
The report will review the underlying issues and common approaches to recognizing and accounting for model uncertainty in quantitative risk assessments. The review will contrast the approach of the Bayesian school with possible approaches that respect epistemic uncertainty, and consider strategies for relaxing model structure using polynomial jets with genericity properties, model space embedding schemes, enveloping approaches, generalizations of the ± operation, etc. The report will also consider the use of tetherings such as interval-valued regressions, Tony O’Hagan’s method of modeling functions between known points with Gaussian distributions and smoothness parameters, etc. The report will also consider the use of imprecise probability methods such as Walley’s imprecise Dirichlet model, probability boxes, and related ideas intended to capture model uncertainty. The report will review the advantages and disadvantages of the various traditional and recently developed methods. It will consider the limitations (e.g., Bayesian model averaging requires the analyst to construct an explicit enumerated list of possible models) and suggest guidance about which methods should be used in what situations.
Task: Review the available approaches to account for model uncertainty in quantitative models, including epistemic uncertainty about the correct form of the model and uncertainty arising from use of data from surrogate variables.
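As a tiny numerical contrast of the two treatments discussed above, the sketch below propagates the same aleatory input through two invented competing model structures and compares the stochastic-mixture answer (the averaging-style treatment) with a simple envelope that keeps the model uncertainty epistemic. The models, weights and quantile are all placeholders for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(1.0, 0.2, n)           # shared aleatory input (illustrative)

# Two competing model structures (invented for illustration)
model_A = 2.0 * x + 1.0
model_B = x ** 2 + 2.5

q = 0.95                               # quantile of interest

# Treatment 1: stochastic mixture with weights 0.5/0.5 (averaging-style)
pick = rng.random(n) < 0.5
mixture = np.where(pick, model_A, model_B)
mix_q = np.quantile(mixture, q)

# Treatment 2: envelope, keeping model uncertainty epistemic and reporting bounds
qA, qB = np.quantile(model_A, q), np.quantile(model_B, q)
env_q = (min(qA, qB), max(qA, qB))

print("mixture 95th percentile:    ", mix_q)
print("envelope on 95th percentile:", env_q)
# The mixture returns a single value lying inside the envelope, masking the fact
# that we simply do not know which model structure is correct.
```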
Optimization with uncertain parameters. Many design problems are basically optimization problems in which analysts seek the values of arguments that extremize some function. For instance, we might like to design a rechargeable battery that has a long service life but can also, with very high reliability, meet a series of episodic power demands over time. The output requirements and charging opportunities may be imperfectly known at design time because of epistemic uncertainties and unforeseen variabilities. Optimization is often not used in early-phase design problems because of limitations on time, information or understanding about the processes. Having fast, flexible algorithms to solve optimization problems when both epistemic and aleatory uncertainty are present would be very useful to designers. Well-developed methods already exist for optimizing when parameters contain interval uncertainty. In principle, interval branch and bound can solve all bound-constrained global optimization problems where the solution is sought in some box. The general solution is NP-hard, but for small problems, feasible algorithms are available. Methods for when parameters are probability distributions are harder (http://stoprog.org), but their difficulty is partially due to the constraints represented by precise distribution functions. Solutions seem to be easier when distributions are relaxed to p-boxes or Dempster-Shafer structures because fewer constraints need to be satisfied. Because such problems can be decomposed into a series of interval problems, it should be possible to develop algorithms that could be very quick for small or moderate-sized optimization problems in which uncertainty is expressed with uncertain numbers.
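As a toy sketch of the interval branch-and-bound idea mentioned above, the following minimizes a one-dimensional polynomial over a box, using a naive interval extension to supply verified lower bounds. The objective and tolerance are invented for illustration; a real design problem would of course be multivariate.

```python
import heapq

def isq(lo, hi):
    """Tight interval extension of x**2 over [lo, hi]."""
    a, b = lo * lo, hi * hi
    if lo <= 0.0 <= hi:
        return 0.0, max(a, b)
    return min(a, b), max(a, b)

def iadd(a, b): return a[0] + b[0], a[1] + b[1]
def isub(a, b): return a[0] - b[1], a[1] - b[0]
def iscale(c, a):
    lo, hi = c * a[0], c * a[1]
    return (lo, hi) if lo <= hi else (hi, lo)

def f(x):                          # illustrative objective with two local minima
    return x**4 - 4.0 * x**2 + x

def F(lo, hi):                     # naive interval extension of f (encloses its true range)
    s = isq(lo, hi)                # x**2
    q = isq(*s)                    # x**4
    return iadd(isub(q, iscale(4.0, s)), (lo, hi))

def interval_branch_and_bound(lo, hi, tol=1e-6):
    best = min(f(lo), f(hi))                    # incumbent upper bound on the minimum
    heap = [(F(lo, hi)[0], lo, hi)]             # boxes keyed by their interval lower bound
    while heap:
        lb, a, b = heapq.heappop(heap)
        if lb > best:                           # box cannot contain anything better
            continue
        m = 0.5 * (a + b)
        best = min(best, f(m))                  # sample the midpoint to tighten incumbent
        if b - a < tol:                         # box small enough; stop refining it
            continue
        for c, d in ((a, m), (m, b)):
            lbc = F(c, d)[0]
            if lbc <= best:
                heapq.heappush(heap, (lbc, c, d))
    return best

print(interval_branch_and_bound(-3.0, 3.0))     # approximates the global minimum value
```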
Establishing compliance under uncertainty. The most common reason for conducting probabilistic uncertainty assessments is to determine whether engineered systems are performing (or will perform) their intended functions. For instance, certification of the reliability of a stockpile may be needed without the benefit of full-system test data. Essentially similar issues arise in settings where regulatory compliance or materials specification performance must be demonstrated. Quantification of margins and uncertainties (QMU) has been proposed to make such assessments, but the working details and properties of such assessments have not yet become clear in contexts where epistemic and aleatory uncertainties are both present. In deterministic assessments, this determination was often as simple as comparing an expected value of a critical assessment output to a target value. When the assessment is probabilistic, this comparison is not quite so simple if either the output value or the target value is a probability distribution. Are we in compliance with our goals if 90% of the output values compare favorably with the target value(s)? Conditions of compliance can be expressed at several levels. For instance, one may require every value or some percentage of values of a random variable to be smaller than a certain value, or one might demand only that the mean of a distribution be smaller than the value. In a context where there is significant epistemic uncertainty, the comparisons become even more complicated than in the probabilistic case. Another level is added to the complexity of stating goals and measuring compliance. Examples are needed that show decision makers how compliance could be defined in such contexts.
Task: Develop strategies and arguments to express compliance or non-compliance with quantitative performance targets (1) when performance is represented by an uncertain number, and (2) when the target and the performance are both represented by uncertain numbers.
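A minimal sketch of the two cases named in the task, using Monte Carlo samples and invented numbers. Independence between the performance and the target is assumed in case (2), which is itself a modeling choice one might want to relax; the interval target at the end shows how epistemic uncertainty turns the single compliance probability into a pair of bounds.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500_000

# (1) Uncertain performance compared with a fixed target
performance = rng.normal(8.0, 1.5, n)        # illustrative performance distribution
target = 11.0
print("P(performance <= fixed target):", (performance <= target).mean())

# (2) Both performance and target uncertain
# (a) aleatory uncertainty in the target: a distribution (independence assumed)
target_dist = rng.normal(11.0, 0.8, n)
print("P(performance <= random target):", (performance <= target_dist).mean())

# (b) epistemic uncertainty in the target: an interval [10, 12].
# The compliance probability becomes an interval (a crude p-box-style bound):
p_low  = (performance <= 10.0).mean()        # if the target sits at its low end
p_high = (performance <= 12.0).mean()        # if the target sits at its high end
print("P(compliance) is between", p_low, "and", p_high)
```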
Validation metrics for uncertain numbers. The dissimilarity between two observations, or between an observation and a theoretically expected value, is a fundamental consideration in comparing models to data. If the values are scalar points or vectors, various measures of dissimilarity (known as metrics) can be defined. When the values are distributions (which are special kinds of infinite-dimensional vectors), the measures of dissimilarity can be more complicated because distributions can overlap, and they can be in close agreement over some part of their range but in stark disagreement over other parts. The most popular ways to measure dissimilarity between probability distributions involve entropy or some of its many mathematical cousins. When the values being compared are general uncertain numbers (i.e., intervals, p-boxes, Dempster-Shafer structures, probability distributions, or random sets on the real line), the question of how to measure dissimilarity can be much more subtle. Typically, one would like the metric to be a scalar measure itself that assesses in some overall sense the dissimilarity between uncertain numbers. But there may be cases in which it would be more informative and useful to distinguish distances in two senses, say, one concerned with epistemic uncertainty and one concerned with aleatory uncertainty.
Task: Assemble a SAND report that reviews the mathematical considerations in developing validation metrics between uncertain numbers and strategies for inferring predictive capability of simulation models validated by imprecise data.
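One candidate scalar metric is the area between the empirical distribution functions of the simulation outputs and of the data; the sketch below computes it from two samples (the samples are invented). When either comparand is a p-box rather than a precise distribution, something more elaborate is needed, which is part of what the task above would address.

```python
import numpy as np

def area_metric(a, b):
    """Area between the empirical CDFs of two samples (a scalar mismatch measure)."""
    a, b = np.sort(np.asarray(a, float)), np.sort(np.asarray(b, float))
    grid = np.union1d(a, b)                            # all CDF breakpoints
    Fa = np.searchsorted(a, grid, side="right") / a.size
    Fb = np.searchsorted(b, grid, side="right") / b.size
    # Both step CDFs are constant on each interval [grid[i], grid[i+1])
    widths = np.diff(grid)
    return float(np.sum(np.abs(Fa[:-1] - Fb[:-1]) * widths))

# Illustrative use: model predictions vs. experimental observations (made-up numbers)
rng = np.random.default_rng(0)
model_output = rng.normal(10.0, 1.0, 1000)
observations = rng.normal(10.5, 1.3, 40)
print(area_metric(model_output, observations))   # in the units of the quantity itself
```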
Extend the Constructor software. The Constructor software has been released for beta review. During this review process the software will be updated continuously in preparation for a public release. The release will include revised and expanded documentation, including a revised software manual with full descriptions of all the software functions, context-sensitive on-line help, and an updated website. The program itself will be revised to take account of reviewer comments and to add several planned extensions, features and capabilities, including fitting confidence limits on probability distributions to observed sample data, supporting other aggregation methods (in addition to intersection and enveloping), and a superstructure allowing the user to invoke these aggregations. The software and documentation will also be revised to emphasize its use in summarizing the numerical results from black box calculations.
Task: Revise the Constructor software in response to beta reviewer comments, complete the software documentation, and implement planned program extensions, including features to fit data to distributional hypotheses.
Tree-lining likelihood. Much of estimation theory involves Fisher’s idea of maximizing likelihood, that is, finding the parameters of a model that would have the largest probability of producing the data that has actually been observed. When data are sparse, however, the likelihood function can be shallow so that the likelihood associated with the parameter at the peak of the function is not very different from other parameters that might be nearby. In such situations, it makes sense to look at the set of parameters that have likelihoods that are larger than some critical level—that are above a treeline of the function whose altitude is specified. Typically, a set of parameters may emerge as all pretty good at explaining the observed data. Such a set of parameters would characterize a p-box or similar structure in imprecise probability. In some extreme cases of data sparseness, it may be that no parameter achieves the specified level, which would imply that the data are insufficient to determine the model at all. Tree-lining could also be applied to other decision criteria that are usually maximized, such as entropy, utility, etc., in each case producing sets of results each of which satisfy the criterion well.
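A minimal sketch with a binomial example (the data and the 15% treeline are invented): instead of reporting only the maximizing parameter, we keep every parameter whose relative likelihood clears the treeline, which here yields an interval of plausible values for the binomial probability p.

```python
import numpy as np
from scipy.stats import binom

k, n = 3, 12                       # illustrative data: 3 successes in 12 trials
p_grid = np.linspace(0.001, 0.999, 999)
like = binom.pmf(k, n, p_grid)

treeline = 0.15                    # keep parameters whose relative likelihood exceeds 15%
keep = p_grid[like >= treeline * like.max()]

print("maximum-likelihood estimate:", p_grid[np.argmax(like)])
print("tree-lined set of p values: [%.3f, %.3f]" % (keep.min(), keep.max()))
```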
Deconvolution. Develop analytical methods to solve equations involving uncertainty, such as characterizing B when AB=C and A and C are characterized by Dempster-Shafer structures or probability boxes. (It is easy to show that assuming B=C/A can lead to terribly wrong results.) Deconvolutions are necessary in planning how to achieve performance standards. For instance, how can B be allowed to vary given that certain constraints on C must be maintained?
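The parenthetical claim is easy to see with plain intervals, the simplest uncertain numbers. In the sketch below (all numbers invented), the naive back-calculation B = C/A is much wider than the B that actually generated C, and re-multiplying by A badly violates the constraint.

```python
def imul(a, b):
    ps = [a[0]*b[0], a[0]*b[1], a[1]*b[0], a[1]*b[1]]
    return (min(ps), max(ps))

def idiv(a, b):                    # assumes 0 is not contained in b
    return imul(a, (1.0/b[1], 1.0/b[0]))

A = (2.0, 4.0)
B = (1.0, 3.0)                     # the "true" B we would like to recover
C = imul(A, B)                     # the constraint: C = A*B = (2, 12)

B_naive = idiv(C, A)               # naive back-calculation B = C/A
print("naive B =", B_naive)                 # (0.5, 6.0): much wider than the true (1, 3)
print("A * naive B =", imul(A, B_naive))    # (1.0, 24.0): badly violates the constraint C
```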
Propagating extreme tail risks. Most numerical schemes currently used for representing and propagating uncertainty are based on specified probability discretizations or equiprobability discretizations, which may often be too crude for discerning tail risks of interest in disaggregative models or fault tree analyses of engineered systems. We will implement tail-emphasizing algorithms for computing sums (or differences, products, quotients, conjunctions, disjunctions, negations, etc.) of uncertain numbers to replace the current schemes. For instance, a mixed scheme maintaining equal-probability steps in middle ranges but order-of-magnitude steps in the tails that remembers quantile bounds for the probability levels 0, 10⁻²⁰, 10⁻¹⁹, …, 0.0001, 0.001, 0.01, 0.02, 0.03, 0.04, …, 0.98, 0.99, 0.999, 0.9999, …, 1−10⁻¹⁹, 1−10⁻²⁰, and 1, would be able to make quantitative assertions about risks between zero and 10⁻²⁰ even though it uses only 1+18+99+18+1 = 137 discretization levels. [A page-long description of this proposed task is available.]
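A sketch generating exactly the probability levels listed above. Note that in ordinary double precision the complements 1 − 10⁻¹⁷ and beyond round to 1.0, so an implementation would want to store the upper-tail levels as complements rather than as literal floating-point values.

```python
left_tail  = [10.0**-e for e in range(20, 2, -1)]     # 1e-20, 1e-19, ..., 1e-3   (18 levels)
middle     = [i / 100.0 for i in range(1, 100)]       # 0.01, 0.02, ..., 0.99     (99 levels)
right_tail = [1.0 - 10.0**-e for e in range(3, 21)]   # 1-1e-3, ..., 1-1e-20      (18 levels)
levels = [0.0] + left_tail + middle + right_tail + [1.0]
print(len(levels))                                    # 137
```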
Sampling theory. Although many analysts employ statistical confidence intervals as though they were rigorous bounds by using them in interval analyses and calculations with Dempster-Shafer structures, it is clear that this is not justified mathematically. Yet the need to make some use of available data drives analysts to use unjustified tools. But what is the correct approach for going from sample data to a Dempster-Shafer structure? We will explore the sampling theory for uncertain numbers and, specifically, the implications of applying projection methods designed for rigorous bounds to uncertain numbers derived from statistical confidence statements. [If this task is undertaken, we may be able to harness the efforts of a postdoc in the U.K.]
Uncertainty algebra. Develop a list of fundamental facts about uncertainty algebra. In many cases, these allow analysts to foretell the results from an arithmetic computation from mere inspection of the form of the problem. For instance, in some situations an analyst can tell that dependencies between variables will have no effect on calculated results. In other cases, an analyst can recognize equivalences that allow a vast improvement in computational cost to obtain an answer. [A draft outline is available.]
Imprecise functions. Develop analytical techniques to propagate uncertain numbers through an imprecise map f : Rⁿ → U (where R denotes the reals and U denotes the set of uncertain numbers) using natural extension principles. This ability will enable us to relax the restriction of response-surface methods to deterministic black box codes. This is a practically important special case of the proposed task on model uncertainty.
Quantitative study of acceleration strategies. Compare the performance characteristics of the acceleration strategies in realistic example problems. Contrast the strategies in terms of their usefulness in different situations.
Software for acceleration strategies. Develop software implementing the most promising acceleration strategies studied in the previous task. The programming will be done in C++ and will adhere to the specifications for add-ons to Sandia’s Dakota software. This task is contingent on the effort implied by the previous task.
Visualization tools. Scientists and engineers are not accustomed to viewing displays of uncertain numbers. An extensive psychometric literature has shown that human perception is very easily misled by common graphical and numerical techniques. These perception faults are closely analogous to optical illusions in that they are hard-wired in human brains. Empirical studies have shown that this is true for quantitatively trained people such as doctors and even statisticians, as well as for the lay public. Because humans are so easily tricked, useful visualization software must employ special strategies to prevent misconceptions and facilitate understanding of the essential information and uncertainties to be communicated. We will develop visualization tools that can interactively display the uncertainty and respond to the specific concerns and predispositions about risks of individual viewers. [Prototype software is available.]
Solving equations involving uncertain numbers. Algorithms are needed to solve equations that involve uncertain numbers. Such problems are ubiquitous within engineering design. How can we design an insulation and cooling subsystem when we cannot precisely specify the distribution of thermal insults the device will encounter? We could design for the “worst case”, but this approach is often wasteful, and it cannot make any account of more extreme conditions than are assumed to be (but may not actually be) impossible. On the other hand, it doesn’t make sense to presume that we know the exact distribution of thermal insults, especially for untested designs in new environments. Uncertain numbers (i.e., Dempster-Shafer structures and p-boxes) are a natural compromise that can express gross uncertainty (like worst case and interval analyses) and also distributional information about tail probabilities (like probabilistic analysis). Techniques will be developed that solve equations of the form A o B = C, where o is a mathematical operation and A and C are known or prescribed uncertain numbers. When there is epistemic or aleatory uncertainty in the terms of the equation, the solution cannot be obtained simply by solving for the unknown as is possible when the terms are purely deterministic. The constraint-satisfaction solution B is such that, so long as the design guarantees that some b is within B, the result A o b is guaranteed to be within the target constraint represented by the uncertain number C.
Imprecise statistics. Several scientific and mathematical articles have appeared over the last decade that have addressed disparate topics involving statistics for indeterminate data (i.e., intervals and related objects). In addition to the rather large literatures devoted to the topics of "censoring" and "missing data", there has been at least one book (by Charles Manski) and several individual papers on methods, e.g., a paper on using the Ipop statistic computed from interval data as a variant of the Mantel test for clustering, a paper on discriminant function analysis, and a few papers published by us on computing basic descriptive statistics for interval data (mean, median, variance, etc.). These papers have appeared in various and sundry journals and have rarely cited each other. Although they clearly constitute a branch of the general topic of robust statistics, there has been very little synthesis of this literature that would facilitate its wider use by the engineering community. We believe that a well-written introduction to this topic would be timely and could be very influential in both the engineering and statistics communities in a way that improves their reception of the issues attending epistemic uncertainty.
It is fair to say that the bulk of statistical literature over the last century has focused on assessing and projecting sampling uncertainty (that which arises from having measured only a subset of the population of possible measurands), and has in comparison neglected the problem of assessing and projecting the imprecision of these measurements. (The exception to this is a series of seriously flawed papers from NIST on the subject.) This lack of attention has spawned a great ignorance about the importance of imprecision and epistemic uncertainty more widely.
In many settings, one would intuitively expect there to be a tradeoff between precision and sample size of measurements. For instance, one might be able to spend a unit of additional resources on measurement either to increase the number of samples or to improve the precision of the individual samples. Consider, for example, the problem of estimating the mean of some random quantity. We might use an upper confidence limit on the mean to account for the sampling uncertainty associated with having made only a few measurements. The upper confidence limit is affected by the sample size and also by the precision of the individual measurements. Because statisticians commonly neglect the quantification of measurement precision, they seem to have been lulled into thinking that each measurement is infinitely precise. (This might actually be true for certain kinds of counts, but it is fairly uncommon in most settings involving measurement of physical quantities.) Some statisticians, and certainly many practitioners, therefore seem to believe that the tradeoff always favors increasing the number of samples over improving their precision. This is clearly a mistake, as can easily be shown by straightforward numerical examples. The misunderstanding seems to have originated from neglecting the importance of epistemic uncertainty.
The report will explain that the distribution function for an interval data set is a Dempster-Shafer structure and what this fact means for the statistics for such data sets. It will also include a review of the literature on descriptive statistics for interval data, including computability limits, an introduction to the literature on inferential statistics on interval data, and a full discussion of the tradeoff between sample size and precision and guidance on how engineers should handle this tradeoff in practical settings. It will include simple yet practically important examples constructed to show that the tradeoff between sample size and precision is far from universally favoring one side. The report will include a description of the nonlinearities involved in such tradeoffs and will explain how these nonlinearities imply that the optimal investment of empirical resources between increasing sampling and improving measurements depends on the specific details of the magnitudes of the different kinds of uncertainties.
Task: Review the literature on descriptive statistics (mean, variance, median, correlation, etc.) and inferential statistics (tests of means, regressions, etc.) for interval data.
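A minimal sketch of two of the facts mentioned above, with an invented interval data set: the sample mean of interval data is itself an interval, and the empirical "distribution function" is a pair of bounding CDFs (the p-box, or Dempster-Shafer, view of the data).

```python
import numpy as np

# Illustrative interval data set: each measurement known only to within its bounds
data = [(1.2, 1.8), (2.0, 2.1), (0.9, 1.6), (2.4, 3.0), (1.5, 1.9)]
lows  = np.array([lo for lo, hi in data])
highs = np.array([hi for lo, hi in data])

# Interval-valued mean: every configuration of true values inside the intervals
# yields a sample mean inside these bounds
print("mean is in [%.3f, %.3f]" % (lows.mean(), highs.mean()))

# Bounds on the empirical CDF (the p-box / Dempster-Shafer structure):
def cdf_bounds(x):
    upper = np.mean(lows  <= x)    # largest possible fraction of values <= x
    lower = np.mean(highs <= x)    # smallest possible fraction of values <= x
    return lower, upper

for x in (1.0, 1.8, 2.5):
    lo, up = cdf_bounds(x)
    print("F(%.1f) is between %.2f and %.2f" % (x, lo, up))
```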
Sensitivity analysis. Traditional approaches to sensitivity analysis focus almost exclusively on effects as measured by variance or change in variance. This perspective has been quite limiting and can produce highly misleading results when epistemic uncertainty is large.
The importance or “influence” of a parameter in an uncertainty analysis is the degree to which its uncertainty contributes to the uncertainty of the output. Some analysts assert that there can be at most 10 influential variables if we define an ‘influential variable’ as one that contributes no less than 10% of the uncertainty. This claim depends on the idea that what’s important about uncertainty is the variance, and the properties of the variance, particularly, the fact that the variance can be partitioned into components.
Of course, variance is not the only possible measure of uncertainty. In fact, very few of the reasonable measures of uncertainty actually behave in this way. Consider, for instance, a parameter’s range, i.e., the difference between the largest and smallest possible values. It is obviously another possible measure of uncertainty and it is commonly used for this purpose. It does not partition like variance does however. For example, consider the following simple uncertainty analysis conducted with interval analysis. Suppose there are 3 parameters to be multiplied together. And suppose for this example that the uncertainty about these parameters is such that each ranges on the interval [0, 2]. Obviously, the range of the product is just [0, 2³] = [0, 8]. Replacing any one of the parameters by its midpoint would reduce the range of the product by half to [0, 4]. If we measured the importance of a parameter by the reduction in the width of these intervals, we would say that the importance of each of the three parameters was 50%. Now suppose that there are many such parameters to be multiplied. No matter how many parameters there are, the importance of each is 50%. This little example shows that uncertainty, as distinguished from variance, need not partition in the way analysts who think only of variance think it must.
The force of this example does not depend on the uncertainties of the inputs being similar in magnitude. If the importance of a parameter is measured by the percent reduction of uncertainty associated with removing the parameter from the model (or pinching it to its mean or some other scalar value), uncertainty analysts often observe that the sum of these importance values for the various parameters adds up to something larger than 100%.
It is clearly not reasonable to claim that range is a somehow odd or unreasonable measure of uncertainty. The underlying truth is that uncertainty is a complicated and multivariate notion that cannot really be captured completely by the fairly simplistic notion of variance. Thus, variance is not the only measure of uncertainty, and moreover, variance is often not even a very useful measure of uncertainty if it is exceedance risks or tail probabilities that are of concern (which they usually are in risk assessments). Variance may partition, but uncertainty in the wider sense may not. Because the idea that partitioning of variance extends to other measures of uncertainty can lead to dangerous misconceptions, I call it the “anova myth”.
The report will review why variance is not the only possible measure, and often not even the best measure, of uncertainty. It will review a variety of other measures and suggest a schema for considering these measures. The report will motivate and illustrate the application of differentiation techniques to calculation problems for which the inputs are uncertain numbers (intervals, probability distributions, p-boxes, Dempster-Shafer structures, and random sets on the real line). The report will also describe the use of automatic differentiation for studying sensitivity of crystal boxes which are implementations for which explicit source code is available but cannot be changed. The report will also review the use of various “pinching” strategies in which uncertain numbers are replaced by hypothetical representations with reduced uncertainty. The reductions may be in variability, in incertitude or in both. These different kinds of pinchings are manifested as replacements of uncertain numbers by zero-variance intervals, precise distributions, and point values. There is no analog of the first two kinds of pinching in traditional Monte Carlo sampling approaches. The results of this approach will be compared to traditional methods for sensitivity analysis. The relative advantages and limitations of both will be highlighted. Practical guidance will be offered for analysts facing the general case, in which epistemic uncertainty plays a significant role, about what variance-based methods are good for and when alternative methods are to be preferred.
Task: Review and develop methods for conducting sensitivity analyses for models whose inputs are uncertain numbers (intervals, p-boxes, Dempster-Shafer structures, probability distributions, and random sets on the real line). This effort will necessitate the use of different measures of uncertainty that could be appropriate for non-probabilistic analyses.
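The little interval example worked above (three inputs on [0, 2] multiplied together) can be reproduced in a few lines, measuring importance as the percent reduction in output interval width when one input is pinched to its midpoint:

```python
def imul(a, b):
    ps = [a[0]*b[0], a[0]*b[1], a[1]*b[0], a[1]*b[1]]
    return (min(ps), max(ps))

def product(intervals):
    out = (1.0, 1.0)
    for iv in intervals:
        out = imul(out, iv)
    return out

def width(iv):
    return iv[1] - iv[0]

inputs = [(0.0, 2.0)] * 3                 # three inputs, each on [0, 2]
base = product(inputs)                    # (0, 8)

for i in range(len(inputs)):
    pinched = list(inputs)
    mid = 0.5 * (inputs[i][0] + inputs[i][1])
    pinched[i] = (mid, mid)               # pinch input i to its midpoint
    reduced = product(pinched)
    importance = 100.0 * (1.0 - width(reduced) / width(base))
    print("input %d: pinched product = %s, importance = %.0f%%" % (i, reduced, importance))
# Each input shows 50% importance; the importances sum to 150%, not 100%.
```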
1. Approximation methods. Continue the effort begun this year to collate, review, implement and quantitatively study the behavior of methods that can be used to compute a surely conservative projection of uncertainty through a black box. Because such strategies trade off optimality of the results for computational convenience in obtaining them, they could be useful in screening assessments for problems where sampling is severely limited. The idea is to make calculations in a way that is sure to be conservative about uncertainty (i.e., sure not to underestimate uncertainty) but which can be completed without exhausting computational effort. In last year’s research, three approximation methods were studied: the Cauchy deviate method, the Kolmogorov-Smirnov method, and the Saw et al. inequality method. In the Cauchy deviate method, complicated Dempster-Shafer structures representing inputs were replaced with coarse intervals that enclose the structure completely and an approximate sampling strategy based on Cauchy deviates is used to propagate the intervals through a black box function. The deviates straddle and extend beyond the ranges of the intervals, but corrections to the summary results allow good estimates of the output interval when the function is roughly linear or the uncertainties are relatively small. In the Kolmogorov-Smirnov method, a black box function is treated as an oracle that produces a sample output for any set of inputs. Because these outputs are independent (if the sets of inputs are) and identically distributed, Kolmogorov-Smirnov confidence limits for a distribution can be used to obtain bounds on the distribution that account for uncertainty arising from the small sample size. The Saw et al. inequality method appeals to a generalization of the Chebyshev inequality, which yields bounds on the tail risks of a quantity given the mean and variance of the quantity. The sampling of the black box function is thus reduced to a problem of estimating the mean and variance of the output. Follow-on research this year will explore other possible approximation strategies. The effort will include comparing the performance characteristics of the approximation strategies in realistic example problems and contrasting the strategies in terms of their usefulness in different situations.
2. Software for approximation methods. Develop software implementing the most promising approximation methods considered in the previous task. The programming will be done in C++ and will adhere to the specifications for add-ons to Sandia’s Dakota software. This task is not contingent on the effort on the previous task, although it would be enhanced by it.
Old tasks
Whittling. Explore the practical application, as screening tools, of simple methods that permit the conservative propagation of uncertainty through a black box.
The idea here is to solve the problem by whittling away at a crude, conservative expression of the uncertainty, rather than expending a lot of computational power to construct the correct or tightest possible answer. For instance, replacing each input Dempster-Shafer structure with a simple interval having the same support reduces the problem to a simple (multivariate) interval problem. The Cauchy deviate method may be used to solve this problem if the black-box function is fairly linear over the uncertainty range. The result produced by this method would be conservative, and one might typically expect it to be highly conservative. But even if it is, the result may still be useful in management decisions or in demonstrating compliance to some target goal. Note that this approach is conservative with respect to ignorance about the dependencies among the input parameters.
Another comparably simple approach is to treat the black box as though it were a natural system and treat the sample calculations as though they were measurements of some process or quantity in the real world. Of course, the black box is unlike the natural world in at least one critical way: measurement error and stochastic variability are essentially zero in the black box itself, although both are still present in our problem (they’re in the inputs). If the inputs used to generate the samples are random and the sampling satisfies representativeness assumptions, then it would be reasonable to use standard statistical approaches to characterize these samples to make inferences about the underlying population of potential samples that would come from uncertain inputs. For instance, conservative, distribution-free Kolmogorov-Smirnov confidence limits could be constructed about the probability distribution of sample values. Alternatively, the inequalities of Saw et al. (which generalize the classical Chebyshev inequalities) could be used to infer conservative, distribution-free limits on the tail risks based on the moment estimates observed from the samples.
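As a small aside, here is a sketch of the distribution-free confidence band just mentioned, built from a handful of black-box outputs (the outputs are invented); the closed-form Dvoretzky-Kiefer-Wolfowitz bound is used as a convenient stand-in for tabulated Kolmogorov-Smirnov critical values.

```python
import numpy as np

def ks_style_band(samples, alpha=0.05):
    """Distribution-free confidence band for the CDF from i.i.d. samples.
    Uses the closed-form Dvoretzky-Kiefer-Wolfowitz bound as a stand-in
    for tabulated Kolmogorov-Smirnov critical values."""
    x = np.sort(np.asarray(samples, float))
    n = x.size
    eps = np.sqrt(np.log(2.0 / alpha) / (2.0 * n))
    F_hat = np.arange(1, n + 1) / n
    lower = np.clip(F_hat - eps, 0.0, 1.0)
    upper = np.clip(F_hat + eps, 0.0, 1.0)
    return x, lower, upper

# Treat the black box as an oracle: a handful of sampled outputs (made-up here)
rng = np.random.default_rng(0)
outputs = rng.lognormal(1.0, 0.4, 25)       # e.g. 25 affordable black-box runs
x, lo, up = ks_style_band(outputs)
print("at x = %.2f the CDF is between %.2f and %.2f" % (x[12], lo[12], up[12]))
```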
Because each of these methods uses very little information, and because each is intentionally conservative, they will all produce very wide bounds about the output variable. However, it is not clear how the results will compare to one another. One might expect that the interval approach would always yield the broadest answer, but this may not be true because small sample sizes will strongly inflate the Chebyshev bounds. For a particular black box and sample size, will the interval-Cauchy method yield an interval that is almost totally inside the Chebyshev-Saw limits? Or will it be the other way around? Will the Kolmogorov-Smirnov limits be tighter or looser? It seems likely that which methods are tight and which are loose will depend partially on the underlying model implemented in the black box. Therefore it may be useful to pursue all of them at the same time. It is entirely possible that one could superimpose these methods to obtain an inference about the result that is more informative than any alone.
It will be important to flesh out the details of how one can compute these crude bounds. Weak distributional assumptions (such as unimodality) that might be justified by knowledge of the underlying mechanisms implemented in the black box could allow a further tightening of the bounds. There are many advantages to simple random sampling, but low-order moment estimates are often improved by Latin hypercube sampling. Are there general features of the black box function that would tend to favor one sampling strategy over another? Are there tradeoffs between the efficacies of the methods?
Because the crude bounds are so simple and so conservative, such an approach will clearly not produce the best possible estimates of how the black box behaves given the stated uncertainty in the inputs. But we don’t necessarily want the best possible estimates. We want estimates that are good enough for the particular uses we need them for. Such analyses may serve as an analytical screening tool. For instance, if the answer shows that the system is compliant, it would not be necessary to expend further computational effort to get a finer, more resolved estimate. A better answer wouldn’t tell us anything more useful. A screening tool can also be used to classify the various subquestions considered in a compliance assessment and distinguish them as clearly compliant or deserving further specific study to establish compliance.
This whittling approach is complementary to Steve’s approach because, like his, it is conservative and simple, but adds the idea that moments in addition to ranges may contain useful information and be easy to propagate. The “quadratic transformations” idea should be complementary to Jon’s work because it could be used in conjunction with his sampling when the number of dimensions becomes too large for direct sampling.
Quadratic methods. Explore the use of efficient low-order (particularly, quadratic) representations for black box codes and develop algorithms for projecting uncertain inputs through such models.
The number of uncertain parameters and the general complexity of a black box code can severely constrain our ability to use sampling to propagate uncertainty through it. A low-order model will probably be needed when there are limits on the number of samples that can be practically computed for black box codes. Consider the minimum number of samples needed to parameterize a quadratic model. In the following table, n denotes the number of uncertain input parameters and k the complexity of the uncertain number used to represent what is known about an input (perhaps in terms of the number of intervals in a Dempster-Shafer structure per input).
Surface         Minimum samples to specify           Effort to specify and propagate uncertainty
Linear          n + 1                                nk + n + 1
Quadratic       n(n - 1)/2 + 2n + 1                  n(n - 1)/2 + 2n + 1 + nk
Monotonic       2ⁿ                                   k·2ⁿ
Well-behaved    O(2²ⁿ) (via a search algorithm)      k·O(2²ⁿ)
The minimum number of samples required to specify the model increases dramatically as the generality of the surface increases. Moreover, the number of samples is also a sharply increasing function of the dimension of the problem, that is, how many inputs there are to the model. It takes two points to specify a line, three a plane, and so forth. If n = 6, determining a linear model requires an absolute minimum of 7 generic samples. Determining a quadratic surface in 6 variables requires 28 samples. Contrast this with the minimum number of samples that would be required to propagate uncertainty through a model which is a monotonic (n-increasing) surface. For n = 6, one would need 64 samples.
Although an assumption of monotonicity is in one sense much weaker than linearity, it is probably too strong for practical use. It is unlikely that many of the black box codes of interest will be monotonic. It therefore seems reasonable to seek an approach aimed at a compromise between the oversimplicity of a linear model and the unworkable complexity of a general model. Quadratic surfaces are the simplest structure that begin to show the complexities of maxima, minima and saddle points which arise from functional tradeoffs in a system response. The designation “well-behaved” in the table means a quadratic-like function for which there is a special point that divides the input space into 2ⁿ monotonic quadrants.
The second column of the table above contains crude order of magnitude estimates of the computational effort required to both specify the surface via sampling and propagate uncertainty through the resulting model. Note that there are certain economies embodied in the formulas in this column. As the number of dimensions n increases, an approach based on a quadratic model offers a remarkably moderate increase in the computational effort demanded, at least in comparison to more general surfaces. In particular, the additional effort for uncertainty projection is merely nk more than the effort for sampling, rather than k times the sampling effort. It’s additive rather than multiplicative.
Any methodology employing a fitted regression model will need to assume that sampling is nearly perfect in the sense that the black box function is deterministic and (relatively) error-free.
Transformations. Explore the use of transformations that diagonalize paired quadratic forms for reducing repeated variables in the quadratic model and simultaneously accounting for the variance-covariance matrix among uncertain inputs.
It is well known that a single transformation can simultaneously diagonalize two quadratic functions if at least one of them is positive definite. This is not just an existence proof; we know how the transformation must be constructed. This theorem can be used to solve the repeated variables problem in the quadratic model and simultaneously account for the variance-covariance matrix among uncertain inputs. A quadratic model will have coefficients for the terms x, y, ..., z, xy,..., xz, ..., x², y², ..., z². The transformation will create an equivalent model with only the x, y, ..., z, x², y², ..., z² terms (which will then be easy to factor into an expression in terms of only x, y, ..., z, by repeatedly completing the squares). The same transformation can also be constructed so that it simultaneously accounts for the covariance structure among the various inputs. (The variance-covariance matrix is the positive definite form.) Although this approach cannot handle intricate details of complicated dependencies among variables, it should in principle be able to capture the overall patterns such as are represented in correlation coefficients.
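A minimal numerical check of the simultaneous-diagonalization fact, using the generalized symmetric eigenproblem. The matrices below are random placeholders: A stands for the symmetric matrix of quadratic-model coefficients and B for the positive-definite variance-covariance matrix of the inputs.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n = 4

# A: symmetric matrix of quadratic-model coefficients (illustrative)
M = rng.normal(size=(n, n))
A = (M + M.T) / 2.0

# B: positive-definite variance-covariance matrix of the inputs (illustrative)
L = rng.normal(size=(n, n))
B = L @ L.T + n * np.eye(n)

# Generalized eigenproblem A v = lambda B v; the eigenvector matrix V
# simultaneously diagonalizes both quadratic forms
lam, V = eigh(A, B)

print(np.allclose(V.T @ B @ V, np.eye(n)))      # True: covariance form becomes the identity
print(np.allclose(V.T @ A @ V, np.diag(lam)))   # True: quadratic form becomes diagonal
```

The columns of V define the change of variables that removes the cross terms of the quadratic model and whitens the correlated inputs at the same time.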
Compare methods. Quantitatively evaluate the performances of the following approaches for propagating uncertain numbers through a black box code by applying them to the same numerical example. The numerical example will be the mass-spring-damper model, which can also be solved analytically. The methods are (i) an ad libitum sampling strategy based on Cauchy deviates, (ii) Jon and Cliff’s sampling strategy (if it’s different), (iii) whittling methods, and (iv) the quadratic transformation method.
1.Approximation methods. Continue the effort begun this year to collate, review, implement and quantitatively study the behavior of methods that can be used to compute a surely conservative projection of uncertainty through a black box. Because such strategies trade off optimality of the results for computational convenience in obtaining them, they could be useful in screening assessments for problems where sampling is severely limited. The idea is to make calculations in a way that is sure to be conservative about uncertainty (i.e., sure not to underestimate uncertainty) but which can be completed without exhausting computational effort. In last year’s research, three approximation methods were studied: the Cauchy deviate method, the Kolmogorov-Smirnov method, and the Saw et al. inequality method. In the Cauchy deviate method, complicated Dempster-Shafer structures representing inputs were replaced with coarse intervals that enclose the structure completely and an approximate sampling strategy based on Cauchy deviates is used to propagate the intervals through a black box function. The deviates straddle and extend beyond the ranges of the intervals, but corrections to the summary results allow good estimates of the output interval when the function is roughly linear or the uncertainties are relatively small. In the Kolmogorov-Smirnov method, a black box function is treated as an oracle that produces a sample output for any set of inputs. Because these outputs are independent (if the sets of inputs are) and identically distributed, Kolmogorov-Smirnov confidence limits for a distribution can be used to obtain bounds on the distribution that account for uncertainty arising from the small sample size. The Saw et al. inequality method appeals to a generalization of the Chebyshev inequality, which yields bounds on the tail risks of a quantity given the mean and variance of the quantity. The sampling of the black box function is thus reduced to a problem of estimating the mean and variance of the output. Follow-on research this year will explore other possible approximation strategies. The effort will include comparing the performance characteristics of the approximation strategies in realistic example problems and contrasting the strategies in terms of their usefulness in different situations.
2. Software for approximation methods. Develop software implementing the most promising approximation methods considered in the previous task. The programming will be done in C++ and will adhere to the specifications for add-ons to Sandia’s Dakota software. This task is not contingent on the completion of the previous task, although it would be enhanced by it.
Old tasks
Whittling. Explore the practical application, as screening tools, of simple methods that permit the conservative propagation of uncertainty through a black box.
The idea here is to solve the problem by whittling away at a crude, conservative expression of the uncertainty, rather than expending a lot of computational power to construct the correct or tightest possible answer. For instance, replacing each input Dempster-Shafer structure with a simple interval having the same support reduces the problem to a simple (multivariate) interval problem. The Cauchy deviate method may be used to solve this problem if the black-box function is fairly linear over the uncertainty range. The result produced by this method would be conservative, and one might typically expect it to be highly conservative. But even if it is, the result may still be useful in management decisions or in demonstrating compliance with some target goal. Note that this approach is conservative with respect to ignorance about the dependencies among the input parameters.
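For concreteness, here is a minimal Python sketch of this interval-Cauchy whittling step. Everything named below is illustrative rather than an existing library: the representation of a Dempster-Shafer structure as a list of (lo, hi, mass) focal elements, the function names, and the sample size are assumptions, and the scale-estimation step is the usual maximum-likelihood recipe for Cauchy deviates, which presumes the black box is roughly linear over the box.

```python
import numpy as np

def cauchy_deviate_interval(f, midpoints, halfwidths, n_samples=200, seed=0):
    """Sketch of the Cauchy deviate method: propagate an interval box
    through a roughly linear black box f and return an approximate
    enclosing interval for its output."""
    rng = np.random.default_rng(seed)
    mid = np.asarray(midpoints, float)
    half = np.asarray(halfwidths, float)
    y0 = f(mid)

    # Standard Cauchy deviates scaled by each input's half-width; they
    # deliberately spill beyond the intervals, and the scale estimate
    # below supplies the correction mentioned in the text.
    c = rng.standard_cauchy(size=(n_samples, mid.size))
    d = np.array([f(mid + half * ck) - y0 for ck in c])

    # For linear f, d is Cauchy(0, D) with D = sum_i |df/dx_i| * half_i.
    # Maximum-likelihood estimate of D: solve sum d^2/(D^2 + d^2) = N/2.
    lo, hi = 1e-12, float(np.max(np.abs(d))) + 1e-12
    for _ in range(200):                       # simple bisection
        D = 0.5 * (lo + hi)
        if np.sum(d**2 / (D**2 + d**2)) > n_samples / 2:
            lo = D                             # D too small
        else:
            hi = D
    D = 0.5 * (lo + hi)
    return y0 - D, y0 + D

def whittle_to_interval(f, ds_inputs, **kwargs):
    """Whittling step: collapse each Dempster-Shafer input, given here as
    a list of (lo, hi, mass) focal elements, to its support interval and
    push the resulting box through the black box."""
    los = [min(lo for lo, hi, m in ds) for ds in ds_inputs]
    his = [max(hi for lo, hi, m in ds) for ds in ds_inputs]
    mids = [(l + h) / 2 for l, h in zip(los, his)]
    halfs = [(h - l) / 2 for l, h in zip(los, his)]
    return cauchy_deviate_interval(f, mids, halfs, **kwargs)

# Toy illustration: a nearly linear black box with two uncertain inputs.
f = lambda x: 2.0 * x[0] - 0.5 * x[1]
ds_a = [(0.9, 1.1, 0.6), (1.0, 1.3, 0.4)]
ds_b = [(1.8, 2.0, 0.5), (1.9, 2.2, 0.5)]
print(whittle_to_interval(f, [ds_a, ds_b]))
```

The point is not the particular numbers but that the whole whittling pipeline costs only a few hundred black-box evaluations, no matter how detailed the input structures are.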
Another comparably simple approach is to treat the black box as though it were a natural system and treat the sample calculations as though they were measurements of some process or quantity in the real world. Of course, the black box is unlike the natural world in at least one critical way: measurement error and stochastic variability are essentially zero in the black box itself, although both are still present in our problem (they’re in the inputs). If the inputs used to generate the samples are random and the sampling satisfies representativeness assumptions, then it would be reasonable to use standard statistical approaches to characterize these samples to make inferences about the underlying population of potential samples that would come from uncertain inputs. For instance, conservative, distribution-free Kolmogorov-Smirnov confidence limits could be constructed about the probability distribution of sample values. Alternatively, the inequalities of Saw et al. (which generalize the classical Chebyshev inequalities) could be used to infer conservative, distribution-free limits on the tail risks based on the moment estimates observed from the samples.
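A hedged sketch of this second pair of ideas, assuming the black-box outputs really can be treated as an i.i.d. sample: the band below uses the Dvoretzky-Kiefer-Wolfowitz inequality, which gives the same kind of distribution-free confidence limits as the Kolmogorov-Smirnov construction mentioned above, and the tail bound is the plain Chebyshev inequality with the moments treated as known. The sharper Saw et al. form, which pays a penalty for the moments being estimated from a small sample, would replace it in a real implementation.

```python
import numpy as np

def dkw_band(samples, alpha=0.05):
    """Distribution-free confidence band for the output distribution from
    the Dvoretzky-Kiefer-Wolfowitz inequality: with probability 1 - alpha
    the true CDF lies within eps of the empirical CDF everywhere."""
    x = np.sort(np.asarray(samples, float))
    n = x.size
    ecdf = np.arange(1, n + 1) / n
    eps = np.sqrt(np.log(2.0 / alpha) / (2.0 * n))   # DKW half-width
    return x, np.clip(ecdf - eps, 0.0, 1.0), np.clip(ecdf + eps, 0.0, 1.0)

def chebyshev_tail(mean, variance, t):
    """Plain two-sided Chebyshev bound P(|Y - mean| >= t) <= variance/t^2,
    treating the moments as known.  The Saw et al. inequality is the
    rigorous replacement when only sample moments are available."""
    return min(1.0, variance / t**2)

# Hypothetical use on the outputs of 50 black-box runs.
y = np.random.default_rng(1).normal(10.0, 2.0, size=50)   # stand-in outputs
x, lower, upper = dkw_band(y)
print(chebyshev_tail(y.mean(), y.var(ddof=1), 4.0))
```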
Because each of these methods uses very little information, and because each is intentionally conservative, they will all produce very wide bounds about the output variable. However, it is not clear how the results will compare to one another. One might expect that the interval approach would always yield the broadest answer, but this may not be true because small sample sizes will strongly inflate the Chebyshev bounds. For a particular black box and sample size, will the interval-Cauchy method yield an interval that is almost totally inside the Chebyshev-Saw limits? Or will it be the other way around? Will the Kolmogorov-Smirnov limits be tighter or looser? It seems likely that which methods are tight and which are loose will depend partially on the underlying model implemented in the black box. Therefore it may be useful to pursue all of them at the same time. It is entirely possible that one could superimpose these methods to obtain an inference about the result that is more informative than any alone.
It will be important to flesh out the details of how one can compute these crude bounds. Weak distributional assumptions (such as unimodality) that might be justified by knowledge of the underlying mechanisms implemented in the black box could allow a further tightening of the bounds. There are many advantages to simple random sampling, but low-order moment estimates are often improved by Latin hypercube sampling. Are there general features of the black box function that would tend to favor one sampling strategy over another? Are there tradeoffs between the efficacies of the methods?
Because the crude bounds are so simple and so conservative, such an approach will clearly not produce the best possible estimates of how the black box behaves given the stated uncertainty in the inputs. But we don’t necessarily want the best possible estimates. We want estimates that are good enough for the particular uses we need them for. Such analyses may serve as an analytical screening tool. For instance, if the answer shows that the system is compliant, it would not be necessary to expend further computational effort to get a finer, more resolved estimate. A better answer wouldn’t tell us anything more useful. A screening tool can also be used to classify the various subquestions considered in a compliance assessment and distinguish them as clearly compliant or deserving further specific study to establish compliance.
This whittling approach is complementary to Steve’s approach because, like his, it is conservative and simple, but adds the idea that moments in addition to ranges may contain useful information and be easy to propagate. The “quadratic transformations” idea should be complementary to Jon’s work because it could be used in conjunction with his sampling when the number of dimensions becomes too large for direct sampling.
Quadratic methods. Explore the use of efficient low-order (particularly, quadratic) representations for black box codes and develop algorithms for projecting uncertain inputs through such models.
The number of uncertain parameters and the general complexity of a black box code can severely constrain our ability to use sampling to propagate uncertainty through it. A low-order model will probably be needed when there are limits on the number of samples that can be practically computed for black box codes. Consider the minimum number of samples needed to parameterize a quadratic model. In the following table, n denotes the number of uncertain input parameters and k the complexity of the uncertain number used to represent what is known about an input (perhaps in terms of the number of intervals in a Dempster-Shafer structure per input).
Model          Minimum samples to specify the surface     Effort to specify and also propagate uncertainty
Linear         n + 1                                      nk + n + 1
Quadratic      n(n - 1)/2 + 2n + 1                        n(n - 1)/2 + 2n + 1 + nk
Monotonic      2^n                                        k·2^n
Well-behaved   O(2^(2n)) (via a search algorithm)         k·O(2^(2n))
The minimum number of samples required to specify the model increases dramatically as the generality of the surface increases. Moreover, the number of samples is also a sharply increasing function of the dimension of the problem, that is, how many inputs there are to the model. It takes two points to specify a line, three a plane, and so forth. If n = 6, determining a linear model requires an absolute minimum of 7 generic samples. Determining a quadratic surface in 6 variables requires 28 samples. Contrast this with the minimum number of samples that would be required to propagate uncertainty through a model which is a monotonic surface (monotone in each of its n inputs). For n = 6, one would need 2^6 = 64 samples.
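The 2^n figure for monotonic surfaces is just corner evaluation: when the black box is monotone in each argument but the directions are unknown, its extremes over a box occur at the box’s vertices. A minimal sketch, in which the function and bounds are hypothetical:

```python
from itertools import product

def corner_range(f, bounds):
    """Bounds on a black box over a box of intervals when f is monotone in
    each argument but the directions are unknown: the extremes occur at
    vertices, so all 2**n corners are evaluated (64 when n = 6)."""
    values = [f(corner) for corner in product(*bounds)]
    return min(values), max(values)

# Hypothetical monotone toy model with three interval inputs.
f = lambda x: x[0] * x[1] + x[2]          # monotone for positive inputs
print(corner_range(f, [(1.0, 2.0), (0.5, 1.5), (0.0, 0.3)]))
```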
Although an assumption of monotonicity is in one sense much weaker than linearity, it is probably too strong for practical use. It is unlikely that many of the black box codes of interest will be monotonic. It therefore seems reasonable to seek an approach aimed at a compromise between the oversimplicity of a linear model and the unworkable complexity of a general model. Quadratic surfaces are the simplest structures that begin to show the complexities of maxima, minima and saddle points which arise from functional tradeoffs in a system response. The designation “well-behaved” in the table means a quadratic-like function for which there is a special point that divides the input space into 2^n monotonic quadrants.
The right-hand column of the table above contains crude order-of-magnitude estimates of the computational effort required to both specify the surface via sampling and propagate uncertainty through the resulting model. Note that there are certain economies embodied in the formulas in this column. As the number of dimensions n increases, an approach based on a quadratic model offers a remarkably moderate increase in the computational effort demanded, at least in comparison to more general surfaces. In particular, the additional effort for uncertainty projection is merely nk more than the effort for sampling, rather than k times the sampling effort. It’s additive rather than multiplicative.
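The bookkeeping in the table is easy to mechanize; the sketch below simply evaluates the formulas (the O(2^(2n)) entry is coded as 4^n) and, for n = 6 with an arbitrary illustrative k = 10, reproduces the 7, 28 and 64 samples cited above.

```python
def sample_counts(n, k):
    """Minimum effort figures from the table above, as (samples to fit the
    surface, effort to fit it and also propagate uncertainty), for n inputs
    each described by an uncertain number of complexity k."""
    quad = n * (n - 1) // 2 + 2 * n + 1
    return {
        "linear":       (n + 1,   n * k + n + 1),
        "quadratic":    (quad,    quad + n * k),
        "monotonic":    (2 ** n,  k * 2 ** n),
        "well-behaved": (4 ** n,  k * 4 ** n),   # 2^(2n) = 4^n, via search
    }

# For n = 6 and k = 10 this gives 7, 28 and 64 samples in the first column.
print(sample_counts(6, 10))
```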
Any methodology employing a fitted regression model will need to assume that sampling is nearly perfect in the sense that the black box function is deterministic and (relatively) error-free.
Transformations. Explore the use of transformations that diagonalize paired quadratic forms for reducing repeated variables in the quadratic model and simultaneously accounting for the variance-covariance matrix among uncertain inputs.
It is well known that a single transformation can simultaneously diagonalize two quadratic functions if at least one of them is positive definite. This is not just an existence proof; we know how the transformation must be constructed. This theorem can be used to solve the repeated variables problem in the quadratic model and simultaneously account for the variance-covariance matrix among uncertain inputs. A quadratic model will have coefficients for the terms x, y, ..., z, xy,..., xz, ..., x², y², ..., z². The transformation will create an equivalent model with only the x, y, ..., z, x², y², ..., z² terms (which will then be easy to factor into an expression in terms of only x, y, ..., z, by repeatedly completing the squares). The same transformation can also be constructed so that it simultaneously accounts for the covariance structure among the various inputs. (The variance-covariance matrix is the positive definite form.) Although this approach cannot handle intricate details of complicated dependencies among variables, it should in principle be able to capture the overall patterns such as are represented in correlation coefficients.
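A minimal numerical sketch of the construction, assuming the quadratic model’s second-order coefficients have been collected into a symmetric matrix A and the inputs’ covariance into a positive-definite matrix C (both matrices below are made-up examples). The transformation comes from the generalized symmetric eigenproblem A v = λ C v, here solved with scipy.linalg.eigh:

```python
import numpy as np
from scipy.linalg import eigh

def simultaneous_diagonalize(A, C):
    """Congruence transformation T that diagonalizes a symmetric matrix A
    (the quadratic model's second-order coefficients) while reducing the
    positive-definite covariance C to the identity:
    T.T @ C @ T = I and T.T @ A @ T = diag(lam).  This is the generalized
    symmetric eigenproblem A v = lam C v."""
    lam, T = eigh(A, C)      # scipy normalizes T so that T.T @ C @ T = I
    return lam, T

# Made-up 2-variable example: a quadratic with a cross term and a pair of
# correlated inputs; the transformation removes both couplings at once.
A = np.array([[2.0, 0.7], [0.7, 1.0]])    # coefficients of x^2, xy, y^2
C = np.array([[1.0, 0.4], [0.4, 0.5]])    # positive-definite covariance
lam, T = simultaneous_diagonalize(A, C)
print(np.round(T.T @ C @ T, 10))          # identity: correlations gone
print(np.round(T.T @ A @ T, 10))          # diagonal: cross terms gone
```

In the transformed coordinates the covariance is the identity and the quadratic part has no cross terms, which is exactly the pair of simplifications described above; the remaining linear terms need only the usual completion of squares.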
Compare methods. Quantitatively evaluate the performance of four approaches for propagating uncertain numbers through a black box code by applying them to the same numerical example. The numerical example will be the mass-spring-damper model, which can also be solved analytically. The methods are (i) an ad libitum sampling strategy based on Cauchy deviates, (ii) Jon and Cliff’s sampling strategy (if it’s different), (iii) whittling methods, and (iv) the quadratic transformation method.
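For reference, one plausible form of the analytic benchmark is the free response of an underdamped mass-spring-damper; the formula below is the textbook solution, but the parameter values are placeholders and the actual comparison problem may use a forced or steady-state variant.

```python
import numpy as np

def msd_free_response(m, c, k, z0, v0, t):
    """Analytic free response of an underdamped mass-spring-damper,
    m z'' + c z' + k z = 0, with initial displacement z0 and velocity v0."""
    wn = np.sqrt(k / m)                    # natural frequency
    zeta = c / (2.0 * np.sqrt(k * m))      # damping ratio (assumed < 1)
    wd = wn * np.sqrt(1.0 - zeta**2)       # damped frequency
    return np.exp(-zeta * wn * t) * (
        z0 * np.cos(wd * t) + (v0 + zeta * wn * z0) / wd * np.sin(wd * t))

# Placeholder parameter values; in the comparison each of m, c, k (and
# perhaps z0, v0) would be an uncertain number rather than a point value.
print(msd_free_response(m=1.0, c=0.4, k=9.0, z0=1.0, v0=0.0, t=2.5))
```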