Accounting for uncertainty about the mathematical form of the model
Model-form uncertainty is lack of knowledge about the form or structure of the physical model underlying a phenomenon or quantity of analytical interest. In many quantitative assessments, model-form uncertainty is the elephant in the room. Everyone understands that it is present and probably very significant, but no one mentions it, probably because no one knows quite what to do about it. Below are some challenge problems involving the projection of uncertainty through a mathematical expression when that uncertainty includes doubt about the correct form of the expression itself. How can our computations account for uncertainty about what we should be computing?
These problems are very simple, even cartoon-like, in other respects. They are not intended to be realistic in terms of the complexity that real-world problems have. They have been formulated to identify and isolate the crucial issues specific to the concern of how model-form uncertainty should be handled in quantitative analyses. Answering these challenge problems is a necessary first step in developing a workable approach to model uncertainty. If we cannot agree about the correct answers to these simplified problems, then it is surely premature to be addressing more complex problems.
This online collaboration is intended as a continuing virtual round-table discussion of the advantages and disadvantages of various approaches to the complexities through which model-form uncertainty commonly manifests. These include which functions should be used, enumerable possible models, bounded models or models on a continuum, doubt about the appropriate level of abstraction, uncertainty about distribution shape, ignorance about intervariable dependence, and partial information in the form of assumptions about qualitative model features or relevant observational data and how they were sampled. The purpose of the exercise is to highlight the practical differences among the approaches in difficult but common problems, including the computational and conceptual burdens that each approach places on the analyst.
Everyone is invited to contribute to this discussion, and solutions to the challenge problems as well as suggestions about the challenge problems themselves are most welcome. Terse contributions will typically be appended as comments at the bottom of this page, but more elaborate contributions, including alternative solutions to the challenge problems, may merit their own affiliated page. Contributors retain copyright to their submissions. Contributions will be moderated for relevance.
Keywords: model uncertainty, model-form uncertainty, structural uncertainty, model inadequacy, model bias, risk analysis, uncertainty propagation, challenge problems
Challenge problems
Assuming that A and B are independent triangular random variables with A ~ triangular(−2.6, 0, 2.6) and B ~ triangular(2.4, 5, 7.6), what can be said about the distribution of f(A,B) given that the function f is one of two possibilities? Either
f(A,B) = fplus(A,B) = A + B
or
f(A,B) = ftimes(A,B) = A × B
is the correct model, but the analyst does not know which. One and only one function is correct. What can be said about f(A,B)?
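One way to explore this problem numerically is to propagate the inputs through both candidate models by Monte Carlo and to report the pointwise envelope of the two resulting distribution functions rather than a single distribution. The following is a minimal sketch under stated assumptions, not a definitive answer; the seed, sample size, evaluation grid and the helper name ecdf are choices introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed (assumption)
n = 100_000                      # sample size (assumption)

# Independent triangular inputs as specified in the problem
a = rng.triangular(-2.6, 0.0, 2.6, n)
b = rng.triangular(2.4, 5.0, 7.6, n)

# Propagate the same inputs through each candidate model
f_plus = a + b
f_times = a * b

def ecdf(sample, x):
    """Empirical distribution function of sample evaluated at the points x."""
    return np.searchsorted(np.sort(sample), x, side="right") / sample.size

# The product cannot leave [-19.76, 19.76], so this grid covers both outputs
grid = np.linspace(-20.0, 20.0, 401)
cdf_plus = ecdf(f_plus, grid)
cdf_times = ecdf(f_times, grid)

# Pointwise envelope: whichever model is correct, its CDF lies between these
lower = np.minimum(cdf_plus, cdf_times)
upper = np.maximum(cdf_plus, cdf_times)
```

Reporting the envelope leaves the choice between fplus and ftimes unresolved. Assigning weights to the two models and mixing their distributions would be a different analysis that rests on additional assumptions.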
Assuming the same inputs and the two possible models described in the first challenge problem above, suppose that a single value of f(A,B), namely 7.59, has been randomly observed, and that the analyst asserts that fplus is twice as likely as ftimes. Given this extra information, what can be said about f(A,B)?
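One possible reading of this problem is Bayesian: treat the analyst's assertion as prior probabilities of 2/3 for fplus and 1/3 for ftimes, treat the observation as a likelihood under each model, and update. The sketch below follows that reading only; other approaches may interpret the same information differently. The density at 7.59 is estimated crudely from Monte Carlo samples, and the window half-width h, the seed, the sample size and the helper name density_at are assumptions introduced for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)   # arbitrary seed (assumption)
n = 400_000                      # sample size (assumption)
a = rng.triangular(-2.6, 0.0, 2.6, n)
b = rng.triangular(2.4, 5.0, 7.6, n)

obs = 7.59   # the single observed value of f(A,B)
h = 0.1      # half-width of the density-estimation window (assumption)

def density_at(sample, x, half_width):
    """Crude density estimate: share of samples in the window, per unit length."""
    return np.mean(np.abs(sample - x) <= half_width) / (2.0 * half_width)

# Likelihood of the observation under each candidate model
like_plus = density_at(a + b, obs, h)
like_times = density_at(a * b, obs, h)

# Prior: fplus is asserted to be twice as likely as ftimes
prior_plus, prior_times = 2.0 / 3.0, 1.0 / 3.0

# Bayes' rule over the two-model space
evidence = prior_plus * like_plus + prior_times * like_times
post_plus = prior_plus * like_plus / evidence
post_times = prior_times * like_times / evidence

# Model-averaged sample: each draw uses fplus with probability post_plus
use_plus = rng.random(n) < post_plus
f_post = np.where(use_plus, a + b, a * b)
```

The model-averaged sample f_post is a mixture of the two outputs. Whether such a mixture is an appropriate summary when exactly one model is correct is part of what the challenge problem asks contributors to debate.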
The true model is one of a family of models parameterized by the real quantity λ ∈ [0,1],
f(A,B) = (A + B) + (1 − λ)(A × B).
The analyst feels confident that λ has a fixed value between zero and one but is not sure what it is. Suppose A ~ triangular(−2.6, 0, 2.6) and B ~ triangular(2.4, 5, 7.6). What can be said about f(A,B)?
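Because λ is fixed but unknown, one simple numerical treatment is to sweep λ across [0,1], compute the output distribution for each value, and report the envelope of the resulting distribution functions. The sketch below does this with the formula exactly as written above and with independent inputs; the independence of A and B, the number of λ values, the seed, the sample size, the grid and the helper name ecdf are assumptions introduced for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)   # arbitrary seed (assumption)
n = 50_000                       # sample size (assumption)
a = rng.triangular(-2.6, 0.0, 2.6, n)   # inputs assumed independent
b = rng.triangular(2.4, 5.0, 7.6, n)

def ecdf(sample, x):
    """Empirical distribution function of sample evaluated at the points x."""
    return np.searchsorted(np.sort(sample), x, side="right") / sample.size

lambdas = np.linspace(0.0, 1.0, 11)      # candidate values of lambda (assumption)
grid = np.linspace(-20.0, 35.0, 551)     # covers the range of every family member

# One empirical CDF per candidate lambda, using the formula from the problem
cdfs = np.array([
    ecdf((a + b) + (1.0 - lam) * (a * b), grid)
    for lam in lambdas
])

# Envelope over the family: bounds on the CDF whatever the true lambda is,
# up to sampling error and the coarseness of the lambda sweep
lower = cdfs.min(axis=0)
upper = cdfs.max(axis=0)
```

Reusing the same input samples for every λ keeps the comparison across family members free of extra sampling noise. Placing a distribution on λ and averaging would be a different analysis, since it treats a fixed unknown constant as if it were random.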
The correct operation is known to be f(A,B) = A + B, but the distributions for A and B are not precisely known. Suppose the inputs are independent random variables whose distributions are only imprecisely specified. In particular, suppose
A ~ triangular([−5.2, −2.6], 0, [2.6, 5.2]),
which means that A has a triangular distribution with mode at zero, but the minimum value might be any number between −5.2 and −2.6, and the maximum value is something between 2.6 and 5.2. Suppose that
B ~ minmaxmeanvar(0, 12, 5, 1),
that is, the marginal distribution of B is unknown except that B's smallest possible value is 0, its largest possible value is 12, and the mean and standard deviation of B are 5 and 1 respectively. What can be said about f(A,B)?
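The imprecise inputs of this problem can be represented as bounds on distribution functions. The sketch below, which is only one possible reading, bounds the distribution function of A by scanning a grid of admissible endpoints, and bounds that of B using Cantelli's inequality together with the stated range. The Cantelli bounds are valid but not necessarily the tightest ones implied by the four constraints on B, and the sketch stops short of combining the inputs into bounds on A + B. The function names tri_cdf and cantelli_bounds and the grid resolutions are assumptions introduced for illustration.

```python
import numpy as np

def tri_cdf(x, lo, mode, hi):
    """CDF of a triangular(lo, mode, hi) distribution, assuming lo < mode < hi."""
    x = np.clip(x, lo, hi)
    left = (x - lo) ** 2 / ((hi - lo) * (mode - lo))
    right = 1.0 - (hi - x) ** 2 / ((hi - lo) * (hi - mode))
    return np.where(x <= mode, left, right)

def cantelli_bounds(x, lo, hi, mean, sd):
    """Bounds on a CDF given only its range, mean and standard deviation."""
    d = x - mean
    tail = sd ** 2 / (sd ** 2 + d ** 2)      # one-sided Chebyshev (Cantelli) tail
    upper = np.where(d < 0, tail, 1.0)       # F(x) <= tail below the mean
    lower = np.where(d > 0, 1.0 - tail, 0.0) # F(x) >= 1 - tail above the mean
    upper = np.where(x < lo, 0.0, upper)     # no mass below the minimum
    lower = np.where(x >= hi, 1.0, lower)    # all mass at or below the maximum
    return lower, upper

x = np.linspace(-6.0, 13.0, 381)

# A ~ triangular([-5.2, -2.6], 0, [2.6, 5.2]): scan the admissible endpoints
mins = np.linspace(-5.2, -2.6, 27)
maxs = np.linspace(2.6, 5.2, 27)
a_cdfs = np.array([tri_cdf(x, lo, 0.0, hi) for lo in mins for hi in maxs])
a_lower, a_upper = a_cdfs.min(axis=0), a_cdfs.max(axis=0)

# B ~ minmaxmeanvar(0, 12, 5, 1): standard deviation 1, as stated in the text
b_lower, b_upper = cantelli_bounds(x, 0.0, 12.0, 5.0, 1.0)
```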
The correct model is known to be f(A,B) = A + B, but the dependence between A and B is not known. What can be said about f(A,B) given that the marginal distributions of the inputs are known triangular distributions, A ~ triangular(−2.6, 0, 2.6) and B ~ triangular(2.4, 5, 7.6)?
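When nothing is assumed about the dependence between A and B, pointwise bounds on the distribution function of the sum follow from the Fréchet inequalities, a result associated with Makarov and with Frank, Nelsen and Schweizer. For every split z = x + (z − x), the CDF of A + B at z is at least max(F_A(x) + F_B(z − x) − 1, 0) and at most min(F_A(x) + F_B(z − x), 1). The sketch below evaluates these bounds on a grid; the grid resolutions and the helper name tri_cdf are assumptions introduced for illustration, and the discretization can only make the computed bounds slightly wider than the exact ones.

```python
import numpy as np

def tri_cdf(x, lo, mode, hi):
    """CDF of a triangular(lo, mode, hi) distribution, assuming lo < mode < hi."""
    x = np.clip(x, lo, hi)
    left = (x - lo) ** 2 / ((hi - lo) * (mode - lo))
    right = 1.0 - (hi - x) ** 2 / ((hi - lo) * (hi - mode))
    return np.where(x <= mode, left, right)

z = np.linspace(-1.0, 11.0, 241)    # values of the sum A + B
x = np.linspace(-2.6, 2.6, 521)     # split points; the support of A suffices

# F_A(x) + F_B(z - x) for every pair (z, x); rows index z, columns index x
fa = tri_cdf(x, -2.6, 0.0, 2.6)
fb = tri_cdf(z[:, None] - x[None, :], 2.4, 5.0, 7.6)
s = fa[None, :] + fb

# Dependence-free bounds on the CDF of the sum at each z
lower = np.max(np.maximum(s - 1.0, 0.0), axis=1)
upper = np.min(np.minimum(s, 1.0), axis=1)
```

The distribution function obtained under any particular dependence assumption, including independence, should fall between lower and upper, so the width of the band indicates how much the unknown dependence matters for this sum.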
The function f is known to be nondecreasing, and f(A,B) cannot be smaller than −10 or larger than +10. Suppose the probability that f(A,B) < 0 is between 0.25 and 0.5, and the function f is quadratic. Assuming again that A ~ triangular(−2.6, 0, 2.6) and B ~ triangular(2.4, 5, 7.6), what can be said about f(A,B)?
The function that combines the inputs is actually changing in time, depending on local weather conditions. Sometimes it is
f(A,B) = fplus(A,B) = A + B
and sometimes it is
f(A,B) = ftimes(A,B) = A × B.
Assuming that fplus is ten times more frequent than ftimes, but other information about which function might be encountered next is not available, and that the inputs, A ~ triangular(−2.6, 0, 2.6) and B ~ triangular(2.4, 5, 7.6), are independent of the weather and of each other, what can be said about f(A,B)?
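If the switching between models is read as random variability, with each occasion independently governed by fplus with probability 10/11 and by ftimes with probability 1/11, then the output is a stochastic mixture of the two models. The sketch below simulates that mixture. The independence of the switching from one occasion to the next is an assumption that the problem statement does not guarantee, and the seed and sample size are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)   # arbitrary seed (assumption)
n = 100_000                      # sample size (assumption)

# Inputs are independent of each other and of the weather
a = rng.triangular(-2.6, 0.0, 2.6, n)
b = rng.triangular(2.4, 5.0, 7.6, n)

# fplus is ten times as frequent as ftimes, so it applies 10 times out of 11
p_plus = 10.0 / 11.0

# Assumption: the active model is drawn independently on each occasion
use_plus = rng.random(n) < p_plus

# Each occasion applies whichever model is in force at that time
f = np.where(use_plus, a + b, a * b)

# Summaries of the mixture distribution
mean_f = f.mean()
quantiles = np.quantile(f, [0.05, 0.5, 0.95])
```

Unlike the first challenge problem, where exactly one model is correct, both models really do occur here, which is why a frequency-weighted mixture is a natural candidate answer in this case.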
Get involved
Everyone is welcome to join this discussion. Email your contribution to Scott Ferson at ferson(at)liv(dot)ac(dot)uk with the subject line "challenge problems". It may take the form of plain text, a PDF file, a slide show or another kind of document. Your contribution will be added to this site as you direct, perhaps appended to the bottom of this page, or made into a new affiliated solution page. Your email address will not be published unless you indicate that you want it used to sign your contribution.
Suggestions of other challenge problems involving model-form uncertainty are welcome, as are comments about the problems detailed above. Particularly of interest are problems involving surrogacy (knowing X when it is actually Y that is the concern) and uncertainty about the sampling model governing how data were collected (rather than the distribution generating them) such as stopping rules or nonrepresentativeness.
Bayesian model averaging (Vose) [https://www.vosesoftware.com/riskwiki/BayesianModelAveraging]
https://www.sciencedirect.com/science/article/abs/pii/S0167473012000458
https://journals.sagepub.com/doi/full/10.1177/2515245919898657
This website originated as part of the outreach efforts of the DigiTwin project, which is a research consortium funded by UK Research and Innovation (UKRI) through the Engineering and Physical Sciences Research Council (EPSRC reference EP/R006768/1, principal investigator D. Wagg). The views and opinions expressed herein and in comments below are those of the individual contributors and commenters, and should not be considered those of any of the other authors or collaborators, nor of the institutions with which they may be affiliated, nor of UK Research and Innovation, or other sponsors or affiliates. Copyrights for the contributed material and commentary remain with their respective authors.