This website is a collaboration to record discussions and collect material for papers introducing confidence boxes for characterizing uncertainty in reliability analyses of reliability block diagrams and fault trees based on binary data and failure-time data. The latest draft is available as a PDF file and as the LaTeX source file; both are archived on the Files page under the filename "Paper".
We illustrate the use of confidence structures (c-boxes) in reliability assessment of engineered systems as a means to express uncertainties arising from using empirical data to estimate component reliabilities. C-boxes account for sampling uncertainty in a way that is more comprehensive than conventional approaches, including the effects of using discrete event data to characterize continuous failure probabilities and uncertainty about details of the sampling process. Using c-boxes allows analysts to guarantee the statistical performance of their analyses at any planned, or even unplanned, level of confidence. They express both epistemic and aleatory uncertainty without conflating these forms of uncertainty. We demonstrate the propagation of c-boxes through mathematical computations for several example problems, including Vesely's simple pressurized tank system, a circuit bridge system whose Boolean expression has repeated uncertain variables that cannot be removed by algebraic rearrangement, and Jackson's noncoherent pump system. We show how reliability calculations can be made assuming independence of component probabilities, and how this assumption can be relaxed. We extend the definitions of some classical importance measures to incorporate c-boxes. The results allow analysts to obtain true (Neyman) confidence intervals for overall system reliability and various importance measures of individual components at any desired level of confidence.
"It is tempting to use...information on failures from devices that are operating under real life conditions. However, such data cannot be used directly for an analysis of reliability questions." (Gnedenko et al. 1969, page 144)
Although reliability is defined and affected by stochastic parameters, according to some acknowledged specialists, quality, reliability and safety are not achieved by mathematics and statistics. Nearly all teaching and literature on the subject emphasizes these aspects, and ignores the reality that the ranges of uncertainty involved largely invalidate quantitative methods for prediction and measurement. (Wikipedia article on Reliability engineering)
Hailperin (1986) considered the application of interval analysis to the evaluation of Boolean expressions. Haldar and Mahadevan (2000) argued for a probabilistic approach to engineering design, and Limbourg (2008) argued for an imprecise probability approach to dependability analysis and other problems, especially in early engineering design where uncertainty can be substantial. Murtha (2009, <<>>) applied Dempster-Shafer theory to fault tree analysis, and Jacob et al. (2012) applied belief functions to the evaluation of Boolean expressions.
We consider three example engineering systems:
tank: the pressurized tank system described by Vesely et al. (1981) whose Boolean expression has no repeated variables.
bridge: a circuit bridge system whose Boolean expression has repeated uncertain variables that cannot be removed by algebraic rearrangement, but which is coherent, that is, all its variables are unate and relevant, and
pump: the noncoherent system described by Jackson (1982; 1983) whose Boolean expression involves nonunate variables.
For each of these three systems, we present the following:
c-boxes to be used as inputs from binary sample data or failure-time data,
system reliability computed from these c-boxes assuming component independence,
system reliability computed from these c-boxes relaxing the assumption about dependence,
confidence interpretation of the resulting characterization of system reliability,
Birnbaum (<<>>) importance measures for the inputs to the system reliability,
"more trials" sensitivity analyses to characterize the effect on uncertainty of increasing sample sizes, and
performance simulations that serve as verification checks and illustrate how conservative the calculations may be.
The computational effort required for conducting these uncertainty analyses is rather modest, even trivial for some small to moderately sized problems, compared to that needed for general problems in risk analysis. We show that basic reliability calculations may require a mere doubling of the effort needed for a Monte Carlo simulation of the same calculation, which we might symbolize as M. For example, computing the system reliability for either the tank or the bridge system requires 2M effort. Computing reliability without any dependence assumptions requires <<>> effort. To compute the system reliability of a noncoherent system, the computational effort grows to 2ⁿM, that is, 2ⁿ times the effort of a Monte Carlo simulation, where n is the number of input variables. Computing the Birnbaum importance measures likewise requires 2ⁿM effort for each input variable for which importance is to be computed.
The classical notion of confidence is due to Neyman (1937).
A confidence interval for a parameter θ with coverage probability γ has the interpretation that, among all confidence intervals independently computed by the same method from different data sets, a proportion γ will contain the true value of θ.
A confidence interval can serve as an estimate of the parameter that is more comprehensive than any point estimate because it encodes not only the available data but also the sampling uncertainty they imply.
Valid confidence intervals are more than merely subjective characterizations of uncertainty; they represent rigorous claims about coverage probabilities and their use establishes a standard of statistical performance that in principle can be checked empirically.
They represent a guarantee that, when the method is used repeatedly, the stated proportion of the resulting intervals will contain the values they were computed to estimate.
Credible intervals (sometimes called Bayesian confidence intervals) are often considered to be the Bayesian analogs of confidence intervals (Lee 1997), but credible intervals have no general accompanying guarantee like that of the frequentist notion of confidence intervals.
A confidence interval is an estimate that has the form of an interval rather than a point value (such as is given by a maximum likelihood estimator) or a probability distribution (such as a Bayesian posterior distribution).
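As an illustration of how such performance claims can be checked empirically, the following sketch (in Python, not part of the paper) simulates repeated sampling and estimates the coverage of the familiar 95% Clopper–Pearson (1934) intervals for a binomial probability; the data-generating probability and sample size here are arbitrary.

import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)
p_true, n, reps, alpha = 0.03, 250, 20_000, 0.05   # arbitrary illustration values
k = rng.binomial(n, p_true, size=reps)             # simulated failure counts

lower = np.zeros(reps)
upper = np.ones(reps)
pos = k > 0
lower[pos] = beta.ppf(alpha / 2, k[pos], n - k[pos] + 1)
lt = k < n
upper[lt] = beta.ppf(1 - alpha / 2, k[lt] + 1, n - k[lt])

print("coverage:", np.mean((lower <= p_true) & (p_true <= upper)))   # should be at least 0.95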
A confidence distribution is a distributional estimate for a parameter, in contrast with a point estimate like a sample mean or an interval estimate such as a confidence interval.
Confidence distributions were introduced by Cox (1958), but received little attention until recently (Efron 1998; Schweder and Hjort 2002; Singh et al. 2005; Xie et al. 2011; Balch 2012; Xie and Singh 2013; Ferson et al. 2013; 2014).
A confidence distribution has the form of a probability distribution function on the space of possible parameter values
that depends on a statistical sample in a way that encodes confidence intervals at all possible confidence levels. A confidence distribution for a parameter θ∈Θ is a function C: Θ→(0,1) such that, for every α in (0,1), (−∞, C⁻¹(α)] is an exact lower-sided 100α% confidence interval for θ, where the inverse function C⁻¹(α) = Cₙ⁻¹(x1, …, xn, α) is increasing in α. This definition implies [C⁻¹(α), C⁻¹(β)] is a 100(β−α)% confidence interval for the parameter θ whenever α < β. Although related to many other ideas in statistical inference (Singh et al. 2005; Xie et al. 2011), a confidence distribution can be considered a purely frequentist concept (Schweder and Hjort 2002; Singh et al. 2005). Although a confidence distribution has the form of a probability distribution, it is not a probability distribution. It corresponds to no randomly varying quantity; the parameter it describes is presumed to be fixed and nonrandom. The value of the function C is not the probability of θ, but rather confidence about θ (Cox 2006; cf. Lindley 1958). A confidence distribution is merely a ciphering device that encodes confidence intervals for each possible confidence level.
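To fix ideas with a textbook example (not specific to reliability data): if x1, …, xn are drawn from a normal distribution with unknown mean θ and known standard deviation σ, then C(θ) = Φ((θ − x̄)√n ⁄ σ), where x̄ is the sample mean and Φ is the standard normal distribution function, is a confidence distribution for θ, and the interval [C⁻¹(0.025), C⁻¹(0.975)] = x̄ ± 1.96 σ⁄√n recovers the usual 95% confidence interval.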
Confidence distributions are not widely known in statistics, but Efron (1998) characterized bootstrap distributions as approximate confidence distributions, and so the essential ideas are familiar and widely used, albeit under the guise of bootstrap distributions. Efron (2013) suggested that, because they can be thought of as a way to ground in frequentist theory objective Bayesian analyses that use uninformative priors, confidence distributions may be useful in resolving the most important problem in statistical inference, which is how to use Bayes’ theorem without prior information. There are two significant limitations that might prevent such a resolution. The first is that confidence distributions do not exist for many basic and important inferential problems. Notably, in particular, there is no confidence distribution for the binomial probability. Likewise, it is not clear how they could work in a nonparametric setting. The second limitation is that, although they have the form of probability distributions, they cannot be propagated in calculations. Distributions derived from confidence distributions via the probability calculus are not in general confidence distributions themselves (Schweder and Hjort 2013; Cox 2006).
Balch (2012) introduced the notion of confidence structures, which we have taken to calling confidence boxes, or c-boxes for short, as an imprecise generalization of confidence distributions that redresses some of their limitations. They encode frequentist confidence intervals, at every confidence level, for parameters of interest. If a c-box for a parameter θ has the form of a p-box specified by its left and right bounding cumulative distribution functions B1 and B2, then every interval [B1⁻¹(α), B2⁻¹(β)] is a 100(β−α)% confidence interval whenever α<β. They are analogous to Bayesian posterior distributions in that they characterize the inferential uncertainty about distribution parameters estimated from sparse or imprecise sample data, but they have a purely frequentist interpretation that makes them useful in engineering because they offer a guarantee of statistical performance through repeated use. Unlike traditional confidence intervals, which cannot usually be propagated through mathematical calculations, c-boxes can be used in calculations using the standard methods of probability bounds analysis and yield results that also admit the same confidence interpretation. This means that analysts using them can now literally compute with confidence.
Balch (2012) described various ways to derive c-boxes, and proved that independent c-boxes characterizing different parameters can be combined in mathematical expressions using the conventional technology of probability bounds analysis (Ferson et al. 2003) and random-set convolutions via Cartesian products (Yager 1986), and that the results also have the confidence interpretation. Ferson et al. (2013) reviewed the properties of c-boxes, provided algorithms to compute c-boxes for some special cases and to confirm their coverage properties, and compared the c-box for the binomial probability to the Imprecise Beta Model (Walley 1991; Walley et al. 1996).
Table 1 is a compendium of formulas for several important c-box cases. For each of these cases, the first line defines the sampling model, and specifies summary statistics if needed. The second line describes the associated p-box estimator for the distribution of next observable values. This p-box is an imprecise generalization of a frequentist prediction distribution, and it is analogous to a Bayesian’s posterior predictive distribution. If its left and right edges are B1 and B2, the interval [B1⁻¹(α), B2⁻¹(β)] is a prediction interval for Xn+1 enclosing a fraction β−α of the observable values on average. Subsequent lines give formulas for c-boxes for the parameters. In the table, the env function denotes the envelope operation which forms a p-box from two bounding distribution functions. Note that the parameters of some of the named distributions in the table may be given as intervals denoted in square brackets, which of course also induce p-boxes (Ferson et al. 2003). For the sake of notational simplicity, we have generalized the tilde beyond its conventional use in frequentist statistics. An expression of the form X ~ F is understood to mean that the uncertainty about the quantity X is characterized by F. This tilde can still be read as “has the distribution”, or maybe better as “has uncertainty like”, but it obviously does not suggest that the left-hand side is necessarily a random variable. When the left-hand side is a parameter, it is after all a value that is fixed albeit unknown.
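Table 1 itself is not reproduced here, but as an illustration of how such formulas are used, the following sketch (in Python, not the paper's code) evaluates the c-box for a binomial probability p given k failures in n trials, env(beta(k, n−k+1), beta(k+1, n−k)) as given by Ferson et al. (2013), and reads off a 95% confidence interval, which coincides with the Clopper–Pearson (1934) interval.

from scipy.stats import beta

def binomial_cbox_interval(k, n, alpha_lo=0.025, beta_hi=0.975):
    # [B1^-1(alpha_lo), B2^-1(beta_hi)] is a 100(beta_hi - alpha_lo)% confidence interval for p
    lo = beta.ppf(alpha_lo, k, n - k + 1) if k > 0 else 0.0
    hi = beta.ppf(beta_hi, k + 1, n - k) if k < n else 1.0
    return lo, hi

print(binomial_cbox_interval(2, 300))   # e.g., 95% confidence interval for p after 2 failures in 300 demands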
Vesely et al. (1981) described...
The c-boxes summarizing these data are depicted in the graphs below.
Applying conventional probability bounds analysis to these c-boxes yields the result depicted in the graph below. This is the characterization, given the inferential uncertainty characterized by the input c-boxes, of the probability of the top event, the tank rupturing under pressurization. The mean of this probability is an uncertain quantity in the interval [0.0060, 0.0085]. The left bound of the mean is the mean of the left bounding distribution depicted in the figure, and the right bound of the mean is the mean of the right bounding distribution. Bounds on the variance and other descriptive statistics can also be easily computed, though they will typically not be related in this simple way to the two bounding distributions.
The calculations leading to the result above assumed that all the variables were independent of one another. In this case, the formulas for the probabilities of the logical conjunctions and disjunctions are familiar:
P(A & B) = P(A) × P(B)
P(A or B) = 1 − (1 − P(A)) × (1 − P(B))
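As a sketch of how such a calculation can be organized (Python, not the paper's code, with hypothetical binary data), the following propagates two independent binomial c-boxes through a conjunction and a disjunction by Monte Carlo sampling of random focal intervals; because both formulas are increasing in each argument, the interval endpoints can be propagated separately.

import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(1)
N = 10_000   # Monte Carlo replicates

def cbox_focal(k, n, u):
    # random focal interval of the binomial c-box env(beta(k, n-k+1), beta(k+1, n-k)) at level u
    lo = beta.ppf(u, k, n - k + 1) if k > 0 else np.zeros_like(u)
    hi = beta.ppf(u, k + 1, n - k) if k < n else np.ones_like(u)
    return lo, hi

uA, uB = rng.uniform(size=N), rng.uniform(size=N)   # independent components
aL, aR = cbox_focal(2, 300, uA)                     # hypothetical: 2 failures in 300 demands
bL, bR = cbox_focal(1, 500, uB)                     # hypothetical: 1 failure in 500 demands

andL, andR = aL * bL, aR * bR                                   # P(A & B) under independence
orL, orR = 1 - (1 - aL) * (1 - bL), 1 - (1 - aR) * (1 - bR)     # P(A or B) under independence

# the empirical distributions of the left and right endpoints bound the output c-box;
# for example, a one-sided 95% upper confidence bound on P(A & B):
print(np.quantile(andR, 0.95))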
These assumptions about dependence can be relaxed, and when they are, these formulas change accordingly. The assumptions can even be removed entirely so that the analysis and its results are not contingent on any assumptions about stochastic dependence between these random variables. This can be done via the Fréchet inequalities (Fréchet <<>>; Hailperin 1986; Ferson et al. <<>>). Alternatively, an analyst might specify only the sign of the dependence, or perhaps a range for a correlation coefficient along the scale from −1 (opposite dependence) to +1 (perfect dependence), perhaps justified by available empirical data or physics-based reasoning about the system as situated in its physical environment. Formulas generalizing the classical computations for the probabilities of logical conjunctions and disjunctions under independence are depicted below.
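The figure with these formulas is not reproduced here, but the standard Fréchet bounds for conjunctions and disjunctions, which hold regardless of the dependence between A and B, are

max(0, P(A) + P(B) − 1) ≤ P(A & B) ≤ min(P(A), P(B))
max(P(A), P(B)) ≤ P(A or B) ≤ min(1, P(A) + P(B))

These bounds are pointwise best-possible, so they cannot be tightened without some dependence information.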
We could have relaxed the assumption about dependence among any of the components whose dependency relationship might be in doubt. For the sake of simplicity in this example, we omitted all assumptions about dependencies among all of the variables. The increase in the uncertainty about the risk of the top event occurring is represented by a fatter p-box. This is the uncertainty that comes from not asserting all the variables are independent of one another. Any increase in risk that comes from relaxing the dependence assumptions is represented by a rightward shift of the p-box characterizing the top event's probability. Despite making no assumptions whatever about dependencies, both the increase in risk and the increase in its uncertainty are rather modest in this example. The mean of the failure probability is <<>>. <<this picture is way too tight; the frechet box is much fatter>>
This (below) seems to be the correct Frechet.
The results of these calculations encode confidence intervals and upper and lower confidence bounds at all confidence levels for the risk of the top event, i.e., the tank rupturing under pressurization. Based on Monte Carlo simulations involving 10,000 replications, we can estimate the two-sided confidence intervals and one-sided upper confidence bounds shown in the table below.
Whereas two-sided confidence limits form a confidence interval, their one-sided counterparts are referred to as lower or upper confidence bounds.
These numerical results are based on a total of half a million Monte Carlo replicates for each simulation.
                                    Independent         No assumption
Two-sided confidence interval       [0.0012, 0.018]     [0.0012, 0.02485]
One-sided upper confidence bound    0.016               0.022
The bridge example is a circuit bridge system whose Boolean expression has repeated uncertain variables that cannot be removed by algebraic rearrangement, but which is coherent, that is, all its variables are unate and relevant.
Jackson (1982; 1983) described a pump system that exhibits noncoherence in the sense that the failure of some component can reduce the probability of the top event, thus improving the performance of the system.
Because the Boolean expression for such a system involves nonunate variables, evaluating it under uncertainty cannot be accomplished by making two separate probabilistic calculations with the lower and upper estimates of the component probabilities.
t = 500 hours
func pexp() return 1 - exp(-$2 * $1) // pexp(x, lambda) yields the cumulative probability at x for an exponential distribution with rate lambda (mean 1/lambda)
func q() return pexp(t, $1+$2) / (1 + $2/$1) // q(lambda, mu) translates the rates lambda and mu ($1 and $2) into the component unavailability at time t
The figure below compares various approaches to calculating the reliability of the noncoherent system. For the sake of simplicity in this comparison, the component probabilities are given as simple interval ranges, with uncertainties arbitrarily set at plus or minus 60% (a rough numerical check follows the list):
pump = Q(1e-3 per hour, 0.1 per hour) ± 60% = [ 0.00396, 0.0158 ]
sensor = Q(2e-3 per hour, 5e-2 per hour) ± 60% = [ 0.0154, 0.0615 ]
controller = Q(3e-3 per hour, 1/60 hours) ± 60% = [ 0.0610, 0.244 ]
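As a rough check of these values (a Python sketch, not the paper's code, using the same unavailability formula as above), the intervals can be reproduced directly:

from math import exp

t = 500.0   # hours

def q(lam, mu):
    # unavailability at time t for rates lam and mu: (lam/(lam+mu)) * (1 - exp(-(lam+mu)*t))
    return (1 - exp(-(lam + mu) * t)) / (1 + mu / lam)

for name, lam, mu in [("pump", 1e-3, 0.1), ("sensor", 2e-3, 5e-2), ("controller", 3e-3, 1 / 60)]:
    mid = q(lam, mu)
    print(name, [0.4 * mid, 1.6 * mid])   # plus or minus 60 percent, matching the intervals above to rounding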
The black interval [0.000607, 0.0645] in the figure was computed using naive interval arithmetic (Moore 1966), which is very inexpensive to compute and guaranteed to rigorously enclose the true range. However, it yields an overly conservative result that is unnecessarily wide because of the repetitions of uncertain quantities in the evaluated expression. The red interval [0.0151, 0.0556] was computed by Monte Carlo simulation, (uniformly) sampling 1000 possible values for the component probabilities from their respective intervals and finding the range of the resulting system probability values. This represents inner bounds on the true range, which must be at least as wide as the red interval. Here, and in practice generally, such a Monte Carlo simulation underestimates the true range, and the underestimation becomes worse as the number of inputs grows. The blue interval [0.0120, 0.0590] was computed using subinterval reconstitution (<<>>), a brute-force strategy to rigorously compute interval ranges. In this example, each input interval was decomposed into 200 subintervals, and every combination of one subinterval from each of the three inputs was evaluated using (naive) interval analysis. The blue interval is the convex hull of all 200³ = 8 million resulting intervals. Like the result of the naive interval analysis, the result of subinterval reconstitution is guaranteed to enclose the true range of possible values, but it is often much tighter than the result given by the naive approach. However, it can be much more expensive to compute. Finally, the green interval [0.0125, 0.0588] represents the exact calculation based on evaluating the reliability function at all the corners of the box formed by the three input intervals. Evaluating the function only at these corners is sufficient to find rigorous bounds on the true range of probabilities given the stated uncertainties about the component probabilities. We know this because the probability of a Boolean expression, even if it is not unate, is a multilinear function of the component probabilities under independence, so its extremes over a box of input probabilities are attained at corners of the box. In this example, the green bounds are squeezed almost up to the blue bounds, which are guaranteed to be outer bounds but which are substantially more expensive to compute. We also know that the green interval is best-possible, meaning the true interval could not be any tighter than it is, because each corner is, after all, a combination of possible values for the three inputs given the stated uncertainties.
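The corner-evaluation strategy itself is simple to sketch (Python, not the paper's code; the function g below is a hypothetical multilinear stand-in, not Jackson's actual pump expression):

from itertools import product

def g(p, s, c):
    # hypothetical multilinear system-failure probability in which s behaves nonunately
    return p * (1 - s) + s * c

intervals = [(0.00396, 0.0158), (0.0154, 0.0615), (0.0610, 0.244)]   # pump, sensor, controller

corners = [g(*combo) for combo in product(*intervals)]   # 2**3 = 8 evaluations
print(min(corners), max(corners))                        # rigorous and best-possible bounds for g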
Reliability calculations are very convenient with c-boxes for unate expressions, which describe all coherent systems and which represent most problems in reliability analysis.
Repeated variables, which can be extremely cumbersome in analyses of general risk-analytic expressions, pose no problem in evaluating the risks when the system is coherent. Monte Carlo simulation with memoization, which accounts for the perfect dependence among different instantiations of the same variable, overcomes the difficulties arising from repeated variables. The calculation effort is essentially only doubled, no matter how many input variables there may be, as sketched below.
Even non-coherent systems can be analyzed, although the calculations will grow in complexity with the number of uncertain input variables.
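A minimal sketch of the "2M" bookkeeping for a coherent system (Python, not the paper's code; the system A and (B or C) and the binary data are hypothetical): each replicate draws one focal interval per distinct component, so repeated appearances of a variable automatically share the same value, and the monotone system-probability function is evaluated once at the all-lower and once at the all-upper corner.

import numpy as np
from scipy.stats import beta

def cbox_focal(k, n, u):
    # random focal interval of the binomial c-box env(beta(k, n-k+1), beta(k+1, n-k)) at level u
    lo = beta.ppf(u, k, n - k + 1) if k > 0 else np.zeros_like(u)
    hi = beta.ppf(u, k + 1, n - k) if k < n else np.ones_like(u)
    return lo, hi

def p_top(pA, pB, pC):
    # exact top-event probability of (A and B) or (A and C) under independence;
    # the distinct variable A is substituted only once, which is what the memoization accomplishes
    return pA * (pB + pC - pB * pC)

rng = np.random.default_rng(2)
N = 10_000
data = [(1, 200), (3, 400), (0, 150)]                   # hypothetical (failures, demands) per component
focals = [cbox_focal(k, n, rng.uniform(size=N)) for k, n in data]
(aL, aR), (bL, bR), (cL, cR) = focals

topL, topR = p_top(aL, bL, cL), p_top(aR, bR, cR)       # two evaluations per replicate: about 2M total effort
print(np.quantile(topR, 0.95))                          # 95% upper confidence bound on the top-event probability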
Balch, M.S. (2012). Mathematical foundations for a theory of confidence structures. International Journal of Approximate Reasoning 53: 1003–1019.
Clopper, C. and E.S. Pearson (1934). The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika 26: 404–413.
Cox, D.R. (1958). Some problems connected with statistical inference. The Annals of Mathematical Statistics 29: 357–372.
Cox, D.R. (2006). Principles of Statistical Inference. Cambridge University Press.
Dempster, A.P. (1966). New methods for reasoning towards posterior distributions based on sample data. The Annals of Mathematical Statistics 37: 355–374. http://www.stat.purdue.edu/~chuanhai/projects/DS/docs/66Annals.pdf
Dempster, A.P. (1967). Upper and lower probabilities induced by a multivalued mapping. The Annals of Mathematical Statistics 38: 325–339.
Efron, B. (1998). R.A. Fisher in the 21st century. Statistical Science 13: 95–122.
Ferson, S., V. Kreinovich, L. Ginzburg, K. Sentz and D.S. Myers (2003). Constructing Probability Boxes and Dempster-Shafer Structures. SAND2002-4015, Sandia National Laboratories, Albuquerque, New Mexico. http://www.ramas.com/unabridged.zip
Ferson, S., R. Nelsen, J. Hajagos, D. Berleant, J. Zhang, W.T. Tucker, L. Ginzburg and W.L. Oberkampf (2004). Dependence in Probabilistic Modeling, Dempster-Shafer Theory, and Probability Bounds Analysis. Sandia National Laboratories, SAND2004-3072, Albuquerque, NM. www.ramas.com/depend.pdf
Ferson, S., M. Balch, K. Sentz, and J. Siegrist (2013). Computing with confidence. Proceedings of the 8th International Symposium on Imprecise Probability: Theories and Applications, F. Cozman, T. Denœux, S. Destercke and T. Seidenfeld (eds.). SIPTA, Compiègne, France. https://sites.google.com/site/confidenceboxes/isipta
Fisher, R.A. (1930). Inverse probability. Proceedings of the Cambridge Philosophical Society 26: 528–535.
Fisher, R.A. (1935). The fiducial argument in statistical inference. Annals of Eugenics 6: 391–398.
Gnedenko, B.V., Yu.K. Belyayev and A.D. Solovyev (1969). Mathematical Methods of Reliability Theory. Academic Press, New York.
Hailperin, T. (1986). Boole’s Logic and Probability. North-Holland, Amsterdam.
Haldar, A., and S. Mahadevan (2000). Probability, Reliability, and Statistical Methods in Engineering Design. John Wiley & Sons.
Jackson, P.S. (1982). Comment on "Probabilistic evaluation of prime implicants and top-events for non-coherent systems". IEEE Transactions on Reliability R-31: 172–173.
Jackson, P.S. (1983). On the s-importance of elements and prime implicants of non-coherent systems. IEEE Transactions on Reliability R-32: 21–25.
Jacob, C., D. Dubois, and J. Cardoso (2012). Evaluating the uncertainty of a Boolean formula with belief functions. Pages 521–531 in Advances in Computational Intelligence. Communications in Computer and Information Science, volume 299. 14th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, IPMU 2012, Catania, Italy, July 9–13, 2012, Proceedings, Part III. Springer-Verlag, Berlin.
Lee, P.M. (2004). Bayesian Statistics: An Introduction. Wiley.
Limbourg, P. (2008). Dependability Modelling under Uncertainty: An Imprecise Probabilistic Approach. Studies in Computational Intelligence, volume 148. Springer-Verlag, Berlin.
Mayo, D. (1996). Error and the Growth of Experimental Knowledge. Chicago University Press.
Moore, R.E. (1966). Interval Analysis. Prentice-Hall.
Murtha, J.F. (<<>>). Evidence theory and fault tree analysis to cost-effectively improve reliability in small UAV design. <<>>. http://vsgc.odu.edu/src/Conf09/Grad%20Papers/Murtha,%20Paper_VSGC.pdf
Murtha, J.F. (2009). Evidence Theoretic Approach to Design of Reliable Low-Cost UAVs. MS thesis, Virginia Polytechnic Institute and State University, Blacksburg, Virginia. NTIS accession number ADA522686. http://scholar.lib.vt.edu/theses/available/etd-06262009-141905/unrestricted/JustinMurthaThesisFinal.pdf
Neyman, J. (1937). Outline of a theory of statistical estimation based on the classical theory of probability. Philosophical Transactions of the Royal Society of London, Series A 236: 333–380.
R Core Team (2013). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/
Robert, C.P. (2012). Comments on “Confidence distribution, the frequentist distribution estimator of a parameter—a review” by Min-ge Xie and Kesar Singh. International Statistical Review [in press]. http://arxiv.org/pdf/1206.1708.pdf
Schweder, T., and N.L. Hjort (2002). Confidence and likelihood. Scandinavian Journal of Statistics 29: 309–332.
Singh, K., M. Xie and W.E. Strawderman (2005). Combining information from independent sources through confidence distributions. The Annals of Statistics 33: 159–183.
Student [W.S. Gosset] (1908). The probable error of a mean. Biometrika 6: 1–25. http://www.york.ac.uk/depts/maths/histstat/student.pdf
Vick, S.G. (2002). Degrees of Belief: Subjective Probability and Engineering Judgment. ASCE Press, Reston, Virginia.
Vesely, W.E., F.F. Goldberg, N.H. Roberts, D.F. Haasl (1981). Fault Tree Handbook. Nuclear Regulatory Commission, Washington, DC.
Walley, P. (1991). Statistical Reasoning with Imprecise Probabilities. Chapman and Hall.
Walley, P. (1996). Inferences from multinomial data: learning about a bag of marbles. Journal of the Royal Statistical Society, Series B 58: 3–57.
Walley, P., L. Gurrin and P. Barton (1996). Analysis of clinical data using imprecise prior probabilities. The Statistician 45: 457–485.
Winkler, R.L., J.E. Smith and D.G. Fryback (2002). The role of informative priors in zero-numerator problems: being conservative versus being candid. The American Statistician 56: 1–4. See also Comments by Browne and Eddings and Reply. The American Statistician 56: 252–253.
Xie, M., and K. Singh (2013). Confidence distribution, the frequentist distribution estimator of a parameter: a review. International Statistical Review 81: 3–39.
Xie, M., K. Singh and W.E. Strawderman (2011). Confidence distributions and a unifying framework for meta-analysis. Journal of the American Statistical Association 106(493): 320–333.