Finding the distribution of a sum of random variables that are each characterized by the same distribution function is an important classical problem in probability theory. A generalization of this problem is finding bounds on the distribution of sums of random variables each of which is consistent with a given p-box. When the p-boxes characterizing the addends are all the same, we call the sum a “self-sum”. (Is there a better term?)
We can bound self-sum distributions in Risk Calc very easily. For example, in the script below, the variable aa is the sum p-box (shown in red) of 4, 44 and 444 independent copies of a Bernoulli distribution.
a = s = bernoulli(0.2)
clear; show aa in red; show s
aa = 0; for i = 1 to 4 do aa = aa |+| a; s = binomial(4, 0.2)
aa = 0; for i = 1 to 44 do aa = aa |+| a; s = binomial(44, 0.2)
aa = 0; for i = 1 to 444 do aa = aa |+| a; s = binomial(444, 0.2)
The true distribution, which is a binomial distribution s, is shown in black in each case. The graphs reveal that the red bounds are grossly inflated over what they need to be. This inflation arises from discretization error, which accumulates and becomes a dominating influence. In principle, we could increase the number of discretization levels to reduce this inflation. In R, for instance, we could set pbox.steps to 2000, which would make the red and black bounds indistinguishable for the 4-fold self-sum. However, this strategy requires a huge increase in computation time, and doesn’t really solve the problem. The discrepancies are still noticeable for the 44-fold sum, and substantial for the 444-fold sum. The following R script shows this, although I do not recommend running it, just because it takes forever to execute.
pbox.steps=2000
pbox.plotting = pbox.plotting.every = FALSE
a = bernoulli(0.2)
aa = 0
# each loop continues from the previous aa, so the running totals of addends are 4, 44, and 444
for (i in 1:4) aa = aa %|+|% a; s = binomial(4, 0.2); plot(s,col='black'); lines(aa,col='red')
for (i in 1:40) aa = aa %|+|% a; s = binomial(44, 0.2); plot(s,col='black'); lines(aa,col='red')
for (i in 1:400) aa = aa %|+|% a; s = binomial(444, 0.2); plot(s,col='black'); lines(aa,col='red')
Thus, although these bounds on the self-sums are very easy to compute, they are just not very good. We want to derive algorithms that sidestep the accumulation of discretization error and yet remain fully rigorous bounds on the self-sums.
When the p-box is accompanied by a distributional assumption such as normality, binomiality, etc., there are well known theorems that can be used to bound self-sums assuming independence of the addends and stationarity of the distribution from which they’re drawn, including (see https://en.wikipedia.org/wiki/List_of_convolutions_of_probability_distributions)
Sk bernoulli(m) = binomial(k, m),
Sk binomial = binomial(k m/(1-s²/m), 1-s²/m),
Sk binomial(ni, m) = binomial(∑ni, m),
Sk geometric = negativebinomial(k, 1/(1+m)),
Sk poisson = poisson(k m),
Sk normal = normal(k m, s√k),
Sk gamma = gamma(s²/m, k(m/s)²),
Sk exponential = gamma(m, k),
Sk negativebinomial = negativebinomial(k m/(s²/m-1), m/s²),
Sk chisquared = chisquared(k m),
Sk cauchy = k cauchy, and
Sk uniform = SumIID.uniform(k, m-s√3, m+s√3),
where m and s are the mean and standard deviation of the addend distribution. (We’ve expressed the formulas in terms of means and standard deviations because these two moments are stored in the Risk Calc p-box data structure.) For p-boxes, the formulas above typically have parameters that are intervals rather than point values. Can the list of formulas above be usefully expanded? Ramsay (2008; 2009) describes an algorithm for sums of Pareto deviates.
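As an illustration of how such a rule works when its parameters are intervals, here is a minimal sketch in plain R (selfsum.normal is a hypothetical helper name, not a library function). It applies the rule Sk normal = normal(k m, s√k) with m ∈ [m1, m2] and s ∈ [s1, s2]; because the normal CDF is monotone in its mean and, on either side of the mean, monotone in its standard deviation, the CDF envelope over the rectangle of parameter values is attained at its four corners.
# hedged sketch: bounds on the k-fold independent, stationary self-sum of a
# normal p-box with interval mean [m1,m2] and interval sd [s1,s2]
selfsum.normal <- function(k, m1, m2, s1, s2, x) {
  corners <- expand.grid(m = c(m1, m2), s = c(s1, s2))  # the four corner normals
  cdfs <- mapply(function(m, s) pnorm(x, k*m, s*sqrt(k)), corners$m, corners$s)
  list(x = x, upper = apply(cdfs, 1, max), lower = apply(cdfs, 1, min))
}
x <- seq(0, 20, length.out = 200)
b <- selfsum.normal(10, 0.9, 1.1, 0.4, 0.6, x)
plot(x, b$upper, type='l', col='red'); lines(x, b$lower, col='red')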
We also want to relax the assumptions used in the formulas, in case analysts do not find them to be tenable. Bounds on the self-sum distributions will be markedly different under different sets of assumptions. For instance, the answer will depend on whether we assume the addends are independent, or perfectly dependent (comonotonic) with each other. Positively dependent or negatively associated quantities will likewise lead to different results. We might also be interested in bounds on the self-sum that are possible without making any assumption at all about the dependence among the addends.
The answer will also depend on assumptions about stationarity, i.e., whether the addends are identically distributed or not. Note that the assumption of stationarity is distinct from the precision of the characterization of the distribution function. For example, we may only have a p-box to constrain the distribution of each addend, which is to say that we might not know its exact distribution function. Despite not knowing the distribution precisely, it might still be reasonable to assume that that distribution, whatever it is, is not changing. Of course, in some cases we might not be able to assume stationarity. In such a case we are saying that the distribution of every addend is consistent with a particular p-box, but that the addends might have different distributions within that p-box. In this case, we would expect the bounds on the sum distribution to be wider than those inferred under a stationarity assumption. The formulas above do not readily generalize for non-independence, but some can be generalized for nonstationarity. In particular, in some cases, the bounds under the nonstationary and stationary assumptions have the same envelope so long as we retain the independence assumption.
We want computationally practical formulas for bounds on the distribution of the sum of random variables Xi each distributed according to some distribution constrained by the same p-box (bounds on the cumulative distribution function), but about which we cannot necessarily assume independence or equivalence of the distribution functions. We’d like formulas or algorithms for each of the ten cases formed by crossing the five dependence assumptions discussed above (independence, perfect dependence, positive dependence, negative association, and no assumption at all) with the two assumptions about stationarity (identically distributed addends or not).
There are two separate situations that we want to be able to solve: the first is when we know the shape family of the distributions within the p-box, and the second is when we do not know the shape family (or when knowing it doesn’t help because no convolution formula is available for that family).
For example, in the case of summing addends whose distributions are consistent with the parametric p-box B = bernoulli([p1, p2]), we have rules like the following one for the independent, stationary case: the k-fold self-sum is consistent with binomial(k, [p1, p2]); because the binomial CDF is monotonically decreasing in p, its bounding CDFs are those of binomial(k, p1) and binomial(k, p2).
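A minimal sketch of this rule in plain R (selfsum.bernoulli is a hypothetical helper name):
# hedged sketch: the independent, stationary k-fold self-sum of
# B = bernoulli([p1,p2]) is binomial(k, [p1,p2]); smaller p gives the
# stochastically smaller distribution and hence the upper CDF bound
selfsum.bernoulli <- function(k, p1, p2) {
  x <- 0:k
  list(x = x,
       upper = pbinom(x, k, p1),   # upper bound on the CDF
       lower = pbinom(x, k, p2))   # lower bound on the CDF
}
b <- selfsum.bernoulli(44, 0.1, 0.3)
plot(b$x, b$upper, type='s', col='red'); lines(b$x, b$lower, type='s', col='red')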
Percus and Percus (1985), as well as Kolmogorov (1956), considered the problem of finding distributions of sums without the stationarity assumption.
The Sum function assumes neither stationarity nor anything about dependence, but it computes the bounds the hard way, as the explicit sum. It is defined in R and in Risk Calc as:
Sum <- function(k, d) { s <- d; for (kk in 2:left(k)) s <- s %+% d; s }
function Sum() begin s = $2; for k = 2 to $1 do s = s + $2; return s; end
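For example, with these definitions and the pbox.r library loaded, dependence-free bounds on the 4-fold Bernoulli self-sum from the earlier script could be computed as
Sum(4, bernoulli(0.2))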
We expect that this function will be susceptible to the accumulation of discretization error and therefore it will often yield bounds that are unnecessarily inflated.
<<new problem: theory>>Let’s specify some notation. If B = [B̲, B̄] denotes a p-box, then we can let the notation F ∈ B mean that the distribution function F is consistent with the p-box B, which implies that B̲(x) ≤ F(x) ≤ B̄(x) for all x ∈ ℝ. A tilde is traditionally used to indicate that a random variable has the same distribution as another random variable or has a particular distribution function. We can generalize this usage and understand the notation X ~ B, where B is a p-box, to mean the random variable X is distributed according to some distribution function that is consistent with the p-box B, i.e., X ~ B is short for X ~ F ∈ B. If all the random variables Xi are identically distributed, we can write Xi ~ F ∈ B. When they all have the same distribution, we sometimes call the process that generates them stationary. If the process is nonstationary so that the variables are not identically distributed, we can write Xi ~ Fi ∈ B to indicate that each random variable has its own distribution. Finally, X ⊥ Y means random variables X and Y are stochastically independent. X // Y means X and Y are perfectly dependent or comonotonic, i.e., X = F⁻¹(U) and Y = G⁻¹(U) for a single uniform random variable U, where F and G are distribution functions and X ~ F and Y ~ G. If we do not know what dependence may exist between the random variables, we can write X ? Y.
The sum of k random variables Xi ~ B which are mutually perfectly dependent is consistent with kB. This allows us to compute best-possible bounds on the sum of perfect deviates from a p-box, with or without the stationarity assumption X1 ~ X2 ~ ... ~ Xk. Adding the stationarity assumption does not allow us to tighten these bounds. Like Sylvester, I am as sure of these assertions as I am of anything in mathematics, but I haven’t proved them yet. We want to prove the following statements:
If X ~ B and Y ~ B, and X // Y, then X+Y ~ 2B and this p-box 2B is best-possible, in the sense that it cannot be made any smaller given only the assumptions.
If Xi ~ B and Xi // Xj for any i, j ∈ {1, 2, …, k}, then ΣXi ~ kB. The p-box kB is best-possible.
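As a numerical sanity check of these claims, here is a minimal sketch in plain R using an arbitrary illustrative p-box formed as the envelope of two beta distributions. It rests on the fact that quantiles of comonotonic random variables add at equal probability levels, so the quantile function of the self-sum of two copies of any F ∈ B is 2F⁻¹, which necessarily lies between the doubled quantile bounds of B.
u  <- (1:1000 - 0.5)/1000                        # probability levels
qL <- qbeta(u, 2, 5); qR <- qbeta(u, 5, 2)       # quantile bounds of an example p-box B
qF <- 0.3*qL + 0.7*qR                            # an arbitrary distribution F inside B
qsum <- 2*qF                                     # comonotonic self-sum: quantiles add at equal u
stopifnot(all(2*qL <= qsum), all(qsum <= 2*qR))  # the sum lies inside 2B
# choosing F equal to either bound of B attains the corresponding bound of 2B,
# which is why 2B cannot be tightened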
In other cases, of course, the stationarity assumption will have a substantial impact on the bounds of self-sums.
<<Williamson not quite right>> Williamson (1991) showed what happens with the envelope of the p-box alone, but a p-box has more information than merely the shell. Knowing even just the range and first two moments (mean and variance) of the original distribution tells us a lot about the distribution of the k-fold sum of IID random variates: the sum has range [k·min, k·max], mean k·m, and variance k·s². Indeed, if the original distribution is precisely specified, we know infinitely many moments.
Cite Higdon et al. (mentioned in Kari’s manuscript) and McDiarmid’s inequality (which does not assume stationarity).
A theorem due to Cramér says that if a sum of independent random variables is normally distributed, then the addends themselves must be normal. I wonder if there is an analog of this theorem for perfect dependence.
<<new problem>>k-fold convolution of discrete CDFs
If there is no ready information about a p-box’s distributional shape, that is, if we cannot identify a p-box as normal or binomial or any particular distribution family, how should we estimate bounds on its self-sum? In our technology, we represent p-boxes using discrete cumulative distribution functions. Each CDF bound is a step function that jumps 1/n at each of n points along the real line, xi, i=1,...,n. Can we compute, from these left and right bounds of a p-box, useful bounds on the distribution of self-sums?
The distribution of the sum of two independent random variables drawn from any distribution is the convolution of the distribution with itself. A k-fold convolution gives the distribution of the k-fold sum of independent random variables. A k-fold convolution of a distribution corresponds to raising its Laplace transform to the kth power. Is it possible to use Laplace transforms of the discrete bounds of a p-box to compute the bounds on the distribution of sums?
It seems this problem can be reduced to the following: Given an ordered array of real values xi, i=1,...,n, that define a discrete CDF,
1) Find the Laplace transform of that step function distribution,
2) Compute the kth power of this Laplace transform,
3) Back-transform the result back into a distribution function, and
4) Discretize the distribution into an ordered array yi, i=1,...,n, of (1/n)-quantiles.
The final discretization should be outward-directed in the standard p-boxy way, and we know how to do this. Abate and Whitt (1995) give nice algorithms to numerically invert Laplace transforms, so maybe we know how to do both steps 3 and 4. But what is the Laplace transform of the step function representing the CDF bound? I believe Springer (1979) addresses how transforms can be used to compute convolutions. I’m not even sure whether we’re supposed to take the Laplace transform of the PDF or the CDF for this to work. Are we necessarily in the complex domain? Maybe it was a fantasy, but I used to think that Laplace transforms could be in the reals too. If they are complex, then I suppose step 2 represents the problem of raising a complex number to the kth power. Whether complex or not, presumably the hardest part of the problem would be step 2. The Abate and Whitt algorithms expect functions with complex arguments.
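For what it’s worth, here is a minimal sketch of steps 1 and 2 under the assumption that it is the PDF, not the CDF, that should be transformed: for a discrete distribution placing mass 1/n at each xi, the Laplace transform of the PDF is φ(s) = (1/n) ∑ exp(−s xi), which is real for real s but can just as well be evaluated at the complex arguments an Abate–Whitt inversion routine would supply.
phi  <- function(s, x) sapply(s, function(ss) mean(exp(-ss*x)))  # transform of the mass-1/n-at-x[i] PDF
phik <- function(s, x, k) phi(s, x)^k                            # transform of the k-fold independent sum
x <- sort(runif(50))      # the n points defining one discrete CDF bound
phik(1+2i, x, 6)          # transform of the 6-fold self-sum at the complex point s = 1+2i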
<<new problem>>I'm not at all sure that this problem has an easy or even tractable solution, but I'm hopeful that it does. If it does not, I'd be almost as happy with some way to bound the distribution of sums. There have been many attempts by mathematical probabilists over the years to describe such bounds for sums, including
Azuma–Hoeffding inequality http://en.wikipedia.org/wiki/Azuma%27s_inequality,
McDiarmid's inequality http://en.wikipedia.org/wiki/McDiarmid%27s_inequality,
Bernstein inequalities http://en.wikipedia.org/wiki/Bernstein_inequalities_(probability_theory),
Chernoff bound http://en.wikipedia.org/wiki/Chernoff_bound, and
Hoeffding's inequality http://en.wikipedia.org/wiki/Hoeffding%27s_inequality.
In principle, we can intersect these bounds so long as they are each rigorous. Of these inequalities, we’ve only implemented the last so far. Here is the function for R:
hoeffding <- function(n, min, max, mean)
  pbox(
    imp(
      # moment constraints: the range and the interval mean both scale with n
      env(minmaxmean(n*min, n*max, n*left(mean)),
          minmaxmean(n*min, n*max, n*right(mean))),
      # Hoeffding's inequality applied at each discretization level
      # (iii and jjj are presumably the library's arrays of probability levels)
      pbox(u = n*left(mean)  - sqrt(-n*((max-min)^2)*log(iii())/2),
           d = n*right(mean) + sqrt(-n*((max-min)^2)*log(1-jjj())/2))),
    shape = 'Hoeffding',
    name = name,    # note: `name` is apparently expected to be defined in the calling environment
    ml = left(n*mean),
    mh = right(n*mean))
For instance, if B is the p-box, then the expression hoeffding(k, left(B), right(B), mean(B)) returns a p-box constraining the k-fold self-sum from B. It can be intersected with any putative sum p-box. If we cannot obtain exact bounds, we want to improve our bounds on self-sum distributions however possible, including robust implementation of the above inequalities.
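For instance, the Hoeffding p-box could be intersected with the brute-force bounds from the Sum function defined earlier (a sketch composed from functions already shown):
sumbounds <- imp(hoeffding(k, left(B), right(B), mean(B)), Sum(k, B))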
What can be done via brute-force methods? The following code looks at the extreme edges of the convolution for IID sums.
# Better bounds on the tails of self-sum distributions
# under IID, i.e., independence and stationarity assumptions.
#
# The extreme tails can be bounded by the lefttail and
# righttail functions below which explore the extreme
# results of the convolution.
#
# The perfect sum distribution u*k, i.e., the sum under
# an assumption of perfect dependence, is an upper bound
# on the left tail and a lower bound on the right tail.
#
# We need to integrate all these functions, plus the
# other inequalities to obtain cheap, practical bounds.
edf = function(x,col='tan',lwd=3,...) {
# the empirical distribution function (Sn)
n <- length(x)
s <- sort(x)
lines(c(s[[1]],s[[1]]),c(0,1/n),lwd=lwd,col=col,...)
for (i in 2:n) lines(c(s[[i-1]],s[[i]],s[[i]]),c(i-1,i-1,i)/n,lwd=lwd,col=col,...)
}
steps = 50
k = 6
u = sort(runif(steps))
plot(NULL,xlim=c(0,k),ylim=c(0,1),xlab=paste(k,'-fold sum',sep=''),ylab='Cumulative probability')
edf(u,col='gray')
lefttail = function(u,k,op='+') {
steps = length(u)
long = min(100000,steps^k)
big = u
for (i in 1:(k-1)) big = sort(as.vector(outer(u, big, op)))[1:long]
lines(big,(1:length(big))/(steps^k),type='s',col='red')
lines(big,(1:length(big))/(steps^k),type='S',col='red')
}
righttail = function(d,k,op='+') {
steps = length(d)
long = min(100000,steps^k)
big = d
for (i in 1:(k-1)) big = sort(as.vector(outer(d, big, op)),TRUE)[1:long]
big = rev(big)
#cat(length(big), length((steps^k-length(big)):(steps^k)), (steps^k-length(big)), (steps^k),'\n')
#lines(big,((steps^k-length(big)+1):(steps^k))/(steps^k),type='s')
lines(big,(1:long+steps^k-long)/(steps^k),type='S')
}
lefttail(u,k)
righttail(u,k)
if (steps^k < 1e6) {
s = u
for (i in 1:(k-1)) s = as.vector(outer(u, s, '+'))
edf(s)
lefttail(u,k)
righttail(u,k)
} else {
s = rep(0,steps)
for (i in 1:k) s = s + u[order(runif(length(u)))]
edf(s,col='gray95')
s = rep(0,steps)
for (i in 1:k) s = s + u
edf(s,col='pink')
}
In addition to numerical methods for establishing the bounds, we also need ancillary theoretical rules to specify the distribution shapes within p-boxes when this is appropriate. For instance, we know that summing independent deviates from a stationary normal or gamma distribution preserves the distribution family. That is, sums of independent normals are also normal, and sums of independent gamma deviates also have a gamma distribution. This allows us to conclude from the rule Sk gamma = gamma(s²/m, k(m/s)²), not only the bounds on the sum distribution but also the fact that that distribution has a gamma shape (even when we might not be perfectly sure of its parameters). Are there other distribution families for which this is true? Surely, all the so-called stable distributions (http://en.wikipedia.org/wiki/Stable_distribution).
Summing independent deviates from some distributions preserves the distribution family even without the stationarity assumption. For instance, if independent deviates are all from Poisson distributions, the sum is Poisson too, even when the addends are from different distributions. This allows us to extend the rule Sk poisson = poisson(k m) to compute both the bounds on the sum distribution and its family, whether or not the stationarity assumption is made. For which other distribution families is something like this true?
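A minimal sketch of this extended Poisson rule in plain R (selfsum.poisson is a hypothetical helper name): for independent addends, each Poisson with a mean somewhere in [m1, m2], the sum is Poisson with mean in [k·m1, k·m2], and since the Poisson CDF is decreasing in its mean, the CDF envelope is attained at the endpoints.
selfsum.poisson <- function(k, m1, m2, x = 0:ceiling(k*m2 + 4*sqrt(k*m2))) {
  list(x = x,
       upper = ppois(x, k*m1),   # upper bound on the CDF (smallest possible total mean)
       lower = ppois(x, k*m2))   # lower bound on the CDF (largest possible total mean)
}
b <- selfsum.poisson(44, 0.1, 0.3)
plot(b$x, b$upper, type='s', col='red'); lines(b$x, b$lower, type='s', col='red')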
Other relaxations are also known. For instance, summing perfectly dependent (comonotonic) deviates from some distributions likewise preserves the distribution family. For some distribution families, this is true even without a stationarity assumption. For what distributions is this true? The following is our synopsis of the relevant distribution preservations. Are these entries correct? Should there be others?