
 PROBABILITY, RANDOM VARIABLES, AND DISTRIBUTIONS

 During this laboratory, students will be taught the properties and uses of several continuous and discrete statistical distributions that are commonly used in ecological models. The students will learn how to generate random data from each distribution using R software and how to develop simple simulation models by passing randomly generated values from one distribution to another. This will provide students with the understanding of probability distributions that will be needed to quantify uncertainty and comprehend the basics of Bayesian probability and Monte Carlo simulation. Laboratory exercises will evaluate the ability of students to build simple simulation models that use randomly generated data to approximate ecological processes, such as survival or the occurrence of a disturbance.

Overview of probability and random variables

We introduced random (stochastic) variables and statistical distributions last week. Here we will somewhat more formally define these ideas, and get into the details of some important statistical distributions.

Probability (P) can be thought of as a measure of the uncertainty in a random outcome. If we say that event X occurs with P=1 then we are certain about X; if we say P=0 then we are certain that X does not occur; and if we say P=0.5 we are equally uncertain about whether X occurs or does not. The value or outcome “X” referred to is a random variable, as distinguished from a deterministic variable whose values may vary, but do so in a predictable or deterministic manner. 

A probability distribution or statistical distribution (or distribution for short) is a model that describes the relationship between values of a random variable and the probabilities of assuming these values. The 2 basic types of distributions are discrete and continuous. Discrete distributions model outcomes that occur in discrete classes or integer values; examples include the Bernoulli, Binomial, and Poisson, all discussed in more detail below. Continuous distributions model outcomes that take on continuous (generally, real) values and include the Uniform, Normal, Beta, and Gamma distributions, also discussed in more detail below.

The probability density function for a distribution describes the probability that the random variable takes on particular values (for a discrete distribution) or is in the neighborhood of a value (for a continuous distribution). The density is often written as f(x). For example, for a discrete random variable (e.g., from a Poisson distribution) we may write

f(4)=0.45

indicating that the value of 4 is taken with probability 0.45. For a discrete distribution f(x) will sum to 1 over the support (the region of x where f(x)>0) of the distribution. For example, if we have a binomial distribution with parameters n=5 and p=0.2, the distribution has support for x = 0, 1, 2, 3, 4, 5 with f(0)+f(1)+f(2)+f(3)+f(4)+f(5) = 0.32768+0.40960+0.20480+0.05120+0.00640+0.00032 = 1. The density for continuous distributions follows a similar idea, but because the support is continuous (and thus uncountable) f(x) is not directly interpretable as a point probability. However, by analogy to the discrete case, f(x) integrates to 1 over the support of f(x).

The probability distribution function (or cumulative distribution function) represents the probability that the random variable X is less than or equal to a particular value x

F(x) = Prob(X≤x).

For discrete distributions, F(x) is readily obtained by summation. E.g., for the binomial example

F(3) = f(0)+f(1)+f(2)+f(3)= 0.32768+0.40960+0.20480+0.05120= 0.99328.

Calculation of F(x) for continuous distributions is trickier and requires integration. By definition

F(x) = ∫_{-∞}^{x} f(t) dt

where the lower limit may sometimes be higher (e.g., if the support starts at zero). Usually these computations are done by computer functions (or looked up in standard tables). Once F(x) is available (for either discrete or continuous distributions) we can easily ask questions like “what is the probability that x is between a and b?” since

Prob(a < X ≤ b) = F(b) − F(a).

So for example if we have a Normal with mean 0 and standard deviation 1

F(2) = 0.97725, F(1) = 0.8413447, and

Prob(1 < X ≤ 2) = F(2) − F(1) = 0.1359
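These values are easy to reproduce with R's built-in Normal distribution function pnorm(), introduced more fully below; a quick check:

> pnorm(2,0,1)

[1] 0.9772499

> pnorm(1,0,1)

[1] 0.8413447

> pnorm(2,0,1)-pnorm(1,0,1)

[1] 0.1359051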

We can reverse this idea and, for a given probability level of the distribution function, obtain the value of x (the quantile) associated with that probability. The quantiles are essentially found by inverting the distribution function and solving for x, though for discrete distributions they can easily be obtained by inspection and interpolation between values. In practice we will get the quantiles of standard distributions using built-in functions in R. To take the example of the normal distribution (mean=0, sd=1), the quantiles associated with F(x) = 0.01, 0.5, and 0.99 are −2.33, 0.00, and 2.33, respectively.
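In R these standard Normal quantiles come from the qnorm() function; for example:

> qnorm(c(0.01,0.5,0.99),0,1)

[1] -2.326348  0.000000  2.326348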

You will often encounter the term moments, which refer to a number of important functions of distributions. The 2 most important moments are the mean and the variance. The mean and variance are formally defined in terms of the density functions, with the mean

E(x) = µ = Σ x f(x)

for discrete distributions, and

E(x) = µ = ∫ x f(x) dx

for continuous distributions, where in both cases summation or integration is over the support of x. The variance of x follows from the definition of expectation and the relationship

Var(x) = E[(x − µ)²] = E(x²) − µ²

We generally estimate these population moments by their sample equivalents, the sample mean and variance,

x̄ = (1/n) Σ xᵢ

and

s² = (1/(n−1)) Σ (xᵢ − x̄)²

where n is the size of a random sample.
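In R, the sample mean and variance of a vector of values are computed with the built-in mean() and var() functions (var() uses the n−1 divisor shown above). A minimal sketch, assuming x holds a random sample:

> x<-rnorm(100,5,10)

> mean(x)

> var(x)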

The Normal Distribution is an example where the distribution's parameters (the constants that determine the behavior of the distribution and what it will predict about the data) are familiar: the parameters of the Normal are just the mean (µ) and the variance (σ²). We will introduce parameters for other distributions when we consider the distributions in detail, below.

Random number generation

The idea of random number generation is to produce a value (or a list/sample of values) of a random variable x, given some assumptions about the distribution of x and its parameters. For example, we may wish to obtain a simulated sample of 100 values that come from a normal distribution with mean 5 and standard deviation 10. Depending on whether the random variable is discrete or continuous and the complexity of the distribution function, there are a variety of procedures to generate random variables. Most of the common ones rely on being able to find the inverse distribution function, that is, the function that, given a value U for the cumulative probability of x, returns the value of x. The idea is then to generate a uniform random variable between 0 and 1 (the range for a probability) and then solve to get x. Thus, many random number generators start from the capacity to generate a uniform random number, which can then be used to create random variables from other distributions. In practice, R goes through these steps for you, but we will illustrate them for a few simple examples so that you can see that there is often more than one way to obtain a simulated random variable.
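As a small preview of the inversion idea (a minimal sketch; the Exponential distribution is not otherwise covered in this lab), an Exponential random variable with rate r has F(x) = 1 − exp(−r x), so setting U = F(x) and solving gives x = −log(1 − U)/r:

> #inverse-CDF (inversion) method for an Exponential(rate=r) variable

> r<-2

> U<-runif(1000)

> x<- -log(1-U)/r

> #compare with R's built-in generator

> summary(x)

> summary(rexp(1000,r))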

Technically, we are not generating true random numbers with these procedures, but rather computer generated sequences of numbers that behave like random numbers, known as pseudorandom numbers. The exact means by which pseudorandom numbers are generated is an advanced topic beyond the scope of this course, and has been the subject of intensive development and refinement over the years. Suffice it to say that some pseudorandom number generators perform better (i.e., act like the “real deal”) than others, so it is important to be sure that you are using a generator that has been thoroughly tested. Fortunately for us, the developers of R and the R user community have thoroughly vetted the pseudorandom number generators in R, so you can be confident when you use these procedures that the results will be essentially “random”.

Probability distributions in R

R provides a very convenient way to calculate and plot many common statistical distributions and related functions and generate random variables, so we will perform most of these tasks using built-in R functions. In a few cases we’ll be able to see how to build functions “from scratch” or nearly so, which may help you to generalize these principles.

R code for all the examples is accumulated and saved in an R script file.

Uniform Distribution

Density, distribution, and quantiles

Perhaps the simplest distribution is the continuous Uniform (or Rectangular) Distribution, which assumes that values of x over the support of f(x) occur with equal probability. The parameters of the Uniform are simply the lower and upper bounds for x, so that x is equally likely to be anywhere inside the interval [a, b], but cannot occur outside the interval (i.e., the support lies entirely within the interval). Formally, the density for x is then

f(x) = 1/(b − a),   a ≤ x ≤ b

f(x) = 0,   otherwise

The distribution function is simply

F(x) = (x − a)/(b − a),   a ≤ x ≤ b

(with F(x) = 0 for x < a and F(x) = 1 for x > b).

The mean of the uniform is obtained from

E(x) = ∫_a^b x/(b − a) dx = (a + b)/2

and is just the midpoint between the minimum (a) and maximum (b), the 2 parameters of the distribution, while the variance is

Var(x) = (b − a)²/12.

The density is easily implemented in R by the command

>dunif(x,a,b)

where a and b are the parameters and x is a value or list of values. So for a simple example we can compute and plot the density for Uniform(a=2,b=8) over the range x from 0 to 10.

#generate 1000 equally spaced values between 0 and 10

>x<-(0:1000)*0.01

#compute the uniform density for each value of x

>density<-dunif(x,2,8)

Alternatively, we could have written a short function in R to do the same thing:

#user-defined density

>my_dunif<-function(x,a,b){1/(b-a)*(x>=a & x<=b)}

>d2<-my_dunif(x,2,8)

Either approach should produce a plot from

>plot(x,density)

like this

 

 

 

Notice what happens to the density when x<2 or x>8. Likewise we can produce and plot a distribution for x by

>distrib<-punif(x,2,8)

>plot(x,distrib)

producing a plot of the cumulative distribution function.

Quantiles at specified probability levels are produced by the qunif() command, for example:

#quantiles at standard probability levels

>prob_levels<-c(0,.05,.25,.5,.75,.95,1)

>quants<-qunif(prob_levels,2,8)

>quants

[1] 2.0 2.3 3.5 5.0 6.5 7.7 8.0

Likelihood function

We are used to thinking of probability functions as describing the probability of an outcome x given the underlying model and parameter values (e.g., Uniform(2,8) above). We can turn this idea around, though, and ask how “likely” a given parameter value is, given the data we have and an underlying model. In this way of thinking, the data are fixed and the model parameter(s) are treated as variable. Mathematically, the calculation is the same; we are just varying different quantities. In the example we just considered, we can ask the question: how likely, given a=2 and a value of x=5, are integer values of the parameter b in the range 3 to 12?

>a<-2

>x<-5

> b<-3:12

> b

[1] 3 4 5 6 7 8 9 10 11 12

> like<-dunif(x,a,b)

> like

[1] 0.0000000 0.0000000 0.3333333 0.2500000 0.2000000 0.1666667 0.1428571 0.1250000

[9] 0.1111111 0.1000000

We see that there is no likelihood that b is 3 or 4 (obviously ruled out by the value x=5), but that given the single observation x=5 we can’t rule out b being 6, 7, 8, or even higher. We will come back to the likelihood, and to just how we use data to estimate parameters, in a later lab.

Random number generation

It is very easy to generate random uniform numbers in R using the runif() function. The first argument to the function specifies the number of values you want, and the next two specify the minimum and maximum (a, b) parameters. By default a=0 and b=1, so runif(100), for example, would produce 100 uniform random numbers between 0 and 1, something that is often the starting point for simulating other, more complicated distributions. To take a specific case, suppose we want to generate 100 uniform random numbers between 5.5 and 10.4.

#generating n Uniform(a,b) random numbers

>n<-100

>a<-5.5

>b<-10.4

>x<-runif(n,a,b)

would produce a list of numbers (x) with these characteristics.

You can calculate the sample mean from the simulated data

>mean(x)

and confirm that while this gets close to the distribution mean of (a+b)/2 it’s not exact—why is that?
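One way to explore that question (an illustrative sketch) is to increase the sample size and watch the sample mean settle toward (a+b)/2 = 7.95:

> mean(runif(100,a,b))

> mean(runif(10000,a,b))

> mean(runif(1000000,a,b))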

Normal Distribution

Density, distribution, and quantiles

The Normal distribution is perhaps the most familiar statistical distribution. It is symmetric about the mean, with the familiar bell-shaped curve, and is used to model continuous, real values with a theoretical range from negative to positive infinity. It is the limiting distribution of many test statistics and functions and is commonly used as an approximation, even when the data are thought to follow some other distribution, often after transformation to reduce skewness or discontinuities in the data. The normal density is determined by the 2 parameters µ and σ (σ > 0) as

f(x) = [1/(σ√(2π))] exp(−(x − µ)²/(2σ²))

For example, the Normal density function over (−50, 50) for µ=5 and σ=15 is produced by

#normal distribution

#generate 10000 equally spaced values between -50 and 50

>x<-(-5000:5000)*0.01

>mu<-5

>sigma<-15

>density<-dnorm(x,mu,sigma)

>plot(x,density)

We can produce a comparable distribution function by

#distribution function

>distrib<-pnorm(x,mu,sigma)

>plot(x,distrib)

producing the corresponding plot of the distribution function.

Specified probability quantiles are easily obtained from the qnorm() function, for example

>prob_levels<-c(0.001,.05,.25,.5,.75,.95,0.999)

> quants<-qnorm(prob_levels,mu,sigma)

> quants

[1] -41.353485 -19.672804 -5.117346 5.000000 15.117346 29.672804 51.353485

Equivalently, we could say that we are 90% confident that x is between -19.67 and 29.67, with 10% probability (5% in each tail) outside this range.

Notice that in the density dnorm() and distribution pnorm() functions, we passed the data as a list to the function, with scalar (single) values for the parameters. Generally speaking, any of these function arguments can be lists, and it will make sense below (under the likelihood function) to reverse which ones are.

Likelihood function

Again, we can “turn the model around” and ask the question: how likely is a specific parameter value, given an observation (or a sample of observations)? To keep things simple for the normal, let’s assume that we’ve observed the values x=5 and x=10, and assume that the standard deviation is fixed at 1. Assuming the normal model, how likely are various values of µ (say between 2 and 16)? We can compute a likelihood for each data value using dnorm().

First, let’s make life simpler by introducing an R function that will generate a regular sequence at a specified interval, seq(). We use this to produce values for mu in the range of 2 to 16, at 0.5 spacing (finer if we wish), and then feed them into the likelihoods for x=5 and x = 10.

#likelihood

>mu<-seq(2,16,0.5)

>like1<-dnorm(5,mu,1)

>like2<-dnorm(10,mu,1)

At this point, we can first recognize that, assuming the 2 observations of x are independent, we can multiply their likelihoods or add them on the log scale to get a joint (log) likelihood for the data.

>loglike<-dnorm(5,mu,1,log=TRUE)+dnorm(10,mu,1,log=TRUE)

Finally, we can examine our log likelihood, see which value is biggest, and see which value of mu produced that log likelihood. R has a nice built-in indexing feature that will do this, which works like this:

>mu[loglike==max(loglike)]

which basically says “find the index of loglike associated with the biggest value and then tell me what the corresponding mu value is at that same index”. In this example, the result is 7.5, which (not coincidentally) is the arithmetic mean of 5 and 10. What we just did is a very crude (but sometimes effective) way to get the maximum likelihood estimate under a specified model, something we’ll explore more in a later lab.
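An equivalent and slightly more compact idiom uses R's built-in which.max() function, which returns the position of the largest element; it should give the same answer of 7.5 here:

> mu[which.max(loglike)]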

Random number generation

The easiest way to generate random Normal numbers is by using the built-in R function rnorm().

#Generate 100 random numbers for mu=5 and sigma =10

#method 1

n<-100

mu<-5

sigma<-10

x<-rnorm(n,mu,sigma)

The second (and just as valid) way is to first generate 100 random Uniform(0,1) numbers, and then treat these as cumulative probabilities to be converted into values of x via the quantile (inverse distribution) function.

#method 2

#first generate 100 random uniform(0,1) deviates

U<-runif(100)

#now treat these as probability values in qnorm(), which functions as the inverse distribution function to return values of x given U.

x<-qnorm(U,mu,sigma)

You should be able to test these 2 approaches out and convince yourself that they produce equivalent results.
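A quick, informal comparison (a sketch, not a formal test) is to generate a sample each way and look at summary statistics or a quantile-quantile plot; with only 100 values the agreement will be approximate:

> x1<-rnorm(n,mu,sigma)

> x2<-qnorm(runif(n),mu,sigma)

> summary(x1)

> summary(x2)

> qqplot(x1,x2)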

Poisson Distribution

The Poisson Distribution is a very important discrete distribution that models outcomes that take on non-negative integer values (0, 1, 2, …). Examples include counts of animals, plants, or other objects, where the process generating the counts in space is “random” in the sense that counts are not clustered or separated except by chance.

Density, distribution, and quantiles

The Poisson Distribution is specified by the single parameter λ, which is equal to both the population mean and variance. Thus the ratio of the sample mean to the sample variance is sometimes used as evidence for (or against) Poisson assumptions, with values of this ratio ≈ 1 taken as support for a Poisson count model. The density function of the Poisson is given by

f(x) = λˣ e^(−λ) / x!,   x = 0, 1, 2, 3, …

where e is the base of the natural logarithm, λ>0, and x! denotes the factorial function x·(x−1)·(x−2)·…·1. The distribution function is simply given by summation of the density over the discrete values of x:

F(x) = Σ_{k=0}^{x} λᵏ e^(−λ) / k!,   x = 0, 1, 2, 3, …

The Poisson density and distribution are easily implemented in R, for example for λ=5:

#poisson distribution

#generate a sequence between 0 and 20

>x<-0:20

>lambda<-5

>density<-dpois(x,lambda)

>plot(x,density,"h")

#distribution function

>distrib<-ppois(x,lambda)

>plot(x,distrib,"h")

Likewise, standard quantiles are easily computed-

> #quantiles

> prob_levels<-c(0.001,.05,.25,.5,.75,.95,0.999)

> quants<-qpois(prob_levels,lambda)

> quants

[1] 0 2 3 5 6 9 13
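Returning to the point above that the Poisson mean and variance are both equal to λ, a quick simulation check (illustrative only) shows that the sample mean-to-variance ratio is close to 1:

> y<-rpois(10000,lambda)

> mean(y)

> var(y)

> mean(y)/var(y)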

Likelihood function

As with the other distributions we have considered, the likelihood function is formed from the density, but reversing the roles of parameters and data, with the latter now fixed and the former varying over some specified range. We will “cheat” here because we know that given the data λ cannot be huge (say >15) and we know that it has to be >0, so we will only look at the likelihood in that range:

> #likelihood

> lambda<-seq(0.01,15.01,0.01)

> x<-8

> like<-dpois(x,lambda)

> loglike<-dpois(x,lambda,log=TRUE)

> plot(lambda,like)

We can also use a device similar to what we used for the normal to get a fairly good approximation of the maximum likelihood value for λ

> #find the maximum over list of lambdas

> lambda[loglike==max(loglike)]

[1] 8

which is not surprising: given the simplicity of this model (mean=variance) and the single observation x=8, we expect λ to be around 8.

Random number generation

Given the discrete nature of the random variable in the Poisson distribution, there are several options for generating random variables, some of them quite simple, others not so simple but more flexible. We illustrate 3 of these with the example of generating 100 random values from a Poisson(10) distribution.

Method 1- built in R function

First, we can of course rely on the standard, built-in R function rpois().

#Generate 100 random numbers for lambda=10

#method 1

n<-100

lambda<-10

x<-rpois(n,lambda)

Method 2- Uniform deviate, quantile (inverse distribution function)

The second method also relies on an R function, the quantile or inverse distribution function, but performs the calculations by first computing 100 random Uniform(0,1) numbers and then transforming them with the inverse distribution (quantile) function.

#method 2

>n<-100

>lambda<-10

>U<-runif(100)

>x<-qpois(U,lambda)

Method 3- Uniform deviate and interpolation from distribution

In the third approach we build a random number generator based directly on the cumulative discrete distribution F(x). As in Method 2 we generate 100 random Uniform variables and then use these and the definition of F(x) to obtain values of x (see Evans et al. 2000); this requires a user-defined function (pfun) to map the continuous values of U into discrete values of x:

#method 3

>n<-100

>lambda<-10

#generate F(x) from 0 to 50 (arbitrary but thought to include most of the prob. density)

>values<-0:50

>F<-ppois(values,lambda)

#define the function to do interpolation from F(x)

> pfun<-function(f,u,v)
> {
> #f: cumulative probabilities F(x) evaluated at the support values in v
> #u: Uniform(0,1) deviates to be converted into values of x
> #v: the support values (here 0:50)
> x<-array(0,dim=c(length(u)))
> for (j in 1:length(u))
> {
> for (i in 1:50)
> {
> #if u falls between F(v[i]) and F(v[i+1]), return the value v[i+1]
> if (f[i]<=u[j] & u[j]<f[i+1]) {x[j]<-v[i+1]}
> }
> }
> x
> }

>#generate the values of U and x

>U<-runif(n)

>x<-pfun(F,U,values)

You can confirm that all 3 methods give similar results using large values for n and that the last 2 methods give identical results for the same vector of Uniform numbers. However, Method 3 is much slower than Methods 2 or 1, which simply confirms that (usually) the built-in functions in R tend to be more computationally efficient than what beginning users can build.
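If you want to see the speed difference for yourself, one simple (illustrative) approach is to wrap each method in R's system.time() with a larger sample size, e.g.:

> n<-10000

> system.time(x<-rpois(n,lambda))

> system.time(x<-qpois(runif(n),lambda))

> system.time(x<-pfun(F,runif(n),values))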

Building a function like this on your own, though, does illustrate that it can be done, and this can be handy in situations where no built-in function exists in R. For example, suppose we have a discrete distribution F(x) without a known mathematical form, but for which we can write out numerical values of F(x). A simple example of this is where we use the quantile function to summarize the data from a sample into an empirical distribution function and treat this as F(x). We can then use an approach such as Method 3 to simulate values from this distribution, even though we have no idea of its mathematical form. We’ll return to these ideas when we get more deeply into simulation and bootstrapping in a later lab.

Bernoulli Distribution/ Binomial Distribution

The Bernoulli Distribution is the natural distribution for modeling outcomes that can occur in 1 of 2 classes, such as success or failure, lived or died, heads or tails, male or female. The Bernoulli Distribution has a single parameter p that describes the probability of a “success” (however it is defined). The Binomial Distribution defines the number of successes that occur in n independent Bernoulli trials, each with the same probability of success p. The Binomial is thus based on summing Bernoulli distributions, and has 2 parameters, n and p. Because these 2 distributions are so closely related we will consider them together below.

Density, distribution, and quantiles

The Bernoulli random variable x takes on 2 possible values, either 1 (indicating success) or 0 (failure), and has a single parameter, p, denoting the probability of success. The probability density function is written as

f(x) = pˣ (1 − p)^(1−x),   x = 0, 1

which simplifies to f(1) = p and f(0) = 1 − p. Note that we assume that there are only 2 possible outcomes, a success with probability p and a failure with probability 1−p, and that by definition the probabilities of a success and a failure add to 1.

The mean of the Bernoulli distribution is E(x)=µ=p and the variance is Var(x)= p(1-p).

The Binomial distribution is closely related, with the Binomial variable x defined as the number of successes in n independent Bernoulli trials, each with probability p of success. The Binomial thus has 2 parameters (n and p), though one of these (n) ordinarily is known and will not be estimated from data. The Binomial density function is

f(x) = [n!/(x!(n − x)!)] pˣ (1 − p)^(n−x),   x = 0, 1, …, n

The Binomial distribution function is

F(x) = Σ_{k=0}^{x} [n!/(k!(n − k)!)] pᵏ (1 − p)^(n−k),   x = 0, 1, …, n

The mean and variance are given by E(x)=µ=np and the variance is Var(x)= np(1-p). The Binomial density and distribution are easily implemented in R by the dbinom() and pbinom() functions (there is no separate Bernoulli function in R, with the Bernoulli simply being a Binomial with a single trial n=1).

E.g., for a Bernoulli with p= 0.4

>#Bernoulli

>p<-0.4

>x<-0:1

>density<-dbinom(x,1,p)

>distrib<-pbinom(x,1,p)

>plot(x,density,"h",ylim=c(0,1))

>plot(x,distrib,"h",ylim=c(0,1))

This produces plots of the density and the distribution.

Taking a Binomial with p=0.4 and n=10 trials we have

>#Binomial

>n<-10

>p<-0.4

>x<-0:n

>density<-dbinom(x,n,p)

>distrib<-pbinom(x,n,p)

>plot(x,density,"h",ylim=c(0,1))

>plot(x,distrib,"h",ylim=c(0,1))

This produces the corresponding density and distribution plots.

Quantiles at specified probability levels are easy to produce using the qbinom() function, e.g.,

>n<-10

> p<-0.4

> #quantiles

> prob_levels<-c(0.001,.05,.25,.5,.75,.95,0.999)

> quants<-qbinom(prob_levels,n,p)

> quants

[1] 0 2 3 4 5 7 9

Likelihood function

As with other distributions, we can reverse the roles of the data and the parameters and now treat the parameters as variables. In the case of either the Bernoulli or the Binomial there is generally only one parameter of interest, since we usually know how many trials there are. Take a case where we have 10 trials and we observe 4 successes. We can examine the likelihood over the range of p =(0,1) and try a brute force maximization as before:

> #Likelihood

> p<-seq(0,1,0.001)

> n<-10

> x<-4

> like<-dbinom(x,n,p)

> loglike<-dbinom(x,n,p,log=TRUE)

> plot(p,like)

> plot(p,loglike)

> #find the maximum over the list of p values

> p[loglike==max(loglike)]

[1] 0.4

The results suggest a value of p =0.4 maximizes the log likelihood. However, notice how “flat” the log likelihood function is, with many values of p larger and smaller than 0.4 returning similar values. This suggests that the data (4 successes but only 10 trials) provides relatively poor information about the parameter value. We will return to this point when we consider estimation in more depth in a later lab.
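One simple way to see this flatness numerically (an illustrative sketch, not part of the original example) is to ask which values of p have log likelihood within, say, 1 unit of the maximum:

> range(p[loglike>max(loglike)-1])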

Random number generation

Generating random numbers for the Bernoulli and Binomial is quite easy and can be accomplished either with a simple random uniform number generator or with the built-in function rbinom(). The first approach computes a Bernoulli outcome by simply comparing a Uniform(0,1) random number to p; if U > p then x = 0, otherwise x = 1:

>#generating bernoulli random variables

>#specify p

>p<-0.35

>#specify n_reps

>n_reps<-1000

>#method 1

>x<-(runif(n_reps)<p)*1

>#method 2

>x<-rbinom(n_reps,1,p)

Generating Binomial random variables can be accomplished by generating a series of n Bernoulli variables and then summing these.

>#generating Binomial random variables

>#specify n

>#specify p

>p<-0.35

>#specify n_reps

>n_reps<-100

>n<-10

>#method 1

>x<-array(0,c(n_reps))

>for (i in 1:n_reps)

>{

>x[i]<-sum(runif(n)<=p)*1

>}

Alternatively, you can directly use the rbinom() function in R:

>#method 2

>x<-rbinom(n_reps,n,p)

The advantage of the first approach is that sometimes we will not want to assume that the parameter p remains constant, but instead allow it to vary from sample to sample (or even among Bernoulli trials within a sample). In such cases we can still simulate or model the data but no longer under Binomial assumptions (which require p to be constant). We will look at an example of this in a bit.

Multinomial Distribution

The Multinomial Distribution is similar to the Binomial, but instead of modeling outcomes that occur in 2 ways (“success” or “failure”) the outcomes can occur in 3 or more ways. For example, suppose that an animal can die, and if it lives can either reproduce or not reproduce, and that these are the only possibilities. If we assign the probabilities to these events as p1 = probability of death, p2 = probability of living and reproducing, and p3 = probability of living and not reproducing, then by definition p1 + p2 + p3 = 1. Thus, if we know 2 of the 3 probabilities we know the 3rd by subtraction, e.g., p3 = 1 − p1 − p2. In general, if we have k categories of outcomes we need k−1 probabilities to describe them, with the last obtained by subtraction.

Like the Binomial, the Multinomial is built from a series of n independent trials, each with the same probabilities describing the outcomes. The random variable x is now a vector, denoting the number of the n trials that fall into each category. For example, if we have 100 animals, the outcomes might be 25 die, 50 live and reproduce, and 25 live but do not reproduce. The Multinomial density is

f(x1, …, xk) = [n!/(x1! x2! … xk!)] p1^x1 p2^x2 … pk^xk

Because of its multivariate nature it is difficult to visualize the density, but density and distribution values are readily computed in R. For example, the density for a 3-category multinomial with 10 trials is calculated by

> #example

> n<-10

> p<-c(.25,.5,.25)

> x<-c(1,5,4)

> density<-dmultinom(x,n,p)


> density

[1] 0.03845215
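As a check (illustrative only), the same value follows directly from the Multinomial density formula given above:

> factorial(10)/(factorial(1)*factorial(5)*factorial(4))*0.25^1*0.5^5*0.25^4

[1] 0.03845215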

Random Multinomial variables are generated by the rmultinom() function. For instance, to generate 20 instances of the above 10-trial trinomial we would use:

> #Random variables

> rmultinom(20,10,p)

[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]

[1,] 2 0 2 4 4 1 2 2 3 5 0 2 2 4

[2,] 6 9 5 3 2 4 6 5 6 1 7 6 3 4

[3,] 2 1 3 3 4 5 2 3 1 4 3 2 5 2

[,15] [,16] [,17] [,18] [,19] [,20]

[1,] 1 0 4 2 3 5

[2,] 6 8 6 5 5 4

[3,] 3 2 0 3 2 1

>

Beta Distribution

The Beta Distribution is a continuous distribution that models a random variable x that can take on values in the range 0 ≤ x ≤ 1. The Beta is therefore appropriate for modeling the distribution of probability values, and in particular for modeling heterogeneity in probabilities. The parameters α and β (or a and b) control the location and shape of the Beta Distribution. The Uniform(0,1) distribution is a special case of the Beta with α=β=1.

The Beta Distribution assumes additional importance as a natural (or conjugate) distribution for the Binomial, describing uncertainty in the Binomial parameter p before (prior) and after (posterior) data collection. Finally, the Beta and the Binomial can be combined hierarchically in a model (the Beta-Binomial) in which the random outcome is binary, but the process describing “success” is heterogeneous. We will return to both of these themes in later labs.

Density, distribution, and quantiles

The mathematical form of the Beta density is

f(x) = [Γ(α+β)/(Γ(α)Γ(β))] x^(α−1) (1 − x)^(β−1),   0 ≤ x ≤ 1

where Γ(·) is the Gamma function

Γ(z) = ∫_0^∞ t^(z−1) e^(−t) dt.

The “kernel” of the Beta (the part that involves the random variable x) is actually quite simple and, not coincidentally, resembles the Binomial density:

x^(α−1) (1 − x)^(β−1)

The mean of the Beta distribution is

E(x) = α/(α + β)

and the variance is

Var(x) = αβ/[(α + β)²(α + β + 1)].

In a later lab we will use the relationship between the mean and variance and the parameters to solve for parameter estimates via the Method of Moments.

The Beta variable x is sometimes interpreted as modeling the probability of success after previously observing roughly α successes and β failures. The Beta distribution function is obtained by integrating the density from 0 to x. Both the density and the distribution can easily be evaluated in R using the dbeta() and pbeta() functions. For example, for α=10 and β=15 we can produce density and distribution values over the range of x.

#Beta distribution

>a<-10

>b<-15

>x<-seq(0,1,0.001)

>#density

>density<-dbeta(x,a,b)

>distrib<-pbeta(x,a,b)

>plot(x,density)

>plot(x,distrib)

This code will produce plots of the density and of the distribution.

Notice that the density is centered near 0.4, but takes on a fairly wide range, indicating that Beta(10,15) would be appropriate for modeling success probabilities that average about 0.4 but exhibit heterogeneity. We will return to this theme later when we consider the Beta-Binomial distribution.
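This is consistent with the Beta moments given above: for α=10 and β=15 the mean is 10/25 = 0.4 and the standard deviation is about 0.096, which we can check by simulation (a sketch):

> a/(a+b)

> sd(rbeta(100000,a,b))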

Standard quantiles of the Beta are easily produced with the qbeta() function, for example

> #quantiles

> prob_levels<-c(0.001,.05,.25,.5,.75,.95,0.999)

> quants<-qbeta(prob_levels,a,b)

> quants

[1] 0.1427044 0.2463886 0.3322060 0.3972924 0.4649202 0.5628929 0.6995664

Likelihood function

As in our previous examples, we can consider the data (x) as fixed (observed) and treat the parameters as variables, producing a likelihood function. Note that with the Beta distribution, like the Normal, we have 2 parameters, so we have to find a combination of α and β that maximizes the likelihood. The likelihood is easy to compute using R, for example if we observe x = 0.4:

>#Likelihood

>a<-seq(5,25,0.001)

>b<-seq(10,30,0.001)

>x<-0.4

>like<-dbeta(x,a,b)

>loglike<-dbeta(x,a,b,log=TRUE)

It is a bit trickier to use “brute force” methods to get the maximum, and instead we will use graphical methods to get an approximation. Because the parameter space is 2 dimensional, we need to display the likelihood in 3 dimensions

The scatterplot3d() function in R (from the scatterplot3d package) will produce a 3-D scatterplot:

>library(scatterplot3d)

>scatterplot3d(a,b,loglike)

The graph suggests where the log likelihood is maximized, although the surface is quite flat (note that, as written, the code varies a and b together element by element rather than over a full grid of combinations). Graphical methods become cumbersome (and inaccurate) for 2 or more parameters; we will consider more exact methods for maximizing the likelihood in a later lab.

Random number generation

Random number generation is easily performed in R using the rbeta() function. For example, the following code will generate 100 Beta(10,15) random variables.

>#Random betas

>n<-100

>a<-10

>b<-15

>#Method 1

>x<-rbeta(n,a,b)

If α and β are integers, the following code can also be used to generate Beta random variables from 2 Gamma random variables, which in turn are generated by a –log transformation from Uniform distributions:

>#Method 2 (if a and b are integers)

>x<-array(0,c(n))

>for (i in 1:n)

>{

>

#g1 and g2 are Gamma random variables

>g1<-sum(-log(runif(a)))

>g2<-sum(-log(runif(b)))

>x[i]<-g1/(g1+g2)

>}

Gamma

The Gamma distribution is a continuous distribution which has 2 parameters, c (a shape parameter) and b (a rate parameter, in the order used by the R functions below), and where x takes on nonnegative values (x ≥ 0). The Gamma is important in statistics because several other important distributions such as the Chi-square and Exponential are special cases. We also saw above how Gamma distributions can be used to generate Beta random variables. However, most of our interest in the Gamma will be because of its special relationship to the Poisson distribution, both for modeling heterogeneity in the Poisson parameter and as a conjugate distribution for the Poisson in Bayesian analysis.

Density, distribution, and quantiles

The density function of the Gamma is

f(x) = [b^c / Γ(c)] x^(c−1) e^(−bx),   x ≥ 0

The distribution function of the Gamma is given by integration from 0 to x.

The mean and variance of the Gamma are related to the parameters in a straightforward way by

E(x) = c/b

and

Var(x) = c/b².

As we will see these relationships lead to easy (but not particularly optimal) parameter estimation by the Method of Moments.

The density and distribution are easily generated in R using the dgamma() and pgamma() functions. For example, we can plot the density and distribution for Gamma(b=1,c=5) by

>#Gamma distribution

>c<-5

>b<-1

>x<-seq(0,10,0.001)

>#density

>density<-dgamma(x,c,b)

>distrib<-pgamma(x,c,b)

>plot(x,density)

>plot(x,distrib)

This produces density and distribution plots over the range of 0 to 10.

Quantiles are produced by the qgamma() function. For the same parameter values we can produce several quantiles by

> #quantiles

> prob_levels<-c(0.001,.05,.25,.5,.75,.95,0.999)

> quants<-qgamma(prob_levels,c,b)

> quants

[1] 0.7393717 1.9701496 3.3686004 4.6709089 6.2744307 9.1535190 14.7941492

This indicates, for example, that the median (0.5 quantile) of the Gamma(c=5, b=1) is around 4.7, and that 99.9% of the data can be expected to lie below 14.8.

Likelihood function

As with other distributions, we can form the likelihood by considering the data (x) as fixed and allowing the parameter values to vary. For example, suppose we observe x=5; we can plot the log likelihood against values of b and c:

>#likelihood

>b<-seq(0.01,1,0.001)

>c<-b*10

>x<-5

>like<-dgamma(x,c,b)

>loglike<-dgamma(x,c,b,log=TRUE)

>library(scatterplot3d)

>scatterplot3d(c,b,loglike)

By eyeballing this graphic, we can see that values of c of roughly 3 (and, since c=10b on this grid, b of roughly 0.3) appear to maximize the log likelihood, although the surface is fairly flat.

Random number generation

We present 2 methods for producing random numbers from the Gamma distribution. The easiest is the built-in rgamma() function. For example, to generate 1000 Gamma(c=5, b=1) random variables:

>#random number generation

>#method 1

>n<-1000

>c<-5

>b<-1

>x<-rgamma(n,c,b)

Gamma variables can be generated directly from Uniform(0,1) random variables via a –log transformation (we used this approach already for Beta random variables), if the parameter c has an integer value:

>#method 2 - if c is integer

>n<-1000

>c<-5

>b<-1

> x<-array(0,c(n))

>for (i in 1:n)

>{

>

>#the sum of c Exponential(1) variables is Gamma(c,1); dividing by the rate b gives Gamma(c,b)
>x[i]<-sum(-log(runif(c)))/b

>

>}

Writing simulation programs in R

We have already done a great deal of simulation with individual distributions in R; here we will focus on putting things together into more complicated analyses, and in efficiency.

Setting up functions and iterative loops to do simulation

Simulations almost always involve repeating the same or similar operations over and over, often many times. Those of you familiar with other programming languages may thus be naturally inclined to use the “for” loop in R (similar to the “do” loop in FORTRAN). The “for” loop works as follows: the limits in parentheses specify how many times to repeat what is contained inside the curly brackets. For example

> x<-array(0,10)

> for(i in 1:10){x[i]<-rnorm(1,0,1)}

> x

first sets up an empty array (list) to hold x values, then loops through and generates 10 Normal(0,1) random variables one at a time. This is actually a very inefficient way to do things, because the rnorm() function is designed to return the entire vector at once. That is,

> x<-rnorm(10,0,1)

performs this task in a single line (and a single function call to rnorm). Where possible, try to avoid loops, because these are often (usually) unnecessary. Of course, we won’t always be able to avoid looping, but we can often reduce the complexity of loops by paying attention to what the R functions actually do, and by taking advantage of the object orientation of R. We will see examples below of how to (hopefully) use the features of R in an efficient way to simulate some fairly complex problems.
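When a simulation really cannot be fully vectorized, the built-in replicate() function can still keep the code compact (a minimal sketch; for this particular task the vectorized rnorm(10,0,1) shown above remains the better choice):

> x<-replicate(10,rnorm(1,0,1))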

Linking distributions together

General principles of linked (conditional) stochastic simulations

In much of the class, both in approaching simulation as well as data analysis, we invoke the concept of conditional independence and the related principle of modularity. Under conditional independence, we may have 2 random variables linked together, so that the outcome of y depends on what the outcome of x is, or vice-versa. We can represent the idea of conditioning by means of conditional probability distributions, so that

f(y | x)

indicates that the probability of the y outcome depends on knowing what the x outcome is. Thus, to fully model y we have to model x. This idea is expressed as

f(y) = ∫ f(y | x) f(x) dx,

that is, we cannot find the separate or marginal distribution of y without knowing the distribution of x. This seems like it really complicates things, but actually we can make things quite simple by invoking the idea of conditional independence. Essentially this means that once we have determined the value x that another value y depends on, we no longer need to be concerned about how we got x. So for example, let’s say we are interested in modeling y in the relationship

f(y | x)

Suppose that we determine (say from simulation from f(x)) that x = 3.5. We now simply have the model

f(y | x = 3.5).

That is, having determined that x = 3.5, we no longer need the model f(x). This is also connected to modularity, leading to the ability to separate complicated models into bits that are connected by conditional relationships. The basic steps are

· Generate the variable or outcome that another outcome is dependent upon. In the above example, that would be x, so x=3.5

· Conditioned on this outcome, generate the dependent outcome, in this example y from the distribution f(y | x = 3.5).

· Continue this for any other outcomes that are sequentially dependent (e.g., an outcome that depends on y).

This approach is both simple and extremely general, and only depends on the ability to decompose a more complicated model into conditionally independent pieces. Often, we will model the parameter of one distribution via a statistical distribution, and then use the values that we’ve modeled to describe some other outcome. We will use this approach in each of the examples below.

Poisson-Gamma

A mixture of Poisson and Gamma distributions can be appropriate when density is heterogeneous. For a given density (λ), we might model counts by a Poisson distribution

x | λ ~ Poisson(λ)

and in turn model λ by a Gamma distribution

λ ~ Gamma(c, b).

By conditional independence we get the marginal distribution of the counts by integrating over λ:

f(x) = ∫ f(x | λ) f(λ) dλ

We could try to model x directly using this rather complicated marginal distribution, perhaps after some simplification, but there is no need to. By conditional independence and modularity we simply

· Model the distribution of λ by a Gamma(c, b) distribution for specified values of b and c.

· Model the distribution of x as Poisson, conditional on the value of λ.

Question: how many parameters does this model have? Hint- it is not 3!

Implementation is very simple in R. For example, we can generate 100 Poisson-Gamma random variables with b=1, c=5 by:

>n_reps<-100

> c<-5

> b<-1

> lambda<-rgamma(n_reps,c,b)

> x<-rpois(n_reps,lambda)

> lambda

Poisson – Binomial

A similar approach can be used to create a mixture of Poisson and Binomial distributions, often appropriate for simulating or modeling detections of animals. The basic idea is that abundance (N) occurs according to a Poisson process with mean (density) λ,

N ~ Poisson(λ).

However, the N animals present at each sample i are each detected with probability p. Assuming independence among the detections, the number of detections x is simply Binomial(N, p),

x | N ~ Binomial(N, p).

This approach is, once again, very easy to implement in R. For example, to simulate detections of animals that are distributed with density λ=10 and detected with Binomial probability p=0.6:

>#Poisson-Binomial

> n_reps<-100

> lambda<-10

> p_detect<-0.6

> N<-rpois(n_reps,lambda)

> x<-rbinom(n_reps,N,p_detect)

Here, the N values simulated by rpois are simply passed to the rbinom() function, and used instead of a constant N.

Exercise for you: Combine the above model with the Poisson-Gamma model (i.e., model λ as heterogeneous instead of constant).

Beta – Binomial

As we have seen, the Binomial Distribution assumes that the x successes in our n Bernoulli trials, in addition to being independent, have identical or homogeneous probability of success, p. Instead, we may wish to allow p to vary over time, space, or even among individual animals. A natural approach for doing this is to combine the Binomial Distribution with the Beta Distribution. Thus we model the Binomial outcome x as usual by

x ~ Binomial(n, p)

but now, instead of assuming that p is fixed, we model it as coming from a Beta Distribution,

p ~ Beta(α, β).

We show 2 ways to implement this combined model in R. In the first, we do explicit looping to generate and sum Bernoulli outcomes conditional on the (heterogeneous) p in each simulation replication.

>#beta binomial

> #method 1

> a<-10

> b<-15

> n<-10

> n_reps<-100

> p<-rbeta(n_reps,a,b)

> x<-array(0,c(n_reps))

> for (i in 1:n_reps)

+ {

+

+ x[i]<-sum(runif(n)<=p[i])*1

+ }

The second method is somewhat easier and simply involves creating a list object of p for all the simulation replications at once, and then passing that object to the rbinom() function to create the x list. However, the first approach could be advantageous in situations where p is heterogeneous not only among replicates but also among trials within each replicate, as is the case with individual animal heterogeneity. In that case, Binomial assumptions no longer hold, requiring explicit generation of individual Bernoulli outcomes.

> #method 2

> a<-10

> b<-15

> n<-10

> n_reps<-100

> p<-rbeta(n_reps,a,b)

> x<-rbinom(n_reps,n,p)

>

Exercise for you: Take the last exercise, in which the Gamma, Poisson, and Binomial were combined, and make detection heterogeneous using a Beta distribution.

Logit- normal regression

In this example, we wish to generate 1000 Binomial replicates, each based on 10 Bernoulli trials but with p varying according to a linear-logit model. Furthermore, we want to introduce uncertainty in the coefficients of the linear-logit model, specifying that the intercept has a mean of -2.0 and SD of 0.05, and that the slope has a mean of 0.5 and SD of 0.01. Finally, we will assume that the predictor z in the linear-logit model is uniformly distributed over the range of 0 to 10. The following steps, with accompanying R code, will do the trick:

· Set model parameters and conditions

> #logit-normal regression

> #coefficient means and sds

> n_reps<-1000

> b0_mean<--2.0

> b0_sd<-0.05

> b1_mean<-0.5

> b1_sd<-0.01

· Generate z from Uniform(0, 10)

> #simulate the predictor over the range 0-10

> z<-runif(n_reps,0,10)

· Simulate model coefficients from Normal distributions with the specified means and SDs

> #generate coefficients from Normal

> b0<-rnorm(n_reps,b0_mean,b0_sd)

> b1<-rnorm(n_reps,b1_mean,b1_sd)

· Build linear prediction and transform to probability scale

> #prediction

> y<-b0+b1*z

> #inverse logit

> p<-1./(1.+exp(-y))

· Generate detections from Binomial model

> #generate detections in samples of 10

> x<-rbinom(n_reps,10,p)

Exercise for you: Build a simulation that combines Gamma-Poisson heterogeneity, logit-Normal prediction of detection, and Binomial detection.

Additional References

Evans, M., N. Hastings, and B. Peacock. 2000. Statistical distributions. Wiley.