Assignment

1. Statistical distributions

Using R distribution functions (a minimal sketch of the relevant functions follows this list):

- Generate 1000 random values from a normal distribution with mean = 5 and sd = 2

- for the above distribution, find the quantiles associated with probabilities probs = c(0.01, 0.05, 0.25, 0.5, 0.75, 0.95, 0.99)

- find the probability that the random variable X is between 2 and 6

- Generate 1000 random values from a binomial distribution with n = 100 and p = 0.6

- for the above distribution, find the quantiles for x associated with probabilities probs = c(0.01, 0.05, 0.25, 0.5, 0.75, 0.95, 0.99)

- find the probability that the random variable X equals 50
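In case a reminder helps, here is a minimal sketch of the R distribution functions these tasks rely on (r* for random draws, q* for quantiles, p* for cumulative probabilities, d* for point probabilities); the object names are just placeholders:

# Normal(mean = 5, sd = 2)
x.norm <- rnorm(1000, mean = 5, sd = 2)                  # 1000 random draws
probs <- c(0.01, 0.05, 0.25, 0.5, 0.75, 0.95, 0.99)
qnorm(probs, mean = 5, sd = 2)                           # quantiles at the stated probabilities
pnorm(6, mean = 5, sd = 2) - pnorm(2, mean = 5, sd = 2)  # P(2 < X < 6)

# Binomial(size = 100, p = 0.6)
x.binom <- rbinom(1000, size = 100, prob = 0.6)          # 1000 random draws
qbinom(probs, size = 100, prob = 0.6)                    # quantiles at the stated probabilities
dbinom(50, size = 100, prob = 0.6)                       # P(X = 50)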

2. Maximum likelihood

Assume that x is the number of successes in 100 Bernoulli trials with constant p, and you observe x = 75. Show that p.hat = 0.75 is the maximum likelihood estimate of p by:

- graphing the likelihood values over the range p = 0 to 1 and inspecting the plot

- calculus (differentiate the log-likelihood, set the derivative to zero, and solve; see Note 1)

- using a numerical solver (see the example R code, but this can be done with Solver in Excel too!). That example uses nlm (nonlinear minimization); we used optim in the class examples, which does essentially the same thing. A minimal sketch follows below.
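Here is a minimal sketch of the graphing and numerical-solver steps (the variable names are placeholders, not the class code); since nlm minimizes, we hand it the negative log-likelihood:

x <- 75; n <- 100

# Graphing: evaluate the likelihood on a grid of p and inspect the peak
p.grid <- seq(0.01, 0.99, by = 0.01)
lik <- dbinom(x, size = n, prob = p.grid)
plot(p.grid, lik, type = "l", xlab = "p", ylab = "likelihood")
abline(v = p.grid[which.max(lik)], lty = 2)   # the peak sits at p = 0.75

# Numerical solver: minimize the negative log-likelihood
negloglik <- function(p) -dbinom(x, size = n, prob = p, log = TRUE)
nlm(negloglik, p = 0.5)$estimate              # converges to approximately 0.75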

Note 1--- Why (natural) log transformations?

Most of the distributions (and hence likelihood functions) we encounter will be members of the exponential family. Taking derivatives and performing similar operations (needed for maximization) is a lot easier if you first transform the function by taking natural logarithms. Taking the binomial as an example, the likelihood function (disregarding the combinatorial term, which does not involve the parameter p) is:

    f(p) = p^x (1 - p)^(n - x)

you can find the maximum of this (value of p that maximizes the function given n and x) but it is going to be  a bit messier than taking logarithms:

    log[f(p)] = x log(p) + (n - x) log(1 - p)

The derivative of this with respect to p is fairly easy to get, as we see in the example: d log[f(p)]/dp = x/p - (n - x)/(1 - p), which equals zero at p.hat = x/n. Since the log is a monotonic transform, the maximum of this function occurs at the same p as the maximum of the untransformed one. Finally, taking logs scales the results to more reasonable values (closer to 0), whereas the untransformed function can produce huge or tiny numbers depending on the data and the distribution. For this reason, even purely numerical optimization packages will work better on the log scale.

Log transformation also makes the resulting system of equations easier to solve algebraically, since for many distributions these will be linear equations. Note that if we have a single-parameter distribution like the binomial, we ordinarily wind up with just 1 equation to solve for the parameter value as a function of the data. If we have a multi-parameter distribution, we have a system of equations, one per parameter. E.g., for the normal distribution we will have 2 equations in 2 unknowns (one for the mean, mu, and one for the sd, sigma).
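For instance, here is a sketch of the two-parameter normal case handed to optim, which we used in class; the data below are invented purely for illustration:

set.seed(1)
y <- rnorm(50, mean = 5, sd = 2)   # fake data, for illustration only

# Negative log-likelihood with both parameters packed into one vector
negloglik <- function(theta) -sum(dnorm(y, mean = theta[1], sd = theta[2], log = TRUE))
fit <- optim(c(mean(y), sd(y)), negloglik)   # start at the sample moments
fit$par                                      # MLEs of mu and sigma (sigma.hat divides by n, not n - 1)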

Note 2---Why bother?

Why might you want to do this instead of using a canned ML estimator? Because it provides a general way to estimate parameters for any likelihood problem where (a) you can write out the likelihood, and (b) the parameters are statistically identifiable. And sometimes that will be a non-standard likelihood that nevertheless can produce legitimate MLEs. In fact, in Week 8 we will produce estimates based on combining 2 standard likelihoods, but combining parameters in a non-standard (but important) way, when we analyze age structure in harvest data.

3. Continuing your exercises...

- Compute the 95% profile likelihood confidence interval for p. This is an approach based on direct application of the likelihood function that provides more accurate CIs than asymptotic normal ones (but is computationally more intensive).

The example R code does this for a binomial example. You will encounter profile likelihood as an option in several statistical packages (like MARK), and it is important to know what it does. Using the binomial example, you can find out, for instance, that when conditions lie in "asymptopia" (big n, p around 0.5) the normal and profile CIs are very similar, meaning that the asymptotic approach is a good approximation. Try something wacky like n = 10 and p = 0.9. You should see that the normal (asymptotic) CI gives very different values than the profile, and in fact includes values of p > 1. Is this even possible? Hmm.
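As a sketch of the profile idea for the binomial (object names are placeholders): the 95% interval is the set of p whose log-likelihood lies within qchisq(0.95, 1)/2, about 1.92, of the maximum, evaluated here on a grid:

x <- 75; n <- 100
p.grid <- seq(0.001, 0.999, by = 0.001)
loglik <- dbinom(x, size = n, prob = p.grid, log = TRUE)
cutoff <- max(loglik) - qchisq(0.95, df = 1) / 2
range(p.grid[loglik >= cutoff])   # approximate 95% profile CI for p

# Asymptotic normal (Wald) CI for comparison
p.hat <- x / n
p.hat + c(-1, 1) * 1.96 * sqrt(p.hat * (1 - p.hat) / n)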

4. AIC and model averaging 

Run the AIC / model averaging examples covered in class (script and data below) and briefly summarize the results.
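If it helps while working through the class script, here is a generic sketch of the AIC and Akaike-weight calculations; the models and data below are invented stand-ins, not the class example:

set.seed(1)
d <- data.frame(x1 = rnorm(30), x2 = rnorm(30))
d$y <- 1 + 2 * d$x1 + rnorm(30)

# Candidate models
fits <- list(m1 = lm(y ~ x1, data = d), m2 = lm(y ~ x2, data = d), m3 = lm(y ~ x1 + x2, data = d))

aic <- sapply(fits, AIC)
delta <- aic - min(aic)                       # AIC differences from the best model
w <- exp(-delta / 2) / sum(exp(-delta / 2))   # Akaike weights
round(cbind(AIC = aic, delta = delta, weight = w), 3)

# Model-averaged prediction at a new point, weighted by w
newd <- data.frame(x1 = 0.5, x2 = -0.2)
sum(w * sapply(fits, predict, newdata = newd))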