LAB 3 – PROBABILITY DISTRIBUTIONS AND LIKELIHOOD

Likelihood Methods in Ecology
June 2008
Goals:
The goal of this lab is to familiarize you with the procedures required to obtain Maximum
Likelihood Estimates (MLEs) of scientific and statistical models. We will be using built-in R
functions. You will need to load the stats4 package (it is included with the standard R
distribution) for the second part of the exercise.
PART 1. From Bolker (ms). The first application of maximum likelihood estimation arises when
we have a collection of observations that we believe follow a particular distribution and we want
to estimate the true parameters of that distribution. Method-of-moments (MOM) estimates are
typically biased, and ML estimates are preferred. To convince you, we are going to simulate a
data set using parameter values that we know with certainty, and demonstrate how accurately the
MOM and ML methods recover those values. Start by simulating some negative binomial data,
using specific parameter values:
mu.true <- 1
k.true <- 0.4
x <- rnbinom(50, mu = mu.true, size = k.true)
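Because these are random draws, everyone's x (and all downstream results) will differ slightly.
If you want reproducible results, fix the random seed before the rnbinom() call; the seed value
itself is arbitrary:
set.seed(1001)  # run before rnbinom(); any fixed integer gives reproducible draws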
Take a look at the data using some graphics.
hist(x)
Now we will build a function that calculates the negative log-likelihood for this distribution
given a set of parameters. The arguments are p, the vector of parameters (mu and k), and dat,
the vector of data:
NLLfun1 = function(p, dat = x) {
    mu = p[1]
    k = p[2]
    # use dat (not the global x) so the function works for any data vector passed in
    -sum(dnbinom(dat, mu = mu, size = k, log = TRUE))
}
First we will calculate the negative log-likelihood with the true values of the distribution. We
have to combine these values into a vector to be able to pass them to the NLL function, using
c():
nll.true = NLLfun1(p = c(mu = mu.true, k = k.true), dat = x)
nll.true
Let's first try to estimate the parameter values using the method of moments. From last week
you know that

$\hat{\mu}_{MOM} = \bar{x}$

and, since for the negative binomial $\mathrm{var}(x) = \bar{x} + \bar{x}^2/k$, we also know
that

$\hat{k}_{MOM} = \dfrac{\bar{x}}{\mathrm{var}(x)/\bar{x} - 1}$
m = mean(x)
v = var(x)
mu.mom = m
k.mom = m/(v/m - 1)
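Compare these MOM estimates to the true values. For example, a quick side-by-side display
(rbind is used here just for printing):
rbind(true = c(mu = mu.true, k = k.true),
      MOM = c(mu = mu.mom, k = k.mom))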
The negative log-likelihood for the method-of-moments parameter estimates:
nll.mom = NLLfun1(c(mu = mu.mom, k = k.mom))
nll.mom
What is the difference in likelihood between the two sets of parameter values? By the likelihood
ratio test, the difference in NLL would have to be greater than $\chi^2_2(0.95)/2$ (half the 0.95
quantile of a chi-square distribution with two degrees of freedom) to be significant. Is it?
Ldiff <- nll.true - nll.mom
Ldiff
qchisq(0.95, df = 2)/2
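Expressed as a direct logical test (TRUE would mean the difference is significant):
Ldiff > qchisq(0.95, df = 2)/2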
So there doesn’t appear to be a difference in the NLL values of the true and MOM parameters.
How about the MLE estimates? We use optim with the default Nelder-Mead algorithm (more
about this later…) and we use the MOM estimates as starting conditions:
sol1 = optim(fn = NLLfun1, par = c(mu = mu.mom, k = k.mom), hessian = TRUE)
sol1
$par
       mu         k
1.1599145 0.3638942

$value
[1] 71.22185

$counts
function gradient
      43       NA

$convergence
[1] 0

$message
NULL

$hessian
            mu           k
mu 10.29547962 -0.00184977
k  -0.00184977 55.55088551
The optimization result is a list with elements:
- The best-fit parameters (sol1$par);
- The minimum negative log-likelihood (sol1$value);
- Information on the number of function evaluations (sol1$counts; the gradient part is NA
  because we didn't specify a function to calculate the derivatives);
- Information on whether the algorithm thinks it found a good answer (sol1$convergence, which
  is zero if R thinks everything worked, and various nonzero numeric codes if something went
  wrong; see ?optim for details);
- sol1$message, which may give further information about whether the fit converged or how it
  failed to converge;
- Because we set hessian = TRUE, we also get sol1$hessian, which gives the (finite difference
  approximation of the) second derivatives of the negative log-likelihood evaluated at the MLE
  (see lecture).
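As an aside (a standard asymptotic result, not part of the original handout): the inverse of the
Hessian of the negative log-likelihood, evaluated at the MLE, approximates the variance-
covariance matrix of the parameter estimates, so approximate standard errors are the square
roots of its diagonal:
vcov.approx <- solve(sol1$hessian)  # invert the Hessian of the NLL
se <- sqrt(diag(vcov.approx))       # approximate standard errors for mu and k
se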
Is the NLL value better (i.e., smaller) than for the MOM estimates? By how much? Note that
answers may differ among you because the values of x were drawn at random from a negative
binomial distribution.
Let's find likelihood surfaces, profiles, and confidence intervals. First, we set up a matrix
(resmat) that will hold the NLL values for combinations of mu and k around the estimates: say,
mu from 0.4 to 3 in steps of 0.05 and k from 0.01 to 0.7 in steps of 0.01.
muvec <- seq(0.4, 3, by = 0.05)
kvec <- seq(0.01, 0.7, by = 0.01)
resmat <- matrix(nrow = length(muvec), ncol = length(kvec))
Now we calculate the negative log-likelihood for all of these values and store them in the results
matrix. Remember that matrix indexing works as [row, column]; this loop takes each value of mu
(rows) and, for each one, runs across the values of k (columns), filling in the NLL for that
combination of mu and k:
for (i in 1:length(muvec)) {
    for (j in 1:length(kvec)) {
        resmat[i, j] = NLLfun1(c(muvec[i], kvec[j]))
    }
}
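As an aside (not required for the lab), the same matrix can be filled without explicit loops; this
sketch assumes NLLfun1 as defined above:
resmat <- outer(muvec, kvec, Vectorize(function(m, k) NLLfun1(c(m, k))))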
Now plot this.
contour(muvec, kvec, resmat, xlab = "mu", ylab = "k")
And to add more resolution in the middle:
contour(muvec, kvec, resmat, levels = 70:80, lty = 2, add = TRUE)
Note that your NLL values may differ depending on the random values of x you drew; if so, you
will want to change the levels to add resolution near your actual best NLL value.
Also, note that we used dashed lines by setting 'lty' to 2, and defined the resolution with the
'levels' option. You can use other graphical parameters here, too: look at the help file for par for
more detail.
We can now add support intervals to our parameter estimates. This is a bit involved; you may
want to consider leaving it for later and moving on to Part 2.
Before we calculate the support intervals, we want to see how the NLL changes across a range of
values for each parameter while the NLL is re-minimized over the other parameter at each step;
this is the profile of each parameter. To calculate the profile for mu, we define a function that
takes mu as a separate argument (which optim will not change as it goes along) and optimizes
with respect to k.
NLLfun.mu <- function(p, mu) {
    k <- p[1]
    -sum(dnbinom(x, mu = mu, size = k, log = TRUE))
}
Now we set up a matrix with 2 columns: one for the best-fit k value and one for the minimum
NLL found:
mu.profile <- matrix(ncol = 2, nrow = length(muvec))
Now run the optimization for each value stored in the mu vector and store the result. This is
similar to what we did initially, looking at many values of both mu and k, except that here mu is
held fixed at each value in muvec while k is optimized:
for (i in 1:length(muvec)) {
    Oval = optim(fn = NLLfun.mu, par = sol1$par["k"],
                 method = "L-BFGS-B", lower = 0.002, mu = muvec[i])
    mu.profile[i, ] = c(Oval$par, Oval$value)
}
colnames(mu.profile) = c("k", "NLL")
Do the same for k:
NLLfun.k = function(p, k) {
    mu = p[1]
    -sum(dnbinom(x, mu = mu, size = k, log = TRUE))
}
k.profile = matrix(ncol = 2, nrow = length(kvec))
for (i in 1:length(kvec)) {
    Oval = optim(fn = NLLfun.k, par = sol1$par["mu"], method = "L-BFGS-B",
                 lower = 0.002, k = kvec[i])
    k.profile[i, ] = c(Oval$par, Oval$value)
}
colnames(k.profile) = c("mu", "NLL")
Redraw the contour plot with the profiles added. The first two lines redraw the contour plot
exactly as above, and the last two add the profiles, giving us a "crosshairs" on the likelihood
surface:
contour(muvec, kvec, resmat, xlab = "mu", ylab = "k")
contour(muvec, kvec, resmat, levels = 70:80, lty = 2, add = TRUE)
lines(muvec, mu.profile[, "k"], lwd = 2)
lines(k.profile[, "mu"], kvec, lwd = 2, lty = 2)
Here, the command lines() draws a line given a set of x and y values (the first two arguments).
The other arguments set the line width (lwd) and the line type (lty).
You can also plot univariate likelihood profiles. Let's do this for mu; this is the NLL for the
model when mu was held at each value along the x-axis and k was allowed to vary:
plot(muvec, mu.profile[, "NLL"], type = "l", xlab = "mu",
     ylab = "Negative log-likelihood")
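An analogous univariate profile plot for k follows the same pattern:
plot(kvec, k.profile[, "NLL"], type = "l", xlab = "k",
     ylab = "Negative log-likelihood")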
Now define the cutoff value from the likelihood ratio test. This is the highest negative
log-likelihood value that we would accept as not significantly different from the NLL at the
maximum likelihood estimates of the parameters:
cutoff <- sol1$value + qchisq(0.95, 1)/2
This allows us to set up a search routine in R to find the support limits. This function returns the
difference between the NLL of the restricted model (i.e., the one where the value of mu is held
constant) and our cutoff NLL value:
relheight = function(mu) {
    O2 = optim(fn = NLLfun.mu, par = sol1$par["k"],
               method = "L-BFGS-B", lower = 0.002, mu = mu)
    O2$value - cutoff
}
We can use R's uniroot() function, which takes a one-argument function and searches for the
value that makes it zero. We use it to find the values of mu where the NLL of the restricted
model equals the cutoff NLL. Make sure the interval you give it has endpoints with different
signs (+/−); otherwise uniroot will not find the point that makes the function zero. Try it
first with a few different values of mu, then put the lower and upper ranges into uniroot.
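For example (the exact numbers depend on your simulated x, but values far from the MLE should
give a positive result and values near the MLE a negative one):
relheight(0.5)  # far from the MLE: NLL likely above the cutoff (positive)
relheight(1.2)  # near the MLE: NLL likely below the cutoff (negative)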
lower = uniroot(relheight, interval = c(0.4, 1))$root
upper = uniroot(relheight, interval = c(1.2, 5))$root
ci.uniroot = c(lower, upper)
plot(muvec, mu.profile[, "NLL"], type = "l", xlab = "mu",
     ylab = "Negative log-likelihood")
Add the confidence intervals using abline(), which we used previously to draw "an a-b line" (a
straight line, like y = a + bx). Here we give it values of v (for vertical lines) and h (for horizontal
lines).
abline(v = ci.uniroot, lty = 3)
cutoffs = c(0, qchisq(c(0.95), 1)/2)
nll.levels = sol1$value + cutoffs
abline(h = nll.levels, lty = 1:2)
mtext("Lower", side = 3, at = ci.uniroot[1], line = 1)
mtext("Upper", side = 3, at = ci.uniroot[2], line = 1)
mtext("LRT cutoff", side = 4, at = cutoff, line = 1)
EXERCISE 1: Generate data of your choice using the probability functions you learned last
week. Calculate the MOM estimates of the parameters. Calculate the MLE and estimate the
difference in likelihood and in prediction. Calculate the support intervals for the parameters.
PART 2. Let’s look at another dataset. Schmitt et al. (1999) explored reef fish recruitment using
a dataset from 603 lagoons in Polynesia. I have provided you with a simulated data set that
resembles their data (Reef_fish.txt). Read the data, attach it, and plot it. Looking at the data, does
the following scientific model seem reasonable to you?
$$\text{Recruits} = \frac{a \cdot \text{Settlers}}{1 + (a/b)\,\text{Settlers}}$$
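A minimal way to do the reading, attaching, and plotting (this assumes Reef_fish.txt is
whitespace-delimited with a header row containing columns named settlers and recruits; adjust to
match the actual file):
fish <- read.table("Reef_fish.txt", header = TRUE)
attach(fish)              # makes settlers and recruits available by name
plot(settlers, recruits)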
Plot the frequency distribution of the data. What probability distribution(s) would you choose?
Why?
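For example, assuming the column names above:
hist(recruits)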
Assuming your probability distribution of choice, how would you go about calculating the MLE
for your scientific model and error structure?
First, we define our scientific model:
adult.recruits <- function(a, b, settlers) {
    a * settlers/(1 + (a/b) * settlers)
}
and our error structure. I will choose a binomial error structure, since I am interested in the
probability of settlement. An easy way of thinking about this is to say

$$\Pr(\text{Recruit}) = \frac{a}{1 + (a/b)\,\text{Settlers}}$$

which turns this into a very straightforward binomial probability. You will need to load the
stats4 package (it is included with the standard R distribution) before you proceed. Now our
likelihood function would be:
library(stats4)
NLLfun <- function(a, b) {
    recprob = a/(1 + (a/b) * settlers)
    -sum(dbinom(recruits, prob = recprob, size = settlers, log = TRUE),
         na.rm = TRUE)
}
Now we need to call an optimization procedure. (NOTE: R is notoriously fickle about
optimization problems, and you may end up using a number of different packages to get the
procedure to work. More on this later.)
results <- mle(minuslogl = NLLfun, start = list(a = 0.5, b = 10),
               method = "L-BFGS-B", lower = 0.003)
You will get some warning messages from R as it tries to calculate the likelihood. This is
because when you have 0 settlers the probability of having 0 recruits is 1, but according to
your function it is a.
You could now calculate the NLL by plugging the solutions into the likelihood function and plot
the fitted curve using the estimated parameters:
a <- coef(results)["a"]
b <- coef(results)["b"]
plot(settlers, recruits)
curve(a * x/(1 + (a/b) * x), add = TRUE, col = "blue")
The negative loglikelihood is:
NLL <- NLLfun(a, b)
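Alternatively, stats4 provides accessor methods for mle fits: logLik(results) returns the
maximized log-likelihood (its negative is the NLL), and summary(results) reports approximate
standard errors:
-logLik(results)
summary(results)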
Don't forget to detach your reef fish data frame when you're done.
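For example, if you attached it under the name fish as in the sketch above:
detach(fish)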
EXERCISE 2: We chose the binomial likelihood function for convenience. However, there are a
number of other possibilities. Choose another likelihood function that fits the data and calculate
the NLL. Is it better or worse?
EXERCISE 3: Read in the file we used yesterday, Sapling_Growth.txt. Fit a single model to
saplings from both sites. Plot the data to determine which equation would be appropriate to
model them. There are many reasonable candidates. One common one (the Michaelis-Menten
function) is:
$$\text{Growth} = \frac{a \cdot \text{Light}}{1 + (a/b)\,\text{Light}}$$
You will notice that this is the same function we used to model fish recruitment. It is a
convenient, well-behaved function that works well for many biological models. The parameters
are the slope in the linear part of the function (a) and the asymptotic value it approaches (b).
Plot the likelihood surface.