LAB 3: LIKELIHOOD
Likelihood Methods in Ecology
June 2008

Goals: The goals of this lab are to familiarize you with the procedures required to obtain Maximum Likelihood Estimates (MLEs) of scientific and statistical models. We will be using built-in R functions. You will need to install the stats4 package for the second part of the exercise.

PART 1. From Bolker (ms).

The first application of maximum likelihood estimation arises when we have a collection of observations that we believe follow a particular distribution, and we want to estimate the true parameters of that distribution. Method of moments (MOM) estimates are typically biased, and ML estimates are preferred. To convince you, we are going to simulate a data set using parameter values that we know with certainty, and then see how accurately the MOM and ML methods recover those values.

Start by simulating some negative binomial data, using specific parameter values:

mu.true <- 1
k.true <- 0.4
x <- rnbinom(50, mu = mu.true, size = k.true)

Take a look at the data using some graphics:

hist(x)

Now we will build a function that calculates the negative log-likelihood of this distribution given a set of parameters. The arguments for this function are p, the vector of parameters (mu and k), and dat, the vector of data:

NLLfun1 <- function(p, dat = x) {
    mu <- p[1]
    k <- p[2]
    -sum(dnbinom(dat, mu = mu, size = k, log = TRUE))
}

First we will calculate the negative log-likelihood at the true parameter values of the distribution. We have to combine these values into a vector with c() so that we can pass them to the NLL function:

nll.true <- NLLfun1(p = c(mu = mu.true, k = k.true), dat = x)
nll.true

Let's first try to estimate the parameter values using the method of moments. From last week you know that

\hat{\mu}_{MOM} = \bar{x}

and since \mathrm{var}(x) = \mu + \mu^2/k for the negative binomial, we also know that

\hat{k}_{MOM} = \frac{\bar{x}}{\mathrm{var}(x)/\bar{x} - 1}

m <- mean(x)
v <- var(x)
mu.mom <- m
k.mom <- m/(v/m - 1)

Compare these MOM estimates to the true values. The negative log-likelihood at the method of moments estimates is:

nll.mom <- NLLfun1(c(mu = mu.mom, k = k.mom))
nll.mom

What is the difference in the likelihood of the two estimates? By the Likelihood Ratio Test, the difference in negative log-likelihoods would have to exceed half the 95% quantile of a chi-squared distribution with two degrees of freedom, qchisq(0.95, 2)/2, before we would call the two parameter sets different. Does it?

Ldiff <- nll.true - nll.mom
Ldiff
qchisq(0.95, df = 2)/2

So there doesn't appear to be a difference in the NLL values of the true and MOM parameters.
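Even so, the MOM estimate of k can be badly biased, as claimed at the start of this section. Here is a minimal sketch (not part of the original handout) that illustrates this by simulation: generate many data sets of the same size and look at the distribution of the resulting k estimates. The number of replicates (1000) and the seed are arbitrary choices.

## Simulate many data sets like x and collect the MOM estimates of k.
set.seed(1)  # arbitrary seed, for reproducibility only
k.mom.reps <- replicate(1000, {
    y <- rnbinom(50, mu = mu.true, size = k.true)
    m <- mean(y)
    m/(var(y)/m - 1)
})
## The median is shown as well as the mean because occasional extreme
## replicates (sample variance close to the sample mean) can distort the mean.
mean(k.mom.reps)
median(k.mom.reps)  # compare to k.true = 0.4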
How about the MLE estimates? We use optim() with the default Nelder-Mead algorithm (more about this later...), using the MOM estimates as starting values:

sol1 <- optim(fn = NLLfun1, par = c(mu = mu.mom, k = k.mom), hessian = TRUE)
sol1

$par
       mu         k
1.1599145 0.3638942

$value
[1] 71.22185

$counts
function gradient
      43       NA

$convergence
[1] 0

$message
NULL

$hessian
            mu           k
mu 10.29547962 -0.00184977
k  -0.00184977 55.55088551

The optimization result is a list with elements:
- the best-fit parameters (sol1$par);
- the minimum negative log-likelihood (sol1$value);
- information on the number of function evaluations (sol1$counts; the gradient part is NA because we didn't supply a function to calculate the derivatives);
- information on whether the algorithm thinks it found a good answer (sol1$convergence, which is zero if R thinks everything worked, and takes various numeric codes if something went wrong; see ?optim for details);
- sol1$message, which may give further information about whether the fit converged or how it failed to converge;
- and, because we set hessian = TRUE, sol1$hessian, which gives the (finite-difference approximation of the) matrix of second derivatives of the NLL evaluated at the MLE (see lecture).

Is the NLL value better (i.e., lower) than for the MOM estimates? Are the parameter estimates closer to the true values? By how much? Note that your answers may differ from your neighbors' because the values in x were drawn at random from a negative binomial distribution.

Let's find likelihood surfaces, profiles and confidence intervals. First, we set up a matrix (resmat) that will hold the NLL at combinations of mu and k around the estimates, say mu from 0.4 to 3 in steps of 0.05 and k from 0.01 to 0.7 in steps of 0.01:

muvec <- seq(0.4, 3, by = 0.05)
kvec <- seq(0.01, 0.7, by = 0.01)
resmat <- matrix(nrow = length(muvec), ncol = length(kvec))

Now we calculate the negative log-likelihood for all of these values and store them in the results matrix. Remember that indexing in a matrix works as [rows, columns]: the outer loop runs over the rows (values of mu) and the inner loop fills in, across the columns, the NLL for each combination of mu and k:

for (i in 1:length(muvec)) {
    for (j in 1:length(kvec)) {
        resmat[i, j] <- NLLfun1(c(muvec[i], kvec[j]))
    }
}

Now plot this:

contour(muvec, kvec, resmat, xlab = "mu", ylab = "k")

And to add more resolution in the middle:

contour(muvec, kvec, resmat, levels = 70:80, lty = 2, add = TRUE)

Note that your NLL values may be centered differently depending on the particular values of x you simulated; if so, you will want to change "levels" to add resolution near your actual best NLL value. Also, note that we drew dashed lines by setting lty to 2, and defined the resolution with the levels argument. You can use other graphical parameters here, too: look at the help file for par for more detail.

We can now add support intervals to our parameter estimates. This is a bit involved; you may want to consider leaving it for later and moving on to Part 2. Before we calculate the support intervals, we want to see how the NLL behaves as one parameter is varied while the other is re-optimized at each step; this curve is the profile of the parameter.
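Before working through the profiles, note (as an aside, not in the original handout) that the Hessian we requested from optim() already gives quick approximate confidence intervals: inverting it yields an approximate variance-covariance matrix for the estimates, and the square roots of its diagonal entries are approximate (Wald) standard errors.

## Approximate (Wald) standard errors from the inverse Hessian of the NLL.
vcov.mat <- solve(sol1$hessian)  # approximate variance-covariance matrix
se <- sqrt(diag(vcov.mat))       # standard errors for mu and k
se
## An approximate 95% Wald confidence interval for mu:
sol1$par["mu"] + c(-1.96, 1.96) * se["mu"]

The profile-based intervals computed below are generally more trustworthy when the likelihood surface is asymmetric, as it often is for small samples.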
To calculate the profile for mu, we define a new NLL function that takes mu as a separate argument (which optim will not change as it goes along) and optimizes with respect to k:

NLLfun.mu <- function(p, mu) {
    k <- p[1]
    -sum(dnbinom(x, mu = mu, size = k, log = TRUE))
}

Now we set up a matrix with 2 columns, one for the best-fit k value and one for the minimum NLL found:

mu.profile <- matrix(ncol = 2, nrow = length(muvec))

Now run the optimization for each value stored in the mu vector and store the result. This is similar to what we did initially, looking at many values of both mu and k, except that here mu is held constant at each of the values in muvec while k is optimized:

for (i in 1:length(muvec)) {
    Oval <- optim(fn = NLLfun.mu, par = sol1$par["k"], method = "L-BFGS-B",
        lower = 0.002, mu = muvec[i])
    mu.profile[i, ] <- c(Oval$par, Oval$value)
}
colnames(mu.profile) <- c("k", "NLL")

Do the same for k:

NLLfun.k <- function(p, k) {
    mu <- p[1]
    -sum(dnbinom(x, mu = mu, size = k, log = TRUE))
}
k.profile <- matrix(ncol = 2, nrow = length(kvec))
for (i in 1:length(kvec)) {
    Oval <- optim(fn = NLLfun.k, par = sol1$par["mu"], method = "L-BFGS-B",
        lower = 0.002, k = kvec[i])
    k.profile[i, ] <- c(Oval$par, Oval$value)
}
colnames(k.profile) <- c("mu", "NLL")

Redraw the contour plot with the profiles added. The first two lines re-draw the contour plot exactly as above, and the next two add the profiles, giving us "crosshairs" on the likelihood surface:

contour(muvec, kvec, resmat, xlab = "mu", ylab = "k")
contour(muvec, kvec, resmat, levels = 70:80, lty = 2, add = TRUE)
lines(muvec, mu.profile[, "k"], lwd = 2)
lines(k.profile[, "mu"], kvec, lwd = 2, lty = 2)

Here, the command lines() draws a line through a set of x and y values (its first two arguments). The other arguments set the width of the lines (lwd) and the type of line (lty).

You can also plot univariate likelihood profiles. Let's do this for mu; this is the NLL of the model when mu was held at each value along the x-axis and k was allowed to vary:

plot(muvec, mu.profile[, "NLL"], type = "l", xlab = "mu",
    ylab = "Negative log-likelihood")

Next we define the cutoff value given by the LRT. This is the highest negative log-likelihood that we would still accept as not significantly different from the minimum NLL at the maximum likelihood estimates:

cutoff <- sol1$value + qchisq(0.95, 1)/2

This allows us to set up a search routine in R to find the support limits. The following function returns the difference between the NLL of the restricted model (i.e., the one where the value of mu is held constant) and our cutoff NLL value:

relheight <- function(mu) {
    O2 <- optim(fn = NLLfun.mu, par = sol1$par["k"], method = "L-BFGS-B",
        lower = 0.002, mu = mu)
    O2$value - cutoff
}

We can use R's uniroot() function, which takes a single-argument function and searches for the value that makes it zero; we use it to find the values of mu where the NLL of the restricted model equals the cutoff. Make sure that the interval you give uniroot has endpoints with different signs (+/-); otherwise uniroot will not find the point that makes the function zero. Try relheight() first with a few different values of mu to find such endpoints, as in the sketch below.
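A minimal sketch of that check (the grid of trial values is an arbitrary choice): evaluate relheight() over a coarse grid of mu values and look for where its sign changes.

## Scan relheight() over a coarse grid of mu values; the support limits lie
## where the sign changes from positive to negative and back again.
mu.try <- seq(0.4, 3, by = 0.2)
cbind(mu = mu.try, relheight = sapply(mu.try, relheight))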
Then put the lower and upper ranges into uniroot():

lower <- uniroot(relheight, interval = c(0.4, 1))$root
upper <- uniroot(relheight, interval = c(1.2, 5))$root
ci.uniroot <- c(lower, upper)

Re-plot the profile:

plot(muvec, mu.profile[, "NLL"], type = "l", xlab = "mu",
    ylab = "Negative log-likelihood")

Add the confidence limits using abline(), which we used previously to draw "an a-b line" (a straight line, like y = a + bx). Here we give it values of v (for vertical lines) and h (for horizontal lines):

abline(v = ci.uniroot, lty = 3)
cutoffs <- c(0, qchisq(0.95, 1)/2)
nll.levels <- sol1$value + cutoffs
abline(h = nll.levels, lty = 1:2)
mtext("Lower", side = 3, at = ci.uniroot[1], line = 1)
mtext("Upper", side = 3, at = ci.uniroot[2], line = 1)
mtext("LRT cutoff", side = 4, at = cutoff, line = 1)

EXERCISE 1: Generate data of your choice using the probability functions you learned last week. Calculate the MOM estimates of the parameters. Calculate the MLE and estimate the difference in likelihood and in prediction. Calculate the support intervals for the parameters.

PART 2.

Let's look at another dataset. Schmitt et al. (1999) explored reef fish recruitment using a dataset from 603 lagoons in Polynesia. I have provided you with a simulated data set that resembles their data (Reef_fish.txt). Read the data, attach it, and plot it. Looking at the data, does the following scientific model seem reasonable to you?

\mathrm{Recruits} = \frac{a \cdot \mathrm{Settlers}}{1 + (a/b)\,\mathrm{Settlers}}

Plot the frequency distribution of the data. What probability distribution(s) would you choose? Why? Assuming your probability distribution of choice, how would you go about calculating the MLE for your scientific model and error structure?

First, we define our scientific model:

adult.recruits <- function(a, b, settlers) {
    a * settlers/(1 + (a/b) * settlers)
}

and our error structure. I will choose a binomial error structure, since I am interested in the probability of settlement; an easy way of thinking about this is to say

\mathrm{Prob(Recruit)} = \frac{a}{1 + (a/b)\,\mathrm{Settlers}}

which turns this into a very straightforward binomial probability. You will need to install and load the stats4 package before you proceed. Now our likelihood function would be:

library(stats4)
NLLfun <- function(a, b) {
    recprob <- a/(1 + (a/b) * settlers)
    -sum(dbinom(recruits, prob = recprob, size = settlers, log = TRUE),
        na.rm = TRUE)
}

Now we need to call an optimization procedure. (NOTE: R is notoriously fickle about optimization problems, and you may end up using a number of different packages to get the procedure to work... more on this later.)

results <- mle(minuslogl = NLLfun, start = list(a = 0.5, b = 10),
    method = "L-BFGS-B", lower = 0.003)

You will get some error messages from R while it tries to calculate the likelihood. This is because when you have 0 settlers the probability of having 0 recruits is 1, but according to your function it is a. You can now calculate the NLL by plugging the solutions into the likelihood function, and plot the fitted curve using the estimated parameters:

a <- coef(results)["a"]
b <- coef(results)["b"]
plot(settlers, recruits)
curve(a * x/(1 + (a/b) * x), add = TRUE, col = "blue")

The negative log-likelihood is:

NLL <- NLLfun(a, b)

Don't forget to detach your reef fish data frame when you're done.

EXERCISE 2: We chose the binomial likelihood function for convenience. However, there are a number of other possibilities. Choose another likelihood function that fits the data and calculate the NLL. Is it better or worse?

EXERCISE 3: Read in the file we used yesterday, Sapling_Growth.txt. Fit a single model to saplings from both sites. Plot the data to determine which equation would be appropriate to model the data. There are many reasonable candidates. A common one (the Michaelis-Menten function) is:

\mathrm{Growth} = \frac{a \cdot \mathrm{Light}}{1 + (a/b)\,\mathrm{Light}}

You will notice that this is the same function we used to model fish recruitment. It is a convenient, well-behaved function that works well for many biological models. The parameters are the initial slope of the function (a) and the asymptotic value it reaches (b).
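To get a feel for what a and b control, here is a quick sketch (the parameter values are arbitrary, purely for illustration) plotting the Michaelis-Menten curve for two parameter choices:

## Michaelis-Menten curves for two illustrative parameter choices.
## a = 2, b = 30: initial slope 2, asymptote 30.
curve(2 * x/(1 + (2/30) * x), from = 0, to = 100,
    xlab = "Light", ylab = "Growth")
## a = 1, b = 20: shallower initial slope, lower asymptote.
curve(1 * x/(1 + (1/20) * x), add = TRUE, lty = 2)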
Plot the likelihood surface.
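If you want a starting point, here is one possible skeleton for the surface plot, following the Part 1 recipe. It assumes normal errors with a plug-in standard deviation (one reasonable choice among several, not prescribed here) and columns named Light and Growth in Sapling_Growth.txt; check your file and adjust the names and parameter ranges.

## A possible skeleton for the Exercise 3 likelihood surface.
## Column names Light and Growth are assumptions -- edit to match your file.
sap <- read.table("Sapling_Growth.txt", header = TRUE)
NLLfun.sap <- function(a, b, dat = sap) {
    pred <- a * dat$Light/(1 + (a/b) * dat$Light)
    sd.hat <- sqrt(mean((dat$Growth - pred)^2))  # plug-in estimate of sd
    -sum(dnorm(dat$Growth, mean = pred, sd = sd.hat, log = TRUE))
}
avec <- seq(0.1, 2, by = 0.05)  # adjust these ranges to bracket your fit
bvec <- seq(5, 50, by = 1)
sapmat <- matrix(nrow = length(avec), ncol = length(bvec))
for (i in 1:length(avec)) {
    for (j in 1:length(bvec)) {
        sapmat[i, j] <- NLLfun.sap(avec[i], bvec[j])
    }
}
contour(avec, bvec, sapmat, xlab = "a", ylab = "b")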