MT2004
Olivier GIMENEZ
Telephone: 01334 461827
E-mail: olivier@mcs.st-and.ac.uk
Website: http://www.creem.st-and.ac.uk/olivier/OGimenez.html

11. Maximum Likelihood Estimation

• So far, we have provided confidence intervals for, and tested hypotheses about, model parameters (e.g. the mean of a normally distributed population).
• Objective here: estimating the parameters of a model, using data.
• Example: we want to estimate the probability $\theta$ of getting a head upon flipping a particular coin.
• We flip the coin independently 10 times (i.e. we sample n = 10 flips), obtaining the following result: H H T H H H T T H H
• The probability of obtaining this sequence – in advance of collecting the data – is a function of the unknown parameter $\theta$:

    $\Pr(\text{data} \mid \theta) = \Pr(\text{H H T H H H T T H H} \mid \theta) = \theta \cdot \theta \cdot (1-\theta) \cdot \theta \cdot \theta \cdot \theta \cdot (1-\theta) \cdot (1-\theta) \cdot \theta \cdot \theta = \theta^7 (1-\theta)^3$

• But the data for our particular sample are fixed: we have already collected them!
• The parameter $\theta$ also has a fixed value, but this value is unknown; we only know that it lies between 0 and 1.
• We shall therefore treat the probability of the observed data as a function of $\theta$. This function is called the likelihood function:

    $L(\theta \mid \text{data}) = \Pr(\text{H H T H H H T T H H} \mid \theta) = \theta^7 (1-\theta)^3$

• The probability function and the likelihood function are the same equation. But the probability function is a function of the data with the value of the parameter fixed, while the likelihood function is a function of the parameter with the data fixed.

[Figure: likelihood of observing 7 heads and 3 tails for representative values of $\theta$, the probability of observing a head]

• The probability of obtaining the sample of data that we have in hand, H H T H H H T T H H, is small regardless of the true value of $\theta$. This is usually the case: any specific sample result – including the one that is realised – will have low probability.
• Nevertheless, the likelihood contains useful information about the unknown parameter $\theta$. E.g. $\theta$ cannot be zero or one (either would give the observed data probability 0), and it is unlikely to be close to zero or one.
• Reversing this reasoning, the value of $\theta$ that is most supported by the data is the one for which the likelihood is largest. This value is the maximum-likelihood estimate (MLE) of $\theta$.
• More generally, for n independent flips of the coin, producing a particular sequence that includes x heads and n - x tails,

    $L(\theta \mid \text{data}) = \Pr(\text{data} \mid \theta) = \theta^x (1-\theta)^{n-x}$

• We want the value of $\theta$ that maximises $L(\theta \mid \text{data})$, which we often abbreviate to $L(\theta)$.
• It is simpler – and equivalent – to find the value of $\theta$ that maximises the log of the likelihood:

    $\log L(\theta) = x \log \theta + (n - x) \log(1 - \theta)$

• Differentiating $\log L(\theta)$ with respect to $\theta$ produces

    $\dfrac{d \log L(\theta)}{d\theta} = \dfrac{x}{\theta} - \dfrac{n - x}{1 - \theta}$

• Setting the derivative to 0 and solving produces the MLE $\hat{\theta} = x/n$, which, as before, is the sample proportion.
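The maximisation can also be checked numerically. Here is a minimal R sketch (an addition to the notes, written in R since the notes use R's optim later); optimise() is base R's one-dimensional optimiser:

    # Log-likelihood of theta for the sequence H H T H H H T T H H
    # (x = 7 heads out of n = 10 independent flips)
    loglik <- function(theta) 7 * log(theta) + 3 * log(1 - theta)

    # Maximise over (0, 1); the maximum should lie at the sample proportion 7/10
    optimise(loglik, interval = c(0, 1), maximum = TRUE)

The returned maximum is close to 0.7, agreeing with $\hat{\theta} = x/n$.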
• In greater generality: consider a set of observations $x_1, \dots, x_n$ which are modelled as observations of independent discrete random variables with probability function $f(x; \theta)$ which depends on some (vector of) parameters $\theta$.
• According to the model, the probability of obtaining the observed data is the product of the probability functions for each observation, i.e.

    $L(\theta; x_1, \dots, x_n) = \prod_{i=1}^{n} f(x_i; \theta)$

• We seek the parameters of the model that make the data look most probable; in other words, we seek to maximise the likelihood $L(\theta; x_1, \dots, x_n)$ (a function of the parameters with the data fixed) with respect to $\theta$.
• Equivalently, we seek to maximise the log-likelihood

    $l(\theta) = \log L(\theta; x_1, \dots, x_n) = \sum_{i=1}^{n} \log f(x_i; \theta)$

  Recall that $\log(ab) = \log(a) + \log(b)$.
• Example: suppose that you have n observations $x_1, \dots, x_n$ on independent Poisson distributed random variables, each with probability function

    $f(x; \lambda) = \dfrac{e^{-\lambda} \lambda^x}{x!}$

1. Form the likelihood and then the corresponding log-likelihood:

    $L(\lambda) = \prod_{i=1}^{n} \dfrac{e^{-\lambda} \lambda^{x_i}}{x_i!} = \dfrac{e^{-n\lambda} \lambda^{\sum_i x_i}}{\prod_i x_i!}$, so $l(\lambda) = -n\lambda + \left(\sum_{i=1}^{n} x_i\right) \log \lambda - \sum_{i=1}^{n} \log(x_i!)$

2. Maximise the log-likelihood w.r.t. $\lambda$ and obtain its MLE. Differentiating and setting the derivative to zero,

    $\dfrac{dl(\lambda)}{d\lambda} = -n + \dfrac{\sum_i x_i}{\lambda} = 0$

so the MLE of $\lambda$ is:

    $\hat{\lambda} = \dfrac{1}{n} \sum_{i=1}^{n} X_i = \bar{X}$

• The expression that we've just derived is an estimator, i.e. a function of the random variables $X_1, \dots, X_n$. The value of this function which is obtained by evaluating it on observed values $x_1, \dots, x_n$ is an estimate.
• Suppose that in this case we have 4 observations: 1, 3, 8 and 2. The maximum likelihood estimate is:

    $\hat{\lambda} = \dfrac{1 + 3 + 8 + 2}{4} = 3.5$

• Note that, in general, we should check that we have obtained a maximum, so we should calculate the second derivative $d^2 l(\lambda)/d\lambda^2$ and check that it is negative (here $d^2 l/d\lambda^2 = -\sum_i x_i/\lambda^2 < 0$). In other words, $l(\lambda)$ is concave.

A more complicated example… Suppose that you have a series of measurements $y_1, \dots, y_n$ of radioactive emission counts from samples of caesium of masses $x_1, \dots, x_n$, respectively. You wish to model the counts as Poisson random variables, where each $Y_i$ has mean $\lambda x_i$. Obtain the maximum likelihood estimator of $\lambda$ (the radioactivity per unit mass).
1. Form the likelihood and then the corresponding log-likelihood:

    $L(\lambda) = \prod_{i=1}^{n} \dfrac{e^{-\lambda x_i} (\lambda x_i)^{y_i}}{y_i!}$, so $l(\lambda) = -\lambda \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i \log(\lambda x_i) - \sum_{i=1}^{n} \log(y_i!)$

2. Maximise the log-likelihood w.r.t. $\lambda$ and obtain its MLE. Differentiating and setting the derivative to zero,

    $\dfrac{dl(\lambda)}{d\lambda} = -\sum_{i=1}^{n} x_i + \dfrac{\sum_i y_i}{\lambda} = 0$, so $\hat{\lambda} = \dfrac{\sum_{i=1}^{n} Y_i}{\sum_{i=1}^{n} x_i}$
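A minimal R sketch (an addition to the notes) verifying the Poisson result numerically; dpois() is R's built-in Poisson probability function:

    x <- c(1, 3, 8, 2)
    mean(x)  # closed-form MLE: 3.5

    # Negative log-likelihood; minimising it is equivalent to maximising l(lambda)
    negloglik <- function(lambda) -sum(dpois(x, lambda, log = TRUE))
    optimise(negloglik, interval = c(0.01, 20))  # minimum near lambda = 3.5

The same recipe applies to the caesium example, where the closed-form estimate is sum(y) / sum(x).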
Likelihood for continuous distributions

• So far, we have used MLE for a single parameter, using discrete data (binomial, Poisson). Maximum likelihood estimation works just as well for continuous random variables.
• The likelihood is then the product of the p.d.f.'s of the r.v.'s.
• BUT the likelihood (or log-likelihood) can no longer be interpreted as a probability of getting the observed data, given $\theta$, only as a probability density of getting the observed data.
• In practice, this makes no difference: we maximise the likelihood w.r.t. the parameters as usual.

Example 1: The following data are a small part of a dataset on coal-mining disasters; the numbers are times in days between major disasters: 157, 33, 186, 78, 538, 3. One model for such data assumes that the times are independent random variables $T_1, \dots, T_6$, all with the same (exponential) p.d.f.

    $f(t) = \lambda e^{-\lambda t}$

1. Form the likelihood and then the corresponding log-likelihood:

    $L(\lambda) = \prod_{i=1}^{6} \lambda e^{-\lambda t_i} = \lambda^6 e^{-\lambda \sum_i t_i}$, so $l(\lambda) = 6 \log \lambda - \lambda \sum_{i=1}^{6} t_i$

2. Maximise the log-likelihood w.r.t. $\lambda$ and obtain the MLEs (estimator and estimate). Differentiating,

    $\dfrac{dl(\lambda)}{d\lambda} = \dfrac{6}{\lambda} - \sum_{i=1}^{6} t_i$

and setting this to zero gives the maximum-likelihood estimator:

    $\hat{\lambda} = \dfrac{6}{\sum_{i=1}^{6} T_i}$

Plugging in the observed values for $T_1, \dots, T_6$, we get an estimate:

    $\hat{\lambda} = \dfrac{6}{157 + 33 + 186 + 78 + 538 + 3} = \dfrac{6}{995} \approx 0.0060$
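Again, a minimal R sketch (an addition to the notes) checking this estimate numerically; dexp() is R's built-in exponential density:

    t <- c(157, 33, 186, 78, 538, 3)
    length(t) / sum(t)  # closed-form MLE: 6/995, about 0.0060

    # Negative log-likelihood of the exponential model
    negloglik <- function(lambda) -sum(dexp(t, rate = lambda, log = TRUE))
    optimise(negloglik, interval = c(1e-6, 1))  # minimum near 0.0060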
Example 2: Suppose that we have some observations $x_1, \dots, x_n$ which we wish to model as observations of i.i.d. r.v.'s from a normal distribution with unknown mean $\mu$ and unknown variance $\sigma^2$, to be estimated.
1. Form the likelihood and then the corresponding log-likelihood:

    $L(\mu, \sigma^2) = \prod_{i=1}^{n} \dfrac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\dfrac{(x_i - \mu)^2}{2\sigma^2}\right)$, so $l(\mu, \sigma^2) = -\dfrac{n}{2} \log(2\pi) - \dfrac{n}{2} \log \sigma^2 - \dfrac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2$

2. Maximise the log-likelihood w.r.t. $\mu$ and $\sigma^2$ and obtain the MLEs. First, we find the partial derivative w.r.t. $\mu$:

    $\dfrac{\partial l}{\partial \mu} = \dfrac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu)$

and setting this to zero gives $\sum_i (x_i - \mu) = 0$, so $\hat{\mu} = \bar{x}$.
Then, we find the partial derivative w.r.t. $\sigma^2$:

    $\dfrac{\partial l}{\partial \sigma^2} = -\dfrac{n}{2\sigma^2} + \dfrac{1}{2\sigma^4} \sum_{i=1}^{n} (x_i - \mu)^2$

and setting this to zero gives:

    $\hat{\sigma}^2 = \dfrac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$

To sum up, we have that $\hat{\mu} = \bar{X}$ and $\hat{\sigma}^2 = \dfrac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2$.
• Note that the maximum likelihood estimator of the variance $\sigma^2$ is NOT the sample variance $s^2$, which divides by n - 1 rather than n.
• In general, MLEs are biased (but the bias tends to zero as the sample size gets larger).
• The MLEs do have the advantage of being consistent.

Example 3: Suppose that we have some observations $x_1, \dots, x_n$ which we wish to model as observations of i.i.d. r.v.'s from a Weibull distribution with unknown parameters $\alpha$ and $\lambda$, to be estimated.
1. Form the log-likelihood. With p.d.f. $f(x; \alpha, \lambda) = \alpha \lambda x^{\alpha - 1} e^{-\lambda x^{\alpha}}$,

    $l(\alpha, \lambda) = n \log \alpha + n \log \lambda + (\alpha - 1) \sum_{i=1}^{n} \log x_i - \lambda \sum_{i=1}^{n} x_i^{\alpha}$

2. Maximise the log-likelihood w.r.t. $\alpha$ and $\lambda$ and obtain the MLEs. One ends up with a nonlinear equation in $\alpha$ that cannot be solved in closed form. We need to use optimisation routines, e.g. optim in R, to find the maximum of the log-likelihood (or, equivalently, the minimum of the negative log-likelihood); a sketch is given at the end of this section.

Invariance of MLEs

The invariance property of maximum likelihood estimators: if $\hat{\theta}$ is the MLE of $\theta$ and $g$ is a one-to-one function, then $g(\hat{\theta})$ is the MLE of $g(\theta)$.

Example 1: suppose that $x_1, \dots, x_n$ are observations on $N(\mu, \sigma^2)$ r.v.'s. Find the MLE of $\sigma$. We saw that the MLE for $\sigma^2$ is $\hat{\sigma}^2 = \frac{1}{n} \sum_i (x_i - \bar{x})^2$. If we consider the one-to-one function $g(\sigma^2) = \sqrt{\sigma^2} = \sigma$ (for $\sigma^2 > 0$), then the invariance property says that the MLE for $\sigma$ is:

    $\hat{\sigma} = \sqrt{\dfrac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2}$

Example 2: suppose that $x_1, \dots, x_k$ are observations on independent binomial r.v.'s, each with n trials and unknown probability p. Find the MLE of p, and deduce the MLE of the mean of the Bin(n, p) distribution. The likelihood of p is:

    $L(p) = \prod_{i=1}^{k} \binom{n}{x_i} p^{x_i} (1 - p)^{n - x_i}$

so the log-likelihood is

    $l(p) = \sum_{i=1}^{k} \log \binom{n}{x_i} + \left(\sum_{i=1}^{k} x_i\right) \log p + \left(nk - \sum_{i=1}^{k} x_i\right) \log(1 - p)$

Differentiating,

    $\dfrac{dl(p)}{dp} = \dfrac{\sum_i x_i}{p} - \dfrac{nk - \sum_i x_i}{1 - p}$

and setting this to zero gives:

    $\hat{p} = \dfrac{\sum_{i=1}^{k} x_i}{nk} = \dfrac{\bar{x}}{n}$

By the invariance property, the MLE of the mean np is $n\hat{p} = \bar{x}$.
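As promised above, a minimal R sketch of the Weibull fit via optim (an addition to the notes: the simulated data and starting values are illustrative assumptions, and the p.d.f. is the $\alpha \lambda x^{\alpha-1} e^{-\lambda x^{\alpha}}$ parameterisation used in Example 3):

    set.seed(1)
    x <- rweibull(50, shape = 1.5, scale = 2)  # illustrative data

    # Negative log-likelihood for f(x) = alpha*lambda*x^(alpha-1)*exp(-lambda*x^alpha);
    # the parameters are optimised on the log scale to keep them positive
    negloglik <- function(logpar) {
      alpha  <- exp(logpar[1])
      lambda <- exp(logpar[2])
      -(length(x) * (log(alpha) + log(lambda)) +
          (alpha - 1) * sum(log(x)) - lambda * sum(x^alpha))
    }

    fit <- optim(c(0, 0), negloglik)  # Nelder-Mead by default
    exp(fit$par)                      # MLEs of (alpha, lambda)

Minimising the negative log-likelihood is equivalent to maximising the log-likelihood, which is why negloglik is the function passed to optim.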