Methods of Estimation

We have discussed the properties of a good estimator. How do we obtain a good estimator for an unknown parameter?

(1) Method of Moments (pages 472-475)

The method of moments is a very simple procedure for finding an estimator of one or more population parameters. Recall that the sample mean $\bar{X}$ is an unbiased and consistent estimator of the population mean $\mu = E(X)$:
$\mu$ is the first population moment; $\bar{X}$ is the first sample moment.
The estimator for $\mu$ can be obtained by setting the first population moment equal to the first sample moment $\bar{X}$; then we have $\hat{\mu}_{\text{m.o.m.}} = \bar{X}$.

Recall: the $k$th population moment is
$$\mu_k = E(X^k) = \sum_i x_i^k \, p(x_i) \ \text{(discrete case)}, \qquad \mu_k = \int x^k f(x)\, dx \ \text{(continuous case)};$$
the $k$th sample moment is
$$m_k = \frac{1}{n} \sum_{i=1}^n x_i^k.$$

The method of moments chooses the estimators so that the population moments are matched by the corresponding sample moments up to the $l$th moment, where $l$ = the number of parameters being estimated = the number of equations you need to solve for all the parameters you want to estimate. For instance, for $N(\mu, \sigma^2)$ both $\mu$ and $\sigma^2$ need to be estimated, so $l = 2$. In general, let
$$\mu_1 = \bar{X}, \quad \mu_2 = m_2, \quad \ldots, \quad \mu_l = m_l.$$
Here we have $l$ equations and $l$ unknowns, so our $l$ unknown parameters can be estimated by solving these equations.

Previous Exercise: Assume $X_i \overset{\text{i.i.d.}}{\sim} U(\theta, \theta+1)$, $i = 1, 2, \ldots, n$, with $n = 2k$. How do we estimate $\theta$? How do we find $\hat{\theta}$? (Hint: for uniform $[\theta_1, \theta_2]$, $E(X_i) = \frac{\theta_1 + \theta_2}{2}$ and $\text{Var}(X_i) = \frac{(\theta_2 - \theta_1)^2}{12}$.)

The number of population parameters is 1, so $l = 1$. Let $E(X) = \mu_1 = \bar{X}$. Here $E(X) = \theta + 0.5$. By the method of moments, setting $\theta + 0.5 = \bar{X}$ and solving for $\theta$: $\hat{\theta} = \bar{X} - 0.5$ is the method of moments estimator for $\theta$. Denote $\hat{\theta}_{\text{m.o.m.}} = \bar{X} - 0.5$.

Example 5. Using a sample $X_i \sim N(\mu, \sigma^2)$, $i = 1, \ldots, n$, find the method of moments estimators for $\mu$ and $\sigma^2$.

HW: Show that
$$\frac{1}{n} \sum_{i=1}^n X_i^2 - \bar{X}^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})^2.$$

Example 6. Using a sample $X_i \sim \text{Gamma}(\alpha, \beta)$, $i = 1, \ldots, n$, find the method of moments estimators for $\alpha$ and $\beta$. ($E(X_i) = \alpha\beta$, $\text{Var}(X_i) = \alpha\beta^2$.)

Answer: $\hat{\alpha}_{\text{m.o.m.}} = \frac{n\bar{X}^2}{(n-1)S^2}$ and $\hat{\beta}_{\text{m.o.m.}} = \frac{(n-1)S^2}{n\bar{X}}$.

(2) The Method of MLE (pages 476-481)

Suppose $X_1, X_2, \ldots, X_n$ is a randomly selected sample from a population with probability function $P(x \mid \theta_1, \theta_2, \ldots, \theta_k)$ (discrete case) or density function $f(x \mid \theta_1, \theta_2, \ldots, \theta_k)$ (continuous case). We choose the estimates of the parameters $\theta_1, \theta_2, \ldots, \theta_k$ so that the likelihood function $L$ is maximized.

What is a likelihood function? A likelihood function is the joint probability function or joint density function of the observed sample.

Example 1 (a discrete example): Suppose $X_1, X_2, \ldots, X_n$ is a random sample from a binomial experiment with $n$ trials and success probability $p$. Let $X_i = 1$ if the $i$th trial was a success, and $X_i = 0$ if the $i$th trial was a failure. The likelihood function is
$$L(p \mid x_1, x_2, \ldots, x_n) = P(x_1, x_2, \ldots, x_n \mid p) \overset{\text{independence}}{=} P(x_1 \mid p) P(x_2 \mid p) \cdots P(x_n \mid p) = \prod_{i=1}^n P(x_i \mid p).$$
Let $Y$ = the number of successes, i.e.
$$Y = \sum_{i=1}^n X_i;$$
then the likelihood function is
$$L(p \mid x_1, x_2, \ldots, x_n) = p^y (1-p)^{n-y}.$$
Now we estimate $p$ by maximizing $L$ over $p$. Note that maximizing $L$ is equivalent to maximizing $\ln L$:
$$\ln L = y \ln p + (n - y) \ln(1 - p),$$
$$\frac{\partial \ln L}{\partial p} = \frac{y}{p} + \frac{n-y}{1-p}(-1) = \frac{y}{p} - \frac{n-y}{1-p}.$$
Let $\frac{\partial \ln L}{\partial p} = 0$: then $\frac{y}{p} = \frac{n-y}{1-p}$, so $y(1-p) = (n-y)p$, i.e. $np = y$. Solving for $p$:
$$\hat{p}_{MLE} = \frac{y}{n} = \frac{1}{n} \sum_{i=1}^n x_i = \bar{x}.$$
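A quick numerical check of the two estimators derived so far. This is only an illustrative sketch (not part of the course material); it assumes numpy is available, and the "true" parameter values are my own choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility
n = 10_000

# Method of moments for U(theta, theta+1): theta_hat = x_bar - 0.5
theta = 3.0  # assumed "true" value, for illustration only
x = rng.uniform(theta, theta + 1, size=n)
theta_hat = x.mean() - 0.5
print(f"true theta = {theta}, m.o.m. estimate = {theta_hat:.4f}")

# MLE for the success probability p from 0/1 trials: p_hat = y/n = x_bar
p = 0.3  # assumed "true" value, for illustration only
trials = rng.binomial(1, p, size=n)  # X_i = 1 on success, 0 on failure
p_hat = trials.mean()
print(f"true p = {p}, MLE = {p_hat:.4f}")
```

With $n = 10{,}000$ draws both estimates should land close to the true values, consistent with the unbiasedness and consistency discussed above.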
Example 2 (a continuous example): A random sample $X_i \overset{\text{i.i.d.}}{\sim} N(\mu, \sigma^2)$, $i = 1, \ldots, n$. Find the maximum likelihood estimators (MLEs) for both $\mu$ and $\sigma^2$.

First we derive the likelihood function:
$$L(\mu, \sigma^2 \mid x_1, x_2, \ldots, x_n) = f(x_1, x_2, \ldots, x_n \mid \mu, \sigma^2) \overset{\text{independence}}{=} \prod_{i=1}^n f(x_i \mid \mu, \sigma^2) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x_i - \mu)^2}{2\sigma^2}} = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^n e^{-\frac{\sum_{i=1}^n (x_i - \mu)^2}{2\sigma^2}}.$$

Taking a log makes things easier for maximization:
$$\ln L = -n \ln\!\left(\sqrt{2\pi}\,\sigma\right) - \frac{\sum_{i=1}^n (x_i - \mu)^2}{2\sigma^2},$$
$$\frac{\partial \ln L}{\partial \mu} = -\frac{\sum_{i=1}^n 2(x_i - \mu)(-1)}{2\sigma^2} = \frac{\sum_{i=1}^n (x_i - \mu)}{\sigma^2},$$
$$\frac{\partial \ln L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{\sum_{i=1}^n (x_i - \mu)^2}{2\sigma^4}.$$
Let $\frac{\partial \ln L}{\partial \mu} = 0$ and $\frac{\partial \ln L}{\partial \sigma^2} = 0$. We have
$$\sum_{i=1}^n (x_i - \mu) = 0, \qquad -\frac{n}{2\sigma^2} + \frac{\sum_{i=1}^n (x_i - \mu)^2}{2\sigma^4} = 0.$$
Solving for $\mu$ and $\sigma^2$:
$$\hat{\mu}_{MLE} = \bar{x}, \qquad \hat{\sigma}^2_{MLE} = \frac{\sum_{i=1}^n (x_i - \bar{x})^2}{n}.$$

The steps to follow in finding MLEs:
Step 1: Write down the likelihood function $L$.
Step 2: Take the log: if $L$ is an exponential type of function, taking the log normally makes the maximization easier, so you may choose to maximize $\ln L$ instead of $L$. If $L$ is a simple function, or its maximum can be found easily from $L$ directly, skip Step 2.
Step 3: Take the derivative with respect to each unknown parameter you are estimating, and set these derivatives equal to 0.
Step 4: Solve for the parameters. The solutions are the MLEs.

Another example: A random sample $X_i \overset{\text{i.i.d.}}{\sim} \text{Uniform}(0, \theta)$, $i = 1, \ldots, n$. Find the maximum likelihood estimator (MLE) for $\theta$.

The probability density function of $X_i$ is
$$f(x_i \mid \theta) = \frac{1}{\theta}, \quad 0 \le x_i \le \theta; \qquad 0 \text{ otherwise}; \qquad i = 1, 2, \ldots, n.$$
First we derive the likelihood function:
$$L(\theta \mid x_1, x_2, \ldots, x_n) = f(x_1, x_2, \ldots, x_n \mid \theta) \overset{\text{independence}}{=} \prod_{i=1}^n f(x_i \mid \theta) = \frac{1}{\theta^n} \ \text{if } 0 \le x_i \le \theta \text{ for all } i; \quad 0 \text{ otherwise}.$$
Note that $L$ is not maximized where $L = 0$, so the maximizing $\theta$ must be no less than all of the $x_i$'s; namely, in order to make $L \ne 0$, we need $\theta \ge \max(x_1, x_2, \ldots, x_n) = x_{(n)}$. On the other hand, when $\theta \ge x_{(n)}$, $L = \frac{1}{\theta^n}$ is a monotonically decreasing function of $\theta$, so its maximum is reached at the smallest admissible value, $\theta = x_{(n)}$. Therefore $\hat{\theta}_{MLE} = X_{(n)}$. See the graph. This example does not need the log of $L$. When the domain of the density or probability function depends on the unknown parameter, pay attention to the piecewise nature of the likelihood function.

A very good and important property that makes the MLE particularly attractive is its invariance property: suppose $\theta$ is the parameter associated with a distribution and we are interested in estimating some function of $\theta$, say $t(\theta)$, rather than $\theta$ itself. If $\hat{\theta}_{MLE}$ is the MLE for $\theta$, then for any function $t$, $t(\hat{\theta}_{MLE})$ is the MLE for $t(\theta)$.

From Example 2: $\hat{\sigma}^2_{MLE} = \frac{\sum_{i=1}^n (x_i - \bar{x})^2}{n}$; by the invariance property of the MLE,
$$\hat{\sigma}_{MLE} = \sqrt{\frac{\sum_{i=1}^n (x_i - \bar{x})^2}{n}}.$$

From Example 1: What is the MLE of the variance of $Y$? $Y \sim b(n, p)$, so $\text{Var}(Y) = np(1-p)$. Since $\hat{p}_{MLE} = \frac{y}{n} = \bar{x}$,
$$\widehat{\text{Var}(Y)}_{MLE} = n\hat{p}_{MLE}(1 - \hat{p}_{MLE}) = y\left(1 - \frac{y}{n}\right) = n\bar{x}(1 - \bar{x}).$$

Some remarks for finding MLEs:
(1) Ask yourself: is it necessary to take the log before taking derivatives? Which is easier to solve, $\frac{\partial \ln L}{\partial \theta} = 0$ or $\frac{\partial L}{\partial \theta} = 0$?
(2) Is the likelihood function a piecewise function (one defined by different expressions over more than one interval), with an interval that depends on the unknown parameter?
(3) Remember to use the MLE's invariance property whenever you can.
(4) Some MLEs cannot be solved in closed form; see the link on our course webpage for the MLEs of the parameters of a Gamma distribution (a numerical sketch follows below): http://spartan.ac.brocku.ca/~xxu/minka-gamma
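On remark (4): for $\text{Gamma}(\alpha, \beta)$ with the scale parameterization used in Example 6 ($E(X_i) = \alpha\beta$), the first-order conditions give $\hat{\beta} = \bar{x}/\hat{\alpha}$, while $\hat{\alpha}$ solves $\ln\alpha - \psi(\alpha) = \ln\bar{x} - \frac{1}{n}\sum \ln x_i$ ($\psi$ is the digamma function), which has no closed form. The sketch below solves this equation by plain root-finding rather than the Newton iteration of the linked note; it assumes scipy is available, and the true parameter values are my own illustrative choices.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import digamma

rng = np.random.default_rng(2)
alpha_true, beta_true = 2.5, 1.8  # assumed shape and scale, for illustration
x = rng.gamma(alpha_true, beta_true, size=10_000)

# alpha_hat solves: log(alpha) - digamma(alpha) = log(x_bar) - mean(log x)
s = np.log(x.mean()) - np.log(x).mean()  # > 0 by Jensen's inequality
alpha_hat = brentq(lambda a: np.log(a) - digamma(a) - s, 1e-8, 1e8)
beta_hat = x.mean() / alpha_hat  # from the first-order condition for beta

print(f"alpha_hat = {alpha_hat:.4f} (true {alpha_true})")
print(f"beta_hat  = {beta_hat:.4f} (true {beta_true})")
```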
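Separately, the closed-form MLEs derived above (Example 2 and the uniform example) are easy to verify by simulation. A minimal sketch, again with assumed parameter values of my own:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Normal MLEs: mu_hat = x_bar, sigma2_hat = (1/n) * sum((x_i - x_bar)^2)
mu, sigma = 2.0, 1.5  # assumed true values, for illustration
x = rng.normal(mu, sigma, size=n)
mu_hat = x.mean()
sigma2_hat = ((x - mu_hat) ** 2).mean()  # note the 1/n divisor, not 1/(n-1)
print(f"mu_hat = {mu_hat:.4f}, sigma2_hat = {sigma2_hat:.4f} (true {sigma**2})")

# By the invariance property, sigma_hat = sqrt(sigma2_hat)
print(f"sigma_hat = {np.sqrt(sigma2_hat):.4f} (true {sigma})")

# Uniform(0, theta) MLE: theta_hat = max(x_i), the nth order statistic
theta = 5.0  # assumed true value, for illustration
u = rng.uniform(0, theta, size=n)
print(f"theta_hat = {u.max():.4f} (true {theta})")
```

Note that the uniform estimate always sits just below the true $\theta$, since $X_{(n)} \le \theta$ with probability 1.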