Stat 330 (Spring 2015): Slide set 26. Last update: March 30, 2015.

Estimators (Cont'd)

Review: What is an estimator? What are estimates? What are the properties we use to compare estimators?

Example: The sample mean x̄ is consistent for µ. That means that, as the sample size gets large, X̄ gets very close to µ in the sense of probability.

Derivation: Using Chebyshev's inequality,

    P(|X̄ − µ| > ε) ≤ Var(X̄)/ε² = σ²/(n ε²),

so that P(|X̄ − µ| > ε) → 0 as n → ∞, which means X̄ is consistent for µ.

Estimating parameters

The method of moments is one of the methods available for estimating parameters. The basic idea is to equate the population moments with the sample moments based on the sample (x1, ..., xn), where xi are the observed values of Xi for i = 1, ..., n.

♣ The k-th population moment is defined as µk = E(X^k).
♦ For example: E(X) = µ1 = µ and Var(X) = µ2 − µ1².
♣ The k-th sample moment is defined as mk = (1/n) Σ_{i=1}^n xi^k.
♦ For example: m1 = (1/n) Σ_{i=1}^n xi = x̄.

Estimating parameters (Cont'd)

♣ The k-th population central moment is defined as µ'k = E((X − µ)^k).
♣ The k-th sample central moment is defined as m'k = (1/n) Σ_{i=1}^n (xi − x̄)^k, where xi is the realization/sample value of Xi and x̄ is the sample mean.
♠ To estimate k parameters, equate the first k population and sample moments,

    µi = mi,  i = 1, 2, ..., k,

so we have k equations to solve.

Estimating parameters: Examples

Example 1: To estimate the parameter λ of the Poisson distribution, we set

    µ1 = m1  ⟺  λ = m1 = x̄,  since µ1 = E(X) = λ.

There is only one unknown; solving for λ we obtain λ̂ = x̄, which is the method of moments estimator (MoM for short) of λ.

Example 2 (Roll a die): A die is rolled until a face shows a 6. Repeating this experiment 20 times, the numbers of trials needed to show a 6 are

    3, 9, 1, 6, 11, 10, 4, 1, 1, 4, 4, 10, 17, 1, 23, 3, 2, 7, 2, 3.

Estimate the probability of getting a 6, say θ, when rolling this die using the method of moments.

Solution: Let X be the number of trials needed to get a 6, and let x be the value of X for each realization. X is a geometric random variable with parameter θ, i.e. P(X = x) = θ(1 − θ)^(x−1). The method of moments gives the equation

    µ1 = m1  ⟺  1/θ = m1 = (1/20) Σ_{i=1}^{20} xi,  since µ1 = E(X) = 1/θ,

so that for the given realizations xi, θ̂ = 20 / (Σ_{i=1}^{20} xi) = 0.16393.

Example 3: Assume that IQ scores follow a normal distribution. Suppose we already know that the mean IQ score of the human population is µ = 100 and would like to estimate the variance σ² based on the IQ scores of 20 randomly selected people:

    92, 115, 103, 81, 107, 95, 92, 118, 99, 124, 90, 87, 108, 103, 91, 74, 84, 124, 81, 100.

Solution: Only one parameter, σ², is unknown. We know that

    µ2 = E(X²) = σ² + E(X)² = σ² + µ1²,  and  m2 = (1/20) Σ_{i=1}^{20} xi²,

which for the given realizations is m2 = 9879.5, so that µ2 = m2 ⟺ σ̂² = m2 − µ1². If we plug in µ1 = 100, then σ̂² = 9879.5 − 100² < 0!! How do we resolve this problem? Use the central moments instead:

    µ'2 = m'2  ⟺  σ² = E((X − µ)²)  is estimated by  (1/20) Σ_{i=1}^{20} (xi − x̄)²,

so that σ̂² = 196.94. That is, an estimate of σ² is obtained directly.
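As a quick numerical check, the two method-of-moments calculations above can be reproduced in a few lines of code. The following is a minimal sketch in plain Python (no external libraries), with the data vectors copied from Examples 2 and 3:

    # Method-of-moments estimates for Examples 2 and 3.

    # Example 2: geometric model, X = number of rolls until the first 6.
    rolls = [3, 9, 1, 6, 11, 10, 4, 1, 1, 4, 4, 10, 17, 1, 23, 3, 2, 7, 2, 3]
    m1 = sum(rolls) / len(rolls)              # first sample moment, x-bar
    theta_hat = 1 / m1                        # solve 1/theta = m1 for theta
    print(theta_hat)                          # 0.16393...

    # Example 3: normal model with known mean mu = 100, unknown variance sigma^2.
    iq = [92, 115, 103, 81, 107, 95, 92, 118, 99, 124,
          90, 87, 108, 103, 91, 74, 84, 124, 81, 100]
    n = len(iq)
    m2 = sum(x ** 2 for x in iq) / n          # second raw sample moment, 9879.5
    print(m2 - 100 ** 2)                      # -120.5 < 0: raw moments fail here
    xbar = sum(iq) / n
    sigma2_hat = sum((x - xbar) ** 2 for x in iq) / n   # second central moment
    print(sigma2_hat)                         # 196.94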
Estimating parameters: Maximum likelihood estimation (MLE)

Situation and motivation: We have n data values (some observed numbers) and assume that these data values are realizations of n i.i.d. random variables X1, ..., Xn with distribution Fθ. Unfortunately the value of θ is unknown. By changing the value of θ we can "move the density function fθ around" so that the density function fits the data best.

Principle: Since we do not know the true value θ of the distribution, we take the value θ̂ that most likely produced the observed values x1, ..., xn.

Sketch of the idea: "Most likely" means maximizing

    Pθ(X1 = x1 ∩ X2 = x2 ∩ ... ∩ Xn = xn)
      = Pθ(X1 = x1) · Pθ(X2 = x2) · ... · Pθ(Xn = xn)    (the Xi are independent!)
      = Π_{i=1}^n Pθ(Xi = xi),   or, for continuous Xi,   Π_{i=1}^n fθ(xi),

i.e. θ̂ is the maximizer of Π_{i=1}^n fθ(xi).

♠ Summary: θ̂ = argmax_θ Π_{i=1}^n fθ(xi).

Estimating parameters: Maximum likelihood estimation

Maximum likelihood:
(a) The joint probability mass function (for discrete random variables) or joint probability density function (for continuous random variables) of a random sample X1, ..., Xn, viewed as a function of θ, is

    L(θ) = L(θ; X1 = x1, ..., Xn = xn) = Π_{i=1}^n Pθ(xi)  or  Π_{i=1}^n fθ(xi).

♠ L(θ) is called the likelihood function of X1, ..., Xn.
(b) The maximum likelihood estimator of θ is the maximizer of the likelihood L(θ) over its parameter space Θ.
(c) How do we find a maximum of L(θ)? Differentiate it and set the derivative to zero, or try to find the θ that maximizes the likelihood directly.
(d) Very often it is difficult to work with the derivative of L(θ) itself; instead we use another trick and maximize l(θ) = log L(θ), the log-likelihood function.

♣ Five steps for finding the MLE:
1. Find the likelihood function L(θ).
2. Take the natural log of the likelihood function: l(θ) = log L(θ).
3. Differentiate the log-likelihood function with respect to θ.
4. Set the derivative equal to zero.
5. Solve for θ.
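When the equation in step 5 has no convenient closed-form solution, the same recipe can be carried out numerically by maximizing the log-likelihood directly. Below is a minimal illustrative sketch in plain Python that does this by a simple grid search for the geometric model, reusing the 20 die rolls of Example 2; the grid resolution of 1/10000 is an arbitrary choice for illustration:

    import math

    # Numerical MLE: maximize l(theta) = sum of log[ theta * (1 - theta)^(x_i - 1) ]
    # over a grid of candidate values, using the 20 rolls from Example 2.
    rolls = [3, 9, 1, 6, 11, 10, 4, 1, 1, 4, 4, 10, 17, 1, 23, 3, 2, 7, 2, 3]

    def log_lik(theta):
        return sum(math.log(theta) + (x - 1) * math.log(1 - theta) for x in rolls)

    grid = [i / 10000 for i in range(1, 10000)]   # candidate theta values in (0, 1)
    theta_hat = max(grid, key=log_lik)            # grid point with largest log-likelihood
    print(theta_hat)                              # about 0.1639 = 20 / sum(rolls)

For this model the grid search simply reproduces the value given by the closed-form estimator, which the worked example below derives with the five steps.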
Examples: MLE

Example (Rolling a die): Find the MLE of the probability p of getting a 6 in the "Roll a Die" example. Repeating the experiment 100 times gave the following results:

    x (rolls needed):  1   2   3   4   5   6   7   8   9  11  14  15  16  17  20  21  27  29
    # (frequency):    18  20   8   9   9   5   8   3   5   3   3   3   1   1   1   1   1   1

♣ Recall: We know that X, the number of rolls until a 6 shows up, has a geometric distribution Geo(p), and x in the table is its realization.
♣ For a fair die p = 1/6, and the geometric distribution has probability mass function p(x) = (1 − p)^(x−1) · p.

Step 1: Find the likelihood function L(p). Since we have observed 100 outcomes x1, ..., x100, the likelihood function is

    L(p) = Π_{i=1}^{100} fp(xi) = Π_{i=1}^{100} (1 − p)^(xi − 1) p
         = p^100 · Π_{i=1}^{100} (1 − p)^(xi − 1)
         = p^100 · (1 − p)^(Σ_{i=1}^{100} (xi − 1))
         = p^100 · (1 − p)^(Σ_{i=1}^{100} xi − 100).

Step 2: Take the log of the likelihood function:

    l(p) = log L(p) = log[ p^100 · (1 − p)^(Σ xi − 100) ]
         = log p^100 + log (1 − p)^(Σ xi − 100)
         = 100 log p + (Σ_{i=1}^{100} xi − 100) log(1 − p).

Step 3: Differentiate the log-likelihood with respect to p:

    d/dp l(p) = 100/p − (Σ_{i=1}^{100} xi − 100) · 1/(1 − p)
              = [ 100(1 − p) − (Σ_{i=1}^{100} xi − 100) p ] / [ p(1 − p) ]
              = [ 100 − p Σ_{i=1}^{100} xi ] / [ p(1 − p) ].

Step 4: Set the derivative to zero. For the estimate p̂ the derivative must be zero:

    d/dp l(p̂) = 0  ⟺  [ 100 − p̂ Σ_{i=1}^{100} xi ] / [ p̂(1 − p̂) ] = 0.

Step 5: Solve for p̂:

    [ 100 − p̂ Σ xi ] / [ p̂(1 − p̂) ] = 0  ⟺  100 − p̂ Σ_{i=1}^{100} xi = 0,

so that

    p̂ = 100 / Σ_{i=1}^{100} xi = 1 / ( (1/100) Σ_{i=1}^{100} xi ) = 1/x̄.

♠ In total, for the xi in the table above, we have the estimate p̂ = 100/568 = 0.1761.
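The closed-form estimate can be checked directly from the frequency table; here is a minimal sketch in plain Python with the table copied from above:

    # MLE p-hat = 100 / sum(x_i), computed from the frequency table.
    counts = {1: 18, 2: 20, 3: 8, 4: 9, 5: 9, 6: 5, 7: 8, 8: 3, 9: 5,
              11: 3, 14: 3, 15: 3, 16: 1, 17: 1, 20: 1, 21: 1, 27: 1, 29: 1}
    n = sum(counts.values())                        # 100 repetitions
    total = sum(x * c for x, c in counts.items())   # sum of all x_i = 568
    p_hat = n / total                               # MLE = 1 / x-bar
    print(n, total, p_hat)                          # 100 568 0.176...

The estimate is close to, but a little above, the fair-die value 1/6 ≈ 0.167.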