Probability & Statistics
Professor Wei Zhu
July 30th

Today's topic: How to evaluate an estimator?

Part I. Mean Squared Error (M.S.E.)

Definition: Mean Squared Error (MSE)
Let T = t(X_1, X_2, …, X_n) be an estimator of θ. The M.S.E. of the estimator T is defined as

MSE_T(θ) = E[(T − θ)²],

the average squared distance from T to θ. Expanding,

E[(T − θ)²] = E[(T − E(T) + E(T) − θ)²]
= E[(T − E(T))²] + E[(E(T) − θ)²] + 2E[(T − E(T))(E(T) − θ)]
= E[(T − E(T))²] + (E(T) − θ)² + 0
= Var(T) + (E(T) − θ)².

The cross term vanishes because E(T) − θ is a constant and E[T − E(T)] = 0. Here |E(T) − θ| is the bias of T; if T is unbiased, (E(T) − θ)² = 0. Between two estimators of θ, the one with the smaller mean squared error is the better one.

Example. Let X_1, X_2, …, X_n be i.i.d. N(μ, σ²). The M.L.E. for μ is μ̂ = X̄; the M.L.E. for σ² is σ̂² = ∑_{i=1}^n (X_i − X̄)² / n.
1. What is the M.S.E. of σ̂²?
2. What is the M.S.E. of S² as an estimator of σ²?

Solution.
1. MSE_{σ̂²}(σ²) = E[(σ̂² − σ²)²] = Var(σ̂²) + (E(σ̂²) − σ²)².

To get Var(σ̂²), there are two approaches.

a. By the first definition of the chi-square distribution. Since X_i ~ N(μ, σ²),

W = ∑_{i=1}^n (X_i − X̄)² / σ² ~ χ²_{n−1} = Gamma(λ = 1/2, r = (n−1)/2),

so E(W) = r/λ = n − 1 and Var(W) = r/λ² = 2(n − 1). Since σ̂² = (σ²/n)W,

Var(σ̂²) = Var((σ²/n)W) = (σ⁴/n²)·Var(W) = 2(n − 1)σ⁴/n².

b. By the second definition of the chi-square distribution: for Z_i i.i.d. N(0, 1), W = ∑_{i=1}^{n−1} Z_i² ~ χ²_{n−1}. First,

Var(Z²) = E[(Z² − E(Z²))²] = E[(Z² − 1)²],

since Var(Z) = E(Z²) − [E(Z)]² = 1 for Z ~ N(0, 1) gives E(Z²) = 1. Then

E[(Z² − 1)²] = E[Z⁴ − 2Z² + 1] = E(Z⁴) − 2E(Z²) + 1 = E(Z⁴) − 1.

Calculate the 4th moment of Z ~ N(0, 1) using the m.g.f. of Z:

M_Z(t) = e^{t²/2}
M′_Z(t) = t e^{t²/2}
M″_Z(t) = e^{t²/2} + t² e^{t²/2}
M^{(3)}_Z(t) = 3t e^{t²/2} + t³ e^{t²/2}
M^{(4)}_Z(t) = 3e^{t²/2} + 6t² e^{t²/2} + t⁴ e^{t²/2}

Setting t = 0, M^{(4)}_Z(0) = 3 = E(Z⁴), so Var(Z²) = 3 − 1 = 2 and

Var(W) = ∑_{i=1}^{n−1} Var(Z_i²) = 2(n − 1).

Again, since σ̂² = (σ²/n)W, Var(σ̂²) = (σ⁴/n²)·2(n − 1).

Either way,

MSE_{σ̂²}(σ²) = Var(σ̂²) + [E(σ̂²) − σ²]²
= 2(n − 1)σ⁴/n² + [E((n − 1)S²/n) − σ²]²
= 2(n − 1)σ⁴/n² + [(n − 1)σ²/n − σ²]²   (since E(S²) = σ²)
= 2(n − 1)σ⁴/n² + σ⁴/n² = (2n − 1)σ⁴/n².

The M.S.E. of σ̂² is (2n − 1)σ⁴/n².

2. We know S² is an unbiased estimator of σ², so

MSE_{S²}(σ²) = E[(S² − σ²)²] = Var(S²) + 0 = Var(σ²W/(n − 1)) = (σ²/(n − 1))²·Var(W) = 2σ⁴/(n − 1).

Let's have some fun! Exercise: Compare the MSE of σ̂² = ∑_{i=1}^n (X_i − X̄)²/n and S² = ∑_{i=1}^n (X_i − X̄)²/(n − 1). Which one is a better estimator (in terms of the MSE)? (A simulation sketch for this exercise is given just before Example 1 below.)

Part II. Cramer-Rao Lower Bound, Efficient Estimator, Best Estimator

Given many unbiased estimators of θ, say θ̂_1, θ̂_2, θ̂_3, …, it can be really difficult for us to compare all the variances Var(θ̂_i) directly.

Theorem. Cramer-Rao Lower Bound
Let Y_1, Y_2, …, Y_n be a random sample from a population with p.d.f. f(y; θ), and let θ̂ = h(Y_1, Y_2, …, Y_n) be an unbiased estimator of θ. Given some regularity conditions (continuous differentiability, etc.), and provided the domain of f(y; θ) does not depend on θ, we have

Var(θ̂) ≥ 1 / (n E[(∂ ln f(Y; θ)/∂θ)²]) = 1 / (−n E[∂² ln f(Y; θ)/∂θ²]).

Theorem. Properties of the MLE
Let Y_i, i = 1, 2, …, n, be i.i.d. f(y; θ), and let θ̂ be the MLE of θ. Then, as n → ∞, approximately

θ̂ → N(θ, 1 / (n E[(∂ ln f(Y; θ)/∂θ)²])).

That is, the MLE is asymptotically unbiased and its asymptotic variance is the C-R lower bound.

Harald Cramér was born in Stockholm, Sweden on September 25, 1893, and died there on October 25, 1985. (wiki) Calyampudi Radhakrishna Rao, FRS, known as C. R. Rao (born 10 September 1920), is an Indian statistician. He is professor emeritus at Penn State University and Research Professor at the University at Buffalo. Rao was awarded the US National Medal of Science in 2002. (wiki)
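Before working Example 1, here is a minimal simulation sketch for the Part I exercise, assuming NumPy is available; the values of mu, sigma2, n, the replication count, and the seed are arbitrary illustrative choices, not part of the original notes.

    import numpy as np

    # Monte Carlo estimate of the two MSEs derived above.
    rng = np.random.default_rng(0)               # arbitrary seed, for reproducibility
    mu, sigma2, n, reps = 0.0, 1.0, 10, 200_000  # illustrative values
    x = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
    ss = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)

    mse_mle = np.mean((ss / n - sigma2) ** 2)       # divide by n:   the MLE
    mse_s2 = np.mean((ss / (n - 1) - sigma2) ** 2)  # divide by n-1: S^2

    print(mse_mle, (2 * n - 1) * sigma2 ** 2 / n ** 2)  # both about 0.19
    print(mse_s2, 2 * sigma2 ** 2 / (n - 1))            # both about 0.22

For these values the empirical MSEs match the closed forms (2n − 1)σ⁴/n² and 2σ⁴/(n − 1) derived above; deciding which estimator wins for every n is the exercise.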
Example 1. Let Y_1, Y_2, …, Y_n be i.i.d. Bernoulli(p).
1. What is the MLE of p?
2. What are the mean and variance of the MLE of p?
3. What is the Cramer-Rao lower bound for an unbiased estimator of p?

Solution. P(Y = y) = f(y; p) = p^y (1 − p)^{1−y}, y = 0, 1.

1. L = ∏_{i=1}^n f(y_i; p) = ∏_{i=1}^n [p^{y_i}(1 − p)^{1−y_i}] = p^{∑ y_i}(1 − p)^{n−∑ y_i}

l = ln L = (∑ y_i) ln p + (n − ∑ y_i) ln(1 − p)

Solving

dl/dp = (∑ y_i)/p − (n − ∑ y_i)/(1 − p) = 0,

we have the MLE

p̂ = ∑_{i=1}^n Y_i / n.

2. E(p̂) = p, Var(p̂) = p(1 − p)/n.

3. ln f(y; p) = y ln p + (1 − y) ln(1 − p)

∂ ln f(y; p)/∂p = y/p − (1 − y)/(1 − p)

∂² ln f(y; p)/∂p² = −y/p² − (1 − y)/(1 − p)²

E[−Y/p² − (1 − Y)/(1 − p)²] = −p/p² − (1 − p)/(1 − p)² = −1/p − 1/(1 − p) = −1/(p(1 − p)),

so the C-R lower bound is

Var(p̂) ≥ {−n E[∂² ln f(Y; p)/∂p²]}^{−1} = p(1 − p)/n.

Thus the MLE of p is unbiased and its variance equals the C-R lower bound.

Definition. Efficient Estimator
If θ̂ is an unbiased estimator of θ and its variance equals the C-R lower bound, then θ̂ is an efficient estimator of θ.

Definition. Best Estimator
If θ̂ is an unbiased estimator of θ and Var(θ̂) ≤ Var(θ̃) for every unbiased estimator θ̃, then θ̂ is a best estimator for θ.

Efficient Estimator ⇒ Best Estimator: true under the C-R regularity conditions.
Best Estimator ⇒ Efficient Estimator: may not be true.

Example 2. If Y_1, Y_2, …, Y_n is a random sample from f(y; θ) = 2y/θ², 0 < y < θ, then θ̂ = (3/2)Ȳ is an unbiased estimator for θ. Compute 1. Var(θ̂) and 2. the C-R lower bound for f_Y(y; θ).

Solution.
1. Var(θ̂) = Var((3/2)Ȳ) = (9/4)·(1/n²)·∑_{i=1}^n Var(Y_i)

Var(Y_i) = E(Y_i²) − [E(Y_i)]² = ∫_0^θ y²(2y/θ²) dy − [∫_0^θ y(2y/θ²) dy]² = θ²/2 − (2θ/3)² = θ²/18

Therefore Var(θ̂) = (9/(4n²))·n·(θ²/18) = θ²/(8n).

2. C-R lower bound:

ln f_Y(y; θ) = ln(2y/θ²) = ln 2y − 2 ln θ

∂ ln f_Y(y; θ)/∂θ = −2/θ

E[(∂ ln f_Y(Y; θ)/∂θ)²] = E(4/θ²) = ∫_0^θ (4/θ²)(2y/θ²) dy = 4/θ²

1/(n E[(∂ ln f_Y(Y; θ)/∂θ)²]) = θ²/(4n)

The value in part 1 is less than the value in part 2. But this is NOT a contradiction of the theorem: the domain 0 < y < θ depends on θ, so the C-R theorem does not hold for this problem.

Example 3. Let X_1, …, X_n be i.i.d. N(μ, σ²), a random sample from the normal population where both μ and σ² are unknown. Please derive (1) the maximum likelihood estimators for μ and σ², and (2) the best estimator for μ, assuming that σ² is known.

Solution.
(1) MLEs for μ and σ². The likelihood function is

L = ∏_{i=1}^n f(x_i; μ, σ²) = ∏_{i=1}^n (1/(√(2π)σ)) e^{−(x_i−μ)²/(2σ²)} = (2πσ²)^{−n/2} e^{−∑_{i=1}^n (x_i−μ)²/(2σ²)}

ln L = −(n/2) ln(2πσ²) − ∑_{i=1}^n (x_i − μ)²/(2σ²)

∂ ln L/∂μ = (1/σ²) ∑_{i=1}^n (x_i − μ) = 0  ⇒  μ̂ = ∑_{i=1}^n x_i / n = x̄

∂ ln L/∂σ² = −n/(2σ²) + ∑_{i=1}^n (x_i − μ)²/(2σ⁴) = 0  ⇒  σ̂² = ∑_{i=1}^n (x_i − μ̂)²/n = ∑_{i=1}^n (x_i − x̄)²/n

(2) Use the Cramer-Rao lower bound:

f_X(x; μ, σ²) = (1/(√(2π)σ)) e^{−(x−μ)²/(2σ²)}

ln f = −(1/2) ln(2πσ²) − (x − μ)²/(2σ²)

d ln f/dμ = (x − μ)/σ²

d² ln f/dμ² = −1/σ²

Hence the C-R lower bound on the variance is σ²/n. The normal p.d.f. satisfies all the regularity conditions for the C-R lower bound theorem to hold, and X̄ ~ N(μ, σ²/n). The variance of X̄ therefore equals the C-R lower bound, so this unbiased estimator is an efficient estimator for μ, and thus it is also a best estimator for μ.
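To make the punch line of Example 2 concrete, here is a minimal simulation sketch, assuming NumPy. Since F(y) = y²/θ² on 0 < y < θ, the inverse-CDF method gives Y = θ√U for U ~ Uniform(0, 1); the values of theta, n, the replication count, and the seed are arbitrary illustrative choices, not part of the original notes.

    import numpy as np

    # Monte Carlo check that Var(theta_hat) = theta^2/(8n) falls below
    # the nominal C-R expression theta^2/(4n) from Example 2.
    rng = np.random.default_rng(0)      # arbitrary seed
    theta, n, reps = 2.0, 50, 200_000   # illustrative values

    u = rng.random((reps, n))
    y = theta * np.sqrt(u)              # inverse-CDF sampling from f(y) = 2y/theta^2
    theta_hat = 1.5 * y.mean(axis=1)    # the unbiased estimator (3/2) * Ybar

    print(theta_hat.mean())             # about theta = 2.0 (unbiasedness)
    print(theta_hat.var())              # about theta^2/(8n) = 0.01
    print(theta ** 2 / (4 * n))         # nominal C-R value  = 0.02

The empirical variance landing at about half the nominal bound is exactly the point of Example 2: the bound's formula carries no guarantee here, because the support of f depends on θ.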