Tsinghua Math Camp 2015
Probability & Statistics
Professor Wei Zhu
July 30th
Today’s topic: How to evaluate an estimator?
Part I. Mean Squared Error (M.S.E.)
Definition: Mean Squared Error (MSE)
Let T = t(X_1, X_2, …, X_n) be an estimator of θ. Then the M.S.E. of the estimator T is defined as:

MSE_T(θ) = E[(T − θ)^2]   (the average squared distance from T to θ)
         = E[(T − E(T) + E(T) − θ)^2]
         = E[(T − E(T))^2] + E[(E(T) − θ)^2] + 2E[(T − E(T))(E(T) − θ)]
         = E[(T − E(T))^2] + (E(T) − θ)^2 + 0
         = Var(T) + (E(T) − θ)^2

The cross term vanishes because E(T) − θ is a constant and E[T − E(T)] = 0. Here |E(T) − θ| is "the bias of T"; if T is unbiased, (E(T) − θ)^2 = 0.
An estimator with smaller mean squared error is better.
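As an illustration (my addition, not part of the original notes; it assumes NumPy is available and uses arbitrary values σ^2 = 4, n = 10), the decomposition MSE = Var(T) + bias^2 can be checked by simulation for the divide-by-n variance estimator that appears in the next example:

# Numerical check of MSE = Var(T) + bias^2 for the divide-by-n
# variance estimator under N(mu, sigma^2) data.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2, n, reps = 0.0, 4.0, 10, 200_000

x = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
t = x.var(axis=1)                      # ddof=0: divide by n (the MLE)

mse_direct = np.mean((t - sigma2) ** 2)            # E[(T - theta)^2]
mse_decomp = t.var() + (t.mean() - sigma2) ** 2    # Var(T) + bias^2
print(mse_direct, mse_decomp)          # agree up to Monte Carlo error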
Example. Let X_1, X_2, …, X_n ~ i.i.d. N(μ, σ^2).
The M.L.E. for μ is μ̂ = X̄; the M.L.E. for σ^2 is σ̂^2 = Σ_{i=1}^n (X_i − X̄)^2 / n.
1. What is the M.S.E. of σ̂^2?
2. What is the M.S.E. of S^2 as an estimator of σ^2?
Solution.
1.
MSE_{σ̂^2}(σ^2) = E[(σ̂^2 − σ^2)^2] = Var(σ̂^2) + (E(σ̂^2) − σ^2)^2

To get Var(σ̂^2), there are two approaches.

a. By the first definition of the chi-square distribution.
Note X_i ~ N(μ, σ^2), so W = Σ_{i=1}^n (X_i − X̄)^2 / σ^2 ~ χ^2_{n−1} = Gamma(λ = 1/2, r = (n−1)/2).
E(W) = r/λ = n − 1;  Var(W) = r/λ^2 = 2(n − 1)
Since σ̂^2 = (σ^2/n)·W,  Var(σ̂^2) = (σ^4/n^2)·Var(W) = (σ^4/n^2)·2(n − 1)
b. By the second definition of the chi-square distribution.
For Z_i ~ i.i.d. N(0, 1), W = Σ_{i=1}^{n−1} Z_i^2 ~ χ^2_{n−1}.

Var(Z^2) = E[(Z^2 − E(Z^2))^2] = E[(Z^2 − 1)^2],
since Var(Z) = E(Z^2) − [E(Z)]^2 = 1 and E(Z) = 0 give E(Z^2) = 1. Then

E[(Z^2 − 1)^2] = E[Z^4 − 2Z^2 + 1] = E(Z^4) − 2E(Z^2) + 1 = E(Z^4) − 1

Calculate the 4th moment of Z ~ N(0, 1) using the m.g.f. of Z:
M_Z(t) = e^{t^2/2}
M'_Z(t) = t e^{t^2/2}
M''_Z(t) = e^{t^2/2} + t^2 e^{t^2/2}
M_Z^{(3)}(t) = 3t e^{t^2/2} + t^3 e^{t^2/2}
M_Z^{(4)}(t) = 3 e^{t^2/2} + 6t^2 e^{t^2/2} + t^4 e^{t^2/2}
Setting t = 0, M_Z^{(4)}(0) = 3 = E(Z^4).

Var(Z^2) = 3 − 1 = 2
Var(W) = Σ_{i=1}^{n−1} Var(Z_i^2) = 2(n − 1)
Since σ̂^2 = (σ^2/n)·W,  Var(σ̂^2) = (σ^4/n^2)·2(n − 1)
MSE_{σ̂^2}(σ^2) = Var(σ̂^2) + [E(σ̂^2) − σ^2]^2
= (2(n − 1)/n^2)·σ^4 + [E(((n − 1)/n)·S^2) − σ^2]^2
= (2(n − 1)/n^2)·σ^4 + [((n − 1)/n)·σ^2 − σ^2]^2    (we know E(S^2) = σ^2)
= (2(n − 1)/n^2)·σ^4 + (1/n^2)·σ^4
= ((2n − 1)/n^2)·σ^4

The M.S.E. of σ̂^2 is ((2n − 1)/n^2)·σ^4.
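A quick spot-check (my addition, assuming NumPy) of the moments used in approach b, namely E(Z^4) = 3, Var(Z^2) = 2, and Var(W) = 2(n − 1) for W ~ χ^2_{n−1}; n = 10 below is an arbitrary choice:

# Simulate the chi-square moments that drive Var(sigma_hat^2).
import numpy as np

rng = np.random.default_rng(1)
z = rng.standard_normal(1_000_000)
print(np.mean(z**4))           # ~3 = E(Z^4), matching M_Z^{(4)}(0)
print(np.var(z**2))            # ~2 = Var(Z^2)

n = 10                         # sample size; W ~ chi^2_{n-1}
w = rng.chisquare(n - 1, size=500_000)
print(np.var(w), 2 * (n - 1))  # Var(W) = 2(n - 1)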
We know S^2 is an unbiased estimator of σ^2, so

MSE_{S^2}(σ^2) = E[(S^2 − σ^2)^2] = Var(S^2) + 0 = Var((σ^2/(n − 1))·W) = (σ^2/(n − 1))^2·Var(W) = 2σ^4/(n − 1)
Let’s have some fun!
Exercise: Compare the MSE of σ̂^2 = Σ_{i=1}^n (X_i − X̄)^2 / n and S^2 = Σ_{i=1}^n (X_i − X̄)^2 / (n − 1).
Which one is a better estimator (in terms of the MSE)?
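If you want to check your answer empirically, here is a minimal Monte Carlo sketch (my addition, not part of the original notes; it assumes NumPy, and the values μ = 1, σ^2 = 2, n = 8 are arbitrary). It compares the simulated MSEs against the closed forms derived above:

# Compare MSE of the divide-by-n MLE against MSE of S^2 (divide by n-1).
import numpy as np

rng = np.random.default_rng(2)
mu, sigma2, n, reps = 1.0, 2.0, 8, 300_000

x = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
mle = x.var(axis=1, ddof=0)    # divide by n
s2  = x.var(axis=1, ddof=1)    # divide by n-1

print("MSE of MLE:", np.mean((mle - sigma2) ** 2), (2*n - 1) / n**2 * sigma2**2)
print("MSE of S^2:", np.mean((s2  - sigma2) ** 2), 2 / (n - 1) * sigma2**2)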
Part II. Cramer-Rao Lower Bound, Efficient Estimator, Best Estimator
Suppose we have several unbiased estimators of θ, say θ̂_1, θ̂_2, θ̂_3, …
It could be really difficult for us to compare Var(θ̂_i) when there are many of them.
Theorem. Cramer-Rao Lower Bound
Let Y_1, Y_2, …, Y_n be a random sample from a population with p.d.f. f(y; θ). Let θ̂ = h(Y_1, Y_2, …, Y_n) be an unbiased estimator of θ. Given some regularity conditions (continuous differentiability, etc.), and provided the domain of f(y; θ) does not depend on θ, we have

Var(θ̂) ≥ 1 / (n·E[(∂ ln f(Y; θ)/∂θ)^2]) = 1 / (−n·E[∂^2 ln f(Y; θ)/∂θ^2])
Theorem. Properties of the MLE
Let Y_i ~ i.i.d. f(y; θ), i = 1, 2, …, n, and let θ̂ be the MLE of θ. Then, as n → ∞, θ̂ is approximately distributed as

N(θ, 1 / (n·E[(∂ ln f(Y; θ)/∂θ)^2]))

The MLE is asymptotically unbiased, and its asymptotic variance equals the C-R lower bound.
Harald Cramér was born in Stockholm, Sweden on September 25, 1893, and died there on October 25, 1985. (wiki) Calyampudi Radhakrishna Rao, FRS, known as C. R. Rao (born September 10, 1920), is an Indian statistician. He is professor emeritus at Penn State University and Research Professor at the University at Buffalo. Rao was awarded the US National Medal of Science in 2002. (wiki)
Example 1. Let Y_1, Y_2, …, Y_n ~ i.i.d. Bernoulli(p).
1. What is the MLE of p?
2. What are the mean and variance of the MLE of p?
3. What is the Cramer-Rao lower bound for an unbiased estimator of p?

Solution. P(Y = y) = f(y; p) = p^y (1 − p)^{1−y}, y = 0, 1

1.
L = Π_{i=1}^n f(y_i; p) = Π_{i=1}^n [p^{y_i} (1 − p)^{1−y_i}] = p^{Σ y_i} (1 − p)^{n − Σ y_i}
l = ln L = (Σ y_i) ln p + (n − Σ y_i) ln(1 − p)
Solving
dl/dp = (Σ y_i)/p − (n − Σ y_i)/(1 − p) = 0,
we have the MLE:
p̂ = Σ_{i=1}^n Y_i / n
2.
E(p̂) = p,  Var(p̂) = p(1 − p)/n
3.
ln f(y; p) = y ln p + (1 − y) ln(1 − p)
∂ ln f(y; p)/∂p = y/p − (1 − y)/(1 − p)
∂^2 ln f(y; p)/∂p^2 = −y/p^2 − (1 − y)/(1 − p)^2
E[−Y/p^2 − (1 − Y)/(1 − p)^2] = −1/p − 1/(1 − p) = −1/(p(1 − p))

C-R lower bound:
Var(p̂) ≥ {−n·E[∂^2 ln f(Y; p)/∂p^2]}^{−1} = p(1 − p)/n
Thus, the MLE of p is unbiased and its variance = C-R lower bound.
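A small simulation sketch (my addition, assuming NumPy; p = 0.3 and n = 50 are arbitrary choices) confirming that p̂ is unbiased and its variance matches the C-R lower bound p(1 − p)/n:

# Check E(p_hat) = p and Var(p_hat) = p(1-p)/n for the Bernoulli MLE.
import numpy as np

rng = np.random.default_rng(3)
p, n, reps = 0.3, 50, 200_000

y = rng.binomial(1, p, size=(reps, n))
p_hat = y.mean(axis=1)                 # the MLE sum(Y_i)/n

print(p_hat.mean(), p)                 # unbiased
print(p_hat.var(), p * (1 - p) / n)    # variance attains the C-R bound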
Definition. Efficient Estimator
If θ̂ is an unbiased estimator of θ and its variance equals the C-R lower bound, then θ̂ is an efficient estimator of θ.
Definition. Best Estimator
If θ̂ is an unbiased estimator of θ and Var(θ̂) ≤ Var(θ̃) for every unbiased estimator θ̃, then θ̂ is a best estimator for θ.
Efficient estimator ⇒ best estimator: true under the C-R regularity conditions.
Best estimator ⇒ efficient estimator: may not be true.
Example 2. If Y_1, Y_2, …, Y_n is a random sample from f(y; θ) = 2y/θ^2, 0 < y < θ, then θ̂ = (3/2)·Ȳ is an unbiased estimator for θ.
Compute 1. Var(θ̂) and 2. the C-R lower bound for f_Y(y; θ).
Solution.
1.
Var(θ̂) = Var((3/2)·Ȳ) = (9/4)·Var((1/n)·Σ_{i=1}^n Y_i) = (9/(4n^2))·Σ_{i=1}^n Var(Y_i)

Var(Y_i) = E(Y_i^2) − [E(Y_i)]^2 = ∫_0^θ y^2·(2y/θ^2) dy − [∫_0^θ y·(2y/θ^2) dy]^2 = θ^2/2 − (2θ/3)^2 = θ^2/18

Therefore, Var(θ̂) = (9/(4n^2))·n·(θ^2/18) = θ^2/(8n)
2. C-R lower bound
ln f_Y(y; θ) = ln(2y/θ^2) = ln 2y − 2 ln θ
∂ ln f_Y(y; θ)/∂θ = −2/θ
E[(∂ ln f_Y(y; θ)/∂θ)^2] = E(4/θ^2) = ∫_0^θ (4/θ^2)·(2y/θ^2) dy = 4/θ^2
1 / (n·E[(∂ ln f_Y(y; θ)/∂θ)^2]) = θ^2/(4n)
The answer to part 1 (θ^2/(8n)) is less than the answer to part 2 (θ^2/(4n)). But this is NOT a contradiction to the theorem: the domain, 0 < y < θ, depends on θ, so the C-R theorem does not hold for this problem.
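A quick simulation sketch (my addition, assuming NumPy; θ = 2 and n = 20 are arbitrary). It samples from f(y; θ) by the inverse CDF method: F(y) = y^2/θ^2 on (0, θ), so Y = θ√U for U ~ Uniform(0, 1):

# Check Var(theta_hat) = theta^2/(8n), which falls below the
# (inapplicable) bound theta^2/(4n) since the support depends on theta.
import numpy as np

rng = np.random.default_rng(4)
theta, n, reps = 2.0, 20, 200_000

y = theta * np.sqrt(rng.random((reps, n)))   # inverse-CDF sampling
theta_hat = 1.5 * y.mean(axis=1)             # (3/2) * Ybar

print(theta_hat.mean(), theta)               # unbiased
print(theta_hat.var(), theta**2 / (8 * n))   # < theta^2/(4n), no contradiction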
Example 3. Let X_1, …, X_n ~ i.i.d. N(μ, σ^2) be a random sample from the normal population where both μ and σ^2 are unknown. Please derive:
(1) The maximum likelihood estimators for μ and σ^2.
(2) The best estimator for μ, assuming that σ^2 is known.
Solution:
(1) MLEs for μ and σ^2. The likelihood function is:

L = Π_{i=1}^n f(x_i; μ, σ^2) = Π_{i=1}^n (1/√(2πσ^2))·e^{−(x_i − μ)^2/(2σ^2)} = (1/(2πσ^2))^{n/2}·e^{−Σ_{i=1}^n (x_i − μ)^2/(2σ^2)}

ln L = −(n/2)·ln(2πσ^2) − Σ_{i=1}^n (x_i − μ)^2/(2σ^2)

∂ ln L/∂μ = (1/σ^2)·Σ_{i=1}^n (x_i − μ) = 0  ⇒  μ̂ = Σ_{i=1}^n X_i / n = X̄

∂ ln L/∂σ^2 = −n/(2σ^2) + Σ_{i=1}^n (x_i − μ)^2/(2σ^4) = 0  ⇒  σ̂^2 = Σ_{i=1}^n (x_i − μ̂)^2 / n = Σ_{i=1}^n (x_i − x̄)^2 / n
(2) Use the Cramer-Rao lower bound:

f_X(x; μ, σ) = (1/√(2πσ^2))·e^{−(x − μ)^2/(2σ^2)}

ln f = −ln(√(2πσ^2)) − (x − μ)^2/(2σ^2)

d ln f/dμ = (x − μ)/σ^2

d^2 ln f/dμ^2 = −1/σ^2

Hence, the C-R lower bound of the variance is σ^2/n.

Since the normal p.d.f. satisfies all the regularity conditions for the C-R lower bound theorem to hold, and since X̄ ~ N(μ, σ^2/n), the variance of X̄ equals the C-R lower bound. Thus this unbiased estimator is an efficient estimator for μ, and it is also a best estimator for μ.
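As a numerical cross-check (my addition, assuming NumPy and SciPy are available), one can maximize the normal log-likelihood directly and compare against the closed-form MLEs derived in part (1); the data parameters and starting point below are arbitrary:

# Numerically maximize the normal log-likelihood and recover
# mu_hat = xbar and sigma2_hat = sum(x - xbar)^2 / n.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
x = rng.normal(3.0, 2.0, size=200)

def neg_log_lik(params):
    mu, log_s2 = params            # optimize log(sigma^2) to keep it positive
    s2 = np.exp(log_s2)
    return 0.5 * len(x) * np.log(2 * np.pi * s2) + np.sum((x - mu)**2) / (2 * s2)

res = minimize(neg_log_lik, x0=[0.0, 0.0])
print(res.x[0], x.mean())              # numerical mu_hat vs. xbar
print(np.exp(res.x[1]), x.var(ddof=0)) # numerical sigma2_hat vs. divide-by-n variance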