Stat 643 Spring 2010
Assignment 08
Asymptotics of Likelihood-Based Inference
In the following arguments we adopt the notation $O_p(1)$ and $o_p(1)$. Specifically, we say a sequence of random variables $\{Y_n\}$ is $O_p(1)$, or stochastically bounded, if and only if for every $\epsilon > 0$ there exists $M > 0$ such that $P(|Y_n| \le M) \ge 1 - \epsilon$ for all $n$. Note that convergence in distribution implies stochastic boundedness. We say $\{Y_n\}_{n \ge 1}$ is $o_p(1)$ if and only if $Y_n \to 0$ in probability. See A+L or Shao for references.
65. (a) Let $\mu$ be the sum of two measures: Lebesgue measure on $(1, \infty)$ and the point mass at $0$. Then the R-N derivative of the distribution of $X$ with respect to $\mu$ is $\lambda e^{-\lambda x} I\{x \neq 0\} + (1 - e^{-\lambda}) I\{x = 0\}$ a.e. $(\mu)$. Adopting the notation of the class, the likelihood function is
$$ f_\lambda(X^n) = \lambda^{\sum_i \delta_i} \, e^{-\lambda \sum_i \delta_i X_i} \, (1 - e^{-\lambda})^{\,n - \sum_i \delta_i}, $$
where $\delta_i = I\{X_i \neq 0\}$. Write $M = n - \sum_i \delta_i$ for the number of censored observations. On the event $\{M = n\}$ the likelihood function becomes $(1 - e^{-\lambda})^n$, which has no maximizer for $\lambda$ on $(0, \infty)$. In all other cases at least one $\delta_i$ equals $1$; note that
$$ \log f_\lambda(X^n) = \sum_i \delta_i \log \lambda - \lambda \sum_i \delta_i X_i + \Big(n - \sum_i \delta_i\Big) \log(1 - e^{-\lambda}), $$
so that
$$ \frac{\partial}{\partial \lambda} \log f_\lambda(X^n) = \frac{\sum_i \delta_i}{\lambda} - \sum_i \delta_i X_i + \Big(n - \sum_i \delta_i\Big)\frac{e^{-\lambda}}{1 - e^{-\lambda}}. $$
As $\lambda \to 0$ the score function diverges to $+\infty$, and as $\lambda \to \infty$ it converges to $-\sum_i \delta_i X_i < 0$. By the intermediate value theorem, the likelihood equation has at least one root on $(0, \infty)$ in this case. Therefore
$$ P_\lambda(\text{MLE exists}) \ge P_\lambda(M < n) = 1 - P_\lambda(X_i = 0 \ \forall i) = 1 - (1 - e^{-\lambda})^n \to 1 $$
as $n \to \infty$, for any $\lambda > 0$.
(b) Let $T_n = -\log(1 - M/n)$. By the SLLN, $M/n \to 1 - e^{-\lambda}$ w.p. 1, which implies that $T_n$ converges to $\lambda$ almost surely $P_\lambda$. A "1-step Newton improvement" would then be $\tilde\lambda_n = T_n - L_n'(T_n)/L_n''(T_n)$; using $e^{-T_n} = 1 - M/n$, this is
$$ \tilde\lambda_n = T_n - \frac{\sum_i \delta_i / T_n - \sum_i \delta_i X_i + (n - \sum_i \delta_i)(n/M - 1)}{-\sum_i \delta_i / T_n^2 - (n - \sum_i \delta_i)(n/M - 1)\,n/M}. $$
(c) On the event that the MLE exists, we can use the Newton-Raphson method with $T_n$ as the starting value. Basically, we repeat the procedure in (b), updating at each step with a 1-step Newton adjustment, $\lambda^{(k+1)} = \lambda^{(k)} - L_n'(\lambda^{(k)})/L_n''(\lambda^{(k)})$.
(d) For a $(1 - \alpha)100\%$ asymptotic confidence interval, we can use $\hat\lambda_n \pm z_{1-\alpha/2}\{n I_1(\hat\lambda_n)\}^{-1/2}$ and $\hat\lambda_n \pm z_{1-\alpha/2}\{-L_n''(\hat\lambda_n)\}^{-1/2}$, where $L_n$ is the log-likelihood function. More specifically,
$$ L_n''(\lambda) = -\sum_i \delta_i/\lambda^2 - \Big(n - \sum_i \delta_i\Big)\frac{e^{-\lambda}}{(1 - e^{-\lambda})^2}, \qquad n I_1(\lambda) = E_\lambda\{-L_n''(\lambda)\} = \frac{n e^{-\lambda}}{\lambda^2} + \frac{n e^{-\lambda}}{1 - e^{-\lambda}}. $$
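The quantities in (b)-(d) are easy to compute numerically. The following Python sketch (my own addition; the simulated data, variable names, and use of numpy are assumptions, not part of the assignment) computes the starting value $T_n$, the 1-step Newton improvement, a fully iterated Newton-Raphson MLE, and the two Wald-type 90% intervals from (d).

```python
import numpy as np

def censor(x):
    """Observations at or below 1 are recorded as 0 (censored)."""
    return np.where(x > 1.0, x, 0.0)

def score(lam, x):
    """L_n'(lambda) for the censored-exponential log-likelihood."""
    d = x != 0.0
    return d.sum() / lam - x[d].sum() + (~d).sum() * np.exp(-lam) / (1 - np.exp(-lam))

def hessian(lam, x):
    """L_n''(lambda)."""
    d = x != 0.0
    return -d.sum() / lam**2 - (~d).sum() * np.exp(-lam) / (1 - np.exp(-lam))**2

rng = np.random.default_rng(0)
lam_true, n = 0.8, 200
x = censor(rng.exponential(1 / lam_true, size=n))

# Moment-type starting value T_n = -log(1 - M/n), M = number of censored observations.
M = np.sum(x == 0.0)
T_n = -np.log(1 - M / n)

# One-step Newton improvement, then fully iterated Newton-Raphson.
lam = T_n - score(T_n, x) / hessian(T_n, x)
for _ in range(25):
    lam = max(lam - score(lam, x) / hessian(lam, x), 1e-8)  # guard against overshooting below 0

# Two Wald-type 90% intervals: expected and observed information.
z = 1.645
nI1 = n * np.exp(-lam) / lam**2 + n * np.exp(-lam) / (1 - np.exp(-lam))
se_exp, se_obs = nI1 ** -0.5, (-hessian(lam, x)) ** -0.5
print(T_n, lam)
print((lam - z * se_exp, lam + z * se_exp), (lam - z * se_obs, lam + z * se_obs))
```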
66. The log-likelihood function for the uncensored data (taking the dominating measure to be Lebesgue measure on $\mathbb{R}^+$) is
$$ \ell_n(\lambda) = n \log \lambda - \lambda \sum_{i=1}^n X_i. \qquad (1) $$
[Figure 1: Log-likelihood versus $\lambda$ (from 0 to 4) for the uncensored and censored data.]
For the censored data, using the R-N derivative of the distribution of $X I\{X > 1\}$ with respect to the $\sigma$-finite measure that is the sum of Lebesgue measure on $(0, \infty)$ and the point mass at $0$ (see Problem 35), the log-likelihood is
$$ \ell_n^*(\lambda) = \sum_i \delta_i \log \lambda - \lambda \sum_i \delta_i X_i + \Big(n - \sum_i \delta_i\Big)\log(1 - e^{-\lambda}), \qquad (2) $$
where $\delta_i = I\{X_i > 1\}$. For the data given in this problem, the two log-likelihood functions are shown in Figure 1.
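A plot like Figure 1 can be reproduced with a few lines of Python (a sketch of my own, using matplotlib; the array `x` below holds simulated placeholder data, since the assignment's actual observations are not reproduced here).

```python
import numpy as np
import matplotlib.pyplot as plt

def ell_uncensored(lam, x):
    """Equation (1): n*log(lam) - lam*sum(x)."""
    return len(x) * np.log(lam) - lam * x.sum()

def ell_censored(lam, x):
    """Equation (2), treating observations <= 1 as censored (delta_i = 0)."""
    d = x > 1.0
    return d.sum() * np.log(lam) - lam * x[d].sum() + (~d).sum() * np.log(1 - np.exp(-lam))

# Placeholder data; substitute the assignment's observations here.
x = np.random.default_rng(1).exponential(1.0, size=20)

grid = np.linspace(0.05, 4.0, 400)
plt.plot(grid, [ell_uncensored(l, x) for l in grid], label="Uncensored")
plt.plot(grid, [ell_censored(l, x) for l in grid], label="Censored")
plt.xlabel(r"$\lambda$")
plt.ylabel("Log-likelihood")
plt.legend()
plt.show()
```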
(a) Given the uncensored data, a 90% two-sided confidence set for $\lambda$ is $\{\lambda \mid \chi^2_{1,0.05} \le 2\{\ell_n(\hat\lambda) - \ell_n(\lambda)\} \le \chi^2_{1,0.95}\}$. In this case $\hat\lambda = 1/\bar X_n = 0.9341$, $\ell_n(\hat\lambda) = -21.3625$, $\chi^2_{1,0.05} = 0.00393$, and $\chi^2_{1,0.95} = 3.84146$. One can then solve the resulting nonlinear equations numerically to obtain the endpoints of the confidence set for $\lambda$ (there is a huge literature on such root finding, which is not the focus of this course).
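As a sketch of that numerical step (my own illustration, not part of the original solution), the endpoints can be found by solving $2\{\ell_n(\hat\lambda) - \ell_n(\lambda)\} = c$ for $c = \chi^2_{1,0.05}$ and $c = \chi^2_{1,0.95}$ with a bracketing root finder. Here scipy's brentq is used, and $n$ and $\sum_i X_i$ are back-solved from the reported $\hat\lambda$ and $\ell_n(\hat\lambda)$, so they are only approximations.

```python
import numpy as np
from scipy.optimize import brentq

# Approximate values back-solved from lambda_hat = 0.9341 and ell_n(lambda_hat) = -21.3625.
n, sum_x = 20, 21.41
lam_hat = n / sum_x

def ell(lam):
    """Uncensored exponential log-likelihood, equation (1)."""
    return n * np.log(lam) - lam * sum_x

def g(lam, c):
    """2*(ell(lam_hat) - ell(lam)) - c; its roots are endpoints of the confidence set."""
    return 2 * (ell(lam_hat) - ell(lam)) - c

chi_lo, chi_hi = 0.00393, 3.84146   # chi^2_{1,0.05} and chi^2_{1,0.95}
roots = {c: (brentq(lambda l: g(l, c), 1e-6, lam_hat),
             brentq(lambda l: g(l, c), lam_hat, 50.0)) for c in (chi_lo, chi_hi)}
print(roots)   # outer endpoints (chi_hi) and inner endpoints (chi_lo)
```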
(b) Their relationship can be seen in Figure 1.
(c) The Fisher information about $\lambda$ in $X I\{X > 1\}$, evaluated at $\lambda = \lambda_0$, is
$$ I_1^*(\lambda_0) = E_{\lambda_0}\big\{\delta/\lambda_0^2 + (1 - \delta)\,e^{\lambda_0}/(e^{\lambda_0} - 1)^2\big\} = e^{-\lambda_0}/\lambda_0^2 + e^{-\lambda_0}/(1 - e^{-\lambda_0}). \qquad (3) $$
Two different approximate (Wald) 90% confidence sets would be $\tilde\lambda \pm z_{0.95}\{n I_1^*(\tilde\lambda)\}^{-1/2}$ and $\tilde\lambda \pm z_{0.95}\{-\ell_n^{*\prime\prime}(\tilde\lambda)\}^{-1/2}$, where $\tilde\lambda$ is the maximum likelihood estimator for the censored data.
67. So $\theta = (p_1, p_2, p_3, p_4)'$ and the parameter space is $\Theta = [0, 1]^4$. The hypothesis testing problem here is $H_0: \theta \in \Theta_0$ vs $H_1: \theta \notin \Theta_0$, where $\Theta_0 = \{(p_1, p_2, p_3, p_4) \in \Theta \mid p_1 = p_2,\ p_3 = p_4\}$.
(a) The likelihood function is
$$ f_\theta^n(X^n) = \prod_{i=1}^4 \binom{n}{X_i} p_i^{X_i}(1 - p_i)^{n - X_i}. \qquad (4) $$
The LRT statistic corresponding to this hypothesis testing problem is
$$ \mathrm{LRT} = \frac{\sup_{\theta \in \Theta} f_\theta^n(X^n)}{\sup_{\theta \in \Theta_0} f_\theta^n(X^n)} = \prod_{i=1}^4 \left(\frac{2X_i}{X_{2[i/3]+1} + X_{2[i/3]+2}}\right)^{X_i}\left(\frac{2n - 2X_i}{2n - X_{2[i/3]+1} - X_{2[i/3]+2}}\right)^{n - X_i}, \qquad (5) $$
where $[i/3]$ stands for the integer part of $i/3$, e.g. $[1/3] = [2/3] = 0$ and $[3/3] = [4/3] = 1$, so the denominators involve $X_1 + X_2$ for $i = 1, 2$ and $X_3 + X_4$ for $i = 3, 4$.
The log-likelihood ratio is then the sum of the following two terms:
$$ \sum_{i=1}^2 \Big[X_i \log\{2X_i/(X_1 + X_2)\} + (n - X_i)\log\{(2n - 2X_i)/(2n - X_1 - X_2)\}\Big], \qquad (6) $$
$$ \sum_{j=3}^4 \Big[X_j \log\{2X_j/(X_3 + X_4)\} + (n - X_j)\log\{(2n - 2X_j)/(2n - X_3 - X_4)\}\Big]. \qquad (7) $$
These correspond to the logs of independent LRT statistics for $H_0: p_1 = p_2$ and $H_0: p_3 = p_4$, respectively.
(b) Here $h(\theta) = (p_1 - p_2, p_3 - p_4)'$ and
$$ H(\theta) = \begin{pmatrix} 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & -1 \end{pmatrix}, $$
and the Fisher information is $I(\theta) = \mathrm{diag}\big(n/\{p_1(1 - p_1)\},\, n/\{p_2(1 - p_2)\},\, n/\{p_3(1 - p_3)\},\, n/\{p_4(1 - p_4)\}\big)$. The Wald statistic is then
$$ W_n = \begin{pmatrix} \frac{X_1 - X_2}{n} & \frac{X_3 - X_4}{n} \end{pmatrix}\begin{pmatrix} \frac{n^3}{X_1(n - X_1) + X_2(n - X_2)} & 0 \\ 0 & \frac{n^3}{X_3(n - X_3) + X_4(n - X_4)} \end{pmatrix}\begin{pmatrix} \frac{X_1 - X_2}{n} \\ \frac{X_3 - X_4}{n} \end{pmatrix}. \qquad (8) $$
A simplification gives
$$ W_n = \frac{\{\sqrt{n}(X_1 - X_2)/n\}^2}{X_1(n - X_1)/n^2 + X_2(n - X_2)/n^2} + \frac{\{\sqrt{n}(X_3 - X_4)/n\}^2}{X_3(n - X_3)/n^2 + X_4(n - X_4)/n^2}. \qquad (9) $$
Then we can see that, under $H_0$, $W_n \to \chi_2^2$ in distribution.
(c) Under $H_0$, $\tilde\theta = (\tilde p_1, \tilde p_2, \tilde p_3, \tilde p_4)'$, where $\tilde p_1 = \tilde p_2 = (X_1 + X_2)/(2n)$ and $\tilde p_3 = \tilde p_4 = (X_3 + X_4)/(2n)$. The score test statistic is
$$ V_n = \frac{(X_1 - n\tilde p_1)^2}{n\tilde p_1(1 - \tilde p_1)} + \frac{(X_2 - n\tilde p_2)^2}{n\tilde p_2(1 - \tilde p_2)} + \frac{(X_3 - n\tilde p_3)^2}{n\tilde p_3(1 - \tilde p_3)} + \frac{(X_4 - n\tilde p_4)^2}{n\tilde p_4(1 - \tilde p_4)}. \qquad (10) $$
After some algebra, it becomes
$$ V_n = \frac{(X_1 - X_2)^2}{2n\tilde p_1(1 - \tilde p_1)} + \frac{(X_3 - X_4)^2}{2n\tilde p_3(1 - \tilde p_3)}, \qquad (11) $$
which converges in distribution to $\chi_2^2$ under $H_0$.
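For concreteness, here is a small Python sketch (my own addition) that evaluates the three statistics of this problem, $2\log(\mathrm{LRT})$, $W_n$, and $V_n$, for given counts $X_1, \dots, X_4$ out of $n$ trials each; under $H_0$ each should be referred to a $\chi_2^2$ distribution.

```python
import numpy as np

def test_stats(x, n):
    """x = (X1, X2, X3, X4), successes out of n trials each; returns (2 log LRT, W_n, V_n)."""
    x = np.asarray(x, dtype=float)
    p_hat = x / n                                      # unrestricted MLEs
    p_til = np.repeat([(x[0] + x[1]) / (2 * n),        # restricted MLEs under H0
                       (x[2] + x[3]) / (2 * n)], 2)

    def loglik(p):
        return np.sum(x * np.log(p) + (n - x) * np.log(1 - p))

    two_log_lrt = 2 * (loglik(p_hat) - loglik(p_til))  # = 2 * [(6) + (7)]
    wald = ((x[0] - x[1]) ** 2 / (x[0] * (n - x[0]) / n + x[1] * (n - x[1]) / n)
            + (x[2] - x[3]) ** 2 / (x[2] * (n - x[2]) / n + x[3] * (n - x[3]) / n))  # equivalent to (9)
    score = ((x[0] - x[1]) ** 2 / (2 * n * p_til[0] * (1 - p_til[0]))
             + (x[2] - x[3]) ** 2 / (2 * n * p_til[2] * (1 - p_til[2])))             # equation (11)
    return two_log_lrt, wald, score

print(test_stats([45, 52, 30, 38], n=100))   # hypothetical counts
```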
68. (a) Since $1/P_\theta(\{0\}) = 1 + e^\theta + e^{2\theta}$, a moment estimator of $\theta$ based on $n_0$ (the number of observations equal to $0$) is
$$ \hat\theta = \log\left\{\frac{-1 + \sqrt{4n/n_0 - 3}}{2}\right\}. \qquad (12) $$
Denote $g(t) = \log\{(-1 + \sqrt{4/t - 3})/2\}$, so that $\hat\theta = g(n_0/n)$. Since $\sqrt{n}(n_0/n - P_\theta(\{0\}))$ is asymptotically normal and $g'(P_\theta(\{0\})) \neq 0$, the delta method gives $\sqrt{n}(\hat\theta - \theta) = O_p(1)$; specifically, it is asymptotically normally distributed.
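A quick numerical sketch of (a) (my own illustration, with simulated data): compute $\hat\theta = g(n_0/n)$ together with a delta-method standard error $|g'(n_0/n)|\sqrt{(n_0/n)(1 - n_0/n)/n}$, using a numerical derivative so that no extra algebra is needed.

```python
import numpy as np

def g(t):
    """g(t) = log{(-1 + sqrt(4/t - 3)) / 2}, so that theta_hat = g(n0 / n)."""
    return np.log((-1 + np.sqrt(4 / t - 3)) / 2)

def moment_estimate(x):
    """Moment estimator of theta and its delta-method standard error."""
    n = len(x)
    p0_hat = np.mean(x == 0)                               # n0 / n
    h = 1e-6
    g_prime = (g(p0_hat + h) - g(p0_hat - h)) / (2 * h)    # numerical derivative of g
    se = abs(g_prime) * np.sqrt(p0_hat * (1 - p0_hat) / n)
    return g(p0_hat), se

# Simulated sample on {0, 1, 2}; in practice x would be the observed data.
rng = np.random.default_rng(0)
theta = 0.3
p = np.array([1.0, np.exp(theta), np.exp(2 * theta)])
x = rng.choice([0, 1, 2], size=500, p=p / p.sum())
print(moment_estimate(x))
```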
(b) Note that $\sum_i I\{X_i = 1\} + 2\sum_i I\{X_i = 2\} = \sum_i X_i$. The log-likelihood and its derivatives are
$$ L_n(\theta) = \theta\sum_i X_i - n\log(1 + e^\theta + e^{2\theta}), \qquad (13) $$
$$ L_n'(\theta) = \sum_i X_i - \frac{n e^\theta + 2n e^{2\theta}}{1 + e^\theta + e^{2\theta}}, \qquad (14) $$
$$ L_n''(\theta) = -\,\frac{(n e^\theta + 4n e^{2\theta})(1 + e^\theta + e^{2\theta}) - n(e^\theta + 2e^{2\theta})^2}{(1 + e^\theta + e^{2\theta})^2}. \qquad (15) $$
If we write
$$ L_n'(\hat\theta_n) = L_n'(\theta) + (\hat\theta_n - \theta)L_n''(\theta) + (\hat\theta_n - \theta)^2 L_n'''(\theta_n^*)/2, \qquad (16) $$
where $\theta_n^*$ is some point between $\hat\theta_n$ and $\theta$, then we have
$$ \sqrt{n}(\tilde\theta_n - \theta) = \frac{\sqrt{n}\,L_n'(\theta)}{-L_n''(\hat\theta_n)} + \sqrt{n}(\hat\theta_n - \theta)\left\{1 - \frac{L_n''(\theta)}{L_n''(\hat\theta_n)}\right\} - \frac{\sqrt{n}(\hat\theta_n - \theta)^2 L_n'''(\theta_n^*)}{2L_n''(\hat\theta_n)}. \qquad (17) $$
It can be seen that the last two terms of this expression are $o_p(1)$ under the conditions given. Moreover, it can be verified that $\sqrt{n}\{L_n'(\theta)/n\} \to N(0, I_1(\theta))$ in distribution and $-L_n''(\hat\theta_n)/n \to I_1(\theta)$ in probability; thus the result follows.
(c) The likelihood equation is
$$ (2 - \bar x)e^{2\theta} + (1 - \bar x)e^\theta - \bar x = 0. \qquad (18) $$
When $\bar x \in (0, 2)$, the solution is
$$ \hat\theta_n^{\,b} = \log\left\{\frac{\bar x - 1 + \sqrt{(1 - \bar x)^2 + 4\bar x(2 - \bar x)}}{2(2 - \bar x)}\right\}. \qquad (19) $$
To show that $\sqrt{n}(\hat\theta_n^{\,b} - \theta) \to N(0, 1/I_1(\theta))$ in distribution, in $P_\theta$ probability, it suffices to verify the conditions of Theorem 173 (which is easier than proving the result directly). Conditions 1-5 are easily verified to hold. Then $P_\theta(\hat\theta_n^{\,b}\ \text{is a root of the likelihood equation}) \ge P_\theta(0 < \bar X_n < 2) \to 1$ as $n \to \infty$. Moreover, $\bar x$ converges to $(e^\theta + 2e^{2\theta})/(1 + e^\theta + e^{2\theta})$ almost surely; plugging this limit into the expression above shows that $\hat\theta_n^{\,b}$ converges to $\theta$ almost surely.
(d) The Fisher information is $I_1(\theta) = \mathrm{Var}_\theta(X) = e^\theta(1 + 4e^\theta + e^{2\theta})/(1 + e^\theta + e^{2\theta})^2$. For any consistent estimator $\tilde\theta_n$ of $\theta$ (i.e. $\tilde\theta_n \to \theta$ in probability), $I_1(\tilde\theta_n) \to I_1(\theta)$ in probability. The observed Fisher information is $-L_n''(\theta)/n = e^\theta(1 + 4e^\theta + e^{2\theta})/(1 + e^\theta + e^{2\theta})^2$, which is exactly $I_1(\theta)$; thus the two approximations are the same. If we use $\hat\theta_n^{\,b}$, then
$$ I_1(\hat\theta_n^{\,b}) = \bar x(1 - \bar x) + \frac{\big\{\bar x - 1 + \sqrt{-3\bar x^2 + 6\bar x + 1}\big\}^2}{7 - 3\bar x + \sqrt{-3\bar x^2 + 6\bar x + 1}}. \qquad (20) $$
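The estimators in (a)-(c) can be compared numerically. The sketch below (my own addition, with simulated data) computes the moment estimator $\hat\theta_n$ from (a), the one-step Newton improvement $\tilde\theta_n$ started from it, and the closed-form MLE $\hat\theta_n^{\,b}$ from (19); all three should be close to each other and to the true $\theta$.

```python
import numpy as np

def loglik_derivs(theta, x):
    """L_n'(theta) and L_n''(theta), equations (14)-(15)."""
    n, s = len(x), x.sum()
    e1, e2 = np.exp(theta), np.exp(2 * theta)
    denom = 1 + e1 + e2
    lp = s - n * (e1 + 2 * e2) / denom
    lpp = -n * ((e1 + 4 * e2) * denom - (e1 + 2 * e2) ** 2) / denom ** 2
    return lp, lpp

rng = np.random.default_rng(1)
theta_true, n = 0.3, 500
p = np.array([1.0, np.exp(theta_true), np.exp(2 * theta_true)])
x = rng.choice([0, 1, 2], size=n, p=p / p.sum())

# (a) Moment estimator based on n0, equation (12).
n0 = np.sum(x == 0)
theta_mom = np.log((-1 + np.sqrt(4 * n / n0 - 3)) / 2)

# (b) One-step Newton improvement starting from the moment estimator.
lp, lpp = loglik_derivs(theta_mom, x)
theta_onestep = theta_mom - lp / lpp

# (c) Closed-form MLE from equation (19), valid when 0 < xbar < 2.
xbar = x.mean()
theta_mle = np.log((xbar - 1 + np.sqrt((1 - xbar) ** 2 + 4 * xbar * (2 - xbar))) / (2 * (2 - xbar)))

print(theta_mom, theta_onestep, theta_mle)
```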
69. Without loss of generality, assume that $h(x) = 1$ (otherwise consider $\tilde\mu(A) = \int_A h\,d\mu$). Then for this one-parameter density the R-N derivative has the form
$$ f_\eta(x) = K(\eta)\exp\{\eta T(x)\}, \quad \text{a.e. } (\mu), \qquad (21) $$
where $K(\eta)$ is the normalizing constant. The parameter space is $\Gamma = \big\{\eta \mid \int \exp\{\eta T(x)\}\,d\mu(x) < \infty\big\}$.
(a) For any $\eta_0$ in the interior of $\Gamma$, consider the open neighborhood $O = (\eta_0 - \epsilon, \eta_0 + \epsilon)$, with $\epsilon$ small enough that $(\eta_0 - 2\epsilon, \eta_0 + 2\epsilon) \subset \Gamma$. First of all, $f_\eta(x) > 0$ for all $x \in \mathcal{X}$ and $\eta \in O$. Second, by the DCT, $f_\eta$ has a first partial derivative in $\eta$ at $\eta_0$; in particular, differentiating both sides of $1 = \int f_\eta(x)\,d\mu(x)$ under the integral sign gives $0 = \int \partial f_\eta(x)/\partial\eta\,\big|_{\eta = \eta_0}\,d\mu(x)$. Thus $\mathcal{P}$ is FI regular at every $\eta_0$ in the interior of $\Gamma$. (The detail for applying the DCT is as follows. Let $g(\eta) = \int \exp\{\eta T(x)\}\,d\mu(x)$; then $g(\eta) < \infty$ for $\eta \in O$, and
$$ \frac{g(\eta + h) - g(\eta)}{h} = \int \exp\{\eta T(x)\}\cdot\frac{\exp\{h T(x)\} - 1}{h}\,d\mu(x). $$
By the mean value theorem, $|\{\exp(h T(x)) - 1\}/h| \le |T(x)|\,e^{|h|\,|T(x)|}$, and for $|h| \le \epsilon/2$ this is at most $(2/\epsilon)\{e^{\epsilon T(x)} + e^{-\epsilon T(x)}\}$; hence the integrand is dominated by the $\mu$-integrable function $(2/\epsilon)\{e^{(\eta + \epsilon)T(x)} + e^{(\eta - \epsilon)T(x)}\}$, and the DCT applies.)
(b) Because of (a), by the definition of Fisher information, the Fisher information in $X$ about $\eta$ evaluated at $\eta_0$ is
$$ I(\eta_0) = \int\left[\frac{\partial\{\log K(\eta) + \eta T(x)\}}{\partial\eta}\bigg|_{\eta = \eta_0}\right]^2 K(\eta_0)\exp\{\eta_0 T(x)\}\,d\mu(x). \qquad (22) $$
Since $0 = \int \partial f_\eta(x)/\partial\eta\,\big|_{\eta = \eta_0}\,d\mu(x)$, some algebra gives $K'(\eta_0)/K(\eta_0) = -E_{\eta_0}\{T(X)\}$. Thus the above expression simplifies to $I(\eta_0) = \mathrm{Var}_{\eta_0}\{T(X)\}$.
(c) We have
$$ I(\eta_0, \eta) = \int f_{\eta_0}(x)\log\frac{f_{\eta_0}(x)}{f_\eta(x)}\,d\mu(x) = \int\left[\log\frac{K(\eta_0)}{K(\eta)} + (\eta_0 - \eta)T(x)\right]f_{\eta_0}(x)\,d\mu(x), \qquad (23) $$
which is $\log\{K(\eta_0)/K(\eta)\} + (\eta_0 - \eta)E_{\eta_0}\{T(X)\}$.
(d) The likelihood equation is
$$ n\,\frac{K'(\eta)}{K(\eta)} + \sum_i T(X_i) = 0. \qquad (24) $$
(e) $E_{\eta_0}|\log f_\eta(X)| \le |\log K(\eta)| + |\eta|\,E_{\eta_0}|T(X)|$, which is finite because $T(X)$ has finite moments under any $\eta_0$ in the interior of $\Gamma$. Therefore all conditions of Theorem 170 are met.
(f) The general form of the "one-step Newton improvement" would be
$$ \tilde\eta - \frac{\dfrac{K'(\tilde\eta)}{K(\tilde\eta)} + \dfrac{1}{n}\sum_i T(X_i)}{\dfrac{K''(\tilde\eta)}{K(\tilde\eta)} - \left\{\dfrac{K'(\tilde\eta)}{K(\tilde\eta)}\right\}^2}. \qquad (25) $$
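As a concrete illustration of (25) (my own example, not part of the problem), take the exponential distribution in natural-parameter form: $T(x) = x$, $\eta < 0$, $K(\eta) = -\eta$, so $K'(\eta)/K(\eta) = 1/\eta$ and $K''(\eta)/K(\eta) = 0$. The sketch below applies the general formula with these ingredients; iterating it converges to the closed-form MLE $-1/\bar x$.

```python
import numpy as np

def one_step_newton(eta, t_bar, Kp_over_K, Kpp_over_K):
    """General one-step Newton improvement, equation (25).

    eta        : starting value of the natural parameter
    t_bar      : (1/n) * sum_i T(X_i)
    Kp_over_K  : function eta -> K'(eta)/K(eta)
    Kpp_over_K : function eta -> K''(eta)/K(eta)
    """
    num = Kp_over_K(eta) + t_bar
    den = Kpp_over_K(eta) - Kp_over_K(eta) ** 2
    return eta - num / den

# Exponential distribution in natural form: f_eta(x) = (-eta) * exp(eta * x), eta < 0.
rng = np.random.default_rng(2)
x = rng.exponential(scale=2.0, size=200)        # true eta = -0.5
eta = -np.log(2) / np.median(x)                 # crude median-based starting value
for _ in range(5):
    eta = one_step_newton(eta, x.mean(), lambda e: 1 / e, lambda e: 0.0)
print(eta, -1 / x.mean())                       # iterated Newton vs closed-form MLE
```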
(g) The proof is straightforward, applying the DCT with a similar approach as in part (a).
70. To argue this rigorously, define the set
$$ A_n = \{\hat\theta_n\ \text{is a root of the likelihood equation}\} \cap \{L_n(\theta)\ \text{is unimodal}\}; \qquad (26) $$
then $\lim_{n\to\infty} P_{\theta_0}(A_n) = 1$. For any random variable $T_n$ we can then claim that $|T_n I_{A_n} - T_n| = o_p(1)$, in $P_{\theta_0}$ probability, because for any $\epsilon > 0$,
$$ P_{\theta_0}\{|T_n I_{A_n} - T_n| > \epsilon\} \le P_{\theta_0}\{A_n^c\} \to 0, \quad \text{as } n \to \infty. \qquad (27) $$
Therefore we only need to consider $2 I_{A_n}\{L_n(\hat\theta_n) - \max_{\theta \le \theta_0} L_n(\theta)\}$, which on $A_n$ equals $2 I\{\theta_0 < \hat\theta_n\}\{L_n(\hat\theta_n) - L_n(\theta_0)\}$. Also, Theorem 173 says that $\sqrt{n}(\hat\theta_n - \theta_0) \to N(0, I_1^{-1}(\theta_0))$ in distribution, in $P_{\theta_0}$ probability. Further, under the conditions of Theorem 173, a Taylor expansion gives $L_n(\theta_0) - L_n(\hat\theta_n) = L_n'(\hat\theta_n)(\theta_0 - \hat\theta_n) + \frac{1}{2}L_n''(\theta_n^*)(\hat\theta_n - \theta_0)^2$, where $\theta_n^*$ is between $\hat\theta_n$ and $\theta_0$. Therefore, since $L_n'(\hat\theta_n) = 0$ on $A_n$, in $P_{\theta_0}$ probability,
$$ 2 I\{\theta_0 < \hat\theta_n\}\{L_n(\hat\theta_n) - L_n(\theta_0)\} = \frac{-L_n''(\theta_n^*)/n}{I_1(\theta_0)}\, I\big\{\sqrt{n I_1(\theta_0)}(\hat\theta_n - \theta_0) > 0\big\}\,\big\{\sqrt{n I_1(\theta_0)}(\hat\theta_n - \theta_0)\big\}^2 \to I\{Z > 0\}Z^2 $$
in distribution, where $Z \sim N(0, 1)$.
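The limiting law $I\{Z > 0\}Z^2$ is a 50-50 mixture of a point mass at $0$ and a $\chi^2_1$ distribution, and it is easy to check by simulation. The sketch below (my own illustration) uses the exponential model with $\hat\theta_n = 1/\bar X_n$, for which the likelihood is unimodal and the MLE is an interior root, and compares the simulated statistic with the mixture.

```python
import numpy as np

rng = np.random.default_rng(3)
theta0, n, reps = 1.0, 400, 5000

def loglik(theta, x):
    """Exponential log-likelihood L_n(theta) = n log(theta) - theta * sum(x)."""
    return len(x) * np.log(theta) - theta * x.sum()

stats = np.empty(reps)
for r in range(reps):
    x = rng.exponential(1 / theta0, size=n)
    theta_hat = 1 / x.mean()                  # MLE; the likelihood is unimodal in theta
    stats[r] = 2 * (theta_hat > theta0) * (loglik(theta_hat, x) - loglik(theta0, x))

# Reference sample from I{Z > 0} Z^2.
z = rng.standard_normal(reps)
ref = (z > 0) * z ** 2

print(np.mean(stats == 0), np.mean(ref == 0))                      # both near 0.5
print(np.quantile(stats, [0.75, 0.9, 0.95]), np.quantile(ref, [0.75, 0.9, 0.95]))
```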
71. By Slutsky's theorem, it suffices to prove that, in $P_{\theta_0}$ probability, $-n^{-1}L_n''(\hat\theta_n) \to I_1(\theta_0)$ in probability. Note that
$$ n^{-1}\big|L_n''(\hat\theta_n) - L_n''(\theta_0)\big| \le |\hat\theta_n - \theta_0|\;n^{-1}\sum_i M(X_i) = o_p(1). $$
Therefore $-n^{-1}L_n''(\hat\theta_n) = -n^{-1}\{L_n''(\hat\theta_n) - L_n''(\theta_0)\} - n^{-1}L_n''(\theta_0) = o_p(1) - n^{-1}L_n''(\theta_0) \to I_1(\theta_0)$ in probability.
72. Hypotheses 1-5 of Theorem 173 guarantee that $L_n'(T_n) = L_n'(\theta_0) + L_n''(\theta_n^*)(T_n - \theta_0)$, where $\theta_n^*$ is between $T_n$ and $\theta_0$. Plugging this into the one-step Newton improvement formula, we have
$$ \sqrt{n}(\bar\theta_n - \theta_0) = \sqrt{n}(T_n - \theta_0)\left\{1 - \frac{L_n''(\theta_n^*)}{L_n''(T_n)}\right\} - \frac{\sqrt{n}\,L_n'(\theta_0)}{L_n''(T_n)}. \qquad (28) $$
Notice that $T_n = \theta_0 + O_p(n^{-1/2})$ (a more general condition than $\sqrt{n}(T_n - \theta_0)$ converging in distribution), and hence $\theta_n^* = \theta_0 + O_p(n^{-1/2})$ as well. One can use the same argument as in Problem 71 to see that $n^{-1}L_n''(T_n) = n^{-1}L_n''(\theta_0) + o_p(1)$ and $n^{-1}L_n''(\theta_n^*) = n^{-1}L_n''(\theta_0) + o_p(1)$, which means that $\sqrt{n}(T_n - \theta_0)\{1 - L_n''(\theta_n^*)/L_n''(T_n)\} = o_p(1)$. The result then follows, by Slutsky's theorem, from the facts that $-n^{-1}L_n''(T_n) \to I_1(\theta_0)$ in probability and $\sqrt{n}\{n^{-1}L_n'(\theta_0) - 0\} \to N(0, I_1(\theta_0))$ in distribution, in $P_{\theta_0}$ probability.