Stat 643 Spring 2010
Assignment 05
Sufficiency and Related Notions
35. Let m̃ be the Lebesgue measure restricted to (1, ∞) and ν the point mass at 0, that is, ν({0}) = 1.
(a) Using the above notation, we have
\[
\begin{aligned}
P_\lambda^X(A) &= P(X' > 1,\, X' \in A) + I_A(0)\,P(X' \le 1) \\
&= \int_A \lambda \exp(-\lambda x)\,d\tilde m + I_A(0)\,(1 - e^{-\lambda}) \\
&= \int_A \lambda \exp(-\lambda x)\,d\tilde m + \int_A (1 - e^{-\lambda})\,d\nu \\
&= \int_A \lambda \exp(-\lambda x)\,I\{x \ne 0\}\,d(\tilde m + \nu) + \int_A (1 - e^{-\lambda})\,I\{x = 0\}\,d(\tilde m + \nu).
\end{aligned}
\]
Thus, with \(\mu = \tilde m + \nu\),
\[
\frac{dP_\lambda^X}{d\mu}(x) = \lambda \exp(-\lambda x)\,I\{x \ne 0\} + (1 - e^{-\lambda})\,I\{x = 0\}, \quad \text{a.e.}(\mu).
\]
(b) Let \(\delta_i = I\{X_i \ne 0\}\); one can see that \(\delta_i = I\{X_i' > 1\}\), a.e.\((\mu)\). The likelihood function is
\[
\lambda^{\sum_i \delta_i} \exp\Bigl\{-\lambda \sum_i \delta_i X_i\Bigr\}\,(1 - e^{-\lambda})^{\,n - \sum_i \delta_i},
\]
which implies that \(T(X) = \bigl(\sum_i \delta_i,\ \sum_i \delta_i X_i\bigr)\) is a sufficient statistic for this problem.

(c) Consider the ratio of \(\prod_{i=1}^n \frac{dP_\lambda^X}{d\mu}(X_i)\) and \(\prod_{i=1}^n \frac{dP_\lambda^X}{d\mu}(Y_i)\), which is
\[
(1 - e^{-\lambda})^{\sum_i \delta_i^Y - \sum_i \delta_i^X}\;
\lambda^{\sum_i \delta_i^X - \sum_i \delta_i^Y}\;
\exp\Bigl\{\lambda\Bigl(\sum_i \delta_i^Y Y_i - \sum_i \delta_i^X X_i\Bigr)\Bigr\},
\]
where \(\delta_i^X, \delta_i^Y\) are the corresponding \(\delta_i\)'s for \(X\) and \(Y\) respectively. Clearly, if this ratio does not depend on \(\lambda\), then \(T(X) = T(Y)\). Thus \(T(X)\) is also minimal sufficient by Theorem 55.
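To make the dominating measure \(\mu = \tilde m + \nu\) concrete, here is a small Python simulation sketch; the values of \(\lambda\) and \(n\) below are arbitrary illustration choices. It checks the atom \(P(X = 0) = 1 - e^{-\lambda}\) and evaluates the sufficient statistic \(T(X)\) from part (b).

```python
import numpy as np

# Sketch: X' ~ Exp(rate lambda); we observe X = X' if X' > 1, else X = 0.
# Check the atom P(X = 0) = 1 - exp(-lambda) and compute T(X) = (sum d_i, sum d_i X_i).
rng = np.random.default_rng(0)
lam, n = 1.5, 100_000                      # arbitrary illustration values
x_prime = rng.exponential(scale=1 / lam, size=n)
x = np.where(x_prime > 1, x_prime, 0.0)    # everything with X' <= 1 is recorded as 0

print("empirical P(X = 0):", np.mean(x == 0))
print("theoretical 1 - exp(-lam):", 1 - np.exp(-lam))

delta = (x != 0).astype(float)             # delta_i = I{X_i != 0} = I{X_i' > 1}
T = (delta.sum(), (delta * x).sum())       # sufficient statistic from part (b)
print("T(X) =", T)
```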
36. Note that I{W = X′} = I{X′ ≤ Z}.
(a) In order to get the R-N derivative, consider the sets \(A \times B\) where \(A, B \in \mathcal{B}\). Then
\[
\begin{aligned}
P(X \in A \times B) &= P\bigl(X \in (\{0\} \cap A) \times B\bigr) + P\bigl(X \in (\{1\} \cap A) \times B\bigr) \\
&= I_A(0)\,P(X \in \{0\} \times B) + I_A(1)\,P(X \in \{1\} \times B) \\
&= I_A(0)\,P(X' > Z,\ Z \in B) + I_A(1)\,P(X' \le Z,\ X' \in B) \\
&= I_A(0)\int_B \exp\{-(1+\lambda)z\}\,dm(z) + I_A(1)\int_B \lambda \exp\{-(1+\lambda)x'\}\,dm(x') \\
&= \int_{A \times B} \exp\{-(1+\lambda)t_2\}\bigl(I_{\{0\}}(t_1) + \lambda I_{\{1\}}(t_1)\bigr)\,d\mu(t_1, t_2).
\end{aligned}
\]
Thus
\[
\frac{dP_\lambda^X}{d\mu}(t_1, t_2) = \exp\{-(1+\lambda)t_2\}\bigl(I\{t_1 = 0\} + \lambda I\{t_1 = 1\}\bigr) I\{t_2 > 0\}
= \lambda^{I\{t_1 = 1\}} \exp\{-(1+\lambda)t_2\}\,I\{t_2 > 0\}, \quad \text{a.e.}(\mu).
\]
(b) Let \(\delta_i = I\{W_i = X_i'\}\); then the joint density is
\[
\lambda^{\sum_i \delta_i} \exp\Bigl\{-(1+\lambda)\sum_i w_i\Bigr\} I\{\min_i w_i > 0\}
= \exp\Bigl\{\log(\lambda)\sum_i \delta_i - (1+\lambda)\sum_i w_i\Bigr\} I\{\min_i w_i > 0\}.
\]
A minimal sufficient statistic here would be \(\bigl(\sum_i I\{W_i = X_i'\},\ \sum_i W_i\bigr)\). Claim 72 says that this is truly a minimal sufficient statistic: as one can see, this is an exponential family, and \(\Gamma_\lambda = \{(\log \lambda, \lambda) \mid \lambda > 0\}\) contains an open rectangle in \(\mathbb{R}^2\) (we assume the parameter space \(\Theta = \mathbb{R}_+\)).
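A small simulation sketch (with arbitrary illustration values of \(\lambda\) and \(n\)) can confirm what the factored R-N derivative says: \(\delta = I\{X' \le Z\}\) and \(W\) are independent, with \(P(\delta = 1) = \lambda/(1+\lambda)\) and \(W\) exponential with rate \(1 + \lambda\).

```python
import numpy as np

# Sketch: X' ~ Exp(rate lam), Z ~ Exp(rate 1) independent; observe (delta, W) with
# delta = I{X' <= Z}, W = min(X', Z).  The density derived above factors, so delta
# and W should be independent with P(delta = 1) = lam/(1+lam) and W ~ Exp(rate 1+lam).
rng = np.random.default_rng(1)
lam, n = 2.0, 200_000                          # arbitrary illustration values
x_prime = rng.exponential(scale=1 / lam, size=n)
z = rng.exponential(scale=1.0, size=n)
delta = (x_prime <= z).astype(float)
w = np.minimum(x_prime, z)

print("P(delta = 1):", delta.mean(), "vs", lam / (1 + lam))
print("E[W]:", w.mean(), "vs", 1 / (1 + lam))
print("corr(delta, W):", np.corrcoef(delta, w)[0, 1])   # should be near 0
```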
37. The likelihood function is
\[
\frac{dP_\theta}{d\mu}(X) = \Lambda(\theta, X)\, f_{\theta_0}(X),
\]
so by the factorization theorem, \(\Lambda(\theta, X)\) is sufficient. Moreover, if the ratio \(\frac{dP_\theta}{d\mu}(x) \big/ \frac{dP_\theta}{d\mu}(y)\) does not depend on \(\theta\), then evaluating it at \(\theta = \theta_0\) shows that the ratio equals \(f_{\theta_0}(x)/f_{\theta_0}(y)\), i.e.
\[
\frac{dP_\theta}{d\mu}(x) = \frac{dP_\theta}{d\mu}(y)\, \frac{f_{\theta_0}(x)}{f_{\theta_0}(y)} \quad \forall \theta
\;\Rightarrow\; \Lambda(\theta, x) = \Lambda(\theta, y).
\]
Thus \(\Lambda(\theta, X)\) is minimal sufficient by Theorem 55.
38. Denote δi = I{Xi > d}, i = 1, . . . , n.
(a) Because \(T(X)\) is sufficient for \(X_1, \ldots, X_n\) that are iid \(P_\theta\), by the factorization theorem (Theorem 50) there exist a \(\mathcal{B}\)-measurable function \(h\) and \(\mathcal{F}\)-measurable functions \(g_\theta\) such that \(\prod_i f_\theta(X_i) = g_\theta(T(X))\,h(X)\) for all \(\theta\). Then the joint likelihood function of the truncated model is
\[
\prod_i \delta_i\; g_\theta(T(X))\,h(X)/(1 - F(d))^n = I\{\min_i X_i > d\}\; g_\theta(T(X))\,h(X)/(1 - F(d))^n,
\]
thus \((T(X), \min_i X_i)\) is sufficient.
(b) No. As a matter of fact, Theorem 55 can be stated as an "if and only if". One can see that as long as \(\min(\min_i X_i, \min_i Y_i) > d\) and \(T(X) = T(Y)\), the ratio of \(f_{\theta,d}(X)\) to \(f_{\theta,d}(Y)\) is independent of \((\theta, d)\), by the sufficiency of \(T(X)\) for \(\theta\). But this does not force \(\min_i X_i = \min_i Y_i\).
39. This would be trivial using the factorization theorem, provided the R-N derivative exists with respect to some σ-finite measure.
40. Denote by \(\nu\) the counting measure restricted to \(\{0, 1, 2, \ldots\}\), and let \(\delta(x) = I_{\{0,1\}}(x)\). Then
\[
\frac{dP_\theta^X}{d\nu}(x) = \left[\frac{e^{-\theta_1}\theta_1^x}{x!}\right]^{2-\theta_2} \cdot \bigl[\theta_1^x (1 - \theta_1)^{1-x}\delta(x)\bigr]^{\theta_2 - 1}, \quad \text{a.e.}(\nu).
\]
After some algebra,
\[
f_{\theta_1,\theta_2}(x_1, \ldots, x_n) \equiv \frac{d(\times_{i=1}^n P_\theta^{X_i})}{d\nu^n}(x_1, \ldots, x_n)
= \frac{e^{-n\theta_1(2-\theta_2)}\,[(1-\theta_1)^{\theta_2-1}]^n}{\prod_i x_i!}
\cdot \theta_1^{\sum_i x_i}(1-\theta_1)^{(1-\theta_2)\sum_i x_i}
\cdot \Bigl[\prod_i \{\delta(x_i)\, x_i!\}\Bigr]^{\theta_2 - 1}.
\]
Thus \(\bigl(\sum_i X_i,\ \prod_i \{\delta(X_i)X_i!\}\bigr) = \bigl(\sum_i X_i,\ \prod_i \delta(X_i)\bigr)\) is sufficient by the factorization theorem (using \(0! = 1! = 1\) and \(\delta(x) = I_{\{0,1\}}(x)\)). Minimal sufficiency is straightforward from Theorem 55 by looking at the R-N derivative given above. Specifically, suppose \(f_{\theta_1,\theta_2}(x_1, \ldots, x_n)/f_{\theta_1,\theta_2}(y_1, \ldots, y_n)\) does not depend on \((\theta_1, \theta_2) \in (0, 1) \times \{1, 2\}\). Letting \(\theta_2 = 1\) we get \(\sum_i x_i = \sum_i y_i\), and we further obtain \(\prod_i \delta(x_i) = \prod_i \delta(y_i)\).
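As a quick numerical illustration of the sufficiency of \(T\), the Python sketch below (with arbitrary sample values) evaluates the joint density over a grid of \((\theta_1, \theta_2)\) for two samples with matching \(T\) and checks that the likelihood ratio is constant.

```python
import numpy as np
from math import factorial, exp

# Sketch: the family is Poisson(theta1) when theta2 = 1 and Bernoulli(theta1) when theta2 = 2.
# If T(x) = (sum x_i, prod delta(x_i)) matches for two samples, the likelihood ratio
# f(x)/f(y) should not depend on (theta1, theta2).
def density(xs, theta1, theta2):
    delta = lambda x: 1.0 if x in (0, 1) else 0.0
    out = 1.0
    for x in xs:
        poisson = exp(-theta1) * theta1 ** x / factorial(x)
        bernoulli = theta1 ** x * (1 - theta1) ** (1 - x) * delta(x)
        out *= poisson ** (2 - theta2) * bernoulli ** (theta2 - 1)
    return out

x = [0, 1, 1, 0]          # arbitrary samples with matching T: sum = 2, all values in {0, 1}
y = [1, 0, 0, 1]
for theta2 in (1, 2):
    for theta1 in (0.2, 0.5, 0.8):
        print(theta1, theta2, density(x, theta1, theta2) / density(y, theta1, theta2))
# every printed ratio should be the same constant (here 1.0)
```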
41. Denote by \(P_\theta^X\) the probability distribution of \(X\) here; then
\[
\frac{dP_\theta^X}{dm}(x) = \phi(x - \theta)^{I\{\theta \ne 0\}} \Bigl[\tfrac{1}{\sqrt 2}\,\phi(x/\sqrt 2)\Bigr]^{I\{\theta = 0\}}
= \frac{1}{\sqrt{1 + I\{\theta = 0\}}}\;\phi\!\left(\frac{x - \theta}{\sqrt{1 + I\{\theta = 0\}}}\right), \quad \text{a.e.}(m),
\]
where \(\phi\) is the standard normal density and \(m\) is Lebesgue measure. Then the R-N derivative of \(\times_{i=1}^n P_\theta^{X_i}\) with respect to \(m^n\) would be, after some algebra,
\[
\frac{d(\times_{i=1}^n P_\theta^{X_i})}{dm^n}
= (2\pi)^{-n/2}\, 2^{-nI\{\theta = 0\}/2} \exp\Bigl\{-\sum_i x_i^2/4 - nI\{\theta \ne 0\}\theta^2/2\Bigr\}
\cdot \exp\Bigl\{-I\{\theta \ne 0\}\sum_i x_i^2/4 + I\{\theta \ne 0\}\,\theta \sum_i x_i\Bigr\}, \quad \text{a.e.}(m^n).
\]
This implies that \(\{\times_{i=1}^n P_\theta^{X_i}\}_{\theta \in \mathbb{R}}\) is an exponential family. However, \(\Gamma_\theta = \{(I\{\theta \ne 0\},\, I\{\theta \ne 0\}\theta) \mid \theta \in \mathbb{R}\}\) does not contain any open rectangle in \(\mathbb{R}^2\), so we cannot utilize Claim 71. The good news is that Theorem 62 comes in handy. Consider \(\Theta_0 = \{\theta \in \mathbb{R} \mid \theta \ne 0\}\), and denote \(\Theta = \mathbb{R}\). It is easy to see that \(P_\theta(B) = 0\ \forall \theta \in \Theta_0 \Rightarrow P_\theta(B) = 0\ \forall \theta \in \Theta\) (normal distributions dominate each other). For the smaller family \(\{\times_i P_\theta^{X_i}\}_{\theta \in \Theta_0}\), it can be seen that, for \(\theta \in \Theta_0\),
\[
\frac{d(\times_{i=1}^n P_\theta^{X_i})}{dm^n}
= (2\pi)^{-n/2} \exp\Bigl\{-\sum_i x_i^2/2 - n\theta^2/2\Bigr\} \cdot \exp\Bigl\{\theta \sum_i x_i\Bigr\}, \quad \text{a.e.}(m^n),
\]
which implies that \(\sum_i X_i\) is minimal sufficient and complete for the smaller family, because \(\Gamma_\theta = (-\infty, 0) \cup (0, \infty)\) contains an open interval. By Theorem 62, it is then not surprising that \(\sum_i X_i\) is complete for \(\theta \in \Theta = \mathbb{R}\). However, \(\sum_i X_i\) is not sufficient for the larger family when \(\theta \in \Theta\) (actually, \((\sum_i X_i, \sum_i X_i^2)\) is sufficient for \(\theta \in \Theta\)).
To show that \(\sum_i X_i\) is not sufficient for \(\theta \in \Theta\), one can either use the factorization theorem or argue from first principles, i.e., show that the conditional distribution of \(X_1, \ldots, X_n\) given \(\bar X\) depends on \(\theta\). To argue via the factorization theorem, notice that if \(\sum_i X_i\) were sufficient for \(\theta \in \Theta\), then there would exist some \(g_\theta\) and \(h\) such that \(\frac{d(\times_{i=1}^n P_\theta^{X_i})}{dm^n}(x_1, \ldots, x_n) = g_\theta(\sum_i x_i)\, h(x_1, \ldots, x_n)\), which says that \(\sum_i x_i = \sum_i y_i\) implies that \(\frac{d(\times_{i=1}^n P_\theta^{X_i})}{dm^n}(x_1, \ldots, x_n) \big/ \frac{d(\times_{i=1}^n P_\theta^{X_i})}{dm^n}(y_1, \ldots, y_n)\) does not depend on \(\theta\). But from the R-N derivative expression given above, this ratio is \(\exp\{(1 + I\{\theta \ne 0\})(\sum_i y_i^2 - \sum_i x_i^2)/4\}\), which does depend on \(\theta\) whenever \(\sum_i x_i^2 \ne \sum_i y_i^2\). Thus \(\sum_i X_i\) is not sufficient for \(\theta \in \Theta\).
Alternatively, since \(X_1, \ldots, X_n\) are actually iid \(N(\theta, 1 + I\{\theta = 0\})\), the conditional distribution of \((X_1, \ldots, X_n)\) given \(\bar X\) is still multivariate normal, with mean \(\mathbf{1}\bar X\) and covariance \((1 + I\{\theta = 0\})(I_n - \mathbf{1}\mathbf{1}'/n)\), where \(\mathbf{1}\) is the n-dimensional vector with all elements equal to 1 and \(I_n\) is the \(n \times n\) identity matrix. The variance factor \(1 + I\{\theta = 0\}\) changes between \(\theta = 0\) and \(\theta \ne 0\), so this conditional distribution depends on \(\theta\) over \(\Theta\), although it does not depend on \(\theta\) when restricted to \(\Theta_0\). A good reference for this would be the MVN facts from Dr. Vardeman's Stat 542 page.
It is also possible to prove the completeness of \(\bar X\) directly from the definition of completeness. That would end up with a proof much like the proof of completeness of statistics for exponential families.
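The dependence of the conditional distribution on \(\theta\) can also be seen numerically through \(\mathrm{Var}(X_1 - \bar X) = (1 + I\{\theta = 0\})(1 - 1/n)\); the Python sketch below (arbitrary \(n\) and replicate count) checks this for \(\theta = 0\) and \(\theta = 1\).

```python
import numpy as np

# Sketch: X_1,...,X_n iid N(theta, 1 + I{theta = 0}).  The conditional law of X_1 given
# X-bar has variance (1 + I{theta = 0}) * (1 - 1/n), which equals Var(X_1 - X-bar),
# so it changes between theta = 0 and theta != 0 and Sum X_i cannot be sufficient on Theta = R.
rng = np.random.default_rng(2)
n, reps = 5, 100_000                                   # arbitrary illustration values

for theta in (0.0, 1.0):
    sd = np.sqrt(2.0) if theta == 0 else 1.0
    x = rng.normal(theta, sd, size=(reps, n))
    resid = x[:, 0] - x.mean(axis=1)                   # X_1 - X-bar
    print(f"theta={theta}: Var(X1 - Xbar) = {resid.var():.3f}, "
          f"theory = {(1 + (theta == 0)) * (1 - 1/n):.3f}")
```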
42. The likelihood function corresponding to \(X_1, \ldots, X_n\) is \(L(X; \mu, \sigma_1^2, \sigma_2^2) = \prod_{i=1}^n \bigl[\phi\{(X_{1i} - \mu)/\sigma_1\}\,\phi\{(X_{2i} - \mu)/\sigma_2\}\bigr]\), where \(\phi\) is the standard normal density here and \(X_i = (X_{1i}, X_{2i})'\), \(i = 1, \ldots, n\). By the factorization theorem (or from Claim 69), we can see that \((\bar X_1, \bar X_2, S_1^2, S_2^2)'\) is sufficient. From the fact that
\[
L(X; \mu, \sigma_1^2, \sigma_2^2)
= \exp\left\{ \frac{\sum_i Y_{1i}^2 - \sum_i X_{1i}^2}{2\sigma_1^2} + \mu\,\frac{\sum_i X_{1i} - \sum_i Y_{1i}}{\sigma_1^2}
+ \frac{\sum_i Y_{2i}^2 - \sum_i X_{2i}^2}{2\sigma_2^2} + \mu\,\frac{\sum_i X_{2i} - \sum_i Y_{2i}}{\sigma_2^2} \right\} L(Y; \mu, \sigma_1^2, \sigma_2^2),
\]
utilizing Theorem 55 we can see that \(T(X) = (\bar X_1, \bar X_2, S_1^2, S_2^2)'\) is minimal sufficient. To show that this statistic is not complete, we just need to find a nontrivial bounded function of \(T\) that is an unbiased estimator of 0. Let \(f(u) = u\, I\{|u| \le 1\}\) and consider \(h(x_1, x_2, x_3, x_4) = f\{(x_1 - x_2)/\sqrt{x_3 + x_4}\}\). Then \(h\) is bounded and \(E_\theta[h\{T(X)\}] = 0\) for all \(\theta = (\mu, \sigma_1^2, \sigma_2^2)\), because \(h\{T(X)\}\) is essentially a two-sample t-type statistic whose distribution is symmetric about 0, the two groups having the same population mean. However, \(h \circ T\) is not identically 0, so \(T\) is not complete. One candidate for a first-order ancillary statistic would be \(\bar X_1 - \bar X_2\).
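A Monte Carlo sketch (with arbitrary parameter values) can illustrate the non-completeness argument: the average of \(h\{T(X)\}\) is essentially zero while \(h\{T(X)\}\) itself is not identically zero.

```python
import numpy as np

# Sketch: a Monte Carlo check that h(T) = f{(Xbar1 - Xbar2)/sqrt(S1^2 + S2^2)}, with
# f(u) = u * I{|u| <= 1}, is a bounded, non-degenerate unbiased estimator of 0,
# illustrating that T = (Xbar1, Xbar2, S1^2, S2^2) is not complete.
rng = np.random.default_rng(3)
mu, s1, s2, n, reps = 1.0, 1.0, 2.0, 10, 200_000       # arbitrary illustration values

x1 = rng.normal(mu, s1, size=(reps, n))
x2 = rng.normal(mu, s2, size=(reps, n))
u = (x1.mean(axis=1) - x2.mean(axis=1)) / np.sqrt(x1.var(axis=1, ddof=1) + x2.var(axis=1, ddof=1))
h = u * (np.abs(u) <= 1)

print("mean of h(T):", h.mean())          # close to 0 for every (mu, s1, s2)
print("P(h(T) != 0):", np.mean(h != 0))   # but h(T) is not identically 0
```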
43. From the factorization theorem, after checking the likelihood function, \(T(X) = (\sum_i X_i, \sum_i X_i^2)\) would be a sufficient statistic. Also consider the likelihood function \(L(X; \gamma)\): we have \(L(X; \gamma) = k(X, Y)\, L(Y; \gamma)\), where \(k(X, Y) = \exp\{(\sum_i Y_i^2 - \sum_i X_i^2)/(2\gamma^2) + (\sum_i X_i - \sum_i Y_i)/\gamma\}\). If this ratio is independent of \(\gamma\), then \(T(X) = T(Y)\). Thus \(T(X)\) is minimal sufficient.
Models and Measures of Statistical Information
46.
\[
\begin{aligned}
I_X(\theta_0)
&= E_{\theta_0}\!\left[\left(\frac{\partial \log f_\theta(T, S)}{\partial \theta}\Big|_{\theta = \theta_0}\right)^{\!2}\right]
= E_{\theta_0}\!\left[\left(\frac{f_{\theta_0}'(T, S)}{f_{\theta_0}(T, S)}\right)^{\!2}\right]
= E_{\theta_0}\!\left[E_{\theta_0}\!\left\{\left(\frac{f_{\theta_0}'(T, S)}{f_{\theta_0}(T, S)}\right)^{\!2} \,\middle|\, T\right\}\right] \\
&\ge \int_{\mathcal{T}} \left( E_{\theta_0}\!\left[\frac{f_{\theta_0}'(T, S)}{f_{\theta_0}(T, S)} \,\middle|\, T = t\right] \right)^{\!2} g_{\theta_0}(t)\,d\nu(t) \\
&= \int_{\mathcal{T}} \left( \int_{\mathcal{S}} \frac{f_{\theta_0}'(t, s)}{f_{\theta_0}(t, s)}\, f_{\theta_0}(s \mid t)\,d\gamma(s) \right)^{\!2} g_{\theta_0}(t)\,d\nu(t)
= \int_{\mathcal{T}} \left( \int_{\mathcal{S}} \frac{f_{\theta_0}'(t, s)}{f_{\theta_0}(t, s)} \cdot \frac{f_{\theta_0}(t, s)}{g_{\theta_0}(t)}\,d\gamma(s) \right)^{\!2} g_{\theta_0}(t)\,d\nu(t) \\
&= \int_{\mathcal{T}} \left( \frac{\int_{\mathcal{S}} f_{\theta_0}'(t, s)\,d\gamma(s)}{g_{\theta_0}(t)} \right)^{\!2} g_{\theta_0}(t)\,d\nu(t)
= \int_{\mathcal{T}} \frac{g_{\theta_0}'(t)^2}{g_{\theta_0}(t)^2}\, g_{\theta_0}(t)\,d\nu(t)
= I_T(\theta_0),
\end{aligned}
\]
where the inequality is (conditional) Jensen's inequality applied to the conditional second moment.
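For a concrete sanity check of the inequality, consider the (hypothetical) example \(X = (T, S)\) with \(T, S\) iid \(N(\theta, 1)\), so \(I_X(\theta_0) = 2\) and \(I_T(\theta_0) = 1\). The Python sketch below estimates both information quantities as variances of the corresponding scores; the values of \(\theta_0\) and the replicate count are arbitrary.

```python
import numpy as np

# Sketch: estimate Fisher information as the variance of the score, for a concrete case
# where X = (T, S) with T, S iid N(theta, 1).  Then I_X(theta) = 2 and I_T(theta) = 1,
# illustrating the inequality I_X(theta_0) >= I_T(theta_0) derived above.
rng = np.random.default_rng(4)
theta0, reps = 0.7, 500_000                       # arbitrary illustration values
t = rng.normal(theta0, 1.0, size=reps)
s = rng.normal(theta0, 1.0, size=reps)

score_x = (t - theta0) + (s - theta0)             # d/d theta of log f_theta(t, s) at theta0
score_t = t - theta0                              # d/d theta of log g_theta(t) at theta0
print("I_X estimate:", score_x.var(), "(theory 2)")
print("I_T estimate:", score_t.var(), "(theory 1)")
```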
47. Because \(P\) and \(Q\) are discrete measures, we can consider the counting measure \(\nu\) restricted to a countable set \(\mathcal{X}\); both \(P\) and \(Q\) are dominated by \(\nu\), with corresponding R-N derivatives \(p\) and \(q\). For a statistic \(T = T(X)\), let \(\mathcal{T}\) be its range. Then
\[
\begin{aligned}
I_X(P, Q) &= \int_{\mathcal{X}} p(x) \log\frac{p(x)}{q(x)}\,d\nu(x)
= \sum_{x \in \mathcal{X}} p(x) \log\frac{p(x)}{q(x)}
= \sum_{t \in \mathcal{T}} \sum_{x : T(x) = t} p(x) \log\frac{p(x)}{q(x)} \\
&= \sum_{t \in \mathcal{T}} \left\{\sum_{x : T(x) = t} p(x)\right\}
\sum_{x : T(x) = t} \frac{p(x)}{\sum_{x' : T(x') = t} p(x')} \left(-\log\frac{q(x)}{p(x)}\right) \\
&\ge \sum_{t \in \mathcal{T}} \left\{\sum_{x : T(x) = t} p(x)\right\} \log\frac{\sum_{x : T(x) = t} p(x)}{\sum_{x : T(x) = t} q(x)}
= I_{T(X)}(P^T, Q^T),
\end{aligned}
\]
where the inequality is Jensen's inequality applied to the convex function \(-\log\).
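A tiny numerical example (the distributions and the statistic below are arbitrary choices) illustrates the inequality \(I_X(P, Q) \ge I_{T(X)}(P^T, Q^T)\).

```python
import numpy as np

# Sketch: numerical check of I_X(P, Q) >= I_T(P^T, Q^T) for a small discrete example.
# X takes values 0..3; T(x) = x mod 2 collapses them into two groups.
p = np.array([0.1, 0.2, 0.3, 0.4])          # arbitrary distribution P
q = np.array([0.25, 0.25, 0.25, 0.25])      # arbitrary distribution Q
T = np.array([0, 1, 0, 1])                  # statistic T(x) = x mod 2

kl = lambda a, b: np.sum(a * np.log(a / b))
pT = np.array([p[T == t].sum() for t in (0, 1)])
qT = np.array([q[T == t].sum() for t in (0, 1)])

print("I_X(P, Q)      =", kl(p, q))
print("I_T(P^T, Q^T)  =", kl(pT, qT))       # never larger than I_X(P, Q)
```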
48. Consider \(\{P_{\theta_1}, P_{\theta_2}\}\) and let \(P_{\theta_1} = P\), \(P_{\theta_2} = Q\). Both are dominated by \(\mu = \nu \times \gamma\) with R-N derivatives \(f_{\theta_1}, f_{\theta_2}\). Then
\[
\begin{aligned}
I_X(P, Q) &= E_{\theta_1} \log\frac{f_{\theta_1}(T, S)}{f_{\theta_2}(T, S)}
= E_{\theta_1}\!\left[ E_{\theta_1}\!\left\{ -\log\frac{f_{\theta_2}(T, S)}{f_{\theta_1}(T, S)} \,\middle|\, T \right\} \right] \\
&\ge E_{\theta_1}\!\left[ -\log E_{\theta_1}\!\left\{ \frac{f_{\theta_2}(T, S)}{f_{\theta_1}(T, S)} \,\middle|\, T \right\} \right]
= E_{\theta_1}\!\left[ -\log \int_{\mathcal{S}} \frac{f_{\theta_2}(T, s)}{f_{\theta_1}(T, s)} \cdot \frac{f_{\theta_1}(T, s)}{g_{\theta_1}(T)}\,d\gamma(s) \right] \\
&= E_{\theta_1} \log\frac{g_{\theta_1}(T)}{g_{\theta_2}(T)}
= I_{T(X)}(P^T, Q^T),
\end{aligned}
\]
the inequality being conditional Jensen's inequality for the convex function \(-\log\).
49.
\[
g_\beta(x) = \min(x, 1)\, f(x \mid \beta)\, I\{x > 0\} + I\{x = 0\} \int_{\mathbb{R}} (1 - \min(y, 1))\, f(y \mid \beta)\,dy,
\]
where \(f(x \mid \beta) = \exp\{-x/\beta\}/\beta \cdot I\{x > 0\}\). Denote \(h(\beta) = \int (1 - \min(y, 1))\, f(y \mid \beta)\,dy = 1 - \beta + \beta e^{-1/\beta}\). The log of the density becomes
\[
\log g_\beta(x) = I\{x > 0\}\bigl\{\log(\min(x, 1)) + \log f(x \mid \beta)\bigr\} + I\{x = 0\}\log h(\beta),
\]
and then
\[
E_{\beta_0}\!\left\{ \left(\frac{\partial \log g_\beta(x)}{\partial \beta}\Big|_{\beta = \beta_0}\right)^{\!2} \right\}
= \int \left(\frac{\partial \log g_\beta(x)}{\partial \beta}\Big|_{\beta = \beta_0}\right)^{\!2} g_{\beta_0}(x)\, d(\delta_0 + \lambda)(x).
\]
More specifically, it is not hard to see that the result has the form
\[
I_Y(\beta_0) + \int (-1 + \min(x, 1))\, f(x \mid \beta_0)\left(\frac{\partial \log f(x \mid \beta)}{\partial \beta}\Big|_{\beta = \beta_0}\right)^{\!2} dx
+ \left(\frac{\partial \log h(\beta)}{\partial \beta}\Big|_{\beta = \beta_0}\right)^{\!2} h(\beta_0),
\]
where
\[
I_Y(\beta_0) = \frac{1}{\beta_0^2}, \qquad
\left(\frac{\partial \log h(\beta)}{\partial \beta}\Big|_{\beta = \beta_0}\right)^{\!2} h(\beta_0)
= \frac{e^{-2/\beta_0}\,(\beta_0 e^{1/\beta_0} - \beta_0 - 1)^2}{(1 - \beta_0 + \beta_0 e^{-1/\beta_0})\,\beta_0^2},
\]
\[
\int (-1 + \min(x, 1))\, f(x \mid \beta_0)\left(\frac{\partial \log f(x \mid \beta)}{\partial \beta}\Big|_{\beta = \beta_0}\right)^{\!2} dx
= \frac{3\beta_0 - 1}{\beta_0^2} - \frac{1 + 2\beta_0 + 3\beta_0^2}{\beta_0^3}\, e^{-1/\beta_0}.
\]
Intuitively, if \(P(B) = 1\), then \(X\) and \(Y\) are the same and should carry the same Fisher information; that means \(e^{-1/\beta_0} = P(Y \ge 1) \approx 1\), which implies \(\beta_0 \to \infty\). This can also be derived directly from the above results.
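The expression above can be checked by Monte Carlo: simulate the coarsened observation, average the squared score at \(\beta_0\), and compare with the closed form. The sketch below does this; the value of \(\beta_0\) and the replicate count are arbitrary.

```python
import numpy as np

# Sketch: Monte Carlo check of the information decomposition above.  Y ~ Exp(mean beta0);
# with probability min(Y, 1) we observe Y, otherwise we record 0.  The Fisher information
# of the observation is E[(d/d beta log g_beta(X))^2] evaluated at beta0.
rng = np.random.default_rng(5)
beta0, reps = 1.0, 1_000_000                       # arbitrary illustration values

y = rng.exponential(scale=beta0, size=reps)
keep = rng.random(reps) < np.minimum(y, 1.0)
x = np.where(keep, y, 0.0)

h = 1 - beta0 + beta0 * np.exp(-1 / beta0)
h_prime = -1 + np.exp(-1 / beta0) * (1 + 1 / beta0)
score = np.where(x > 0, (x - beta0) / beta0**2, h_prime / h)
print("Monte Carlo I(beta0):", np.mean(score**2))

closed_form = (1 / beta0**2
               + (3 * beta0 - 1) / beta0**2
               - (1 + 2 * beta0 + 3 * beta0**2) / beta0**3 * np.exp(-1 / beta0)
               + h_prime**2 / h)
print("closed form        :", closed_form)
```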
50. \(\theta = (p_Y, p_Z)\). Here \(\log f_\theta(y, z) = y \log p_Y + (n - y)\log(1 - p_Y) + z \log p_Z + (y - z)\log(1 - p_Z) + \log\binom{y}{z} + \log\binom{n}{y}\). One sees that \(\partial^2 \log f_\theta(y, z)/\partial p_Y \partial p_Z = \partial^2 \log f_\theta(y, z)/\partial p_Z \partial p_Y = 0\), \(\partial^2 \log f_\theta(y, z)/\partial p_Y^2 = -y/p_Y^2 - (n - y)/(1 - p_Y)^2\), and \(\partial^2 \log f_\theta(y, z)/\partial p_Z^2 = -z/p_Z^2 - (y - z)/(1 - p_Z)^2\). Notice further that \(E(Y) = np_Y\) and \(E(Z) = E\{E(Z \mid Y)\} = np_Y p_Z\). Denote \(\theta_0 = (p_Y^*, p_Z^*)\); then
\[
I_X(\theta_0) = E_{\theta_0}\!\left[-\frac{\partial^2}{\partial\theta\,\partial\theta'}\log f_\theta(X)\Big|_{\theta = \theta_0}\right]
= \begin{pmatrix} \dfrac{n}{p_Y^*(1 - p_Y^*)} & 0 \\[1.5ex] 0 & \dfrac{n\,p_Y^*}{p_Z^*(1 - p_Z^*)} \end{pmatrix}.
\]
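A Monte Carlo sketch (with arbitrary values of \(n\), \(p_Y^*\), \(p_Z^*\)) can verify this information matrix by averaging outer products of the score vector.

```python
import numpy as np

# Sketch: Monte Carlo check of the information matrix for Y ~ Bin(n, pY), Z | Y ~ Bin(Y, pZ),
# estimating I_X(theta0) as the average outer product of the score vector.
rng = np.random.default_rng(6)
n, pY, pZ, reps = 8, 0.4, 0.7, 500_000            # arbitrary illustration values

y = rng.binomial(n, pY, size=reps)
z = rng.binomial(y, pZ)
score = np.column_stack([y / pY - (n - y) / (1 - pY),       # d log f / d pY
                         z / pZ - (y - z) / (1 - pZ)])      # d log f / d pZ
print("Monte Carlo I_X:\n", score.T @ score / reps)
print("theory: diag(", n / (pY * (1 - pY)), ",", n * pY / (pZ * (1 - pZ)), ")")
```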