Stat 643 Spring 2010 Assignment 05

Sufficiency and Related Notions

35. Let $\tilde m$ be Lebesgue measure restricted to $(1,\infty)$ and $\nu$ the point mass at 0, that is, $\nu(\{0\})=1$; set $\mu=\tilde m+\nu$.

(a) With the above notation,
\begin{align*}
P_\lambda^X(A) &= P(X'>1,\ X'\in A) + I_A(0)P(X'\le 1)\\
&= \int_A \lambda e^{-\lambda x}\,d\tilde m + I_A(0)(1-e^{-\lambda})\\
&= \int_A \lambda e^{-\lambda x}\,d\tilde m + \int_A (1-e^{-\lambda})\,d\nu\\
&= \int_A \lambda e^{-\lambda x} I\{x\ne 0\}\,d(\tilde m+\nu) + \int_A (1-e^{-\lambda}) I\{x=0\}\,d(\tilde m+\nu).
\end{align*}
Thus
$$\frac{dP_\lambda^X}{d\mu}(x) = \lambda e^{-\lambda x} I\{x\ne 0\} + (1-e^{-\lambda}) I\{x=0\},\quad \text{a.e.}(\mu).$$

(b) Let $\delta_i = I\{X_i\ne 0\}$; one can see that $\delta_i = I\{X_i'>1\}$ a.e.$(\mu)$. The likelihood function is
$$\lambda^{\sum_i\delta_i}\exp\Big\{-\lambda\sum_i\delta_i X_i\Big\}\,(1-e^{-\lambda})^{\,n-\sum_i\delta_i},$$
which implies that $T(X) = \big(\sum_i\delta_i,\ \sum_i\delta_i X_i\big)$ is a sufficient statistic for this problem.

(c) Consider the ratio of $\prod_{i=1}^n \frac{dP_\lambda^X}{d\mu}(X_i)$ to $\prod_{i=1}^n \frac{dP_\lambda^X}{d\mu}(Y_i)$, which is
$$(1-e^{-\lambda})^{\sum_i\delta_i^Y-\sum_i\delta_i^X}\,\lambda^{\sum_i\delta_i^X-\sum_i\delta_i^Y}\exp\Big\{\lambda\Big(\sum_i\delta_i^Y Y_i-\sum_i\delta_i^X X_i\Big)\Big\},$$
where $\delta_i^X,\delta_i^Y$ are the $\delta_i$'s corresponding to $X$ and $Y$ respectively. Clearly, this ratio being independent of $\lambda$ implies $T(X)=T(Y)$. Thus $T(X)$ is also minimal sufficient by Theorem 55.

36. Note that $I\{W=X'\} = I\{X'\le Z\}$.

(a) To get the R-N derivative, consider sets $A\times B$ with $A,B\in\mathcal B$, and let $\mu$ denote the product of counting measure on $\{0,1\}$ and Lebesgue measure $m$. Then
\begin{align*}
P(X\in A\times B) &= P(X\in(\{0\}\cap A)\times B) + P(X\in(\{1\}\cap A)\times B)\\
&= I_A(0)P(X\in\{0\}\times B) + I_A(1)P(X\in\{1\}\times B)\\
&= I_A(0)P(X'>Z,\ Z\in B) + I_A(1)P(X'\le Z,\ X'\in B)\\
&= I_A(0)\int_B e^{-(1+\lambda)z}\,dm(z) + I_A(1)\int_B \lambda e^{-(1+\lambda)x'}\,dm(x')\\
&= \int_{A\times B} e^{-(1+\lambda)t_2}\big(I_{\{0\}}(t_1)+\lambda I_{\{1\}}(t_1)\big)\,d\mu(t_1,t_2).
\end{align*}
Thus
$$\frac{dP_\lambda^X}{d\mu}(t_1,t_2) = e^{-(1+\lambda)t_2}\big(I\{t_1=0\}+\lambda I\{t_1=1\}\big)I\{t_2>0\} = \lambda^{I\{t_1=1\}}\, e^{-(1+\lambda)t_2}\, I\{t_2>0\},\quad\text{a.e.}(\mu).$$

(b) Let $\delta_i=I\{w_i=x_i'\}$; then the joint density is
$$\lambda^{\sum_i\delta_i}\exp\Big\{-(1+\lambda)\sum_i w_i\Big\}I\{\min_i w_i>0\} = \exp\Big\{\log(\lambda)\sum_i\delta_i-(1+\lambda)\sum_i w_i\Big\}I\{\min_i w_i>0\}.$$
A minimal sufficient statistic here would be $\big(\sum_i I\{W_i=X_i'\},\ \sum_i W_i\big)$. Claim 72 says that this is truly a minimal sufficient statistic.
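As a quick numerical sanity check of the R-N derivative in 36(a) (a simulation sketch; the rate $\lambda=2$ and the helper names are illustrative choices, not part of the assignment): integrating the density over $t_2$ gives $P(W=X')=P(t_1=1)=\lambda/(1+\lambda)$, which a seeded Monte Carlo run reproduces.

```python
import math
import random

# Monte Carlo check: with X' ~ Exp(rate lambda) and Z ~ Exp(rate 1) independent,
# integrating lambda * exp(-(1+lambda) t2) over t2 > 0 gives
# P(W = X') = P(X' <= Z) = lambda / (1 + lambda).
rng = random.Random(0)
lam = 2.0          # illustrative rate, not from the assignment
n = 200_000
hits = 0           # count of the event {X' <= Z}, i.e. t1 = 1
for _ in range(n):
    xp = rng.expovariate(lam)   # X'
    z = rng.expovariate(1.0)    # Z
    hits += (xp <= z)
p1_hat = hits / n
p1_theory = lam / (1 + lam)
print(round(p1_hat, 2), round(p1_theory, 2))
```

The complementary mass $P(W=Z)=1/(1+\lambda)$ comes out the same way from the $t_1=0$ piece of the density.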
As one can see, this is an exponential family, and $\Gamma_\lambda = \{(\log\lambda,\,\lambda)\mid\lambda>0\}$ is a curve not contained in any line of $\mathbb R^2$ (we assume the parameter space $\Theta=\mathbb R^+$).

37. The likelihood function is
$$\frac{dP_\theta}{d\mu}(X)=\Lambda(\theta,X)f_{\theta_0}(X),$$
so by the factorization theorem, $\Lambda(\theta,X)$ is sufficient. Moreover, if
$$\frac{dP_\theta}{d\mu}(x)\Big/\frac{dP_\theta}{d\mu}(y)$$
does not depend on $\theta$, then taking $\theta=\theta_0$ (where $\Lambda(\theta_0,\cdot)\equiv1$) shows the ratio equals $f_{\theta_0}(x)/f_{\theta_0}(y)$, and hence $\Lambda(\theta,x)=\Lambda(\theta,y)$ for all $\theta$. Thus $\Lambda(\theta,X)$ is minimal sufficient by Theorem 55.

38. Denote $\delta_i=I\{X_i>d\}$, $i=1,\dots,n$.

(a) Because $T(X)$ is sufficient for $X_1,\dots,X_n$ iid $P_\theta$, by the factorization theorem (Theorem 50) there exist a $\mathcal B$-measurable function $h$ and $\mathcal F$-measurable functions $g_\theta$ such that for all $\theta$, $\prod_i f_\theta(X_i)=g_\theta(T(X))h(X)$. The joint likelihood of the truncated model is then
$$\prod_i\delta_i\cdot g_\theta(T(X))h(X)\big/(1-F_\theta(d))^n = I\{\min_i X_i>d\}\,g_\theta(T(X))h(X)\big/(1-F_\theta(d))^n,$$
thus $(T(X),\min_i X_i)$ is sufficient.

(b) No. As a matter of fact, the criterion in Theorem 55 can be made an if-and-only-if statement. One can see that as long as $\min(\min_i X_i,\min_i Y_i)>d$ and $T(X)=T(Y)$, the ratio of $f_{\theta,d}(X)$ to $f_{\theta,d}(Y)$ is independent of $(\theta,d)$, due to the minimal sufficiency of $T(X)$ for $\theta$. But this does not force $\min_i X_i=\min_i Y_i$.

39. This would be trivial by the factorization theorem, provided the R-N derivative exists with respect to some $\sigma$-finite measure.

40. Denote by $\nu$ the counting measure restricted to $\{0,1,2,\dots\}$, and let $\delta(x)=I_{\{0,1\}}(x)$. Then
$$\frac{dP_\theta^X}{d\nu}(x) = \Big(\frac{e^{-\theta_1}\theta_1^x}{x!}\Big)^{2-\theta_2}\cdot\big(\theta_1^x(1-\theta_1)^{1-x}\delta(x)\big)^{\theta_2-1},\quad\text{a.e.}(\nu).$$
After some algebra,
$$\frac{d(\times_{i=1}^n P_\theta^{X_i})}{d\nu^n}(x_1,\dots,x_n) = f_{\theta_1,\theta_2}(x_1,\dots,x_n) \equiv \frac{e^{-n\theta_1(2-\theta_2)}\,\big[(1-\theta_1)^{\theta_2-1}\big]^n}{\prod_i x_i!}\cdot\Big[\frac{\theta_1}{(1-\theta_1)^{\theta_2-1}}\Big]^{\sum_i x_i}\cdot\Big[\prod_i\{\delta(x_i)\,x_i!\}\Big]^{\theta_2-1}.$$
Thus $\big(\sum_i X_i,\ \prod_i\{\delta(X_i)X_i!\}\big) = \big(\sum_i X_i,\ \prod_i\delta(X_i)\big)$ is sufficient by the factorization theorem ($0!=1!=1$ and $\delta(x)=I_{\{0,1\}}(x)$). Minimal sufficiency is straightforward from Theorem 55 by looking at the R-N derivative given above. Specifically, suppose $f_{\theta_1,\theta_2}(x_1,\dots,x_n)/f_{\theta_1,\theta_2}(y_1,\dots,y_n)$ does not depend on $(\theta_1,\theta_2)\in(0,1)\times\{1,2\}$. Letting $\theta_2=1$, we get $\sum_i x_i=\sum_i y_i$, and comparing with $\theta_2=2$ we further obtain $\prod_i\delta(x_i)=\prod_i\delta(y_i)$.

41. Denote by $P_\theta^X$ the probability distribution of $X$ here; then
$$\frac{dP_\theta^X}{dm}(x) = \phi(x-\theta)^{I\{\theta\ne0\}}\Big(\tfrac{1}{\sqrt2}\phi(x/\sqrt2)\Big)^{I\{\theta=0\}} = \frac{1}{\sqrt{1+I\{\theta=0\}}}\,\phi\Big(\frac{x-\theta}{\sqrt{1+I\{\theta=0\}}}\Big),\quad\text{a.e.}(m),$$
where $\phi$ is the standard normal density and $m$ is Lebesgue measure. The R-N derivative of $\times_{i=1}^n P_\theta^{X_i}$ with respect to $m^n$ is then, after some algebra,
$$\frac{d(\times_{i=1}^n P_\theta^{X_i})}{dm^n} = \frac{(2\pi)^{-n/2}}{(1+I\{\theta=0\})^{n/2}}\exp\Big\{-\sum_i x_i^2/4 - nI\{\theta\ne0\}\theta^2/2\Big\}\cdot\exp\Big\{-I\{\theta\ne0\}\sum_i x_i^2/4 + I\{\theta\ne0\}\,\theta\sum_i x_i\Big\},\quad\text{a.e.}(m^n).$$
This implies that $\{\times_{i=1}^n P_\theta^{X_i}\}_{\theta\in\mathbb R}$ is an exponential family. However, $\Gamma_\theta = \{(-I\{\theta\ne0\}/4,\ I\{\theta\ne0\}\theta)\mid\theta\in\mathbb R\}$ does not contain any open rectangle in $\mathbb R^2$, so we cannot utilize Claim 71. The good news is that Theorem 62 comes in handy. Consider $\Theta_0=\{\theta\in\mathbb R\mid\theta\ne0\}$, and denote $\Theta=\mathbb R$. It is easy to see that $P_\theta(B)=0\ \forall\theta\in\Theta_0 \Rightarrow P_\theta(B)=0\ \forall\theta\in\Theta$ (normal distributions dominate one another). For the smaller family $\{\times_i P_\theta^{X_i}\}_{\theta\in\Theta_0}$, it can be seen that, for $\theta\in\Theta_0$,
$$\frac{d(\times_{i=1}^n P_\theta^{X_i})}{dm^n} = (2\pi)^{-n/2}\exp\Big\{-\sum_i x_i^2/2 - n\theta^2/2\Big\}\cdot\exp\Big\{\theta\sum_i x_i\Big\},\quad\text{a.e.}(m^n),$$
which actually implies $\sum_i X_i$ is minimal sufficient and complete for the smaller family, because $\Gamma_\theta=(-\infty,0)\cup(0,\infty)$ contains an open interval. By Theorem 62, it is then not surprising that $\sum_i X_i$ is complete for $\theta\in\Theta=\mathbb R$. But $\sum_i X_i$ is not sufficient for the larger family with $\theta\in\Theta$ (actually, $(\sum_i X_i,\ \sum_i X_i^2)$ is sufficient for $\theta\in\Theta$).

To show that $\sum_i X_i$ is not sufficient for $\theta\in\Theta$, one can either use the factorization theorem or argue from first principles, i.e., show that the conditional distribution of $X_1,\dots,X_n$ given $\bar X$ depends on $\theta$. For the factorization argument: if $\sum_i X_i$ were sufficient for $\theta\in\Theta$, there would exist some $g_\theta$ and $h$ such that $\frac{d(\times_{i=1}^n P_\theta^{X_i})}{dm^n}(x_1,\dots,x_n) = g_\theta(\sum_i x_i)h(x_1,\dots,x_n)$, which says that $\sum_i x_i=\sum_i y_i$ implies $\frac{d(\times_{i=1}^n P_\theta^{X_i})}{dm^n}(x_1,\dots,x_n)\big/\frac{d(\times_{i=1}^n P_\theta^{X_i})}{dm^n}(y_1,\dots,y_n)$ does not depend on $\theta$. But from the R-N derivative expression given above, when $\sum_i x_i=\sum_i y_i$ this ratio is
$$\exp\Big\{(1+I\{\theta\ne0\})\Big(\sum_i y_i^2-\sum_i x_i^2\Big)\Big/4\Big\},$$
which does depend on $\theta$. Thus $\sum_i X_i$ is not sufficient for $\theta\in\Theta$.

Alternatively, as $X_1,\dots,X_n$ are actually iid $N(\theta,\ 1+I\{\theta=0\})$, the conditional distribution of $(X_1,\dots,X_n)$ given $\bar X$ is still multivariate normal, with mean $\mathbf 1\bar X$ and covariance $(1+I\{\theta=0\})(I_n-\mathbf 1\mathbf 1'/n)$, where $\mathbf 1$ is the $n$-dimensional vector with all elements equal to 1 and $I_n$ is the $n\times n$ identity matrix. One can also see that this conditional distribution does not depend on $\theta$ when restricted to $\Theta_0$. A good reference for this would be the MVN facts from Dr. Vardeman's Stat 542 page.

It is also possible to prove the completeness of $\bar X$ directly from the definition of completeness; that ends up with a proof much like the completeness argument for exponential families.

42. The likelihood function corresponding to $X_1,\dots,X_n$ is
$$L(X;\mu,\sigma_1^2,\sigma_2^2)=\prod_{i=1}^n\big[\sigma_1^{-1}\phi\{(X_{1i}-\mu)/\sigma_1\}\,\sigma_2^{-1}\phi\{(X_{2i}-\mu)/\sigma_2\}\big],$$
where $\phi$ is the standard normal density and $X_i=(X_{1i},X_{2i})'$, $i=1,\dots,n$. By the factorization theorem (or from Claim 69), $(\bar X_1,\bar X_2,S_1^2,S_2^2)'$ is sufficient. From the fact that
$$L(X;\mu,\sigma_1^2,\sigma_2^2)=\exp\Big\{\frac{\sum_i Y_{1i}^2-\sum_i X_{1i}^2}{2\sigma_1^2}+\mu\,\frac{\sum_i X_{1i}-\sum_i Y_{1i}}{\sigma_1^2}+\frac{\sum_i Y_{2i}^2-\sum_i X_{2i}^2}{2\sigma_2^2}+\mu\,\frac{\sum_i X_{2i}-\sum_i Y_{2i}}{\sigma_2^2}\Big\}\,L(Y;\mu,\sigma_1^2,\sigma_2^2),$$
utilizing Theorem 55, we see that $T(X)=(\bar X_1,\bar X_2,S_1^2,S_2^2)'$ is minimal sufficient.

To show that this statistic is not complete, we just need to find a nontrivial bounded function of $T$ that is an unbiased estimator of 0. Let $f(t)=t\,I\{|t|\le1\}$ and consider $h(x_1,x_2,x_3,x_4)=f\{(x_1-x_2)/\sqrt{x_3+x_4}\}$. Then $h$ is bounded and $E_\theta[h\{T(X)\}]=0$ for all $\theta=(\mu,\sigma_1^2,\sigma_2^2)$, because $h\{T(X)\}$ is essentially a two-sample $t$-type statistic whose distribution is symmetric about 0, the two coordinates sharing the same population mean. However, $h\circ T$ is not identically 0.
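A seeded simulation can illustrate this non-completeness argument (a sketch; the sample size $n=5$ and the parameter values $\mu=2$, $\sigma_1=1$, $\sigma_2=3$ are arbitrary choices, not from the assignment): the bounded statistic $h\{T(X)\}$ averages to 0 over repetitions while not being identically 0.

```python
import math
import random

def h_of_T(x1s, x2s):
    """h(T) = f((xbar1 - xbar2)/sqrt(S1^2 + S2^2)), with f(t) = t * 1{|t| <= 1}."""
    n = len(x1s)
    xbar1 = sum(x1s) / n
    xbar2 = sum(x2s) / n
    s1 = sum((v - xbar1) ** 2 for v in x1s) / (n - 1)  # S1^2
    s2 = sum((v - xbar2) ** 2 for v in x2s) / (n - 1)  # S2^2
    t = (xbar1 - xbar2) / math.sqrt(s1 + s2)
    return t if abs(t) <= 1 else 0.0

rng = random.Random(1)
mu, sig1, sig2, n, reps = 2.0, 1.0, 3.0, 5, 50_000
vals = []
for _ in range(reps):
    x1 = [rng.gauss(mu, sig1) for _ in range(n)]  # X_{1i} ~ N(mu, sig1^2)
    x2 = [rng.gauss(mu, sig2) for _ in range(n)]  # X_{2i} ~ N(mu, sig2^2)
    vals.append(h_of_T(x1, x2))
mean_h = sum(vals) / reps
print(round(mean_h, 3))
```

The mean is near 0 because $\bar X_1-\bar X_2$ is symmetric about 0 and independent of $(S_1^2,S_2^2)$, while individual values of $h\{T(X)\}$ are typically nonzero.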
One candidate for a first-order ancillary statistic would be $\bar X_1-\bar X_2$.

43. From the factorization theorem, after checking the likelihood function, $T(X)=\big(\sum_i X_i,\ \sum_i X_i^2\big)$ is a sufficient statistic. Also consider the likelihood function $L(X;\gamma)$: we have $L(X;\gamma)=k(X,Y)L(Y;\gamma)$, where
$$k(X,Y)=\exp\Big\{\Big(\sum_i Y_i^2-\sum_i X_i^2\Big)\Big/(2\gamma^2)+\Big(\sum_i X_i-\sum_i Y_i\Big)\Big/\gamma\Big\}.$$
$k(X,Y)$ being independent of $\gamma$ implies $T(X)=T(Y)$. Thus $T(X)$ is minimal sufficient.

Models and Measures of Statistical Information

46.
\begin{align*}
I_X(\theta_0) &= E_{\theta_0}\Big[\Big(\frac{\partial\log f_\theta(T,S)}{\partial\theta}\Big|_{\theta=\theta_0}\Big)^2\Big] = E_{\theta_0}\Big[\Big(\frac{f'_{\theta_0}(T,S)}{f_{\theta_0}(T,S)}\Big)^2\Big]\\
&\ge E_{\theta_0}\Big[\Big(E_{\theta_0}\Big(\frac{f'_{\theta_0}(T,S)}{f_{\theta_0}(T,S)}\,\Big|\,T\Big)\Big)^2\Big]\\
&= \int_{\mathcal T}\Big(\int_{\mathcal S}\frac{f'_{\theta_0}(t,s)}{f_{\theta_0}(t,s)}\,f_{\theta_0}(s\mid t)\,d\gamma(s)\Big)^2 g_{\theta_0}(t)\,d\nu(t)\\
&= \int_{\mathcal T}\Big(\int_{\mathcal S}\frac{f'_{\theta_0}(t,s)}{f_{\theta_0}(t,s)}\cdot\frac{f_{\theta_0}(t,s)}{g_{\theta_0}(t)}\,d\gamma(s)\Big)^2 g_{\theta_0}(t)\,d\nu(t)\\
&= \int_{\mathcal T}\Big(\frac{\int_{\mathcal S} f'_{\theta_0}(t,s)\,d\gamma(s)}{g_{\theta_0}(t)}\Big)^2 g_{\theta_0}(t)\,d\nu(t)\\
&= \int_{\mathcal T}\frac{g'_{\theta_0}(t)^2}{g_{\theta_0}(t)}\,d\nu(t) = I_T(\theta_0),
\end{align*}
where $f'_{\theta_0}$ and $g'_{\theta_0}$ denote derivatives in $\theta$ at $\theta_0$, the inequality is conditional Jensen, and the next-to-last equality interchanges differentiation and integration.

47. Because $\{P,Q\}$ are discrete measures, we can consider counting measure $\nu$ restricted to a countable set $\mathcal X$, with both $P$ and $Q$ dominated by $\nu$ and corresponding R-N derivatives $p,q$. For the statistic $T=T(X)$, let $\mathcal T$ be its range. Then
\begin{align*}
I_X(P,Q) &= \int p(x)\log\frac{p(x)}{q(x)}\,d\nu(x) = \sum_{x\in\mathcal X}p(x)\log\frac{p(x)}{q(x)}\\
&= \sum_{t\in\mathcal T}\Big(\sum_{x:T(x)=t}p(x)\Big)\Big[-\sum_{x:T(x)=t}\frac{p(x)}{\sum_{x':T(x')=t}p(x')}\log\frac{q(x)}{p(x)}\Big]\\
&\ge \sum_{t\in\mathcal T}\Big(\sum_{x:T(x)=t}p(x)\Big)\Big[-\log\sum_{x:T(x)=t}\frac{p(x)}{\sum_{x':T(x')=t}p(x')}\cdot\frac{q(x)}{p(x)}\Big]\\
&= \sum_{t\in\mathcal T}\Big(\sum_{x:T(x)=t}p(x)\Big)\log\frac{\sum_{x:T(x)=t}p(x)}{\sum_{x:T(x)=t}q(x)} = I_{T(X)}(P^T,Q^T),
\end{align*}
the inequality being Jensen's inequality applied to the convex function $-\log$ within each set $\{x:T(x)=t\}$.

48. Consider $\{P_{\theta_1},P_{\theta_2}\}$ and let $P_{\theta_1}=P$, $P_{\theta_2}=Q$. Both are dominated by $\mu=\nu\times\gamma$ with R-N derivatives $f_{\theta_1},f_{\theta_2}$. Then
\begin{align*}
I_X(P,Q) &= E_{\theta_1}\log\frac{f_{\theta_1}(T,S)}{f_{\theta_2}(T,S)} = E_{\theta_1}\Big[E_{\theta_1}\Big(-\log\frac{f_{\theta_2}(T,S)}{f_{\theta_1}(T,S)}\,\Big|\,T\Big)\Big]\\
&\ge E_{\theta_1}\Big[-\log E_{\theta_1}\Big(\frac{f_{\theta_2}(T,S)}{f_{\theta_1}(T,S)}\,\Big|\,T\Big)\Big]\\
&= E_{\theta_1}\Big[-\log\int_{\mathcal S}\frac{f_{\theta_2}(T,s)}{f_{\theta_1}(T,s)}\cdot\frac{f_{\theta_1}(T,s)}{g_{\theta_1}(T)}\,d\gamma(s)\Big]\\
&= E_{\theta_1}\Big[-\log\frac{g_{\theta_2}(T)}{g_{\theta_1}(T)}\Big] = E_{\theta_1}\log\frac{g_{\theta_1}(T)}{g_{\theta_2}(T)} = I_{T(X)}(P^T,Q^T),
\end{align*}
the inequality again being conditional Jensen applied to $-\log$.

49.
$$g_\beta(x)=\min(x,1)f(x\mid\beta)I\{x>0\}+I\{x=0\}\int(1-\min(y,1))f(y\mid\beta)\,dy,$$
where $f(x\mid\beta)=\beta^{-1}e^{-x/\beta}I\{x>0\}$.
Denote $h(\beta)=\int(1-\min(y,1))f(y\mid\beta)\,dy=1-\beta+\beta e^{-1/\beta}$. The log of the density becomes
$$\log g_\beta(x)=I\{x>0\}\big\{\log(\min(x,1))+\log f(x\mid\beta)\big\}+I\{x=0\}\log h(\beta),$$
so that
$$E_{\beta_0}\Big\{\Big(\frac{\partial\log g_\beta(x)}{\partial\beta}\Big|_{\beta=\beta_0}\Big)^2\Big\}=\int\Big(\frac{\partial\log g_\beta(x)}{\partial\beta}\Big|_{\beta=\beta_0}\Big)^2 g_{\beta_0}(x)\,d(\delta_0+\lambda)(x),$$
where $\delta_0$ is the point mass at 0 and $\lambda$ is Lebesgue measure. More specifically, it is not hard to see that the result has the form
$$I_Y(\beta_0)+\int(-1+\min(x,1))\,f(x\mid\beta_0)\Big(\frac{\partial\log f(x\mid\beta)}{\partial\beta}\Big|_{\beta=\beta_0}\Big)^2dx+\Big(\frac{\partial\log h(\beta)}{\partial\beta}\Big|_{\beta=\beta_0}\Big)^2h(\beta_0),$$
where
$$I_Y(\beta_0)=\frac1{\beta_0^2},\qquad \Big(\frac{\partial\log h(\beta)}{\partial\beta}\Big|_{\beta=\beta_0}\Big)^2h(\beta_0)=\frac{e^{-2/\beta_0}\big(\beta_0 e^{1/\beta_0}-\beta_0-1\big)^2}{\beta_0^2\,\big(1-\beta_0+\beta_0 e^{-1/\beta_0}\big)},$$
$$\int(-1+\min(x,1))\,f(x\mid\beta_0)\Big(\frac{\partial\log f(x\mid\beta)}{\partial\beta}\Big|_{\beta=\beta_0}\Big)^2dx=\frac{3\beta_0-1}{\beta_0^2}-\frac{1+2\beta_0+3\beta_0^2}{\beta_0^3}\,e^{-1/\beta_0}.$$
Intuitively, if $P(B)=1$, then $X$ and $Y$ are the same and should carry the same Fisher information; that means $e^{-1/\beta_0}=P(Y\ge1)\approx1$, which implies $\beta_0\to\infty$. The same conclusion can also be derived directly from the results above.

50. $\theta=(p_Y,p_Z)$, and
$$\log f_\theta(y,z)=y\log p_Y+(n-y)\log(1-p_Y)+z\log p_Z+(y-z)\log(1-p_Z)+\log\binom{y}{z}+\log\binom{n}{y}.$$
One sees that $\partial^2\log f_\theta(y,z)/\partial p_Y\partial p_Z=\partial^2\log f_\theta(y,z)/\partial p_Z\partial p_Y=0$, while
$$\frac{\partial^2\log f_\theta(y,z)}{\partial p_Y^2}=-\frac{y}{p_Y^2}-\frac{n-y}{(1-p_Y)^2},\qquad \frac{\partial^2\log f_\theta(y,z)}{\partial p_Z^2}=-\frac{z}{p_Z^2}-\frac{y-z}{(1-p_Z)^2}.$$
Notice further that $E(Y)=np_Y$ and $E(Z)=E\{E(Z\mid Y)\}=np_Yp_Z$. Denote $\theta_0=(p_Y^*,p_Z^*)$; then
$$I_X(\theta_0)=E_{\theta_0}\Big(-\frac{\partial^2}{\partial\theta\,\partial\theta'}\log f_\theta(X)\Big|_{\theta=\theta_0}\Big)=\begin{pmatrix}\dfrac{n}{p_Y^*(1-p_Y^*)}&0\\[2mm]0&\dfrac{n\,p_Y^*}{p_Z^*(1-p_Z^*)}\end{pmatrix}.$$
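The diagonal entries of $I_X(\theta_0)$ in problem 50 can be checked numerically; below is a small sketch (the values of $n$, $p_Y$, $p_Z$ are arbitrary illustrative choices) confirming that the expected negative Hessian entries reduce to the closed forms $n/(p_Y(1-p_Y))$ and $np_Y/(p_Z(1-p_Z))$.

```python
# Check: plug E(Y) = n*pY and E(Z) = n*pY*pZ into the expected negative
# second derivatives and compare with the closed-form matrix entries.
n, pY, pZ = 10, 0.3, 0.6        # arbitrary illustrative values
EY = n * pY                     # E(Y) for Y ~ Bin(n, pY)
EZ = n * pY * pZ                # E(Z) = E{E(Z|Y)} = E(Y) * pZ
# -E[d^2 log f / d pY^2] and -E[d^2 log f / d pZ^2]
I_pY = EY / pY**2 + (n - EY) / (1 - pY)**2
I_pZ = EZ / pZ**2 + (EY - EZ) / (1 - pZ)**2
print(I_pY, n / (pY * (1 - pY)))
print(I_pZ, n * pY / (pZ * (1 - pZ)))
```

Both pairs agree up to floating-point rounding, matching the diagonal of $I_X(\theta_0)$ above.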