10-705: Intermediate Statistics, Fall 2012
Homework Solutions
Lecturer: Larry Wasserman
TA: Wanjie Wang, Haijie Gu

Normal approximation to the negative binomial (r = 20, p = .7)

b. Using the normal approximation, we have µ_V = r(1 − p)/p = 20(.3)/.7 = 8.57 and σ_V = √(r(1 − p)/p²) = √((20)(.3)/.49) = 3.5. Then
P(V = 0) = 1 − P(V ≥ 1) = 1 − P((V − 8.57)/3.5 ≥ (1 − 8.57)/3.5) = 1 − P(Z ≥ −2.16) = .0154.
Another way to approximate this probability is
P(V = 0) = P(V ≤ 0) = P((V − 8.57)/3.5 ≤ (0 − 8.57)/3.5) = P(Z ≤ −2.45) = .0071.
Continuing in this way we have P(V = 1) = P(V ≤ 1) − P(V ≤ 0) = .0154 − .0071 = .0083, etc.

c. With the continuity correction, compute P(V = k) by P(((k − .5) − 8.57)/3.5 ≤ Z ≤ ((k + .5) − 8.57)/3.5), so P(V = 0) = P(−9.07/3.5 ≤ Z ≤ −8.07/3.5) = .0104 − .0048 = .0056, etc. Notice that the continuity correction gives some improvement over the uncorrected normal approximation.
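These calculations are easy to check numerically. The following is a minimal sketch, assuming SciPy is available; V is taken to be negative binomial with r = 20 and p = .7, the values implied by the mean 8.57 and standard deviation 3.5 above.

```python
# Quick numerical check of parts (b) and (c).  Assumes SciPy is installed;
# r = 20 and p = 0.7 are inferred from the mean 8.57 and sd 3.5 used above.
from scipy.stats import nbinom, norm

r, p = 20, 0.7
mu = r * (1 - p) / p                   # 8.57
sigma = (r * (1 - p) / p ** 2) ** 0.5  # 3.50

# Uncorrected approximation of P(V = 0), as in part (b):
print("P(V=0): exact", nbinom.pmf(0, r, p),
      "uncorrected", norm.cdf((0 - mu) / sigma))

# Continuity-corrected approximation of P(V = k), as in part (c):
for k in range(3):
    corrected = norm.cdf((k + 0.5 - mu) / sigma) - norm.cdf((k - 0.5 - mu) / sigma)
    print(f"P(V={k}): exact {nbinom.pmf(k, r, p):.4f}, corrected {corrected:.4f}")
```

For P(V = 0) this reproduces the .0071 and .0056 figures above; the exact value is about .0008, so the corrected figure is the closer of the two.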
R(xSince , y ; fθ)can does not pdf, depend on θ if and only if min(X) = min(Y )f (and 1 In particular, be any no further is1possible. is independent → − 2 reduction 0 f ( y |θ) 4 4 → − → − Therefore the minimal sufficient statistics is T (X) = {min(X), max(X)}. of θ if and only if T ( x ) = T ( y ) so by C&B Theorem 6.2.13 the order statistics are minimal 1 1 1 1 6 0 1 4 3 reduction sufficient. The question about further is a4 little vague here, so we are going to ignore it 6 2 Problem 5[C&B] 7.1 when grading. 4 Solution 1. x = 0, the likelihood L(θ) = 13 I(θ = 1) + 14 I(θ = 2) + 0 · I(θ = 3) = 13 I(θ = 1) + 14 I(θ = 2), therefore, the MLE θ̂ = 1; 2. x = 1, L(θ) = 13 I(θ = 1) + 14 I(θ = 2), θ̂ = 1; 3. x = 2, L(θ) = 14 I(θ = 2) + 14 I(θ = 3), θ̂ = 12 or θ̂ = 3; 4. x = 3, L(θ) = 16 I(θ = 1) + 14 I(θ = 2) + 12 I(θ = 2), θ̂ = 3; 5. x = 4, L(θ) = 16 I(θ = 1) + 14 I(θ = 3), θ̂ = 3. Finally, X = 0, 1; 1 2 or 3 X = 2; θ̂ = 3 X = 3, 4. Problem 6. C&B 7.6 (10 points) Let X1 , ..., Xn be a random sample from the pdf f (x|θ) = θx−2 , 0 < θ ≤ x < ∞. (a) What is a sufficient statistic for θ? (b) Find the MLE of θ. (c) Find the method of moments estimator of θ.2 Solution (a) Joint likelihood n n 3 X = 3, 4. Problem 6. C&B 7.6 (10 points) Let X1 , ..., Xn be a random sample from the pdf f (x|θ) = θx−2 , 0 < θ ≤ x < ∞. (a) What is a sufficient statistic for θ? Problem (b) Find 6[C&B] the MLE of7.6 θ. (c) Find the method of moments estimator of θ. Solution n (b) Since Qnθ x2 is strictly increasing w/ θ, and I(θ � mini xi ) puts an upper cutoff beyond which i=1 i (a) Joint L(θ) likelihood varnishes, therefore, the MLE θ̂M LE = mini Xi . n � �∞ θ� ∞ θ θn �n thus L(θ) = � x ) = min x (c) E[|Xi |] = E[Xi ] = θ xf (x|θ)dx = 2 I(θ dx diverges, we�can’t get i i ). an expression for any 2 I(θ i xi θ x i=1 xi i=1doesn’t moment, therefore, the MOME exist. Take h(x) = Qn 1 x2 , g(θ, T (x)) = θn I(θ � mini xi ), then by the Factorization Theorem the i=1 i sufficient statistic is T = mini Xi . n 7.7 increasing (10 points) (b)Problem Since Qnθ 7.x2C&B is strictly w/ θ, and I(θ � mini xi ) puts an upper cutoff beyond which i=1 Let X1 , ..., Xni be iid with one of two pdfs. If θ = 0, then L(θ) varnishes, therefore, the MLE θ̂M LE = mini Xi . � �∞ � ∞ θ 1 3 if 0 < x < 1 f (x|θ) (c) E[|Xi |] = E[Xi ] = θ xf (x|θ)dx = =θ x dx diverges, thus we can’t get an expression for any 0 otherwise, moment, therefore, the MOME doesn’t exist. while if θ = 1, then � √ 1/(2 x) if 0 < x < 1 f (x|θ) = 0 otherwise, Problem Problem7[C&B] 7. C&B 7.7 7.7 (10 points) Let Xthe Xn be Find MLE of iid θ. with one of two pdfs. If θ = 0, then 1 , ..., � Solution 1 if 0 < x < 1 f (x|θ) = 0 otherwise, L(0|�x) = 1, 0 < xi < 1 while if θ = 1, then � � n √ 1/(2 1x) , if 0< L(1|� x ) = 0< xi x<<1 1 √ f (x|θ) = 2 x 0 otherwise, i i=1 �n Find the MLE of θ. Then, θ̂M LE = 0, when 1 ≥ i=1 2√1xi I(xi ∈ (0, 1)), and θ̂M LE = 1 otherwise. Solution Problem 8. C&B 7.12 (a,b) (10 points) L(0|� x) = 1, 0 < xi < 1 Let X1 , ..., Xn be a random sample from a population with pmf n � 1 L(1|� x ) = √ , 0 < xi < 1 1 x 1−x Pθ (X = x) = θ (1 −i=1 θ) 2 ,xi x = 0 or 1, 0 ≤ θ ≤ . 2 �n 1 √ Then, θ̂ = 0, when 1 ≥ I(x ∈ (0, 1)), and θ̂ = 1 otherwise. LE method of moments i and the MLE of M LE i=1 estimator 2 xi (a) FindMthe θ. (b) Find the mean squared errors of each of the estimators. Problem 8. C&B 7.12 (a,b) (10 points) Solution Let EX X1 , = ...,θ. XnTherefore, be a random sample from a population with pmf (a) θ̂M OM E = X̄. 
Problem 6 [C&B 7.6]

Let X1, ..., Xn be a random sample from the pdf f(x|θ) = θx^(−2), 0 < θ ≤ x < ∞.
(a) What is a sufficient statistic for θ?
(b) Find the MLE of θ.
(c) Find the method of moments estimator of θ.

Solution.
(a) The joint likelihood is
L(θ) = ∏_{i=1}^n (θ/xi²) I(θ ≤ xi) = (θ^n / ∏_{i=1}^n xi²) I(θ ≤ mini xi).
Take h(x) = ∏_{i=1}^n 1/xi² and g(θ, T(x)) = θ^n I(θ ≤ mini xi); then by the Factorization Theorem the sufficient statistic is T = mini Xi.
(b) Since θ^n / ∏_{i=1}^n xi² is strictly increasing in θ, and I(θ ≤ mini xi) puts an upper cutoff beyond which L(θ) vanishes, the MLE is θ̂_MLE = mini Xi.
(c) E|Xi| = E[Xi] = ∫_θ^∞ x f(x|θ) dx = ∫_θ^∞ (θ/x) dx diverges, so we cannot get an expression for any moment; therefore the method of moments estimator does not exist.

Problem 7 [C&B 7.7]

Let X1, ..., Xn be iid with one of two pdfs. If θ = 0, then
f(x|θ) = 1 if 0 < x < 1, and 0 otherwise,
while if θ = 1, then
f(x|θ) = 1/(2√x) if 0 < x < 1, and 0 otherwise.
Find the MLE of θ.

Solution.
L(0|x) = 1 for 0 < xi < 1, and
L(1|x) = ∏_{i=1}^n 1/(2√xi) for 0 < xi < 1.
Then θ̂_MLE = 0 when 1 ≥ ∏_{i=1}^n (1/(2√xi)) I(xi ∈ (0, 1)), and θ̂_MLE = 1 otherwise.

Problem 8 [C&B 7.12(a,b)]

Let X1, ..., Xn be a random sample from a population with pmf
Pθ(X = x) = θ^x (1 − θ)^(1−x), x = 0 or 1, 0 ≤ θ ≤ 1/2.
(a) Find the method of moments estimator and the MLE of θ.
(b) Find the mean squared errors of each of the estimators.

Solution.
(a) EX = θ. Therefore, θ̂_MOME = X̄.
The likelihood is
L(θ|x) = θ^(nx̄) (1 − θ)^(n − nx̄), xi = 0 or 1, 0 ≤ θ ≤ 1/2.
θ̂_MLE maximizes L(θ|x), or equivalently log L(θ|x) (see Example 7.2.7 in C&B):
∂/∂θ log L(θ|x) = ∂/∂θ [nx̄ log θ + (n − nx̄) log(1 − θ)] = nx̄ (1/θ) − (n − nx̄) (1/(1 − θ)).
The derivative is 0 when nx̄(1 − θ) = (n − nx̄)θ, that is, at θ = x̄. Since the log-likelihood increases to the left of x̄ and decreases to the right, and θ is restricted to [0, 1/2], the MLE is θ̂_MLE = min(X̄, 1/2).
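A quick numerical check of the restricted MLE in part (a), assuming NumPy is available; the sample below is simulated and purely illustrative. It grids the log-likelihood over [0, 1/2] and compares the maximizer with min(X̄, 1/2).

```python
# Sketch: check that the Bernoulli MLE restricted to [0, 1/2] is min(xbar, 1/2).
# Assumes NumPy; the data are an illustrative simulated sample.
import numpy as np

rng = np.random.default_rng(0)
x = rng.binomial(1, 0.4, size=50)         # hypothetical sample, true theta = 0.4
xbar = x.mean()

thetas = np.linspace(1e-6, 0.5, 100_000)  # restricted parameter range
loglik = x.sum() * np.log(thetas) + (len(x) - x.sum()) * np.log(1 - thetas)
print("grid argmax:", thetas[np.argmax(loglik)], " min(xbar, 0.5):", min(xbar, 0.5))
```

The grid maximizer agrees with min(X̄, 1/2) up to the grid spacing.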
[C&B 7.14]

Here X and Y are independent with densities (1/λ) exp(−x/λ) and (1/µ) exp(−y/µ), Z = min(X, Y), and W indicates which variable achieved the minimum (W = 1 if X ≤ Y, W = 0 otherwise); we observe an iid sample (Z1, W1), ..., (Zn, Wn).

a.
P(Z ≤ z, W = 0) = P(min(X, Y) ≤ z, Y ≤ X) = P(Y ≤ z, Y ≤ X)
  = ∫_0^z ∫_y^∞ (1/λ) exp(−x/λ) (1/µ) exp(−y/µ) dx dy = [λ/(µ + λ)] [1 − exp(−(1/µ + 1/λ) z)].
Similarly,
P(Z ≤ z, W = 1) = P(min(X, Y) ≤ z, X ≤ Y) = P(X ≤ z, X ≤ Y)
  = ∫_0^z ∫_x^∞ (1/λ) exp(−x/λ) (1/µ) exp(−y/µ) dy dx = [µ/(µ + λ)] [1 − exp(−(1/µ + 1/λ) z)].

b.
P(W = 0) = P(Y ≤ X) = ∫_0^∞ ∫_y^∞ (1/λ) exp(−x/λ) (1/µ) exp(−y/µ) dx dy = λ/(µ + λ),
P(W = 1) = 1 − P(W = 0) = µ/(µ + λ), and
P(Z ≤ z) = P(Z ≤ z, W = 0) + P(Z ≤ z, W = 1) = 1 − exp(−(1/λ + 1/µ) z).
Therefore, P(Z ≤ z, W = i) = P(Z ≤ z) P(W = i) for i = 0, 1 and z > 0, so Z and W are independent.

Differentiating F(z, w|λ, µ) = P(Z ≤ z, W = w|λ, µ) with respect to z, the joint density of (Z, W) is
f(z, w|λ, µ) = dF(z, w)/dz = µ^(−1) exp(−z(λ^(−1) + µ^(−1))) for z > 0, w = 0, and λ^(−1) exp(−z(λ^(−1) + µ^(−1))) for z > 0, w = 1.
The likelihood is therefore
L(λ, µ) = ∏_{i=1}^n f(zi, wi|λ, µ) = λ^(−Σi wi) µ^(−(n − Σi wi)) exp{−(Σi zi)(1/λ + 1/µ)},
so
log L(λ, µ) = −(Σi wi) log λ + (−n + Σi wi) log µ − (Σi zi)(1/λ + 1/µ).
Taking derivatives with respect to λ and µ and setting them to zero gives
−(Σi wi)/λ + (Σi zi)/λ² = 0,   −(n − Σi wi)/µ + (Σi zi)/µ² = 0.
Therefore the MLE (assuming 0 < Σi wi < n) is
λ̂ = Σi zi / Σi wi,   µ̂ = Σi zi / (n − Σi wi).

Problem 9

(a) The posterior of µ under the prior N(a, b²) is
µ | X ~ N( (nb² X̄ + σ² a)/(σ² + nb²), σ² b²/(σ² + nb²) ).
Under squared error loss, the Bayes estimator is the posterior mean. Hence
µ̂ = (nb²/(σ² + nb²)) X̄ + (σ²/(σ² + nb²)) a.    (1)
Letting η = σ²/(σ² + nb²), we can rewrite Eq. (1) as
µ̂ = (1 − η) X̄ + η a.    (2)

(b) The risk is
R(µ, µ̂) = MSE(µ̂) = Bias²(µ̂) + Var(µ̂) = (ηa + ((1 − η) − 1)µ)² + (1 − η)² σ²/n = (η(a − µ))² + (1 − η)² σ²/n.

(c) The supremum of the risk is infinity: by making µ arbitrarily far from a, the risk grows without bound.

(d) The Bayes risk is
B(π, µ̂) = ∫ R(µ, µ̂) π(µ) dµ
  = ∫ [(1 − η)² σ²/n + η²(µ − a)²] π(µ) dµ
  = (1 − η)² σ²/n + η² ∫ (µ − a)² π(µ) dµ
  = (1 − η)² σ²/n + η² b²        (since π(µ) is N(a, b²))
  = σ²/n − 2η σ²/n + η² (b² + σ²/n)
  = σ²/n − 2η σ²/n + (σ²/(σ² + nb²))² · (σ² + nb²)/n
  = (1 − η) σ²/n = σ² b²/(σ² + nb²).
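A Monte Carlo sanity check of the Bayes risk in (d), assuming NumPy is available; the constants a = 1, b = 2, σ = 1, n = 10 are arbitrary illustration values.

```python
# Sketch: Monte Carlo check that the Bayes risk of the posterior-mean
# estimator equals sigma^2 b^2 / (sigma^2 + n b^2).  Assumes NumPy.
import numpy as np

rng = np.random.default_rng(1)
a, b, sigma, n = 1.0, 2.0, 1.0, 10           # illustrative values
eta = sigma**2 / (sigma**2 + n * b**2)

mu = rng.normal(a, b, size=200_000)          # mu drawn from the prior N(a, b^2)
xbar = rng.normal(mu, sigma / np.sqrt(n))    # X-bar | mu ~ N(mu, sigma^2 / n)
mu_hat = (1 - eta) * xbar + eta * a          # Bayes estimator (2)

print("simulated Bayes risk:", np.mean((mu_hat - mu) ** 2))
print("closed form         :", sigma**2 * b**2 / (sigma**2 + n * b**2))
```

The two printed numbers should agree to about three decimal places (the closed-form value here is 4/41 ≈ 0.0976).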
Problem 10

In class, we found the minimax estimator for the Bernoulli. Here, you will fill in the details. Let X1, ..., Xn ~ Bernoulli(p) and let L(p, p̂) = (p − p̂)².
(a) Let p̂ be the Bayes estimator using a Beta(α, β) prior. Find the Bayes estimator.
(b) Compute the risk function.
(c) Compute the Bayes risk.
(d) Find the α and β that make the risk constant, and hence find the minimax estimator.

Solution.
(a) The Bayes estimator under squared error loss L(p, p̂) = (p − p̂)² is the posterior mean. Since Xi ~ Bernoulli(p) iid and the prior p ~ Beta(α, β) is conjugate, the posterior is p | X ~ Beta(α + Σi Xi, β + n − Σi Xi). Therefore the Bayes estimator is
p̂ = (α + Σi Xi)/(α + β + n).

(b) The risk function of p̂ is
R(p, p̂) = Ep[L(p, p̂)] = MSE(p̂) = (E[p̂] − p)² + V[p̂]
  = ((α + np)/(α + β + n) − p)² + np(1 − p)/(α + β + n)²
  = (α(1 − p) − βp)²/(α + β + n)² + np(1 − p)/(α + β + n)².

(c) The Bayes risk of p̂ is
B(π, p̂) = ∫ R(p, p̂) π(p) dp
  = (1/(α + β + n)²) ∫ [(α + β)² (p − α/(α + β))² + np − np²] π(p) dp
  = (1/(α + β + n)²) [ (α + β)² αβ/((α + β)²(α + β + 1)) + nα/(α + β) − n( αβ/((α + β)²(α + β + 1)) + α²/(α + β)² ) ]
  = (1/(α + β + n)²) [ αβ/(α + β + 1) + nα/(α + β) − nα(α + 1)/((α + β)(α + β + 1)) ]
  = (1/(α + β + n)²) [ αβ/(α + β + 1) + nαβ/((α + β)(α + β + 1)) ]
  = αβ / ((α + β)(α + β + 1)(α + β + n)).

(d) The risk
R(p, p̂) = [(α(1 − p) − βp)² + np(1 − p)]/(α + β + n)² = {p²[(α + β)² − n] + p[n − 2α(α + β)] + α²}/(α + β + n)²
is a second-order polynomial in p. To make it constant, set
(α + β)² − n = 0 and n − 2α(α + β) = 0,  which gives α = √n/2 and β = √n/2.
Thus
p̂_m = (α + Σi Xi)/(α + β + n) = (√n/2 + Σi Xi)/(√n + n)
is the minimax estimator.
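Finally, a small check that α = β = √n/2 really flattens the risk, assuming NumPy is available (n = 25 is an arbitrary illustration value). It evaluates the risk formula from part (b) on a grid of p and, for comparison, prints the worst-case risk p(1 − p)/n of X̄ at p = 1/2.

```python
# Sketch: verify that alpha = beta = sqrt(n)/2 makes the risk constant in p.
# Assumes NumPy; n = 25 is arbitrary.
import numpy as np

n = 25
alpha = beta = np.sqrt(n) / 2
p = np.linspace(0.0, 1.0, 201)

risk = ((alpha * (1 - p) - beta * p) ** 2 + n * p * (1 - p)) / (alpha + beta + n) ** 2
print("spread of the risk over p:", risk.max() - risk.min())  # essentially 0 (flat)
print("constant risk value      :", risk[0])                  # n / (4 (sqrt(n) + n)^2)
print("max risk of X-bar        :", 0.25 / n)                 # attained at p = 1/2; larger
```

The spread is zero up to rounding, and the constant value n/(4(√n + n)²) is smaller than the worst-case risk of X̄, which is the point of the minimax construction.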