STAT 510
Homework 1 Solutions
Spring 2016
1. We can express A and B in terms of vectors:

       A_{3×2} = [ 1   5 ]  = [a_1, a_2],   where a_1 = [1, 4, 0]' and a_2 = [5, -1, 1]',
                 [ 4  -1 ]
                 [ 0   1 ]

       B_{2×4} = [ 3   3  -2   4 ]  = [ b'_(1) ],   where b'_(1) = [3, 3, -2, 4] and b'_(2) = [5, -1, 2, 8].
                 [ 5  -1   2   8 ]    [ b'_(2) ]

   Then AB is a sum of two matrices:

       AB_{3×4} = [a_1, a_2] [ b'_(1) ]  =  Σ_{i=1}^{2} a_i b'_(i)
                             [ b'_(2) ]

                = [ 1 ] [3, 3, -2, 4]  +  [  5 ] [5, -1, 2, 8]
                  [ 4 ]                   [ -1 ]
                  [ 0 ]                   [  1 ]

                = [  3   3  -2   4 ]     [ 25  -5  10  40 ]
                  [ 12  12  -8  16 ]  +  [ -5   1  -2  -8 ]
                  [  0   0   0   0 ]     [  5  -1   2   8 ]

                = [ 28  -2   8  44 ]
                  [  7  13 -10   8 ] .
                  [  5  -1   2   8 ]
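   For readers who want to verify the arithmetic, here is a short R sketch of the outer-product
   expansion (the matrices are exactly those given in the problem; the object names are
   illustrative only):

       # Check that AB equals the sum of outer products a_i b'_(i)
       A <- matrix(c(1, 4, 0,
                     5, -1, 1), nrow = 3, ncol = 2)                   # columns a_1, a_2
       B <- matrix(c(3, 3, -2, 4,
                     5, -1, 2, 8), nrow = 2, ncol = 4, byrow = TRUE)  # rows b'_(1), b'_(2)
       AB.outer <- A[, 1] %o% B[1, ] + A[, 2] %o% B[2, ]              # sum of outer products
       all.equal(AB.outer, A %*% B)                                   # TRUE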
2. Suppose u ∼ t_m(δ). Clearly there exist independent random variables y ∼ N(δ, 1) and
   w ∼ χ²_m. By slide 20 of set 1,

       y / √(w/m)  ∼  t_m(δ).

   Since both u and the random variable defined by y/√(w/m) have t_m(δ) distributions, they
   must have the same cdf, and thus they are equal in distribution:

       u  =_d  y / √(w/m).

   Squaring, we obtain

       u²  =_d  y² / (w/m).

   Since y ∼ N(δ, 1), we have y² ∼ χ²_1(δ²/2) by slide 15 of set 1. Further, since y and w are
   independent, y² and w are also independent (see Theorem 4.3.5 in Casella and Berger¹). By
   slide 22 of set 1, it follows that

       u²  =_d  y²/(w/m)  =  (y²/1)/(w/m)  ∼  F_{1,m}(δ²/2).

   Hence, u² ∼ F_{1,m}(δ²/2).

   ¹ Casella, G. and Berger, R.L. (2002) Statistical Inference, 2nd edition, Duxbury Press.
   Comments:
   • Some of you said that since u ∼ t_m(δ), there exist independent random variables
     y ∼ N(δ, 1) and w ∼ χ²_m such that u = y/√(w/m). This is a much stronger claim than
     equality in distribution (note that u = y/√(w/m) =⇒ u =_d y/√(w/m), but not the
     converse), and is not needed here.
   • Many of you forgot about the independence condition! This is a very important condition,
     highlighted by this counterexample. Suppose y ∼ N(0, 1). Let w = y² and u = y/√(w/1).
     By slide 16 of set 1, w ∼ χ²_1. Then we have

         u = y / √(w/1),   where y ∼ N(0, 1) and w ∼ χ²_1.                          (1)

     Since w = y²,

         u  =  y/√(w/1)  =  y/√(y²/1)  =  y/|y|  =  { +1 with probability 0.5,
                                                      -1 with probability 0.5. }

     Obviously u does not have a t_1 distribution; in fact, u isn't even a continuous random
     variable. Thus, we've found that u ≁ t_1 despite having (1) hold.
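   A quick Monte Carlo sketch of both points (not part of the proof; m, δ, and the simulation
   size are arbitrary illustrative choices, and note that R's ncp argument is δ², twice the
   δ²/2 used on the slides, as discussed in question 3):

       set.seed(510)
       m <- 5; delta <- 1.5; nsim <- 1e5
       y <- rnorm(nsim, mean = delta); w <- rchisq(nsim, df = m)   # independent y and w
       u <- y / sqrt(w / m)
       quantile(u^2, c(.5, .9, .99))                               # simulated quantiles of u^2
       qf(c(.5, .9, .99), df1 = 1, df2 = m, ncp = delta^2)         # F_{1,m}(delta^2/2) quantiles
       z <- rnorm(nsim); table(z / abs(z)) / nsim                  # counterexample: u = y/|y| is only +/- 1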
3. Since x ∼ N(2, 1) independent of y ∼ N(0, 1), together they are multivariate normal:

       [x, y]'  ∼  N( [2, 0]', I_{2×2} ),

   where Cov(x, y) = Cov(y, x) = 0 since x ⊥ y.

   Put µ = [2, 0]' so that we can write [x, y]' ∼ N(µ, I_{2×2}). Using the results on non-central
   chi-square distributions (slide 15 of set 1),

       x² + y²  =  [x, y] [x, y]'  ∼  χ²_2(µ'µ/2),

   where the non-centrality parameter reduces to

       µ'µ/2  =  (1/2) [2, 0] [2, 0]'  =  4/2  =  2.

   This gives us x² + y² ∼ χ²_2(2), so that

       P( √(x² + y²) > 3 )  =  P( x² + y² > 9 )  =  P( χ²_2(2) > 9 ).
   We can compute the above probability in R using the pchisq() function. The documentation
   brought up by ?pchisq says

       "The non-central chi-squared distribution with df = n degrees of freedom and
        non-centrality parameter ncp = λ [...] this is the distribution of the sum of squares
        of n normals each with variance one, λ being the sum of squares of the normal means."

   This implies R parameterizes the non-centrality parameter using µ'µ = µ_1² + · · · + µ_n² (the
   sum of squares of the normal means) rather than µ'µ/2. So, we need to double the
   non-centrality parameter we obtained above when inputting the function arguments. The code

       > pchisq(9, df = 2, ncp = 4, lower.tail = FALSE)

   gives the result

       P( √(x² + y²) > 3 )  =  0.2144.

   Comments: Some students did a Monte Carlo experiment to approximate this probability;
   others did a change of variable transformation or used a moment generating function to
   establish the distribution of x² + y². Noticing that we can write x² + y² in terms of a
   multivariate normal vector and applying the results in the slide set, as done above, is not
   an approximation (as Monte Carlo methods are), and may be easier than a change of variable
   transformation or using a moment generating function.
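   As a sketch of the Monte Carlo approach mentioned above (an approximation, unlike the exact
   pchisq() computation; the seed and simulation size are arbitrary choices):

       set.seed(510)
       x <- rnorm(1e6, mean = 2, sd = 1)
       y <- rnorm(1e6, mean = 0, sd = 1)
       mean(sqrt(x^2 + y^2) > 3)        # approximately 0.214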
4. We have z_1, z_2 iid ∼ N(0, 1). By slide 12 of set 1,

       [z_1, z_2]'  ∼  N( 0_{2×1}, I_{2×2} ).

   (a) Let A = [1/√2, -1/√2]. By slide 13 of set 1,

           (1/√2)(z_1 - z_2)  =  (1/√2) [1, -1] [z_1, z_2]'  =  [1/√2, -1/√2] [z_1, z_2]'
                              =  A [z_1, z_2]'  ∼  N(0, AA').

       Since AA' = (1/√2)·(1/√2) + (-1/√2)·(-1/√2) = 1, we have

           (1/√2)(z_1 - z_2)  ∼  N(0, 1).

       Squaring, by slide 16 of set 1, we obtain a central chi-square random variable:

           (1/2)(z_1 - z_2)²  ∼  χ²_1.
   (b) Notice that

           (z_1 + z_2) / |z_1 - z_2|  =  [ (1/√2)(z_1 + z_2) ] / √[ (1/2)(z_1 - z_2)² / 1 ].      (2)

       Similar to the first step in part (a), the numerator in (2) is standard normal:

           (1/√2)(z_1 + z_2)  ∼  N(0, 1).

       From the result in part (a), the part of the denominator in (2) under the square root is
       a central chi-square on one degree of freedom.

       To have a t_1 distribution, we need to show that the random variables in the numerator
       and under the square root in the denominator in (2) are independent. We can do this by
       showing that [z_1 + z_2, z_1 - z_2]' has a bivariate normal distribution with zero
       covariances. Let

           B = [ 1   1 ] ,
               [ 1  -1 ]

       so that

           [ z_1 + z_2 ]  =  [ 1   1 ] [ z_1 ]  =  B [ z_1 ] .
           [ z_1 - z_2 ]     [ 1  -1 ] [ z_2 ]       [ z_2 ]

       Then

           BB'  =  [ 1   1 ] [ 1   1 ]  =  [ 2  0 ] ,
                   [ 1  -1 ] [ 1  -1 ]     [ 0  2 ]

       and we can use the result on slide 13 of set 1 to obtain

           [ z_1 + z_2 ]  ∼  N( [ 0 ] , [ 2  0 ] ).
           [ z_1 - z_2 ]        [ 0 ]   [ 0  2 ]

       Since z_1 + z_2 and z_1 - z_2 are jointly normally distributed, Cov(z_1 + z_2, z_1 - z_2) = 0
       implies that they are independent. Consequently, (1/√2)(z_1 + z_2) is independent of
       (1/2)(z_1 - z_2)². By the result on slide 21 of set 1,

           (z_1 + z_2) / |z_1 - z_2|  =  [ (1/√2)(z_1 + z_2) ] / √[ (1/2)(z_1 - z_2)² / 1 ]  ∼  t_1.
   Comments:
   • Many students failed to check the independence condition needed to establish a t_1
     distribution in (b). See the comments for question 2 for the potential perils of skipping
     this check. Other students did not show that z_1 + z_2 and z_1 - z_2 were jointly normally
     distributed; without this, zero covariance does not imply independence. A web search will
     provide counterexamples where x and y are both normally distributed with Cov(x, y) = 0, but
     x is not independent of y because [x, y]' is not bivariate normal.
   • Some students did a change of variable transformation or used a moment generating function;
     others showed that (z_1 + z_2)/|z_1 - z_2| is a ratio of independent standard normal random
     variables and hence Cauchy(0, 1), which is the same distribution as t_1.
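   A simulation sketch of part (b) (not part of the proof; the seed and simulation size are
   arbitrary, and quantiles are compared because a t_1/Cauchy variable has no finite moments):

       set.seed(510)
       z1 <- rnorm(1e5); z2 <- rnorm(1e5)
       r <- (z1 + z2) / abs(z1 - z2)
       quantile(r, c(.1, .25, .5, .75, .9))        # simulated quantiles
       qt(c(.1, .25, .5, .75, .9), df = 1)         # t_1 (= Cauchy(0, 1)) quantiles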
5. We are given y_1, . . . , y_n iid ∼ N(µ, σ²).

   (a) Using properties of matrix transposition,

           s²  =  (1/(n-1)) Σ_{i=1}^{n} (y_i - ȳ)²

               =  (1/(n-1)) (y - ȳ 1_{n×1})' (y - ȳ 1_{n×1})

               =  (1/(n-1)) ( y - (1/n) Σ_{i=1}^{n} y_i 1_{n×1} )' ( y - (1/n) Σ_{i=1}^{n} y_i 1_{n×1} )

               =  (1/(n-1)) ( y - 1_{n×1} (1/n) Σ_{i=1}^{n} y_i )' ( y - 1_{n×1} (1/n) Σ_{i=1}^{n} y_i )

               =  (1/(n-1)) ( y - (1/n) 1_{n×1} 1'_{n×1} y )' ( y - (1/n) 1_{n×1} 1'_{n×1} y )

               =  (1/(n-1)) ( [I_{n×n} - (1/n) 1_{n×1} 1'_{n×1}] y )' ( [I_{n×n} - (1/n) 1_{n×1} 1'_{n×1}] y )

               =  (1/(n-1)) y' [I_{n×n} - (1/n) 1_{n×1} 1'_{n×1}]' [I_{n×n} - (1/n) 1_{n×1} 1'_{n×1}] y

               =  y' { (1/(n-1)) [I_{n×n} - (1/n) 1_{n×1} 1'_{n×1}]' [I_{n×n} - (1/n) 1_{n×1} 1'_{n×1}] } y

               =  y' B y,

       where

           B  =  (1/(n-1)) [I_{n×n} - (1/n) 1_{n×1} 1'_{n×1}]' [I_{n×n} - (1/n) 1_{n×1} 1'_{n×1}].

       Notice that B = (1/(n-1)) (I_{n×n} - P_{1_{n×1}}), because

           (1/n) 1_{n×1} 1'_{n×1}  =  1_{n×1} (n^{-1}) 1'_{n×1}  =  1_{n×1} (1'_{n×1} 1_{n×1})^{-1} 1'_{n×1}  =  P_{1_{n×1}},

       and I_{n×n} - P_{1_{n×1}} is symmetric and idempotent, so that
       (I_{n×n} - P_{1_{n×1}})'(I_{n×n} - P_{1_{n×1}}) = I_{n×n} - P_{1_{n×1}}.

       Comments: As seen above, we do not need to assume any model for y (e.g., we do not
       need the GMMNE on slide 12 of set 2) to show the existence of B such that s² = y'By.
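       A numerical sketch of part (a) (the choice of n and the simulated y below are
       illustrative only):

           set.seed(510)
           n <- 8
           y <- rnorm(n, mean = 3, sd = 2)
           one <- matrix(1, n, 1)
           P1 <- one %*% solve(t(one) %*% one) %*% t(one)   # projection onto the span of 1_{n x 1}
           B <- (diag(n) - P1) / (n - 1)
           c(var(y), t(y) %*% B %*% y)                      # the two values agree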
   (b) From part (a), we have B = (1/(n-1)) (I_{n×n} - P_{1_{n×1}}). Since y_i iid ∼ N(µ, σ²), it
       follows that

           [y_1, . . . , y_n]'  =  y  ∼  N(µ, Σ),

       where µ = µ 1_{n×1} and Σ = σ² I_{n×n}. Clearly Σ is positive definite, assuming that σ² > 0:
       for any non-zero t = [t_1, . . . , t_n]' ∈ R^n (i.e., t ≠ 0_{n×1}), we have

           t'Σt  =  t' [σ² I_{n×n}] t  =  σ² t' I_{n×n} t  =  σ² t't  =  σ² (t_1² + · · · + t_n²)  >  0,

       since t ≠ 0_{n×1} implies t_i ≠ 0 for at least one i ∈ {1, . . . , n}.

       Set A = ((n-1)/σ²) B. By properties of rank and the results on slide 5 of set 2 that
       pertain to the rank of an orthogonal projection matrix,

           rank(A)  =  rank( ((n-1)/σ²) · (1/(n-1)) (I_{n×n} - P_{1_{n×1}}) )
                    =  rank( I_{n×n} - P_{1_{n×1}} )
                    =  n - rank(1_{n×1})
                    =  n - 1.

       As an orthogonal projection matrix, P_{1_{n×1}} is symmetric, and clearly I_{n×n} is
       symmetric. Hence A is also symmetric. We also have that AΣ is idempotent:

           AΣAΣ  =  [ ((n-1)/σ²) · (1/(n-1)) (I_{n×n} - P_{1_{n×1}}) σ² I_{n×n} ]
                      × [ ((n-1)/σ²) · (1/(n-1)) (I_{n×n} - P_{1_{n×1}}) σ² I_{n×n} ]
                 =  (I_{n×n} - P_{1_{n×1}})(I_{n×n} - P_{1_{n×1}})
                 =  I_{n×n} - P_{1_{n×1}}
                 =  ((n-1)/σ²) · (1/(n-1)) (I_{n×n} - P_{1_{n×1}}) σ² I_{n×n}
                 =  AΣ,

       since I_{n×n} - P_{1_{n×1}} is idempotent (this is easy to show using the fact that
       P_{1_{n×1}} is idempotent).

       We have now established all the ingredients that we need to apply the result on slide 18
       of set 1. Hence,

           y'Ay  =  ((n-1)/σ²) s²  ∼  χ²_{n-1}(µ'Aµ/2).

       The non-centrality parameter reduces to zero:

           µ'Aµ/2  =  (1/2) [µ 1_{n×1}]' ((n-1)/σ²) (1/(n-1)) (I_{n×n} - P_{1_{n×1}}) [µ 1_{n×1}]
                   =  (µ²/(2σ²)) 1'_{n×1} (I_{n×n} - P_{1_{n×1}}) 1_{n×1}
                   =  (µ²/(2σ²)) ( 1'_{n×1} I_{n×n} 1_{n×1} - 1'_{n×1} P_{1_{n×1}} 1_{n×1} )
                   =  (µ²/(2σ²)) ( 1'_{n×1} 1_{n×1} - 1'_{n×1} 1_{n×1} (1'_{n×1} 1_{n×1})^{-1} 1'_{n×1} 1_{n×1} )
                   =  (µ²/(2σ²)) ( 1'_{n×1} 1_{n×1} - 1'_{n×1} 1_{n×1} )
                   =  (µ²/(2σ²)) ( n - n )
                   =  0,

       proving the desired result that

           ((n-1)/σ²) s²  ∼  χ²_{n-1}.

       Comments: Many students didn't establish all the conditions listed on slide 18, such as
       that A is symmetric or that y is multivariate normal. A few students assumed the
       non-centrality parameter was zero without any mention of µ'Aµ/2, and others showed only
       minimal work in establishing that µ'Aµ/2 = 0.
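       A Monte Carlo sketch of part (b) (the values of n, µ, σ, and the number of replicates
       are arbitrary illustrative choices):

           set.seed(510)
           n <- 6; mu <- 10; sigma <- 3; nrep <- 1e5
           stat <- replicate(nrep, (n - 1) * var(rnorm(n, mu, sigma)) / sigma^2)
           quantile(stat, c(.1, .5, .9))            # simulated quantiles of (n-1)s^2/sigma^2
           qchisq(c(.1, .5, .9), df = n - 1)        # central chi-square quantiles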
6. (a) Consider a matrix

           A_{m×n}  =  [ a_11  a_12  · · ·  a_1n ]
                       [ a_21  a_22  · · ·  a_2n ]
                       [  ...   ...   ...   ... ]
                       [ a_m1  a_m2  · · ·  a_mn ],

       so that

           A'A  =  [ a_11  a_21  · · ·  a_m1 ] [ a_11  a_12  · · ·  a_1n ]
                   [ a_12  a_22  · · ·  a_m2 ] [ a_21  a_22  · · ·  a_2n ]
                   [  ...   ...   ...   ... ] [  ...   ...   ...   ... ]
                   [ a_1n  a_2n  · · ·  a_mn ] [ a_m1  a_m2  · · ·  a_mn ]

                =  [ Σ_{i=1}^{m} a_i1²       Σ_{i=1}^{m} a_i1 a_i2   · · ·  Σ_{i=1}^{m} a_i1 a_in ]
                   [ Σ_{i=1}^{m} a_i1 a_i2   Σ_{i=1}^{m} a_i2²       · · ·  Σ_{i=1}^{m} a_i2 a_in ]
                   [  ...                     ...                     ...    ...                  ]
                   [ Σ_{i=1}^{m} a_i1 a_in   Σ_{i=1}^{m} a_i2 a_in   · · ·  Σ_{i=1}^{m} a_in²     ].

       ⇒: ["only if" part] Suppose that A = 0_{m×n}. Clearly then A' = 0_{n×m}, which implies
       A'A = 0_{n×m} 0_{m×n} = 0_{n×n}.

       ⇐: ["if" part] Suppose that A'A = 0_{n×n}. This requires that the diagonal elements of
       A'A are all zero, that is, Σ_{i=1}^{m} a_ij² = 0 for j = 1, . . . , n. Consequently,
       a_ij = 0 for j = 1, . . . , n and i = 1, . . . , m, which implies A = 0_{m×n}.

       Thus, A = 0 ⇐⇒ A'A = 0.

       Comments: This question, as well as the next one, requires a proof of an "if and only if"
       statement. This means you need to consider both directions (both the "if" and "only if"
       parts) to prove the desired conclusion. While it is trivial to many that A = 0 =⇒
       A'A = 0, if you don't include this in your proof, you can't claim to have proven the
       desired result. Additionally, your proof needs to hold for a matrix A with any dimension,
       not just 2 × 2.
   (b) ⇐: ["if" part] Suppose XA = XB. This part of the proof is trivial: multiplying both
       sides by X' gives X'XA = X'XB.

       ⇒: ["only if" part] Conversely, suppose X'XA = X'XB. By properties of matrix algebra
       and the transpose, as well as the result of part (a), we have

           X'XA = X'XB  =⇒  X'XA - X'XB = 0
                        =⇒  X'X(A - B) = 0
                        =⇒  (A - B)'X'X(A - B) = 0
                        =⇒  [X(A - B)]'[X(A - B)] = 0
                        =⇒  X(A - B) = 0                      [by part (a)]
                        =⇒  XA - XB = 0
                        =⇒  XA = XB.

       Therefore, X'XA = X'XB ⇐⇒ XA = XB.

       Comments: As in part (a), a proof of an "if and only if" statement requires both
       directions to be complete. Additionally, note that depending on the matrix dimensions,
       we may have

           [X(A - B)]'[X(A - B)] = 0_{n×n}  ≠  0_{m×n} = X(A - B),

       where m, n ∈ {1, 2, . . .}.
   (c) Let (X'X)⁻ be any generalized inverse of X'X, which by definition means

           X'X(X'X)⁻X'X = X'X,

       where X has dimension m × n, say. Put A = (X'X)⁻X'X and B = I_{n×n}, so that

           X'X [(X'X)⁻X'X]  =  X'X  =  X'X I_{n×n}   =⇒   X'XA = X'XB.

       By the result of part (b), it follows that XA = XB, and hence

           X(X'X)⁻X'X = X.
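       A numerical sketch of part (c), using the Moore-Penrose inverse returned by MASS::ginv()
       as one particular generalized inverse (the rank-deficient X below is an arbitrary
       illustrative design matrix):

           library(MASS)
           X <- cbind(1, c(1, 1, 0, 0), c(0, 0, 1, 1))   # 4 x 3, rank 2, so X'X is singular
           G <- ginv(t(X) %*% X)                         # a generalized inverse of X'X
           all.equal(X %*% G %*% t(X) %*% X, X)          # TRUE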
   (d) Let A be any symmetric matrix and G be any generalized inverse of A. By definition,

           AGA = A.

       Now, transpose both sides and use the fact that A' = A by symmetry:

           (AGA)' = A'   =⇒   A'G'A' = A'   =⇒   AG'A = A.

       Hence, G' is a generalized inverse of A.
   (e) Let G be any generalized inverse of X'X. Notice that X'X is symmetric, so by part (d),
       G' is also a generalized inverse of X'X. The result of part (c) holds for any generalized
       inverse of X'X, and hence holds using G'. Using the result of part (c) with G' and then
       taking transposes gives

           XG'X'X = X   =⇒   (XG'X'X)' = X'
                        =⇒   X' [X']' [G']' X' = X'
                        =⇒   X'XGX' = X'.

       Because we chose G to be any generalized inverse of X'X,

           X'X(X'X)⁻X' = X'.

       Comments: We could have (X'X)⁻ ≠ [(X'X)⁻]', so it is important that (c) and (d) are used
       at the right steps in your proof so it is clear that you aren't trying to say
       (X'X)⁻ = [(X'X)⁻]'. On a related note, we may also have [(X'X)⁻]' ≠ [(X'X)']⁻.
   (f) This part requires two proofs that P_X is idempotent for full credit.

       1. By part (c), X(X'X)⁻X'X = X, so

              P_X P_X  =  [X(X'X)⁻X'X] (X'X)⁻X'  =  X(X'X)⁻X'  =  P_X.

       2. By part (e), X'X(X'X)⁻X' = X', so

              P_X P_X  =  X(X'X)⁻ [X'X(X'X)⁻X']  =  X(X'X)⁻X'  =  P_X.
   (g) Let G_1 and G_2 be any generalized inverses of X'X. By parts (c) and (e), we have

           XG_1X'  =  XG_1 [X'XG_2X']        [part (e) holds for any generalized inverse of X'X,
                                              so X'XG_2X' = X']
                   =  [XG_1X'X] G_2X'
                   =  XG_2X'                 [part (c) holds for any generalized inverse of X'X,
                                              so XG_1X'X = X].

       Comments: A few students stated that G_1 ≠ G_2 at the beginning of their proof, and
       hence did not prove that the desired result holds for any two generalized inverses of
       X'X, which was your task. Others tried to use the fact that P_X is the same matrix
       regardless of which generalized inverse of X'X is used, but this is what we are trying
       to show.
   (h) Let (X'X)⁻ be any generalized inverse of X'X. We know that X'X is a symmetric matrix,
       so the result of part (d) says that if (X'X)⁻ is a generalized inverse of X'X, then
       [(X'X)⁻]' is a generalized inverse of X'X. The result of part (g) then establishes that
       X(X'X)⁻X' = X[(X'X)⁻]'X'. Hence, these results and properties of the matrix transpose
       give

           P_X'  =  [X(X'X)⁻X']'
                 =  [X']' [(X'X)⁻]' X'
                 =  X [(X'X)⁻]' X'
                 =  X(X'X)⁻X'              [by parts (d) and (g), as explained above]
                 =  P_X.

       Comments: It is important to use parts (d) and (g) at the right steps in your proof so it
       is clear that you aren't trying to say (X'X)⁻ = [(X'X)⁻]'.
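       A numerical sketch of parts (g) and (h): two different generalized inverses of X'X give
       the same P_X, and P_X is symmetric. The second generalized inverse is built from the
       standard construction G_2 = G_1 + U - G_1(X'X)U(X'X)G_1, which satisfies the generalized
       inverse definition for any conformable U (the particular X and U below are arbitrary
       illustrative choices):

           library(MASS)
           set.seed(510)
           X   <- cbind(1, c(1, 1, 0, 0), c(0, 0, 1, 1))     # rank-deficient example design
           XtX <- t(X) %*% X
           G1  <- ginv(XtX)                                  # Moore-Penrose generalized inverse
           U   <- matrix(rnorm(9), 3, 3)
           G2  <- G1 + U - G1 %*% XtX %*% U %*% XtX %*% G1   # another generalized inverse
           all.equal(XtX %*% G2 %*% XtX, XtX)                # TRUE: G2 is a generalized inverse
           PX1 <- X %*% G1 %*% t(X)
           PX2 <- X %*% G2 %*% t(X)
           all.equal(PX1, PX2)                               # TRUE: P_X does not depend on the choice
           all.equal(PX1, t(PX1))                            # TRUE: P_X is symmetric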
7. Let X be an n × p matrix and y be an n × 1 vector. Suppose that z ∈ C(X) and z ≠ P_X y,
   which implies P_X y - z ≠ 0_{n×1}. Observe that z ∈ C(X) implies that P_X z = z. Using this
   result and the fact that P_X is symmetric and idempotent, it follows that

       (y - P_X y)'(P_X y - z)  =  (y' - [P_X y]')(P_X y - z)
                                =  (y' - y'P_X')(P_X y - z)
                                =  (y' - y'P_X)(P_X y - z)
                                =  y'P_X y - y'z - y'P_X P_X y + y'P_X z
                                =  y'P_X y - y'z - y'P_X y + y'P_X z
                                =  -y'z + y'z
                                =  0.

   Now that we have (y - P_X y)'(P_X y - z) = 0 and P_X y - z ≠ 0, we can use the same
   argument provided in the homework with a = y - P_X y and b = P_X y - z:

       ||y - z||²  =  ||y - P_X y + P_X y - z||²
                   =  (y - P_X y + P_X y - z)'(y - P_X y + P_X y - z)
                   =  [(y - P_X y)' + (P_X y - z)'](y - P_X y + P_X y - z)
                   =  (y - P_X y)'(y - P_X y) + 2(y - P_X y)'(P_X y - z) + (P_X y - z)'(P_X y - z)
                   =  ||y - P_X y||² + ||P_X y - z||²
                   >  ||y - P_X y||².

   Hence, ||y - z|| > ||y - P_X y||, which says that P_X y is the unique point in C(X) that is
   closest to y in Euclidean distance.

   Comments: You can instead show that (y - P_X y)'(P_X y - z) = 0 by orthogonality, but as
   this is a proof, you need to provide sufficient reasoning or work to establish this.
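   A numerical sketch of the result (the design matrix, response, and the candidate points in
   C(X) below are arbitrary illustrative choices):

       library(MASS)
       set.seed(510)
       X <- cbind(1, c(1, 1, 0, 0), c(0, 0, 1, 1))
       y <- rnorm(4)
       PXy <- X %*% ginv(t(X) %*% X) %*% t(X) %*% y             # the projection P_X y
       sqrt(sum((y - PXy)^2))                                   # distance from y to P_X y
       min(replicate(1000, sqrt(sum((y - X %*% rnorm(3))^2))))  # never smaller than the above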