STAT 510
Homework 1 Solutions
Spring 2016
1. We can express A and B in terms of vectors:

       A_{3×2} = [ 1   5 ]  = [a_1, a_2],   where a_1 = [1, 4, 0]' and a_2 = [5, -1, 1]',
                 [ 4  -1 ]
                 [ 0   1 ]

       B_{2×4} = [ 3   3  -2   4 ]  = [ b'_(1) ],   where b'_(1) = [3, 3, -2, 4] and b'_(2) = [5, -1, 2, 8].
                 [ 5  -1   2   8 ]    [ b'_(2) ]

   Then AB is a sum of two matrices:

       AB_{3×4} = [a_1, a_2] [ b'_(1) ]  =  Σ_{i=1}^{2} a_i b'_(i)
                             [ b'_(2) ]

                = [ 1 ] [3, 3, -2, 4]  +  [  5 ] [5, -1, 2, 8]
                  [ 4 ]                   [ -1 ]
                  [ 0 ]                   [  1 ]

                = [  3   3  -2   4 ]     [ 25  -5  10  40 ]
                  [ 12  12  -8  16 ]  +  [ -5   1  -2  -8 ]
                  [  0   0   0   0 ]     [  5  -1   2   8 ]

                = [ 28  -2   8  44 ]
                  [  7  13 -10   8 ] .
                  [  5  -1   2   8 ]
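   For readers who want to verify the arithmetic, here is a short R sketch of the outer-product
   expansion (the matrices are exactly those given in the problem; the object names are
   illustrative only):

       # Check that AB equals the sum of outer products a_i b'_(i)
       A <- matrix(c(1, 4, 0,
                     5, -1, 1), nrow = 3, ncol = 2)                   # columns a_1, a_2
       B <- matrix(c(3, 3, -2, 4,
                     5, -1, 2, 8), nrow = 2, ncol = 4, byrow = TRUE)  # rows b'_(1), b'_(2)
       AB.outer <- A[, 1] %o% B[1, ] + A[, 2] %o% B[2, ]              # sum of outer products
       all.equal(AB.outer, A %*% B)                                   # TRUE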
2. Suppose u ∼ t_m(δ). Clearly there exist independent random variables y ∼ N(δ, 1) and
   w ∼ χ²_m. By slide 20 of set 1,

       y / √(w/m)  ∼  t_m(δ).

   Since both u and the random variable defined by y/√(w/m) have t_m(δ) distributions, they
   must have the same cdf, and thus they are equal in distribution:

       u  =_d  y / √(w/m).

   Squaring, we obtain

       u²  =_d  y² / (w/m).

   Since y ∼ N(δ, 1), we have y² ∼ χ²_1(δ²/2) by slide 15 of set 1. Further, since y and w are
   independent, y² and w are also independent (see Theorem 4.3.5 in Casella and Berger¹). By
   slide 22 of set 1, it follows that

       u²  =_d  y²/(w/m)  =  (y²/1)/(w/m)  ∼  F_{1,m}(δ²/2).

   Hence, u² ∼ F_{1,m}(δ²/2).

   ¹ Casella, G. and Berger, R.L. (2002) Statistical Inference, 2nd edition, Duxbury Press.
   Comments:
   • Some of you said that since u ∼ t_m(δ), there exist independent random variables
     y ∼ N(δ, 1) and w ∼ χ²_m such that u = y/√(w/m). This is a much stronger claim than
     equality in distribution (note that u = y/√(w/m) =⇒ u =_d y/√(w/m), but not the
     converse), and is not needed here.
   • Many of you forgot about the independence condition! This is a very important condition,
     highlighted by this counterexample. Suppose y ∼ N(0, 1). Let w = y² and u = y/√(w/1).
     By slide 16 of set 1, w ∼ χ²_1. Then we have

         u = y / √(w/1),   where y ∼ N(0, 1) and w ∼ χ²_1.                          (1)

     Since w = y²,

         u  =  y/√(w/1)  =  y/√(y²/1)  =  y/|y|  =  { +1 with probability 0.5,
                                                      -1 with probability 0.5. }

     Obviously u does not have a t_1 distribution; in fact, u isn't even a continuous random
     variable. Thus, we've found that u ≁ t_1 despite having (1) hold.
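   A quick Monte Carlo sketch of both points (not part of the proof; m, δ, and the simulation
   size are arbitrary illustrative choices, and note that R's ncp argument is δ², twice the
   δ²/2 used on the slides, as discussed in question 3):

       set.seed(510)
       m <- 5; delta <- 1.5; nsim <- 1e5
       y <- rnorm(nsim, mean = delta); w <- rchisq(nsim, df = m)   # independent y and w
       u <- y / sqrt(w / m)
       quantile(u^2, c(.5, .9, .99))                               # simulated quantiles of u^2
       qf(c(.5, .9, .99), df1 = 1, df2 = m, ncp = delta^2)         # F_{1,m}(delta^2/2) quantiles
       z <- rnorm(nsim); table(z / abs(z)) / nsim                  # counterexample: u = y/|y| is only +/- 1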
3. Since x ∼ N(2, 1) independent of y ∼ N(0, 1), together they are multivariate normal:

       [x, y]'  ∼  N( [2, 0]', I_{2×2} ),

   where Cov(x, y) = Cov(y, x) = 0 since x ⊥ y.

   Put µ = [2, 0]' so that we can write [x, y]' ∼ N(µ, I_{2×2}). Using the results on non-central
   chi-square distributions (slide 15 of set 1),

       x² + y²  =  [x, y] [x, y]'  ∼  χ²_2(µ'µ/2),

   where the non-centrality parameter reduces to

       µ'µ/2  =  (1/2) [2, 0] [2, 0]'  =  4/2  =  2.

   This gives us x² + y² ∼ χ²_2(2), so that

       P( √(x² + y²) > 3 )  =  P( x² + y² > 9 )  =  P( χ²_2(2) > 9 ).
   We can compute the above probability in R using the pchisq() function. The documentation
   brought up by ?pchisq says

       "The non-central chi-squared distribution with df = n degrees of freedom and
        non-centrality parameter ncp = λ [...] this is the distribution of the sum of squares
        of n normals each with variance one, λ being the sum of squares of the normal means."

   This implies R parameterizes the non-centrality parameter using µ'µ = µ_1² + · · · + µ_n² (the
   sum of squares of the normal means) rather than µ'µ/2. So, we need to double the
   non-centrality parameter we obtained above when inputting the function arguments. The code

       > pchisq(9, df = 2, ncp = 4, lower.tail = FALSE)

   gives the result

       P( √(x² + y²) > 3 )  =  0.2144.

   Comments: Some students did a Monte Carlo experiment to approximate this probability;
   others did a change of variable transformation or used a moment generating function to
   establish the distribution of x² + y². Noticing that we can write x² + y² in terms of a
   multivariate normal vector and applying the results in the slide set, as done above, is not
   an approximation (as Monte Carlo methods are), and may be easier than a change of variable
   transformation or using a moment generating function.
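   As a sketch of the Monte Carlo approach mentioned above (an approximation, unlike the exact
   pchisq() computation; the seed and simulation size are arbitrary choices):

       set.seed(510)
       x <- rnorm(1e6, mean = 2, sd = 1)
       y <- rnorm(1e6, mean = 0, sd = 1)
       mean(sqrt(x^2 + y^2) > 3)        # approximately 0.214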
4. We have z_1, z_2 iid ∼ N(0, 1). By slide 12 of set 1,

       [z_1, z_2]'  ∼  N( 0_{2×1}, I_{2×2} ).

   (a) Let A = [1/√2, -1/√2]. By slide 13 of set 1,

           (1/√2)(z_1 - z_2)  =  (1/√2) [1, -1] [z_1, z_2]'  =  [1/√2, -1/√2] [z_1, z_2]'
                              =  A [z_1, z_2]'  ∼  N(0, AA').

       Since AA' = (1/√2)·(1/√2) + (-1/√2)·(-1/√2) = 1, we have

           (1/√2)(z_1 - z_2)  ∼  N(0, 1).

       Squaring, by slide 16 of set 1, we obtain a central chi-square random variable:

           (1/2)(z_1 - z_2)²  ∼  χ²_1.
   (b) Notice that

           (z_1 + z_2) / |z_1 - z_2|  =  [ (1/√2)(z_1 + z_2) ] / √[ (1/2)(z_1 - z_2)² / 1 ].      (2)

       Similar to the first step in part (a), the numerator in (2) is standard normal:

           (1/√2)(z_1 + z_2)  ∼  N(0, 1).

       From the result in part (a), the part of the denominator in (2) under the square root is
       a central chi-square on one degree of freedom.

       To have a t_1 distribution, we need to show that the random variables in the numerator
       and under the square root in the denominator in (2) are independent. We can do this by
       showing that [z_1 + z_2, z_1 - z_2]' has a bivariate normal distribution with zero
       covariances. Let

           B = [ 1   1 ] ,
               [ 1  -1 ]

       so that

           [ z_1 + z_2 ]  =  [ 1   1 ] [ z_1 ]  =  B [ z_1 ] .
           [ z_1 - z_2 ]     [ 1  -1 ] [ z_2 ]       [ z_2 ]

       Then

           BB'  =  [ 1   1 ] [ 1   1 ]  =  [ 2  0 ] ,
                   [ 1  -1 ] [ 1  -1 ]     [ 0  2 ]

       and we can use the result on slide 13 of set 1 to obtain

           [ z_1 + z_2 ]  ∼  N( [ 0 ] , [ 2  0 ] ).
           [ z_1 - z_2 ]        [ 0 ]   [ 0  2 ]

       Since z_1 + z_2 and z_1 - z_2 are jointly normally distributed, Cov(z_1 + z_2, z_1 - z_2) = 0
       implies that they are independent. Consequently, (1/√2)(z_1 + z_2) is independent of
       (1/2)(z_1 - z_2)². By the result on slide 21 of set 1,

           (z_1 + z_2) / |z_1 - z_2|  =  [ (1/√2)(z_1 + z_2) ] / √[ (1/2)(z_1 - z_2)² / 1 ]  ∼  t_1.
   Comments:
   • Many students failed to check the independence condition needed to establish a t_1
     distribution in (b). See the comments for question 2 for the potential perils of skipping
     this check. Other students did not show that z_1 + z_2 and z_1 - z_2 were jointly normally
     distributed; without this, zero covariance does not imply independence. A web search will
     provide counterexamples where x and y are both normally distributed with Cov(x, y) = 0, but
     x is not independent of y because [x, y]' is not bivariate normal.
   • Some students did a change of variable transformation or used a moment generating function;
     others showed that (z_1 + z_2)/|z_1 - z_2| is a ratio of independent standard normal random
     variables and hence Cauchy(0, 1), which is the same distribution as t_1.
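   A simulation sketch of part (b) (not part of the proof; the seed and simulation size are
   arbitrary, and quantiles are compared because a t_1/Cauchy variable has no finite moments):

       set.seed(510)
       z1 <- rnorm(1e5); z2 <- rnorm(1e5)
       r <- (z1 + z2) / abs(z1 - z2)
       quantile(r, c(.1, .25, .5, .75, .9))        # simulated quantiles
       qt(c(.1, .25, .5, .75, .9), df = 1)         # t_1 (= Cauchy(0, 1)) quantiles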
5. We are given y_1, . . . , y_n iid ∼ N(µ, σ²).

   (a) Using properties of matrix transposition,

           s²  =  (1/(n-1)) Σ_{i=1}^{n} (y_i - ȳ)²

               =  (1/(n-1)) (y - ȳ 1_{n×1})' (y - ȳ 1_{n×1})

               =  (1/(n-1)) ( y - (1/n) Σ_{i=1}^{n} y_i 1_{n×1} )' ( y - (1/n) Σ_{i=1}^{n} y_i 1_{n×1} )

               =  (1/(n-1)) ( y - 1_{n×1} (1/n) Σ_{i=1}^{n} y_i )' ( y - 1_{n×1} (1/n) Σ_{i=1}^{n} y_i )

               =  (1/(n-1)) ( y - (1/n) 1_{n×1} 1'_{n×1} y )' ( y - (1/n) 1_{n×1} 1'_{n×1} y )

               =  (1/(n-1)) ( [I_{n×n} - (1/n) 1_{n×1} 1'_{n×1}] y )' ( [I_{n×n} - (1/n) 1_{n×1} 1'_{n×1}] y )

               =  (1/(n-1)) y' [I_{n×n} - (1/n) 1_{n×1} 1'_{n×1}]' [I_{n×n} - (1/n) 1_{n×1} 1'_{n×1}] y

               =  y' { (1/(n-1)) [I_{n×n} - (1/n) 1_{n×1} 1'_{n×1}]' [I_{n×n} - (1/n) 1_{n×1} 1'_{n×1}] } y

               =  y' B y,

       where

           B  =  (1/(n-1)) [I_{n×n} - (1/n) 1_{n×1} 1'_{n×1}]' [I_{n×n} - (1/n) 1_{n×1} 1'_{n×1}].

       Notice that B = (1/(n-1)) (I_{n×n} - P_{1_{n×1}}), because

           (1/n) 1_{n×1} 1'_{n×1}  =  1_{n×1} (n^{-1}) 1'_{n×1}  =  1_{n×1} (1'_{n×1} 1_{n×1})^{-1} 1'_{n×1}  =  P_{1_{n×1}},

       and I_{n×n} - P_{1_{n×1}} is symmetric and idempotent, so that
       (I_{n×n} - P_{1_{n×1}})'(I_{n×n} - P_{1_{n×1}}) = I_{n×n} - P_{1_{n×1}}.

       Comments: As seen above, we do not need to assume any model for y (e.g., we do not
       need the GMMNE on slide 12 of set 2) to show the existence of B such that s² = y'By.
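       A numerical sketch of part (a) (the choice of n and the simulated y below are
       illustrative only):

           set.seed(510)
           n <- 8
           y <- rnorm(n, mean = 3, sd = 2)
           one <- matrix(1, n, 1)
           P1 <- one %*% solve(t(one) %*% one) %*% t(one)   # projection onto the span of 1_{n x 1}
           B <- (diag(n) - P1) / (n - 1)
           c(var(y), t(y) %*% B %*% y)                      # the two values agree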
   (b) From part (a), we have B = (1/(n-1)) (I_{n×n} - P_{1_{n×1}}). Since y_i iid ∼ N(µ, σ²), it
       follows that

           [y_1, . . . , y_n]'  =  y  ∼  N(µ, Σ),

       where µ = µ 1_{n×1} and Σ = σ² I_{n×n}. Clearly Σ is positive definite, assuming that σ² > 0:
       for any non-zero t = [t_1, . . . , t_n]' ∈ R^n (i.e., t ≠ 0_{n×1}), we have

           t'Σt  =  t' [σ² I_{n×n}] t  =  σ² t' I_{n×n} t  =  σ² t't  =  σ² (t_1² + · · · + t_n²)  >  0,

       since t ≠ 0_{n×1} implies t_i ≠ 0 for at least one i ∈ {1, . . . , n}.

       Set A = ((n-1)/σ²) B. By properties of rank and the results on slide 5 of set 2 that
       pertain to the rank of an orthogonal projection matrix,

           rank(A)  =  rank( ((n-1)/σ²) · (1/(n-1)) (I_{n×n} - P_{1_{n×1}}) )
                    =  rank( I_{n×n} - P_{1_{n×1}} )
                    =  n - rank(1_{n×1})
                    =  n - 1.

       As an orthogonal projection matrix, P_{1_{n×1}} is symmetric, and clearly I_{n×n} is
       symmetric. Hence A is also symmetric. We also have that AΣ is idempotent:

           AΣAΣ  =  [ ((n-1)/σ²) · (1/(n-1)) (I_{n×n} - P_{1_{n×1}}) σ² I_{n×n} ]
                      × [ ((n-1)/σ²) · (1/(n-1)) (I_{n×n} - P_{1_{n×1}}) σ² I_{n×n} ]
                 =  (I_{n×n} - P_{1_{n×1}})(I_{n×n} - P_{1_{n×1}})
                 =  I_{n×n} - P_{1_{n×1}}
                 =  ((n-1)/σ²) · (1/(n-1)) (I_{n×n} - P_{1_{n×1}}) σ² I_{n×n}
                 =  AΣ,

       since I_{n×n} - P_{1_{n×1}} is idempotent (this is easy to show using the fact that
       P_{1_{n×1}} is idempotent).

       We have now established all the ingredients that we need to apply the result on slide 18
       of set 1. Hence,

           y'Ay  =  ((n-1)/σ²) s²  ∼  χ²_{n-1}(µ'Aµ/2).

       The non-centrality parameter reduces to zero:

           µ'Aµ/2  =  (1/2) [µ 1_{n×1}]' ((n-1)/σ²) (1/(n-1)) (I_{n×n} - P_{1_{n×1}}) [µ 1_{n×1}]
                   =  (µ²/(2σ²)) 1'_{n×1} (I_{n×n} - P_{1_{n×1}}) 1_{n×1}
                   =  (µ²/(2σ²)) ( 1'_{n×1} I_{n×n} 1_{n×1} - 1'_{n×1} P_{1_{n×1}} 1_{n×1} )
                   =  (µ²/(2σ²)) ( 1'_{n×1} 1_{n×1} - 1'_{n×1} 1_{n×1} (1'_{n×1} 1_{n×1})^{-1} 1'_{n×1} 1_{n×1} )
                   =  (µ²/(2σ²)) ( 1'_{n×1} 1_{n×1} - 1'_{n×1} 1_{n×1} )
                   =  (µ²/(2σ²)) ( n - n )
                   =  0,

       proving the desired result that

           ((n-1)/σ²) s²  ∼  χ²_{n-1}.

       Comments: Many students didn't establish all the conditions listed on slide 18, such as
       that A is symmetric or that y is multivariate normal. A few students assumed the
       non-centrality parameter was zero without any mention of µ'Aµ/2, and others showed only
       minimal work in establishing that µ'Aµ/2 = 0.
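       A Monte Carlo sketch of part (b) (the values of n, µ, σ, and the number of replicates
       are arbitrary illustrative choices):

           set.seed(510)
           n <- 6; mu <- 10; sigma <- 3; nrep <- 1e5
           stat <- replicate(nrep, (n - 1) * var(rnorm(n, mu, sigma)) / sigma^2)
           quantile(stat, c(.1, .5, .9))            # simulated quantiles of (n-1)s^2/sigma^2
           qchisq(c(.1, .5, .9), df = n - 1)        # central chi-square quantiles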
6. (a) Consider a matrix

           A_{m×n}  =  [ a_11  a_12  · · ·  a_1n ]
                       [ a_21  a_22  · · ·  a_2n ]
                       [  ...   ...   ...   ... ]
                       [ a_m1  a_m2  · · ·  a_mn ],

       so that

           A'A  =  [ a_11  a_21  · · ·  a_m1 ] [ a_11  a_12  · · ·  a_1n ]
                   [ a_12  a_22  · · ·  a_m2 ] [ a_21  a_22  · · ·  a_2n ]
                   [  ...   ...   ...   ... ] [  ...   ...   ...   ... ]
                   [ a_1n  a_2n  · · ·  a_mn ] [ a_m1  a_m2  · · ·  a_mn ]

                =  [ Σ_{i=1}^{m} a_i1²       Σ_{i=1}^{m} a_i1 a_i2   · · ·  Σ_{i=1}^{m} a_i1 a_in ]
                   [ Σ_{i=1}^{m} a_i1 a_i2   Σ_{i=1}^{m} a_i2²       · · ·  Σ_{i=1}^{m} a_i2 a_in ]
                   [  ...                     ...                     ...    ...                  ]
                   [ Σ_{i=1}^{m} a_i1 a_in   Σ_{i=1}^{m} a_i2 a_in   · · ·  Σ_{i=1}^{m} a_in²     ].

       ⇒: ["only if" part] Suppose that A = 0_{m×n}. Clearly then A' = 0_{n×m}, which implies
       A'A = 0_{n×m} 0_{m×n} = 0_{n×n}.

       ⇐: ["if" part] Suppose that A'A = 0_{n×n}. This requires that the diagonal elements of
       A'A are all zero, that is, Σ_{i=1}^{m} a_ij² = 0 for j = 1, . . . , n. Consequently,
       a_ij = 0 for j = 1, . . . , n and i = 1, . . . , m, which implies A = 0_{m×n}.

       Thus, A = 0 ⇐⇒ A'A = 0.

       Comments: This question, as well as the next one, requires a proof of an "if and only if"
       statement. This means you need to consider both directions (both the "if" and "only if"
       parts) to prove the desired conclusion. While it is trivial to many that A = 0 =⇒
       A'A = 0, if you don't include this in your proof, you can't claim to have proven the
       desired result. Additionally, your proof needs to hold for a matrix A with any dimension,
       not just 2 × 2.
   (b) ⇐: ["if" part] Suppose XA = XB. This part of the proof is trivial: multiplying both
       sides by X' gives X'XA = X'XB.

       ⇒: ["only if" part] Conversely, suppose X'XA = X'XB. By properties of matrix algebra
       and the transpose, as well as the result of part (a), we have

           X'XA = X'XB  =⇒  X'XA - X'XB = 0
                        =⇒  X'X(A - B) = 0
                        =⇒  (A - B)'X'X(A - B) = 0
                        =⇒  [X(A - B)]'[X(A - B)] = 0
                        =⇒  X(A - B) = 0                      [by part (a)]
                        =⇒  XA - XB = 0
                        =⇒  XA = XB.

       Therefore, X'XA = X'XB ⇐⇒ XA = XB.

       Comments: As in part (a), a proof of an "if and only if" statement requires both
       directions to be complete. Additionally, note that depending on the matrix dimensions,
       we may have

           [X(A - B)]'[X(A - B)] = 0_{n×n}  ≠  0_{m×n} = X(A - B),

       where m, n ∈ {1, 2, . . .}.
   (c) Let (X'X)⁻ be any generalized inverse of X'X, which by definition means

           X'X(X'X)⁻X'X = X'X,

       where X has dimension m × n, say. Put A = (X'X)⁻X'X and B = I_{n×n}, so that

           X'X [(X'X)⁻X'X]  =  X'X  =  X'X I_{n×n}   =⇒   X'XA = X'XB.

       By the result of part (b), it follows that XA = XB, and hence

           X(X'X)⁻X'X = X.
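       A numerical sketch of part (c), using the Moore-Penrose inverse returned by MASS::ginv()
       as one particular generalized inverse (the rank-deficient X below is an arbitrary
       illustrative design matrix):

           library(MASS)
           X <- cbind(1, c(1, 1, 0, 0), c(0, 0, 1, 1))   # 4 x 3, rank 2, so X'X is singular
           G <- ginv(t(X) %*% X)                         # a generalized inverse of X'X
           all.equal(X %*% G %*% t(X) %*% X, X)          # TRUE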
   (d) Let A be any symmetric matrix and G be any generalized inverse of A. By definition,

           AGA = A.

       Now, transpose both sides and use the fact that A' = A by symmetry:

           (AGA)' = A'   =⇒   A'G'A' = A'   =⇒   AG'A = A.

       Hence, G' is a generalized inverse of A.
   (e) Let G be any generalized inverse of X'X. Notice that X'X is symmetric, so by part (d),
       G' is also a generalized inverse of X'X. The result of part (c) holds for any generalized
       inverse of X'X, and hence holds using G'. Using the result of part (c) with G' and then
       taking transposes gives

           XG'X'X = X   =⇒   (XG'X'X)' = X'
                        =⇒   X' [X']' [G']' X' = X'
                        =⇒   X'XGX' = X'.

       Because we chose G to be any generalized inverse of X'X,

           X'X(X'X)⁻X' = X'.

       Comments: We could have (X'X)⁻ ≠ [(X'X)⁻]', so it is important that (c) and (d) are used
       at the right steps in your proof so it is clear that you aren't trying to say
       (X'X)⁻ = [(X'X)⁻]'. On a related note, we may also have [(X'X)⁻]' ≠ [(X'X)']⁻.
   (f) This part requires two proofs that P_X is idempotent for full credit.

       1. By part (c), X(X'X)⁻X'X = X, so

              P_X P_X  =  [X(X'X)⁻X'X] (X'X)⁻X'  =  X(X'X)⁻X'  =  P_X.

       2. By part (e), X'X(X'X)⁻X' = X', so

              P_X P_X  =  X(X'X)⁻ [X'X(X'X)⁻X']  =  X(X'X)⁻X'  =  P_X.
   (g) Let G_1 and G_2 be any generalized inverses of X'X. By parts (c) and (e), we have

           XG_1X'  =  XG_1 [X'XG_2X']        [part (e) holds for any generalized inverse of X'X,
                                              so X'XG_2X' = X']
                   =  [XG_1X'X] G_2X'
                   =  XG_2X'                 [part (c) holds for any generalized inverse of X'X,
                                              so XG_1X'X = X].

       Comments: A few students stated that G_1 ≠ G_2 at the beginning of their proof, and
       hence did not prove that the desired result holds for any two generalized inverses of
       X'X, which was your task. Others tried to use the fact that P_X is the same matrix
       regardless of which generalized inverse of X'X is used, but this is what we are trying
       to show.
   (h) Let (X'X)⁻ be any generalized inverse of X'X. We know that X'X is a symmetric matrix,
       so the result of part (d) says that if (X'X)⁻ is a generalized inverse of X'X, then
       [(X'X)⁻]' is a generalized inverse of X'X. The result of part (g) then establishes that
       X(X'X)⁻X' = X[(X'X)⁻]'X'. Hence, these results and properties of the matrix transpose
       give

           P_X'  =  [X(X'X)⁻X']'
                 =  [X']' [(X'X)⁻]' X'
                 =  X [(X'X)⁻]' X'
                 =  X(X'X)⁻X'              [by parts (d) and (g), as explained above]
                 =  P_X.

       Comments: It is important to use parts (d) and (g) at the right steps in your proof so it
       is clear that you aren't trying to say (X'X)⁻ = [(X'X)⁻]'.
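       A numerical sketch of parts (g) and (h): two different generalized inverses of X'X give
       the same P_X, and P_X is symmetric. The second generalized inverse is built from the
       standard construction G_2 = G_1 + U - G_1(X'X)U(X'X)G_1, which satisfies the generalized
       inverse definition for any conformable U (the particular X and U below are arbitrary
       illustrative choices):

           library(MASS)
           set.seed(510)
           X   <- cbind(1, c(1, 1, 0, 0), c(0, 0, 1, 1))     # rank-deficient example design
           XtX <- t(X) %*% X
           G1  <- ginv(XtX)                                  # Moore-Penrose generalized inverse
           U   <- matrix(rnorm(9), 3, 3)
           G2  <- G1 + U - G1 %*% XtX %*% U %*% XtX %*% G1   # another generalized inverse
           all.equal(XtX %*% G2 %*% XtX, XtX)                # TRUE: G2 is a generalized inverse
           PX1 <- X %*% G1 %*% t(X)
           PX2 <- X %*% G2 %*% t(X)
           all.equal(PX1, PX2)                               # TRUE: P_X does not depend on the choice
           all.equal(PX1, t(PX1))                            # TRUE: P_X is symmetric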
7. Let X be an n × p matrix and y be an n × 1 vector. Suppose that z ∈ C(X) and z ≠ P_X y,
   which implies P_X y - z ≠ 0_{n×1}. Observe that z ∈ C(X) implies that P_X z = z. Using this
   result and the fact that P_X is symmetric and idempotent, it follows that

       (y - P_X y)'(P_X y - z)  =  (y' - [P_X y]')(P_X y - z)
                                =  (y' - y'P_X')(P_X y - z)
                                =  (y' - y'P_X)(P_X y - z)
                                =  y'P_X y - y'z - y'P_X P_X y + y'P_X z
                                =  y'P_X y - y'z - y'P_X y + y'P_X z
                                =  -y'z + y'z
                                =  0.

   Now that we have (y - P_X y)'(P_X y - z) = 0 and P_X y - z ≠ 0, we can use the same
   argument provided in the homework with a = y - P_X y and b = P_X y - z:

       ||y - z||²  =  ||y - P_X y + P_X y - z||²
                   =  (y - P_X y + P_X y - z)'(y - P_X y + P_X y - z)
                   =  [(y - P_X y)' + (P_X y - z)'](y - P_X y + P_X y - z)
                   =  (y - P_X y)'(y - P_X y) + 2(y - P_X y)'(P_X y - z) + (P_X y - z)'(P_X y - z)
                   =  ||y - P_X y||² + ||P_X y - z||²
                   >  ||y - P_X y||².

   Hence, ||y - z|| > ||y - P_X y||, which says that P_X y is the unique point in C(X) that is
   closest to y in Euclidean distance.

   Comments: You can instead show that (y - P_X y)'(P_X y - z) = 0 by orthogonality, but as
   this is a proof, you need to provide sufficient reasoning or work to establish this.
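   A numerical sketch of the result (the design matrix, response, and the candidate points in
   C(X) below are arbitrary illustrative choices):

       library(MASS)
       set.seed(510)
       X <- cbind(1, c(1, 1, 0, 0), c(0, 0, 1, 1))
       y <- rnorm(4)
       PXy <- X %*% ginv(t(X) %*% X) %*% t(X) %*% y             # the projection P_X y
       sqrt(sum((y - PXy)^2))                                   # distance from y to P_X y
       min(replicate(1000, sqrt(sum((y - X %*% rnorm(3))^2))))  # never smaller than the above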