STAT 510 Homework 3 Solutions Spring 2016

1. [8pts] There are infinitely many possible examples. One example can be given using
\[
A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}
\quad\text{and}\quad
G = \begin{bmatrix} 1 & 2 \\ 3 & 0 \end{bmatrix}.
\]
We can verify that G meets the definition of a generalized inverse of A:
\[
AGA = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} 1 & 2 \\ 3 & 0 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} = A.
\]
Clearly A is symmetric (i.e., A = A'), yet G is not symmetric:
\[
G = \begin{bmatrix} 1 & 2 \\ 3 & 0 \end{bmatrix}
\neq \begin{bmatrix} 1 & 3 \\ 2 & 0 \end{bmatrix} = G'.
\]
Hence, a generalized inverse of a symmetric matrix need not be symmetric.
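This can be double-checked numerically in R; a minimal sketch using base matrix operations:

# Sketch: verify numerically that G is a nonsymmetric generalized inverse of A.
A <- matrix(c(1, 0,
              0, 0), nrow = 2, byrow = TRUE)
G <- matrix(c(1, 2,
              3, 0), nrow = 2, byrow = TRUE)
A %*% G %*% A                 # returns A, so AGA = A
isTRUE(all.equal(A, t(A)))    # TRUE: A is symmetric
isTRUE(all.equal(G, t(G)))    # FALSE: G is not symmetric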
2. We are given that for any two matrices U and V for which the product UV is defined,
\[
\operatorname{rank}(UV) \le \min\{\operatorname{rank}(U),\ \operatorname{rank}(V)\}. \tag{1}
\]
This says rank(UV) is no larger than the smaller of rank(U) and rank(V), which implies
\[
\operatorname{rank}(UV) \le \operatorname{rank}(U) \quad\text{and}\quad \operatorname{rank}(UV) \le \operatorname{rank}(V).
\]
(a) [8pts] Let X be any matrix. Then,
\[
\begin{aligned}
\operatorname{rank}(X'X)
&\le \min\{\operatorname{rank}(X'),\ \operatorname{rank}(X)\} && \text{by (1)} \\
&\le \operatorname{rank}(X) \\
&= \operatorname{rank}(P_X X) && \text{since } P_X X = X \\
&= \operatorname{rank}(X(X'X)^- X'X) && \text{since } P_X = X(X'X)^- X' \\
&\le \min\{\operatorname{rank}(X(X'X)^-),\ \operatorname{rank}(X'X)\} && \text{by (1)} \\
&\le \operatorname{rank}(X'X).
\end{aligned}
\]
The above says that rank(X'X) is simultaneously no larger and no smaller than rank(X), that is,
\[
\operatorname{rank}(X) \le \operatorname{rank}(X'X) \quad\text{and}\quad \operatorname{rank}(X) \ge \operatorname{rank}(X'X),
\]
which implies that
\[
\operatorname{rank}(X) = \operatorname{rank}(X'X).
\]
Comments: An alternative solution can be given by appealing to some linear algebra facts. Some students approached this problem by stating that X and X'X must have the same null space because
\[
Xz = 0 \iff X'Xz = 0. \tag{2}
\]
(The forward implication is immediate; conversely, X'Xz = 0 implies z'X'Xz = \|Xz\|^2 = 0, hence Xz = 0.) However, a few additional steps are needed for this to establish the desired result. First, why does (2) imply that X and X'X have the same null space? For an m × n real-valued matrix A, its null space, Null(A), is defined as
\[
\operatorname{Null}(A) = \{ z \in \mathbb{R}^n : Az = 0 \}.
\]
From the definition, clearly (2) implies Null(X) = Null(X'X). Now, how does this imply equality of ranks? The rank–nullity theorem says that for any m × n matrix A,
\[
\operatorname{rank}(A) + \dim \operatorname{Null}(A) = n.
\]
Note that if X is m × n, then X'X is n × n, so the rank–nullity theorem says
\[
\operatorname{rank}(X) + \dim \operatorname{Null}(X) = n = \operatorname{rank}(X'X) + \dim \operatorname{Null}(X'X).
\]
Since Null(X) = Null(X'X), the two sets must have the same dimension, implying that rank(X) = rank(X'X).
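As a quick numerical illustration of part (a), here is a sketch in R; the rank-deficient X below is an arbitrary choice:

# rank(X) = rank(X'X) for a 4 x 3 matrix of rank 2 (column 1 = column 2 + column 3).
X <- cbind(1, c(1, 1, 0, 0), c(0, 0, 1, 1))
qr(X)$rank              # 2
qr(crossprod(X))$rank   # also 2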
(b) [8pts] Let X be any matrix. Then,
\[
\begin{aligned}
\operatorname{rank}(X)
&= \operatorname{rank}(P_X X) && \text{since } P_X X = X \\
&\le \min\{\operatorname{rank}(P_X),\ \operatorname{rank}(X)\} && \text{by (1)} \\
&\le \operatorname{rank}(P_X) \\
&= \operatorname{rank}\left(X(X'X)^- X'\right) && \text{since } P_X = X(X'X)^- X' \\
&\le \min\{\operatorname{rank}(X),\ \operatorname{rank}((X'X)^- X')\} && \text{by (1)} \\
&\le \operatorname{rank}(X).
\end{aligned}
\]
Inequality in both directions implies equality; therefore,
\[
\operatorname{rank}(X) = \operatorname{rank}(P_X).
\]
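Part (b) can likewise be checked numerically; a sketch using MASS::ginv() as one particular generalized inverse of X'X:

# rank(P_X) = rank(X) for the same rank-deficient X as above.
library(MASS)
X  <- cbind(1, c(1, 1, 0, 0), c(0, 0, 1, 1))
PX <- X %*% ginv(crossprod(X)) %*% t(X)   # P_X = X (X'X)^- X'
qr(PX)$rank                               # 2, matching rank(X)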
(c) [9pts] Let X be an n × p matrix. Suppose C is a q × p matrix of rank q and that there exists a matrix A such that C = AX.
First, let us verify that C(X'X)⁻C' is q × q. Because X is an n × p matrix, X'X is a p × p matrix. Consequently, any generalized inverse (X'X)⁻ of X'X is also p × p. Then,
\[
\underbrace{C}_{q \times p}\ \underbrace{(X'X)^-}_{p \times p}\ \underbrace{C'}_{p \times q}
\]
is clearly a q × q matrix.
Next, we will show that C(X'X)⁻C' has rank q. In order to do so, I will use the following fact: for any matrix X,
\[
\operatorname{rank}(X) = \operatorname{rank}(X'), \tag{3}
\]
because the rank of a matrix can be defined as the number of linearly independent rows or columns.
Now, consider
\[
\begin{aligned}
\operatorname{rank}\left(C(X'X)^- C'\right)
&\le \min\{\operatorname{rank}(C),\ \operatorname{rank}((X'X)^- C')\} && \text{by (1)} \\
&\le \operatorname{rank}(C) \\
&= q && \text{by assumption} \\
&= \operatorname{rank}(C) \\
&= \operatorname{rank}(AX) \\
&= \operatorname{rank}(A P_X X) && \text{since } P_X X = X \\
&\le \min\{\operatorname{rank}(A P_X),\ \operatorname{rank}(X)\} && \text{by (1)} \\
&\le \operatorname{rank}(A P_X) \\
&= \operatorname{rank}\left([A P_X]'\right) && \text{by (3)} \\
&= \operatorname{rank}(P_X' A') \\
&= \operatorname{rank}\left([P_X' A']' P_X' A'\right) && \text{by part (a) using } X = P_X' A' \\
&= \operatorname{rank}(A P_X P_X' A') \\
&= \operatorname{rank}(A P_X P_X A') && P_X \text{ is symmetric} \\
&= \operatorname{rank}(A P_X A') && P_X \text{ is idempotent} \\
&= \operatorname{rank}\left(A [X(X'X)^- X'] A'\right) && \text{since } P_X = X(X'X)^- X' \\
&= \operatorname{rank}\left([AX](X'X)^- [X'A']\right) \\
&= \operatorname{rank}\left([AX](X'X)^- [AX]'\right) \\
&= \operatorname{rank}\left(C(X'X)^- C'\right).
\end{aligned}
\]
Since this chain begins and ends with rank(C(X'X)⁻C'), every inequality must hold with equality; in particular, rank(C(X'X)⁻C') = q. Therefore, C(X'X)⁻C' is a q × q matrix of rank q.
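To illustrate part (c) numerically, here is a sketch; the particular X and A below are arbitrary choices satisfying the assumptions:

# C = AX has full row rank q = 2, so C (X'X)^- C' should be 2 x 2 and nonsingular.
library(MASS)
X <- cbind(1, c(1, 1, 0, 0), c(0, 0, 1, 1))   # n = 4, p = 3, rank 2
A <- rbind(c(1, 0, 0, 0),
           c(0, 0, 1, 0))
C <- A %*% X                                  # qr(C)$rank is 2, so rank(C) = q
M <- C %*% ginv(crossprod(X)) %*% t(C)
dim(M)                                        # 2 2
qr(M)$rank                                    # 2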
(d) [8pts] Suppose X is an n × p matrix and A is a matrix with n columns satisfying A P_X = A. Then,
\[
\begin{aligned}
\operatorname{rank}(A)
&= \operatorname{rank}(A P_X) \\
&= \operatorname{rank}\left(A X (X'X)^- X'\right) && \text{(def. of } P_X\text{)} \\
&\le \min\{\operatorname{rank}(AX),\ \operatorname{rank}((X'X)^- X')\} && \text{by (1)} \\
&\le \operatorname{rank}(AX) \\
&\le \min\{\operatorname{rank}(A),\ \operatorname{rank}(X)\} && \text{by (1)} \\
&\le \operatorname{rank}(A).
\end{aligned}
\]
Inequality in both directions implies equality; therefore,
rank(AX) = rank(A).
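For a numerical check of part (d), note that taking A = X' guarantees A P_X = (P_X X)' = X' = A; a sketch:

library(MASS)
X  <- cbind(1, c(1, 1, 0, 0), c(0, 0, 1, 1))
PX <- X %*% ginv(crossprod(X)) %*% t(X)
A  <- t(X)
isTRUE(all.equal(A %*% PX, A))   # TRUE: A P_X = A
qr(A %*% X)$rank                 # 2
qr(A)$rank                       # 2, so rank(AX) = rank(A)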
3. Suppose y = Xβ + ε, where ε ∼ N(0, σ²I) for some unknown σ² > 0. Let ŷ = P_X y. By the results on multivariate normal distributions from slide set 1 (“Preliminaries”), we have
\[
y \sim N(X\beta,\ \sigma^2 I).
\]
(a) [12pts] Notice that we want to find the distribution of a linear transformation of y:
\[
\begin{bmatrix} \hat{y} \\ y - \hat{y} \end{bmatrix}
= \begin{bmatrix} P_X y \\ y - P_X y \end{bmatrix}
= \begin{bmatrix} P_X \\ I - P_X \end{bmatrix} y. \tag{4}
\]
Together, slides 13–14 of the first slide set (“Preliminaries”) imply that a linear transformation, say Ax + b, of a normal random variable x is also normal:
\[
x \sim N(\mu, \Sigma) \implies Ax + b \sim N(A\mu + b,\ A\Sigma A').
\]
The mean of the linear transformation in (4), since P_X X = X, is
\[
\begin{aligned}
E\left(\begin{bmatrix} P_X \\ I - P_X \end{bmatrix} y\right)
&= \begin{bmatrix} P_X \\ I - P_X \end{bmatrix} E(y)
 = \begin{bmatrix} P_X \\ I - P_X \end{bmatrix} X\beta
 = \begin{bmatrix} P_X X\beta \\ (I - P_X) X\beta \end{bmatrix} \\
&= \begin{bmatrix} P_X X\beta \\ I X\beta - P_X X\beta \end{bmatrix}
 = \begin{bmatrix} X\beta \\ X\beta - X\beta \end{bmatrix}
 = \begin{bmatrix} X\beta \\ 0 \end{bmatrix}.
\end{aligned}
\]
The variance of (4), since P_X is symmetric and idempotent, is
\[
\begin{aligned}
\operatorname{Var}\left(\begin{bmatrix} P_X \\ I - P_X \end{bmatrix} y\right)
&= \begin{bmatrix} P_X \\ I - P_X \end{bmatrix} \operatorname{Var}(y) \begin{bmatrix} P_X \\ I - P_X \end{bmatrix}'
 = \begin{bmatrix} P_X \\ I - P_X \end{bmatrix} \operatorname{Var}(y) \begin{bmatrix} P_X' & (I - P_X)' \end{bmatrix} \\
&= \begin{bmatrix} P_X \\ I - P_X \end{bmatrix} \sigma^2 I \begin{bmatrix} P_X' & (I - P_X)' \end{bmatrix}
 = \sigma^2 \begin{bmatrix} P_X P_X' & P_X (I - P_X)' \\ (I - P_X) P_X' & (I - P_X)(I - P_X)' \end{bmatrix} \\
&= \sigma^2 \begin{bmatrix} P_X P_X & P_X (I - P_X) \\ (I - P_X) P_X & (I - P_X)(I - P_X) \end{bmatrix} && (P_X \text{ is symmetric}) \\
&= \sigma^2 \begin{bmatrix} P_X & P_X - P_X \\ P_X - P_X & I - P_X - P_X + P_X \end{bmatrix} && (P_X \text{ is idempotent}) \\
&= \sigma^2 \begin{bmatrix} P_X & 0 \\ 0 & I - P_X \end{bmatrix}.
\end{aligned}
\]
As a linear transformation of a multivariate normal random variable, it follows that
\[
\begin{bmatrix} \hat{y} \\ y - \hat{y} \end{bmatrix}
\sim N\left( \begin{bmatrix} X\beta \\ 0 \end{bmatrix},\
\sigma^2 \begin{bmatrix} P_X & 0 \\ 0 & I - P_X \end{bmatrix} \right).
\]
Comments: it is important to notice that the expected values of y and ŷ depend on the parameter
β and not the estimate β̂, which is a random variable. Similarly, the variances do not depend on the
estimate σ̂ 2 . Additionally, some students did not give the covariances or only stated that they were zero
without any derivation (the distribution of a multivariate normal random variable is not fully specified
without the covariances).
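A small simulation can make the block structure concrete; this is a sketch, and the full-rank X, β, and σ below are arbitrary choices:

# Simulate many draws of y and check E(yhat), E(y - yhat), and the zero covariance blocks.
set.seed(1)
n <- 5; X <- cbind(1, 1:n); beta <- c(2, -1); sigma <- 0.5
PX <- X %*% solve(crossprod(X)) %*% t(X)   # X has full column rank, so solve() suffices
mu <- as.vector(X %*% beta)
Y  <- mu + matrix(rnorm(n * 1e4, sd = sigma), nrow = n)  # each column is one draw of y
Yhat <- PX %*% Y; R <- Y - Yhat
rowMeans(Yhat)                 # approximately X beta
rowMeans(R)                    # approximately 0
max(abs(cov(t(Yhat), t(R))))   # near 0: Cov(yhat, y - yhat) = 0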
(b) [12pts] Again using the fact that P_X is symmetric and idempotent, we see that ŷ'ŷ is a quadratic form:
\[
\hat{y}'\hat{y} = [P_X y]' P_X y = y' P_X' P_X y = y' P_X P_X y = y' P_X y,
\]
where y ∼ N(Xβ, σ²I). Recall the results about quadratic forms in the first slide set (“Preliminaries”) on slide 18. To apply these, we want to find a symmetric matrix A such that AΣ is idempotent for Σ ≡ Var(y). We can’t use P_X as our A matrix because the σ² doesn’t cancel:
\[
P_X \operatorname{Var}(y) = P_X \sigma^2 I = \sigma^2 P_X.
\]
Instead using A = P_X/σ², which is symmetric, gives
\[
A\Sigma = A\operatorname{Var}(y) = \frac{P_X}{\sigma^2}\,\sigma^2 I = P_X,
\]
which we know is idempotent.
It is easy to verify that Σ = σ²I is positive definite since σ² > 0 (see homework 1 solutions). We can use the result of problem 2(b) to determine the rank of A:
\[
\operatorname{rank}(A) = \operatorname{rank}(P_X/\sigma^2) = \operatorname{rank}(P_X) = \operatorname{rank}(X).
\]
Then,
\[
\frac{1}{\sigma^2}\,\hat{y}'\hat{y} = y'\,\frac{P_X}{\sigma^2}\,y \sim \chi^2_{\operatorname{rank}(X)}\!\left([X\beta]' A X\beta / 2\right),
\]
where the noncentrality parameter simplifies to
\[
[X\beta]' A X\beta / 2
= [X\beta]'\,\frac{P_X}{\sigma^2}\,X\beta \cdot \frac{1}{2}
= \frac{1}{2\sigma^2}\,\beta' X' P_X X\beta
= \frac{1}{2\sigma^2}\,\beta' X' X\beta.
\]
Therefore, we end up with a scaled noncentral chi-square random variable on rank(X) degrees of freedom:
\[
\hat{y}'\hat{y} \sim \sigma^2 \chi^2_{\operatorname{rank}(X)}\!\left(\frac{\beta' X' X\beta}{2\sigma^2}\right).
\]
Comments: some students used results that apply to the sum of squared independent normal random variables with variance one (e.g., the result on slide 15 of set 1; this assumes A = Σ = I). Others tried to find the distribution of the quadratic form ŷ'(I/σ²)ŷ rather than simplifying to y'(P_X/σ²)y; unfortunately this doesn’t work because Var(ŷ) = σ²P_X may not be positive definite.
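The scaled noncentral chi-square distribution in part (b) can also be checked by simulation. A sketch follows; note that R’s pchisq()/rchisq() parameterize the noncentrality as ncp = 2θ when θ = μ'Aμ/2 is the convention used in the slides:

set.seed(2)
n <- 5; X <- cbind(1, 1:n); beta <- c(2, -1); sigma <- 0.5
PX <- X %*% solve(crossprod(X)) %*% t(X)
mu <- as.vector(X %*% beta)
Y  <- mu + matrix(rnorm(n * 1e4, sd = sigma), nrow = n)
q  <- colSums(Y * (PX %*% Y)) / sigma^2        # y' P_X y / sigma^2 for each draw
theta <- sum(mu^2) / (2 * sigma^2)             # beta' X'X beta / (2 sigma^2)
mean(q)                                        # approximately rank(X) + 2 * theta
ks.test(q, pchisq, df = qr(X)$rank, ncp = 2 * theta)  # should not reject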
4. For this problem, let µijk denote the mean for filler type i = 1, 2, surface treatment j = 1, 2,
and filler proportion k = 1, 2, 3 (corresponding to 25%, 50%, and 75%, respectively).
The R code below fits the cell means model to these data:
> # Read in Dr. Nettleton's functions in order to use test() and estimate().
> source('http://www.public.iastate.edu/~dnett/S510/04TwoFactorCellMeans.R')
> # Read in data.
> webloc <- c('http://www.public.iastate.edu/~dnett/S510/FabricLoss.txt')
> dat <- read.table(paste(webloc), header = TRUE)
> # Convert covariates to factors.
> dat$S <- factor(dat$surface); dat$F <- factor(dat$filler); dat$p <- factor(dat$p)
> # Fit cell means model.
> fit <- lm(y ~ F:S:p + 0, data = dat)
From the output below, notice that R is using the parameterization
\[
\beta = [\mu_{111}, \mu_{211}, \mu_{121}, \mu_{221}, \mu_{112}, \mu_{212}, \mu_{122}, \mu_{222}, \mu_{113}, \mu_{213}, \mu_{123}, \mu_{223}]'. \tag{5}
\]
> # Look at the vector of estimates.
> coef(fit)
F1:S1:p25 F2:S1:p25 F1:S2:p25 F2:S2:p25 F1:S1:p50 F2:S1:p50
    201.0     213.0     164.0     148.5     237.0     233.5
F1:S2:p50 F2:S2:p50 F1:S1:p75 F2:S1:p75 F1:S2:p75 F2:S2:p75
    187.5     113.5     267.0     234.5     232.0     143.5
Depending on how you called lm(), you may not have the same parameterization of β as in (5), and hence may require different C matrices in the following problems.
(a) From the output below,
\[
\hat{\mu}_{121} = 164.00 \quad\text{and}\quad \operatorname{se}(\hat{\mu}_{121}) = 11.59.
\]
> # 4a: mean and se for F1, S2, 25% filler.
> summary(fit)

Call:
lm(formula = y ~ F:S:p + 0, data = dat)

Residuals:
    Min      1Q  Median      3Q     Max
-26.000  -9.125   0.000   9.125  26.000

Coefficients:
          Estimate Std. Error t value Pr(>|t|)
F1:S1:p25   201.00      11.59  17.340 7.33e-10 ***
F2:S1:p25   213.00      11.59  18.375 3.74e-10 ***
F1:S2:p25   164.00      11.59  14.148 7.57e-09 ***
F2:S2:p25   148.50      11.59  12.811 2.33e-08 ***
F1:S1:p50   237.00      11.59  20.445 1.08e-10 ***
F2:S1:p50   233.50      11.59  20.143 1.28e-10 ***
F1:S2:p50   187.50      11.59  16.175 1.64e-09 ***
F2:S2:p50   113.50      11.59   9.791 4.50e-07 ***
F1:S1:p75   267.00      11.59  23.033 2.67e-11 ***
F2:S1:p75   234.50      11.59  20.229 1.22e-10 ***
F1:S2:p75   232.00      11.59  20.014 1.38e-10 ***
F2:S2:p75   143.50      11.59  12.379 3.42e-08 ***
(b) From the code and output below (which uses Dr. Nettleton’s estimate() function),
\[
\hat{\bar{\mu}}_{1\cdot\cdot} = 214.75 \quad\text{and}\quad \hat{\bar{\mu}}_{2\cdot\cdot} = 181.08.
\]

> # 4b: LSMEANS for F1 and F2.
> C.b <- c(rep(c(1,0), 6),   # F1
+          rep(c(0,1), 6))   # F2
> C.b <- matrix(C.b / 6, nrow = 2, byrow = TRUE)
> estimate(fit, C.b)
            c1        c2        c3        c4        c5        c6
[1,] 0.1666667 0.0000000 0.1666667 0.0000000 0.1666667 0.0000000
[2,] 0.0000000 0.1666667 0.0000000 0.1666667 0.0000000 0.1666667
            c7        c8        c9       c10       c11       c12
[1,] 0.1666667 0.0000000 0.1666667 0.0000000 0.1666667 0.0000000
[2,] 0.0000000 0.1666667 0.0000000 0.1666667 0.0000000 0.1666667
     estimate       se 95% Conf.  limits
[1,] 214.7500 4.732424  204.4389 225.0611
[2,] 181.0833 4.732424  170.7723 191.3944
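These lsmeans can also be recovered by hand from the fitted cell means; a quick check (a sketch, assuming the fit object created above):

# Each filler-type lsmean is the unweighted average of its six cell mean estimates.
mean(coef(fit)[grepl("F1", names(coef(fit)))])   # 214.75
mean(coef(fit)[grepl("F2", names(coef(fit)))])   # 181.0833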
(c) From the code and output below (which uses Dr. Nettleton’s estimate() function),
\[
\hat{\bar{\mu}}_{\cdot 12} = 235.25.
\]

> # 4c-d. LSMEAN and se for S1/p50.
> C.c <- c(rep(0, 4), 1, 1, rep(0, 6))
> C.c <- matrix(C.c / 2, nrow = 1)
> estimate(fit, C.c)
     c1 c2 c3 c4  c5  c6 c7 c8 c9 c10 c11 c12 estimate       se 95% Conf.  limits
[1,]  0  0  0  0 0.5 0.5  0  0  0   0   0   0   235.25 8.196798 217.3907 253.1093
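Again, this is just the average of the two relevant cell mean estimates (a sketch, assuming the fit object created above):

mean(coef(fit)[c("F1:S1:p50", "F2:S1:p50")])   # (237.0 + 233.5) / 2 = 235.25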
(d) From the output in part (c),
\[
\operatorname{se}(\hat{\bar{\mu}}_{\cdot 12}) = 8.20.
\]
(e) The test for filler type main effects is given by
\[
H_0 : \bar{\mu}_{1\cdot\cdot} - \bar{\mu}_{2\cdot\cdot} = 0.
\]
The code below (which uses Dr. Nettleton’s test() function) gives F = 25.30 and
p = 0.0003. These data provide strong evidence that there are filler type main effects.
That is, averaging over surface treatment and proportion of filler, there is strong evidence
that filler type F1 differs from F2 in preventing fabric loss in abrasive tests.
> # 4e. Filler type main effects.
> C.e <- matrix(rep(c(1, -1), 6), nrow = 1)
> test(fit, C.e) # Note d = 0 by default.
$Fstat
[1] 25.30481
$pvalue
[1] 0.0002939847
Comments: we are specifically interested in main effects here, not any kind of effect. Hence, comparing a
reduced cell-means model with only the factors surface treatment and filler proportion to a full cell-means
model with all three factors does not test for filler type main effects.
(f) The test for three-way interactions is given by
\[
H_0 : \begin{bmatrix}
\mu_{111} - \mu_{121} - \mu_{211} + \mu_{221} - \mu_{112} + \mu_{122} + \mu_{212} - \mu_{222} \\
\mu_{111} - \mu_{121} - \mu_{211} + \mu_{221} - \mu_{113} + \mu_{123} + \mu_{213} - \mu_{223}
\end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \end{bmatrix}.
\]
The code below (which uses Dr. Nettleton’s test() function) gives F = 0.89 and p =
0.44. These data do not provide evidence of three-way interactions among filler type,
surface treatment, and filler proportion.
> # 4f. Three-factor interaction.
> C.f <- c(1, -1, -1, 1, -1, 1, 1, -1, 0, 0, 0, 0,
+          1, -1, -1, 1, 0, 0, 0, 0, -1, 1, 1, -1)
> C.f <- matrix(C.f, nrow = 2, byrow = TRUE)
> test(fit, C.f)
$Fstat
[1] 0.8903876
$pvalue
[1] 0.4359589
Comments: a large p-value (i.e., p > α) suggests that it is possible that there are no three-way interactions, but it does not say there are no three-way interactions. Why? There could be three-way interactions, but they are too small (relative to the variability in the study) to detect.
(g) The test for two-way interactions between filler type and filler proportion is given by
\[
H_0 : \begin{bmatrix}
\bar{\mu}_{1\cdot 1} - \bar{\mu}_{1\cdot 2} - \bar{\mu}_{2\cdot 1} + \bar{\mu}_{2\cdot 2} \\
\bar{\mu}_{1\cdot 1} - \bar{\mu}_{1\cdot 3} - \bar{\mu}_{2\cdot 1} + \bar{\mu}_{2\cdot 3}
\end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \end{bmatrix}.
\]
The code below (which uses Dr. Nettleton’s test() function) gives F = 6.57 and
p = 0.012. These data provide evidence that there are two-way interactions between
filler type and filler proportion.
> # 4g. Two-factor interaction between filler type and proportion.
> C.g <- c(1, -1, 1, -1, -1, 1, -1, 1, 0, 0, 0, 0,
+          1, -1, 1, -1, 0, 0, 0, 0, -1, 1, -1, 1)
> C.g <- matrix(C.g, nrow = 2, byrow = TRUE)
> test(fit, C.g)
$Fstat
[1] 6.565736
$pvalue
[1] 0.01185168