STAT 510 Homework 3 Solutions
Spring 2016

1. [8pts] There are infinitely many possible examples. One example can be given using
\[
A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}
\quad \text{and} \quad
G = \begin{bmatrix} 1 & 2 \\ 3 & 0 \end{bmatrix}.
\]
We can verify that $G$ meets the definition of a generalized inverse of $A$:
\[
AGA = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} 1 & 2 \\ 3 & 0 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} = A.
\]
Clearly $A$ is symmetric (i.e., $A = A'$), yet $G$ is not symmetric:
\[
G = \begin{bmatrix} 1 & 2 \\ 3 & 0 \end{bmatrix}
\neq \begin{bmatrix} 1 & 3 \\ 2 & 0 \end{bmatrix} = G'.
\]
Hence, a generalized inverse of a symmetric matrix need not be symmetric.

2. We are given that for any two matrices $U$ and $V$ for which the product $UV$ is defined,
\[
\operatorname{rank}(UV) \leq \min\{\operatorname{rank}(U), \operatorname{rank}(V)\}. \tag{1}
\]
This says that $\operatorname{rank}(UV)$ is no larger than the smaller of the two quantities $\operatorname{rank}(U)$ and $\operatorname{rank}(V)$, which implies both $\operatorname{rank}(UV) \leq \operatorname{rank}(U)$ and $\operatorname{rank}(UV) \leq \operatorname{rank}(V)$.

(a) [8pts] Let $X$ be any matrix. Then,
\begin{align*}
\operatorname{rank}(X'X) &\leq \min\{\operatorname{rank}(X'), \operatorname{rank}(X)\} \leq \operatorname{rank}(X) && \text{by (1)} \\
&= \operatorname{rank}(P_X X) && \text{since } P_X X = X \\
&= \operatorname{rank}(X(X'X)^- X'X) && \text{since } P_X = X(X'X)^- X' \\
&\leq \min\{\operatorname{rank}(X(X'X)^-), \operatorname{rank}(X'X)\} \leq \operatorname{rank}(X'X) && \text{by (1)}.
\end{align*}
The above says that $\operatorname{rank}(X'X)$ is simultaneously no larger and no smaller than $\operatorname{rank}(X)$; that is, $\operatorname{rank}(X) \leq \operatorname{rank}(X'X)$ and $\operatorname{rank}(X) \geq \operatorname{rank}(X'X)$, which implies that $\operatorname{rank}(X) = \operatorname{rank}(X'X)$.

Comments: An alternative solution can be given by appealing to some linear algebra facts. Some students approached this problem by stating that $X$ and $X'X$ must have the same null space because
\[
Xz = 0 \iff X'Xz = 0. \tag{2}
\]
However, a few additional steps are needed for this to establish the desired result. First, why does (2) imply that $X$ and $X'X$ have the same null space? For an $m \times n$ real-valued matrix $A$, its null space, $\operatorname{Null}(A)$, is defined as
\[
\operatorname{Null}(A) = \{ z \in \mathbb{R}^n : Az = 0 \}.
\]
From this definition, clearly (2) implies $\operatorname{Null}(X) = \operatorname{Null}(X'X)$. Now, how does this imply equality of ranks? The rank-nullity theorem says that for any $m \times n$ matrix $A$,
\[
\operatorname{rank}(A) + \dim \operatorname{Null}(A) = n.
\]
Noting that if $X$ is $m \times n$, then $X'X$ is $n \times n$, the rank-nullity theorem gives
\[
\operatorname{rank}(X) + \dim \operatorname{Null}(X) = n = \operatorname{rank}(X'X) + \dim \operatorname{Null}(X'X).
\]
Since $\operatorname{Null}(X) = \operatorname{Null}(X'X)$, the two sets must have the same dimension, implying that $\operatorname{rank}(X) = \operatorname{rank}(X'X)$.

(b) [8pts] Let $X$ be any matrix. Then,
\begin{align*}
\operatorname{rank}(X) &= \operatorname{rank}(P_X X) && \text{since } P_X X = X \\
&\leq \min\{\operatorname{rank}(P_X), \operatorname{rank}(X)\} \leq \operatorname{rank}(P_X) && \text{by (1)} \\
&= \operatorname{rank}(X(X'X)^- X') && \text{since } P_X = X(X'X)^- X' \\
&\leq \min\{\operatorname{rank}(X), \operatorname{rank}((X'X)^- X')\} \leq \operatorname{rank}(X) && \text{by (1)}.
\end{align*}
Inequality in both directions implies equality; therefore, $\operatorname{rank}(X) = \operatorname{rank}(P_X)$.

(c) [9pts] Let $X$ be an $n \times p$ matrix. Suppose $C$ is a $q \times p$ matrix of rank $q$ and that there exists a matrix $A$ such that $C = AX$. First, let us verify that $C(X'X)^- C'$ is $q \times q$. Because $X$ is an $n \times p$ matrix, $X'X$ is a $p \times p$ matrix. Consequently, any generalized inverse $(X'X)^-$ of $X'X$ is also $p \times p$. Then,
\[
\underbrace{C}_{q \times p} \; \underbrace{(X'X)^-}_{p \times p} \; \underbrace{C'}_{p \times q}
\]
is clearly a $q \times q$ matrix. Next, we will show that $C(X'X)^- C'$ has rank $q$. To do so, we will use the following fact: for any matrix $X$,
\[
\operatorname{rank}(X) = \operatorname{rank}(X'), \tag{3}
\]
because the rank of a matrix can be defined as the number of linearly independent rows or columns.

Now, consider
\[
\operatorname{rank}(C(X'X)^- C') \leq \min\{\operatorname{rank}(C), \operatorname{rank}((X'X)^- C')\} \leq \operatorname{rank}(C) = q
\]
by (1), with $\operatorname{rank}(C) = q$ by assumption, and
\begin{align*}
q = \operatorname{rank}(C) &= \operatorname{rank}(AX) \\
&= \operatorname{rank}(A P_X X) && \text{since } P_X X = X \\
&\leq \min\{\operatorname{rank}(A P_X), \operatorname{rank}(X)\} \leq \operatorname{rank}(A P_X) && \text{by (1)} \\
&= \operatorname{rank}([A P_X]') = \operatorname{rank}(P_X' A') && \text{by (3)} \\
&= \operatorname{rank}([P_X' A']' [P_X' A']) && \text{by part (a), using } X = P_X' A' \\
&= \operatorname{rank}(A P_X P_X' A') \\
&= \operatorname{rank}(A P_X P_X A') && \text{$P_X$ is symmetric} \\
&= \operatorname{rank}(A P_X A') && \text{$P_X$ is idempotent} \\
&= \operatorname{rank}(A [X(X'X)^- X'] A') && \text{since } P_X = X(X'X)^- X' \\
&= \operatorname{rank}([AX](X'X)^- [X'A']) \\
&= \operatorname{rank}([AX](X'X)^- [AX]') \\
&= \operatorname{rank}(C(X'X)^- C').
\end{align*}
Therefore, $C(X'X)^- C'$ is a $q \times q$ matrix of rank $q$.
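Though not part of the graded solution, the rank identities in parts (a) through (c) are easy to sanity-check numerically. Below is a minimal R sketch; the seed, the rank-deficient $X$, the matrix $A$, and the use of MASS::ginv() as the generalized inverse are all illustrative choices, not part of the assignment.

library(MASS)  # for ginv(), the Moore-Penrose generalized inverse

set.seed(510)
# Build a 6 x 3 matrix X of rank 2 (rank-deficient on purpose).
X <- matrix(rnorm(6 * 2), nrow = 6) %*% matrix(c(1, 0, 2, 0, 1, 1), nrow = 2)
XtX.g <- ginv(t(X) %*% X)          # one choice of (X'X)^-
PX <- X %*% XtX.g %*% t(X)         # P_X = X (X'X)^- X'

qr(X)$rank                # should be 2
qr(t(X) %*% X)$rank       # should be 2, matching part (a)
qr(PX)$rank               # should be 2, matching part (b)

# Part (c): take C = AX for an arbitrary 2 x 6 matrix A; generically C then
# has full row rank q = 2, and C (X'X)^- C' should be 2 x 2 of rank 2.
A <- matrix(rnorm(2 * 6), nrow = 2)
C <- A %*% X
qr(C %*% XtX.g %*% t(C))$rank   # should be 2, matching part (c)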
(d) [8pts] Suppose $X$ is an $n \times p$ matrix and $A$ is a matrix with $n$ columns satisfying $A P_X = A$. Then,
\begin{align*}
\operatorname{rank}(A) &= \operatorname{rank}(A P_X) \\
&= \operatorname{rank}(A X (X'X)^- X') && \text{(def. of $P_X$)} \\
&\leq \min\{\operatorname{rank}(AX), \operatorname{rank}((X'X)^- X')\} \leq \operatorname{rank}(AX) && \text{by (1)} \\
&\leq \min\{\operatorname{rank}(A), \operatorname{rank}(X)\} \leq \operatorname{rank}(A) && \text{by (1)}.
\end{align*}
Inequality in both directions implies equality; therefore, $\operatorname{rank}(AX) = \operatorname{rank}(A)$.

3. Suppose $y = X\beta + \varepsilon$, where $\varepsilon \sim N(0, \sigma^2 I)$ for some unknown $\sigma^2 > 0$. Let $\hat{y} = P_X y$. By the results on multivariate normal distributions from slide set 1 ("Preliminaries"), we have $y \sim N(X\beta, \sigma^2 I)$.

(a) [12pts] Notice that we want to find the distribution of a linear transformation of $y$:
\[
\begin{bmatrix} \hat{y} \\ y - \hat{y} \end{bmatrix}
= \begin{bmatrix} P_X y \\ y - P_X y \end{bmatrix}
= \begin{bmatrix} P_X \\ I - P_X \end{bmatrix} y. \tag{4}
\]
Together, slides 13-14 of the first slide set ("Preliminaries") imply that a linear transformation, say $Ax + b$, of a normal random variable $x$ is also normal:
\[
x \sim N(\mu, \Sigma) \implies Ax + b \sim N(A\mu + b, A \Sigma A').
\]
The mean of the linear transformation in (4), since $P_X X = X$, is
\[
E\left( \begin{bmatrix} P_X \\ I - P_X \end{bmatrix} y \right)
= \begin{bmatrix} P_X \\ I - P_X \end{bmatrix} E(y)
= \begin{bmatrix} P_X \\ I - P_X \end{bmatrix} X\beta
= \begin{bmatrix} P_X X\beta \\ (I - P_X) X\beta \end{bmatrix}
= \begin{bmatrix} X\beta \\ X\beta - X\beta \end{bmatrix}
= \begin{bmatrix} X\beta \\ 0 \end{bmatrix}.
\]
The variance of (4), since $P_X$ is symmetric and idempotent, is
\begin{align*}
\operatorname{Var}\left( \begin{bmatrix} P_X \\ I - P_X \end{bmatrix} y \right)
&= \begin{bmatrix} P_X \\ I - P_X \end{bmatrix} \operatorname{Var}(y) \begin{bmatrix} P_X' & (I - P_X)' \end{bmatrix} \\
&= \sigma^2 \begin{bmatrix} P_X \\ I - P_X \end{bmatrix} \begin{bmatrix} P_X & I - P_X \end{bmatrix} \\
&= \sigma^2 \begin{bmatrix} P_X P_X & P_X (I - P_X) \\ (I - P_X) P_X & (I - P_X)(I - P_X) \end{bmatrix} \\
&= \sigma^2 \begin{bmatrix} P_X & P_X - P_X P_X \\ P_X - P_X P_X & I - P_X - P_X + P_X P_X \end{bmatrix} \\
&= \sigma^2 \begin{bmatrix} P_X & 0 \\ 0 & I - P_X \end{bmatrix}.
\end{align*}
As a linear transformation of a multivariate normal random variable, it follows that
\[
\begin{bmatrix} \hat{y} \\ y - \hat{y} \end{bmatrix}
\sim N\left( \begin{bmatrix} X\beta \\ 0 \end{bmatrix},
\sigma^2 \begin{bmatrix} P_X & 0 \\ 0 & I - P_X \end{bmatrix} \right).
\]
Comments: it is important to notice that the expected values of $y$ and $\hat{y}$ depend on the parameter $\beta$ and not the estimate $\hat{\beta}$, which is a random variable. Similarly, the variances do not depend on the estimate $\hat{\sigma}^2$. Additionally, some students did not give the covariances or only stated that they were zero without any derivation (the distribution of a multivariate normal random variable is not fully specified without the covariances).

(b) [12pts] Again using the fact that $P_X$ is symmetric and idempotent, we see that $\hat{y}'\hat{y}$ is a quadratic form:
\[
\hat{y}'\hat{y} = [P_X y]' P_X y = y' P_X' P_X y = y' P_X P_X y = y' P_X y,
\]
where $y \sim N(X\beta, \sigma^2 I)$. Recall the results about quadratic forms on slide 18 of the first slide set ("Preliminaries"). To apply these, we want to find a symmetric matrix $A$ such that $A\Sigma$ is idempotent for $\Sigma \equiv \operatorname{Var}(y)$. We can't use $P_X$ as our $A$ matrix because the $\sigma^2$ doesn't cancel:
\[
P_X \operatorname{Var}(y) = P_X \sigma^2 I = \sigma^2 P_X.
\]
Instead, using
\[
A = \frac{P_X}{\sigma^2},
\]
which is symmetric, gives
\[
A\Sigma = A \operatorname{Var}(y) = \frac{P_X}{\sigma^2} \, \sigma^2 I = P_X,
\]
which we know is idempotent. It is easy to verify that $\Sigma = \sigma^2 I$ is positive definite since $\sigma^2 > 0$ (see homework 1 solutions). We can use the result of problem 2(b) to determine the rank of $A$:
\[
\operatorname{rank}(A) = \operatorname{rank}(P_X / \sigma^2) = \operatorname{rank}(P_X) = \operatorname{rank}(X).
\]
Then,
\[
\frac{1}{\sigma^2} \hat{y}'\hat{y} = y' \frac{P_X}{\sigma^2} y \sim \chi^2_{\operatorname{rank}(X)}\bigl( [X\beta]' A X\beta / 2 \bigr),
\]
where the noncentrality parameter simplifies to
\[
[X\beta]' A X\beta / 2
= \frac{1}{2} [X\beta]' \frac{P_X}{\sigma^2} X\beta
= \frac{1}{2\sigma^2} \beta' X' P_X X\beta
= \frac{1}{2\sigma^2} \beta' X' X\beta.
\]
Therefore, we end up with a scaled noncentral chi-square random variable on $\operatorname{rank}(X)$ degrees of freedom:
\[
\hat{y}'\hat{y} \sim \sigma^2 \chi^2_{\operatorname{rank}(X)}\left( \frac{\beta' X' X\beta}{2\sigma^2} \right).
\]
Comments: some students used results that apply to the sum of squared independent normal random variables with variance one (e.g., the result on slide 15 of set 1; this assumes $A = \Sigma = I$). Others tried to find the distribution of the quadratic form $\hat{y}' \frac{I}{\sigma^2} \hat{y}$ rather than simplifying to $y' \frac{P_X}{\sigma^2} y$; unfortunately, this doesn't work because $\operatorname{Var}(\hat{y}) = \sigma^2 P_X$ may not be positive definite.
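As an aside (not part of the solution), the conclusion of 3(b) can be checked by simulation. The R sketch below uses arbitrary choices of $X$, $\beta$, $\sigma$, and the seed; note that R parameterizes the noncentral chi-square so that its mean is df + ncp, so R's ncp is twice the noncentrality parameter $[X\beta]'A X\beta/2$ used above.

set.seed(510)
n <- 10; p <- 3
X <- cbind(1, matrix(rnorm(n * (p - 1)), nrow = n))  # full column rank, so rank(X) = p
beta <- c(2, -1, 0.5)
sigma <- 1.5
PX <- X %*% solve(t(X) %*% X) %*% t(X)

nsim <- 1e5
yhat.sq <- replicate(nsim, {
  y <- X %*% beta + rnorm(n, sd = sigma)
  sum((PX %*% y)^2)   # yhat'yhat
})

# E(yhat'yhat) = sigma^2 * rank(X) + beta' X'X beta;
# the two numbers below should be close.
mean(yhat.sq)
sigma^2 * p + drop(t(beta) %*% t(X) %*% X %*% beta)

# Full distributional check: compare to sigma^2 times a noncentral
# chi-square with df = p and R-style ncp = beta' X'X beta / sigma^2.
ncp <- drop(t(beta) %*% t(X) %*% X %*% beta) / sigma^2
qqplot(yhat.sq, sigma^2 * rchisq(nsim, df = p, ncp = ncp))
abline(0, 1)  # points should fall near this line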
4. For this problem, let $\mu_{ijk}$ denote the mean for filler type $i = 1, 2$, surface treatment $j = 1, 2$, and filler proportion $k = 1, 2, 3$ (corresponding to 25%, 50%, and 75%, respectively). The R code below fits the cell means model to these data:

> # Read in Dr. Nettleton's functions in order to use test() and estimate().
> source('http://www.public.iastate.edu/~dnett/S510/04TwoFactorCellMeans.R')
> # Read in data.
> webloc <- c('http://www.public.iastate.edu/~dnett/S510/FabricLoss.txt')
> dat <- read.table(paste(webloc), header = TRUE)
> # Convert covariates to factors.
> dat$S <- factor(dat$surface); dat$F <- factor(dat$filler); dat$p <- factor(dat$p)
> # Fit cell means model.
> fit <- lm(y ~ F:S:p + 0, data = dat)

From the output below, notice that R is using the parameterization
\[
\beta = [\mu_{111}, \mu_{211}, \mu_{121}, \mu_{221}, \mu_{112}, \mu_{212}, \mu_{122}, \mu_{222}, \mu_{113}, \mu_{213}, \mu_{123}, \mu_{223}]'. \tag{5}
\]

> # Look at the vector of estimates.
> coef(fit)
F1:S1:p25 F2:S1:p25 F1:S2:p25 F2:S2:p25 F1:S1:p50 F2:S1:p50
    201.0     213.0     164.0     148.5     237.0     233.5
F1:S2:p50 F2:S2:p50 F1:S1:p75 F2:S1:p75 F1:S2:p75 F2:S2:p75
    187.5     113.5     267.0     234.5     232.0     143.5

Depending on how you called lm(), you may not have the same parameterization of $\beta$ as in (5), and hence may require different $C$ matrices in the following problems.

(a) From the output below, $\hat{\mu}_{121} = 164.00$ and $\operatorname{se}(\hat{\mu}_{121}) = 11.59$.

> # 4a: mean and se for F1, S2, 25% filler.
> summary(fit)

Call:
lm(formula = y ~ F:S:p + 0, data = dat)

Residuals:
    Min      1Q  Median      3Q     Max
-26.000  -9.125   0.000   9.125  26.000

Coefficients:
          Estimate Std. Error t value Pr(>|t|)
F1:S1:p25   201.00      11.59  17.340 7.33e-10 ***
F2:S1:p25   213.00      11.59  18.375 3.74e-10 ***
F1:S2:p25   164.00      11.59  14.148 7.57e-09 ***
F2:S2:p25   148.50      11.59  12.811 2.33e-08 ***
F1:S1:p50   237.00      11.59  20.445 1.08e-10 ***
F2:S1:p50   233.50      11.59  20.143 1.28e-10 ***
F1:S2:p50   187.50      11.59  16.175 1.64e-09 ***
F2:S2:p50   113.50      11.59   9.791 4.50e-07 ***
F1:S1:p75   267.00      11.59  23.033 2.67e-11 ***
F2:S1:p75   234.50      11.59  20.229 1.22e-10 ***
F1:S2:p75   232.00      11.59  20.014 1.38e-10 ***
F2:S2:p75   143.50      11.59  12.379 3.42e-08 ***

(b) From the code and output below (which uses Dr. Nettleton's estimate() function), $\hat{\bar{\mu}}_{1\cdot\cdot} = 214.75$ and $\hat{\bar{\mu}}_{2\cdot\cdot} = 181.08$.

> # 4b: LSMEANS for F1 and F2.
> C.b <- c(rep(c(1, 0), 6),  # F1
+          rep(c(0, 1), 6))  # F2
> C.b <- matrix(C.b / 6, nrow = 2, byrow = TRUE)
> estimate(fit, C.b)
            c1        c2        c3        c4        c5        c6        c7        c8
[1,] 0.1666667 0.0000000 0.1666667 0.0000000 0.1666667 0.0000000 0.1666667 0.0000000
[2,] 0.0000000 0.1666667 0.0000000 0.1666667 0.0000000 0.1666667 0.0000000 0.1666667
            c9       c10       c11       c12 estimate       se 95% Conf. limits
[1,] 0.1666667 0.0000000 0.1666667 0.0000000 214.7500 4.732424 204.4389 225.0611
[2,] 0.0000000 0.1666667 0.0000000 0.1666667 181.0833 4.732424 170.7723 191.3944

(c) From the code and output below (which uses Dr. Nettleton's estimate() function), $\hat{\bar{\mu}}_{\cdot 12} = 235.25$.

> # 4c-d. LSMEAN and se for S1/p50.
> C.c <- c(rep(0, 4), 1, 1, rep(0, 6))
> C.c <- matrix(C.c / 2, nrow = 1)
> estimate(fit, C.c)
     c1 c2 c3 c4  c5  c6 c7 c8 c9 c10 c11 c12 estimate       se 95% Conf. limits
[1,]  0  0  0  0 0.5 0.5  0  0  0   0   0   0   235.25 8.196798 217.3907 253.1093

(d) From the output in part (c), $\operatorname{se}(\hat{\bar{\mu}}_{\cdot 12}) = 8.20$.
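For reference, the estimate and standard error that estimate() reports in (c) and (d) come from the usual formulas $C\hat{\beta}$ and $\operatorname{se}(C\hat{\beta}) = \sqrt{\hat{\sigma}^2 \, C(X'X)^- C'}$. A minimal R sketch computing them by hand, assuming the fit and C.c objects from the session above (here $X'X$ is invertible because the cell-means model matrix has full column rank, so solve() serves as the generalized inverse):

X <- model.matrix(fit)
b <- coef(fit)
s2 <- summary(fit)$sigma^2               # sigma-hat^2 = MSE

drop(C.c %*% b)                                         # should match 235.25
sqrt(drop(C.c %*% solve(t(X) %*% X) %*% t(C.c)) * s2)   # should match 8.196798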
(e) The test for filler type main effects is given by $H_0: \bar{\mu}_{1\cdot\cdot} - \bar{\mu}_{2\cdot\cdot} = 0$. The code below (which uses Dr. Nettleton's test() function) gives $F = 25.30$ and $p = 0.0003$. These data provide strong evidence that there are filler type main effects. That is, averaging over surface treatment and proportion of filler, there is strong evidence that filler type F1 differs from F2 in preventing fabric loss in abrasive tests.

> # 4e. Filler type main effects.
> C.e <- matrix(rep(c(1, -1), 6), nrow = 1)
> test(fit, C.e)  # Note d = 0 by default.
$Fstat
[1] 25.30481

$pvalue
[1] 0.0002939847

Comments: we are specifically interested in main effects here, not any kind of effect. Hence, comparing a reduced cell-means model with only the factors surface treatment and filler proportion to a full cell-means model with all three factors does not test for filler type main effects.

(f) The test for three-way interactions is given by
\[
H_0:
\begin{bmatrix}
\mu_{111} - \mu_{121} - \mu_{211} + \mu_{221} - \mu_{112} + \mu_{122} + \mu_{212} - \mu_{222} \\
\mu_{111} - \mu_{121} - \mu_{211} + \mu_{221} - \mu_{113} + \mu_{123} + \mu_{213} - \mu_{223}
\end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \end{bmatrix}.
\]
The code below (which uses Dr. Nettleton's test() function) gives $F = 0.89$ and $p = 0.44$. These data do not provide evidence of three-way interactions among filler type, surface treatment, and filler proportion.

> # 4f. Three-factor interaction.
> C.f <- c(1, -1, -1, 1, -1, 1, 1, -1, 0, 0, 0, 0,
+          1, -1, -1, 1, 0, 0, 0, 0, -1, 1, 1, -1)
> C.f <- matrix(C.f, nrow = 2, byrow = TRUE)
> test(fit, C.f)
$Fstat
[1] 0.8903876

$pvalue
[1] 0.4359589

Comments: a large p-value (i.e., $p > \alpha$) suggests that it is possible that there are no three-way interactions, but it does not say there are no three-way interactions. Why? There could be three-way interactions, but they are too small (relative to the variability in the study) to detect.

(g) The test for two-way interactions between filler type and filler proportion is given by
\[
H_0:
\begin{bmatrix}
\bar{\mu}_{1\cdot 1} - \bar{\mu}_{1\cdot 2} - \bar{\mu}_{2\cdot 1} + \bar{\mu}_{2\cdot 2} \\
\bar{\mu}_{1\cdot 1} - \bar{\mu}_{1\cdot 3} - \bar{\mu}_{2\cdot 1} + \bar{\mu}_{2\cdot 3}
\end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \end{bmatrix},
\]
where $\bar{\mu}_{i\cdot k} = (\mu_{i1k} + \mu_{i2k})/2$. The code below (which uses Dr. Nettleton's test() function) gives $F = 6.57$ and $p = 0.012$. These data provide evidence that there are two-way interactions between filler type and filler proportion.

> # 4g. Two-factor interaction between filler type and proportion.
> C.g <- c(1, -1, 1, -1, -1, 1, -1, 1, 0, 0, 0, 0,
+          1, -1, 1, -1, 0, 0, 0, 0, -1, 1, -1, 1)
> C.g <- matrix(C.g, nrow = 2, byrow = TRUE)
> test(fit, C.g)
$Fstat
[1] 6.565736

$pvalue
[1] 0.01185168
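A closing aside: each test() call above is an instance of the general linear hypothesis F test from lecture,
\[
F = \frac{(C\hat{\beta})' \, [C(X'X)^- C']^{-1} \, (C\hat{\beta})}{q \, \hat{\sigma}^2},
\]
which is well-defined precisely because $C(X'X)^- C'$ is nonsingular by problem 2(c). The R sketch below reproduces 4(g) by hand, assuming the fit and C.g objects from the session above:

X <- model.matrix(fit)
b <- coef(fit)
s2 <- summary(fit)$sigma^2
q <- nrow(C.g)
M <- C.g %*% solve(t(X) %*% X) %*% t(C.g)   # nonsingular by 2(c)
Fstat <- drop(t(C.g %*% b) %*% solve(M) %*% (C.g %*% b)) / (q * s2)
Fstat                                        # should match 6.565736
pf(Fstat, q, df.residual(fit), lower.tail = FALSE)   # should match 0.01185168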