4. Normal Theory Inference

[Figure: density of a normal distribution with mean = 0 and variance = 1.]

Defn 4.1: A random variable $Y$ with density function
$$ f(y) = \frac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{1}{2}\left(\frac{y-\mu}{\sigma}\right)^2} $$
is said to have a normal (Gaussian) distribution with $E(Y) = \mu$ and $Var(Y) = \sigma^2$. We will use the notation $Y \sim N(\mu, \sigma^2)$.

Suppose $Z$ has a normal distribution with $E(Z) = 0$ and $Var(Z) = 1$, i.e., $Z \sim N(0, 1)$; then $Z$ is said to have a standard normal distribution.

Defn 4.2: Suppose $\mathbf{Z} = (Z_1, \ldots, Z_m)^T$ is a random vector whose elements are independently distributed standard normal random variables. For any $m \times n$ matrix $A$, we say that
$$ \mathbf{Y} = \mu + A^T \mathbf{Z} $$
has a multivariate normal distribution with mean vector
$$ E(\mathbf{Y}) = E(\mu + A^T \mathbf{Z}) = \mu + A^T E(\mathbf{Z}) = \mu + A^T \mathbf{0} = \mu $$
and variance-covariance matrix
$$ Var(\mathbf{Y}) = A^T Var(\mathbf{Z}) A = A^T A = \Sigma. $$
We will use the notation $\mathbf{Y} \sim N(\mu, \Sigma)$.

When $\Sigma$ is positive definite, the joint density function is
$$ f(\mathbf{y}) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}}\; e^{-\frac{1}{2}(\mathbf{y}-\mu)^T \Sigma^{-1} (\mathbf{y}-\mu)}. $$

The multivariate normal distribution has many useful properties:

Result 4.1: Normality is preserved under linear transformations. If $\mathbf{Y} \sim N(\mu, \Sigma)$, then
$$ \mathbf{W} = \mathbf{c} + B\mathbf{Y} \sim N(\mathbf{c} + B\mu,\; B\Sigma B^T) $$
for any non-random $\mathbf{c}$ and $B$.

Proof: By Defn 4.2, $\mathbf{Y} = \mu + A^T \mathbf{Z}$ where $A^T A = \Sigma$. Then
$$ \mathbf{W} = \mathbf{c} + B\mathbf{Y} = \mathbf{c} + B(\mu + A^T \mathbf{Z}) = (\mathbf{c} + B\mu) + BA^T \mathbf{Z}, $$
which satisfies Defn 4.2 with
$$ Var(\mathbf{W}) = BA^T A B^T = B\Sigma B^T. $$

Result 4.2: Suppose
$$ \mathbf{Y} = \begin{bmatrix} \mathbf{Y}_1 \\ \mathbf{Y}_2 \end{bmatrix} \sim N\left( \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \right); $$
then $\mathbf{Y}_1 \sim N(\mu_1, \Sigma_{11})$.

Proof: Note that $\mathbf{Y}_1 = [\,I \;\; 0\,]\, \mathbf{Y}$ and apply Result 4.1.

Note: This result applies to any subset of the elements of $\mathbf{Y}$ because you can move that subset to the top of the vector by multiplying $\mathbf{Y}$ by an appropriate matrix of zeros and ones.

Example 4.1: Suppose
$$ \mathbf{Y} = \begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \end{bmatrix} \sim N\left( \begin{bmatrix} 1 \\ -3 \\ 2 \end{bmatrix}, \begin{bmatrix} 4 & 1 & 1 \\ 1 & 3 & 3 \\ 1 & 3 & 9 \end{bmatrix} \right). $$
Then
$$ Y_1 = [\,1\; 0\; 0\,]\,\mathbf{Y} \sim N(1, 4), \quad Y_2 = [\,0\; 1\; 0\,]\,\mathbf{Y} \sim N(-3, 3), \quad Y_3 = [\,0\; 0\; 1\,]\,\mathbf{Y} \sim N(2, 9), $$
and, calling $B = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$,
$$ \begin{bmatrix} Y_1 \\ Y_3 \end{bmatrix} = B\mathbf{Y} \sim N\left( \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 4 & 1 \\ 1 & 9 \end{bmatrix} \right) = N(B\mu,\; B\Sigma B^T). $$

Comment: If $\mathbf{Y}_1 \sim N(\mu_1, \Sigma_1)$ and $\mathbf{Y}_2 \sim N(\mu_2, \Sigma_2)$, it is not always true that $\mathbf{Y} = \begin{bmatrix} \mathbf{Y}_1 \\ \mathbf{Y}_2 \end{bmatrix}$ has a multivariate normal distribution.

Result 4.3: If $\mathbf{Y}_1$ and $\mathbf{Y}_2$ are independent random vectors such that $\mathbf{Y}_1 \sim N(\mu_1, \Sigma_1)$ and $\mathbf{Y}_2 \sim N(\mu_2, \Sigma_2)$, then
$$ \mathbf{Y} = \begin{bmatrix} \mathbf{Y}_1 \\ \mathbf{Y}_2 \end{bmatrix} \sim N\left( \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix} \right). $$

Proof: Since $\mathbf{Y}_1 \sim N(\mu_1, \Sigma_1)$, we have from Defn 4.2 that $\mathbf{Y}_1 = \mu_1 + A_1^T \mathbf{Z}_1$, where $A_1^T A_1 = \Sigma_1$ and the elements of $\mathbf{Z}_1$ are independent standard normal random variables. A similar representation, $\mathbf{Y}_2 = \mu_2 + A_2^T \mathbf{Z}_2$, holds for $\mathbf{Y}_2$. Since $\mathbf{Y}_1$ and $\mathbf{Y}_2$ are independent, it follows that $\mathbf{Z}_1$ and $\mathbf{Z}_2$ are independent. Then
$$ \mathbf{Y} = \begin{bmatrix} \mu_1 + A_1^T \mathbf{Z}_1 \\ \mu_2 + A_2^T \mathbf{Z}_2 \end{bmatrix} = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} + \begin{bmatrix} A_1^T & 0 \\ 0 & A_2^T \end{bmatrix} \begin{bmatrix} \mathbf{Z}_1 \\ \mathbf{Z}_2 \end{bmatrix} $$
satisfies Defn 4.2.

Alternatively, you could prove Result 4.3 by showing that the product of the characteristic functions for $\mathbf{Y}_1$ and $\mathbf{Y}_2$ is a characteristic function for a multivariate normal distribution. If $\Sigma_1$ and $\Sigma_2$ are both non-singular, you could prove Result 4.3 by showing that the product of the density functions for $\mathbf{Y}_1$ and $\mathbf{Y}_2$ is a density function for the specified multivariate normal distribution.

Result 4.4: If $\mathbf{Y} = (\mathbf{Y}_1^T, \ldots, \mathbf{Y}_k^T)^T$ is a random vector with a multivariate normal distribution, then $\mathbf{Y}_1, \mathbf{Y}_2, \ldots, \mathbf{Y}_k$ are independent if and only if $Cov(\mathbf{Y}_i, \mathbf{Y}_j) = 0$ for all $i \neq j$.

Comments:
(i) If $\mathbf{Y}_i$ is independent of $\mathbf{Y}_j$, then $Cov(\mathbf{Y}_i, \mathbf{Y}_j) = 0$.
(ii) When $\mathbf{Y} = (Y_1, \ldots, Y_n)^T$ has a multivariate normal distribution, $Y_i$ uncorrelated with $Y_j$ implies that $Y_i$ is independent of $Y_j$. This is usually not true for other distributions.
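Results 4.1 through 4.4 are easy to check by simulation. The following is a minimal sketch in R (or S-PLUS), using the mean vector and covariance matrix of Example 4.1; the Cholesky factor is just one convenient choice of a matrix $A$ with $A^T A = \Sigma$, and the number of replicates is arbitrary.

# Minimal simulation sketch for Defn 4.2 and Result 4.1.
mu    <- c(1, -3, 2)
Sigma <- matrix(c(4, 1, 1,
                  1, 3, 3,
                  1, 3, 9), nrow = 3, byrow = T)
A <- chol(Sigma)               # upper triangular, t(A) %*% A = Sigma

nrep <- 10000
Z <- matrix(rnorm(3 * nrep), nrow = 3)   # columns are iid N(0, I) vectors
Y <- mu + t(A) %*% Z                     # columns are draws from N(mu, Sigma)

B <- matrix(c(1, 0, 0,
              0, 0, 1), nrow = 2, byrow = T)
W <- B %*% Y                             # (Y1, Y3), as in Example 4.1

apply(W, 1, mean)   # should be close to B mu = (1, 2)
var(t(W))           # should be close to B Sigma B' = [4 1; 1 9]

The sample mean and covariance of the transformed draws approximate $B\mu$ and $B\Sigma B^T$, as Result 4.1 predicts.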
Result 4.5: If
$$ \begin{bmatrix} \mathbf{Y} \\ \mathbf{X} \end{bmatrix} \sim N\left( \begin{bmatrix} \mu_Y \\ \mu_X \end{bmatrix}, \begin{bmatrix} \Sigma_{YY} & \Sigma_{YX} \\ \Sigma_{XY} & \Sigma_{XX} \end{bmatrix} \right) $$
with a positive definite covariance matrix, the conditional distribution of $\mathbf{Y}$ given the value of $\mathbf{X}$ is a normal distribution with mean vector
$$ E(\mathbf{Y}|\mathbf{X}) = \mu_Y + \Sigma_{YX} \Sigma_{XX}^{-1} (\mathbf{X} - \mu_X) $$
and positive definite covariance matrix
$$ V(\mathbf{Y}|\mathbf{X}) = \Sigma_{YY} - \Sigma_{YX} \Sigma_{XX}^{-1} \Sigma_{XY} $$
(note that the conditional covariance matrix does not depend on the value of $\mathbf{X}$).

Quadratic forms $\mathbf{Y}^T A \mathbf{Y}$ arise in:
  - sums of squares in ANOVA
  - chi-square tests
  - F-tests
  - estimation of variances

Some useful information about the distribution of quadratic forms is summarized in the following results.

Result 4.6a: If $\mathbf{Y} = (Y_1, \ldots, Y_n)^T$ is a random vector with $E(\mathbf{Y}) = \mu$ and $Var(\mathbf{Y}) = \Sigma$, and $A$ is an $n \times n$ non-random matrix, then
$$ E(\mathbf{Y}^T A \mathbf{Y}) = \mu^T A \mu + tr(A\Sigma). $$

Result 4.6b: If $\mathbf{Y} \sim N(\mu, \Sigma)$ and $A$ is a symmetric matrix, then
$$ Var(\mathbf{Y}^T A \mathbf{Y}) = 4\mu^T A \Sigma A \mu + 2\, tr(A\Sigma A\Sigma). $$

Proof: (a) Note that the definition of a covariance matrix implies that $Var(\mathbf{Y}) = E(\mathbf{Y}\mathbf{Y}^T) - \mu\mu^T$, where $\mu = E(\mathbf{Y})$. Then,
$$ \begin{aligned}
E(\mathbf{Y}^T A \mathbf{Y}) &= E(tr(\mathbf{Y}^T A \mathbf{Y})) = E(tr(A\mathbf{Y}\mathbf{Y}^T)) = tr(E(A\mathbf{Y}\mathbf{Y}^T)) = tr(A\, E(\mathbf{Y}\mathbf{Y}^T)) \\
&= tr(A[Var(\mathbf{Y}) + \mu\mu^T]) = tr(A\Sigma + A\mu\mu^T) = tr(A\Sigma) + tr(A\mu\mu^T) \\
&= tr(A\Sigma) + tr(\mu^T A \mu) = tr(A\Sigma) + \mu^T A \mu.
\end{aligned} $$
(b) See Searle (1971, page 57).

Example 4.2: Consider a Gauss-Markov model with $E(\mathbf{Y}) = X\beta$ and $Var(\mathbf{Y}) = \sigma^2 I$. Let $\mathbf{b} = (X^T X)^{-} X^T \mathbf{Y}$ be any solution to the normal equations. Since $E(\mathbf{Y}) = X\beta$ is estimable, the unique OLS estimator is
$$ \hat{\mathbf{Y}} = X\mathbf{b} = X(X^T X)^{-} X^T \mathbf{Y} = P_X \mathbf{Y}. $$
The residual vector is $\mathbf{e} = \mathbf{Y} - \hat{\mathbf{Y}} = (I - P_X)\mathbf{Y}$, and the sum of squared residuals, also called the error sum of squares, is
$$ \begin{aligned}
SSE = \sum_{i=1}^n (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^n e_i^2 = \mathbf{e}^T \mathbf{e}
&= [(I - P_X)\mathbf{Y}]^T (I - P_X)\mathbf{Y} \\
&= \mathbf{Y}^T (I - P_X)^T (I - P_X)\mathbf{Y} = \mathbf{Y}^T (I - P_X)(I - P_X)\mathbf{Y} = \mathbf{Y}^T (I - P_X)\mathbf{Y}.
\end{aligned} $$

From Result 4.6,
$$ \begin{aligned}
E(SSE) = E(\mathbf{Y}^T (I - P_X)\mathbf{Y}) &= \beta^T X^T (I - P_X) X \beta + tr((I - P_X)\sigma^2 I) \\
&= 0 + \sigma^2\, tr(I - P_X) = \sigma^2 [tr(I) - tr(P_X)] = \sigma^2 [n - rank(P_X)] = \sigma^2 [n - rank(X)].
\end{aligned} $$
Consequently,
$$ \hat{\sigma}^2 = \frac{SSE}{n - rank(X)} $$
is an unbiased estimator for $\sigma^2$ (provided that $rank(X) < n$).
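A quick numerical check of Result 4.6a and of the unbiasedness of $\hat{\sigma}^2$ is sketched below. The design matrix, $\beta$, and $\sigma$ are hypothetical values chosen only for illustration.

# Monte Carlo sketch for Result 4.6a applied to SSE = Y'(I - P_X)Y
# (X, beta, and sigma below are illustrative values).
n <- 10
X <- cbind(1, 1:n)                       # simple linear regression design
beta  <- c(2, 0.5)
sigma <- 1.5
mu <- X %*% beta
PX <- X %*% solve(t(X) %*% X) %*% t(X)   # projection onto C(X)
A  <- diag(n) - PX

# Result 4.6a: E(Y'AY) = mu'A mu + tr(A Sigma)
#            = 0 + sigma^2 (n - rank(X)) = 2.25 * 8 = 18
t(mu) %*% A %*% mu + sigma^2 * sum(diag(A))

# Simulation average of SSE
sse <- sapply(1:5000, function(i) {
         y <- mu + rnorm(n, 0, sigma)
         as.numeric(t(y) %*% A %*% y) })
mean(sse)            # should be close to 18
mean(sse)/(n - 2)    # should be close to sigma^2 = 2.25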
Chi-square Distributions

Defn 4.3: Let $\mathbf{Z} = (Z_1, \ldots, Z_n)^T \sim N(\mathbf{0}, I)$, i.e., the elements of $\mathbf{Z}$ are $n$ independent standard normal random variables. The distribution of
$$ W = \mathbf{Z}^T \mathbf{Z} = \sum_{i=1}^n Z_i^2 $$
is called the central chi-square distribution with $n$ degrees of freedom. We will use the notation $W \sim \chi^2_{(n)}$.

Moments: If $W \sim \chi^2_n$, then
$$ E(W) = n, \qquad Var(W) = 2n. $$

[Figure: central chi-square densities for 1, 3, 5, and 7 df.]

# This code is stored in: chiden.ssc
#================================================#
# chisq.density.plot()                           #
# --------------------                           #
# Input : degrees of freedom; it can be a vector.#
#         (e.g.) chisq.density.plot(c(1,3,5,7))  #
#         creates curves for df = 1, 3, 5, and 7.#
# Output: density plot of chi-square             #
#         distributions.                         #
#================================================#
chisq.density.plot <- function(df)
{
  x <- seq(.001, 10, , 50)
  # Create the x,y axes and title
  plot(c(0,10), c(0,.5), type="n", xlab="x", ylab="f(x;df)",
       main="Central Chi-Square Densities")
  for(i in 1:length(df)) {
    lty.num <- 3*i - 2                      # specify the line types
    f.x <- dchisq(x, df[i])                 # calculate the density
    lines(x, f.x, type="l", lty=lty.num)    # draw the curve
  }
  # The following lines are for legends
  legend(x = rep(5, length(df)),
         y = rep(.45, length(df)),
         legend = paste(as.character(df), "df"),
         lty = seq(1, by=3, length=length(df)),
         bty = "n")
}

# This function can be executed with the following commands.
# The source( ) function enters the entire file into the
# command window and executes all of the commands in the file.
#
#   source("chiden.ssc")
#   par(fin=c(7,8), cex=1.2, mex=1.5, lwd=4)
#   chisq.density.plot(c(1,3,5,7))

Defn 4.4: Let $\mathbf{Y} = (Y_1, \ldots, Y_n)^T \sim N(\mu, I)$, i.e., the elements of $\mathbf{Y}$ are independent normal random variables with $Y_i \sim N(\mu_i, 1)$. The distribution of the random variable
$$ W = \mathbf{Y}^T \mathbf{Y} = \sum_{i=1}^n Y_i^2 $$
is called a noncentral chi-square distribution with $n$ degrees of freedom and noncentrality parameter
$$ \delta^2 = \mu^T \mu = \sum_{i=1}^n \mu_i^2. $$
We will use the notation $W \sim \chi^2_n(\delta^2)$.

Moments: If $W \sim \chi^2_n(\delta^2)$, then
$$ E(W) = n + \delta^2, \qquad Var(W) = 2n + 4\delta^2. $$
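Defn 4.4 can also be checked directly by simulation, as in the minimal sketch below; the mean vector is arbitrary, and the ncp argument of R's pchisq/dchisq is exactly $\delta^2 = \sum \mu_i^2$ in this parameterization.

# Simulation sketch for Defn 4.4 (mu is an arbitrary illustrative
# mean vector; in R, ncp corresponds to delta^2 = sum(mu^2)).
mu <- c(1, -0.5, 2, 0)
n  <- length(mu)
delta2 <- sum(mu^2)                       # noncentrality = 5.25

w <- sapply(1:10000, function(i) sum((mu + rnorm(n))^2))

mean(w);  n + delta2                      # E(W) = n + delta^2
var(w);   2*n + 4*delta2                  # Var(W) = 2n + 4 delta^2

# Compare the simulated distribution with the built-in one:
mean(w <= 8);  pchisq(8, df=n, ncp=delta2)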
Defn 4.5: If $W_1 \sim \chi^2_{n_1}$ and $W_2 \sim \chi^2_{n_2}$, and $W_1$ and $W_2$ are independent, then the distribution of
$$ F = \frac{W_1/n_1}{W_2/n_2} $$
is called the central F distribution with $n_1$ and $n_2$ degrees of freedom. We will use the notation $F \sim F_{n_1, n_2}$.

Moments (central F):
$$ E(F) = \frac{n_2}{n_2 - 2} \quad \text{for } n_2 > 2, \qquad
Var(F) = \frac{2 n_2^2 (n_1 + n_2 - 2)}{n_1 (n_2 - 2)^2 (n_2 - 4)} \quad \text{for } n_2 > 4. $$

[Figure: densities of central F distributions with (1,10), (10,50), (10,10), and (10,4) df.]

# This code is stored in the file: fden.ssc
#=================================================#
# f.density.plot()                                #
# ----------------                                #
# Input : degrees of freedom; it can be a vector. #
#         (e.g.) f.density.plot(c(1,10,10,50))    #
#         creates a plot with density curves      #
#         for (1,10) df and (10,50) df.           #
# Output: density plots of the central F          #
#         distribution.                           #
#=================================================#
f.density.plot <- function(n)
{
  # draw the x,y axes and title
  x <- seq(.001, 4, , 50)
  plot(c(0,4), c(0,1.4), type="n", xlab="x", ylab="f(x;n)",
       main="Densities for Central F Distributions")

  # the length of n should be even
  legend.txt <- NULL
  d.f <- matrix(n, ncol=2, byrow=T)
  r <- dim(d.f)[1]
  for(i in 1:r) {
    lty.num <- 3*i - 2                       # specify the line types
    f.x <- df(x, d.f[i,1], d.f[i,2])         # calculate the density
    lines(x, f.x, type="l", lty=lty.num)     # draw a curve
    legend.txt <- c(legend.txt,
        paste("(", d.f[i,1], ",", d.f[i,2], ")", sep=""))
  }
  # The following lines insert legends on plots
  # using a motif graphics window.
  legend(x = rep(1.9, r), y = rep(1.1, r), cex = 1.0,
         legend = paste(legend.txt, "df"),
         lty = seq(1, by=3, length=r), bty = "n")
}

# This function can be executed with the following commands.
#
#   source("fden.ssc")
#   par(fin=c(6,7), cex=1.0, mex=1.3, lwd=4)
#   f.density.plot(c(1,10,10,50,10,10,10,4))

Defn 4.6: If $W_1 \sim \chi^2_{n_1}(\delta_1^2)$ and $W_2 \sim \chi^2_{n_2}$, and $W_1$ and $W_2$ are independent, then the distribution of
$$ F = \frac{W_1/n_1}{W_2/n_2} $$
is called a noncentral F distribution with $n_1$ and $n_2$ degrees of freedom and noncentrality parameter $\delta_1^2$. We will use the notation $F \sim F_{n_1, n_2}(\delta_1^2)$.

Moments:
$$ E(F) = \frac{n_2 (n_1 + \delta_1^2)}{n_1 (n_2 - 2)} \quad \text{for } n_2 > 2, $$
$$ Var(F) = \frac{2 n_2^2 \left[ (n_1 + \delta_1^2)^2 + (n_2 - 2)(n_1 + 2\delta_1^2) \right]}{n_1^2 (n_2 - 2)^2 (n_2 - 4)} \quad \text{for } n_2 > 4. $$

[Figure: central and noncentral F densities with (5,20) df and noncentrality parameter 1.5.]
[Figure: central and noncentral F densities with (5,20) df and noncentrality parameter 3.]

# This code is stored in the file: fdennc.ssc
#================================================#
# dnf: density of the noncentral F               #
# -------------------------------                #
# Input : x      a scalar or a vector            #
#         v      df for the numerator            #
#         w      df for the denominator          #
#         delta  noncentrality parameter         #
#         (e.g.) dnf(x,5,20,1.5) when x is a     #
#                scalar,                         #
#                sapply(x,dnf,5,20,1.5) when x   #
#                is a vector                     #
# Output: values of the density of the           #
#         noncentral F distribution              #
#================================================#
dnf <- function(x, v, w, delta)
{
  # sum the first nt terms of the series expansion
  # of the noncentral F density
  sum  <- 1
  term <- 1
  p  <- (delta*v*x)/(w + v*x)
  nt <- 100
  for (j in 1:nt) {
    term <- term*p*(v + w + 2*(j-1))/((v + 2*(j-1))*j)
    sum  <- sum + term
  }
  dnf.x <- exp(-delta)*sum*df(x, v, w)
  return(dnf.x)
}

# dnf.slow is aimed to show vectorized calculations and the use of
# a loop-avoidance function (sapply).  Vectorized calculations
# operate on entire vectors rather than on individual components
# in sequence (V&R, pages 103-108).
dnf.slow <- function(x, v, w, delta)
{
  prod.seq <- function(a, b) prod(seq(b, b + 2*(a-1), 2))
  j <- 1:100
  p <- (delta*v*x)/(w + v*x)
  numer <- sapply(j, prod.seq, v + w, simplify=T)
  denom <- gamma(j + 1)*sapply(j, prod.seq, v, simplify=T)
  k <- 1 + sum(p^j * numer/denom)
  f.x <- k*exp(-delta)*df(x, v, w)
  return(f.x)
}

# The following commands can be applied to obtain a single
# density value:
#
#   dnf(0.5,5,20,1.5)
#   dnf.slow(0.5,5,20,1.5)
#
# The following commands are used to evaluate the noncentral F
# density for a vector of values:
#
#   x <- seq(1,10,.1)
#   f.x1 <- sapply(x,dnf,5,20,1.5)
#   f.x2 <- sapply(x,dnf.slow,5,20,1.5)
#
# You will notice that the performance of dnf is better than that
# of dnf.slow.  The results should be the same.  In this case
# using a loop is better than using vectorized calculations, but
# it is usually more efficient to use vectorized computations.
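Two sanity checks on dnf() are sketched below. The first uses only the fact that any density must integrate to 1. The second compares dnf() with R's built-in noncentral F density; the Poisson-type weights $e^{-\delta}p^j$ in the series above suggest that delta plays the role of half of R's ncp parameter, so ncp = 2*delta is used here, but that correspondence is an assumption to be verified, not a statement from the notes.

# Sanity checks for dnf() (illustrative sketch).

# 1. A density must integrate to (approximately) 1:
integrate(function(x) sapply(x, dnf, 5, 20, 1.5), 0, Inf)

# 2. Assumed correspondence with R's built-in df(): if delta is
#    lambda/2 in R's ncp = lambda parameterization (an assumption,
#    not asserted in the notes), the two columns should match.
x <- c(0.5, 1, 2, 4)
cbind(series  = sapply(x, dnf, 5, 20, 1.5),
      builtin = df(x, 5, 20, ncp = 2*1.5))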
#=============================================#
# n.f.density.plot()                          #
# ------------------                          #
# Input : v,w    degrees of freedom           #
#         delta  noncentrality parameter      #
#         (e.g.) n.f.density.plot(5,20,1.5)   #
# Output: Two F density curves on one plot    #
#         (central and noncentral).           #
# Note : You must define the dnf( ) function  #
#        before applying this function.       #
#=============================================#
n.f.density.plot <- function(v, w, delta)
{
  x <- seq(.001, 5, length=60)
  cf.x <- df(x, v, w)                    # central F density
  nf.x <- sapply(x, dnf, v, w, delta)    # noncentral F density

  # construct the main title
  main1.txt <- "Central and Noncentral F Densities \n with"
  df.txt <- paste(paste("(", paste(paste(v, ",", sep=""), w, sep=""),
                        sep=""), ")", sep="")
  main2.txt <- paste(df.txt, "df and noncentrality parameter =")
  main2.txt <- paste(main2.txt, delta)
  main.txt  <- paste(main1.txt, main2.txt)

  # create the axes, lines, and legends
  plot(c(0,5), c(0,0.8), type="n", xlab="x", ylab="f(x;v,w)")
  mtext(main.txt, side=3, line=2.2)
  lines(x, cf.x, type="l", lty=1)
  lines(x, nf.x, type="l", lty=3)
  legend(x=1.6, y=0.64, legend="Central F   ", cex=0.9,
         lty=1, bty="n")
  legend(x=1.6, y=0.56, legend="Noncentral F", cex=0.9,
         lty=3, bty="n")
}

# This function is executed with the following commands.
#
#   source("fdennc.ssc")
#   par(fin=c(6.5,7), cex=1.2, mex=1.5, lwd=4, mar=c(5,4,5,2))
#   n.f.density.plot(5,20,1.5)
#   n.f.density.plot(5,20,3.0)

Sums of squares in ANOVA tables are quadratic forms
$$ \mathbf{Y}^T A \mathbf{Y}, $$
where $A$ is a non-negative definite symmetric matrix (usually a projection matrix).

Reminder: If $\mathbf{Y}_1, \mathbf{Y}_2, \ldots, \mathbf{Y}_k$ are independent random vectors, then $f_1(\mathbf{Y}_1), f_2(\mathbf{Y}_2), \ldots, f_k(\mathbf{Y}_k)$ are distributed independently. Here $f_i(\mathbf{Y}_i)$ indicates that $f_i(\cdot)$ is a function only of $\mathbf{Y}_i$ and not a function of any other $\mathbf{Y}_j$, $j \neq i$. These could be either real valued or vector valued functions.

To develop F-tests we need to identify conditions under which
  - $\mathbf{Y}^T A \mathbf{Y}$ has a central (or noncentral) chi-square distribution
  - $\mathbf{Y}^T A_i \mathbf{Y}$ and $\mathbf{Y}^T A_j \mathbf{Y}$ are independent.

Result 4.7: Let $A$ be an $n \times n$ symmetric matrix with $rank(A) = k$, and let $\mathbf{Y} = (Y_1, \ldots, Y_n)^T \sim N(\mu, \Sigma)$, where $\Sigma$ is an $n \times n$ symmetric positive definite matrix. If $A\Sigma$ is idempotent, then
$$ \mathbf{Y}^T A \mathbf{Y} \sim \chi^2_k(\delta^2) \quad \text{with } \delta^2 = \mu^T A \mu. $$
In addition, if $A\mu = \mathbf{0}$, then $\mathbf{Y}^T A \mathbf{Y} \sim \chi^2_k$.

Proof: We will show that the definition of a noncentral chi-square random variable (Defn 4.4) is satisfied by showing that $\mathbf{Y}^T A \mathbf{Y} = \mathbf{Z}^T \mathbf{Z}$ for a normal random vector $\mathbf{Z} = (Z_1, \ldots, Z_k)^T$ with $Var(\mathbf{Z}) = I_{k \times k}$.

Step 1: Since $A\Sigma$ is idempotent, we have $A\Sigma = A\Sigma A\Sigma$.

Step 2: Since $\Sigma$ is positive definite, $\Sigma^{-1}$ exists and we have
$$ A = A\Sigma A\Sigma \Sigma^{-1} = A\Sigma A = A^T \Sigma A. $$

Step 3: For any vector $\mathbf{x} = (x_1, \ldots, x_n)^T$ we have
$$ \mathbf{x}^T A \mathbf{x} = \mathbf{x}^T A^T \Sigma A \mathbf{x} \geq 0 $$
because $\Sigma$ is positive definite. Hence, $A$ is non-negative definite and symmetric.

Step 4: From the spectral decomposition of $A$ (Result 1.12) we have
$$ A = \sum_{j=1}^k \lambda_j \mathbf{v}_j \mathbf{v}_j^T = V D V^T, $$
where
$$ D = \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_k \end{bmatrix}, \quad \lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_k > 0 $$
are the positive eigenvalues of $A$, and the columns of $V = [\mathbf{v}_1 | \mathbf{v}_2 | \cdots | \mathbf{v}_k]$ are the eigenvectors corresponding to the positive eigenvalues of $A$. The other $n - k$ eigenvalues of $A$ are zero because $rank(A) = k$.

Step 5: Define
$$ B = V D^{-1/2}, \quad \text{where } D^{-1/2} = \begin{bmatrix} 1/\sqrt{\lambda_1} & & \\ & \ddots & \\ & & 1/\sqrt{\lambda_k} \end{bmatrix}. $$
Since $V^T V = I$, we have
$$ B^T A B = D^{-1/2} V^T (V D V^T) V D^{-1/2} = D^{-1/2} D D^{-1/2} = I_{k \times k}. $$
Then, since $A = A^T \Sigma A$, we have $I = B^T A B = B^T A^T \Sigma A B$.

Step 6: Define $\mathbf{Z} = B^T A \mathbf{Y}$; then
$$ Var(\mathbf{Z}) = B^T A^T \Sigma A B = I_{k \times k} $$
and $\mathbf{Z} \sim N(B^T A \mu,\; I)$.

Step 7: Since $\mathbf{Z} \sim N(B^T A \mu, I)$, we have $\mathbf{Z}^T \mathbf{Z} \sim \chi^2_k(\delta^2)$ from Defn 4.4, where
$$ \delta^2 = (B^T A \mu)^T (B^T A \mu) = \mu^T A^T B B^T A \mu = \mu^T A \mu. $$
Finally,
$$ \mathbf{Z}^T \mathbf{Z} = (B^T A \mathbf{Y})^T (B^T A \mathbf{Y}) = \mathbf{Y}^T A^T B B^T A \mathbf{Y} = \mathbf{Y}^T A \mathbf{Y} $$
because
$$ A^T B B^T A = A B B^T A = V D V^T V D^{-1/2} D^{-1/2} V^T V D V^T = V D D^{-1} D V^T = V D V^T = A. $$
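A concrete special case of Result 4.7 is easy to simulate: taking $A = \Sigma^{-1}$ makes $A\Sigma = I$ idempotent, so $\mathbf{Y}^T \Sigma^{-1} \mathbf{Y} \sim \chi^2_n(\mu^T \Sigma^{-1} \mu)$. The sketch below checks this numerically (the particular $\mu$ and $\Sigma$ are illustrative values).

# Simulation sketch for Result 4.7 with A = solve(Sigma), so that
# A Sigma = I is idempotent (mu and Sigma are illustrative).
Sigma <- matrix(c(4, 1, 1,
                  1, 3, 3,
                  1, 3, 9), nrow = 3, byrow = T)
mu <- c(1, -3, 2)
A  <- solve(Sigma)
delta2 <- as.numeric(t(mu) %*% A %*% mu)

L <- t(chol(Sigma))                     # L %*% t(L) = Sigma
q <- sapply(1:10000, function(i) {
       y <- mu + L %*% rnorm(3)
       as.numeric(t(y) %*% A %*% y) })

mean(q);  3 + delta2                    # E = n + delta^2
var(q);   2*3 + 4*delta2                # Var = 2n + 4 delta^2
mean(q <= 10);  pchisq(10, df=3, ncp=delta2)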
Example 4.3: For the Gauss-Markov model with $E(\mathbf{Y}) = X\beta$ and $Var(\mathbf{Y}) = \sigma^2 I$, include the assumption that
$$ \mathbf{Y} = (Y_1, \ldots, Y_n)^T \sim N(X\beta, \sigma^2 I). $$
For any solution $\mathbf{b} = (X^T X)^{-} X^T \mathbf{Y}$ to the normal equations, the OLS estimator for $X\beta$ is
$$ \hat{\mathbf{Y}} = X\mathbf{b} = X(X^T X)^{-} X^T \mathbf{Y} = P_X \mathbf{Y}, $$
and the residual vector is $\mathbf{e} = \mathbf{Y} - \hat{\mathbf{Y}} = (I - P_X)\mathbf{Y}$. The sum of squared residuals is
$$ SSE = \sum_{i=1}^n e_i^2 = \mathbf{e}^T \mathbf{e} = \mathbf{Y}^T (I - P_X) \mathbf{Y}. $$

Use Result 4.7 to obtain the distribution of
$$ \frac{SSE}{\sigma^2} = \mathbf{Y}^T \left[ \frac{1}{\sigma^2} (I - P_X) \right] \mathbf{Y}. $$
Here
  - $\mu = E(\mathbf{Y}) = X\beta$
  - $\Sigma = Var(\mathbf{Y}) = \sigma^2 I$ is positive definite
  - $A = \frac{1}{\sigma^2}(I - P_X)$ is symmetric.

Note that
$$ A\Sigma = \frac{1}{\sigma^2}(I - P_X)\, \sigma^2 I = I - P_X $$
is idempotent, and
$$ A\mu = \frac{1}{\sigma^2}(I - P_X) X\beta = \mathbf{0}. $$
Consequently,
$$ \frac{SSE}{\sigma^2} \sim \chi^2_{n-k}, \quad \text{where } n - k = rank(I - P_X) = n - rank(X). $$
We could also express this as $SSE \sim \sigma^2 \chi^2_{n-k}$.

Now consider the "uncorrected" model sum of squares
$$ \sum_{i=1}^n \hat{Y}_i^2 = \hat{\mathbf{Y}}^T \hat{\mathbf{Y}} = (P_X \mathbf{Y})^T P_X \mathbf{Y} = \mathbf{Y}^T P_X^T P_X \mathbf{Y} = \mathbf{Y}^T P_X \mathbf{Y}. $$
Use Result 4.7 to show that
$$ \frac{1}{\sigma^2} \sum_{i=1}^n \hat{Y}_i^2 = \mathbf{Y}^T \left( \frac{1}{\sigma^2} P_X \right) \mathbf{Y} \sim \chi^2_k(\delta^2), $$
where $k = rank(X)$ and
$$ \delta^2 = (X\beta)^T \left( \frac{1}{\sigma^2} P_X \right) (X\beta) = \frac{1}{\sigma^2} \beta^T X^T (P_X X) \beta = \frac{1}{\sigma^2} \beta^T X^T X \beta $$
(using $P_X X = X$).

The next result addresses the independence of several quadratic forms.

Result 4.8: Let $\mathbf{Y} = (Y_1, \ldots, Y_n)^T \sim N(\mu, \Sigma)$ and let $A_1, A_2, \ldots, A_p$ be $n \times n$ symmetric matrices. If $A_i \Sigma A_j = 0$ for all $i \neq j$, then
$$ \mathbf{Y}^T A_1 \mathbf{Y},\; \mathbf{Y}^T A_2 \mathbf{Y},\; \ldots,\; \mathbf{Y}^T A_p \mathbf{Y} $$
are independent random variables.

Proof: From Result 4.1,
$$ \begin{bmatrix} A_1 \mathbf{Y} \\ \vdots \\ A_p \mathbf{Y} \end{bmatrix} = \begin{bmatrix} A_1 \\ \vdots \\ A_p \end{bmatrix} \mathbf{Y} $$
has a multivariate normal distribution, and for $i \neq j$,
$$ Cov(A_i \mathbf{Y}, A_j \mathbf{Y}) = A_i \Sigma A_j^T = 0. $$
It follows from Result 4.4 that $A_1 \mathbf{Y}, A_2 \mathbf{Y}, \ldots, A_p \mathbf{Y}$ are independent random vectors. Since
$$ \mathbf{Y}^T A_i \mathbf{Y} = \mathbf{Y}^T A_i A_i^{-} A_i \mathbf{Y} = \mathbf{Y}^T A_i^T A_i^{-} A_i \mathbf{Y} = (A_i \mathbf{Y})^T A_i^{-} (A_i \mathbf{Y}) $$
is a function of $A_i \mathbf{Y}$ only (here $A_i^{-}$ is a generalized inverse of $A_i$), it follows that $\mathbf{Y}^T A_1 \mathbf{Y}, \ldots, \mathbf{Y}^T A_p \mathbf{Y}$ are independent random variables.

Example 4.4: Continuing Example 4.3, show that the "uncorrected" model sum of squares
$$ \sum_{i=1}^n \hat{Y}_i^2 = \mathbf{Y}^T P_X \mathbf{Y} $$
and the sum of squared residuals
$$ \sum_{i=1}^n (Y_i - \hat{Y}_i)^2 = \mathbf{Y}^T (I - P_X) \mathbf{Y} $$
are independently distributed for the "normal theory" Gauss-Markov model, where $\mathbf{Y} \sim N(X\beta, \sigma^2 I)$.

Use Result 4.8 with $A_1 = P_X$ and $A_2 = I - P_X$. Note that
$$ A_2 \Sigma A_1 = (I - P_X)(\sigma^2 I) P_X = \sigma^2 (I - P_X) P_X = \sigma^2 (P_X - P_X P_X) = \sigma^2 (P_X - P_X) = 0. $$
Consequently, $\frac{1}{\sigma^2} \sum_{i=1}^n \hat{Y}_i^2$ and $\frac{1}{\sigma^2} \sum_{i=1}^n (Y_i - \hat{Y}_i)^2$ are independently distributed.

In Example 4.3 we showed that
$$ \frac{1}{\sigma^2} \sum_{i=1}^n \hat{Y}_i^2 \sim \chi^2_k\!\left( \frac{1}{\sigma^2} \beta^T X^T X \beta \right) \quad \text{and} \quad \frac{1}{\sigma^2} \sum_{i=1}^n (Y_i - \hat{Y}_i)^2 \sim \chi^2_{n-k}, $$
where $k = rank(X)$. By Defn 4.6,
$$ F = \frac{\frac{1}{k} \sum_{i=1}^n \hat{Y}_i^2}{\frac{1}{n-k} \sum_{i=1}^n (Y_i - \hat{Y}_i)^2} \sim F_{k,\, n-k}\!\left( \frac{1}{\sigma^2} \beta^T X^T X \beta \right), $$
where the numerator is the uncorrected model mean square and the denominator is the residual mean square. This reduces to a central F distribution with $(k, n-k)$ d.f. when $X\beta = \mathbf{0}$.

Use this F statistic to test the null hypothesis
$$ H_0: E(\mathbf{Y}) = X\beta = \mathbf{0} $$
against the alternative
$$ H_A: E(\mathbf{Y}) = X\beta \neq \mathbf{0}. $$

Comments:
(i) The null hypothesis corresponds to the condition under which $F$ has a central F distribution (the noncentrality parameter is zero). In this example,
$$ \delta^2 = \frac{1}{\sigma^2} (X\beta)^T (X\beta) = 0 \quad \text{if and only if} \quad X\beta = \mathbf{0}. $$
(ii) If $k = rank(X) =$ number of columns in $X$, then $H_0: X\beta = \mathbf{0}$ is equivalent to $H_0: \beta = \mathbf{0}$.
(iii) If $k = rank(X)$ is less than the number of columns in $X$, then $X\beta = \mathbf{0}$ for some $\beta \neq \mathbf{0}$, and $H_0: X\beta = \mathbf{0}$ is not equivalent to $H_0: \beta = \mathbf{0}$.
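The sketch below carries out this F test by simulation; the design matrix is a hypothetical one-way layout chosen for illustration. Under $H_0: X\beta = \mathbf{0}$ the statistic should follow the central $F_{k, n-k}$ distribution, so the empirical rejection rate at level .05 should be close to .05.

# Simulation sketch of the F test for H0: X beta = 0
# (the design matrix X is an illustrative choice).
n <- 12
X <- cbind(1, rep(1:3, each = 4))
k <- qr(X)$rank                          # k = rank(X) = 2
PX <- X %*% solve(t(X) %*% X) %*% t(X)

f.stat <- function(y) {
  yhat <- PX %*% y
  (sum(yhat^2)/k) / (sum((y - yhat)^2)/(n - k))
}

# Under H0 (beta = 0, sigma = 1 without loss of generality):
f0 <- sapply(1:5000, function(i) f.stat(rnorm(n)))
mean(f0 > qf(.95, k, n - k))             # should be close to .05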
Example 4.4 is a simple illustration of a typical partition of the uncorrected total sum of squares:
$$ \sum_{i=1}^n Y_i^2 = \mathbf{Y}^T \mathbf{Y} = \mathbf{Y}^T [(I - P_X) + P_X] \mathbf{Y} = \underbrace{\mathbf{Y}^T (I - P_X) \mathbf{Y}}_{\text{call this } A_2 \text{ term}} + \underbrace{\mathbf{Y}^T P_X \mathbf{Y}}_{\text{call this } A_1 \text{ term}} = \sum_{i=1}^n (Y_i - \hat{Y}_i)^2 + \sum_{i=1}^n \hat{Y}_i^2, $$
with degrees of freedom $rank(A_2)$ and $rank(A_1)$, respectively.

More generally, an uncorrected total sum of squares can be partitioned as
$$ \sum_{i=1}^n Y_i^2 = \mathbf{Y}^T \mathbf{Y} = \mathbf{Y}^T A_1 \mathbf{Y} + \mathbf{Y}^T A_2 \mathbf{Y} + \cdots + \mathbf{Y}^T A_k \mathbf{Y} $$
using orthogonal projection matrices with
$$ A_1 + A_2 + \cdots + A_k = I_{n \times n} \quad \text{and} \quad A_i A_j = 0 \text{ for any } i \neq j. $$
Since we are dealing with orthogonal projection matrices, we also have
  - $A_i^T = A_i$ (symmetry)
  - $A_i A_i = A_i$ (idempotent matrices)
  - $rank(A_1) + rank(A_2) + \cdots + rank(A_k) = n$.

Result 4.9 (Cochran's Theorem): Let $\mathbf{Y} = (Y_1, \ldots, Y_n)^T \sim N(\mu, \sigma^2 I)$ and let $A_1, A_2, \ldots, A_k$ be $n \times n$ symmetric matrices with
$$ I = A_1 + A_2 + \cdots + A_k $$
and
$$ n = r_1 + r_2 + \cdots + r_k, \quad \text{where } r_i = rank(A_i). $$
Then, for $i = 1, 2, \ldots, k$,
$$ \frac{1}{\sigma^2} \mathbf{Y}^T A_i \mathbf{Y} \sim \chi^2_{r_i}\!\left( \frac{1}{\sigma^2} \mu^T A_i \mu \right), $$
and $\mathbf{Y}^T A_1 \mathbf{Y}, \mathbf{Y}^T A_2 \mathbf{Y}, \ldots, \mathbf{Y}^T A_k \mathbf{Y}$ are distributed independently.

Proof: This result follows directly from Result 4.7, Result 4.8, and the following Result 4.10.

Result 4.10: Let $A_1, A_2, \ldots, A_k$ be $n \times n$ symmetric matrices such that
$$ A_1 + A_2 + \cdots + A_k = I. $$
Then the following statements are equivalent:
(i) $A_i A_j = 0$ for any $i \neq j$;
(ii) $A_i A_i = A_i$ for all $i = 1, \ldots, k$;
(iii) $rank(A_1) + \cdots + rank(A_k) = n$.

Proof:

First show that (i) $\Rightarrow$ (ii). Since $I - A_i = \sum_{j \neq i} A_j$, we have
$$ A_i A_i = A_i \left( I - \sum_{j \neq i} A_j \right) = A_i - \sum_{j \neq i} A_i A_j = A_i. $$

Now show that (ii) $\Rightarrow$ (iii). Since an idempotent matrix has eigenvalues that are either 0 or 1, and the number of non-zero eigenvalues is the rank of the matrix, (ii) implies that $tr(A_i) = rank(A_i)$. Then,
$$ n = tr(I) = tr(A_1 + A_2 + \cdots + A_k) = tr(A_1) + tr(A_2) + \cdots + tr(A_k) = rank(A_1) + rank(A_2) + \cdots + rank(A_k). $$

Finally, show that (iii) $\Rightarrow$ (i). Let $r_i = rank(A_i)$. Since $A_i$ is symmetric, we can apply the spectral decomposition (Result 1.12) to write $A_i$ as
$$ A_i = U_i \Lambda_i U_i^T, $$
where $\Lambda_i$ is an $r_i \times r_i$ diagonal matrix containing the non-zero eigenvalues of $A_i$ and
$$ U_i = [\, \mathbf{u}_{1i} \,|\, \mathbf{u}_{2i} \,|\, \cdots \,|\, \mathbf{u}_{r_i, i} \,] $$
is an $n \times r_i$ matrix whose columns are the eigenvectors corresponding to the non-zero eigenvalues of $A_i$. Then
$$ I = A_1 + A_2 + \cdots + A_k = U_1 \Lambda_1 U_1^T + \cdots + U_k \Lambda_k U_k^T = [\, U_1 | \cdots | U_k \,] \begin{bmatrix} \Lambda_1 & & \\ & \ddots & \\ & & \Lambda_k \end{bmatrix} \begin{bmatrix} U_1^T \\ \vdots \\ U_k^T \end{bmatrix} = U \Lambda U^T. $$
Since $rank(A_1) + \cdots + rank(A_k) = n$ and $rank(A_i)$ is the number of columns in $U_i$, $U = [\, U_1 | \cdots | U_k \,]$ is an $n \times n$ matrix. Furthermore, $rank(U) = n$ because the identity matrix on the left side of the equal sign has rank $n$. Then $U^T U$ is an $n \times n$ matrix of full rank, $(U^T U)^{-1}$ exists, and
$$ I = U \Lambda U^T \;\Rightarrow\; U^T U = U^T U \Lambda U^T U \;\Rightarrow\; (U^T U)^{-1} U^T U = \Lambda U^T U \;\Rightarrow\; I = \Lambda U^T U. $$
It follows that $U^T U = \Lambda^{-1}$ is block diagonal, so
$$ U_i^T U_j = 0 \quad \text{for any } i \neq j. $$
Consequently,
$$ A_i A_j = U_i \Lambda_i U_i^T U_j \Lambda_j U_j^T = 0 \quad \text{(a matrix of zeros)} $$
for any $i \neq j$.

References:

Johnson, R. A. and Wichern, D. W. (1998). Applied Multivariate Statistical Analysis, 4th edition. Prentice Hall. (Chapter 4)

Anderson, T. W. (1984). An Introduction to Multivariate Statistical Analysis, 2nd edition. Wiley, New York.

Johnson, N. L., Kotz, S., and Balakrishnan, N. (1994, 1995). Distributions in Statistics: Continuous Univariate Distributions, Vols. 1 and 2, 2nd edition. Wiley, New York.

Searle, S. R. (1987). Linear Models for Unbalanced Data. Wiley, New York. (Chapters 7 and 8)

Rencher, A. C. (2000). Linear Models in Statistics. Wiley, New York. (Chapters 4 and 5)