Normal Theory Inference

Defn 4.1: Univariate normal distribution: A random variable $Y$ with density function

\[ f(y) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{1}{2} \left( \frac{y-\mu}{\sigma} \right)^2 \right\} \]

is said to have a normal (Gaussian) distribution with $E(Y) = \mu$ and $Var(Y) = \sigma^2$. We will use the notation $Y \sim N(\mu, \sigma^2)$.

Suppose $Z$ has a normal distribution with $E(Z) = 0$ and $Var(Z) = 1$, i.e., $Z \sim N(0, 1)$; then $Z$ is said to have a standard normal distribution.

[Figure: Density for a Normal Distribution with mean = 0 and variance = 1; horizontal axis y from -3 to 3, vertical axis f(y).]

Defn 4.2: Suppose $Z = (Z_1, \ldots, Z_m)^T$ is a random vector whose elements are independently distributed standard normal random variables. For any $m \times n$ matrix $A$, we say that

\[ Y = \mu + A^T Z \]

has a multivariate normal distribution with mean vector $E(Y) = \mu$ and variance-covariance matrix $Var(Y) = A^T A = \Sigma$. We will use the notation $Y \sim N(\mu, \Sigma)$.

When $\Sigma$ ($n \times n$) is positive definite, the joint density function is

\[ f(y) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left\{ -\frac{1}{2} (y-\mu)^T \Sigma^{-1} (y-\mu) \right\} \]

The multivariate normal distribution has many useful properties:

Result 4.1 Normality is preserved under linear transformations: If $Y \sim N(\mu, \Sigma)$ then

\[ W = c + BY \sim N(c + B\mu, \; B \Sigma B^T) \]

for any non-random $c$ and $B$.

Proof: By Defn 4.2, $Y = \mu + A^T Z$ where $A^T A = \Sigma$. Then $W = c + BY = (c + B\mu) + BA^T Z$ satisfies Defn 4.2, with $BA^T A B^T = B \Sigma B^T$.

Result 4.2 Suppose

\[ Y = \begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} \sim N\left( \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \right) \]

then $Y_1 \sim N(\mu_1, \Sigma_{11})$.

Proof: Note that $Y_1 = [\, I \;\; 0 \,]\, Y$ and apply Result 4.1.

Note: This result applies to any subset of the elements of $Y$ because you can move that subset to the top of the vector by multiplying $Y$ by an appropriate matrix of zeros and ones.

Example 4.1 Suppose

\[ Y = \begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \end{bmatrix} \sim N\left( \begin{bmatrix} 1 \\ -3 \\ 2 \end{bmatrix}, \begin{bmatrix} 4 & 1 & -1 \\ 1 & 3 & -3 \\ -1 & -3 & 9 \end{bmatrix} \right) \]

then

\[ Y_1 = [\,1 \;\; 0 \;\; 0\,]\, Y \sim N(1, 4) \]
\[ Y_2 = [\,0 \;\; 1 \;\; 0\,]\, Y \sim N(-3, 3) \]
\[ Y_3 = [\,0 \;\; 0 \;\; 1\,]\, Y \sim N(2, 9) \]
Similarly,

\[ \begin{bmatrix} Y_1 \\ Y_3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} Y \sim N\left( \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 4 & -1 \\ -1 & 9 \end{bmatrix} \right) \]

Here the $2 \times 3$ matrix of zeros and ones is $B$, the mean vector is $B\mu$, and the covariance matrix is $B \Sigma B^T$.

Comment: If $Y_1 \sim N(\mu_1, \Sigma_1)$ and $Y_2 \sim N(\mu_2, \Sigma_2)$, it is not always true that $Y = \begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix}$ has a normal distribution.

Result 4.3: If $Y_1$ and $Y_2$ are independent random vectors such that $Y_1 \sim N(\mu_1, \Sigma_1)$ and $Y_2 \sim N(\mu_2, \Sigma_2)$, then

\[ Y = \begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} \sim N\left( \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix} \right) \]

Proof: Since $Y_1 \sim N(\mu_1, \Sigma_1)$, we have from Defn 4.2 that $Y_1 = \mu_1 + A_1^T Z_1$ where $A_1^T A_1 = \Sigma_1$ and the elements of $Z_1$ are independent standard normal random variables. A similar result, $Y_2 = \mu_2 + A_2^T Z_2$, is true for $Y_2$. Since $Y_1$ and $Y_2$ are independent, it follows that $Z_1$ and $Z_2$ are independent. Then

\[ Y = \begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} = \begin{bmatrix} \mu_1 + A_1^T Z_1 \\ \mu_2 + A_2^T Z_2 \end{bmatrix} = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} + \begin{bmatrix} A_1^T & 0 \\ 0 & A_2^T \end{bmatrix} \begin{bmatrix} Z_1 \\ Z_2 \end{bmatrix} \]

satisfies Defn 4.2.

Alternatively, you could prove Result 4.3 by showing that the product of the characteristic functions for $Y_1$ and $Y_2$ is a characteristic function of a multivariate normal distribution. If $\Sigma_1$ and $\Sigma_2$ are both non-singular, you could prove Result 4.3 by showing that the product of the density functions for $Y_1$ and $Y_2$ is a density function for the specified multivariate normal distribution.

Result 4.4 If $Y = (Y_1, \ldots, Y_k)^T$ is a random vector with a multivariate normal distribution, then $Y_1, Y_2, \ldots, Y_k$ are independent if and only if $Cov(Y_i, Y_j) = 0$ for all $i \neq j$.

Comments:
(i) In general, $Y_i$ independent of $Y_j$ implies that $Cov(Y_i, Y_j) = 0$.
(ii) When $Y = (Y_1, \ldots, Y_n)^T$ has a multivariate normal distribution, $Y_i$ uncorrelated with $Y_j$ implies $Y_i$ is independent of $Y_j$. This is usually not true for other distributions.

Result 4.5 (Conditional distributions) If

\[ \begin{bmatrix} Y \\ X \end{bmatrix} \sim N\left( \begin{bmatrix} \mu_Y \\ \mu_X \end{bmatrix}, \begin{bmatrix} \Sigma_{YY} & \Sigma_{YX} \\ \Sigma_{XY} & \Sigma_{XX} \end{bmatrix} \right) \]

with a positive definite covariance matrix, then the conditional distribution of $Y$ given the value of $X$ is a normal distribution with mean vector

\[ E(Y \mid X) = \mu_Y + \Sigma_{YX} \Sigma_{XX}^{-1} (X - \mu_X) \]

and positive definite covariance matrix

\[ V(Y \mid X) = \Sigma_{YY} - \Sigma_{YX} \Sigma_{XX}^{-1} \Sigma_{XY} \]

Note that this covariance matrix does not depend on the value of $X$.

Quadratic forms: $Y^T A Y$

Quadratic forms appear in
- Sums of squares in ANOVA
- Chi-square tests
- F-tests
- Estimation of variances

Some useful information about the distribution of quadratic forms is summarized in the following results.

Result 4.6 If $Y = (Y_1, \ldots, Y_n)^T$ is a random vector with $E(Y) = \mu$ and $Var(Y) = \Sigma$, and $A$ is an $n \times n$ non-random matrix, then

\[ E(Y^T A Y) = \mu^T A \mu + tr(A \Sigma) \]

Proof: Note that $Y^T A Y$ is a (scalar) random variable, and

\[ \begin{aligned} E(Y^T A Y) &= E(tr(Y^T A Y)) = E(tr(A Y Y^T)) = tr(E(A Y Y^T)) = tr(A\, E(Y Y^T)) \\ &= tr(A [Var(Y) + \mu \mu^T]) = tr(A \Sigma + A \mu \mu^T) \\ &= tr(A \Sigma) + tr(A \mu \mu^T) = tr(A \Sigma) + tr(\mu^T A \mu) = tr(A \Sigma) + \mu^T A \mu \end{aligned} \]

Example 4.2 Consider a Gauss-Markov model with $E(Y) = X\beta$ and $Var(Y) = \sigma^2 I$. Let $b = (X^T X)^- X^T Y$ be any solution to the normal equations. Since $E(Y) = X\beta$ is estimable, the OLS estimator is

\[ \hat{Y} = Xb = X (X^T X)^- X^T Y = P_X Y \]

and the residual vector is

\[ e = Y - \hat{Y} = (I - P_X) Y \]

The sum of squared residuals, also called the error sum of squares, is

\[ \begin{aligned} SSE &= \sum_{i=1}^n (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^n e_i^2 = e^T e \\ &= [(I - P_X) Y]^T (I - P_X) Y = Y^T (I - P_X)^T (I - P_X) Y \\ &= Y^T (I - P_X)(I - P_X) Y = Y^T (I - P_X) Y \end{aligned} \]

From Result 4.6,

\[ \begin{aligned} E(SSE) &= E(Y^T (I - P_X) Y) = \beta^T X^T (I - P_X) X \beta + tr((I - P_X) \sigma^2 I) \\ &= 0 + \sigma^2\, tr(I - P_X) = \sigma^2 [tr(I) - tr(P_X)] \\ &= \sigma^2 [n - rank(P_X)] = \sigma^2 [n - rank(X)] \end{aligned} \]

Consequently,

\[ \hat{\sigma}^2 = \frac{SSE}{n - rank(X)} \]

is an unbiased estimator for $\sigma^2$ (provided that $rank(X) < n$).

Chi-square Distributions

Defn 4.3 Let $Z = (Z_1, \ldots, Z_n)^T \sim N(0, I)$, i.e., the elements of $Z$ are $n$ independent standard normal random variables. The distribution of the random variable

\[ W = Z^T Z = \sum_{i=1}^n Z_i^2 \]

is called the central chi-square distribution with $n$ degrees of freedom. We will use the notation $W \sim \chi^2_n$.

Moments: If $W \sim \chi^2_n$, then $E(W) = n$ and $Var(W) = 2n$.

[Figure: Central Chi-Square Densities for 1, 3, 5, and 7 df; horizontal axis x from 0 to 10, vertical axis f(x;df).]

    # This code is stored in: chiden.ssc
    #================================================#
    # chisq.density.plot()
    # ------------------------
    # Input : degrees of freedom; it can be a vector.
    #         (e.g.)
    #         chisq.density.plot(c(1,3,5,7))
    #         creates curves for df = 1,3,5,and 7
    # Output: density plot of chisquare distribution.
    # Note  : Unix users must first use motif()
    #         to open a graphics window before
    #         using this function.
    ##################################################

    chisq.density.plot <- function(df)
    {
      x <- seq(.001,10,,50)

      # Create the x,y axis and title
      plot(c(0,10), c(0,.5), type="n",
           xlab="x", ylab="f(x;df)",
           main="Central Chi-Square Densities")

      for(i in 1:length(df)) {
        lty.num <- 3*i-2                      # specify the line types.
        f.x <- dchisq(x,df[i])                # calculate density.
        lines(x, f.x, type="l",lty=lty.num)   # draw the curve.
      }

      # The following lines are for legends;
      legend( x = rep(5,length(df)) ,
              y = rep(.45,length(df)) ,
              legend = paste(as.character(df),"df") ,
              lty = seq(1,by=3,length=length(df)) ,
              bty = "n")
    }

    # This function can be executed as follows.
    # Windows users should delete the motif( )
    # command.
    #
    # motif( )
    # source("chiden.ssc")
    # par(fin=c(7,8),cex=1.2,mex=1.5,lwd=4)
    # chisq.density.plot(c(1,3,5,7))

Defn 4.4: Let $Y = (Y_1, \ldots, Y_n)^T \sim N(\mu, I)$, i.e., the elements of $Y$ are independent normal random variables with $Y_i \sim N(\mu_i, 1)$. The distribution of the random variable

\[ W = Y^T Y = \sum_{i=1}^n Y_i^2 \]

is called a noncentral chi-square distribution with $n$ degrees of freedom and noncentrality parameter

\[ \delta^2 = \mu^T \mu = \sum_{i=1}^n \mu_i^2 \]

We will use the notation $W \sim \chi^2_n(\delta^2)$.

Moments: If $W \sim \chi^2_n(\delta^2)$ then $E(W) = n + \delta^2$ and $Var(W) = 2n + 4\delta^2$.

Defn 4.5: If $W_1 \sim \chi^2_{n_1}$ and $W_2 \sim \chi^2_{n_2}$ and $W_1$ and $W_2$ are independent, then the distribution of

\[ F = \frac{W_1 / n_1}{W_2 / n_2} \]

is called the central F distribution with $n_1$ and $n_2$ degrees of freedom. We will use the notation $F \sim F_{n_1, n_2}$.

Moments of the central F distribution:

\[ E(F) = \frac{n_2}{n_2 - 2} \quad \text{for } n_2 > 2 \]

\[ Var(F) = \frac{2 n_2^2 (n_1 + n_2 - 2)}{n_1 (n_2 - 2)^2 (n_2 - 4)} \quad \text{for } n_2 > 4 \]

    # This code is stored in the file fden.ssc
    #=================================================#
    # f.density.plot()
    # --------------------
    # Input : degrees of freedom; it can be a vector.
    #         (e.g.)
    #         f.density.plot(c(1,10,10,50))
    #         creates a plot with density curves
    #         for (1,10) df and (10,50) df
    # Output: density plots of the central F
    #         distribution.
    # Note  : Unix users must use the motif()
    #         function to open a graphics window
    #         before using this function.
    ###################################################

    f.density.plot <- function(n)
    {
      # draw x,y axis and title
      x <- seq(.001,4,,50)
      plot(c(0,4), c(0,1.4), type="n",
           xlab="x", ylab="f(x;n)",
           main="Densities for Central F Distributions")

      # the length of n should be even.
      legend.txt <- NULL
      d.f <- matrix(n,ncol=2,byrow=T); r <- dim(d.f)[1]

      for(i in 1:r) {
        lty.num <- 3*i-2                      # specify the line types.
        f.x <- df(x,d.f[i,1],d.f[i,2])        # calculate density.
        lines(x, f.x, type="l",lty=lty.num)   # draw a curve.
        legend.txt <- c(legend.txt,
              paste("(",d.f[i,1],",",d.f[i,2],")",sep=""))
      }

      # The following lines are for inserting
      # legends on plots using a motif
      # graphics window.
      legend( x = rep(1.9,r) ,
              y = rep(1.1,r) ,
              cex=1.0,
              legend = paste(legend.txt,"df") ,
              lty = seq(1,by=3,length=r) ,
              bty = "n" )
    }

    # This function can be executed with the
    # following commands. Windows users should
    # delete the motif( ) command.
    #
    # motif()
    # par(fin=c(6,7),cex=1.0,mex=1.3,lwd=4)
    # source("fden.ssc")
    # f.density.plot(c(1,10,10,50,10,10,10,4))

[Figure: Densities for Central F Distributions with (1,10), (10,50), (10,10), and (10,4) df; horizontal axis x from 0 to 4, vertical axis f(x;n).]

Defn 4.6: If $W_1 \sim \chi^2_{n_1}(\delta_1^2)$ and $W_2 \sim \chi^2_{n_2}$ and $W_1$ and $W_2$ are independent, then the distribution of

\[ F = \frac{W_1 / n_1}{W_2 / n_2} \]

is called a noncentral F distribution with $n_1$ and $n_2$ degrees of freedom and noncentrality parameter $\delta_1^2$.
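Defn 4.6 can be checked by simulation. The following is a minimal R sketch, not part of the original S-PLUS code: it builds draws of F directly from the definition with rchisq() and compares empirical probabilities with R's built-in pf(), which accepts the noncentrality through its ncp argument. The values n1 = 5, n2 = 20, delta2 = 3, the seed, the number of draws, and the cut points q are arbitrary illustrative choices. R's ncp is the sum of squared means, which is the same convention as $\delta^2 = \mu^T \mu$ in Defn 4.4.

```r
set.seed(511)                  # arbitrary seed, for reproducibility only
n1 <- 5; n2 <- 20              # numerator and denominator df (illustrative)
delta2 <- 3                    # noncentrality parameter delta_1^2 (illustrative)
nsim <- 100000                 # number of simulated F values

W1 <- rchisq(nsim, df = n1, ncp = delta2)   # noncentral chi-square
W2 <- rchisq(nsim, df = n2)                 # central chi-square, independent of W1
F.sim <- (W1 / n1) / (W2 / n2)              # ratio from Defn 4.6

# Compare empirical and theoretical values of P(F <= q)
q <- c(0.5, 1, 2, 3)
emp <- sapply(q, function(qq) mean(F.sim <= qq))
thr <- pf(q, df1 = n1, df2 = n2, ncp = delta2)
print(round(cbind(q, empirical = emp, theoretical = thr), 3))
```

With this many draws the two probability columns should agree to about two decimal places; setting delta2 to 0 gives the central F distribution of Defn 4.5.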
We will use the notation $F \sim F_{n_1, n_2}(\delta_1^2)$.

Moments:

\[ E(F) = \frac{n_2 (n_1 + \delta_1^2)}{(n_2 - 2)\, n_1} \quad \text{for } n_2 > 2 \]

\[ Var(F) = \frac{2 n_2^2 \left[ (n_1 + \delta_1^2)^2 + (n_2 - 2)(n_1 + 2\delta_1^2) \right]}{n_1^2 (n_2 - 2)^2 (n_2 - 4)} \quad \text{for } n_2 > 4 \]

[Figure: Central and Noncentral F Densities with (5,20) df and noncentrality parameter = 1.5; horizontal axis x from 0 to 5, vertical axis f(x;v,w).]

[Figure: Central and Noncentral F Densities with (5,20) df and noncentrality parameter = 3; horizontal axis x from 0 to 5, vertical axis f(x;v,w).]

    # This code is stored in the file: fdennc.ssc
    #================================================#
    # dnf; density of non.central.F
    # -----------------------------
    # Input : x      can be a scalar or a vector
    #         v      df for numerator
    #         w      df for denominator
    #         delta  non-centrality parameter
    #         (e.g.) dnf(x,5,20,1.5) when x is a
    #                scalar,
    #                sapply(x,dnf,5,20,1.5) when x
    #                is a vector
    # Output: evaluate density curve of the
    #         noncentral F distribution
    # Note  : the series below uses the Poisson
    #         weights exp(-delta)*delta^j/j!, so
    #         delta here corresponds to one half of
    #         the noncentrality parameter
    #         delta^2 = mu^T mu of Defn 4.4.
    ##################################################

    dnf <- function(x,v,w,delta)
    {
      sum <- 1
      term <- 1
      p <- ((delta*v*x)/(w+v*x))
      nt <- 100
      for (j in 1:nt) {
        term <- term*p*(v+w+2*(j-1))/((v+2*(j-1))*j)
        sum <- sum + term
      }
      dnf.x <- exp(-delta)*sum*df(x,v,w)
      return(dnf.x)
    }

    # dnf.slow illustrates vectorized calculations
    # and one of the loop avoidance functions
    # (sapply). Vectorized calculations operate
    # on entire vectors rather than on individual
    # components in sequence.
    # (V&R on p.103-108)

    dnf.slow <- function(x,v,w,delta)
    {
      prod.seq <- function(a,b) prod(seq(b,b+2*(a-1),2))
      j <- 1:100
      p <- ((delta*v*x)/(w+v*x))
      numer <- sapply(j,prod.seq,v+w,simplify=T)
      denom <- gamma(j+1)*sapply(j,prod.seq,v,simplify=T)
      k <- 1 + sum( p^j * numer / denom )
      f.x <- k*exp(-delta)*df(x,v,w)
      return(f.x)
    }

    # The following commands can be applied to
    # obtain a single density value
    #
    # dnf(0.5,5,20,1.5)
    # dnf.slow(0.5,5,20,1.5)
    #
    # The following commands are used to
    # evaluate the noncentral F density for a
    # vector of values
    #
    # x <- seq(1,10,.1)
    # f.x1 <- sapply(x,dnf,5,20,1.5)
    # f.x2 <- sapply(x,dnf.slow,5,20,1.5)
    #
    # You will notice that the performance of
    # dnf is better than that of dnf.slow.
    # The results should be the same.
    # In this case using a loop is better
    # than using vectorized calculations,
    # but it is usually more efficient to
    # use vectorized computations.

    #=============================================#
    # n.f.density.plot()
    # -------------------
    # Input : v,w    degrees of freedom
    #         delta  non-centrality parameter
    #         (e.g.) n.f.density.plot(5,20,1.5)
    # Output: Two F density curves on one plot
    #         (central and noncentral).
    # Note  : You must define the dnf( ) function
    #         before applying this function.
    #         Unix users must use motif() to open
    #         a graphics window before using this
    #         function.
    ###############################################

    n.f.density.plot <- function(v,w,delta)
    {
      x <- seq(.001,5,length=60)
      cf.x <- df(x,v,w)
      nf.x <- sapply(x,dnf,v,w,delta)

      # For the main title;
      main1.txt <- "Central and Noncentral F Densities \n with"
      df.txt <- paste(paste("(",paste(paste(v,",",sep=""),
                      w,sep=""),sep=""),")",sep="")
      main2.txt <- paste(df.txt, "df and noncentrality parameter =")
      main2.txt <- paste(main2.txt,delta)
      main.txt <- paste(main1.txt,main2.txt)

      # create the axes, lines, and legends.
      plot(c(0,5), c(0,0.8), type="n",xlab="x", ylab="f(x;v,w)")
      mtext(main.txt, side=3,line=2.2)
      lines(x, cf.x, type="l",lty=1)
      lines(x, nf.x, type="l",lty=3)
      legend(x=1.6, y=0.64, legend = "Central F ", cex=0.9,
             lty =1, bty = "n" )
      legend(x=1.6, y=0.56, legend = "Noncentral F", cex=0.9,
             lty =3, bty = "n" )
    }

    # This function is executed with the following
    # commands. Windows users should delete the
    # motif( ) command.
    #
    # motif()
    # par(fin=c(6.5,7),cex=1.2,mex=1.5,lwd=4,
    #     mar=c(5.1,4.1,4.1,2.1))
    # source("fdennc.ssc")
    # n.f.density.plot(5,20,1.5)
    #
    # motif()
    # n.f.density.plot(5,20,3.0)

Sums of squares in ANOVA tables are quadratic forms $Y^T A Y$ where $A$ is a non-negative definite symmetric matrix (generally a projection matrix).

To develop F-tests we need to identify conditions under which
- $Y^T A Y$ has a central (or noncentral) chi-square distribution
- $Y^T A_i Y$ and $Y^T A_j Y$ are independent

Reminder: If $Y_1, Y_2, \ldots, Y_k$ are independent random vectors, then $f_1(Y_1), f_2(Y_2), \ldots, f_k(Y_k)$ are distributed independently. Here $f_i(Y_i)$ indicates that $f_i(\cdot)$ is a function only of $Y_i$ and not a function of any other $Y_j$, $j \neq i$. These could be either real valued or vector valued functions.

Result 4.7: Let $A$ be an $n \times n$ symmetric matrix with $rank(A) = k$, and let $Y = (Y_1, \ldots, Y_n)^T \sim N(\mu, \Sigma)$, where $\Sigma$ is an $n \times n$ symmetric positive definite matrix.
If $A\Sigma$ is idempotent then

\[ Y^T A Y \sim \chi^2_k(\mu^T A \mu) \]

In addition, if $A\mu = 0$ then $Y^T A Y \sim \chi^2_k$.

Proof: We will show that the definition of a noncentral chi-square random variable (Defn 4.4) is satisfied by showing that $Y^T A Y = Z^T Z$ for a normal random vector $Z = (Z_1, \ldots, Z_k)^T$ with $Var(Z) = I_{k \times k}$.

Step 1: Since $A\Sigma$ is idempotent we have $A\Sigma = A\Sigma A\Sigma$.

Step 2: Since $\Sigma$ is positive definite, $\Sigma^{-1}$ exists and we have

\[ A\Sigma\Sigma^{-1} = A\Sigma A\Sigma\Sigma^{-1} \;\Rightarrow\; A = A\Sigma A \;\Rightarrow\; A = A^T \Sigma A \]

Step 3: For any vector $x = (x_1, \ldots, x_n)^T$ we have

\[ x^T A x = x^T A^T \Sigma A x \geq 0 \]

because $\Sigma$ is positive definite. Hence, $A$ is non-negative definite and symmetric.

Step 4: From the spectral decomposition of $A$ (Result 1.12) we have

\[ A = \sum_{j=1}^k \lambda_j v_j v_j^T = V D V^T \]

where $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_k > 0$ are the positive eigenvalues of $A$, $D = diag(\lambda_1, \lambda_2, \ldots, \lambda_k)$, and the columns of $V$ are $v_1, v_2, \ldots, v_k$, the eigenvectors corresponding to the positive eigenvalues of $A$. The other $n - k$ eigenvalues of $A$ are zero because $rank(A) = k$.

Step 5: Define

\[ B = V \, diag\left( \frac{1}{\sqrt{\lambda_1}}, \ldots, \frac{1}{\sqrt{\lambda_k}} \right) = V D^{-1/2} \]

Since $V^T V = I$, we have

\[ B^T A B = D^{-1/2} V^T V D V^T V D^{-1/2} = D^{-1/2} D D^{-1/2} = I_{k \times k} \]

Then, since $A = A^T \Sigma A$ we have

\[ I = B^T A B = B^T A^T \Sigma A B \]

Step 6: Define $Z = B^T A Y$, then

\[ Var(Z) = B^T A^T \Sigma A B = I_{k \times k} \]

and $Z \sim N(B^T A \mu, I)$.

Step 7:

\[ Z^T Z = (B^T A Y)^T (B^T A Y) = Y^T A^T B B^T A Y = Y^T A Y \]

because

\[ \begin{aligned} A^T B B^T A &= A B B^T A \\ &= V D V^T V D^{-1/2} D^{-1/2} V^T V D V^T \\ &= V D D^{-1} D V^T \\ &= V D V^T \\ &= A \end{aligned} \]

Finally, since $Z \sim N(B^T A \mu, I)$ we have $Z^T Z \sim \chi^2_k(\delta^2)$ from Defn 4.4, where

\[ \delta^2 = (B^T A \mu)^T (B^T A \mu) = \mu^T A^T B B^T A \mu = \mu^T A \mu \]

Example 4.3 For the Gauss-Markov model with $E(Y) = X\beta$ and $Var(Y) = \sigma^2 I$, include the assumption that

\[ Y = (Y_1, \ldots, Y_n)^T \sim N(X\beta, \sigma^2 I) \]

For any solution $b = (X^T X)^- X^T Y$ to the normal equations, the OLS estimator for $X\beta$ is

\[ \hat{Y} = Xb = X (X^T X)^- X^T Y = P_X Y \]

and the residual vector is

\[ e = Y - \hat{Y} = (I - P_X) Y \]

The sum of squared residuals is

\[ SSE = \sum_{i=1}^n e_i^2 = e^T e = Y^T (I - P_X) Y \]

Use Result 4.7 to obtain the distribution of

\[ \frac{SSE}{\sigma^2} = Y^T \left[ \frac{1}{\sigma^2} (I - P_X) \right] Y \]

Here
$\mu = E(Y) = X\beta$
$\Sigma = Var(Y) = \sigma^2 I$ is p.d.
$A = \frac{1}{\sigma^2}(I - P_X)$ is symmetric

Note that

\[ A\Sigma = \frac{1}{\sigma^2}(I - P_X)\, \sigma^2 I = I - P_X \]

is idempotent, and

\[ A\mu = \frac{1}{\sigma^2}(I - P_X) X\beta = 0 \]

Then

\[ \frac{SSE}{\sigma^2} \sim \chi^2_{n-k} \]

where $rank(I - P_X) = n - rank(X) = n - k$. We could also express this as $SSE \sim \sigma^2 \chi^2_{n-k}$.

Now consider the "uncorrected" model sum of squares

\[ \sum_{i=1}^n \hat{Y}_i^2 = \hat{Y}^T \hat{Y} = (P_X Y)^T P_X Y = Y^T P_X^T P_X Y = Y^T P_X Y \]

Use Result 4.7, with $A = \frac{1}{\sigma^2} P_X$ and $\Sigma = \sigma^2 I$, to show

\[ \frac{1}{\sigma^2} \sum_{i=1}^n \hat{Y}_i^2 = Y^T \left( \frac{1}{\sigma^2} P_X \right) Y \sim \chi^2_k(\delta^2) \]

where $k = rank(X)$ and, because $P_X X = X$,

\[ \delta^2 = (X\beta)^T \left( \frac{1}{\sigma^2} P_X \right) (X\beta) = \frac{1}{\sigma^2} \beta^T X^T (P_X X) \beta = \frac{1}{\sigma^2} \beta^T X^T X \beta \]

The next result addresses the independence of several quadratic forms.

Result 4.8 Let $Y = (Y_1, \ldots, Y_n)^T \sim N(\mu, \Sigma)$ and let $A_1, A_2, \ldots, A_p$ be $n \times n$ symmetric matrices. If

\[ A_i \Sigma A_j = 0 \quad \text{for all } i \neq j \]

then $Y^T A_1 Y, \; Y^T A_2 Y, \; \ldots, \; Y^T A_p Y$ are independent random variables.

Proof: From Result 4.1,

\[ \begin{bmatrix} A_1 Y \\ \vdots \\ A_p Y \end{bmatrix} = \begin{bmatrix} A_1 \\ \vdots \\ A_p \end{bmatrix} Y \]

has a multivariate normal distribution, and for $i \neq j$

\[ Cov(A_i Y, A_j Y) = A_i \Sigma A_j^T = 0 \]

It follows from Result 4.4 that $A_1 Y, A_2 Y, \ldots, A_p Y$ are independent random vectors. Since

\[ Y^T A_i Y = Y^T A_i A_i^- A_i Y = Y^T A_i^T A_i^- A_i Y = (A_i Y)^T A_i^- (A_i Y) \]

is a function of $A_i Y$ only, it follows that $Y^T A_1 Y, \ldots, Y^T A_p Y$ are independent random variables.

Example 4.4. Continuing Example 4.3, show that the "uncorrected" model sum of squares

\[ \sum_{i=1}^n \hat{Y}_i^2 = Y^T P_X Y \]

and the sum of squared residuals

\[ \sum_{i=1}^n (Y_i - \hat{Y}_i)^2 = Y^T (I - P_X) Y \]

are independently distributed for the "normal theory" Gauss-Markov model where $Y \sim N(X\beta, \sigma^2 I)$.

Use Result 4.8 with $A_1 = P_X$ and $A_2 = I - P_X$. Note that

\[ A_2 \Sigma A_1 = (I - P_X)(\sigma^2 I) P_X = \sigma^2 (I - P_X) P_X = \sigma^2 (P_X - P_X P_X) = \sigma^2 (P_X - P_X) = 0 \]

and, taking transposes, $A_1 \Sigma A_2 = 0$ as well. Consequently,

\[ \sum_{i=1}^n \hat{Y}_i^2 \quad \text{and} \quad \sum_{i=1}^n (Y_i - \hat{Y}_i)^2 \]

are independently distributed.

In Example 4.3 we showed that

\[ \frac{1}{\sigma^2} \sum_{i=1}^n \hat{Y}_i^2 \sim \chi^2_k\left( \frac{\beta^T X^T X \beta}{\sigma^2} \right) \quad \text{and} \quad \frac{1}{\sigma^2} \sum_{i=1}^n (Y_i - \hat{Y}_i)^2 \sim \chi^2_{n-k} \]

where $k = rank(X)$.

By Defn 4.6,

\[ F = \frac{\frac{1}{k \sigma^2} \sum_{i=1}^n \hat{Y}_i^2}{\frac{1}{(n-k) \sigma^2} \sum_{i=1}^n (Y_i - \hat{Y}_i)^2} = \frac{\frac{1}{k} \sum_{i=1}^n \hat{Y}_i^2}{\frac{1}{n-k} \sum_{i=1}^n (Y_i - \hat{Y}_i)^2} \sim F_{k, n-k}\left( \frac{1}{\sigma^2} \beta^T X^T X \beta \right) \]

The numerator is the uncorrected model mean square and the denominator is the residual mean square (MSE). This reduces to a central F distribution with $(k, n-k)$ d.f. when $X\beta = 0$.

Use

\[ F = \frac{\frac{1}{k} \sum_{i=1}^n \hat{Y}_i^2}{\frac{1}{n-k} \sum_{i=1}^n (Y_i - \hat{Y}_i)^2} \]

to test the null hypothesis $H_0: E(Y) = X\beta = 0$ against the alternative $H_A: E(Y) = X\beta \neq 0$.

Comments
(i) The null hypothesis corresponds to the condition under which F has a central F distribution (the noncentrality parameter is zero). In this example

\[ \delta^2 = \frac{1}{\sigma^2} (X\beta)^T (X\beta) = 0 \quad \text{if and only if } X\beta = 0 \]

(ii) If $k = rank(X)$ equals the number of columns in $X$, then $H_0: X\beta = 0$ is equivalent to $H_0: \beta = 0$.
(iii) If $k = rank(X)$ is less than the number of columns in $X$, then $X\beta = 0$ for some $\beta \neq 0$ and $H_0: X\beta = 0$ is not equivalent to $H_0: \beta = 0$.

Example 4.4 is a simple illustration of a typical ANOVA partition

\[ \begin{aligned} \sum_{i=1}^n Y_i^2 = Y^T Y &= Y^T [(I - P_X) + P_X] Y \\ &= Y^T (I - P_X) Y + Y^T P_X Y \\ &= \sum_{i=1}^n (Y_i - \hat{Y}_i)^2 + \sum_{i=1}^n \hat{Y}_i^2 \end{aligned} \]

Here $A_2 = I - P_X$ and $A_1 = P_X$; the first sum of squares has d.f. $= rank(A_2)$ and the second has d.f. $= rank(A_1)$.

More generally an uncorrected total sum of squares can be partitioned as

\[ \sum_{i=1}^n Y_i^2 = Y^T Y = Y^T A_1 Y + Y^T A_2 Y + \cdots + Y^T A_k Y \]

using orthogonal projection matrices

\[ A_1 + A_2 + \cdots + A_k = I_{n \times n} \]

where $rank(A_1) + rank(A_2) + \cdots + rank(A_k) = n$ and $A_i A_j = 0$ for any $i \neq j$. Since we are dealing with orthogonal projection matrices we also have
$A_i^T = A_i$ (symmetry)
$A_i A_i = A_i$ (idempotent matrices)

Result 4.9 (Cochran's Theorem) Let $Y = (Y_1, \ldots, Y_n)^T \sim N(\mu, \sigma^2 I)$ and let $A_1, A_2, \ldots, A_k$ be $n \times n$ symmetric matrices with

\[ I = A_1 + A_2 + \cdots + A_k \quad \text{and} \quad n = r_1 + r_2 + \cdots + r_k \]

where $r_i = rank(A_i)$. Then, for $i = 1, 2, \ldots, k$,

\[ \frac{1}{\sigma^2} Y^T A_i Y \sim \chi^2_{r_i}\left( \frac{1}{\sigma^2} \mu^T A_i \mu \right) \]

and $Y^T A_1 Y, \; Y^T A_2 Y, \; \ldots, \; Y^T A_k Y$ are distributed independently.

Proof: This result follows directly from Result 4.7, Result 4.8 and the following Result 4.10.
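The partition in Result 4.9 can be illustrated numerically. The following R sketch is an illustration under assumed inputs, not code from the original notes: it uses a small made-up one-way layout (6 observations in 3 groups) and splits $I$ into the projection onto the overall mean, the projection for groups adjusted for the mean, and the residual projection. The helpers proj() and rank.of(), the tolerances, and the design are all hypothetical choices. The sketch checks that the matrices sum to $I$, that the ranks sum to $n$, and that the matrices are idempotent with $A_i A_j = 0$.

```r
# Hypothetical helper: orthogonal projection onto the column space of X,
# built from the left singular vectors with non-negligible singular values.
proj <- function(X) {
  s <- svd(X)
  U <- s$u[, s$d > 1e-10 * max(s$d), drop = FALSE]   # orthonormal basis for C(X)
  U %*% t(U)
}

# Hypothetical helper: numerical rank from the singular values.
rank.of <- function(M) sum(svd(M)$d > 1e-8)

n <- 6
group <- rep(1:3, each = 2)                   # made-up one-way layout
one <- matrix(1, n, 1)                        # column of ones
X <- cbind(one, outer(group, 1:3, "==") * 1)  # intercept + 3 indicators, rank 3

A1 <- proj(one)              # overall mean
A2 <- proj(X) - proj(one)    # groups adjusted for the mean
A3 <- diag(n) - proj(X)      # residuals
A <- list(A1, A2, A3)

rk <- sapply(A, rank.of)     # ranks r_1, r_2, r_3

sum.ok  <- max(abs(A1 + A2 + A3 - diag(n))) < 1e-10           # A1 + A2 + A3 = I
rank.ok <- sum(rk) == n                                       # r1 + r2 + r3 = n
idem.ok <- all(sapply(A, function(M) max(abs(M %*% M - M)) < 1e-10))
orth.ok <- max(abs(A1 %*% A2), abs(A1 %*% A3), abs(A2 %*% A3)) < 1e-10

print(rk)
print(c(sum.ok, rank.ok, idem.ok, orth.ok))
```

For this design the ranks are 1, 2, and 3, which are the degrees of freedom that the mean, the groups, and the residuals would receive in the corresponding ANOVA table.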
Result 4.10 Let $A_1, A_2, \ldots, A_k$ be $n \times n$ symmetric matrices such that

\[ A_1 + A_2 + \cdots + A_k = I \]

Then the following statements are equivalent:
(i) $A_i A_j = 0$ for any $i \neq j$
(ii) $A_i A_i = A_i$ for all $i = 1, \ldots, k$
(iii) $rank(A_1) + \cdots + rank(A_k) = n$

Proof: First show that (i) $\Rightarrow$ (ii). Since $A_i = I - \sum_{j \neq i} A_j$, we have

\[ A_i A_i = A_i \left( I - \sum_{j \neq i} A_j \right) = A_i - \sum_{j \neq i} A_i A_j = A_i \]

Now show that (ii) $\Rightarrow$ (iii). Since an idempotent matrix has eigenvalues that are either 0 or 1 and the number of nonzero eigenvalues is the rank of the matrix, (ii) implies that $tr(A_i) = rank(A_i)$. Then,

\[ \begin{aligned} n = tr(I) &= tr(A_1 + A_2 + \cdots + A_k) \\ &= tr(A_1) + tr(A_2) + \cdots + tr(A_k) \\ &= rank(A_1) + rank(A_2) + \cdots + rank(A_k) \end{aligned} \]

Finally, show that (iii) $\Rightarrow$ (i). Let $r_i = rank(A_i)$. Since $A_i$ is symmetric, we can apply the spectral decomposition (Result 1.12) to write $A_i$ as

\[ A_i = U_i \Delta_i U_i^T \]

where $\Delta_i$ is an $r_i \times r_i$ diagonal matrix containing the non-zero eigenvalues of $A_i$ and

\[ U_i = [\, u_{1i} \mid u_{2i} \mid \cdots \mid u_{r_i, i} \,] \]

is an $n \times r_i$ matrix whose columns are the eigenvectors corresponding to the non-zero eigenvalues of $A_i$. Then

\[ \begin{aligned} I &= A_1 + A_2 + \cdots + A_k = U_1 \Delta_1 U_1^T + \cdots + U_k \Delta_k U_k^T \\ &= [\, U_1 \mid \cdots \mid U_k \,] \begin{bmatrix} \Delta_1 & & \\ & \ddots & \\ & & \Delta_k \end{bmatrix} \begin{bmatrix} U_1^T \\ \vdots \\ U_k^T \end{bmatrix} = U \begin{bmatrix} \Delta_1 & & \\ & \ddots & \\ & & \Delta_k \end{bmatrix} U^T \end{aligned} \]

Since $rank(A_1) + \cdots + rank(A_k) = n$ and $rank(A_i)$ is the number of columns in $U_i$, $U = [\, U_1 \mid \cdots \mid U_k \,]$ is an $n \times n$ matrix. Furthermore, $rank(U) = n$ because the identity matrix on the left side of the equal sign has rank $n$. Then $U^T U$ is an $n \times n$ matrix of full rank, $(U^T U)^{-1}$ exists, and, writing $\Delta = diag(\Delta_1, \ldots, \Delta_k)$,

\[ \begin{aligned} I = U \Delta U^T \;&\Rightarrow\; U^T U = U^T U \Delta U^T U \\ &\Rightarrow\; (U^T U)^{-1} U^T U = (U^T U)^{-1} U^T U \Delta U^T U \\ &\Rightarrow\; I = \Delta U^T U \end{aligned} \]

It follows that

\[ \begin{bmatrix} \Delta_1^{-1} & & \\ & \ddots & \\ & & \Delta_k^{-1} \end{bmatrix} = U^T U = \begin{bmatrix} U_1^T \\ \vdots \\ U_k^T \end{bmatrix} [\, U_1 \; \cdots \; U_k \,] \]

Consequently, $U_i^T U_j = 0$ for any $i \neq j$ and

\[ A_i A_j = U_i \Delta_i U_i^T U_j \Delta_j U_j^T = 0 \]

for any $i \neq j$, because $U_i^T U_j$ is a matrix of zeros.

References:

Johnson, R. A. and Wichern, D. W. (1998) Applied Multivariate Statistical Analysis, 4th edition, Prentice Hall (Chapter 4).

Anderson, T. W.
(1984) An Introduction to Multivariate Statistical Analysis, 2nd edition, Wiley, New York.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1994, 1995) Distributions in Statistics: Continuous Univariate Distributions, Vols 1 and 2, 2nd edition, Wiley, New York.

Searle, S. R. (1987) Linear Models for Unbalanced Data, Wiley, New York (Chapters 7 and 8).

Rencher, A. C. (2000) Linear Models in Statistics, Wiley, New York (Chapters 4 and 5).