Normal Theory Inference

Defn 4.1: Univariate normal distribution. A random variable $Y$ with density function

$$f(y) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left(\frac{y-\mu}{\sigma}\right)^2\right\}$$

is said to have a normal (Gaussian) distribution with $E(Y) = \mu$ and $Var(Y) = \sigma^2$. We will use the notation

$$Y \sim N(\mu, \sigma^2)$$

[Figure: density of a normal distribution with mean $\mu = 0$ and variance $\sigma^2 = 1$.]

Suppose $Z$ has a normal distribution with $E(Z) = 0$ and $Var(Z) = 1$, i.e., $Z \sim N(0, 1)$; then $Z$ is said to have a standard normal distribution.
Defn 4.2: Suppose $Z = (Z_1, \ldots, Z_m)^T$ is a random vector whose elements are independently distributed standard normal random variables. For any $m \times n$ matrix $A$, we say that

$$Y = \mu + A^T Z$$

has a multivariate normal distribution with mean vector $E(Y) = \mu$ and variance-covariance matrix

$$Var(Y) = A^T A = \Sigma.$$

When $Y \sim N(\mu, \Sigma)$ and $\Sigma_{n \times n}$ is positive definite, the joint density function is

$$f(y) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left\{-\frac{1}{2}(y - \mu)^T \Sigma^{-1} (y - \mu)\right\}$$

We will use the notation

$$Y \sim N(\mu, \Sigma)$$
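Defn 4.2 also gives a direct recipe for simulating from $N(\mu, \Sigma)$. A minimal R sketch (the notes use S-Plus; this is ordinary R with illustrative values): R's chol(Sigma) returns the upper-triangular factor $A$ satisfying $A^T A = \Sigma$, which is exactly the matrix the definition calls for.

# Simulate Y = mu + t(A) %*% Z with t(A) %*% A = Sigma (Defn 4.2)
mu <- c(0, 1)
Sigma <- matrix(c(2, 1,
                  1, 2), nrow = 2, byrow = TRUE)
A <- chol(Sigma)                 # upper triangular, t(A) %*% A = Sigma
Z <- rnorm(2)                    # independent standard normals
Y <- mu + drop(t(A) %*% Z)       # one draw from N(mu, Sigma)

# Empirical check: the sample covariance of many draws is near Sigma
Zmat <- matrix(rnorm(2 * 10000), nrow = 2)
Ymat <- mu + t(A) %*% Zmat       # mu is recycled down each column
var(t(Ymat))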
The multivariate normal distribution has many useful properties.

Result 4.1: Normality is preserved under linear transformations. If $Y \sim N(\mu, \Sigma)$, then

$$W = c + BY \sim N(c + B\mu,\; B \Sigma B^T)$$

for any non-random $c$ and $B$.

Proof: By Defn 4.2, $Y = \mu + A^T Z$ where $A^T A = \Sigma$. Then

$$W = c + BY = (c + B\mu) + B A^T Z$$

satisfies Defn 4.2, with $(BA^T)(AB^T) = B \Sigma B^T$.

Result 4.2: Suppose

$$Y = \begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} \sim N\left(\begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}\right)$$

then $Y_1 \sim N(\mu_1, \Sigma_{11})$.

Proof: Note that $Y_1 = [\,I \;\; 0\,]\, Y$ and apply Result 4.1.

Note: This result applies to any subset of the elements of $Y$ because you can move that subset to the top of the vector by multiplying $Y$ by an appropriate matrix of zeros and ones.
Example 4.1: Suppose

$$Y = \begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \end{bmatrix} \sim N\left(\begin{bmatrix} 1 \\ -3 \\ 2 \end{bmatrix}, \begin{bmatrix} 4 & 1 & -1 \\ 1 & 3 & -3 \\ -1 & -3 & 9 \end{bmatrix}\right)$$

Then

$$Y_1 = [1\;\; 0\;\; 0]\, Y \sim N(1, 4)$$
$$Y_2 = [0\;\; 1\;\; 0]\, Y \sim N(-3, 3)$$
$$Y_3 = [0\;\; 0\;\; 1]\, Y \sim N(2, 9)$$

and

$$\begin{bmatrix} Y_1 \\ Y_3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} Y \sim N\left(\begin{bmatrix} 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 4 & -1 \\ -1 & 9 \end{bmatrix}\right)$$

Call the $2 \times 3$ matrix $B$; the mean vector is $B\mu$ and the covariance matrix is $B \Sigma B^T$.
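These calculations are easy to verify numerically. A small R sketch (values taken from Example 4.1) computing $B\mu$ and $B\Sigma B^T$ for the pair $(Y_1, Y_3)$:

mu <- c(1, -3, 2)
Sigma <- matrix(c( 4,  1, -1,
                   1,  3, -3,
                  -1, -3,  9), nrow = 3, byrow = TRUE)
B <- rbind(c(1, 0, 0),        # picks off Y1
           c(0, 0, 1))        # picks off Y3
B %*% mu                      # mean vector: (1, 2)
B %*% Sigma %*% t(B)          # covariance matrix: rows (4, -1), (-1, 9)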
Comment: If $Y_1 \sim N(\mu_1, \Sigma_1)$ and $Y_2 \sim N(\mu_2, \Sigma_2)$, it is not always true that

$$Y = \begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix}$$

has a multivariate normal distribution.

Result 4.3: If $Y_1$ and $Y_2$ are independent random vectors such that $Y_1 \sim N(\mu_1, \Sigma_1)$ and $Y_2 \sim N(\mu_2, \Sigma_2)$, then

$$Y = \begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} \sim N\left(\begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix}\right)$$
Proof: Since $Y_1 \sim N(\mu_1, \Sigma_1)$, we have from Defn 4.2 that $Y_1 = \mu_1 + A_1^T Z_1$, where $A_1^T A_1 = \Sigma_1$ and the elements of $Z_1$ are independent standard normal random variables. A similar result, $Y_2 = \mu_2 + A_2^T Z_2$, is true for $Y_2$. Since $Y_1$ and $Y_2$ are independent, it follows that $Z_1$ and $Z_2$ are independent. Then

$$Y = \begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} = \begin{bmatrix} \mu_1 + A_1^T Z_1 \\ \mu_2 + A_2^T Z_2 \end{bmatrix} = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} + \begin{bmatrix} A_1^T & 0 \\ 0 & A_2^T \end{bmatrix} \begin{bmatrix} Z_1 \\ Z_2 \end{bmatrix}$$

satisfies Defn 4.2.

Alternatively, you could prove Result 4.3 by showing that the product of the characteristic functions for $Y_1$ and $Y_2$ is the characteristic function of a multivariate normal distribution. If $\Sigma_1$ and $\Sigma_2$ are both non-singular, you could also prove Result 4.3 by showing that the product of the density functions for $Y_1$ and $Y_2$ is the density function for the specified multivariate normal distribution.
Result 4.4: If $Y = (Y_1, \ldots, Y_k)^T$ is a random vector with a multivariate normal distribution, then $Y_1, Y_2, \ldots, Y_k$ are independent if and only if $Cov(Y_i, Y_j) = 0$ for all $i \neq j$.

Comments:

(i) In general, $Y_i$ independent of $Y_j$ implies that $Cov(Y_i, Y_j) = 0$.

(ii) When $Y = (Y_1, \ldots, Y_n)^T$ has a multivariate normal distribution, $Y_i$ uncorrelated with $Y_j$ implies that $Y_i$ is independent of $Y_j$. This is usually not true for other distributions.
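To see why comment (ii) can fail without joint normality, consider the classic sign-flip construction (an illustration added here, not from the notes): both coordinates below are standard normal and uncorrelated, yet they are clearly dependent, so the pair cannot be jointly normal.

# X ~ N(0,1) and Y = S*X, where S = +1 or -1 with probability 1/2,
# independently of X. Then Y ~ N(0,1) and Cov(X,Y) = E(S)E(X^2) = 0,
# but |X| = |Y|, so X and Y are not independent.
set.seed(1)
X <- rnorm(100000)
S <- sample(c(-1, 1), 100000, replace = TRUE)
Y <- S * X
cov(X, Y)              # near 0: uncorrelated
cor(abs(X), abs(Y))    # exactly 1: dependent through |X| = |Y|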
Result 4.5 (Conditional distributions): If

$$\begin{pmatrix} Y \\ X \end{pmatrix} \sim N\left(\begin{bmatrix} \mu_Y \\ \mu_X \end{bmatrix}, \begin{bmatrix} \Sigma_{YY} & \Sigma_{YX} \\ \Sigma_{XY} & \Sigma_{XX} \end{bmatrix}\right)$$

with a positive definite covariance matrix, then the conditional distribution of $Y$ given the value of $X$ is a normal distribution with mean vector

$$E(Y|X) = \mu_Y + \Sigma_{YX} \Sigma_{XX}^{-1} (X - \mu_X)$$

and positive definite covariance matrix

$$Var(Y|X) = \Sigma_{YY} - \Sigma_{YX} \Sigma_{XX}^{-1} \Sigma_{XY}$$

Note that the conditional covariance matrix does not depend on the value of $X$.
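A short R sketch of these formulas in the scalar case (the joint distribution below is chosen purely for illustration); for vector $Y$ and $X$, replace the reciprocal with solve(Sigma.XX) and the products with matrix multiplication.

# Joint distribution: (Y, X) ~ N( (1, 2), [[4, 2], [2, 3]] )
mu.Y <- 1; mu.X <- 2
S.YY <- 4; S.YX <- 2; S.XX <- 3
x <- 3                                              # observed value of X
cond.mean <- mu.Y + S.YX * (1 / S.XX) * (x - mu.X)  # 1 + 2/3
cond.var  <- S.YY - S.YX * (1 / S.XX) * S.YX        # 4 - 4/3
c(cond.mean, cond.var)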
Quadratic forms $Y^T A Y$ appear throughout linear models:

- Sums of squares in ANOVA
- Chi-square tests
- F-tests
- Estimation of variances

Some useful information about the distribution of quadratic forms is summarized in the following results.

Result 4.6: If $Y = (Y_1, \ldots, Y_n)^T$ is a random vector with $E(Y) = \mu$ and $Var(Y) = \Sigma$, and $A$ is an $n \times n$ non-random matrix, then

$$E(Y^T A Y) = \mu^T A \mu + tr(A \Sigma)$$

Proof: Note that $Y^T A Y$ is a (scalar) random variable, and

$$E(Y^T A Y) = E(tr(Y^T A Y)) = E(tr(A Y Y^T)) = tr(E(A Y Y^T)) = tr(A\, E(Y Y^T))$$
$$= tr(A[Var(Y) + \mu \mu^T]) = tr(A\Sigma + A \mu \mu^T) = tr(A\Sigma) + tr(A \mu \mu^T)$$
$$= tr(A\Sigma) + tr(\mu^T A \mu) = tr(A\Sigma) + \mu^T A \mu$$
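Result 4.6 is easy to check by simulation. A minimal R sketch (all values below are illustrative assumptions):

# Compare E(Y'AY) = mu'A mu + tr(A Sigma) with a Monte Carlo average
set.seed(2)
n <- 4
mu <- c(1, 2, 0, -1)
Sigma <- diag(c(1, 2, 3, 4))
A <- crossprod(matrix(rnorm(n * n), n, n))          # an arbitrary symmetric A
drop(t(mu) %*% A %*% mu) + sum(diag(A %*% Sigma))   # theoretical value
R <- chol(Sigma)
qf <- replicate(20000, {
  Y <- mu + drop(t(R) %*% rnorm(n))                 # Y ~ N(mu, Sigma)
  drop(t(Y) %*% A %*% Y)
})
mean(qf)                                            # close to the theoretical value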
Example 4.2: Consider a Gauss-Markov model with $E(Y) = X\beta$ and $Var(Y) = \sigma^2 I$. Let

$$b = (X^T X)^- X^T Y$$

be any solution to the normal equations. Since $E(Y) = X\beta$ is estimable, the OLS estimator is

$$\hat{Y} = X b = X (X^T X)^- X^T Y = P_X Y$$

and the residual vector is

$$e = Y - \hat{Y} = (I - P_X) Y.$$

The sum of squared residuals, also called the error sum of squares, is

$$SSE = \sum_{i=1}^n (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^n e_i^2 = e^T e = [(I - P_X) Y]^T (I - P_X) Y$$
$$= Y^T (I - P_X)^T (I - P_X) Y = Y^T (I - P_X)(I - P_X) Y = Y^T (I - P_X) Y$$
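A small R sketch of these identities on simulated data (design, coefficients, and errors assumed for illustration): the residual sum of squares from lm() matches the quadratic form $Y^T (I - P_X) Y$.

set.seed(3)
n <- 20
X <- cbind(1, rnorm(n))                    # full-rank design with intercept
beta <- c(2, 1)
Y <- drop(X %*% beta) + rnorm(n)
PX <- X %*% solve(t(X) %*% X) %*% t(X)     # projection onto the column space of X
SSE.qf <- drop(t(Y) %*% (diag(n) - PX) %*% Y)
SSE.lm <- sum(resid(lm(Y ~ X - 1))^2)      # X already contains the intercept
c(SSE.qf, SSE.lm)                          # identical up to rounding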
From Result 4.6,

$$E(SSE) = E(Y^T (I - P_X) Y) = \beta^T X^T (I - P_X) X \beta + tr((I - P_X)\sigma^2 I)$$
$$= 0 + \sigma^2 tr(I - P_X) = \sigma^2 [tr(I) - tr(P_X)] = \sigma^2 [n - rank(P_X)] = \sigma^2 [n - rank(X)]$$

Consequently,

$$\hat{\sigma}^2 = \frac{SSE}{n - rank(X)}$$

is an unbiased estimator for $\sigma^2$ (provided that $rank(X) < n$).

Chi-square Distributions

Defn 4.3: Let $Z = (Z_1, \ldots, Z_n)^T \sim N(0, I)$, i.e., the elements of $Z$ are $n$ independent standard normal random variables. The distribution of the random variable

$$W = Z^T Z = \sum_{i=1}^n Z_i^2$$

is called the central chi-square distribution with $n$ degrees of freedom. We will use the notation

$$W \sim \chi^2_n$$
Moments: If $W \sim \chi^2_n$, then

$$E(W) = n \quad \text{and} \quad Var(W) = 2n$$

[Figure: central chi-square densities for 1, 3, 5, and 7 df.]
# This code is stored in: chiden.ssc
#================================================#
# chisq.density.plot()                           #
# -----------------------                        #
# Input : degrees of freedom; it can be a vector.#
#   (e.g.) chisq.density.plot(c(1,3,5,7))        #
#          creates curves for df = 1,3,5, and 7  #
# Output: density plot of the chi-square         #
#         distribution.                          #
# Note : Unix users must first use motif()       #
#        to open a graphics window before        #
#        using this function.                    #
##################################################

chisq.density.plot <- function(df)
{
  x <- seq(.001,10,,50)
  # Create the x,y axes and title
  plot(c(0,10), c(0,.5), type="n",
       xlab="x", ylab="f(x;df)",
       main="Central Chi-Square Densities")
  for(i in 1:length(df)) {
    lty.num <- 3*i-2                      # specify the line types.
    f.x <- dchisq(x,df[i])                # calculate density.
    lines(x, f.x, type="l",lty=lty.num)   # draw the curve.
  }
  # The following lines are for legends.
  legend( x = rep(5,length(df)) ,
          y = rep(.45,length(df)) ,
          legend = paste(as.character(df),"df") ,
          lty = seq(1,by=3,length=length(df)) ,
          bty = "n")
}

# This function can be executed as follows.
# Windows users should delete the motif( )
# command.

motif()
source("chiden.ssc")
par(fin=c(7,8),cex=1.2,mex=1.5,lwd=4)
chisq.density.plot(c(1,3,5,7))
Defn 4.4: Let $Y = (Y_1, \ldots, Y_n)^T \sim N(\mu, I)$, i.e., the elements of $Y$ are independent normal random variables with $Y_i \sim N(\mu_i, 1)$. The distribution of the random variable

$$W = Y^T Y = \sum_{i=1}^n Y_i^2$$

is called a noncentral chi-square distribution with $n$ degrees of freedom and noncentrality parameter

$$\delta^2 = \mu^T \mu = \sum_{i=1}^n \mu_i^2$$

We will use the notation $W \sim \chi^2_n(\delta^2)$.

Moments: If $W \sim \chi^2_n(\delta^2)$, then

$$E(W) = n + \delta^2 \quad \text{and} \quad Var(W) = 2n + 4\delta^2$$
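These moment formulas can be checked by simulation. A minimal R sketch (mean vector assumed for illustration); note that this $\delta^2 = \mu^T \mu$ convention matches the ncp argument of R's rchisq, whose mean is df + ncp.

set.seed(4)
n <- 5
mu <- c(1, 0, 2, 0, 0)                 # delta^2 = mu'mu = 5
delta2 <- sum(mu^2)
# Each column of the matrix below is one draw of (Y_1, ..., Y_n)
W <- colSums(matrix(rnorm(n * 50000, mean = mu), nrow = n)^2)
c(mean(W), n + delta2)                 # E(W) = n + delta^2
c(var(W), 2 * n + 4 * delta2)          # Var(W) = 2n + 4*delta^2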
Defn 4.5: If $W_1 \sim \chi^2_{n_1}$ and $W_2 \sim \chi^2_{n_2}$, and $W_1$ and $W_2$ are independent, then the distribution of

$$F = \frac{W_1 / n_1}{W_2 / n_2}$$

is called the central F distribution with $n_1$ and $n_2$ degrees of freedom. We will use the notation

$$F \sim F_{n_1, n_2}$$

Central moments:

$$E(F) = \frac{n_2}{n_2 - 2} \quad \text{for } n_2 > 2$$

$$Var(F) = \frac{2 n_2^2 (n_1 + n_2 - 2)}{n_1 (n_2 - 2)^2 (n_2 - 4)} \quad \text{for } n_2 > 4$$
# This code is stored in the file: fden.ssc
#=================================================#
# f.density.plot()                                #
# -------------------                             #
# Input : degrees of freedom; it can be a vector. #
#   (e.g.) f.density.plot(c(1,10,10,50))          #
#          creates a plot with density curves     #
#          for (1,10) df and (10,50) df           #
# Output: density plots of the central F          #
#         distribution.                           #
# Note : Unix users must use the motif()          #
#        function to open a graphics window       #
#        before using this function.              #
###################################################

f.density.plot <- function(n)
{
  # draw x,y axes and title
  x <- seq(.001,4,,50)
  plot(c(0,4), c(0,1.4), type="n",
       xlab="x", ylab="f(x;n)",
       main="Densities for Central F Distributions")
  # the length of df should be even.
  legend.txt <- NULL
  d.f <- matrix(n,ncol=2,byrow=T); r <- dim(d.f)[1]
  for(i in 1:r) {
    lty.num <- 3*i-2                     # specify the line types.
    f.x <- df(x,d.f[i,1],d.f[i,2])       # calculate density.
    lines(x, f.x, type="l",lty=lty.num)  # draw a curve.
    legend.txt <- c(legend.txt,
      paste("(",d.f[i,1],",",d.f[i,2],")",sep=""))
  }
  # The following lines insert legends on the plot
  # (in a motif graphics window on Unix).
  legend( x = rep(1.9,r) , y = rep(1.1,r) ,
          cex=1.0,
          legend = paste(legend.txt,"df") ,
          lty = seq(1,by=3,length=r) ,
          bty = "n" )
}

# This function can be executed with the following
# commands. Windows users should delete the
# motif( ) command.

motif()
par(fin=c(6,7),cex=1.0,mex=1.3,lwd=4)
source("fden.ssc")
f.density.plot(c(1,10,10,50,10,10,10,4))

[Figure: densities for central F distributions with (1,10), (10,50), (10,10), and (10,4) df.]

Defn 4.6: If $W_1 \sim \chi^2_{n_1}(\delta_1^2)$ and $W_2 \sim \chi^2_{n_2}$, and $W_1$ and $W_2$ are independent, then the distribution of

$$F = \frac{W_1 / n_1}{W_2 / n_2}$$

is called a noncentral F distribution with $n_1$ and $n_2$ degrees of freedom and noncentrality parameter $\delta_1^2$. We will use the notation

$$F \sim F_{n_1, n_2}(\delta_1^2)$$
Central moments: If $F \sim F_{n_1, n_2}(\delta_1^2)$, then

$$E(F) = \frac{n_2 (n_1 + \delta_1^2)}{(n_2 - 2)\, n_1} \quad \text{for } n_2 > 2$$

$$Var(F) = \frac{2 n_2^2 \left[(n_1 + \delta_1^2)^2 + (n_2 - 2)(n_1 + 2\delta_1^2)\right]}{n_1^2 (n_2 - 2)^2 (n_2 - 4)} \quad \text{for } n_2 > 4$$

[Figure: central and noncentral F densities with (5, 20) df and noncentrality parameter 1.5.]
# This code is stored in the file: fdennc.ssc
#================================================#
# dnf: density of the noncentral F               #
# ----------------------------                   #
# Input : x     can be a scalar or a vector      #
#         v     df for numerator                 #
#         w     df for denominator               #
#         delta non-centrality parameter         #
#   (e.g.) dnf(x,5,20,1.5) when x is a scalar,   #
#          sapply(x,dnf,5,20,1.5) when x is a    #
#          vector                                #
# Output: evaluates the density curve of the     #
#         noncentral F distribution              #
##################################################

dnf <- function(x,v,w,delta)
{
  sum <- 1
  term <- 1
  p <- ((delta*v*x)/(w+v*x))
  nt <- 100                 # number of terms in the series expansion
  for (j in 1:nt) {
    term <- term*p*(v+w+2*(j-1))/((v+2*(j-1))*j)
    sum <- sum + term
  }
  dnf.x <- exp(-delta)*sum*df(x,v,w)
  return(dnf.x)
}

[Figure: central and noncentral F densities with (5, 20) df and noncentrality parameter 3.]
# dnf.slow is aimed to show vectorized
# calculations and one of the loop-avoidance
# functions (sapply). Vectorized calculations
# operate on entire vectors rather than on
# individual components in sequence.
# (V&R, pp. 103-108)

dnf.slow <- function(x,v,w,delta)
{
  prod.seq <- function(a,b) prod(seq(b,b+2*(a-1),2))
  j <- 1:100
  p <- ((delta*v*x)/(w+v*x))
  numer <- sapply(j,prod.seq,v+w,simplify=T)
  denom <- gamma(j+1)*sapply(j,prod.seq,v,simplify=T)
  k <- 1 + sum( p^j * numer / denom )
  f.x <- k*exp(-delta)*df(x,v,w)
  return(f.x)
}
# The following commands can be applied
# to obtain a single density value:
#
#   dnf(0.5,5,20,1.5)
#   dnf.slow(0.5,5,20,1.5)
#
# The following commands are used to evaluate
# the noncentral F density for a vector of values:
#
#   x <- seq(1,10,.1)
#   f.x1 <- sapply(x,dnf,5,20,1.5)
#   f.x2 <- sapply(x,dnf.slow,5,20,1.5)
#
# You will notice that the performance of
# dnf is better than that of dnf.slow.
# The results should be the same. In this
# case using a loop is better than using
# vectorized calculations, but it is
# usually more efficient to use
# vectorized computations.
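Modern R also provides the noncentral F density directly as df(x, df1, df2, ncp). A hedged cross-check (my reading of the code, not a statement from the notes): because dnf() carries a leading factor exp(-delta) rather than exp(-ncp/2), its delta argument appears to correspond to half of R's ncp, so the comparison below uses ncp = 2*delta.

# Cross-check dnf() against R's built-in noncentral F density,
# under the assumption that dnf's delta = ncp/2.
x <- seq(0.1, 5, by = 0.1)
max(abs(sapply(x, dnf, 5, 20, 1.5) - df(x, 5, 20, ncp = 2 * 1.5)))
# A value near 0 (up to series truncation in dnf) supports the correspondence.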
#=============================================#
# n.f.density.plot()                          #
# --------------------                        #
# Input : v,w    degrees of freedom           #
#         delta  non-centrality parameter     #
#   (e.g.) n.f.density.plot(5,20,1.5)         #
# Output: Two F density curves on one plot    #
#         (central and noncentral).           #
# Note : You must define the dnf( ) function  #
#        before applying this function.       #
#        Unix users must use motif() to open  #
#        a graphics window before using this  #
#        function.                            #
###############################################

n.f.density.plot <- function(v,w,delta)
{
  x <- seq(.001,5,length=60)
  cf.x <- df(x,v,w)                  # central F density
  nf.x <- sapply(x,dnf,v,w,delta)    # noncentral F density
  # Build the main title.
  main1.txt <- "Central and Noncentral F Densities \n with"
  df.txt <- paste(paste("(",paste(paste(v,",",sep=""),
            w,sep=""),sep=""),")",sep="")
  main2.txt <- paste(df.txt, "df and noncentrality parameter =")
  main2.txt <- paste(main2.txt,delta)
  main.txt <- paste(main1.txt,main2.txt)
  # Create the axes, lines, and legends.
  plot(c(0,5), c(0,0.8), type="n",xlab="x",
       ylab="f(x;v,w)")
  mtext(main.txt, side=3,line=2.2)
  lines(x, cf.x, type="l",lty=1)
  lines(x, nf.x, type="l",lty=3)
  legend(x=1.6, y=0.64, legend = "Central F   ", cex=0.9,
         lty =1, bty = "n" )
  legend(x=1.6, y=0.56, legend = "Noncentral F", cex=0.9,
         lty =3, bty = "n" )
}

# This function is executed with the following
# commands. Windows users should delete the
# motif( ) commands.

motif()
par(fin=c(6.5,7),cex=1.2,mex=1.5,lwd=4,
    mar=c(5.1,4.1,4.1,2.1))
source("fdennc.ssc")
n.f.density.plot(5,20,1.5)

motif()
n.f.density.plot(5,20,3.0)
Sums of squares in ANOVA tables are quadratic forms

$$Y^T A Y$$

where $A$ is a non-negative definite symmetric matrix (generally a projection matrix).

Reminder: If $Y_1, Y_2, \ldots, Y_k$ are independent random vectors, then

$$f_1(Y_1), f_2(Y_2), \ldots, f_k(Y_k)$$

are distributed independently. Here $f_i(Y_i)$ indicates that $f_i(\cdot)$ is a function only of $Y_i$ and not a function of any other $Y_j$, $j \neq i$. These could be either real-valued or vector-valued functions.

To develop F-tests we need to identify conditions under which

- $Y^T A Y$ has a central (or noncentral) chi-square distribution, and
- $Y^T A_i Y$ and $Y^T A_j Y$ are independent.
Result 4.7: Let $A$ be an $n \times n$ symmetric matrix with $rank(A) = k$, and let $Y = (Y_1, \ldots, Y_n)^T \sim N(\mu, \Sigma)$, where $\Sigma$ is an $n \times n$ symmetric positive definite matrix. If $A\Sigma$ is idempotent, then

$$Y^T A Y \sim \chi^2_k(\mu^T A \mu)$$

In addition, if $A\mu = 0$ then $Y^T A Y \sim \chi^2_k$.

Proof: We will show that the definition of a noncentral chi-square random variable (Defn 4.4) is satisfied by showing that $Y^T A Y = Z^T Z$ for a normal random vector $Z = (Z_1, \ldots, Z_k)^T$ with $Var(Z) = I_{k \times k}$.

Step 1: Since $A\Sigma$ is idempotent we have

$$A\Sigma = A\Sigma A\Sigma$$

Step 2: Since $\Sigma$ is positive definite, $\Sigma^{-1}$ exists and we have

$$A\Sigma\Sigma^{-1} = A\Sigma A\Sigma\Sigma^{-1} \;\Rightarrow\; A = A\Sigma A \;\Rightarrow\; A = A^T \Sigma A$$

using the symmetry of $A$.

Step 3: For any vector $x = (x_1, \ldots, x_n)^T$ we have

$$x^T A x = x^T A^T \Sigma A x \geq 0$$

because $\Sigma$ is positive definite. Hence, $A$ is non-negative definite and symmetric.

Step 4: From the spectral decomposition of $A$ (Result 1.12) we have

$$A = \sum_{j=1}^k \lambda_j v_j v_j^T = V D V^T$$

where $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_k > 0$ are the positive eigenvalues of $A$, $D = diag(\lambda_1, \ldots, \lambda_k)$, and the columns of $V$ are $v_1, v_2, \ldots, v_k$, the eigenvectors corresponding to the positive eigenvalues of $A$. The other $n - k$ eigenvalues of $A$ are zero because $rank(A) = k$.

Step 5: Define

$$B = V\, diag\!\left(\frac{1}{\sqrt{\lambda_1}}, \ldots, \frac{1}{\sqrt{\lambda_k}}\right) = V D^{-1/2}$$

Since $V^T V = I$, we have

$$B^T A B = D^{-1/2} V^T V D V^T V D^{-1/2} = D^{-1/2} D D^{-1/2} = I_{k \times k}$$

Then, since $A = A^T \Sigma A$, we have

$$I = B^T A B = B^T A^T \Sigma A B$$

Step 6: Define $Z = B^T A Y$; then

$$Var(Z) = B^T A^T \Sigma A B = I_{k \times k}$$

and $Z \sim N(B^T A \mu, I)$.

Step 7: Finally,

$$Z^T Z = (B^T A Y)^T (B^T A Y) = Y^T A^T B B^T A Y = Y^T A Y$$

because

$$A^T B B^T A = A B B^T A = V D V^T V D^{-1/2} D^{-1/2} V^T V D V^T = V D D^{-1} D V^T = V D V^T = A$$

From Defn 4.4, $Z^T Z \sim \chi^2_k(\delta^2)$, where

$$\delta^2 = (B^T A \mu)^T (B^T A \mu) = \mu^T A^T B B^T A \mu = \mu^T A \mu$$
Example 4.3: For the Gauss-Markov model with $E(Y) = X\beta$ and $Var(Y) = \sigma^2 I$, include the assumption that

$$Y = (Y_1, \ldots, Y_n)^T \sim N(X\beta, \sigma^2 I).$$

For any solution $b = (X^T X)^- X^T Y$ to the normal equations, the OLS estimator for $X\beta$ is

$$\hat{Y} = X b = X (X^T X)^- X^T Y = P_X Y$$

and the residual vector is $e = Y - \hat{Y} = (I - P_X) Y$. The sum of squared residuals is

$$SSE = \sum_{i=1}^n e_i^2 = e^T e = Y^T (I - P_X) Y.$$

Use Result 4.7 to obtain the distribution of

$$\frac{SSE}{\sigma^2} = Y^T \left[\frac{1}{\sigma^2}(I - P_X)\right] Y$$

Here

$$\mu = E(Y) = X\beta, \qquad \Sigma = Var(Y) = \sigma^2 I \text{ is p.d.}, \qquad A = \frac{1}{\sigma^2}(I - P_X) \text{ is symmetric.}$$

Note that

$$A\Sigma = \frac{1}{\sigma^2}(I - P_X)\,\sigma^2 I = I - P_X$$

is idempotent, and

$$A\mu = \frac{1}{\sigma^2}(I - P_X) X\beta = 0.$$

Then

$$\frac{SSE}{\sigma^2} \sim \chi^2_{n-k}$$

where

$$rank(I - P_X) = n - rank(X) = n - k.$$

We could also express this as $SSE \sim \sigma^2 \chi^2_{n-k}$.

Now consider the "uncorrected" model sum of squares

$$\sum_{i=1}^n \hat{Y}_i^2 = \hat{Y}^T \hat{Y} = (P_X Y)^T P_X Y = Y^T P_X^T P_X Y = Y^T P_X Y.$$
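A simulation sketch of this result (design, $\beta$, and $\sigma$ assumed for illustration): the simulated values of $SSE/\sigma^2$ should have mean $n - k$ and variance $2(n - k)$.

set.seed(5)
n <- 15; sigma <- 2
X <- cbind(1, 1:n)                         # rank k = 2
beta <- c(1, 0.5)
PX <- X %*% solve(t(X) %*% X) %*% t(X)
sse.sim <- replicate(20000, {
  Y <- drop(X %*% beta) + sigma * rnorm(n)
  drop(t(Y) %*% (diag(n) - PX) %*% Y) / sigma^2
})
c(mean(sse.sim), n - 2)                    # E(chi-square, n-k df) = 13
c(var(sse.sim), 2 * (n - 2))               # Var = 2(n - k) = 26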
Use Result 4.7 to show

$$\frac{1}{\sigma^2} \sum_{i=1}^n \hat{Y}_i^2 = Y^T \left(\frac{1}{\sigma^2} P_X\right) Y \sim \chi^2_k(\delta^2)$$

where $k = rank(X)$; here $A = \frac{1}{\sigma^2} P_X$ and $\Sigma = \sigma^2 I$, and the noncentrality parameter is

$$\delta^2 = (X\beta)^T \left(\frac{1}{\sigma^2} P_X\right)(X\beta) = \frac{1}{\sigma^2} \beta^T X^T (P_X X \beta) = \frac{1}{\sigma^2} \beta^T X^T X \beta$$

using $P_X X = X$.

The next result addresses the independence of several quadratic forms.

Result 4.8: Let $Y = (Y_1, \ldots, Y_n)^T \sim N(\mu, \Sigma)$ and let $A_1, A_2, \ldots, A_p$ be $n \times n$ symmetric matrices. If

$$A_i \Sigma A_j = 0 \quad \text{for all } i \neq j$$

then $Y^T A_1 Y,\; Y^T A_2 Y,\; \ldots,\; Y^T A_p Y$ are independent random variables.
Proof: From Result 4.1,

$$\begin{bmatrix} A_1 Y \\ \vdots \\ A_p Y \end{bmatrix} = \begin{bmatrix} A_1 \\ \vdots \\ A_p \end{bmatrix} Y$$

has a multivariate normal distribution, and for $i \neq j$,

$$Cov(A_i Y, A_j Y) = A_i \Sigma A_j^T = 0.$$

It follows from Result 4.4 that $A_1 Y, A_2 Y, \ldots, A_p Y$ are independent random vectors. Since

$$Y^T A_i Y = Y^T A_i A_i^- A_i Y = (A_i Y)^T A_i^- (A_i Y)$$

is a function of $A_i Y$ only (here $A_i^-$ denotes a generalized inverse of $A_i$), it follows that $Y^T A_1 Y, \ldots, Y^T A_p Y$ are independent random variables.
Example 4.4: Continuing Example 4.3, show that the "uncorrected" model sum of squares

$$\sum_{i=1}^n \hat{Y}_i^2 = Y^T P_X Y$$

and the sum of squared residuals

$$\sum_{i=1}^n (Y_i - \hat{Y}_i)^2 = Y^T (I - P_X) Y$$

are independently distributed for the "normal theory" Gauss-Markov model where $Y \sim N(X\beta, \sigma^2 I)$.

Use Result 4.8 with $A_1 = P_X$ and $A_2 = I - P_X$. Note that

$$A_2 \Sigma A_1 = (I - P_X)(\sigma^2 I) P_X = \sigma^2 (I - P_X) P_X = \sigma^2 (P_X - P_X P_X) = \sigma^2 (P_X - P_X) = 0.$$

In Example 4.3 we showed that

$$\frac{1}{\sigma^2} \sum_{i=1}^n \hat{Y}_i^2 \sim \chi^2_k\!\left(\frac{\beta^T X^T X \beta}{\sigma^2}\right) \quad \text{and} \quad \frac{1}{\sigma^2} \sum_{i=1}^n (Y_i - \hat{Y}_i)^2 \sim \chi^2_{n-k}$$

where $k = rank(X)$. Consequently,

$$\frac{1}{\sigma^2} \sum_{i=1}^n \hat{Y}_i^2 \quad \text{and} \quad \frac{1}{\sigma^2} \sum_{i=1}^n (Y_i - \hat{Y}_i)^2$$

are independently distributed.
By Defn 4.6,

$$F = \frac{\frac{1}{k\sigma^2} \sum_{i=1}^n \hat{Y}_i^2}{\frac{1}{(n-k)\sigma^2} \sum_{i=1}^n (Y_i - \hat{Y}_i)^2} = \frac{\frac{1}{k} \sum_{i=1}^n \hat{Y}_i^2}{\frac{1}{n-k} \sum_{i=1}^n (Y_i - \hat{Y}_i)^2} \sim F_{k,\, n-k}\!\left(\frac{1}{\sigma^2} \beta^T X^T X \beta\right)$$

The numerator is the uncorrected model mean square and the denominator is the residual mean square (MSE).

Use

$$F = \frac{\frac{1}{k} \sum_{i=1}^n \hat{Y}_i^2}{\frac{1}{n-k} \sum_{i=1}^n (Y_i - \hat{Y}_i)^2}$$

to test the null hypothesis

$$H_0: E(Y) = X\beta = 0$$

against the alternative

$$H_A: E(Y) = X\beta \neq 0.$$

This reduces to a central F distribution with $(k, n-k)$ d.f. when $X\beta = 0$.
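In R this test statistic can be computed directly. A sketch with simulated data (all values assumed; note that $H_0: X\beta = 0$ tests the entire mean vector, including the intercept, against zero):

set.seed(6)
n <- 15; k <- 2
X <- cbind(1, 1:n)
Y <- drop(X %*% c(1, 0.5)) + rnorm(n)
PX <- X %*% solve(t(X) %*% X) %*% t(X)
Yhat <- drop(PX %*% Y)
Fstat <- (sum(Yhat^2) / k) / (sum((Y - Yhat)^2) / (n - k))
pval <- 1 - pf(Fstat, k, n - k)    # central F reference under H0
c(Fstat, pval)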
Comments:

(i) The null hypothesis corresponds to the condition under which $F$ has a central F distribution (the noncentrality parameter is zero). In this example,

$$\delta^2 = \frac{1}{\sigma^2} (X\beta)^T (X\beta) = 0 \quad \text{if and only if} \quad X\beta = 0.$$

(ii) If $k = rank(X)$ equals the number of columns in $X$, then $H_0: X\beta = 0$ is equivalent to $H_0: \beta = 0$.

(iii) If $k = rank(X)$ is less than the number of columns in $X$, then $X\beta = 0$ for some $\beta \neq 0$, and $H_0: X\beta = 0$ is not equivalent to $H_0: \beta = 0$.

Example 4.4 is a simple illustration of a typical ANOVA partition:

$$\sum_{i=1}^n Y_i^2 = Y^T Y = Y^T [(I - P_X) + P_X] Y = Y^T (I - P_X) Y + Y^T P_X Y = \sum_{i=1}^n (Y_i - \hat{Y}_i)^2 + \sum_{i=1}^n \hat{Y}_i^2$$

Call $A_2 = I - P_X$ and $A_1 = P_X$; the corresponding degrees of freedom are $rank(A_2)$ and $rank(A_1)$.
More generally, an uncorrected total sum of squares can be partitioned as

$$\sum_{i=1}^n Y_i^2 = Y^T Y = Y^T A_1 Y + Y^T A_2 Y + \cdots + Y^T A_k Y$$

using orthogonal projection matrices

$$A_1 + A_2 + \cdots + A_k = I_{n \times n}$$

where

$$rank(A_1) + rank(A_2) + \cdots + rank(A_k) = n$$

and $A_i A_j = 0$ for any $i \neq j$. Since we are dealing with orthogonal projection matrices we also have $A_i^T = A_i$ (symmetry) and $A_i A_i = A_i$ (idempotent matrices).

Result 4.9 (Cochran's Theorem): Let $Y = (Y_1, \ldots, Y_n)^T \sim N(\mu, \sigma^2 I)$ and let $A_1, A_2, \ldots, A_k$ be $n \times n$ symmetric matrices with

$$I = A_1 + A_2 + \cdots + A_k \quad \text{and} \quad n = r_1 + r_2 + \cdots + r_k$$

where $r_i = rank(A_i)$. Then, for $i = 1, 2, \ldots, k$,

$$\frac{1}{\sigma^2} Y^T A_i Y \sim \chi^2_{r_i}\!\left(\frac{1}{\sigma^2} \mu^T A_i \mu\right)$$

and $Y^T A_1 Y,\; Y^T A_2 Y,\; \ldots,\; Y^T A_k Y$ are distributed independently.
Proof: This result follows directly from Result 4.7, Result 4.8, and the following Result 4.10.

Result 4.10: Let $A_1, A_2, \ldots, A_k$ be $n \times n$ symmetric matrices such that

$$A_1 + A_2 + \cdots + A_k = I.$$

Then the following statements are equivalent:

(i) $A_i A_j = 0$ for any $i \neq j$
(ii) $A_i A_i = A_i$ for all $i = 1, \ldots, k$
(iii) $rank(A_1) + \cdots + rank(A_k) = n$

Proof: First show that (i) implies (ii). Since $A_i = I - \sum_{j \neq i} A_j$, we have

$$A_i A_i = A_i \left(I - \sum_{j \neq i} A_j\right) = A_i - \sum_{j \neq i} A_i A_j = A_i.$$

Now show that (ii) implies (iii). Since an idempotent matrix has eigenvalues that are either 0 or 1, and the number of nonzero eigenvalues is the rank of the matrix, (ii) implies that $tr(A_i) = rank(A_i)$. Then,

$$n = tr(I) = tr(A_1 + A_2 + \cdots + A_k) = tr(A_1) + tr(A_2) + \cdots + tr(A_k) = rank(A_1) + rank(A_2) + \cdots + rank(A_k).$$
Finally, show that (iii) implies (i). Let $r_i = rank(A_i)$. Since $A_i$ is symmetric, we can apply the spectral decomposition (Result 1.12) to write $A_i$ as

$$A_i = U_i \Lambda_i U_i^T$$

where $\Lambda_i$ is an $r_i \times r_i$ diagonal matrix containing the non-zero eigenvalues of $A_i$, and

$$U_i = [\, u_{1i} \mid u_{2i} \mid \cdots \mid u_{r_i i} \,]$$

is an $n \times r_i$ matrix whose columns are the eigenvectors corresponding to the non-zero eigenvalues of $A_i$. Then

$$I = A_1 + A_2 + \cdots + A_k = U_1 \Lambda_1 U_1^T + \cdots + U_k \Lambda_k U_k^T = [U_1 | \cdots | U_k] \begin{bmatrix} \Lambda_1 & & \\ & \ddots & \\ & & \Lambda_k \end{bmatrix} \begin{bmatrix} U_1^T \\ \vdots \\ U_k^T \end{bmatrix} = U \Lambda U^T$$

where $U = [U_1 | \cdots | U_k]$ and $\Lambda = diag(\Lambda_1, \ldots, \Lambda_k)$.

Since $rank(A_1) + \cdots + rank(A_k) = n$ and $rank(A_i)$ is the number of columns in $U_i$, $U$ is an $n \times n$ matrix. Furthermore, $rank(U) = n$ because the identity matrix on the left side of the equal sign has rank $n$. Then $U^T U$ is an $n \times n$ matrix of full rank, $(U^T U)^{-1}$ exists, and

$$I = U \Lambda U^T \;\Rightarrow\; U^T U = U^T U \Lambda U^T U \;\Rightarrow\; (U^T U)^{-1} U^T U = (U^T U)^{-1} U^T U \Lambda U^T U \;\Rightarrow\; I = \Lambda U^T U.$$

It follows that

$$\begin{bmatrix} \Lambda_1^{-1} & & \\ & \ddots & \\ & & \Lambda_k^{-1} \end{bmatrix} = U^T U = \begin{bmatrix} U_1^T \\ \vdots \\ U_k^T \end{bmatrix} [U_1 \cdots U_k]$$

Consequently, $U_i^T U_j = 0$ for any $i \neq j$ (a matrix of zeros), and

$$A_i A_j = U_i \Lambda_i U_i^T U_j \Lambda_j U_j^T = 0 \quad \text{for any } i \neq j.$$
References:

Johnson, R. A. and Wichern, D. W. (1998). Applied Multivariate Statistical Analysis, 4th edition. Prentice Hall (Chapter 4).

Anderson, T. W. (1984). An Introduction to Multivariate Statistical Analysis, 2nd edition. Wiley, New York.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1994, 1995). Distributions in Statistics: Continuous Univariate Distributions, Vols. 1 and 2, 2nd edition. Wiley, New York.

Searle, S. R. (1987). Linear Models for Unbalanced Data. Wiley, New York (Chapters 7 and 8).

Rencher, A. C. (2000). Linear Models in Statistics. Wiley, New York (Chapters 4 and 5).