Partition of a total sum of squares

Squared length of Y is
    ∑_{i=1}^n y_i² = Y^T Y.

Ŷ = P_X Y is in the vector space spanned by the columns of X. The dimension of this space is rank(X).

Squared length of Ŷ = P_X Y is
    ∑_{i=1}^n Ŷ_i² = Ŷ^T Ŷ
                   = (P_X Y)^T (P_X Y)
                   = Y^T (P_X)^T P_X Y    since P_X is symmetric
                   = Y^T P_X P_X Y        since P_X is idempotent
                   = Y^T P_X Y.

The residual vector
    e = Y - Ŷ = (I - P_X)Y
is in the space orthogonal to the space spanned by the columns of X. It has dimension n - rank(X).

Squared length of the residual vector is
    ∑_{i=1}^n e_i² = e^T e = [(I - P_X)Y]^T (I - P_X)Y = Y^T (I - P_X)Y.

We have
    Y^T Y = Y^T (P_X + I - P_X)Y = Y^T P_X Y + Y^T (I - P_X)Y.
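This decomposition is easy to check numerically. The sketch below (Python/NumPy, not part of the notes) builds P_X from a generalized inverse for the rank-deficient effects-model matrix of Example 3.2 and verifies that Y^T Y = Y^T P_X Y + Y^T (I - P_X)Y.

```python
import numpy as np

# Effects-model matrix and responses from Example 3.2 (blood coagulation times).
X = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [1, 0, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [1, 0, 0, 1]], dtype=float)
Y = np.array([62., 60., 71., 72., 68., 67.])

# P_X = X (X^T X)^- X^T; np.linalg.pinv supplies the Moore-Penrose inverse.
P = X @ np.linalg.pinv(X.T @ X) @ X.T

# P_X is symmetric and idempotent (up to rounding error).
assert np.allclose(P, P.T)
assert np.allclose(P, P @ P)

ss_total = Y @ Y                      # Y^T Y
ss_model = Y @ P @ Y                  # Y^T P_X Y
ss_resid = Y @ (np.eye(6) - P) @ Y    # Y^T (I - P_X) Y
print(ss_total, ss_model + ss_resid)  # the two totals agree
```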
ANOVA

Source of Variation     Degrees of Freedom    Sums of Squares
model (uncorrected)     rank(X)               Ŷ^T Ŷ = Y^T P_X Y
residuals               n - rank(X)           e^T e = Y^T (I - P_X)Y
total (uncorrected)     n                     Y^T Y = ∑_{i=1}^n y_i²

Result 3.6: For the linear model E(Y) = Xβ and Var(Y) = Σ, the OLS estimator Ŷ = Xb = P_X Y for Xβ is
(i) unbiased, i.e., E(Ŷ) = Xβ,
(ii) a linear function of Y,
(iii) has variance-covariance matrix Var(Ŷ) = P_X Σ P_X.
This is true for any solution b = (X^T X)⁻ X^T Y to the normal equations.
Comments: Ŷ = Xb = P_X Y is said to be a linear unbiased estimator for E(Y) = Xβ.

Proof:
(i) E(Ŷ) = E(P_X Y)
         = P_X E(Y)    from Result 2.1.(i)
         = P_X Xβ
         = Xβ          since P_X X = X.
(ii) is trivial, since Ŷ = P_X Y.
(iii) follows from Result 2.1.(ii).

For the Gauss-Markov model, Var(Y) = σ²I and
    Var(Ŷ) = P_X (σ²I) P_X = σ² P_X P_X = σ² P_X = σ² X(X^T X)⁻ X^T,
where X(X^T X)⁻ X^T is sometimes called the "hat" matrix.
Estimable Functions

Questions:
- Is Ŷ = Xb = P_X Y the "best" estimator for E(Y) = Xβ?
- Is Ŷ = Xb = P_X Y the "best" estimator for E(Y) = Xβ in the class of linear, unbiased estimators?
- What other linear functions of β, say
      c^T β = c₁β₁ + c₂β₂ + ⋯ + c_k β_k,
  have OLS estimators that are invariant to the choice of b = (X^T X)⁻ X^T Y that solves the normal equations?
Some estimates of linear functions of the parameters have the same value, regardless of which solution to the normal equations is used. These are called estimable functions. An example is E(Y) = Xβ.

Check that Xb has the same value for each solution to the normal equations obtained in Example 3.2, i.e.,

    Xb = ( Ȳ₁. , Ȳ₁. , Ȳ₂. , Ȳ₃. , Ȳ₃. , Ȳ₃. )^T,

the sample mean for each diet, repeated once for each observation on that diet.
Example 3.2. Blood coagulation times

    Diet 1      Diet 2      Diet 3
    Y₁₁ = 62    Y₂₁ = 71    Y₃₁ = 72
    Y₁₂ = 60                Y₃₂ = 68
                            Y₃₃ = 67

Estimable Functions

Defn 3.6: For a linear model E(Y) = Xβ and Var(Y) = Σ, we will say that
    c^T β = c₁β₁ + c₂β₂ + ⋯ + c_k β_k
is estimable if there exists a linear unbiased estimator a^T Y for c^T β, i.e., for some non-random vector a, we have E(a^T Y) = c^T β.

The "effects" model
    Y_ij = μ + τ_i + ε_ij
can be written as

    [ Y₁₁ ]   [ 1  1  0  0 ]          [ ε₁₁ ]
    [ Y₁₂ ]   [ 1  1  0  0 ] [ μ  ]   [ ε₁₂ ]
    [ Y₂₁ ] = [ 1  0  1  0 ] [ τ₁ ] + [ ε₂₁ ]
    [ Y₃₁ ]   [ 1  0  0  1 ] [ τ₂ ]   [ ε₃₁ ]
    [ Y₃₂ ]   [ 1  0  0  1 ] [ τ₃ ]   [ ε₃₂ ]
    [ Y₃₃ ]   [ 1  0  0  1 ]          [ ε₃₃ ]
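The invariance claim above can be made concrete. Here is a hedged NumPy sketch for Example 3.2: the second solution is built artificially by adding a multiple of a null-space vector of X (it is not one of the worked solutions from the example), yet Xb is identical for both solutions and equals the vector of diet means.

```python
import numpy as np

X = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [1, 0, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [1, 0, 0, 1]], dtype=float)
Y = np.array([62., 60., 71., 72., 68., 67.])

XtX, XtY = X.T @ X, X.T @ Y

# One solution to the normal equations, via the Moore-Penrose inverse.
b1 = np.linalg.pinv(XtX) @ XtY

# A second solution: add any vector in the null space of X, here d = (1, -1, -1, -1).
d = np.array([1., -1., -1., -1.])
b2 = b1 + 3.7 * d                        # still solves (X^T X) b = X^T Y
assert np.allclose(XtX @ b2, XtY)

# b1 and b2 differ, but Xb is the same and equals the diet means.
print(b1, b2)
print(X @ b1)   # [61, 61, 71, 69, 69, 69]
print(X @ b2)   # identical
```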
Examples of estimable functions

μ + τ₁: Choose a^T = (1 0 0 0 0 0) and note that E(a^T Y) = E(Y₁₁) = μ + τ₁.
        Or choose a^T = (½ ½ 0 0 0 0). Then
            E(a^T Y) = E(½Y₁₁ + ½Y₁₂) = ½E(Y₁₁) + ½E(Y₁₂)
                     = ½(μ + τ₁) + ½(μ + τ₁) = μ + τ₁.

μ + τ₂: Choose a^T = (0 0 1 0 0 0). Then a^T Y = Y₂₁ and E(a^T Y) = E(Y₂₁) = μ + τ₂.

μ + τ₃: Choose a^T = (0 0 0 1 0 0). Then E(a^T Y) = E(Y₃₁) = μ + τ₃.

τ₁ - τ₂: Note that
            τ₁ - τ₂ = (μ + τ₁) - (μ + τ₂) = E(Y₁₁) - E(Y₂₁) = E(Y₁₁ - Y₂₁) = E(a^T Y),
         where a^T = (1 0 -1 0 0 0).

2μ + 3τ₁ - τ₂: Note that
            2μ + 3τ₁ - τ₂ = 3(μ + τ₁) - (μ + τ₂) = 3E(Y₁₁) - E(Y₂₁) = E(3Y₁₁ - Y₂₁) = E(a^T Y),
         where a^T = (3 0 -1 0 0 0).
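A quick way to confirm any of these choices is to check that a^T X reproduces the coefficient vector c^T. A hedged sketch (the dictionary of cases below just restates the examples above):

```python
import numpy as np

# Effects-model matrix from Example 3.2; columns correspond to (mu, tau1, tau2, tau3).
X = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [1, 0, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [1, 0, 0, 1]], dtype=float)

# Each pair (a, c) is one of the examples above; a^T X should equal c^T.
examples = {
    "mu + tau1":            (np.array([1, 0, 0, 0, 0, 0.]),   np.array([1, 1, 0, 0.])),
    "mu + tau1 (averaged)":  (np.array([.5, .5, 0, 0, 0, 0.]), np.array([1, 1, 0, 0.])),
    "tau1 - tau2":           (np.array([1, 0, -1, 0, 0, 0.]),  np.array([0, 1, -1, 0.])),
    "2mu + 3tau1 - tau2":    (np.array([3, 0, -1, 0, 0, 0.]),  np.array([2, 3, -1, 0.])),
}
for name, (a, c) in examples.items():
    print(name, np.allclose(a @ X, c))   # True for every example
```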
Quantities that are not estimable include
    μ, τ₁, τ₂, τ₃, 3τ₁, τ₁ + τ₂.

To show that a linear function of the parameters,
    c₀μ + c₁τ₁ + c₂τ₂ + c₃τ₃,
is not estimable, one must show that there is no non-random vector a^T = (a₁, a₂, …, a₆) for which
    E(a^T Y) = c₀μ + c₁τ₁ + c₂τ₂ + c₃τ₃.
For τ₁ to be estimable we would need to find an a that satisfies
    τ₁ = E(a^T Y)
       = a₁E(Y₁₁) + a₂E(Y₁₂) + a₃E(Y₂₁) + a₄E(Y₃₁) + a₅E(Y₃₂) + a₆E(Y₃₃)
       = (a₁ + a₂)(μ + τ₁) + a₃(μ + τ₂) + (a₄ + a₅ + a₆)(μ + τ₃)
for every value of (μ, τ₁, τ₂, τ₃). Matching the coefficients of τ₂ and τ₃ implies 0 = a₃ = (a₄ + a₅ + a₆). Then τ₁ = (a₁ + a₂)(μ + τ₁), which is impossible: the coefficient of μ forces a₁ + a₂ = 0, while the coefficient of τ₁ requires a₁ + a₂ = 1.
Example 3.1. Yield of a chemical process

    [ Y₁ ]   [ 1  160 ]            [ ε₁ ]
    [ Y₂ ]   [ 1  165 ]   [ β₁ ]   [ ε₂ ]
    [ Y₃ ] = [ 1  165 ]   [ β₂ ] + [ ε₃ ]
    [ Y₄ ]   [ 1  170 ]            [ ε₄ ]
    [ Y₅ ]   [ 1  175 ]            [ ε₅ ]
Since X has full column rank, each element of β is estimable. Consider, for example, β₁ = c^T β where c = (1, 0)^T.

Since X has full column rank, the unique least squares estimator for β is
    b = (X^T X)⁻¹ X^T Y,
and an unbiased linear estimator for c^T β is
    c^T b = c^T (X^T X)⁻¹ X^T Y;
call this a^T Y.
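For a full-column-rank design like this one, estimability of each element of β can be confirmed numerically without any response data: the sketch below (hedged; not part of the notes) verifies that rank(X) equals the number of columns and exhibits, for each unit vector c, a vector a = X(X^T X)⁻¹c with a^T X = c^T.

```python
import numpy as np

# Design matrix from Example 3.1 (intercept and temperature).
X = np.array([[1., 160.],
              [1., 165.],
              [1., 165.],
              [1., 170.],
              [1., 175.]])

print(np.linalg.matrix_rank(X))   # 2 = number of columns, so full column rank

# For any c, a = X (X^T X)^{-1} c gives a linear unbiased estimator a^T Y of c^T beta,
# because a^T X = c^T (X^T X)^{-1} X^T X = c^T.  Check it for c = (1, 0) and (0, 1).
XtX_inv = np.linalg.inv(X.T @ X)
for c in (np.array([1., 0.]), np.array([0., 1.])):
    a = X @ XtX_inv @ c
    print(c, np.allclose(a @ X, c))   # True: c is in the row space of X
```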
Result 3.7: For a linear model with E(Y) = Xβ and Var(Y) = Σ,
(i) The expectation of any observation is estimable.
(ii) A linear combination of estimable functions is estimable.
(iii) Each element of β is estimable if and only if rank(X) = k = number of columns of X.
(iv) Every c^T β is estimable if and only if rank(X) = k = number of columns of X.
Proof:
(i) For Y = (Y₁, …, Y_n)^T with E(Y) = Xβ, we have Y_i = a_i^T Y, where a_i is the vector with a one in the i-th position and zeros elsewhere. Then
    E(Y_i) = E(a_i^T Y) = a_i^T E(Y) = a_i^T Xβ = c_i^T β,
where c_i^T is the i-th row of X.
(ii) Suppose c_i^T β is estimable for i = 1, …, p. Then, there is an a_i such that E(a_i^T Y) = c_i^T β. Now consider a linear combination of estimable functions
    w₁c₁^T β + w₂c₂^T β + ⋯ + w_p c_p^T β.
Let a = w₁a₁ + w₂a₂ + ⋯ + w_p a_p. Then,
    E(a^T Y) = E(w₁a₁^T Y + ⋯ + w_p a_p^T Y)
             = w₁E(a₁^T Y) + ⋯ + w_p E(a_p^T Y)
             = w₁c₁^T β + ⋯ + w_p c_p^T β.
(iii) Previous argument.
(iv) Follows from (ii) and (iii).
Result 3.8: For a linear model with E(Y) = Xβ and Var(Y) = Σ, each of the following is true if and only if c^T β is estimable.
(i) c^T = a^T X for some a, i.e., c is in the space spanned by the rows of X.
(ii) c^T d = 0 for every d for which Xd = 0.
(iii) c^T b is the same for any solution to the normal equations (X^T X)b = X^T Y, i.e., there is a unique least squares estimator for c^T β.
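Criterion (i) can be checked mechanically: c^T β is estimable exactly when appending c^T as an extra row of X does not increase the rank. A hedged NumPy sketch using Example 3.2's effects-model matrix (the helper function name is made up):

```python
import numpy as np

X = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [1, 0, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [1, 0, 0, 1]], dtype=float)

def is_estimable(c, X):
    """c^T beta is estimable iff c lies in the row space of X."""
    return np.linalg.matrix_rank(np.vstack([X, c])) == np.linalg.matrix_rank(X)

print(is_estimable(np.array([1, 1, 0, 0.]), X))    # mu + tau1    -> True
print(is_estimable(np.array([0, 1, -1, 0.]), X))   # tau1 - tau2  -> True
print(is_estimable(np.array([1, 0, 0, 0.]), X))    # mu           -> False
print(is_estimable(np.array([0, 1, 0, 0.]), X))    # tau1         -> False
```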
Use Result 3.8.(ii) to show that μ is not estimable in Example 3.2. In that case

    E(Y) = Xβ = [ 1  1  0  0 ]
                [ 1  1  0  0 ] [ μ  ]
                [ 1  0  1  0 ] [ τ₁ ]
                [ 1  0  0  1 ] [ τ₂ ]
                [ 1  0  0  1 ] [ τ₃ ]
                [ 1  0  0  1 ]

and μ = c^T β with c^T = [1 0 0 0]. Let d^T = [1 -1 -1 -1]. Then
    Xd = 0, but c^T d = 1 ≠ 0.
Hence, μ is not estimable.

Part (ii) of Result 3.8 sometimes provides a convenient way to identify all possible estimable functions of β. In Example 3.2, Xd = 0 if and only if
    d = w (1, -1, -1, -1)^T   for some scalar w.
Then, c^T β is estimable if and only if
    0 = c^T d = w(c₁ - c₂ - c₃ - c₄)  ⇔  c₁ = c₂ + c₃ + c₄.
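The null-space calculation behind this criterion can be automated. A hedged sketch (not part of the notes) recovers the null-space basis of X from its SVD and applies the resulting condition c₁ = c₂ + c₃ + c₄ to a few coefficient vectors:

```python
import numpy as np

X = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [1, 0, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [1, 0, 0, 1]], dtype=float)

# Null space of X from the SVD: right singular vectors with (numerically) zero singular value.
_, s, Vt = np.linalg.svd(X)
null_basis = Vt[np.sum(s > 1e-10):]          # a single vector here, proportional to (1, -1, -1, -1)
print(null_basis / null_basis[0, 0])

# c^T beta is estimable iff c is orthogonal to every null-space vector,
# i.e. iff c1 = c2 + c3 + c4 for this design.
for c in ([1, 1/3, 1/3, 1/3], [1, 0, 1, 0], [1, 0, 0, 0], [1, 0, 2, 0]):
    c = np.asarray(c, float)
    print(c, np.allclose(null_basis @ c, 0))
```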
Then,
    (c₂ + c₃ + c₄)μ + c₂τ₁ + c₃τ₂ + c₄τ₃
is estimable for any (c₂, c₃, c₄), and these are the only estimable functions of μ, τ₁, τ₂, τ₃. Some estimable functions are
    μ + ⅓(τ₁ + τ₂ + τ₃)    (c₂ = c₃ = c₄ = ⅓)
and
    μ + τ₂    (c₃ = 1, c₂ = c₄ = 0),
but
    μ + 2τ₂
is not estimable.

Defn 3.7: For a linear model with E(Y) = Xβ and Var(Y) = Σ, where X is an n × k matrix, C_{r×k} β_{k×1} is said to be estimable if all of the elements of

    Cβ = ( c₁^T β , c₂^T β , … , c_r^T β )^T

are estimable, where C is the r × k matrix with rows c₁^T, c₂^T, …, c_r^T.
Result 3.9: For the linear model with E(Y) = Xβ and Var(Y) = Σ, where X is an n × k matrix, each of the following conditions holds if and only if Cβ is estimable.
(i) AX = C for some matrix A, i.e., each row of C is in the space spanned by the rows of X.
(ii) Cd = 0 for any d for which Xd = 0.
(iii) Cb is the same for any solution to the normal equations (X^T X)b = X^T Y.
Summary

For a linear model Y = Xβ + ε with E(Y) = Xβ and Var(Y) = Σ, we have:
- Any estimable function has a unique interpretation.
- The OLS estimator for an estimable function Cβ is unique:
      Cb = C(X^T X)⁻ X^T Y.
- The OLS estimator for an estimable function Cβ is
  - a linear estimator
  - an unbiased estimator.
In the class of linear unbiased estimators for c^T β, is the OLS estimator the "best"? Here "best" means smallest expected squared error. Let t(Y) denote an estimator for c^T β. Then, the expected squared error is

    MSE = E[t(Y) - c^T β]²
        = E[t(Y) - E(t(Y)) + E(t(Y)) - c^T β]²
        = E[t(Y) - E(t(Y))]² + [E(t(Y)) - c^T β]² + 2[E(t(Y)) - c^T β] E[t(Y) - E(t(Y))]
        = E[t(Y) - E(t(Y))]² + [E(t(Y)) - c^T β]²
        = Var(t(Y)) + [bias]².

If we restrict our attention to linear unbiased estimators for c^T β, i.e.,
    E(t(Y)) = c^T β   and   t(Y) = a^T Y for some a,
then t(Y) = a^T Y is the best linear unbiased estimator (BLUE) for c^T β if
    Var(a^T Y) ≤ Var(d^T Y)
for every d for which d^T Y is an unbiased estimator of c^T β, and for any value of β.
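As an illustration (hedged; the parameter and error values below are made up), the following simulation compares two linear unbiased estimators of μ + τ₁ in Example 3.2's effects model under Gauss-Markov errors: the OLS estimator (here the diet-1 mean) and the single-observation estimator Y₁₁. Both are unbiased, but the OLS estimator has the smaller variance, as the Gauss-Markov theorem below guarantees.

```python
import numpy as np

rng = np.random.default_rng(1)

X = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [1, 0, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [1, 0, 0, 1]], dtype=float)
beta = np.array([60., 2., 10., 8.])      # made-up (mu, tau1, tau2, tau3)
sigma = 3.0                              # made-up error standard deviation
c = np.array([1., 1., 0., 0.])           # target: mu + tau1 = 62

G = np.linalg.pinv(X.T @ X)              # a generalized inverse of X^T X
n_sim = 50_000
Y = X @ beta + sigma * rng.standard_normal((n_sim, 6))

ols = Y @ (X @ G.T @ c)                  # c^T b = c^T G X^T Y for each simulated Y
alt = Y[:, 0]                            # d^T Y with d = (1,0,0,0,0,0), also unbiased

print(ols.mean(), alt.mean())            # both close to 62
print(ols.var(), alt.var())              # roughly sigma^2/2 = 4.5 versus sigma^2 = 9
```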
Result 3.10 (Gauss-Markov Theorem): For the Gauss-Markov model,
    E(Y) = Xβ and Var(Y) = σ²I,
the OLS estimator of an estimable function c^T β is the unique best linear unbiased estimator (BLUE) of c^T β.

Proof:
(i) For any solution b = (X^T X)⁻ X^T Y to the normal equations, the OLS estimator for c^T β is
    c^T b = c^T (X^T X)⁻ X^T Y,
which is a linear function of Y.
(ii) From Result 3.8.(i), there exists a vector a such that c^T = a^T X. Then
    E(c^T b) = E(c^T (X^T X)⁻ X^T Y)
             = c^T (X^T X)⁻ X^T E(Y)
             = c^T (X^T X)⁻ X^T Xβ
             = a^T X(X^T X)⁻ X^T Xβ    (X(X^T X)⁻ X^T = P_X is the projection onto the column space of X)
             = a^T Xβ
             = c^T β.
Hence, c^T b is an unbiased estimator.
(iii) Minimum variance in the class of linear unbiased estimators.
Suppose d^T Y is any other linear unbiased estimator for c^T β. Then
    E(d^T Y) = d^T E(Y) = d^T Xβ = c^T β for every β.
Hence, d^T X = c^T and c = X^T d.
We must show that
    Var(c^T b) ≤ Var(d^T Y).
First, note that
    Var(d^T Y) = Var(c^T b + [d^T Y - c^T b])
               = Var(c^T b) + Var(d^T Y - c^T b) + 2Cov(c^T b, d^T Y - c^T b).
Then
    Var(d^T Y) ≥ Var(c^T b) + 2Cov(c^T b, d^T Y - c^T b) = Var(c^T b)
because
    Cov(c^T b, d^T Y - c^T b) = 0.
To show this, first note that c^T b = c^T (X^T X)⁻ X^T Y is invariant with respect to the choice of (X^T X)⁻. Consequently, we can use the Moore-Penrose generalized inverse, which is symmetric. (Not every generalized inverse of X^T X is symmetric.)
Then,
    Cov(c^T b, d^T Y - c^T b)
        = Cov(c^T (X^T X)⁻ X^T Y, [d^T - c^T (X^T X)⁻ X^T] Y)
        = [c^T (X^T X)⁻ X^T] Var(Y) [d^T - c^T (X^T X)⁻ X^T]^T
        = [c^T (X^T X)⁻ X^T] σ²I [d - X(X^T X)⁻ c]           (this is where the symmetry of (X^T X)⁻ is needed)
        = σ²[c^T (X^T X)⁻ X^T d - c^T (X^T X)⁻ X^T X(X^T X)⁻ c]
        = σ²[c^T (X^T X)⁻ c - c^T (X^T X)⁻ c]                 since X^T d = c
        = 0.
Since c^T b is invariant to the choice of b (Result 3.8.(iii)), we were able to use the Moore-Penrose inverse for (X^T X)⁻, which satisfies
    (X^T X)⁻ (X^T X)(X^T X)⁻ = (X^T X)⁻
by definition.
Consequently,
    Var(d^T Y) ≥ Var(c^T b)
and c^T b is BLUE.
(iv) To show that the OLS estimator is the unique BLUE, note that
    Var(d^T Y) = Var(c^T b + [d^T Y - c^T b])
               = Var(c^T b) + Var(d^T Y - c^T b)
because Cov(c^T b, d^T Y - c^T b) = 0. Then, d^T Y is BLUE if and only if
    Var(d^T Y - c^T b) = 0.
This is equivalent to
    d^T Y - c^T b = constant.
Since both estimators are unbiased,
    E(d^T Y - c^T b) = E(d^T Y) - E(c^T b) = 0.
Consequently, d^T Y - c^T b = 0 for all Y, and c^T b is the unique BLUE.
What if you have a linear model that is not a Gauss-Markov model, i.e.,
    E(Y) = Xβ and Var(Y) = Σ ≠ σ²I?
Parts (i) and (ii) of the proof of Result 3.10 do not require Var(Y) = σ²I. Result 3.8 does not require Var(Y) = σ²I, and the OLS estimator for any estimable quantity,
    c^T b = c^T (X^T X)⁻ X^T Y,
is invariant to the choice of (X^T X)⁻. Consequently, the OLS estimator for c^T β is still a linear unbiased estimator. The OLS estimator c^T b may, however, not be BLUE: there may be other linear unbiased estimators with smaller variance.
Generalized Least Squares (GLS) Estimation

Variance of the OLS estimator of an estimable quantity:
    Var(c^T b) = Var(c^T (X^T X)⁻ X^T Y)
               = c^T (X^T X)⁻ X^T Σ X [(X^T X)⁻]^T c.
For the Gauss-Markov model, Var(Y) = Σ = σ²I and
    Var(c^T b) = σ² c^T (X^T X)⁻ X^T X [(X^T X)⁻]^T c
               = σ² c^T (X^T X)⁻ c.

Defn 3.8: For a linear model with E(Y) = Xβ and Var(Y) = Σ, where Σ is positive definite, a generalized least squares estimator b_GLS for β minimizes
    (Y - X b_GLS)^T Σ⁻¹ (Y - X b_GLS).

Strategy: Transform Y to a random vector Z for which the Gauss-Markov model applies.
The spectral decomposition of Σ yields
    Σ = ∑_{j=1}^n λ_j u_j u_j^T.
Define
    Σ^{-1/2} = ∑_{j=1}^n (1/√λ_j) u_j u_j^T
and create the random vector
    Z = Σ^{-1/2} Y.
Then
    E(Z) = E(Σ^{-1/2} Y) = Σ^{-1/2} E(Y) = Σ^{-1/2} Xβ = Wβ
and
    Var(Z) = Var(Σ^{-1/2} Y) = Σ^{-1/2} Σ Σ^{-1/2} = I,
so we have a Gauss-Markov model for Z, where W = Σ^{-1/2} X is the model matrix.

Note that
    (Z - Wb)^T (Z - Wb) = (Σ^{-1/2} Y - Σ^{-1/2} X b)^T (Σ^{-1/2} Y - Σ^{-1/2} X b)
                        = (Y - Xb)^T Σ^{-1/2} Σ^{-1/2} (Y - Xb)
                        = (Y - Xb)^T Σ⁻¹ (Y - Xb).
Hence, any GLS estimator for the Y model is an OLS estimator for the Z model. It must be a solution to the normal equations for the Z model:
    W^T W b = W^T Z
    ⇔ (X^T Σ^{-1/2} Σ^{-1/2} X) b = X^T Σ^{-1/2} Σ^{-1/2} Y
    ⇔ (X^T Σ⁻¹ X) b = X^T Σ⁻¹ Y.
These are the generalized least squares estimating equations.
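A hedged NumPy sketch of this equivalence (the Σ and Y values are made up): build Σ^{-1/2} from the eigendecomposition of a positive definite Σ, run OLS on the transformed data Z = Σ^{-1/2}Y and W = Σ^{-1/2}X, and check that the result also solves (X^T Σ⁻¹ X)b = X^T Σ⁻¹ Y.

```python
import numpy as np

rng = np.random.default_rng(2)

# Design from Example 3.1 and a made-up positive definite covariance matrix Sigma.
X = np.column_stack([np.ones(5), np.array([160., 165., 165., 170., 175.])])
Sigma = np.diag([1., 2., 2., 4., 8.]) + 0.5          # positive definite, correlated errors
Y = rng.normal(size=5)                               # any response vector works for this check

# Sigma^{-1/2} from the spectral decomposition Sigma = sum_j lambda_j u_j u_j^T.
lam, U = np.linalg.eigh(Sigma)
Sig_inv_half = U @ np.diag(1.0 / np.sqrt(lam)) @ U.T

# OLS on the whitened model Z = W beta + error, with identity covariance.
Z, W = Sig_inv_half @ Y, Sig_inv_half @ X
b_whitened, *_ = np.linalg.lstsq(W, Z, rcond=None)

# Direct solution of the GLS estimating equations (X^T Sigma^-1 X) b = X^T Sigma^-1 Y.
Sig_inv = np.linalg.inv(Sigma)
b_gls = np.linalg.solve(X.T @ Sig_inv @ X, X.T @ Sig_inv @ Y)

print(np.allclose(b_whitened, b_gls))   # True
```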
Any solution
    b_GLS = (W^T W)⁻ W^T Z = (X^T Σ⁻¹ X)⁻ X^T Σ⁻¹ Y
is called a generalized least squares (GLS) estimator for β.

Result 3.11: For a linear model with E(Y) = Xβ and Var(Y) = Σ, the GLS estimator of an estimable function c^T β,
    c^T b_GLS = c^T (X^T Σ⁻¹ X)⁻ X^T Σ⁻¹ Y,
is the unique BLUE of c^T β.

Proof: Since c^T β is estimable, there is an a such that
    c^T β = E(a^T Y) = E(a^T Σ^{1/2} Σ^{-1/2} Y) = E(a^T Σ^{1/2} Z).
Consequently, c^T β is estimable for the Z model. Apply the Gauss-Markov theorem (Result 3.10) to the Z model.
Comments:
- For the Gauss-Markov model, c^T b_GLS = c^T b_OLS.
- For the linear model with E(Y) = Xβ and Var(Y) = Σ, both the OLS and GLS estimators for an estimable function c^T β are linear unbiased estimators, with
      Var(c^T b_OLS) = c^T (X^T X)⁻ X^T Σ X [(X^T X)⁻]^T c,
      Var(c^T b_GLS) = c^T (X^T Σ⁻¹ X)⁻ X^T Σ⁻¹ X (X^T Σ⁻¹ X)⁻ c,
  and
      Var(c^T b_OLS) ≥ Var(c^T b_GLS).
- The BLUE property of c^T b_GLS assumes that Var(Y) = Σ is known.
- The same results, including Result 3.12, hold for the Aitken model, where E(Y) = Xβ and Var(Y) = σ²V for some known matrix V.
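The variance inequality can be checked directly for a specific design and covariance. A hedged sketch (made-up Σ, Example 3.1 design, slope as the estimable target):

```python
import numpy as np

X = np.column_stack([np.ones(5), np.array([160., 165., 165., 170., 175.])])
Sigma = np.diag([1., 2., 2., 4., 8.])        # made-up heteroscedastic covariance
Sig_inv = np.linalg.inv(Sigma)
c = np.array([0., 1.])                       # target the slope, c^T beta

XtX_inv = np.linalg.inv(X.T @ X)             # full column rank, so the g-inverse is unique
A_ols = XtX_inv @ X.T                        # b_OLS = A_ols @ Y
A_gls = np.linalg.inv(X.T @ Sig_inv @ X) @ X.T @ Sig_inv   # b_GLS = A_gls @ Y

var_ols = c @ A_ols @ Sigma @ A_ols.T @ c    # Var(c^T b_OLS)
var_gls = c @ A_gls @ Sigma @ A_gls.T @ c    # Var(c^T b_GLS) = c^T (X^T Sigma^-1 X)^-1 c
print(var_ols, var_gls, var_ols >= var_gls)  # the GLS variance is no larger
```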
In practice, Var(Y) = Σ is usually unknown. An approximation to
    b_GLS = (X^T Σ⁻¹ X)⁻ X^T Σ⁻¹ Y
is obtained by substituting a consistent estimator Σ̂ for Σ:
- use method of moments or maximum likelihood estimation to obtain Σ̂;
- the resulting estimator
  - is not a linear estimator,
  - is consistent, but not necessarily unbiased,
  - does not provide a BLUE for estimable functions,
  - may have larger mean squared error than the OLS estimator.
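As an illustration of this plug-in idea (hedged; the grouping, data, and moment estimator below are made up for the sketch and are not from the notes), one can estimate a separate variance for each of two groups by the method of moments and substitute the resulting Σ̂ into the GLS estimating equations:

```python
import numpy as np

rng = np.random.default_rng(3)

# Made-up setting: mean model Y_i = beta_0 + e_i, with the first 10 observations
# having sd 1 and the last 10 having sd 3 (unknown to the analyst).
n1, n2 = 10, 10
X = np.ones((n1 + n2, 1))
Y = 5.0 + np.concatenate([rng.normal(0, 1, n1), rng.normal(0, 3, n2)])

# Method-of-moments estimate of the two group variances from OLS residuals.
b_ols = np.linalg.lstsq(X, Y, rcond=None)[0]
resid = Y - X @ b_ols
s2 = np.array([np.mean(resid[:n1] ** 2), np.mean(resid[n1:] ** 2)])
Sigma_hat = np.diag(np.repeat(s2, [n1, n2]))

# Feasible GLS: plug Sigma_hat into the GLS estimating equations.
W = np.linalg.inv(Sigma_hat)
b_fgls = np.linalg.solve(X.T @ W @ X, X.T @ W @ Y)
print(b_ols, b_fgls)   # both estimate beta_0 = 5; FGLS downweights the noisier group
```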