3. Linear Models

Regression Analysis

Example 3.1: Yield of a chemical process

  Yield (%)   Temperature (°F)   Time (hr)
      Y              X1             X2
     77             160              1
     82             165              3
     84             165              2
     89             170              1
     94             175              2

Unified approach
  - regression analysis
  - analysis of variance
  - balanced factorial experiments

Extensions
  - analysis of unbalanced experiments
  - models with correlated responses (split plot designs, repeated measures)
  - models with fixed and random components (mixed models)

Linear regression model

\[ Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \epsilon_i, \qquad i = 1, 2, 3, 4, 5 \]
Matrix formulation:

\[
\begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \\ Y_4 \\ Y_5 \end{bmatrix}
=
\begin{bmatrix}
\beta_0 + \beta_1 X_{11} + \beta_2 X_{21} + \epsilon_1 \\
\beta_0 + \beta_1 X_{12} + \beta_2 X_{22} + \epsilon_2 \\
\beta_0 + \beta_1 X_{13} + \beta_2 X_{23} + \epsilon_3 \\
\beta_0 + \beta_1 X_{14} + \beta_2 X_{24} + \epsilon_4 \\
\beta_0 + \beta_1 X_{15} + \beta_2 X_{25} + \epsilon_5
\end{bmatrix}
=
\begin{bmatrix}
\beta_0 + \beta_1 X_{11} + \beta_2 X_{21} \\
\beta_0 + \beta_1 X_{12} + \beta_2 X_{22} \\
\beta_0 + \beta_1 X_{13} + \beta_2 X_{23} \\
\beta_0 + \beta_1 X_{14} + \beta_2 X_{24} \\
\beta_0 + \beta_1 X_{15} + \beta_2 X_{25}
\end{bmatrix}
+
\begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \\ \epsilon_4 \\ \epsilon_5 \end{bmatrix}
\]

\[
=
\begin{bmatrix}
1 & X_{11} & X_{21} \\
1 & X_{12} & X_{22} \\
1 & X_{13} & X_{23} \\
1 & X_{14} & X_{24} \\
1 & X_{15} & X_{25}
\end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix}
+
\begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \\ \epsilon_4 \\ \epsilon_5 \end{bmatrix}
\]
Analysis of Variance (ANOVA)

  Source of    d.f.   Sums of Squares                                                Mean Squares
  Variation
  Model         2     \( SS_{model} = \sum_{i=1}^{5} (\hat Y_i - \bar Y_{\cdot})^2 \)   \( \tfrac{1}{2} SS_{model} \)
  Error         2     \( SS_{error} = \sum_{i=1}^{5} (Y_i - \hat Y_i)^2 \)              \( \tfrac{1}{2} SS_{error} \)
  C. total      4     \( \sum_{i=1}^{5} (Y_i - \bar Y_{\cdot})^2 \)

where

\[ \bar Y_{\cdot} = \frac{1}{n}\sum_{i=1}^{n} Y_i, \qquad \hat Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i}, \qquad n = \text{total number of observations.} \]
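As a numerical companion to the table above, the following sketch (using numpy; the data are taken from Example 3.1 and the variable names are mine) fits the regression by solving the normal equations and reproduces the corrected ANOVA sums of squares.

```python
import numpy as np

# Example 3.1 data: yield (%), temperature (F), time (hr)
y  = np.array([77., 82., 84., 89., 94.])
x1 = np.array([160., 165., 165., 170., 175.])
x2 = np.array([1., 3., 2., 1., 2.])

# Model matrix with an intercept column
X = np.column_stack([np.ones(5), x1, x2])

# Solve the normal equations X'Xb = X'y (X has full column rank here)
b = np.linalg.solve(X.T @ X, X.T @ y)
yhat = X @ b

ss_model = np.sum((yhat - y.mean()) ** 2)   # 2 d.f.
ss_error = np.sum((y - yhat) ** 2)          # 2 d.f.
ss_total = np.sum((y - y.mean()) ** 2)      # 4 d.f. (corrected total)

print("b =", b)
print(ss_model, ss_error, ss_total, ss_model + ss_error)
```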
Example 3.2. Blood coagulation times (in seconds) for blood samples from six different rats. Each rat was fed one of three diets.

  Diet 1       Diet 2       Diet 3
  Y11 = 62     Y21 = 71     Y31 = 72
  Y12 = 60                  Y32 = 68
                            Y33 = 67

Here \(Y_{ij}\) is the observed time for the j-th rat fed the i-th diet, and \(\mu_i\) is the mean time for rats given the i-th diet. Each of the models below is written as \(Y = X\beta + \epsilon\), where \(\epsilon\) is a vector of random errors with \(E(\epsilon) = 0\), so that \(E(Y) = X\beta\) and \(Var(Y) = Var(\epsilon)\).
An "effects" model

\[ Y_{ij} = \mu + \alpha_i + \epsilon_{ij} \]

This can be expressed as

\[
\begin{bmatrix} Y_{11} \\ Y_{12} \\ Y_{21} \\ Y_{31} \\ Y_{32} \\ Y_{33} \end{bmatrix}
=
\begin{bmatrix}
1 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 \\
1 & 0 & 0 & 1 \\
1 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} \mu \\ \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{bmatrix}
+
\begin{bmatrix} \epsilon_{11} \\ \epsilon_{12} \\ \epsilon_{21} \\ \epsilon_{31} \\ \epsilon_{32} \\ \epsilon_{33} \end{bmatrix}
\]

or \(Y = X\beta + \epsilon\) with \(\beta = (\mu, \alpha_1, \alpha_2, \alpha_3)^T\). Assuming that \(E(\epsilon_{ij}) = 0\) for all \((i, j)\), this is a linear model with \(E(Y) = X\beta\).
"Means" model

\[ Y_{ij} = \mu_i + \epsilon_{ij} \]

You can express this model as

\[
\begin{bmatrix} Y_{11} \\ Y_{12} \\ Y_{21} \\ Y_{31} \\ Y_{32} \\ Y_{33} \end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 0 \\
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \\
0 & 0 & 1 \\
0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{bmatrix}
+
\begin{bmatrix} \epsilon_{11} \\ \epsilon_{12} \\ \epsilon_{21} \\ \epsilon_{31} \\ \epsilon_{32} \\ \epsilon_{33} \end{bmatrix}
\]
You could add the assumptions
  - independent errors
  - homogeneous variance, i.e., \(Var(\epsilon_{ij}) = \sigma^2\)
to obtain a linear model \(Y = X\beta + \epsilon\) with

\[ E(Y) = X\beta \quad \text{and} \quad Var(Y) = Var(\epsilon) = \sigma^2 I. \]
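A small check, in numpy, that the "effects" and "means" design matrices above describe the same model for \(E(Y)\): their columns span the same space, so least squares gives identical fitted values. This is only a sketch; the data are the coagulation times of Example 3.2, and lstsq is used because the effects matrix is not of full column rank.

```python
import numpy as np

y = np.array([62., 60., 71., 72., 68., 67.])  # Y11, Y12, Y21, Y31, Y32, Y33

# "Effects" model matrix: columns (1, alpha1, alpha2, alpha3); rank 3, not full column rank
X_eff = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 0],
                  [1, 0, 1, 0],
                  [1, 0, 0, 1],
                  [1, 0, 0, 1],
                  [1, 0, 0, 1]], dtype=float)

# "Means" model matrix: one indicator column per diet; full column rank
X_mean = np.array([[1, 0, 0],
                   [1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [0, 0, 1],
                   [0, 0, 1]], dtype=float)

b_eff, *_ = np.linalg.lstsq(X_eff, y, rcond=None)
b_mean, *_ = np.linalg.lstsq(X_mean, y, rcond=None)

print(np.allclose(X_eff @ b_eff, X_mean @ b_mean))   # True: same fitted values
print(np.linalg.matrix_rank(X_eff), np.linalg.matrix_rank(X_mean))  # 3 and 3
```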
Analysis of Variance (ANOVA)

  Source of    d.f.                               Sums of Squares                                                                Mean Squares
  Variation
  Diets        \(3 - 1 = 2\)                      \( SS_{diets} = \sum_{i=1}^{3} n_i (\bar Y_{i\cdot} - \bar Y_{\cdot\cdot})^2 \)   \( \tfrac{1}{2} SS_{diets} \)
  Error        \(\sum_{i=1}^{3}(n_i - 1) = 3\)    \( SS_{error} = \sum_{i=1}^{3}\sum_{j=1}^{n_i} (Y_{ij} - \bar Y_{i\cdot})^2 \)    \( \tfrac{1}{3} SS_{error} \)
  C. total     \((\sum_{i=1}^{3} n_i) - 1 = 5\)   \( \sum_{i=1}^{3}\sum_{j=1}^{n_i} (Y_{ij} - \bar Y_{\cdot\cdot})^2 \)

where

\[ n_i = \text{number of rats fed the } i\text{-th diet}, \qquad \bar Y_{i\cdot} = \frac{1}{n_i}\sum_{j=1}^{n_i} Y_{ij}, \qquad \bar Y_{\cdot\cdot} = \frac{1}{n_\cdot}\sum_{i=1}^{3}\sum_{j=1}^{n_i} Y_{ij}, \qquad n_\cdot = \sum_{i=1}^{3} n_i = \text{total number of observations.} \]

Example 3.3. A \(2 \times 2\) factorial experiment

Experimental units: 8 plots with 5 trees per plot.
Factor 1: Variety (A or B)
Factor 2: Fungicide use (new or old)
Response: Percentage of apples with spots

  Variety   Fungicide use   Percentage of apples with spots
  A         new             Y111 = 4.6
  A         new             Y112 = 7.4
  A         old             Y121 = 18.3
  A         old             Y122 = 15.7
  B         new             Y211 = 9.8
  B         new             Y212 = 14.2
  B         old             Y221 = 21.1
  B         old             Y222 = 18.9
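The ANOVA quantities in the table above can be computed directly; here is a short numpy sketch for the Example 3.2 data (group sizes n1 = 2, n2 = 1, n3 = 3).

```python
import numpy as np

groups = [np.array([62., 60.]),       # diet 1
          np.array([71.]),            # diet 2
          np.array([72., 68., 67.])]  # diet 3

all_y = np.concatenate(groups)
grand_mean = all_y.mean()

ss_diets = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)  # 2 d.f.
ss_error = sum(((g - g.mean()) ** 2).sum() for g in groups)            # 3 d.f.
ss_total = ((all_y - grand_mean) ** 2).sum()                           # 5 d.f.

print(ss_diets, ss_error, ss_total)
print(np.isclose(ss_diets + ss_error, ss_total))  # True
```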
Linear model:

\[ Y_{ijk} = \mu + \alpha_i + \gamma_j + \delta_{ij} + \epsilon_{ijk} \]

where \(Y_{ijk}\) is the percentage of apples with spots, \(\mu\) is an overall mean, \(\alpha_i\) is the variety effect (i = 1, 2), \(\gamma_j\) is the fungicide use effect (j = 1, 2), \(\delta_{ij}\) is the interaction effect, and \(\epsilon_{ijk}\) is a random error.

Analysis of Variance (ANOVA)

  Source of Variation                   d.f.                  Sums of Squares
  Varieties                             \(2 - 1 = 1\)         \( 4\sum_{i=1}^{2} (\bar Y_{i\cdot\cdot} - \bar Y_{\cdot\cdot\cdot})^2 \)
  Fungicide use                         \(2 - 1 = 1\)         \( 4\sum_{j=1}^{2} (\bar Y_{\cdot j\cdot} - \bar Y_{\cdot\cdot\cdot})^2 \)
  Variety x Fungicide use interaction   \((2-1)(2-1) = 1\)    \( 2\sum_{i=1}^{2}\sum_{j=1}^{2} (\bar Y_{ij\cdot} - \bar Y_{i\cdot\cdot} - \bar Y_{\cdot j\cdot} + \bar Y_{\cdot\cdot\cdot})^2 \)
  Error                                 \(4(2-1) = 4\)        \( \sum_i\sum_j\sum_k (Y_{ijk} - \bar Y_{ij\cdot})^2 \)
  Corrected total                       \(8 - 1 = 7\)         \( \sum_i\sum_j\sum_k (Y_{ijk} - \bar Y_{\cdot\cdot\cdot})^2 \)

Here we are using 9 parameters,

\[ \beta^T = (\mu,\ \alpha_1,\ \alpha_2,\ \gamma_1,\ \gamma_2,\ \delta_{11},\ \delta_{12},\ \delta_{21},\ \delta_{22}), \]

to represent the 4 response means,

\[ E(Y_{ijk}) = \mu_{ij}, \qquad i = 1, 2, \quad j = 1, 2, \]

corresponding to the 4 combinations of levels of the two factors.
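As an illustration of the sums of squares in the table above, the following sketch (numpy; data from Example 3.3, two replicate plots per variety-by-fungicide cell) computes each term from cell, row, column, and grand means.

```python
import numpy as np

# Y[i, j, k]: variety i (A, B), fungicide use j (new, old), replicate k
Y = np.array([[[4.6, 7.4], [18.3, 15.7]],
              [[9.8, 14.2], [21.1, 18.9]]])

grand = Y.mean()
row = Y.mean(axis=(1, 2))    # variety means
col = Y.mean(axis=(0, 2))    # fungicide-use means
cell = Y.mean(axis=2)        # cell means

ss_variety = 4 * np.sum((row - grand) ** 2)                                       # 1 d.f.
ss_fungicide = 4 * np.sum((col - grand) ** 2)                                     # 1 d.f.
ss_interaction = 2 * np.sum((cell - row[:, None] - col[None, :] + grand) ** 2)    # 1 d.f.
ss_error = np.sum((Y - cell[:, :, None]) ** 2)                                    # 4 d.f.
ss_total = np.sum((Y - grand) ** 2)                                               # 7 d.f.

print(ss_variety, ss_fungicide, ss_interaction, ss_error)
print(np.isclose(ss_variety + ss_fungicide + ss_interaction + ss_error, ss_total))
```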
"Means" model

\[ Y_{ijk} = \mu_{ij} + \epsilon_{ijk} \]

Write this model in the form \(Y = X\beta + \epsilon\):

\[
\begin{bmatrix} Y_{111} \\ Y_{112} \\ Y_{121} \\ Y_{122} \\ Y_{211} \\ Y_{212} \\ Y_{221} \\ Y_{222} \end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} \mu_{11} \\ \mu_{12} \\ \mu_{21} \\ \mu_{22} \end{bmatrix}
+
\begin{bmatrix} \epsilon_{111} \\ \epsilon_{112} \\ \epsilon_{121} \\ \epsilon_{122} \\ \epsilon_{211} \\ \epsilon_{212} \\ \epsilon_{221} \\ \epsilon_{222} \end{bmatrix}
\]

where \(\mu_{ij} = E(Y_{ijk})\) is the mean percentage of apples with spots for variety i and fungicide use j.

The "effects" model, with the 9 parameters listed above, can also be written in the form \(Y = X\beta + \epsilon\), that is,
\[
\begin{bmatrix} Y_{111} \\ Y_{112} \\ Y_{121} \\ Y_{122} \\ Y_{211} \\ Y_{212} \\ Y_{221} \\ Y_{222} \end{bmatrix}
=
\begin{bmatrix}
1 & 1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 \\
1 & 1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 0 & 0 & 1 & 0 \\
1 & 0 & 1 & 1 & 0 & 0 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 1 \\
1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} \mu \\ \alpha_1 \\ \alpha_2 \\ \gamma_1 \\ \gamma_2 \\ \delta_{11} \\ \delta_{12} \\ \delta_{21} \\ \delta_{22} \end{bmatrix}
+
\begin{bmatrix} \epsilon_{111} \\ \epsilon_{112} \\ \epsilon_{121} \\ \epsilon_{122} \\ \epsilon_{211} \\ \epsilon_{212} \\ \epsilon_{221} \\ \epsilon_{222} \end{bmatrix}
\]

Any linear model can be written as \(Y = X\beta + \epsilon\):

\[
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
=
\begin{bmatrix}
X_{11} & X_{21} & \cdots & X_{k1} \\
X_{12} & X_{22} & \cdots & X_{k2} \\
\vdots & \vdots &        & \vdots \\
X_{1n} & X_{2n} & \cdots & X_{kn}
\end{bmatrix}
\begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{bmatrix}
+
\begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}
\]

Here the \(Y_i\) are the observed responses, the elements of \(X\) are known (non-random) values, \(\beta\) is a vector of unknown parameters, and the random errors \(\epsilon_i\) are not observed.
General Linear Model

\(Y = (Y_1, Y_2, \ldots, Y_n)^T\) is a random vector with

(1) \(E(Y) = X\beta\) for some known \(n \times k\) matrix \(X\) of constants and an unknown \(k \times 1\) parameter vector \(\beta = (\beta_1, \ldots, \beta_k)^T\).

(2) Complete the model by specifying a probability distribution for the possible values of \(Y\) or \(\epsilon\). Sometimes we will only specify the covariance matrix \(Var(Y) = \Sigma\).

Gauss-Markov Model

Defn 3.1: The linear model \(Y = X\beta + \epsilon\) is a Gauss-Markov model if

\[ E(Y) = X\beta \quad \text{and} \quad Var(Y) = Var(\epsilon) = \sigma^2 I \]

for an unknown constant \(\sigma^2\). Since \(\epsilon = Y - X\beta = Y - E(Y)\), we have \(E(\epsilon) = 0\) and \(Var(\epsilon) = Var(Y) = \sigma^2 I\).

Notation: \(Y \sim (X\beta,\ \sigma^2 I)\), read "\(Y\) is distributed with mean \(E(Y) = X\beta\) and variance \(Var(Y) = \sigma^2 I\)."

The distribution of \(Y\) is not completely specified.
Normal Theory Gauss-Markov Model

Defn 3.2: A normal theory Gauss-Markov model is a Gauss-Markov model in which \(Y\) (or \(\epsilon\)) has a multivariate normal distribution,

\[ Y \sim N(X\beta,\ \sigma^2 I). \]

The additional assumption of a normal distribution is
(1) not needed for some estimation results
(2) useful in creating
  - confidence intervals
  - tests of hypotheses
Objectives

(i) Develop estimation procedures
  - Estimate \(\beta\)?
  - Estimate \(E(Y) = X\beta\)
  - Estimable functions of \(\beta\)

(ii) Quantify uncertainty in estimates
  - variances, standard deviations
  - distributions
  - confidence intervals

(iii) Analysis of Variance (ANOVA)

(iv) Tests of hypotheses
  - distributions of quadratic forms
  - F-tests
  - power

(v) Sample size determination
Least Squares Estimation

For the linear model with \(E(Y) = X\beta\) and \(Var(Y) = \Sigma\) we have

\[
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
=
\begin{bmatrix}
X_{11} & X_{21} & \cdots & X_{k1} \\
X_{12} & X_{22} & \cdots & X_{k2} \\
\vdots & \vdots &        & \vdots \\
X_{1n} & X_{2n} & \cdots & X_{kn}
\end{bmatrix}
\begin{bmatrix} \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}
+
\begin{bmatrix} \epsilon_1 \\ \vdots \\ \epsilon_n \end{bmatrix}
\]

and

\[ Y_i = \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \epsilon_i = X_i^T \beta + \epsilon_i, \]

where \(X_i^T = (X_{1i}, X_{2i}, \ldots, X_{ki})\) is the i-th row of the model matrix \(X\).

OLS Estimator

Defn 3.3: For a linear model with \(E(Y) = X\beta\), any vector \(b\) that minimizes the sum of squared residuals

\[ Q(b) = \sum_{i=1}^{n} (Y_i - X_i^T b)^2 = (Y - Xb)^T (Y - Xb) \]

is an ordinary least squares (OLS) estimator for \(\beta\).
OLS Estimating Equations

For \(j = 1, 2, \ldots, k\), solve

\[ 0 = \frac{\partial Q(b)}{\partial b_j} = -2 \sum_{i=1}^{n} (Y_i - X_i^T b)\, X_{ji}. \]

These equations are expressed in matrix form as

\[ 0 = X^T (Y - Xb) = X^T Y - X^T X b, \]

or

\[ X^T X b = X^T Y. \]

These are called the "normal" equations.

If \(X_{n \times k}\) has full column rank, i.e., \(\operatorname{rank}(X) = k\), then
(i) \(X^T X\) is non-singular
(ii) \((X^T X)^{-1}\) exists and is unique.

Consequently, \((X^T X)^{-1}(X^T X)\, b = (X^T X)^{-1} X^T Y\), and

\[ b = (X^T X)^{-1} X^T Y \]

is the unique solution to the normal equations.
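A minimal sketch of the full-column-rank case in numpy: b = (X'X)^{-1}X'Y obtained by solving the normal equations (the data are the Example 3.1 values; in practice np.linalg.lstsq or a QR decomposition is preferred numerically, but the normal equations mirror the derivation above).

```python
import numpy as np

X = np.array([[1, 160, 1],
              [1, 165, 3],
              [1, 165, 2],
              [1, 170, 1],
              [1, 175, 2]], dtype=float)
y = np.array([77., 82., 84., 89., 94.])

XtX = X.T @ X
Xty = X.T @ y

b = np.linalg.solve(XtX, Xty)             # unique solution when rank(X) = k
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(b)
print(np.allclose(b, b_lstsq))            # True
print(np.allclose(X.T @ (y - X @ b), 0))  # normal equations satisfied
```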
Generalized Inverse

If \(\operatorname{rank}(X) < k\), then
(i) there are infinitely many solutions to the normal equations;
(ii) if \((X^T X)^-\) is a generalized inverse of \(X^T X\), then
\[ b = (X^T X)^- X^T Y \]
is a solution of the normal equations.

Defn 3.4: For a given \(m \times n\) matrix \(A\), any \(n \times m\) matrix \(G\) that satisfies

\[ AGA = A \]

is a generalized inverse of \(A\).

Comments
  - We will often use \(A^-\) to denote a generalized inverse of \(A\).
  - There may be infinitely many generalized inverses.
  - If \(A\) is an \(m \times m\) nonsingular matrix, then \(G = A^{-1}\) is the unique generalized inverse for \(A\).
Example 3.5.

\[ A = \begin{bmatrix} 16 & -6 & -10 \\ -6 & 21 & -15 \\ -10 & -15 & 25 \end{bmatrix} \quad \text{with } \operatorname{rank}(A) = 2. \]

A generalized inverse is

\[ G = \begin{bmatrix} \tfrac{1}{20} & 0 & 0 \\ 0 & \tfrac{1}{30} & 0 \\ 0 & 0 & \tfrac{1}{50} \end{bmatrix}. \]

Another generalized inverse is

\[ G = \frac{1}{300}\begin{bmatrix} 21 & 6 & 0 \\ 6 & 16 & 0 \\ 0 & 0 & 0 \end{bmatrix}. \]

Check that \(AGA = A\).

Example 3.2, "means" model:

\[
\begin{bmatrix} Y_{11} \\ Y_{12} \\ Y_{21} \\ Y_{31} \\ Y_{32} \\ Y_{33} \end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{bmatrix}
+
\begin{bmatrix} \epsilon_{11} \\ \epsilon_{12} \\ \epsilon_{21} \\ \epsilon_{31} \\ \epsilon_{32} \\ \epsilon_{33} \end{bmatrix}
\]

For this model

\[ X^T X = \begin{bmatrix} n_1 & 0 & 0 \\ 0 & n_2 & 0 \\ 0 & 0 & n_3 \end{bmatrix}, \qquad X^T Y = \begin{bmatrix} Y_{11} + Y_{12} \\ Y_{21} \\ Y_{31} + Y_{32} + Y_{33} \end{bmatrix}, \]
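A quick numerical check of Example 3.5 (a sketch in numpy; the sign pattern of A is as reconstructed above): both candidate matrices satisfy AGA = A.

```python
import numpy as np

A = np.array([[16., -6., -10.],
              [-6., 21., -15.],
              [-10., -15., 25.]])

G1 = np.diag([1/20, 1/30, 1/50])
G2 = np.zeros((3, 3))
G2[:2, :2] = np.linalg.inv(A[:2, :2])   # (1/300) [[21, 6], [6, 16]]

print(np.linalg.matrix_rank(A))          # 2
for G in (G1, G2):
    print(np.allclose(A @ G @ A, A))     # True, True
```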
and the unique OLS estimator for \(\mu = (\mu_1, \mu_2, \mu_3)^T\) is

\[ b = (X^T X)^{-1} X^T Y = \begin{bmatrix} \tfrac{1}{n_1} & 0 & 0 \\ 0 & \tfrac{1}{n_2} & 0 \\ 0 & 0 & \tfrac{1}{n_3} \end{bmatrix} \begin{bmatrix} Y_{1\cdot} \\ Y_{2\cdot} \\ Y_{3\cdot} \end{bmatrix} = \begin{bmatrix} \bar Y_{1\cdot} \\ \bar Y_{2\cdot} \\ \bar Y_{3\cdot} \end{bmatrix}, \]

where \(Y_{i\cdot} = \sum_{j} Y_{ij}\) and \(\bar Y_{i\cdot} = Y_{i\cdot}/n_i\).

Example 3.2, "effects" model:

\[
\begin{bmatrix} Y_{11} \\ Y_{12} \\ Y_{21} \\ Y_{31} \\ Y_{32} \\ Y_{33} \end{bmatrix}
=
\begin{bmatrix}
1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} \mu \\ \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{bmatrix}
+
\begin{bmatrix} \epsilon_{11} \\ \epsilon_{12} \\ \epsilon_{21} \\ \epsilon_{31} \\ \epsilon_{32} \\ \epsilon_{33} \end{bmatrix}
\]

Here

\[ X^T X = \begin{bmatrix} n & n_1 & n_2 & n_3 \\ n_1 & n_1 & 0 & 0 \\ n_2 & 0 & n_2 & 0 \\ n_3 & 0 & 0 & n_3 \end{bmatrix}, \qquad X^T Y = \begin{bmatrix} Y_{\cdot\cdot} \\ Y_{1\cdot} \\ Y_{2\cdot} \\ Y_{3\cdot} \end{bmatrix}. \]
Solution A: One generalized inverse for \(X^T X\) is

\[ (X^T X)^- = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & \tfrac{1}{n_1} & 0 & 0 \\ 0 & 0 & \tfrac{1}{n_2} & 0 \\ 0 & 0 & 0 & \tfrac{1}{n_3} \end{bmatrix} \]

and a solution to the normal equations is

\[ b = (X^T X)^- X^T Y = \begin{bmatrix} 0 \\ \bar Y_{1\cdot} \\ \bar Y_{2\cdot} \\ \bar Y_{3\cdot} \end{bmatrix}. \]

Solution B: Another generalized inverse for \(X^T X\) can be built from the leading \(3 \times 3\) block. Let

\[ A = \begin{bmatrix} n & n_1 & n_2 \\ n_1 & n_1 & 0 \\ n_2 & 0 & n_2 \end{bmatrix} \]

and use Result 1.4(ii) to compute

\[ A^{-1} = \frac{1}{n_1 n_2 n_3}\begin{bmatrix} n_1 n_2 & -n_1 n_2 & -n_1 n_2 \\ -n_1 n_2 & n_2(n_1 + n_3) & n_1 n_2 \\ -n_1 n_2 & n_1 n_2 & n_1(n_2 + n_3) \end{bmatrix}
= \begin{bmatrix} \tfrac{1}{n_3} & -\tfrac{1}{n_3} & -\tfrac{1}{n_3} \\[2pt] -\tfrac{1}{n_3} & \tfrac{n_1 + n_3}{n_1 n_3} & \tfrac{1}{n_3} \\[2pt] -\tfrac{1}{n_3} & \tfrac{1}{n_3} & \tfrac{n_2 + n_3}{n_2 n_3} \end{bmatrix}. \]

Then

\[ (X^T X)^- = \begin{bmatrix} A^{-1} & 0 \\ 0 & 0 \end{bmatrix} \]

and the corresponding solution to the normal equations is

\[ b = (X^T X)^- X^T Y = \begin{bmatrix} \bar Y_{3\cdot} \\ \bar Y_{1\cdot} - \bar Y_{3\cdot} \\ \bar Y_{2\cdot} - \bar Y_{3\cdot} \\ 0 \end{bmatrix}. \]

This is the OLS estimator for \(\beta\) reported by PROC GLM in the SAS package, but it is not the only possible solution to the normal equations.
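The following numpy sketch builds the effects-model normal equations for the Example 3.2 data and checks that Solutions A and B above, although different as parameter vectors, both solve the normal equations and give the same fitted values Xb.

```python
import numpy as np

y = np.array([62., 60., 71., 72., 68., 67.])
X = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [1, 0, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [1, 0, 0, 1]], dtype=float)
n1, n2, n3 = 2, 1, 3

XtX, Xty = X.T @ X, X.T @ y

# Solution A: generalized inverse that zeroes the row/column for mu
GA = np.diag([0, 1/n1, 1/n2, 1/n3])
bA = GA @ Xty            # (0, Ybar1, Ybar2, Ybar3)

# Solution B: invert the 3x3 block for (mu, alpha1, alpha2), zero the alpha3 slot
GB = np.zeros((4, 4))
GB[:3, :3] = np.linalg.inv(XtX[:3, :3])
bB = GB @ Xty            # (Ybar3, Ybar1 - Ybar3, Ybar2 - Ybar3, 0)

for b in (bA, bB):
    print(np.allclose(XtX @ b, Xty))     # both solve the normal equations
print(np.allclose(X @ bA, X @ bB))       # identical fitted values
print(bA, bB)
```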
Solution C: Another generalized inverse for \(X^T X\) is

\[ (X^T X)^- = \begin{bmatrix} \tfrac{2}{n} & -\tfrac{1}{n} & -\tfrac{1}{n} & -\tfrac{1}{n} \\[2pt] -\tfrac{1}{n} & \tfrac{1}{n_1} & 0 & 0 \\[2pt] -\tfrac{1}{n} & 0 & \tfrac{1}{n_2} & 0 \\[2pt] -\tfrac{1}{n} & 0 & 0 & \tfrac{1}{n_3} \end{bmatrix}. \]

The corresponding solution to the normal equations is

\[ b = (X^T X)^- X^T Y = \begin{bmatrix} \bar Y_{\cdot\cdot} \\ \bar Y_{1\cdot} - \bar Y_{\cdot\cdot} \\ \bar Y_{2\cdot} - \bar Y_{\cdot\cdot} \\ \bar Y_{3\cdot} - \bar Y_{\cdot\cdot} \end{bmatrix}. \]

Evaluating Generalized Inverses

Algorithm 3.1:

(i) Find any \(r \times r\) nonsingular submatrix of \(A\), where \(r = \operatorname{rank}(A)\). Call this matrix \(W\).
(ii) Invert and transpose \(W\), i.e., compute \((W^{-1})^T\).
(iii) Replace each element of \(W\) in \(A\) with the corresponding element of \((W^{-1})^T\).
(iv) Replace all other elements in \(A\) with zeros.
(v) Transpose the resulting matrix to obtain \(G\), a generalized inverse for \(A\).
Example 3.6.

\[ A = \begin{bmatrix} 4 & 1 & 2 & 0 \\ 1 & 1 & 5 & 15 \\ 3 & 1 & 3 & 5 \end{bmatrix}, \qquad \operatorname{rank}(A) = 2. \]

Take

\[ W = \begin{bmatrix} 1 & 5 \\ 1 & 3 \end{bmatrix} \]

(rows 2 and 3, columns 2 and 3 of \(A\)). Then

\[ (W^{-1})^T = \begin{bmatrix} -3/2 & 1/2 \\ 5/2 & -1/2 \end{bmatrix} \]

and

\[ G = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & -3/2 & 1/2 & 0 \\ 0 & 5/2 & -1/2 & 0 \end{bmatrix}^T = \begin{bmatrix} 0 & 0 & 0 \\ 0 & -3/2 & 5/2 \\ 0 & 1/2 & -1/2 \\ 0 & 0 & 0 \end{bmatrix}. \]

Another solution: take

\[ W = \begin{bmatrix} 4 & 0 \\ 3 & 5 \end{bmatrix} \]

(rows 1 and 3, columns 1 and 4 of \(A\)). Then

\[ (W^{-1})^T = \frac{1}{20}\begin{bmatrix} 5 & -3 \\ 0 & 4 \end{bmatrix} \]

and

\[ G = \begin{bmatrix} \tfrac{5}{20} & 0 & 0 & -\tfrac{3}{20} \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & \tfrac{4}{20} \end{bmatrix}^T = \begin{bmatrix} \tfrac{5}{20} & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ -\tfrac{3}{20} & 0 & \tfrac{4}{20} \end{bmatrix}. \]

Check: \(AGA = A\).
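Here is a small sketch of Algorithm 3.1 in numpy (the row/column index sets that pick out W are supplied by hand), applied to the matrix of Example 3.6; both choices of W reproduce the generalized inverses given above.

```python
import numpy as np

def ginverse(A, rows, cols):
    """Algorithm 3.1: build a generalized inverse of A from a nonsingular
    r x r submatrix W = A[rows, cols], where r = rank(A)."""
    W = A[np.ix_(rows, cols)]
    B = np.zeros_like(A, dtype=float)
    B[np.ix_(rows, cols)] = np.linalg.inv(W).T   # place (W^{-1})^T, zeros elsewhere
    return B.T                                   # transpose to get G

A = np.array([[4., 1., 2., 0.],
              [1., 1., 5., 15.],
              [3., 1., 3., 5.]])

G1 = ginverse(A, rows=[1, 2], cols=[1, 2])   # W = [[1, 5], [1, 3]]
G2 = ginverse(A, rows=[0, 2], cols=[0, 3])   # W = [[4, 0], [3, 5]]

print(np.linalg.matrix_rank(A))              # 2
print(np.allclose(A @ G1 @ A, A), np.allclose(A @ G2 @ A, A))   # True True
```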
Algorithm 3.2

For any \(m \times n\) matrix \(A\) with \(\operatorname{rank}(A) = r\),

(i) compute a singular value decomposition of \(A\) (see Result 1.14) to obtain
\[ P A Q = \begin{bmatrix} D_{r \times r} & 0 \\ 0 & 0 \end{bmatrix}, \]
where \(P\) is an \(m \times m\) orthogonal matrix, \(Q\) is an \(n \times n\) orthogonal matrix, and \(D\) is an \(r \times r\) diagonal matrix of nonzero singular values;

(ii) then
\[ G = Q \begin{bmatrix} D^{-1} & F_1 \\ F_2 & F_3 \end{bmatrix} P \]
is a generalized inverse for \(A\) for any choice of \(F_1, F_2, F_3\).

Proof: Check that \(AGA = A\). Since \(P\) and \(Q\) are orthogonal, \(A = P^T \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix} Q^T\), \(Q Q^T = I\) (the \(n \times n\) identity matrix), and \(P P^T = I\) (the \(m \times m\) identity matrix). Then

\[
AGA = P^T \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix} Q^T Q \begin{bmatrix} D^{-1} & F_1 \\ F_2 & F_3 \end{bmatrix} P P^T \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix} Q^T
= P^T \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} D^{-1} & F_1 \\ F_2 & F_3 \end{bmatrix}\begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix} Q^T
= P^T \begin{bmatrix} I & D F_1 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix} Q^T
= P^T \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix} Q^T = A.
\]

Moore-Penrose Inverse

Defn 3.5: For any matrix \(A\) there is a unique matrix \(M\), called the Moore-Penrose inverse, that satisfies
(i) \(AMA = A\)
(ii) \(MAM = M\)
(iii) \(AM\) is symmetric
(iv) \(MA\) is symmetric

Result 3.1:

\[ M = Q \begin{bmatrix} D^{-1} & 0 \\ 0 & 0 \end{bmatrix} P \]

is the Moore-Penrose inverse of \(A\), where \(P A Q = \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix}\) is a singular value decomposition of \(A\).
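A sketch relating Result 3.1 to numpy: np.linalg.pinv computes the Moore-Penrose inverse, and the SVD construction with arbitrary F-blocks (as in Algorithm 3.2) gives other generalized inverses that are only guaranteed to satisfy condition (i).

```python
import numpy as np

A = np.array([[4., 1., 2., 0.],
              [1., 1., 5., 15.],
              [3., 1., 3., 5.]])
r = np.linalg.matrix_rank(A)                 # 2

M = np.linalg.pinv(A)                        # Moore-Penrose inverse
print(np.allclose(A @ M @ A, A), np.allclose(M @ A @ M, M))
print(np.allclose((A @ M).T, A @ M), np.allclose((M @ A).T, M @ A))

# Algorithm 3.2: G = Q [[D^{-1}, F1], [F2, F3]] P for any F blocks
U, s, Vt = np.linalg.svd(A)                  # A = U diag(s) Vt
P, Q = U.T, Vt.T                             # so that P A Q = [[D, 0], [0, 0]]
middle = np.zeros((A.shape[1], A.shape[0]))
middle[:r, :r] = np.diag(1.0 / s[:r])        # D^{-1}
middle[r:, :] = 7.0                          # arbitrary F2 and F3 entries
middle[:r, r:] = -3.0                        # arbitrary F1 entries
G = Q @ middle @ P
print(np.allclose(A @ G @ A, A))             # still a generalized inverse
```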
Properties of generalized inverses of \(X^T X\)

Normal equations: \((X^T X)\, b = X^T Y\)

Result 3.3: If \(G\) is a generalized inverse of \(X^T X\), then
(i) \(G^T\) is a generalized inverse of \(X^T X\);
(ii) \(X G X^T X = X\), i.e., \(G X^T\) is a generalized inverse of \(X\);
(iii) \(X G X^T\) is invariant with respect to the choice of \(G\);
(iv) \(X G X^T\) is symmetric.

Proof:

(i) Since \(G\) is a generalized inverse of \(X^T X\), \((X^T X) G (X^T X) = X^T X\). Taking the transpose of both sides,
\[ X^T X = [(X^T X) G (X^T X)]^T = (X^T X)^T G^T (X^T X)^T. \]
But \((X^T X)^T = X^T (X^T)^T = X^T X\), hence \((X^T X) G^T (X^T X) = X^T X\).

(ii) From (i), \((X^T X) G^T (X^T X) = X^T X\). Call \(B = (X^T X) G^T\), so that \(B X^T X - X^T X = 0\). Then
\[
0 = (B X^T X - X^T X)(B^T - I)
  = B X^T X B^T - B X^T X - X^T X B^T + X^T X
  = (B X^T - X^T)(B X^T - X^T)^T.
\]
Since each diagonal element of a matrix multiplied by its transpose is the sum of the squared entries in the corresponding row, these diagonal elements can all be zero only if \(B X^T - X^T = 0\). Hence
\[ B X^T = X^T, \quad \text{i.e.,} \quad X^T X G^T X^T = X^T. \]
Taking the transpose,
\[ X = (X^T X G^T X^T)^T = X G X^T X. \]
Hence \(G X^T\) is a generalized inverse for \(X\).

(iii) Suppose \(F\) and \(G\) are generalized inverses for \(X^T X\). Then, from (ii),
\[ X G X^T X = X \quad \text{and} \quad X F X^T X = X. \]
It follows that
\[
0 = X - X = X G X^T X - X F X^T X
  = (X G X^T X - X F X^T X)(G^T X^T - F^T X^T)
  = (X G X^T - X F X^T)\, X\, (G^T X^T - F^T X^T)
  = (X G X^T - X F X^T)(X G^T X^T - X F^T X^T)
  = (X G X^T - X F X^T)(X G X^T - X F X^T)^T.
\]
Since the (i, i) diagonal element of the result of multiplying a matrix by its transpose is the sum of the squared entries in the i-th row of the matrix, the diagonal elements of the product are all zero only if all entries are zero in every row of the matrix. Consequently,
\[ X G X^T - X F X^T = 0. \]
(iv) For any generalized inverse \(G\), \(T = G X^T X G^T\) is a symmetric generalized inverse of \(X^T X\). Then \(X T X^T\) is symmetric, and from (iii), \(X G X^T = X T X^T\). Hence \(X G X^T\) is symmetric.

Estimation of the Mean Vector

\[ E(Y) = X\beta \]

For any solution to the normal equations, say \(b = (X^T X)^- X^T Y\), the OLS estimator for \(E(Y) = X\beta\) is

\[ \hat Y = X b = X (X^T X)^- X^T Y = P_X Y. \]

The matrix \(P_X = X (X^T X)^- X^T\) is called an "orthogonal projection matrix". \(\hat Y = P_X Y\) is the projection of \(Y\) onto the space spanned by the columns of \(X\).
Result 3.4: Properties of the projection matrix \(P_X = X (X^T X)^- X^T\)

(i) \(P_X\) is invariant to the choice of \((X^T X)^-\). For any solution \(b = (X^T X)^- X^T Y\) to the normal equations, \(\hat Y = X b = P_X Y\) is the same. (from Result 3.3(iii))
(ii) \(P_X\) is symmetric. (from Result 3.3(iv))
(iii) \(P_X\) is idempotent: \(P_X P_X = P_X\).
(iv) \(P_X X = X\). (from Result 3.3(ii))
(v) Partition \(X\) as \(X = [X_1 | X_2 | \cdots | X_k]\); then \(P_X X_j = X_j\).
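The following numpy sketch checks the properties in Result 3.4 for the Example 3.2 effects model: P_X built from two different generalized inverses of X'X is the same matrix, symmetric and idempotent, with P_X X = X.

```python
import numpy as np

X = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [1, 0, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [1, 0, 0, 1]], dtype=float)
XtX = X.T @ X

# Two different generalized inverses of X'X
G1 = np.linalg.pinv(XtX)                       # Moore-Penrose
G2 = np.zeros((4, 4))
G2[1:, 1:] = np.diag([1/2, 1.0, 1/3])          # Solution-A style g-inverse

P1 = X @ G1 @ X.T
P2 = X @ G2 @ X.T

print(np.allclose(P1, P2))            # invariant to the choice of (X'X)^-
print(np.allclose(P1, P1.T))          # symmetric
print(np.allclose(P1 @ P1, P1))       # idempotent
print(np.allclose(P1 @ X, X))         # P_X X = X
```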
Result 3.5: Properties of \(I - P_X\)

(i) \(I - P_X\) is symmetric.
(ii) \(I - P_X\) is idempotent: \((I - P_X)(I - P_X) = I - P_X\).
(iii) \((I - P_X) P_X = P_X - P_X P_X = P_X - P_X = 0\).
(iv) \((I - P_X) X = X - P_X X = X - X = 0\).
(v) Partition \(X\) as \([X_1 | X_2 | \cdots | X_k]\); then \((I - P_X) X_j = 0\).
(vi) Residuals are invariant with respect to the choice of \((X^T X)^-\): the residual vector \(e = Y - X b = (I - P_X) Y\) is the same for any solution \(b = (X^T X)^- X^T Y\) to the normal equations.

Residuals:

\[ e_i = Y_i - \hat Y_i, \qquad i = 1, \ldots, n. \]

The vector of residuals is

\[ e = Y - \hat Y = Y - X b = Y - P_X Y = (I - P_X) Y. \]

Comment: \(I - P_X\) is a projection matrix that projects \(Y\) onto the space orthogonal to the space spanned by the columns of \(X\).

Geometrically, \(Y \in R^n\), and \(\hat Y = P_X Y\) lies in the vector space spanned by the columns of \(X\); the dimension of this space is \(\operatorname{rank}(X)\). The residual vector \(e = Y - \hat Y = (I - P_X) Y\) lies in the space orthogonal to the space spanned by the columns of \(X\); it has dimension \(n - \operatorname{rank}(X)\).
Partition of a total sum of squares (ANOVA)

The squared length of \(Y\) is

\[ \sum_{i=1}^{n} Y_i^2 = Y^T Y. \]

The squared length of \(\hat Y = P_X Y\) is

\[ \sum_{i=1}^{n} \hat Y_i^2 = \hat Y^T \hat Y = (P_X Y)^T (P_X Y) = Y^T P_X^T P_X Y = Y^T P_X P_X Y = Y^T P_X Y, \]

since \(P_X\) is symmetric and idempotent. The squared length of the residual vector is

\[ \sum_{i=1}^{n} e_i^2 = e^T e = [(I - P_X) Y]^T (I - P_X) Y = Y^T (I - P_X) Y. \]

We have

\[ Y^T Y = Y^T (P_X + I - P_X) Y = Y^T P_X Y + Y^T (I - P_X) Y. \]

ANOVA

  Source of Variation     Degrees of Freedom               Sums of Squares
  model (uncorrected)     \(\operatorname{rank}(X)\)       \(\hat Y^T \hat Y = Y^T P_X Y\)
  residuals               \(n - \operatorname{rank}(X)\)   \(e^T e = Y^T (I - P_X) Y\)
  total (uncorrected)     \(n\)                            \(Y^T Y = \sum_{i=1}^{n} Y_i^2\)
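A one-line numerical confirmation of the partition above, using the Example 3.1 model matrix and the projection P_X (a numpy sketch).

```python
import numpy as np

X = np.array([[1, 160, 1], [1, 165, 3], [1, 165, 2], [1, 170, 1], [1, 175, 2]], dtype=float)
y = np.array([77., 82., 84., 89., 94.])

P = X @ np.linalg.pinv(X.T @ X) @ X.T
print(np.allclose(y @ y, y @ P @ y + y @ (np.eye(5) - P) @ y))   # Y'Y = Y'P_X Y + Y'(I - P_X)Y
```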
Result 3.6: For the linear model with \(E(Y) = X\beta\) and \(Var(Y) = \Sigma\), the OLS estimator \(\hat Y = X b = P_X Y\) for \(X\beta\)

(i) is unbiased, i.e., \(E(\hat Y) = X\beta\);
(ii) is a linear function of \(Y\);
(iii) has variance-covariance matrix \(Var(\hat Y) = P_X \Sigma P_X\).

This is true for any solution \(b = (X^T X)^- X^T Y\) to the normal equations.

Proof:
(ii) is trivial, since \(\hat Y = P_X Y\).
(iii) follows from Result 2.1(ii).
(i) \(E(\hat Y) = E(P_X Y) = P_X E(Y)\) (from Result 2.1(i)) \(= P_X X\beta = X\beta\), since \(P_X X = X\).
Comments:

\(\hat Y = X b = P_X Y\) is said to be a linear unbiased estimator for \(E(Y) = X\beta\).

For the Gauss-Markov model, \(Var(Y) = \sigma^2 I\) and

\[ Var(\hat Y) = P_X (\sigma^2 I) P_X = \sigma^2 P_X P_X = \sigma^2 P_X = \sigma^2 X (X^T X)^- X^T. \]

The matrix \(P_X\) is sometimes called the "hat" matrix.

Questions

  - Is \(\hat Y = X b = P_X Y\) the "best" estimator for \(E(Y) = X\beta\)?
  - Is \(\hat Y = X b = P_X Y\) the "best" estimator for \(E(Y) = X\beta\) in the class of linear, unbiased estimators?
  - What other linear functions of \(\beta\), say \(c^T \beta = c_1 \beta_1 + c_2 \beta_2 + \cdots + c_k \beta_k\), have OLS estimators that are invariant to the choice of \(b = (X^T X)^- X^T Y\) that solves the normal equations?
Estimable Functions

Some estimates of linear functions of the parameters have the same value regardless of which solution to the normal equations is used. These are called estimable functions. An example is \(E(Y) = X\beta\). Check that \(X b\) has the same value for each solution to the normal equations obtained in Example 3.2, i.e.,

\[ X b = \begin{bmatrix} \bar Y_{1\cdot} \\ \bar Y_{1\cdot} \\ \bar Y_{2\cdot} \\ \bar Y_{3\cdot} \\ \bar Y_{3\cdot} \\ \bar Y_{3\cdot} \end{bmatrix}. \]

Defn 3.6: For a linear model with \(E(Y) = X\beta\) and \(Var(Y) = \Sigma\), we will say that

\[ c^T \beta = c_1 \beta_1 + c_2 \beta_2 + \cdots + c_k \beta_k \]

is estimable if there exists a linear unbiased estimator \(a^T Y\) for \(c^T \beta\), i.e., for some non-random vector \(a\), we have \(E(a^T Y) = c^T \beta\).
Example 3.2. Blood coagulation times (in seconds) for blood samples from six different rats. Each rat was fed one of three diets.

  Diet 1       Diet 2       Diet 3
  Y11 = 62     Y21 = 71     Y31 = 72
  Y12 = 60                  Y32 = 68
                            Y33 = 67

The "effects" model \(Y_{ij} = \mu + \alpha_i + \epsilon_{ij}\) can be written as

\[
\begin{bmatrix} Y_{11} \\ Y_{12} \\ Y_{21} \\ Y_{31} \\ Y_{32} \\ Y_{33} \end{bmatrix}
=
\begin{bmatrix}
1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} \mu \\ \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{bmatrix}
+
\begin{bmatrix} \epsilon_{11} \\ \epsilon_{12} \\ \epsilon_{21} \\ \epsilon_{31} \\ \epsilon_{32} \\ \epsilon_{33} \end{bmatrix}
\]
Examples of estimable functions

\(\mu + \alpha_1\): Choose \(a^T = (\tfrac{1}{2}, \tfrac{1}{2}, 0, 0, 0, 0)\). Then
\[ E(a^T Y) = E(\tfrac{1}{2} Y_{11} + \tfrac{1}{2} Y_{12}) = \tfrac{1}{2} E(Y_{11}) + \tfrac{1}{2} E(Y_{12}) = \tfrac{1}{2}(\mu + \alpha_1) + \tfrac{1}{2}(\mu + \alpha_1) = \mu + \alpha_1. \]
Alternatively, choose \(a^T = (1, 0, 0, 0, 0, 0)\) and note that \(E(a^T Y) = E(Y_{11}) = \mu + \alpha_1\).

\(\mu + \alpha_2\): Choose \(a^T = (0, 0, 1, 0, 0, 0)\). Then \(a^T Y = Y_{21}\) and \(E(a^T Y) = E(Y_{21}) = \mu + \alpha_2\).

\(\mu + \alpha_3\): Choose \(a^T = (0, 0, 0, 1, 0, 0)\). Then \(E(a^T Y) = E(Y_{31}) = \mu + \alpha_3\).

\(\alpha_1 - \alpha_2\): Note that
\[ \alpha_1 - \alpha_2 = (\mu + \alpha_1) - (\mu + \alpha_2) = E(Y_{11}) - E(Y_{21}) = E(Y_{11} - Y_{21}) = E(a^T Y), \]
where \(a^T = (1, 0, -1, 0, 0, 0)\).

\(2\mu + 3\alpha_1 - \alpha_2\): Note that
\[ 2\mu + 3\alpha_1 - \alpha_2 = 3(\mu + \alpha_1) - (\mu + \alpha_2) = 3E(Y_{11}) - E(Y_{21}) = E(3Y_{11} - Y_{21}) = E(a^T Y), \]
where \(a^T = (3, 0, -1, 0, 0, 0)\).

Quantities that are not estimable include \(\mu,\ \alpha_1,\ \alpha_2,\ \alpha_3,\ 3\alpha_1,\ \alpha_1 + \alpha_2\).

To show that a linear function of the parameters, \(c_0\mu + c_1\alpha_1 + c_2\alpha_2 + c_3\alpha_3\), is not estimable, one must show that there is no non-random vector \(a = (a_1, a_2, a_3, a_4, a_5, a_6)^T\) for which \(E(a^T Y) = c_0\mu + c_1\alpha_1 + c_2\alpha_2 + c_3\alpha_3\).

For \(\alpha_1\) to be estimable we would need to find an \(a\) that satisfies
\[
\alpha_1 = E(a^T Y) = a_1 E(Y_{11}) + a_2 E(Y_{12}) + a_3 E(Y_{21}) + a_4 E(Y_{31}) + a_5 E(Y_{32}) + a_6 E(Y_{33})
= (a_1 + a_2)(\mu + \alpha_1) + a_3(\mu + \alpha_2) + (a_4 + a_5 + a_6)(\mu + \alpha_3)
\]
for all values of \(\mu, \alpha_1, \alpha_2, \alpha_3\). Matching the coefficients of \(\alpha_2\) and \(\alpha_3\) implies \(0 = a_3 = a_4 + a_5 + a_6\). Then \(\alpha_1 = (a_1 + a_2)(\mu + \alpha_1)\) would have to hold for every \(\mu\) and \(\alpha_1\), which is impossible.
Example 3.1. Yield of a chemical process

\[
\begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \\ Y_4 \\ Y_5 \end{bmatrix}
=
\begin{bmatrix}
1 & 160 & 1 \\ 1 & 165 & 3 \\ 1 & 165 & 2 \\ 1 & 170 & 1 \\ 1 & 175 & 2
\end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix}
+
\begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \\ \epsilon_4 \\ \epsilon_5 \end{bmatrix}
\]

Since \(X\) has full column rank, each element of \(\beta\) is estimable.
Consider \(\beta_1 = c^T \beta\), where \(c = (0, 1, 0)^T\). Since \(X\) has full column rank, the unique least squares estimator for \(\beta\) is

\[ b = (X^T X)^{-1} X^T Y \]

and an unbiased linear estimator for \(c^T \beta\) is

\[ c^T b = c^T (X^T X)^{-1} X^T Y = a^T Y, \qquad \text{with } a^T = c^T (X^T X)^{-1} X^T. \]

Result 3.7: For a linear model with \(E(Y) = X\beta\) and \(Var(Y) = \Sigma\):

(i) The expectation of any observation is estimable.
(ii) A linear combination of estimable functions is estimable.
(iii) Each element of \(\beta\) is estimable if and only if \(\operatorname{rank}(X) = k\) = number of columns of \(X\).
(iv) Every \(c^T \beta\) is estimable if and only if \(\operatorname{rank}(X) = k\) = number of columns of \(X\).
196
Proof:
Y1
(i) For Y = .. with E (Y) = X
Yn
we have
2
3
6
6
6
6
6
6
4
7
7
7
7
7
7
5
0
0.
.
one in
the ith
Yi = aTi Y where ai = 1
position
0.
.
0
Then
E (Yi) = E (aTi Y) = aTi E (Y)
= aTi X
= cTi 2
3
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
4
7
7
7
7
7
7
7
7
7
7
7
7
7
7
7
7
7
7
7
7
7
7
5
198
(ii) Suppose cTi is estimable. Then, there
is an ai such that E (aTi Y) = cTi .
Now consider a linear combination of
estimable functions
w1cT1 + w2cT2 + + wpcTp Let a = w1a1 + w2a2 + + wpap.
Then,
E (aT Y) = E (w1aT1 Y + + wpaTp Y)
= w1E (aT1 Y) + + wpE (aTp Y)
= w1cT1 + + wpcTp (iii) Previous argument.
(iv) Follows from (ii) and (iii).
199
Result 3.8: For a linear model with \(E(Y) = X\beta\) and \(Var(Y) = \Sigma\), each of the following is true if and only if \(c^T \beta\) is estimable.

(i) \(c^T = a^T X\) for some \(a\), i.e., \(c\) is in the space spanned by the rows of \(X\).
(ii) \(c^T a = 0\) for every \(a\) for which \(X a = 0\).
(iii) \(c^T b\) is the same for any solution to the normal equations \((X^T X) b = X^T Y\), i.e., there is a unique least squares estimator for \(c^T \beta\).

Use Result 3.8(ii) to show that \(\mu\) is not estimable in Example 3.2. In that case

\[
E(Y) = X\beta = \begin{bmatrix}
1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} \mu \\ \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{bmatrix}
\]

and \(\mu = c^T \beta\) with \(c^T = [1\ 0\ 0\ 0]\). Let \(d^T = [-1\ 1\ 1\ 1]\); then

\[ X d = 0, \quad \text{but} \quad c^T d = -1 \neq 0. \]

Hence, \(\mu\) is not estimable.

Part (ii) of Result 3.8 sometimes provides a convenient way to identify all possible estimable functions of \(\beta\). In Example 3.2, \(X d = 0\) if and only if

\[ d = W \begin{bmatrix} -1 \\ 1 \\ 1 \\ 1 \end{bmatrix} \quad \text{for some scalar } W. \]

Then \(c^T \beta = c_1\mu + c_2\alpha_1 + c_3\alpha_2 + c_4\alpha_3\) is estimable if and only if

\[ 0 = c^T d = W(-c_1 + c_2 + c_3 + c_4) \iff c_1 = c_2 + c_3 + c_4. \]

Then

\[ (c_2 + c_3 + c_4)\mu + c_2\alpha_1 + c_3\alpha_2 + c_4\alpha_3 \]

is estimable for any \((c_2, c_3, c_4)\), and these are the only estimable functions of \(\mu, \alpha_1, \alpha_2, \alpha_3\). Some estimable functions are

\[ \mu + \tfrac{1}{3}(\alpha_1 + \alpha_2 + \alpha_3) \qquad (c_2 = c_3 = c_4 = \tfrac{1}{3}) \]

and

\[ \mu + \alpha_2 \qquad (c_3 = 1,\ c_2 = c_4 = 0), \]

but \(\mu + 2\alpha_2\) is not estimable.
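The estimability conditions above are easy to check numerically. This numpy sketch tests whether c lies in the row space of the Example 3.2 effects-model X (Result 3.8(i)), and confirms that c'b is identical for two different solutions to the normal equations exactly when c'beta is estimable.

```python
import numpy as np

y = np.array([62., 60., 71., 72., 68., 67.])
X = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [1, 0, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [1, 0, 0, 1]], dtype=float)
XtX, Xty = X.T @ X, X.T @ y

def estimable(c):
    # c'beta is estimable iff c is in the row space of X (Result 3.8(i))
    a, *_ = np.linalg.lstsq(X.T, c, rcond=None)
    return np.allclose(X.T @ a, c)

# Two different solutions to the normal equations
b1 = np.linalg.pinv(XtX) @ Xty
G2 = np.zeros((4, 4)); G2[1:, 1:] = np.diag([1/2, 1.0, 1/3])
b2 = G2 @ Xty

for c in (np.array([1., 1., 0., 0.]),    # mu + alpha1: estimable
          np.array([0., 1., -1., 0.]),   # alpha1 - alpha2: estimable
          np.array([1., 0., 0., 0.])):   # mu alone: not estimable
    print(estimable(c), np.isclose(c @ b1, c @ b2))
```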
Defn 3.7: For a linear model with \(E(Y) = X\beta\) and \(Var(Y) = \Sigma\), where \(X\) is an \(n \times k\) matrix, \(C_{r \times k}\,\beta_{k \times 1}\) is said to be estimable if all of its elements

\[ C\beta = \begin{bmatrix} c_1^T \\ c_2^T \\ \vdots \\ c_r^T \end{bmatrix}\beta = \begin{bmatrix} c_1^T\beta \\ c_2^T\beta \\ \vdots \\ c_r^T\beta \end{bmatrix} \]

are estimable.

Result 3.9: For the linear model with \(E(Y) = X\beta\) and \(Var(Y) = \Sigma\), where \(X\) is an \(n \times k\) matrix, each of the following conditions holds if and only if \(C\beta\) is estimable.

(i) \(AX = C\) for some matrix \(A\), i.e., each row of \(C\) is in the space spanned by the rows of \(X\).
(ii) \(C d = 0\) for any \(d\) for which \(X d = 0\).
(iii) \(C b\) is the same for any solution to the normal equations \((X^T X) b = X^T Y\).

Summary

For a linear model \(Y = X\beta + \epsilon\) with \(E(Y) = X\beta\) and \(Var(Y) = \Sigma\), we have:

  - Any estimable function has a unique interpretation.
  - The OLS estimator for an estimable function \(C\beta\) is unique: \(C b = C (X^T X)^- X^T Y\).
  - The OLS estimator for an estimable function \(C\beta\) is
      - a linear estimator
      - an unbiased estimator
In the class of linear unbiased estimators for \(c^T \beta\), is the OLS estimator the "best"? Here "best" means smallest expected squared error. Let \(t(Y)\) denote an estimator for \(c^T \beta\). Then the expected squared error is

\[
\begin{aligned}
MSE &= E[t(Y) - c^T\beta]^2 \\
    &= E[t(Y) - E(t(Y)) + E(t(Y)) - c^T\beta]^2 \\
    &= E[t(Y) - E(t(Y))]^2 + [E(t(Y)) - c^T\beta]^2 + 2[E(t(Y)) - c^T\beta]\,E[t(Y) - E(t(Y))] \\
    &= E[t(Y) - E(t(Y))]^2 + [E(t(Y)) - c^T\beta]^2 \\
    &= Var(t(Y)) + [\text{bias}]^2.
\end{aligned}
\]
We are restricting our attention to linear, unbiased estimators for \(c^T \beta\): \(t(Y) = a^T Y\) for some \(a\), with \(E(t(Y)) = c^T \beta\). Then \(t(Y) = a^T Y\) is the best linear unbiased estimator (b.l.u.e.) for \(c^T \beta\) if

\[ Var(a^T Y) \le Var(d^T Y) \]

for any other linear unbiased estimator \(d^T Y\) of \(c^T\beta\) and any value of \(\beta\).
Result 3.10 (Gauss-Markov Theorem): For the Gauss-Markov model, \(E(Y) = X\beta\) and \(Var(Y) = \sigma^2 I\), the OLS estimator of an estimable function \(c^T \beta\) is the unique best linear unbiased estimator (b.l.u.e.) of \(c^T \beta\).

Proof:

(i) For any solution \(b = (X^T X)^- X^T Y\) to the normal equations, the OLS estimator for \(c^T \beta\) is \(c^T b = c^T (X^T X)^- X^T Y\), which is a linear function of \(Y\).

(ii) From Result 3.8(i), there exists a vector \(a\) such that \(c^T = a^T X\). Then

\[
E(c^T b) = E(c^T (X^T X)^- X^T Y) = c^T (X^T X)^- X^T E(Y) = c^T (X^T X)^- X^T X \beta
         = a^T X (X^T X)^- X^T X \beta = a^T X \beta = c^T \beta,
\]

since \(X (X^T X)^- X^T = P_X\) is the projection onto the column space of \(X\) and \(P_X X = X\). Hence \(c^T b\) is an unbiased estimator.

(iii) Minimum variance in the class of linear unbiased estimators. Suppose \(d^T Y\) is any other linear unbiased estimator for \(c^T \beta\). Then \(E(d^T Y) = d^T E(Y) = d^T X \beta = c^T \beta\) for every \(\beta\). Hence \(d^T X = c^T\) and \(c = X^T d\). We must show that \(Var(c^T b) \le Var(d^T Y)\). First, note that

\[ Var(d^T Y) = Var(c^T b + [d^T Y - c^T b]) = Var(c^T b) + Var(d^T Y - c^T b) + 2\,Cov(c^T b,\ d^T Y - c^T b). \]

Then \(Var(d^T Y) \ge Var(c^T b)\) because \(Cov(c^T b,\ d^T Y - c^T b) = 0\). To show this, first note that \(c^T b = c^T (X^T X)^- X^T Y\) is invariant with respect to the choice of \((X^T X)^-\). Consequently, we can use the Moore-Penrose generalized inverse, which is symmetric. (Not every generalized inverse of \(X^T X\) is symmetric.) Then

\[
\begin{aligned}
Cov(c^T b,\ d^T Y - c^T b)
  &= Cov\big(c^T (X^T X)^- X^T Y,\ [d^T - c^T (X^T X)^- X^T]\,Y\big) \\
  &= c^T (X^T X)^- X^T\, Var(Y)\, [d - X (X^T X)^- c] \qquad \text{(this is where the symmetry of } (X^T X)^- \text{ is needed)} \\
  &= c^T (X^T X)^- X^T (\sigma^2 I)\, [d - X (X^T X)^- c] \\
  &= \sigma^2\, [\,c^T (X^T X)^- X^T d - c^T (X^T X)^- X^T X (X^T X)^- c\,] \\
  &= \sigma^2\, [\,c^T (X^T X)^- c - c^T (X^T X)^- c\,] \qquad \text{(since } X^T d = c\text{)} \\
  &= 0.
\end{aligned}
\]

Since \(c^T b\) is invariant to the choice of \(b\) (Result 3.8(iii)), we were able to use the Moore-Penrose inverse for \((X^T X)^-\), which satisfies \((X^T X)^- (X^T X) (X^T X)^- = (X^T X)^-\) by definition. Consequently, \(Var(d^T Y) \ge Var(c^T b)\), and \(c^T b\) is b.l.u.e.

(iv) To show that the OLS estimator is the unique b.l.u.e., note that

\[ Var(d^T Y) = Var(c^T b + [d^T Y - c^T b]) = Var(c^T b) + Var(d^T Y - c^T b), \]

because \(Cov(c^T b,\ d^T Y - c^T b) = 0\). Then \(d^T Y\) is b.l.u.e. if and only if \(Var(d^T Y - c^T b) = 0\). This is equivalent to \(d^T Y - c^T b = \text{constant}\). Since both estimators are unbiased, \(E(d^T Y - c^T b) = E(d^T Y) - E(c^T b) = 0\). Consequently, \(d^T Y - c^T b = 0\) for all \(Y\), and \(c^T b\) is the unique b.l.u.e.
Parts (i) and (ii) of the proof of Result 3.10 do not require \(Var(Y) = \sigma^2 I\). Consequently, the OLS estimator for \(c^T \beta\),

\[ c^T b = c^T (X^T X)^- X^T Y, \]

is a linear unbiased estimator even when the Gauss-Markov variance assumption fails.

What if you have a linear model that is not a Gauss-Markov model,

\[ E(Y) = X\beta, \qquad Var(Y) = \Sigma \neq \sigma^2 I\ ? \]

Result 3.8 does not require \(Var(Y) = \sigma^2 I\), and the OLS estimator for any estimable quantity, \(c^T b = c^T (X^T X)^- X^T Y\), is invariant to the choice of \((X^T X)^-\). However, the OLS estimator \(c^T b\) may not be b.l.u.e.; there may be other linear unbiased estimators with smaller variance. Note that

\[ Var(c^T b) = Var(c^T (X^T X)^- X^T Y) = c^T (X^T X)^- X^T\, \Sigma\, X\, [(X^T X)^-]^T c. \]

For the Gauss-Markov model, \(Var(Y) = \Sigma = \sigma^2 I\) and

\[ Var(c^T b) = \sigma^2\, c^T (X^T X)^- X^T X\, [(X^T X)^-]^T c = \sigma^2\, c^T (X^T X)^- c. \]
Generalized Least Squares (GLS) Estimation

Defn 3.8: For a linear model with \(E(Y) = X\beta\) and \(Var(Y) = \Sigma\), where \(\Sigma\) is positive definite, a generalized least squares estimator for \(\beta\) minimizes

\[ (Y - X b_{GLS})^T\, \Sigma^{-1}\, (Y - X b_{GLS}). \]

Strategy: Transform \(Y\) to a random vector \(Z\) for which the Gauss-Markov model applies. The spectral decomposition of \(\Sigma\) yields

\[ \Sigma = \sum_{j=1}^{n} \lambda_j\, u_j u_j^T. \]

Define

\[ \Sigma^{-1/2} = \sum_{j=1}^{n} \frac{1}{\sqrt{\lambda_j}}\, u_j u_j^T \]

and create the random vector \(Z = \Sigma^{-1/2} Y\). Then

\[ Var(Z) = Var(\Sigma^{-1/2} Y) = \Sigma^{-1/2}\, \Sigma\, \Sigma^{-1/2} = I \]

and

\[ E(Z) = E(\Sigma^{-1/2} Y) = \Sigma^{-1/2} E(Y) = \Sigma^{-1/2} X \beta = W \beta, \]

so we have a Gauss-Markov model for \(Z\), where \(W = \Sigma^{-1/2} X\) is the model matrix. Note that

\[ (Z - W b)^T (Z - W b) = (\Sigma^{-1/2} Y - \Sigma^{-1/2} X b)^T (\Sigma^{-1/2} Y - \Sigma^{-1/2} X b) = (Y - X b)^T \Sigma^{-1/2} \Sigma^{-1/2} (Y - X b) = (Y - X b)^T \Sigma^{-1} (Y - X b). \]

Hence, any GLS estimator for the \(Y\) model is an OLS estimator for the \(Z\) model. It must be a solution to the normal equations for the \(Z\) model:

\[ W^T W b = W^T Z \iff (X^T \Sigma^{-1/2} \Sigma^{-1/2} X)\, b = X^T \Sigma^{-1/2} \Sigma^{-1/2} Y \iff (X^T \Sigma^{-1} X)\, b = X^T \Sigma^{-1} Y. \]

These are the generalized least squares estimating equations. Any solution

\[ b_{GLS} = (W^T W)^- W^T Z = (X^T \Sigma^{-1} X)^- X^T \Sigma^{-1} Y \]

is called a generalized least squares (GLS) estimator for \(\beta\).
Result 3.11: For the linear model with \(E(Y) = X\beta\) and \(Var(Y) = \Sigma\), the GLS estimator of an estimable function \(c^T \beta\),

\[ c^T b_{GLS} = c^T (X^T \Sigma^{-1} X)^- X^T \Sigma^{-1} Y, \]

is the unique b.l.u.e. of \(c^T \beta\).

Proof: Since \(c^T \beta\) is estimable, there is an \(a\) such that

\[ c^T \beta = E(a^T Y) = E(a^T \Sigma^{1/2} \Sigma^{-1/2} Y) = E(a^T \Sigma^{1/2} Z). \]

Consequently, \(c^T \beta\) is estimable for the \(Z\) model. Apply the Gauss-Markov theorem (Result 3.10) to the \(Z\) model.

Comments

  - For the linear model with \(E(Y) = X\beta\) and \(Var(Y) = \Sigma\), both the OLS and GLS estimators for an estimable function \(c^T \beta\) are linear unbiased estimators, with
    \[ Var(c^T b_{OLS}) = c^T (X^T X)^- X^T\, \Sigma\, X\, [(X^T X)^-]^T c, \]
    \[ Var(c^T b_{GLS}) = c^T (X^T \Sigma^{-1} X)^- X^T \Sigma^{-1} X\, (X^T \Sigma^{-1} X)^- c. \]
    \(c^T b_{OLS}\) is not a "bad" estimator, but \(Var(c^T b_{OLS}) \ge Var(c^T b_{GLS})\) because \(c^T b_{GLS}\) is the unique b.l.u.e. for \(c^T \beta\).
  - For the Gauss-Markov model, \(c^T b_{GLS} = c^T b_{OLS}\).
  - The results for \(b_{GLS}\) and \(c^T b_{GLS}\) assume that \(Var(Y) = \Sigma\) is known.
  - The same results, including Result 3.12, hold for the Aitken model, where \(E(Y) = X\beta\) and \(Var(Y) = \sigma^2 V\) for some known matrix \(V\).
In practice \(Var(Y) = \Sigma\) is usually unknown. Then an approximation to

\[ b_{GLS} = (X^T \Sigma^{-1} X)^- X^T \Sigma^{-1} Y \]

is obtained by substituting a consistent estimator \(\hat\Sigma\) for \(\Sigma\):

  - use method of moments or maximum likelihood estimation to obtain \(\hat\Sigma\)
  - the resulting estimator
      - is not a linear estimator
      - is consistent, but not necessarily unbiased
      - does not provide a b.l.u.e. for estimable functions
      - may have larger mean squared error than the OLS estimator

To create confidence intervals or test hypotheses about estimable functions for a linear model with \(E(Y) = X\beta\) and \(Var(Y) = \Sigma\), we must

(i) specify a probability distribution for \(Y\) so we can derive a distribution for
\[ c^T b = c^T (X^T \Sigma^{-1} X)^- X^T \Sigma^{-1} Y; \]
(ii) estimate \(\sigma^2\) when \(Var(Y) = \sigma^2 I\) or \(Var(Y) = \sigma^2 V\) for some known \(V\);
(iii) estimate \(\Sigma\) when \(Var(Y) = \Sigma\).
The "effects" linear model and the "means" linear model are equivalent in the sense that the space of possible mean vectors \(E(Y) = X\beta\) is the same for the two models:

  - the model matrices differ
  - the parameter vectors differ
  - the columns of the model matrices span the same vector space

\[ E(Y) = X\beta = [X_1 | X_2 | \cdots | X_k] \begin{bmatrix} \beta_1 \\ \vdots \\ \beta_k \end{bmatrix} = \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k. \]