The Gauss-Markov Model
Copyright © 2012 Dan Nettleton (Iowa State University), Statistics 611
Recall that

Cov(u, v) = E((u − E(u))(v − E(v)))
          = E(uv) − E(u)E(v)

Var(u) = Cov(u, u)
       = E((u − E(u))²)
       = E(u²) − (E(u))².
If u (m × 1) and v (n × 1) are random vectors, then

Cov(u, v) = [Cov(ui, vj)]m×n   and   Var(u) = [Cov(ui, uj)]m×m.
It follows that

Cov(u, v) = E((u − E(u))(v − E(v))′)
          = E(uv′) − E(u)E(v′)

Var(u) = E((u − E(u))(u − E(u))′)
       = E(uu′) − E(u)E(u′).
(Note that if Z = [zij ], then E(Z) ≡ [E(zij )]; i.e., the expected value of a
matrix is defined to be the matrix of the expected values of the
elements in the original matrix.)
From these basic definitions, it is straightforward to show the following:

If a, b, A, B are fixed and u, v are random, then

Cov(a + Au, b + Bv) = A Cov(u, v) B′.
Some common special cases are

Cov(Au, Bv) = A Cov(u, v) B′

Cov(Au, Bu) = A Cov(u, u) B′ = A Var(u) B′

Var(Au) = Cov(Au, Au)
        = A Cov(u, u) A′
        = A Var(u) A′.
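These identities are easy to check numerically. The following sketch (plain NumPy; the matrices A and L are arbitrary choices, not from the notes) simulates a random vector u with known Var(u) and compares the sample covariance of Au with A Var(u) A′:

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary fixed A, and u = L z with z ~ N(0, I), so Var(u) = L L'.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])
L = np.array([[2.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.5, 0.0, 1.0]])
V = L @ L.T                      # the known Var(u)

z = rng.standard_normal((3, 100_000))
u = L @ z                        # each column is one draw of u

emp = np.cov(A @ u)              # empirical Var(Au): rows are variables
thy = A @ V @ A.T                # the identity Var(Au) = A Var(u) A'
print(np.round(emp, 2))
print(np.round(thy, 2))          # the two should agree closely
```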
The Gauss-Markov Model
y = Xβ + ε,
where
E(ε) = 0 and Var(ε) = σ²I for some unknown σ².
“Mean zero, constant variance, uncorrelated errors.”
A more succinct statement of the Gauss-Markov Model:

E(y) = Xβ,  Var(y) = σ²I

or

E(y) ∈ C(X),  Var(y) = σ²I.
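For concreteness, a Gauss-Markov data set can be simulated directly from this specification (a minimal sketch; the design matrix, β, and σ are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

n, sigma = 10, 0.5
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])  # any fixed X
beta = np.array([2.0, 0.3])                                   # any fixed beta

eps = sigma * rng.standard_normal(n)   # E(eps) = 0, Var(eps) = sigma^2 I
y = X @ beta + eps                     # so E(y) = X beta, Var(y) = sigma^2 I
print(y)
```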
Note that the Gauss-Markov Model (GMM) is a special case of the
GLM that we have been studying.
Hence, all previous results for the GLM apply to the GMM.
Suppose c′β is an estimable function.

Find Var(c′β̂), the variance of the LSE of c′β.
Writing c′ = a′X for some a (possible because c′β is estimable),

Var(c′β̂) = Var(c′(X′X)⁻X′y)
          = Var(a′X(X′X)⁻X′y)
          = Var(a′PX y)
          = a′PX Var(y)P′X a
          = a′PX (σ²I)PX a
          = σ²a′PX PX a
          = σ²a′PX a
          = σ²a′X(X′X)⁻X′a
          = σ²c′(X′X)⁻c.
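The formula Var(c′β̂) = σ²c′(X′X)⁻c can be checked by simulation (a sketch; the one-way design, β, and c are invented, and the Moore-Penrose inverse serves as the generalized inverse):

```python
import numpy as np

rng = np.random.default_rng(2)

# One-way model, 2 groups of 3; c'beta = tau1 - tau2 is estimable.
X = np.array([[1,1,0],[1,1,0],[1,1,0],[1,0,1],[1,0,1],[1,0,1]], float)
beta = np.array([1.0, 2.0, -1.0])
c = np.array([0.0, 1.0, -1.0])
sigma = 1.0

G = np.linalg.pinv(X.T @ X)           # a GI of X'X
print(sigma**2 * c @ G @ c)           # theoretical Var(c'beta_hat) = 2/3

reps = 20_000
ests = np.empty(reps)
for r in range(reps):
    y = X @ beta + sigma * rng.standard_normal(6)
    ests[r] = c @ G @ X.T @ y         # the LSE of c'beta
print(ests.var())                     # should approximate 2/3
```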
Example: Two-treatment ANCOVA
yij = µ + τi + γxij + εij,   i = 1, 2; j = 1, . . . , m,

where E(εij) = 0 ∀ i = 1, 2; j = 1, . . . , m, and

Cov(εij, εst) = 0 if (i, j) ≠ (s, t),   Cov(εij, εst) = σ² if (i, j) = (s, t).

xij is the known value of a covariate for treatment i and observation j (i = 1, 2; j = 1, . . . , m).
Under what conditions is γ estimable?
γ is estimable ⇐⇒ · · ·?
γ is estimable iff x11, . . . , x1m are not all equal, or x21, . . . , x2m are not all equal, i.e.,

x = (x11, . . . , x1m, x21, . . . , x2m)′ ∉ C([1 0; 0 1]),

where 1 and 0 denote the m × 1 vectors of ones and zeros, so that [1 0; 0 1] is a 2m × 2 matrix written in block rows.
This makes sense intuitively, but to prove the claim formally...
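The estimability criterion can be tested numerically: c′β is estimable iff c′ lies in the row space of X, i.e., iff appending c as a row of X does not increase the rank. The helper names below (is_estimable, ancova_X) are invented for this sketch:

```python
import numpy as np

def is_estimable(X, c, tol=1e-8):
    """c'beta is estimable iff c' lies in the row space of X."""
    return np.linalg.matrix_rank(np.vstack([X, c]), tol=tol) == \
           np.linalg.matrix_rank(X, tol=tol)

m = 4
one, zero = np.ones(m), np.zeros(m)

def ancova_X(x1, x2):
    # Columns: mu, tau1, tau2, gamma -- the X of the two-treatment ANCOVA.
    return np.column_stack([np.ones(2*m),
                            np.r_[one, zero], np.r_[zero, one],
                            np.r_[x1, x2]])

c = np.array([0.0, 0.0, 0.0, 1.0])               # picks off gamma

x1 = np.array([1.0, 2.0, 3.0, 4.0])              # varies within treatment 1
print(is_estimable(ancova_X(x1, zero + 5), c))   # True

x1_const = np.full(m, 2.0)                       # constant within both groups
print(is_estimable(ancova_X(x1_const, zero + 5), c))  # False
```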

"
X=
1 1 0 x1
#
1 0 1 x2
,
where

xi1
 . 
. 
xi = 
 . 
xim
i = 1, 2.
 
µ
 
τ 
 1
β =  .
τ2 
 
γ
If x = (x1′, x2′)′ ∈ C([1 0; 0 1]), then

X = [ 1  1  0  a1 ]
    [ 1  0  1  b1 ]

for some a, b ∈ R.
z ∈ N(X) ⇐⇒ z1 + z2 + az4 = 0 and z1 + z3 + bz4 = 0

⇒ (0, −a, −b, 1)′ ∈ N(X).

Now note that

[ 0  0  0  1 ] (0, −a, −b, 1)′ = 1 ≠ 0.
It follows that

[ 0  0  0  1 ] (µ, τ1, τ2, γ)′ = γ

is not estimable, because c′β is estimable only if c is orthogonal to N(X).
This proves that if γ is estimable, then x ∉ C([1 0; 0 1]).

Conversely, if x ∉ C([1 0; 0 1]), then ∃ xij ≠ xij′ for some i and j ≠ j′.
Taking the difference of the corresponding rows of X yields

[ 0  0  0  xij − xij′ ].

Dividing by xij − xij′ yields

[ 0  0  0  1 ].
Thus,

(0, 0, 0, 1)′ ∈ C(X′),

and

[ 0  0  0  1 ] (µ, τ1, τ2, γ)′ = γ is estimable.
"
To find the LSE of γ, recall X =
1 1 0 x1
1 0 1 x2
#
. Thus


2m m m x··


m m 0 x 
1· 

0
XX=
.
 m 0 m x2· 


x·· x1· x2· x0 x
When x ∉ C([1 0; 0 1]), rank(X) = rank(X′X) = 3. Thus, a GI of X′X is

[ 0   0′  ]
[ 0   A⁻¹ ] ,

where the 0′ is 1 × 3, the 0 is 3 × 1, and

A = [ m    0    x1· ]
    [ 0    m    x2· ]
    [ x1·  x2·  x′x ] .


Because

[ m    0    x1· ]
[ 0    m    x2· ]
[ x1·  x2·  x′x ]

is not so easy to invert, let's consider a different strategy.
To find the LSE of γ and its variance in an alternative way, consider

W = [ 1  0  x1 − x̄1·1 ]
    [ 0  1  x2 − x̄2·1 ] .

This matrix arises by dropping the first column of X and applying Gram-Schmidt orthogonalization to the remaining columns.
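A quick check (a sketch with randomly generated covariates; m = 5 is arbitrary) confirms that the columns of W are mutually orthogonal, so W′W is diagonal, and that C(W) = C(X):

```python
import numpy as np

rng = np.random.default_rng(3)

m = 5
x1, x2 = rng.normal(size=m), rng.normal(size=m)
one, zero = np.ones(m), np.zeros(m)

X = np.column_stack([np.ones(2*m),
                     np.r_[one, zero], np.r_[zero, one],
                     np.r_[x1, x2]])
W = np.column_stack([np.r_[one, zero], np.r_[zero, one],
                     np.r_[x1 - x1.mean(), x2 - x2.mean()]])

# W'W is diagonal because the columns of W are mutually orthogonal.
print(np.round(W.T @ W, 8))

# C(W) = C(X): stacking the two side by side does not increase the rank.
print(np.linalg.matrix_rank(np.hstack([X, W])) == np.linalg.matrix_rank(X))
```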
XT = W:

[ 1  1  0  x1 ] [ 0  0   0   ]   [ 1  0  x1 − x̄1·1 ]
[ 1  0  1  x2 ] [ 1  0  −x̄1· ] = [ 0  1  x2 − x̄2·1 ] .
                [ 0  1  −x̄2· ]
                [ 0  0   1   ]
"


# 1 1 0 x̄1·
"
#

1 0 x1 − x̄1· 1 
1
1
0
x
1
1 0 1 x̄2·  =

0 1 x2 − x̄2· 1 
1 0 1 x2
0 0 0 1
W
c
Copyright 2012
Dan Nettleton (Iowa State University)
S
=
X
"
0
WW=
1 0 x1 − x̄1· 1
#0 "
#
1 0 x1 − x̄1· 1
0 1 x2 − x̄2· 1
0 1 x2 − x̄2· 1


m
0
10 x1 − x̄1· 10 1


.
=
0
m
10 x2 − x̄2· 10 1


P
P
2
m
0
0
0
0
2
1 x1 − x̄1· 1 1 1 x2 − x̄2· 1 1
i=1
j=1 (xij − x̄i· )


    = [ m  0  0                                    ]
      [ 0  m  0                                    ]
      [ 0  0  Σ_{i=1}^{2} Σ_{j=1}^{m} (xij − x̄i·)² ] ,

so

(W′W)⁻¹ = [ 1/m  0    0                                        ]
          [ 0    1/m  0                                        ]
          [ 0    0    1 / Σ_{i=1}^{2} Σ_{j=1}^{m} (xij − x̄i·)² ] .
"
0
Wy=
1 0 x1 − x̄1· 1
#0 " #
y1
0 1 x2 − x̄2· 1
y2


y1·


.
=
y2·


P2 Pm
i=1
j=1 yij (xij − x̄i· )
α̂ = (W′W)⁻W′y

  = ( ȳ1·,  ȳ2·,  Σ_{i=1}^{2} Σ_{j=1}^{m} yij(xij − x̄i·) / Σ_{i=1}^{2} Σ_{j=1}^{m} (xij − x̄i·)² )′.
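The closed form for α̂ can be verified against a direct solve of the normal equations (a sketch; the data here are arbitrary, since the identity holds for every y):

```python
import numpy as np

rng = np.random.default_rng(4)

m = 6
x1, x2 = rng.normal(size=m), rng.normal(size=m)
y1, y2 = rng.normal(size=m), rng.normal(size=m)
one, zero = np.ones(m), np.zeros(m)

W = np.column_stack([np.r_[one, zero], np.r_[zero, one],
                     np.r_[x1 - x1.mean(), x2 - x2.mean()]])
y = np.r_[y1, y2]

alpha_hat = np.linalg.solve(W.T @ W, W.T @ y)   # (W'W)^{-1} W'y
print(alpha_hat)

# The closed forms from the slide: group means and the pooled ratio.
xc = np.r_[x1 - x1.mean(), x2 - x2.mean()]
print(y1.mean(), y2.mean(), (y * xc).sum() / (xc**2).sum())
```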
Recall that

E(y) = Xβ = Wα = WSβ = XTα.

If c′β is estimable, then c′ = a′X for some a, so c′β = a′Wα = c′Tα, and the LSE of c′β is c′Tα̂.
Thus, the LSE of γ is

c′Tα̂ = [ 0  0  0  1 ] [ 0  0   0   ] [ ȳ1· ]
                      [ 1  0  −x̄1· ] [ ȳ2· ]
                      [ 0  1  −x̄2· ] [ Σ_{i=1}^{2} Σ_{j=1}^{m} yij(xij − x̄i·) / Σ_{i=1}^{2} Σ_{j=1}^{m} (xij − x̄i·)² ]
                      [ 0  0   1   ]

      = Σ_{i=1}^{2} Σ_{j=1}^{m} yij(xij − x̄i·) / Σ_{i=1}^{2} Σ_{j=1}^{m} (xij − x̄i·)².
Note that in this case

c′β̂ = [ 0  0  1 ] α̂ = α̂3.

Var(c′β̂) = Var([ 0  0  1 ] α̂)
          = σ² [ 0  0  1 ] (W′W)⁻¹ (0, 0, 1)′
          = σ² / Σ_{i=1}^{2} Σ_{j=1}^{m} (xij − x̄i·)².
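A simulation (a sketch; all parameter values are arbitrary) confirms both the closed-form LSE of γ and its variance formula:

```python
import numpy as np

rng = np.random.default_rng(5)

m, sigma = 5, 0.8
mu, tau1, tau2, gamma = 1.0, 0.5, -0.5, 2.0
x1, x2 = rng.normal(size=m), rng.normal(size=m)
one, zero = np.ones(m), np.zeros(m)
X = np.column_stack([np.ones(2*m), np.r_[one, zero],
                     np.r_[zero, one], np.r_[x1, x2]])
xc = np.r_[x1 - x1.mean(), x2 - x2.mean()]       # centered covariate
Ey = X @ np.array([mu, tau1, tau2, gamma])       # the mean of y

reps = 40_000
g = np.empty(reps)
for r in range(reps):
    y = Ey + sigma * rng.standard_normal(2*m)
    g[r] = (y * xc).sum() / (xc**2).sum()        # closed-form gamma_hat

print(g.mean(), gamma)                           # unbiasedness
print(g.var(), sigma**2 / (xc**2).sum())         # the variance formula
```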
Theorem 4.1 (Gauss-Markov Theorem):
Suppose the GMM holds. If c′β is estimable, then the LSE c′β̂ is the best (minimum variance) linear unbiased estimator (BLUE) of c′β.
Proof of Theorem 4.1:
c′β is estimable ⇒ c′ = a′X for some vector a.

The LSE of c′β is

c′β̂ = a′Xβ̂ = a′X(X′X)⁻X′y = a′PX y.

Now suppose u + v′y is any other linear unbiased estimator of c′β.
Then
E(u + v′y) = c′β ∀ β ∈ Rp

⇐⇒ u + v′Xβ = c′β ∀ β ∈ Rp

⇐⇒ u = 0 and v′X = c′.
Var(u + v′y) = Var(v′y)
             = Var(v′y − c′β̂ + c′β̂)
             = Var(v′y − a′PX y + c′β̂)
             = Var((v′ − a′PX)y + c′β̂)
             = Var(c′β̂) + Var((v′ − a′PX)y) + 2Cov((v′ − a′PX)y, c′β̂).
Now
Cov((v′ − a′PX)y, c′β̂) = Cov((v′ − a′PX)y, a′PX y)
                        = (v′ − a′PX)Var(y)PX a
                        = σ²(v′ − a′PX)PX a
                        = σ²(v′ − a′PX)X(X′X)⁻X′a
                        = σ²(v′X − a′PX X)(X′X)⁻X′a
                        = σ²(v′X − a′X)(X′X)⁻X′a
                        = 0 ∵ v′X = c′ = a′X.
∴ we have

Var(u + v′y) = Var(c′β̂) + Var((v′ − a′PX)y).

It follows that

Var(c′β̂) ≤ Var(u + v′y)

with equality iff

Var((v′ − a′PX)y) = 0;
i.e., iff

σ²(v′ − a′PX)(v − PX a) = σ²(v − PX a)′(v − PX a) = σ²‖v − PX a‖² = 0;

i.e., iff

v = PX a;

i.e., iff

u + v′y is a′PX y = c′β̂, the LSE of c′β.
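The theorem can be illustrated by pitting the LSE against another linear unbiased estimator (a sketch; the weight vector v is an arbitrary choice satisfying v′X = c′, so v′y is unbiased but is not the LSE):

```python
import numpy as np

rng = np.random.default_rng(6)

# One-way model, 2 groups of 4; estimate c'beta = tau1 - tau2.
X = np.kron(np.array([[1,1,0],[1,0,1]], float), np.ones((4,1)))
beta = np.array([1.0, 2.0, 0.5])
c = np.array([0.0, 1.0, -1.0])
sigma = 1.0
G = np.linalg.pinv(X.T @ X)

# A competing linear unbiased estimator: unequal weights within group 1.
v = np.r_[0.5, 0.25, 0.125, 0.125, -0.25, -0.25, -0.25, -0.25]
assert np.allclose(X.T @ v, c)    # v'X = c', so v'y is unbiased for c'beta

reps = 30_000
lse, alt = np.empty(reps), np.empty(reps)
for r in range(reps):
    y = X @ beta + sigma * rng.standard_normal(8)
    lse[r] = c @ G @ X.T @ y
    alt[r] = v @ y

print(lse.var(), alt.var())       # approx 0.5 vs. approx 0.59: LSE wins
```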
Result 4.1:
The BLUE of an estimable c′β is uncorrelated with all linear unbiased estimators of zero.
Proof of Result 4.1:
Suppose u + v′y is a linear unbiased estimator of zero. Then

E(u + v′y) = 0 ∀ β ∈ Rp

⇐⇒ u + v′Xβ = 0 ∀ β ∈ Rp

⇐⇒ u = 0 and v′X = 0′

⇐⇒ u = 0 and X′v = 0.
Thus,
Cov(c′β̂, u + v′y) = Cov(c′(X′X)⁻X′y, v′y)
                   = c′(X′X)⁻X′Var(y)v
                   = c′(X′X)⁻X′(σ²I)v
                   = σ²c′(X′X)⁻X′v
                   = σ²c′(X′X)⁻0
                   = 0.
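A short simulation illustrates Result 4.1 (a sketch reusing the one-way design from the previous example; v is one convenient element of N(X′), so v′y is a linear unbiased estimator of zero):

```python
import numpy as np

rng = np.random.default_rng(7)

X = np.kron(np.array([[1,1,0],[1,0,1]], float), np.ones((4,1)))
beta = np.array([1.0, 2.0, 0.5])
c = np.array([0.0, 1.0, -1.0])
G = np.linalg.pinv(X.T @ X)

# A linear unbiased estimator of zero: any v with X'v = 0.
v = np.r_[1.0, -1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
assert np.allclose(X.T @ v, 0)

reps = 30_000
draws = np.empty((2, reps))
for r in range(reps):
    y = X @ beta + rng.standard_normal(8)
    draws[0, r] = c @ G @ X.T @ y     # the BLUE of c'beta
    draws[1, r] = v @ y               # unbiased for zero

print(np.cov(draws)[0, 1])            # sample covariance, approx 0
```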
Suppose c1′β, . . . , cq′β are q estimable functions ∋ c1, . . . , cq are LI.

Let C be the q × p matrix with rows c1′, . . . , cq′. Note that rank(C) = q.


Cβ = (c1′β, . . . , cq′β)′ is a vector of estimable functions.

The vector of BLUEs is the vector of LSEs

(c1′β̂, . . . , cq′β̂)′ = Cβ̂.
Because c1′β, . . . , cq′β are estimable, ∃ ai ∋ X′ai = ci ∀ i = 1, . . . , q.

If we let A be the matrix with rows a1′, . . . , aq′, then AX = C.

Find Var(Cβ̂).
Var(Cβ̂) = Var(AXβ̂)
         = Var(AX(X′X)⁻X′y)
         = Var(APX y)
         = σ²APX(APX)′
         = σ²APX A′
         = σ²AX(X′X)⁻X′A′
         = σ²C(X′X)⁻C′.
Find rank(Var(Cβ̂)).
q = rank(C) = rank(AX)
  = rank(APX X) ≤ rank(APX)
  = rank(P′X A′) = rank(APX P′X A′)
  = rank(APX PX A′) = rank(APX A′)
  = rank(AX(X′X)⁻X′A′)
  = rank(C(X′X)⁻C′)
  ≤ rank(C) = q.

∴ rank(σ²C(X′X)⁻C′) = q.
Note that

Var(Cβ̂) = σ²C(X′X)⁻C′

is a q × q matrix of rank q and is thus nonsingular.
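A numerical illustration (a sketch; the three-treatment design and the two contrasts are arbitrary choices): even though X′X is singular, Var(Cβ̂) = σ²C(X′X)⁻C′ is q × q of full rank:

```python
import numpy as np

# Rank-deficient one-way design, 3 groups of 3; two LI estimable contrasts.
X = np.kron(np.array([[1,1,0,0],[1,0,1,0],[1,0,0,1]], float), np.ones((3,1)))
C = np.array([[0.0, 1.0, -1.0, 0.0],     # tau1 - tau2
              [0.0, 1.0, 0.0, -1.0]])    # tau1 - tau3
sigma = 1.0

G = np.linalg.pinv(X.T @ X)              # a GI of the singular X'X
V = sigma**2 * C @ G @ C.T               # Var(C beta_hat)
print(V)
print(np.linalg.matrix_rank(V))          # q = 2, so V is nonsingular
```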
We know that each component of Cβ̂ is the BLUE of the corresponding component of Cβ; i.e., ci′β̂ is the BLUE of ci′β ∀ i = 1, . . . , q.
Likewise, we can show that Cβ̂ is the BLUE of Cβ in the sense that
Var(s + Ty) − Var(Cβ̂)
is nonnegative definite for all unbiased linear estimators s + Ty of Cβ.
Nonnegative Definite and Positive Definite Matrices
A symmetric n × n matrix A is nonnegative definite (NND) if and only if

x′Ax ≥ 0 ∀ x ∈ Rn.

A symmetric n × n matrix A is positive definite (PD) if and only if

x′Ax > 0 ∀ x ∈ Rn \ {0}.
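For a symmetric matrix, these definitions are equivalent to requiring all eigenvalues to be ≥ 0 (respectively > 0), which gives a convenient numerical test (a sketch; the helper names are invented):

```python
import numpy as np

def is_nnd(A, tol=1e-10):
    """Check nonnegative definiteness of a symmetric matrix via eigenvalues."""
    return np.all(np.linalg.eigvalsh(A) >= -tol)

def is_pd(A, tol=1e-10):
    """Check positive definiteness of a symmetric matrix via eigenvalues."""
    return np.all(np.linalg.eigvalsh(A) > tol)

A = np.array([[1.0, 1.0], [1.0, 1.0]])   # eigenvalues 0 and 2
print(is_nnd(A), is_pd(A))               # True False
```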
Proof that Var(s + Ty) − Var(Cβ̂) is NND:
s + Ty is a linear unbiased estimator of Cβ

⇐⇒ E(s + Ty) = Cβ ∀ β ∈ Rp

⇐⇒ s + TXβ = Cβ ∀ β ∈ Rp

⇐⇒ s = 0 and TX = C.

Thus, we may write any linear unbiased estimator of Cβ as Ty, where TX = C.
Now ∀ w ∈ Rq consider
w′[Var(Ty) − Var(Cβ̂)]w = w′Var(Ty)w − w′Var(Cβ̂)w
                        = Var(w′Ty) − Var(w′Cβ̂)
                        ≥ 0 by the Gauss-Markov Theorem, because . . .
(i) w′Cβ is an estimable function:

w′C = w′AX ⇒ X′A′w = C′w ⇒ C′w ∈ C(X′).

(ii) w′Ty is a linear unbiased estimator of w′Cβ:

E(w′Ty) = w′TXβ = w′Cβ ∀ β ∈ Rp.

(iii) w′Cβ̂ is the LSE of w′Cβ and is thus the BLUE of w′Cβ.
We have shown
w′[Var(Ty) − Var(Cβ̂)]w ≥ 0 ∀ w ∈ Rq.

Thus,

Var(Ty) − Var(Cβ̂)

is nonnegative definite (NND).
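The NND conclusion can be seen numerically (a sketch; the alternative T is built by adding to the LSE coefficient matrix rows lying in N(X′), which preserves TX = C):

```python
import numpy as np

rng = np.random.default_rng(9)

X = np.kron(np.array([[1,1,0],[1,0,1]], float), np.ones((4,1)))
C = np.array([[0.0, 1.0, -1.0],
              [1.0, 1.0, 0.0]])          # two LI estimable functions
G = np.linalg.pinv(X.T @ X)

T_lse = C @ G @ X.T                      # coefficient matrix of C beta_hat
N = np.eye(8) - X @ np.linalg.pinv(X)    # projector onto N(X')
T = T_lse + rng.normal(size=(2, 8)) @ N  # perturbing within N(X') keeps TX = C
assert np.allclose(T @ X, C)

D = T @ T.T - T_lse @ T_lse.T            # [Var(Ty) - Var(C beta_hat)] / sigma^2
print(np.linalg.eigvalsh(D))             # all eigenvalues >= 0, so D is NND
```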
Is it true that
Var(Ty) − Var(Cβ̂)

is positive definite if Ty is a linear unbiased estimator of Cβ that is not the BLUE of Cβ?
No. Ty could have some components equal to the corresponding components of Cβ̂ and others not. Then

∃ w ≠ 0 ∋ w′[Var(Ty) − Var(Cβ̂)]w = 0.

For example, suppose the first row of T is c1′(X′X)⁻X′ but the second row is not c2′(X′X)⁻X′.

Then Ty ≠ Cβ̂, but
w′[Var(Ty) − Var(Cβ̂)]w = Var(w′Ty) − Var(w′Cβ̂) = 0

for w′ = [1, 0, . . . , 0].