Estimating Estimable Functions of β

The Response Depends on β Only through Xβ
In the Gauss-Markov or Normal Theory Gauss-Markov Linear
Model, the distribution of y depends on β only through Xβ, i.e.,
\[
y \sim (X\beta, \sigma^2 I) \quad \text{or} \quad y \sim N(X\beta, \sigma^2 I).
\]
If X is not of full column rank, there are infinitely many vectors in
the set {b : Xb = Xβ} for any fixed value of β.
Thus, no matter what the value of E(y), there will be infinitely many
vectors b such that Xb = E(y) when X is not of full column rank.
The response vector y can help us learn about E(y) = Xβ, but
when X is not of full column rank, there is no hope of learning
about β alone unless additional information about β is available.
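The reason can be written out as a one-line derivation. The vector v and the scalar t are notation introduced here for illustration only. X fails to have full column rank exactly when some nonzero v satisfies Xv = 0, and then shifting β along v never changes the mean vector:

```latex
Xv = 0,\; v \neq 0
\quad\Longrightarrow\quad
X(\beta + t\,v) = X\beta + t\,Xv = X\beta
\quad \text{for every scalar } t .
```

Each value of t gives a different vector b = β + tv with Xb = Xβ, which is why the set {b : Xb = Xβ} is infinite.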
Copyright © 2010 Dan Nettleton (Iowa State University), Statistics 511
Treatment Effects Model
Researchers randomly assigned a total of six experimental units to two
treatments and measured a response of interest.
\[
y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \qquad i = 1, 2; \; j = 1, 2, 3
\]
\[
\begin{bmatrix} y_{11} \\ y_{12} \\ y_{13} \\ y_{21} \\ y_{22} \\ y_{23} \end{bmatrix}
=
\begin{bmatrix} \mu + \tau_1 \\ \mu + \tau_1 \\ \mu + \tau_1 \\ \mu + \tau_2 \\ \mu + \tau_2 \\ \mu + \tau_2 \end{bmatrix}
+
\begin{bmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \varepsilon_{13} \\ \varepsilon_{21} \\ \varepsilon_{22} \\ \varepsilon_{23} \end{bmatrix}
\]
\[
\begin{bmatrix} y_{11} \\ y_{12} \\ y_{13} \\ y_{21} \\ y_{22} \\ y_{23} \end{bmatrix}
=
\begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \mu \\ \tau_1 \\ \tau_2 \end{bmatrix}
+
\begin{bmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \varepsilon_{13} \\ \varepsilon_{21} \\ \varepsilon_{22} \\ \varepsilon_{23} \end{bmatrix}
\]
Treatment Effects Model (continued)
In this case, it makes no sense to estimate β = [μ, τ1, τ2]′ because
there are multiple (infinitely many, in fact) choices of β that define
the same mean for y.
For example,
\[
\begin{bmatrix} \mu \\ \tau_1 \\ \tau_2 \end{bmatrix}
=
\begin{bmatrix} 5 \\ -1 \\ 1 \end{bmatrix}, \quad
\begin{bmatrix} 0 \\ 4 \\ 6 \end{bmatrix}, \quad \text{or} \quad
\begin{bmatrix} 999 \\ -995 \\ -993 \end{bmatrix}
\]
all yield the same Xβ = E(y).
When multiple values for β define the same E(y), we say that β is
non-estimable.
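The three example values of β can be checked numerically. The following is a minimal sketch in plain Python; the helper name mean_vector and the variable names are hypothetical and chosen only for this illustration.

```python
# Design matrix of the treatment effects model (columns: mu, tau1, tau2).
X = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 1],
    [1, 0, 1],
]


def mean_vector(X, beta):
    """Return X beta as a list with one entry per observation."""
    return [sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) for row in X]


# The three candidate values of beta = [mu, tau1, tau2]' from the example.
candidates = [[5, -1, 1], [0, 4, 6], [999, -995, -993]]
means = [mean_vector(X, beta) for beta in candidates]

# Any two candidates differ by a multiple of [1, -1, -1]',
# a direction that X maps to the zero vector.
null_direction = [1, -1, -1]
null_image = mean_vector(X, null_direction)

for beta, m in zip(candidates, means):
    print(beta, "->", m)
```

All three candidates give μ + τ1 = 4 and μ + τ2 = 6, so the data cannot distinguish among them.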
Estimable Functions of β
A linear function of β, Cβ, is said to be estimable if there is a
linear function of y, Ay, that is an unbiased estimator of Cβ.
Otherwise, Cβ is said to be non-estimable.
Note that Ay is an unbiased estimator of Cβ if and only if
\[
E(Ay) = C\beta \;\; \forall\, \beta \in \mathbb{R}^p
\iff
AX\beta = C\beta \;\; \forall\, \beta \in \mathbb{R}^p
\iff
AX = C.
\]
This says that we can estimate Cβ as long as Cβ = AXβ = AE(y)
for some A, i.e., as long as Cβ is a linear function of E(y).
The bottom line is that we can always estimate E(y) and all linear
functions of E(y); all other linear functions of β are non-estimable.
Treatment Effects Model (continued)
\[
X\beta =
\begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \mu \\ \tau_1 \\ \tau_2 \end{bmatrix}
=
\begin{bmatrix} \mu + \tau_1 \\ \mu + \tau_1 \\ \mu + \tau_1 \\ \mu + \tau_2 \\ \mu + \tau_2 \\ \mu + \tau_2 \end{bmatrix}
\implies
\]
\[
\begin{aligned}
[1, 0, 0, 0, 0, 0]\,X\beta &= [1, 1, 0]\,\beta = \mu + \tau_1 \\
[0, 0, 0, 1, 0, 0]\,X\beta &= [1, 0, 1]\,\beta = \mu + \tau_2 \\
[1, 0, 0, -1, 0, 0]\,X\beta &= [0, 1, -1]\,\beta = \tau_1 - \tau_2
\end{aligned}
\]
are estimable functions of β.
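The condition AX = C says that every row of C is a linear combination of the rows of X. For a single row c′, one way to test this is to compare the rank of X with the rank of X after c′ is appended as an extra row. The sketch below does that with exact rational arithmetic; the function names rank and is_estimable are hypothetical, and the rank comparison is a standard linear algebra check rather than something stated on the slides.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (given as a list of rows) by exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0  # number of pivots found so far
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col] / m[r][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def is_estimable(X, c):
    """c'beta is estimable iff c' = a'X for some a, i.e. c' is in the row space of X."""
    return rank(X + [c]) == rank(X)


# Design matrix of the treatment effects model (columns: mu, tau1, tau2).
X = [[1, 1, 0]] * 3 + [[1, 0, 1]] * 3

for c in ([1, 1, 0], [1, 0, 1], [0, 1, -1], [1, 0, 0], [0, 1, 0]):
    print(c, "estimable" if is_estimable(X, c) else "non-estimable")
```

Under these assumptions the three functions μ + τ1, μ + τ2, and τ1 − τ2 pass the check, while μ alone and τ1 alone do not.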
Estimating Estimable Functions of β
If Cβ is estimable, then there exists a matrix A such that C = AX
and Cβ = AXβ = AE(y) for any β ∈ ℝ^p.
It makes sense to estimate Cβ by
\[
A\widehat{E(y)} = A\hat{y} = AP_X y = AX(X'X)^{-}X'y = AX(X'X)^{-}X'X\hat{\beta}
= AP_X X\hat{\beta} = AX\hat{\beta} = C\hat{\beta}.
\]
Here P_X = X(X′X)⁻X′, P_X X = X, and β̂ is any solution to the normal
equations X′Xb = X′y, so that X′y = X′Xβ̂.
Cβ̂ is called an Ordinary Least Squares (OLS) estimator of Cβ.
Note that although the “hat” is on β, it is Cβ that we are estimating.
Invariance of Cβ̂ to the Choice of β̂
Although there are infinitely many solutions to the normal
equations when X is not of full column rank, Cβ̂ is the same for all
normal equation solutions β̂ whenever Cβ is estimable.
To see this, suppose β̂1 and β̂2 are any two solutions to the
normal equations. Then
\[
\begin{aligned}
C\hat{\beta}_1 &= AX\hat{\beta}_1 = AP_X X\hat{\beta}_1 \\
&= AX(X'X)^{-}X'X\hat{\beta}_1 = AX(X'X)^{-}X'y \\
&= AX(X'X)^{-}X'X\hat{\beta}_2 = AP_X X\hat{\beta}_2 \\
&= AX\hat{\beta}_2 = C\hat{\beta}_2 .
\end{aligned}
\]
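The invariance can be illustrated numerically with the two-treatment design matrix. In the sketch below the response values in y are made up for illustration, and the helper names are hypothetical. The family of solutions used, [0, ȳ1·, ȳ2·]′ + t[1, −1, −1]′, is an assumption of the sketch that the code verifies directly against the normal equations.

```python
from fractions import Fraction

# Design matrix of the treatment effects model (columns: mu, tau1, tau2).
X = [[1, 1, 0]] * 3 + [[1, 0, 1]] * 3
y = [3, 5, 4, 8, 9, 10]  # hypothetical responses, for illustration only

ybar1 = Fraction(sum(y[:3]), 3)  # mean of treatment 1
ybar2 = Fraction(sum(y[3:]), 3)  # mean of treatment 2


def normal_equation_residual(b):
    """Return X'Xb - X'y, computed as X'(Xb - y); all zeros means b is a solution."""
    fitted = [sum(x * bj for x, bj in zip(row, b)) for row in X]
    return [sum(row[j] * (f - yi) for row, f, yi in zip(X, fitted, y))
            for j in range(3)]


def solution(t):
    """One candidate solution per value of t: [0, ybar1, ybar2]' + t [1, -1, -1]'."""
    return [t, ybar1 - t, ybar2 - t]


def linear_function(c, b):
    """Evaluate c'b."""
    return sum(ci * bi for ci, bi in zip(c, b))


solutions = [solution(t) for t in (0, 1, Fraction(13, 2), -100)]
residuals = [normal_equation_residual(b) for b in solutions]

# Estimable: tau1 - tau2 = [0, 1, -1] beta gives the same value for every solution.
contrast_estimates = [linear_function([0, 1, -1], b) for b in solutions]
# Non-estimable: mu = [1, 0, 0] beta changes from solution to solution.
mu_values = [linear_function([1, 0, 0], b) for b in solutions]

print(contrast_estimates)
print(mu_values)
```

Every candidate solves the normal equations and returns the same value for τ1 − τ2, whereas the first coordinate takes a different value for each solution.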
Treatment Effects Model (continued)
Suppose our aim is to estimate τ1 − τ2.
As noted before,
\[
X\beta =
\begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \mu \\ \tau_1 \\ \tau_2 \end{bmatrix}
=
\begin{bmatrix} \mu + \tau_1 \\ \mu + \tau_1 \\ \mu + \tau_1 \\ \mu + \tau_2 \\ \mu + \tau_2 \\ \mu + \tau_2 \end{bmatrix}
\implies
[1, 0, 0, -1, 0, 0]\,X\beta = [0, 1, -1]\,\beta = \tau_1 - \tau_2 .
\]
Thus, we can compute the OLS estimator of τ1 − τ2 as
\[
[1, 0, 0, -1, 0, 0]\,\hat{y} = [0, 1, -1]\,\hat{\beta},
\]
where ŷ = X(X′X)⁻X′y and β̂ is any solution to the normal
equations.
Treatment Effects Model (continued)
The normal equations in this case are
\[
\begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \end{bmatrix}
\begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}
=
\begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 \end{bmatrix}
\begin{bmatrix} y_{11} \\ y_{12} \\ y_{13} \\ y_{21} \\ y_{22} \\ y_{23} \end{bmatrix}
\]
\[
\iff
\begin{bmatrix} 6 & 3 & 3 \\ 3 & 3 & 0 \\ 3 & 0 & 3 \end{bmatrix}
\begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}
=
\begin{bmatrix} y_{\cdot\cdot} \\ y_{1\cdot} \\ y_{2\cdot} \end{bmatrix}.
\]
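The two sides of the normal equations can be formed directly from X and y. The following is a small sketch in plain Python; the response values are hypothetical and the helpers transpose and matmul are written out here only to keep the example self-contained.

```python
def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    """Return the matrix product AB for matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


# Design matrix of the treatment effects model (columns: mu, tau1, tau2).
X = [[1, 1, 0]] * 3 + [[1, 0, 1]] * 3
# Hypothetical responses y11, y12, y13, y21, y22, y23 as a column vector.
y = [[3], [5], [4], [8], [9], [10]]

XtX = matmul(transpose(X), X)  # left-hand side matrix X'X
Xty = matmul(transpose(X), y)  # right-hand side vector X'y

print(XtX)  # counts: 6 observations, 3 per treatment
print(Xty)  # grand total, then the two treatment totals
```

The entries of X′y are totals (y··, y1·, y2·), not means, which is why the means ȳ only appear once the equations are solved.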
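Treatment Effects Model (continued)
\[
\hat{\beta}_1 \equiv
\begin{bmatrix} \bar{y}_{\cdot\cdot} \\ \bar{y}_{1\cdot} - \bar{y}_{\cdot\cdot} \\ \bar{y}_{2\cdot} - \bar{y}_{\cdot\cdot} \end{bmatrix}
\quad \text{and} \quad
\hat{\beta}_2 \equiv
\begin{bmatrix} 0 \\ \bar{y}_{1\cdot} \\ \bar{y}_{2\cdot} \end{bmatrix}
\]
are each solutions to the normal equations because
\[
\begin{bmatrix} 6 & 3 & 3 \\ 3 & 3 & 0 \\ 3 & 0 & 3 \end{bmatrix}
\begin{bmatrix} \bar{y}_{\cdot\cdot} \\ \bar{y}_{1\cdot} - \bar{y}_{\cdot\cdot} \\ \bar{y}_{2\cdot} - \bar{y}_{\cdot\cdot} \end{bmatrix}
=
\begin{bmatrix} y_{\cdot\cdot} \\ y_{1\cdot} \\ y_{2\cdot} \end{bmatrix}
=
\begin{bmatrix} 6 & 3 & 3 \\ 3 & 3 & 0 \\ 3 & 0 & 3 \end{bmatrix}
\begin{bmatrix} 0 \\ \bar{y}_{1\cdot} \\ \bar{y}_{2\cdot} \end{bmatrix}.
\]
Thus, the OLS estimator of Cβ = [0, 1, −1]β = τ1 − τ2 is
\[
C\hat{\beta}_1 = [0, 1, -1]
\begin{bmatrix} \bar{y}_{\cdot\cdot} \\ \bar{y}_{1\cdot} - \bar{y}_{\cdot\cdot} \\ \bar{y}_{2\cdot} - \bar{y}_{\cdot\cdot} \end{bmatrix}
= \bar{y}_{1\cdot} - \bar{y}_{2\cdot}
= [0, 1, -1]
\begin{bmatrix} 0 \\ \bar{y}_{1\cdot} \\ \bar{y}_{2\cdot} \end{bmatrix}
= C\hat{\beta}_2 .
\]
Treatment Effects Model (continued)
Let
\[
(X'X)^{-}_{1} =
\begin{bmatrix} 1/6 & 0 & 0 \\ 0 & 1/6 & -1/6 \\ 0 & -1/6 & 1/6 \end{bmatrix}
\quad \text{and} \quad
(X'X)^{-}_{2} =
\begin{bmatrix} 0 & 0 & 0 \\ 0 & 1/3 & 0 \\ 0 & 0 & 1/3 \end{bmatrix}.
\]
It is straightforward to verify that (X′X)₁⁻ and (X′X)₂⁻ are each
generalized inverses of X′X.
It is also easy to show that β̂1 = (X′X)₁⁻X′y and β̂2 = (X′X)₂⁻X′y.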
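Both verifications can be carried out mechanically. The sketch below uses the defining property of a generalized inverse G of a matrix M, namely MGM = M, and exact fractions. The totals in Xty come from made-up data (treatment totals 12 and 27), and the names G1, G2, and is_generalized_inverse are hypothetical labels for this illustration.

```python
from fractions import Fraction as F


def matmul(A, B):
    """Return the matrix product AB for matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


XtX = [[6, 3, 3], [3, 3, 0], [3, 0, 3]]

# The two candidate generalized inverses of X'X.
G1 = [[F(1, 6), 0, 0], [0, F(1, 6), F(-1, 6)], [0, F(-1, 6), F(1, 6)]]
G2 = [[0, 0, 0], [0, F(1, 3), 0], [0, 0, F(1, 3)]]


def is_generalized_inverse(G, M):
    """G is a generalized inverse of M when M G M equals M."""
    return matmul(matmul(M, G), M) == M


# Hypothetical X'y: grand total 39, treatment totals 12 and 27,
# so the treatment means are 4 and 9 and the grand mean is 13/2.
Xty = [[39], [12], [27]]

beta1 = matmul(G1, Xty)  # [grand mean, mean1 - grand mean, mean2 - grand mean]'
beta2 = matmul(G2, Xty)  # [0, mean1, mean2]'

print(is_generalized_inverse(G1, XtX), is_generalized_inverse(G2, XtX))
print(beta1, beta2)
```

With these totals, the two solutions differ in every coordinate, yet both give τ1 − τ2 the estimate 4 − 9 = −5.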
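Treatment Effects Model (continued)
\[
P_X = X(X'X)^{-}X' =
\begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 0 & 0 & 0 \\ 0 & 1/3 & 0 \\ 0 & 0 & 1/3 \end{bmatrix}
\begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 \end{bmatrix}
\]
\[
=
\begin{bmatrix} 0 & 1/3 & 0 \\ 0 & 1/3 & 0 \\ 0 & 1/3 & 0 \\ 0 & 0 & 1/3 \\ 0 & 0 & 1/3 \\ 0 & 0 & 1/3 \end{bmatrix}
\begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 \end{bmatrix}
=
\begin{bmatrix}
\tfrac13 & \tfrac13 & \tfrac13 & 0 & 0 & 0 \\
\tfrac13 & \tfrac13 & \tfrac13 & 0 & 0 & 0 \\
\tfrac13 & \tfrac13 & \tfrac13 & 0 & 0 & 0 \\
0 & 0 & 0 & \tfrac13 & \tfrac13 & \tfrac13 \\
0 & 0 & 0 & \tfrac13 & \tfrac13 & \tfrac13 \\
0 & 0 & 0 & \tfrac13 & \tfrac13 & \tfrac13
\end{bmatrix}.
\]
Treatment Effects Model (continued)
Thus
\[
\widehat{E(y)} = \hat{y} = P_X y =
\begin{bmatrix}
\tfrac13 & \tfrac13 & \tfrac13 & 0 & 0 & 0 \\
\tfrac13 & \tfrac13 & \tfrac13 & 0 & 0 & 0 \\
\tfrac13 & \tfrac13 & \tfrac13 & 0 & 0 & 0 \\
0 & 0 & 0 & \tfrac13 & \tfrac13 & \tfrac13 \\
0 & 0 & 0 & \tfrac13 & \tfrac13 & \tfrac13 \\
0 & 0 & 0 & \tfrac13 & \tfrac13 & \tfrac13
\end{bmatrix}
\begin{bmatrix} y_{11} \\ y_{12} \\ y_{13} \\ y_{21} \\ y_{22} \\ y_{23} \end{bmatrix}
=
\begin{bmatrix} \bar{y}_{1\cdot} \\ \bar{y}_{1\cdot} \\ \bar{y}_{1\cdot} \\ \bar{y}_{2\cdot} \\ \bar{y}_{2\cdot} \\ \bar{y}_{2\cdot} \end{bmatrix}
\]
is our OLS estimator of
\[
E(y) = X\beta =
\begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \mu \\ \tau_1 \\ \tau_2 \end{bmatrix}
=
\begin{bmatrix} \mu + \tau_1 \\ \mu + \tau_1 \\ \mu + \tau_1 \\ \mu + \tau_2 \\ \mu + \tau_2 \\ \mu + \tau_2 \end{bmatrix}.
\]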
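The projection matrix and the fitted values can be reproduced in a few lines. This is a sketch in plain Python with exact fractions; the response values are hypothetical, and the generalized inverse used is the diagonal one with entries 0, 1/3, 1/3.

```python
from fractions import Fraction as F


def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    """Return the matrix product AB for matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


# Design matrix of the treatment effects model (columns: mu, tau1, tau2).
X = [[1, 1, 0]] * 3 + [[1, 0, 1]] * 3
# A generalized inverse of X'X.
G = [[0, 0, 0], [0, F(1, 3), 0], [0, 0, F(1, 3)]]

P = matmul(matmul(X, G), transpose(X))  # P_X = X (X'X)^- X'

y = [[3], [5], [4], [8], [9], [10]]  # hypothetical responses
y_hat = matmul(P, y)                 # fitted values P_X y

for row in P:
    print([str(v) for v in row])
print([str(v[0]) for v in y_hat])
```

Multiplying by P_X replaces each observation with the mean of its own treatment group, and applying P_X a second time changes nothing because P_X P_X = P_X.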
Treatment Effects Model (continued)
Also, we can see that the OLS estimator of
\[
\tau_1 - \tau_2
= [0, 1, -1] \begin{bmatrix} \mu \\ \tau_1 \\ \tau_2 \end{bmatrix}
= [1, 0, 0, -1, 0, 0]
\begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \mu \\ \tau_1 \\ \tau_2 \end{bmatrix}
= [1, 0, 0, -1, 0, 0]
\begin{bmatrix} \mu + \tau_1 \\ \mu + \tau_1 \\ \mu + \tau_1 \\ \mu + \tau_2 \\ \mu + \tau_2 \\ \mu + \tau_2 \end{bmatrix}
= [1, 0, 0, -1, 0, 0]\,E(y)
\]
is
\[
[1, 0, 0, -1, 0, 0]\,\widehat{E(y)}
= [1, 0, 0, -1, 0, 0]\,\hat{y}
= [1, 0, 0, -1, 0, 0]
\begin{bmatrix} \bar{y}_{1\cdot} \\ \bar{y}_{1\cdot} \\ \bar{y}_{1\cdot} \\ \bar{y}_{2\cdot} \\ \bar{y}_{2\cdot} \\ \bar{y}_{2\cdot} \end{bmatrix}
= \bar{y}_{1\cdot} - \bar{y}_{2\cdot} .
\]
The Gauss-Markov Theorem
Under the Gauss-Markov Linear Model, the OLS estimator c′β̂ of an
estimable linear function c′β is the unique Best Linear Unbiased
Estimator (BLUE) in the sense that Var(c′β̂) is strictly less than the
variance of any other linear unbiased estimator of c′β for all β ∈ ℝ^p
and all σ² ∈ ℝ⁺.
The Gauss-Markov Theorem says that if we want to estimate an
estimable linear function c′β using a linear estimator that is
unbiased, we should always use the OLS estimator.
In our simple example of the treatment effects model, we could
have used y11 − y21 to estimate τ1 − τ2 . It is easy to see that
y11 − y21 is a linear estimator that is unbiased for τ1 − τ2 , but its
variance is clearly larger than the variance of the OLS estimator
ȳ1· − ȳ2· (as guaranteed by the Gauss-Markov Theorem).
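The comparison can be made explicit with a short calculation. It uses only the Gauss-Markov assumptions that the six responses are uncorrelated with common variance σ², so each treatment mean of three observations has variance σ²/3:

```latex
\begin{aligned}
E(y_{11} - y_{21}) &= (\mu + \tau_1) - (\mu + \tau_2) = \tau_1 - \tau_2, \\
\operatorname{Var}(y_{11} - y_{21}) &= \sigma^2 + \sigma^2 = 2\sigma^2, \\
\operatorname{Var}(\bar{y}_{1\cdot} - \bar{y}_{2\cdot}) &= \frac{\sigma^2}{3} + \frac{\sigma^2}{3} = \frac{2\sigma^2}{3}.
\end{aligned}
```

The single-observation contrast therefore has three times the variance of the OLS estimator in this design.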