The Orthogonal Projection Matrix onto C(X)
and the Normal Equations
Copyright © 2012
Dan Nettleton (Iowa State University)
Statistics 611
Theorem 2.1:
$P_X$, the orthogonal projection matrix onto $C(X)$, is equal to $X(X'X)^{-}X'$.
The matrix $X(X'X)^{-}X'$ satisfies
(a) $[X(X'X)^{-}X'][X(X'X)^{-}X'] = X(X'X)^{-}X'$
(b) $X(X'X)^{-}X'y \in C(X) \;\; \forall\, y \in \mathbb{R}^n$
(c) $X(X'X)^{-}X'x = x \;\; \forall\, x \in C(X)$
(d) $X(X'X)^{-}X' = [X(X'X)^{-}X']'$
(e) $X(X'X)^{-}X'$ is the same for every generalized inverse (GI) of $X'X$.
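The properties in Theorem 2.1 can be checked numerically on a small case. The sketch below is only an illustration, not part of the notes: the design matrix `X` (an intercept plus two group indicators), the two generalized inverses `G1` and `G2`, and the helper names `matmul`, `transpose`, and `projection` are all assumptions chosen for the example. Exact fractions are used so that equality checks are not affected by rounding.

```python
from fractions import Fraction as F

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

# Hypothetical design matrix: intercept plus two group indicators (3 columns, rank 2).
X = [[1, 1, 0], [1, 1, 0], [1, 0, 1], [1, 0, 1]]
Xt = transpose(X)
XtX = matmul(Xt, X)  # [[4, 2, 2], [2, 2, 0], [2, 0, 2]], which is singular

# Two different generalized inverses of X'X, each built by inverting a
# nonsingular 2x2 principal submatrix and padding with zeros.
G1 = [[F(0), F(0), F(0)], [F(0), F(1, 2), F(0)], [F(0), F(0), F(1, 2)]]
G2 = [[F(1, 2), F(-1, 2), F(0)], [F(-1, 2), F(1), F(0)], [F(0), F(0), F(0)]]

def projection(G):
    """X G X' for a given generalized inverse G of X'X."""
    return matmul(matmul(X, G), Xt)

P1, P2 = projection(G1), projection(G2)

is_gi = all(matmul(matmul(XtX, G), XtX) == XtX for G in (G1, G2))
idempotent = matmul(P1, P1) == P1    # property (a)
fixes_columns = matmul(P1, X) == X   # property (c), checked on the columns of X
symmetric = transpose(P1) == P1      # property (d)
same_for_both = P1 == P2             # property (e), for these two GIs only
```

For this particular `X`, both choices give the block matrix that averages within each group. A numerical check of this kind covers one example only and does not replace the proof.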
It is useful to establish a few results before proving Theorem 2.1.
Show that if $A$ ($n \times n$) is a symmetric matrix and $G$ ($n \times n$) is a GI of $A$, then $G'$ is also a GI of $A$.
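One possible argument, sketched here rather than taken from the notes, transposes the defining equation $AGA = A$ of a generalized inverse and then uses $A' = A$:

```latex
\begin{align*}
AGA = A
  &\implies (AGA)' = A' \\
  &\implies A'G'A' = A' \\
  &\implies AG'A = A \qquad \text{(because } A' = A\text{)},
\end{align*}
```

so $G'$ satisfies the definition of a GI of $A$.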
Result 2.4:
$X'XA = X'XB \iff XA = XB$.
One direction is obvious ($\Longleftarrow$).
Can you prove the other ($\Longrightarrow$)?
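A sketch of one way to prove ($\Longrightarrow$), offered as an illustration: write $D = A - B$ and use the fact that $M'M = 0$ forces $M = 0$, since the diagonal entries of $M'M$ are the squared lengths of the columns of $M$.

```latex
\begin{align*}
X'XA = X'XB
  &\implies X'XD = 0, \qquad D = A - B \\
  &\implies D'X'XD = 0 \\
  &\implies (XD)'(XD) = 0 \\
  &\implies XD = 0 \\
  &\implies XA = XB.
\end{align*}
```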
Result 2.5:
Suppose $(X'X)^{-}$ is any GI of $X'X$. Then $(X'X)^{-}X'$ is a GI of $X$, i.e.,
$X(X'X)^{-}X'X = X$.
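A possible proof, sketched under the assumption that Result 2.4 is available: start from the definition of a GI of $X'X$ and apply Result 2.4 with $A = (X'X)^{-}X'X$ and $B = I$.

```latex
\begin{align*}
X'X\,(X'X)^{-}\,X'X = X'X
  &\iff X'X\,\big[(X'X)^{-}X'X\big] = X'X\,I \\
  &\iff X\,\big[(X'X)^{-}X'X\big] = X\,I \qquad \text{(Result 2.4)} \\
  &\iff X(X'X)^{-}X'X = X.
\end{align*}
```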
Corollary to Result 2.5:
For $(X'X)^{-}$ any GI of $X'X$,
$X'X(X'X)^{-}X' = X'$.
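One way to see this, given as a sketch: write $G = (X'X)^{-}$. Because $X'X$ is symmetric, $G'$ is also a GI of $X'X$, so Result 2.5 applies to $G'$; transposing the resulting identity gives the corollary.

```latex
\begin{align*}
X G' X'X = X
  &\implies (X G' X'X)' = X' \\
  &\implies X'X\,G\,X' = X'.
\end{align*}
```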
Now we can prove Theorem 2.1:
First show $X(X'X)^{-}X'$ is idempotent.
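A short derivation, sketched here using Result 2.5 ($X(X'X)^{-}X'X = X$):

```latex
\begin{align*}
\big[X(X'X)^{-}X'\big]\big[X(X'X)^{-}X'\big]
  &= \big[X(X'X)^{-}X'X\big](X'X)^{-}X' \\
  &= X(X'X)^{-}X'.
\end{align*}
```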
Now show that
$X(X'X)^{-}X'y \in C(X) \;\; \forall\, y \in \mathbb{R}^n$.
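This follows by regrouping the product, as the following sketch shows. The vector $c$ is a name introduced here for illustration only.

```latex
\begin{align*}
X(X'X)^{-}X'y = X\big[(X'X)^{-}X'y\big] = Xc,
\qquad c = (X'X)^{-}X'y \in \mathbb{R}^p,
\end{align*}
```

and any vector of the form $Xc$ is a linear combination of the columns of $X$, hence an element of $C(X)$.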
Now show that
$X(X'X)^{-}X'z = z \;\; \forall\, z \in C(X)$.
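A sketch of the argument: any $z \in C(X)$ can be written as $z = Xc$ for some $c \in \mathbb{R}^p$ (the symbol $c$ is introduced here for illustration), and then Result 2.5 gives

```latex
\begin{align*}
X(X'X)^{-}X'z = X(X'X)^{-}X'Xc = Xc = z.
\end{align*}
```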
Now we know $X(X'X)^{-}X'$ is a projection matrix onto $C(X)$.
To be the orthogonal projection matrix onto $C(X)$, which is unique, $X(X'X)^{-}X'$ must also be
symmetric.
To show symmetry of $X(X'X)^{-}X'$, it will help to first show that
$X(X'X)^{-}_{1}X' = X(X'X)^{-}_{2}X'$
for any two GIs of $X'X$ denoted $(X'X)^{-}_{1}$ and $(X'X)^{-}_{2}$.
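One possible route, sketched here with the shorthand $G_1 = (X'X)^{-}_{1}$ and $G_2 = (X'X)^{-}_{2}$ (names introduced for this sketch): the Corollary to Result 2.5 holds for every GI, and Result 2.4 then removes the leading $X'$.

```latex
\begin{align*}
X'X\,G_1X' = X' = X'X\,G_2X'
  &\implies X'X\,(G_1X') = X'X\,(G_2X') \\
  &\implies X\,G_1X' = X\,G_2X' \qquad \text{(Result 2.4)}.
\end{align*}
```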
Now show that $X(X'X)^{-}X'$ is symmetric.
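A sketch of the argument, writing $G = (X'X)^{-}$ for brevity: because $X'X$ is symmetric, $G'$ is also a GI of $X'X$, and $XGX'$ does not depend on which GI is used.

```latex
\begin{align*}
\big[XGX'\big]' = X\,G'X' = X\,G\,X'.
\end{align*}
```

The first equality is the transpose of a product; the second is the invariance of $X(X'X)^{-}X'$ to the choice of GI.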
We have shown that $X(X'X)^{-}X'$ is the orthogonal projection matrix
onto $C(X)$.
We know that it is the only symmetric projection matrix onto $C(X)$ by
Result A.16.
Example:
Find the orthogonal projection matrix onto $C(\mathbf{1})$, where $\mathbf{1}$ is the $n \times 1$ vector of ones.
Also, simplify $P_X y$ in this case.
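A worked sketch of this example, taking $X = \mathbf{1}$ (the $n \times 1$ vector of ones) and writing $\bar{y}$ for the sample mean of the entries of $y$:

```latex
\begin{align*}
X'X &= \mathbf{1}'\mathbf{1} = n, \qquad (X'X)^{-} = \tfrac{1}{n}, \\
P_X &= \mathbf{1}\,\tfrac{1}{n}\,\mathbf{1}' = \tfrac{1}{n}\,\mathbf{1}\mathbf{1}', \\
P_X y &= \tfrac{1}{n}\,\mathbf{1}\,(\mathbf{1}'y)
       = \Big(\tfrac{1}{n}\sum_{i=1}^{n} y_i\Big)\mathbf{1}
       = \bar{y}\,\mathbf{1}.
\end{align*}
```

So projecting onto $C(\mathbf{1})$ replaces every entry of $y$ by the sample mean.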
By Corollary A.4 to Result A.16, we know that
$I - P_X = I - X(X'X)^{-}X'$
is the orthogonal projection matrix onto $C(X)^{\perp} = N(X')$.
This is Result 2.6.
By Result A.4, any $y \in \mathbb{R}^n$ may be written as
$y = s + t$,
where $s \in C(X)$ and $t \in C(X)^{\perp} = N(X')$.
Furthermore, the vectors $s \in C(X)$ and $t \in C(X)^{\perp}$ are unique.
Because
$y = P_X y + (I - P_X)y$, with
$P_X y \in C(X)$ and $(I - P_X)y \in C(X)^{\perp}$,
we get $s = P_X y$ and $t = (I - P_X)y$.
Moreover,
$\hat{y} \equiv P_X y$ = the vector of fitted values
$\hat{\varepsilon} \equiv (I - P_X)y = y - \hat{y}$ = the vector of residuals.
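The decomposition into fitted values and residuals can be illustrated on a small made-up data set. In the sketch below, the design matrix `X`, the response `y`, and the variable names are assumptions for the example; `P` is the orthogonal projection matrix onto the column space of this particular `X` (it averages within each of two groups).

```python
from fractions import Fraction as F

# Hypothetical two-group design: intercept plus two indicators,
# two observations per group.
X = [[1, 1, 0], [1, 1, 0], [1, 0, 1], [1, 0, 1]]

# Orthogonal projection matrix onto C(X) for this X: within-group averaging.
P = [[F(1, 2), F(1, 2), 0, 0],
     [F(1, 2), F(1, 2), 0, 0],
     [0, 0, F(1, 2), F(1, 2)],
     [0, 0, F(1, 2), F(1, 2)]]

y = [3, 5, 10, 14]  # made-up response vector

# Fitted values P y and residuals (I - P) y = y - P y.
y_hat = [sum(p * v for p, v in zip(row, y)) for row in P]
resid = [v - f for v, f in zip(y, y_hat)]

# X' resid should be the zero vector: the residuals lie in N(X'),
# the orthogonal complement of C(X).
Xt_resid = [sum(x * r for x, r in zip(col, resid)) for col in zip(*X)]
```

Here the fitted values are the two group means repeated within each group, and the residual vector is orthogonal to every column of `X`.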
Let $P_W$ and $P_X$ denote the orthogonal projection matrices onto $C(W)$
and $C(X)$, respectively.
Suppose $C(W) \subseteq C(X)$. Show that
$P_W P_X = P_X P_W = P_W$.
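A sketch of one possible proof: every column of $P_W$ lies in $C(W) \subseteq C(X)$, and $P_X$ leaves vectors in $C(X)$ unchanged, which gives the first line below. The second line follows by transposing and using the symmetry of both matrices.

```latex
\begin{align*}
P_X P_W &= P_W, \\
P_W P_X &= P_W' P_X' = (P_X P_W)' = P_W' = P_W.
\end{align*}
```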
Theorem 2.2:
If $C(W) \subseteq C(X)$, then $P_X - P_W$ is the orthogonal projection matrix onto
$C((I - P_W)X)$.
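A proof sketch, relying on $P_W P_X = P_X P_W = P_W$ when $C(W) \subseteq C(X)$ and on the standard fact that a symmetric idempotent matrix is the orthogonal projection matrix onto its own column space:

```latex
\begin{align*}
(P_X - P_W)' &= P_X' - P_W' = P_X - P_W, \\
(P_X - P_W)^2 &= P_X - P_X P_W - P_W P_X + P_W
               = P_X - P_W - P_W + P_W = P_X - P_W, \\
P_X - P_W &= P_X - P_W P_X = (I - P_W)P_X, \\
C\big((I - P_W)P_X\big)
  &= \{(I - P_W)v : v \in C(X)\}
   = C\big((I - P_W)X\big).
\end{align*}
```

The last line uses $C(P_X) = C(X)$, so the column space of $P_X - P_W$ is $C((I - P_W)X)$.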
We have previously seen that
$Q(b) = (y - Xb)'(y - Xb) = \|y - Xb\|^2 \ge \|y - P_X y\|^2$
$\forall\, b \in \mathbb{R}^p$, with equality iff $Xb = P_X y$.
Now we know that $P_X = X(X'X)^{-}X'$.
Thus, $\hat{\beta}$ minimizes $Q(b)$ iff $X\hat{\beta} = X(X'X)^{-}X'y$.
By Result 2.4, this equation is equivalent to
$X'X\hat{\beta} = X'X(X'X)^{-}X'y$.
Because $X'X(X'X)^{-}X' = X'$, the equation $X'X\hat{\beta} = X'X(X'X)^{-}X'y$ is equivalent to
$X'X\hat{\beta} = X'y$.
This system of linear equations is known as the Normal Equations (NE).
We have established Result 2.3:
$\hat{\beta}$ is a solution to the NE ($X'Xb = X'y$) iff $\hat{\beta}$ minimizes $Q(b)$.
Corollary 2.1:
The NE are consistent.
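One way to see consistency, given as a sketch: exhibit a solution directly. Taking $b = (X'X)^{-}X'y$ for any GI of $X'X$ and using the Corollary to Result 2.5 ($X'X(X'X)^{-}X' = X'$),

```latex
\begin{align*}
X'X\,b = X'X(X'X)^{-}X'y = X'y,
\end{align*}
```

so the system $X'Xb = X'y$ has at least one solution for every $y \in \mathbb{R}^n$.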
By Result A.13, $\hat{\beta}$ is a solution to $X'Xb = X'y$ iff
$\hat{\beta} = (X'X)^{-}X'y + [I - (X'X)^{-}X'X]z$
for some $z \in \mathbb{R}^p$.
Corollary 2.3:
$X\hat{\beta}$ is invariant to the choice of a solution $\hat{\beta}$ to the NE, i.e., if $\hat{\beta}_1$ and $\hat{\beta}_2$
are any two solutions to the NE, then $X\hat{\beta}_1 = X\hat{\beta}_2$.
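The invariance can be seen on a small rank-deficient example. The sketch below is illustrative only: the design matrix `X`, the response `y`, the two generalized inverses `G1` and `G2`, and the helper `matvec` are assumptions made for the example. Each GI yields a different solution to the NE, yet both give the same fitted vector.

```python
from fractions import Fraction as F

def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

# Hypothetical two-group design (intercept plus two indicators); X'X is singular.
X = [[1, 1, 0], [1, 1, 0], [1, 0, 1], [1, 0, 1]]
XtX = [[4, 2, 2], [2, 2, 0], [2, 0, 2]]

# Two different generalized inverses of X'X.
G1 = [[F(0), F(0), F(0)], [F(0), F(1, 2), F(0)], [F(0), F(0), F(1, 2)]]
G2 = [[F(1, 2), F(-1, 2), F(0)], [F(-1, 2), F(1), F(0)], [F(0), F(0), F(0)]]

y = [3, 5, 10, 14]                            # made-up response vector
Xty = matvec([list(c) for c in zip(*X)], y)   # X'y

# Two solutions of the form (X'X)^- X'y, one per generalized inverse.
b1 = matvec(G1, Xty)
b2 = matvec(G2, Xty)

both_solve_ne = matvec(XtX, b1) == Xty and matvec(XtX, b2) == Xty
solutions_differ = b1 != b2
same_fit = matvec(X, b1) == matvec(X, b2)
```

In this example `b1` and `b2` are different coefficient vectors, but `X b1` and `X b2` both equal the vector of group means.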
We will finish this set of notes with some other results from Section 2.2
of the text.
Lemma 2.1: $N(X'X) = N(X)$.
Result 2.2: $C(X'X) = C(X')$.
Corollary 2.2: $\operatorname{rank}(X'X) = \operatorname{rank}(X)$.
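A sketch of how these three results can be linked, using the standard fact $C(A') = N(A)^{\perp}$ for any matrix $A$; the vector $v \in \mathbb{R}^p$ is a name introduced for this sketch.

```latex
\begin{align*}
Xv = 0 &\implies X'Xv = 0, \\
X'Xv = 0 &\implies v'X'Xv = \|Xv\|^2 = 0 \implies Xv = 0, \\
C(X'X) &= C\big((X'X)'\big) = N(X'X)^{\perp} = N(X)^{\perp} = C(X'), \\
\operatorname{rank}(X'X) &= \dim C(X'X) = \dim C(X') = \operatorname{rank}(X') = \operatorname{rank}(X).
\end{align*}
```

The first two lines give $N(X) \subseteq N(X'X)$ and $N(X'X) \subseteq N(X)$, which together are Lemma 2.1.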