The Orthogonal Projection Matrix onto C(X)
and the Normal Equations
© 2012 Dan Nettleton (Iowa State University), Statistics 611
Theorem 2.1:
PX, the orthogonal projection matrix onto C(X), is equal to X(X'X)⁻X'.
The matrix X(X'X)⁻X' satisfies
(a) [X(X'X)⁻X'][X(X'X)⁻X'] = X(X'X)⁻X'
(b) X(X'X)⁻X'y ∈ C(X) ∀ y ∈ ℝⁿ
(c) X(X'X)⁻X'x = x ∀ x ∈ C(X)
(d) X(X'X)⁻X' = [X(X'X)⁻X']'
(e) X(X'X)⁻X' is the same ∀ GI of X'X.
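These properties are easy to spot-check numerically. The following is a minimal Python/NumPy sketch, assuming a small hypothetical rank-deficient design matrix X (chosen only for illustration) and using the Moore-Penrose pseudoinverse as one particular GI of X'X; it checks (a), (c), and (d), and property (e) is checked with a second GI in a later sketch.

import numpy as np

# Hypothetical rank-deficient design (4 x 3, rank 2), chosen only for illustration
X = np.array([[1., 1., 0.],
              [1., 1., 0.],
              [1., 0., 1.],
              [1., 0., 1.]])

G = np.linalg.pinv(X.T @ X)      # one GI of X'X (the Moore-Penrose inverse)
P = X @ G @ X.T                  # candidate for PX = X(X'X)⁻X'

assert np.allclose(P @ P, P)     # (a) idempotent
assert np.allclose(P, P.T)       # (d) symmetric
assert np.allclose(P @ X, X)     # (c) P fixes every vector in C(X)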
It is useful to establish a few results before proving Theorem 2.1.
Show that if A (n × n) is a symmetric matrix and G (n × n) is a GI of A, then G' is also a GI of A.
Proof:
AGA = A ⇒ (AGA)' = A'
⇒ A'G'A' = A'
⇒ AG'A = A (∵ A = A')
∴ G' is a GI of A.
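As a numerical illustration (not part of the original notes), the sketch below builds a symmetric singular A, forms a generally non-symmetric GI G using the standard parametrization A⁺ + (I − A⁺A)U + V(I − AA⁺), and confirms that G' is also a GI of A.

import numpy as np
rng = np.random.default_rng(0)

B = rng.standard_normal((4, 2))
A = B @ B.T                                   # symmetric, singular (rank 2)

Ap = np.linalg.pinv(A)
U = rng.standard_normal((4, 4))               # arbitrary matrices for the parametrization
V = rng.standard_normal((4, 4))
G = Ap + (np.eye(4) - Ap @ A) @ U + V @ (np.eye(4) - A @ Ap)   # a (generally non-symmetric) GI of A

assert np.allclose(A @ G @ A, A)              # G is a GI of A
assert np.allclose(A @ G.T @ A, A)            # and so is G', as the proof shows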
Result 2.4:
X'XA = X'XB ⇐⇒ XA = XB.
One direction is obvious (⇐=).
Can you prove the other (=⇒)?
Proof (=⇒):
(XA − XB)'(XA − XB) = (A'X' − B'X')(XA − XB)
= A'X'XA − A'X'XB − B'X'XA + B'X'XB
= A'X'XA − A'X'XA − B'X'XA + B'X'XA (substituting X'XB = X'XA)
= 0.
∴ by Lemma A1,
XA − XB = 0 ⇒ XA = XB.
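A small illustrative sketch of Result 2.4, assuming a hypothetical rank-deficient X: when A and B differ only by a matrix whose columns lie in N(X), both X'XA = X'XB and XA = XB hold, as the equivalence requires.

import numpy as np
rng = np.random.default_rng(1)

X = np.array([[1., 1., 0.],
              [1., 1., 0.],
              [1., 0., 1.],
              [1., 0., 1.]])                   # rank 2, so N(X) is nontrivial
null_vec = np.array([[1.], [-1.], [-1.]])      # X @ null_vec = 0

A = rng.standard_normal((3, 2))
B = A + null_vec @ rng.standard_normal((1, 2)) # A and B differ only within N(X)

assert np.allclose(X.T @ X @ A, X.T @ X @ B)   # X'XA = X'XB ...
assert np.allclose(X @ A, X @ B)               # ... and indeed XA = XB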
Result 2.5:
Suppose (X'X)⁻ is any GI of X'X. Then (X'X)⁻X' is a GI of X, i.e.,
X(X'X)⁻X'X = X.
Proof of Result 2.5:
Since (X'X)⁻ is a GI of X'X,
X'X(X'X)⁻X'X = X'X.
By Result 2.4, the result follows.
(Take A = (X'X)⁻X'X and B = I. Then
XA = X(X'X)⁻X'X = X = XB.)
Corollary to Result 2.5:
For (X'X)⁻ any GI of X'X,
X'X(X'X)⁻X' = X'.
Proof of Corollary:
Suppose (X'X)⁻ is any GI of X'X.
∵ X'X is symmetric, [(X'X)⁻]' is also a GI of X'X.
Thus Result 2.5 implies that
X[(X'X)⁻]'X'X = X
⇒ {X[(X'X)⁻]'X'X}' = X'
⇒ X'X(X'X)⁻X' = X'.
Now we can prove Theorem 2.1:
First show X(X'X)⁻X' is idempotent.
X(X'X)⁻X'X(X'X)⁻X' = X(X'X)⁻X' by Result 2.5.
Now show that
X(X'X)⁻X'y ∈ C(X) ∀ y ∈ ℝⁿ.
X(X'X)⁻X'y = Xz, where z = (X'X)⁻X'y.
Thus, X(X'X)⁻X'y ∈ C(X).
Now show that
X(X'X)⁻X'z = z ∀ z ∈ C(X).
If z ∈ C(X), ∃ c ∋ z = Xc. ∴
X(X'X)⁻X'z = X(X'X)⁻X'Xc
= Xc (by Result 2.5)
= z.
Thus, we have X(X'X)⁻X'z = z ∀ z ∈ C(X).
Now we know X(X'X)⁻X' is a projection matrix onto C(X).
To be the orthogonal projection matrix onto C(X), X(X'X)⁻X' must also be
symmetric.
To show symmetry of X(X'X)⁻X', it will help to first show that
X(X'X)₁⁻X' = X(X'X)₂⁻X'
for any two GIs of X'X denoted (X'X)₁⁻ and (X'X)₂⁻.
By the Corollary to Result 2.5,
X' = X'X(X'X)₂⁻X'.
Therefore,
X(X'X)₁⁻X' = X(X'X)₁⁻X'X(X'X)₂⁻X'
= X(X'X)₂⁻X' by Result 2.5.
∴ X(X'X)⁻X' is the same regardless of which GI for X'X is used.
Now show that X(X'X)⁻X' is symmetric.
[X(X'X)⁻X']' = X[(X'X)⁻]'X'
= X(X'X)⁻X'
∵ symmetry of X'X implies that [(X'X)⁻]' is a GI of X'X, and X(X'X)⁻X'
is the same regardless of the choice of GI.
We have shown that X(X'X)⁻X' is the orthogonal projection matrix
onto C(X).
We know that it is the only symmetric projection matrix onto C(X) by
Result A.16.
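Property (e) and the symmetry just proved can also be checked numerically. In the sketch below, a second generalized inverse of X'X is manufactured from the Moore-Penrose one via the standard parametrization (an illustrative construction, not from the notes), and X(X'X)⁻X' comes out the same either way.

import numpy as np
rng = np.random.default_rng(2)

X = np.array([[1., 1., 0.],
              [1., 1., 0.],
              [1., 0., 1.],
              [1., 0., 1.]])
XtX = X.T @ X
I3 = np.eye(3)

G1 = np.linalg.pinv(XtX)                       # Moore-Penrose GI of X'X
U = rng.standard_normal((3, 3))
V = rng.standard_normal((3, 3))
G2 = G1 + (I3 - G1 @ XtX) @ U + V @ (I3 - XtX @ G1)   # a second GI of X'X
assert np.allclose(XtX @ G2 @ XtX, XtX)        # G2 really is a GI

P1, P2 = X @ G1 @ X.T, X @ G2 @ X.T
assert np.allclose(P1, P2)                     # same PX for either GI (property (e))
assert np.allclose(P1, P1.T)                   # and PX is symmetric (property (d))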
Example:
Find the orthogonal projection matrix onto C(1), where 1 is the n × 1 vector of ones.
Also, simplify PX y in this case.
X'X = 1'1 = n. Thus (X'X)⁻ = 1/n.
X(X'X)⁻X' = 1(1/n)1' = (1/n)11'.
Thus, when X = 1, PX is an n × n matrix whose entries are each 1/n.
We saw previously the special case
[ 1/2  1/2 ]
[ 1/2  1/2 ].
PX y = (1/n)11'y
= 1 (1/n) ∑ᵢ₌₁ⁿ yᵢ
= 1ȳ = (ȳ, ȳ, . . . , ȳ)'.
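A short numerical check of this example, with n = 4 and an arbitrary illustrative y:

import numpy as np

y = np.array([3., 5., 7., 9.])
n = len(y)
ones = np.ones((n, 1))

P = ones @ ones.T / n                            # PX = (1/n)11'
assert np.allclose(P @ y, np.full(n, y.mean()))  # PX y = 1*ybar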
By Corollary A.4 to Result A.16, we know that
I − PX = I − X(X'X)⁻X'
is the orthogonal projection matrix onto C(X)⊥ = N(X').
This is Result 2.6.
Fitted Values and Residuals
By Result A.4, any y ∈ ℝⁿ may be written as
y = s + t,
where s ∈ C(X) and t ∈ C(X)⊥ = N(X').
Furthermore, the vectors s ∈ C(X) and t ∈ C(X)⊥ are unique.
Because
y = PX y + (I − PX)y,
with PX y ∈ C(X) and (I − PX)y ∈ C(X)⊥,
we get s = PX y and t = (I − PX)y.
Moreover,
ŷ ≡ PX y = the vector of fitted values
ε̂ ≡ (I − PX )y = y − ŷ = the vector of residuals.
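A minimal sketch of this decomposition, assuming a hypothetical X and y: it forms ŷ and ε̂ through PX and confirms that the residual vector lies in C(X)⊥ = N(X').

import numpy as np

X = np.array([[1., 1., 0.],
              [1., 1., 0.],
              [1., 0., 1.],
              [1., 0., 1.]])
y = np.array([2., 4., 3., 7.])

P = X @ np.linalg.pinv(X.T @ X) @ X.T          # PX
yhat = P @ y                                   # fitted values, PX y
resid = y - yhat                               # residuals, (I - PX) y

assert np.allclose(yhat + resid, y)            # y = s + t with s = yhat, t = resid
assert np.allclose(X.T @ resid, 0)             # resid is in N(X') = C(X)-perp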
Let PW and PX denote the orthogonal projection matrices onto C(W)
and C(X), respectively.
Suppose C(W) ⊆ C(X). Show that
PW PX = PX PW = PW .
Proof:
C(W) ⊆ C(X) ⇒ ∃ B ∋ XB = W.
∴ PX PW = X(X'X)⁻X'W(W'W)⁻W'
= X(X'X)⁻X'XB(W'W)⁻W'
= XB(W'W)⁻W'
= W(W'W)⁻W'
= PW.
Now
PX PW = PW ⇒ (PX PW)' = PW'
⇒ PW'PX' = PW'
⇒ PW PX = PW.
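This nesting result is easy to verify numerically. In the sketch below, W is constructed as XB for a random B so that C(W) ⊆ C(X) holds by construction; all matrices are arbitrary and purely illustrative.

import numpy as np
rng = np.random.default_rng(3)

def proj(M):
    # Orthogonal projection matrix onto C(M), using the Moore-Penrose GI
    return M @ np.linalg.pinv(M.T @ M) @ M.T

X = rng.standard_normal((6, 3))
W = X @ rng.standard_normal((3, 2))            # guarantees C(W) ⊆ C(X)

PX, PW = proj(X), proj(W)
assert np.allclose(PX @ PW, PW)
assert np.allclose(PW @ PX, PW)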
Theorem 2.2:
If C(W) ⊆ C(X), then PX − PW is the orthogonal projection matrix onto
C((I − PW )X).
Proof of Theorem 2.2:
(PX − PW)' = PX' − PW' = PX − PW.
∴ PX − PW is symmetric.
(PX − PW )(PX − PW ) = PX PX − PW PX − PX PW + PW PW
= PX − PW − PW + PW
= PX − PW .
∴ PX − PW is idempotent.
Is (PX − PW )y ∈ C((I − PW )X) ∀ y?
(PX − PW )y = (PX − PW PX )y
= (I − PW )PX y
= (I − PW)X(X'X)⁻X'y
∈ C((I − PW )X).
Is (PX − PW)z = z ∀ z ∈ C((I − PW)X)?
(PX − PW )(I − PW )X = PX (I − PW )X − PW (I − PW )X
= (PX − PX PW )X − (PW − PW PW )X
= (PX − PW )X − (PW − PW )X
= (PX − PW PX )X
= (I − PW )PX X
= (I − PW )X.
Now suppose z ∈ C((I − PW)X).
Then z = (I − PW)Xc for some c. Therefore,
(PX − PW )z = (PX − PW )(I − PW )Xc
= (I − PW )Xc
= z.
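Theorem 2.2 as a whole can be spot-checked the same way: for nested column spaces, PX − PW should match the orthogonal projection onto C((I − PW)X). A minimal sketch with arbitrary illustrative matrices:

import numpy as np
rng = np.random.default_rng(4)

def proj(M):
    return M @ np.linalg.pinv(M.T @ M) @ M.T

X = rng.standard_normal((6, 3))
W = X[:, :1]                                   # one column of X, so C(W) ⊆ C(X)

PX, PW = proj(X), proj(W)
Q = proj((np.eye(6) - PW) @ X)                 # projection onto C((I - PW)X)
assert np.allclose(PX - PW, Q)                 # PX - PW is that projection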
We have previously seen that
Q(b) = (y − Xb)'(y − Xb)
= ‖y − Xb‖² ≥ ‖y − PX y‖²
∀ b ∈ ℝᵖ, with equality iff Xb = PX y.
Now we know that PX = X(X'X)⁻X'.
Thus, β̂ minimizes Q(b) iff Xβ̂ = X(X'X)⁻X'y.
By Result 2.4, this equation is equivalent to
X'Xβ̂ = X'X(X'X)⁻X'y.
Because X'X(X'X)⁻X' = X', X'Xβ̂ = X'X(X'X)⁻X'y is equivalent to
X'Xβ̂ = X'y.
This system of linear equations is known as the Normal Equations
(NE).
We have established Result 2.3:
β̂ is a solution to the NE (X'Xb = X'y) iff β̂ minimizes Q(b).
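A small illustration of Result 2.3, assuming a hypothetical X and y: solve the NE with a generalized inverse and then check, crudely via random perturbations, that no nearby b achieves a smaller Q(b).

import numpy as np
rng = np.random.default_rng(5)

X = np.array([[1., 1., 0.],
              [1., 1., 0.],
              [1., 0., 1.],
              [1., 0., 1.]])
y = np.array([2., 4., 3., 7.])

beta_hat = np.linalg.pinv(X.T @ X) @ X.T @ y   # one solution of X'Xb = X'y
assert np.allclose(X.T @ X @ beta_hat, X.T @ y)

Q = lambda b: float(np.sum((y - X @ b) ** 2))
assert all(Q(beta_hat + rng.standard_normal(3)) >= Q(beta_hat) - 1e-9
           for _ in range(1000))               # no randomly perturbed b does better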
Corollary 2.1:
The NE are consistent.
Proof of Corollary 2.1:
NE are X'Xb = X'y.
If we take β̂ = (X'X)⁻X'y, then
X'Xβ̂ = X'X(X'X)⁻X'y
= X'y.
By Result A.13, β̂ is a solution to X'Xb = X'y iff
β̂ = (X'X)⁻X'y + [I − (X'X)⁻X'X]z
for some z ∈ ℝᵖ.
Corollary 2.3:
Xβ̂ is invariant to the choice of a solution β̂ to the NE, i.e., if β̂₁ and β̂₂
are any two solutions to the NE, then Xβ̂₁ = Xβ̂₂.
Proof of Corollary 2.3:
X'Xβ̂₁ = X'Xβ̂₂ (= X'y)
⇒ Xβ̂₁ = Xβ̂₂ by Result 2.4.
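Result A.13 and Corollary 2.3 can be illustrated together: two distinct solutions of the NE, built from the general form above, give identical fitted values Xβ̂. The data below are arbitrary and only for illustration.

import numpy as np
rng = np.random.default_rng(6)

X = np.array([[1., 1., 0.],
              [1., 1., 0.],
              [1., 0., 1.],
              [1., 0., 1.]])
y = np.array([2., 4., 3., 7.])
G = np.linalg.pinv(X.T @ X)

beta1 = G @ X.T @ y                                                  # one NE solution
beta2 = beta1 + (np.eye(3) - G @ X.T @ X) @ rng.standard_normal(3)   # another, via Result A.13

assert np.allclose(X.T @ X @ beta2, X.T @ y)   # beta2 also solves the NE
assert np.allclose(X @ beta1, X @ beta2)       # but the fitted values agree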
We will finish this set of notes with some other results from Section 2.2
of the text.
Lemma 2.1: N(X'X) = N(X).
Result 2.2: C(X'X) = C(X').
Corollary 2.2: rank(X'X) = rank(X).
Proof of Lemma 2.1:
Xc = 0 ⇒ X'Xc = 0
∴ N(X) ⊆ N(X'X).
X'Xc = 0 ⇒ c'X'Xc = 0
⇒ Xc = 0 (∵ c'X'Xc = ‖Xc‖²)
∴ N(X'X) ⊆ N(X).
Thus, N(X'X) = N(X).
Proof of Result 2.2:
X'Xc = X'(Xc) ⇒ C(X'X) ⊆ C(X').
X'c = X'PX c = X'X(X'X)⁻X'c
= X'X[(X'X)⁻X'c]
⇒ C(X') ⊆ C(X'X).
Thus, C(X'X) = C(X').
Proof of Corollary 2.2:
C(X'X) = C(X')
⇒ dim(C(X'X)) = dim(C(X'))
⇒ rank(X'X) = rank(X')
⇒ rank(X'X) = rank(X).
(This can facilitate finding the (X'X)⁻ needed for β̂ and PX.)
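A quick numerical confirmation of Lemma 2.1 and Corollary 2.2, using the same illustrative rank-2 matrix as in the earlier sketches:

import numpy as np

X = np.array([[1., 1., 0.],
              [1., 1., 0.],
              [1., 0., 1.],
              [1., 0., 1.]])                   # rank 2

assert np.linalg.matrix_rank(X.T @ X) == np.linalg.matrix_rank(X)   # Corollary 2.2

c = np.array([1., -1., -1.])                   # a vector in N(X)
assert np.allclose(X @ c, 0) and np.allclose(X.T @ X @ c, 0)        # it lies in N(X'X) too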