The Geometry of Linear Regression
Guochang Zhao
RIEM, SWUFE
Fall 2016
November 3, 2016
The Geometry of Vector Spaces
Some notation

- Denote the real line by R, and the set of n-vectors by R^n.
- Denote a Euclidean space of n dimensions by E^n.
- Addition, subtraction, and multiplication by a scalar work for vectors in E^n exactly as they do for n × 1 vectors.
- Scalar product:
  ⟨x, y⟩ ≡ x^T y.
- The notation on the left is generally used in the context of the geometry of vectors, while the notation on the right is generally used in the context of matrix algebra.
The Geometry of Vector Spaces
Some notation

- The scalar product is commutative:
  ⟨x, y⟩ = ⟨y, x⟩.
- The length, or norm, of a vector x is simply
  ||x|| = (x^T x)^{1/2}.
- In scalar terms, this is
  ||x|| = ( Σ_{i=1}^{n} x_i² )^{1/2}.
Pythagoras' Theorem

- Consider a right-angled triangle ABC with hypotenuse AC and the other two sides, AB and BC, of lengths x1 and x2 respectively. Squares are drawn on each of the three sides of the triangle, and the area of the square on the hypotenuse is x1² + x2², in accordance with the theorem.

[Figure 2.1 Pythagoras' Theorem]
Pythagoras' Theorem

- A beautiful proof of Pythagoras' Theorem, not often found in geometry texts, is shown in Figure 2.2. Two squares of equal area are drawn. Each square contains four copies of the same right-angled triangle. The square on the left also contains the squares on the two shorter sides of the triangle, while the square on the right also contains the square on the hypotenuse. Since the four triangles occupy the same area in both large squares, the remaining areas must be equal, which proves the theorem.

[Figure 2.2 Proof of Pythagoras' Theorem (from Davidson and MacKinnon, 1999)]
Pythagoras' Theorem

- Any vector x ∈ E² has two components, usually denoted x1 and x2; these two components can be interpreted as the Cartesian coordinates of the vector in the plane.
[Figure 2.3 A vector x in E², drawn as an arrow from the origin O with Cartesian coordinates (x1, x2)]
Addition of vectors

- Geometrically, the sum x + y is obtained by moving a copy of y so that it starts at the head of x (equivalently, by completing the parallelogram with sides x and y).

[Figure 2.4 Addition of vectors: x, y, and x + y drawn as arrows from the origin O]
Multiplication by a scalar

[Figure 2.5 Multiplication by a scalar: the vectors x and αx lie on the same line through the origin]
- The vector αx is always parallel to x. For the norm, we have
  ||αx|| = ⟨αx, αx⟩^{1/2} = |α| (x^T x)^{1/2} = |α| · ||x||.
The geometry of scalar products

- The scalar product of two vectors x and y, whether in E² or E^n, can be expressed geometrically in terms of the lengths of the two vectors and the angle between them. In E², it is natural to think of the angle between two vectors as the angle between the two line segments that represent them.
- If the angle between two vectors is 0, they must be parallel: for example, y = αx. Then
  ⟨x, y⟩ = ⟨x, αx⟩ = α(x^T x) = α · ||x||².
- If α > 0, it follows that
  ⟨x, y⟩ = ||x|| · ||y||,
  since ||y|| = α ||x||. This result holds only if x and y are parallel and point in the same direction (rather than in opposite directions).
The geometry of scalar products

- Consider two vectors, w and z, both of length 1, and let θ denote the angle between them.
- Let w lie along the horizontal axis, so that w1 = 1 and w2 = 0. By elementary trigonometry, the coordinates of z must then be (cos θ, sin θ): the sine of the angle BOA in Figure 2.6 is given, by the usual trigonometric rule, by the ratio of the length of the opposite side AB to that of the hypotenuse OB, and this ratio is sin θ/1 = sin θ, so the angle BOA is indeed equal to θ.

[Figure 2.6 The angle between two vectors]

- The scalar product of w and z is therefore
  ⟨w, z⟩ = w^T z = w1 z1 + w2 z2 = z1 = cos θ,
  because w1 = 1 and w2 = 0.
The geometry of scalar products

- More generally, let x = αw and y = γz, for positive scalars α and γ.
- Then ||x|| = α, ||y|| = γ, and
  ⟨x, y⟩ = x^T y = αγ w^T z = αγ ⟨w, z⟩ = ||x|| · ||y|| · cos θ.
- The same result holds in E^n as in E².
- If two nonzero vectors have a scalar product of zero, they are at right angles; such vectors are said to be orthogonal or, less commonly, perpendicular.
- Since the cosine function can take on values only between −1 and 1, a consequence of the above equation is that
  |x^T y| ≤ ||x|| · ||y||.
  This is the Cauchy–Schwarz inequality.
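
As a quick sanity check, these identities are easy to verify numerically. The sketch below, in plain NumPy with two arbitrary example vectors, recovers the angle from the scalar product and confirms the Cauchy–Schwarz inequality.

```python
import numpy as np

x = np.array([3.0, 1.0, 2.0])   # arbitrary example vectors in E^3
y = np.array([1.0, 4.0, -1.0])

norm_x = np.sqrt(x @ x)          # ||x|| = (x'x)^{1/2}
norm_y = np.sqrt(y @ y)

cos_theta = (x @ y) / (norm_x * norm_y)   # <x, y> = ||x|| ||y|| cos(theta)
theta = np.arccos(cos_theta)

print("angle between x and y (radians):", theta)
print("Cauchy-Schwarz holds:", abs(x @ y) <= norm_x * norm_y)
```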
Subspaces of Euclidean Space

- A subspace of E^n generally has dimension lower than n.
- A subspace of particular interest is the one for which the columns of X provide the basis vectors. The subspace ∆(x1, x2, ..., xk) consists of every vector that can be formed as a linear combination of the xi, i = 1, ..., k. Formally, it is defined as
  ∆(x1, x2, ..., xk) ≡ { z ∈ E^n : z = Σ_{i=1}^{k} bi xi, bi ∈ R }.
- Here, x1, x2, ..., xk are the basis vectors, and they span this subspace ∆.
- Correspondingly, ∆ is called the column space of X or, less formally, the span of X, or the span of the xi.
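
As an illustration, one simple numerical way to check whether a given vector z lies in the column space of X is to compare the rank of X with the rank of the augmented matrix [X z]; the sketch below uses an arbitrary example matrix.

```python
import numpy as np

X = np.column_stack([
    np.ones(5),                     # x1: constant
    np.arange(1.0, 6.0),            # x2: trend
])
z_in = X @ np.array([2.0, 0.5])     # a linear combination of the columns -> in span(X)
z_out = np.array([1.0, 0.0, 0.0, 0.0, 0.0])  # generally not in span(X)

def in_span(X, z, tol=1e-10):
    # z is in the column space of X iff appending z does not raise the rank
    return np.linalg.matrix_rank(np.column_stack([X, z]), tol) == np.linalg.matrix_rank(X, tol)

print(in_span(X, z_in))    # True
print(in_span(X, z_out))   # False
```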
Subspaces of Euclidean Space

- The orthogonal complement of ∆(X) in E^n, denoted ∆⊥(X), is the set of all vectors w in E^n that are orthogonal to everything in ∆(X), that is, ⟨w, z⟩ = w^T z = 0 for every z in ∆(X). Formally,
  ∆⊥(X) ≡ { w ∈ E^n : w^T z = 0 for all z ∈ ∆(X) }.
- If ∆ is k-dimensional, ∆⊥ is (n − k)-dimensional.

[Figure 2.7 The spaces S(X) and S⊥(X), i.e. ∆(X) and ∆⊥(X) in our notation]
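
A small numerical sketch of the dimension statement, using an arbitrary full-column-rank X: the last n − k left singular vectors of X form a basis for the orthogonal complement.

```python
import numpy as np

n, k = 6, 2
rng = np.random.default_rng(0)
X = rng.standard_normal((n, k))          # n x k, full column rank with probability 1

U, s, Vt = np.linalg.svd(X)              # full SVD: U is n x n
complement_basis = U[:, k:]              # last n - k columns span the orthogonal complement

print(complement_basis.shape[1] == n - k)          # dimension is n - k
print(np.allclose(X.T @ complement_basis, 0.0))    # orthogonal to every column of X
```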
Subspaces of Euclidean Space

- A two-dimensional subspace, spanned by x1 and x2, can be drawn in the plane. Any vector in this subspace, for instance z ≡ b1 x1 + b2 x2, can be drawn by applying the rules for adding vectors geometrically (Figure 2.4) to the vectors b1 x1 and b2 x2.

[Figure 2.8 A 2-dimensional subspace: the linear combination b1 x1 + b2 x2, drawn for b1 = 2/3 and b2 = 1/2]
- The geometry of the linear regression model: the vector y is the sum of Xβ, which lies in the span of the regressors, and the disturbance vector u.

[Figure 2.9 The geometry of the linear regression model: y, Xβ, and u drawn from the origin O]
Linear independence

- Assumption 1.3, that X has full column rank, is equivalent to requiring that the columns of X be linearly independent.
- Linear independence is different from statistical independence.
- The vectors x1 through xk are said to be linearly dependent if there is a vector xj, 1 ≤ j ≤ k, and coefficients ci such that
  xj = Σ_{i≠j} ci xi.
- Or, in other words, there exist coefficients bi, at least one of which is nonzero, such that
  Σ_{i=1}^{k} bi xi = 0.
Linear independence
I linearly independent in matrix form:
Xb = 0,
where b is a k-vector with typical element bi .
I Why does linearly dependent mean XT T is not invertible?
I
Consider and vector a, and suppose that
XT Xa = c.
I
If XT X is invertible, then we could have
(XT X)−1 c = a,
I
(*)
However, linear independence also means
XT X(a + b) = c,
which wold give
(XT X)−1 c = a + b.
I
Equations (*) and (#) cannot hold at the same time.
(#)
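
A quick numerical illustration with a made-up X whose third column is an exact linear combination of the first two: X^T X then has rank below k and cannot be inverted.

```python
import numpy as np

x1 = np.ones(5)
x2 = np.arange(1.0, 6.0)
x3 = 2.0 * x1 + 3.0 * x2             # exact linear combination -> columns are dependent
X = np.column_stack([x1, x2, x3])

XtX = X.T @ X
print("rank of X'X:", np.linalg.matrix_rank(XtX))                             # 2, below k = 3
print("smallest singular value:", np.linalg.svd(XtX, compute_uv=False)[-1])   # (numerically) zero
```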
The Geometry of OLS estimation

- The matrix product Xβ can be written as
  Xβ = [x1 x2 · · · xk] [β1 β2 · · · βk]^T = x1 β1 + x2 β2 + · · · + xk βk = Σ_{i=1}^{k} βi xi.
- Because the columns are linearly independent, any element of the subspace ∆(X) = ∆(x1, ..., xk) can be written as Xβ for some β.
- Thus, every n-vector Xβ belongs to ∆(X), which is, in general, a k-dimensional subspace of E^n.
- The vector Xβ̂ constructed using the OLS estimator β̂ belongs to this subspace.
- The estimator β̂ is obtained by solving the normal equations:
  X^T (y − Xβ̂) = 0.
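
A minimal sketch with simulated data: solve the normal equations X^T X β̂ = X^T y and check that the residual vector is orthogonal to every column of X.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])   # constant + 2 regressors
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.standard_normal(n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # normal equations
residuals = y - X @ beta_hat

print(beta_hat)
print(np.allclose(X.T @ residuals, 0.0))       # X'(y - X beta_hat) = 0
```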
The Geometry of OLS estimation

- The fitted-value vector Xβ̂ lies in the span of the regressors, and the residual vector û = y − Xβ̂ is orthogonal to it, so y, Xβ̂, and û form a right-angled triangle.

[Figure 2.10 Residuals and fitted values: y, Xβ̂, and û, with θ the angle between y and Xβ̂]

- Linear regression in three dimensions:

[Figure 2.11 Linear regression in three dimensions: a) y projected on two regressors; b) the span S(x1, x2) of the regressors, with Xβ̂ = β̂1 x1 + β̂2 x2; c) the vertical plane through y]
Orthogonal Projections

- A projection is a mapping that takes each point of E^n into a point in a subspace of E^n, while leaving all points in that subspace unchanged.
- The subspace is called the invariant subspace of the projection.
- An orthogonal projection maps any point into the point of the subspace that is closest to it. If a point is already in the invariant subspace, it is mapped into itself.
- Algebraically, an orthogonal projection on to a given subspace can be performed by premultiplying the vector to be projected by a suitable projection matrix. For the span of X, the two n × n projection matrices are
  P_X ≡ X(X^T X)^{-1} X^T   and   M_X ≡ I_n − P_X.
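
A small sketch, again with an arbitrary X, of the two projection matrices and their defining properties: both are symmetric and idempotent, X is invariant under P_X, and the two projections annihilate each other.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 8, 3
X = rng.standard_normal((n, k))

P = X @ np.linalg.inv(X.T @ X) @ X.T     # projection on to the span of X
M = np.eye(n) - P                        # projection on to the orthogonal complement

print(np.allclose(P @ P, P), np.allclose(M @ M, M))   # idempotent
print(np.allclose(P.T, P), np.allclose(M.T, M))       # symmetric
print(np.allclose(P @ X, X))                          # X is invariant under P
print(np.allclose(P @ M, np.zeros((n, n))))           # complementary projections annihilate each other
```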
Orthogonal Projections

- Geometrically, the orthogonal decomposition y = P_X y + M_X y can be represented by a right-angled triangle, with y as the hypotenuse and P_X y and M_X y as the other two sides.
- In terms of projections, Pythagoras' Theorem can be written as
  ||y||² = ||P_X y||² + ||M_X y||².
- For any orthogonal projection matrix P_X and any vector y ∈ E^n, ||P_X y|| ≤ ||y||.
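
Continuing the same kind of sketch with arbitrary simulated data, the decomposition, the Pythagorean identity, and the norm inequality can all be confirmed numerically.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 8, 3
X = rng.standard_normal((n, k))
y = rng.standard_normal(n)

P = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(n) - P

print(np.allclose(y, P @ y + M @ y))                                # y = P_X y + M_X y
print(np.isclose(y @ y, (P @ y) @ (P @ y) + (M @ y) @ (M @ y)))     # ||y||^2 = ||P_X y||^2 + ||M_X y||^2
print(np.linalg.norm(P @ y) <= np.linalg.norm(y))                   # ||P_X y|| <= ||y||
```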
The Frisch–Waugh–Lovell Theorem

- Adding a constant to a regressor does not affect its slope coefficient.

[Figure 2.12 Adding a constant to a regressor: the fitted value ŷ can be written as β̂1 ι + β̂2 x or, equivalently, as α̂1 ι + α̂2 z, where z = x + cι and the slope coefficient is unchanged]
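
A numerical check of this claim with simulated data: regressing y on a constant and x gives the same slope as regressing y on a constant and x + c.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
x = rng.standard_normal(n)
y = 1.0 + 2.0 * x + rng.standard_normal(n)

iota = np.ones(n)
X1 = np.column_stack([iota, x])
X2 = np.column_stack([iota, x + 5.0])      # add a constant c = 5 to the regressor

b1 = np.linalg.solve(X1.T @ X1, X1.T @ y)
b2 = np.linalg.solve(X2.T @ X2, X2.T @ y)

print(b1[1], b2[1])                        # identical slope coefficients
print(np.isclose(b1[1], b2[1]))
```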
The Frisch–Waugh–Lovell Theorem

- Orthogonal regressors may be omitted without affecting the remaining coefficient estimates.

[Figure: ŷ = α̂1 ι + α̂2 z for orthogonal regressors ι and z]
The group of regressors

- Partition X (n × k) into X1 (n × k1) and X2 (n × k2), with k1 + k2 = k, so that the regression becomes
  y = X1 β1 + X2 β2 + u.   (2.34)
- If all the regressors in X1 are orthogonal to all the regressors in X2, so that X2^T X1 = 0, then the OLS estimate β̂1 obtained from (2.34) is the same as the one obtained from
  y = X1 β1 + u1.   (2.35)
  The same is true for β̂2.
- In addition, since the span of X1 is contained in the span of X, we have
  P1 P_X = P_X P1 = P1,   where P1 ≡ P_{X1} = X1 (X1^T X1)^{-1} X1^T.
- Thus, when X2^T X1 = 0 (so that P1 X2 = 0),
  P1 y = P1 P_X y = P1 (X1 β̂1 + X2 β̂2) = P1 X1 β̂1 = X1 β̂1,
  which shows that regressing y on X1 alone yields the same X1 β̂1.
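
A quick check of both claims, constructing X2 to be exactly orthogonal to X1 by projecting an arbitrary matrix off X1.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 60
X1 = np.column_stack([np.ones(n), rng.standard_normal(n)])
Z = rng.standard_normal((n, 2))
P1 = X1 @ np.linalg.inv(X1.T @ X1) @ X1.T
X2 = (np.eye(n) - P1) @ Z                   # make X2 exactly orthogonal to X1

X = np.column_stack([X1, X2])
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.standard_normal(n)

beta_full = np.linalg.solve(X.T @ X, X.T @ y)
beta_X1_only = np.linalg.solve(X1.T @ X1, X1.T @ y)

PX = X @ np.linalg.inv(X.T @ X) @ X.T
print(np.allclose(beta_full[:2], beta_X1_only))              # same beta_1 when X1 and X2 are orthogonal
print(np.allclose(P1 @ PX, P1), np.allclose(PX @ P1, P1))    # P1 PX = PX P1 = P1
```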
The Frisch–Waugh–Lovell Theorem

- If we transform the regressor matrix in (2.34) by adding X1 A to X2, where A is a k1 × k2 matrix, and leaving X1 as it is, we obtain the regression
  y = X1 α1 + (X2 + X1 A) α2 + u.   (2.38)
- The estimate α̂2 from Eq. (2.38) is the same as β̂2 from Eq. (2.34).
- Define M1 ≡ I − P1. Then we can run the regression
  y = X1 α1 + M1 X2 α2 + u
    = X1 α1 + (X2 − X1 (X1^T X1)^{-1} X1^T X2) α2 + u.
- This regression is a special case of Eq. (2.38), with A = −(X1^T X1)^{-1} X1^T X2.
- Moreover, α̂2 will be unchanged if we omit X1, because M1 X2 is orthogonal to X1.
The Frisch–Waugh–Lovell Theorem

- The following two regressions,
  y = X1 α1 + M1 X2 β2 + u,   (2.39)
  y = M1 X2 β2 + v,   (2.40)
  yield the same estimate β̂2.
- But they have different residuals. Consider instead the regression
  M1 y = M1 X2 β2 + ν.   (2.41)
  This regression is called the FWL regression.
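
A numerical sketch of these two claims with simulated data: regressions (2.39) and (2.40) deliver the same β̂2 but different residuals.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 80
X1 = np.column_stack([np.ones(n), rng.standard_normal(n)])
X2 = rng.standard_normal((n, 2))
y = np.column_stack([X1, X2]) @ np.array([1.0, 0.5, 2.0, -1.0]) + rng.standard_normal(n)

M1 = np.eye(n) - X1 @ np.linalg.inv(X1.T @ X1) @ X1.T
Z = M1 @ X2                                  # the regressor M1 X2

# (2.39): regress y on X1 and M1 X2
W = np.column_stack([X1, Z])
coef_39 = np.linalg.solve(W.T @ W, W.T @ y)
beta2_39 = coef_39[2:]
resid_39 = y - W @ coef_39

# (2.40): regress y on M1 X2 alone
beta2_40 = np.linalg.solve(Z.T @ Z, Z.T @ y)
resid_40 = y - Z @ beta2_40

print(np.allclose(beta2_39, beta2_40))       # same beta_2 estimate
print(np.allclose(resid_39, resid_40))       # False: the residuals differ
```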
The Frisch–Waugh–Lovell Theorem

The Frisch–Waugh–Lovell Theorem states:

- The OLS estimates of β2 from regressions (2.34) and (2.41) are numerically identical.
- The residuals from regressions (2.34) and (2.41) are numerically identical.
- Proof:
- Remark: the FWL Theorem allows us to obtain explicit expressions, based on
  β̂2 = (X2^T M1 X2)^{-1} X2^T M1 y,
  for subsets of the parameter estimates of a linear regression.
- Application and assignment: use the FWL Theorem to handle data with seasonal variation and a time trend.
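
A numerical verification of both parts of the theorem with simulated data: the β̂2 and the residuals from the full regression (2.34) coincide with those from the FWL regression (2.41).

```python
import numpy as np

rng = np.random.default_rng(7)
n = 80
X1 = np.column_stack([np.ones(n), np.arange(n, dtype=float)])      # constant and trend
X2 = rng.standard_normal((n, 2))
X = np.column_stack([X1, X2])
y = X @ np.array([1.0, 0.05, 2.0, -1.0]) + rng.standard_normal(n)

# Full regression (2.34)
beta = np.linalg.solve(X.T @ X, X.T @ y)
resid_full = y - X @ beta

# FWL regression (2.41): regress M1 y on M1 X2
P1 = X1 @ np.linalg.inv(X1.T @ X1) @ X1.T
M1 = np.eye(n) - P1
beta2_fwl = np.linalg.solve(X2.T @ M1 @ X2, X2.T @ M1 @ y)
resid_fwl = M1 @ y - M1 @ X2 @ beta2_fwl

print(np.allclose(beta[2:], beta2_fwl))        # identical beta_2
print(np.allclose(resid_full, resid_fwl))      # identical residuals
```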
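
One possible way to set up the seasonal/trend exercise (a sketch only, with made-up data): put the seasonal dummies and the time trend in X1, partial them out with M1, and compare the FWL slope with the full-regression slope.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 120                                                # e.g. 30 years of quarterly data
t = np.arange(n, dtype=float)
quarters = np.eye(4)[np.tile(np.arange(4), n // 4)]    # seasonal dummies
x = 0.1 * t + rng.standard_normal(n)                   # trending regressor of interest
y = 0.5 * x + 0.02 * t + quarters @ np.array([1.0, 0.5, -0.3, 0.2]) + rng.standard_normal(n)

X1 = np.column_stack([quarters, t])                    # seasonals + trend (no separate constant)
M1 = np.eye(n) - X1 @ np.linalg.inv(X1.T @ X1) @ X1.T

# FWL: regress deseasonalized/detrended y on deseasonalized/detrended x
slope_fwl = (x @ M1 @ y) / (x @ M1 @ x)

# Full regression for comparison
X = np.column_stack([X1, x])
slope_full = np.linalg.solve(X.T @ X, X.T @ y)[-1]

print(np.isclose(slope_fwl, slope_full))               # same coefficient on x
```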