CLASSICAL REGRESSION
THE MODEL
Jae Hyun GWON
Incheon Nat’l Univ.
Spring 2025
Jae Hyun GWON (Incheon Nat’l Univ.)
CLASSICAL REGRESSION
Spring 2025
1/9
Least Squares Estimator
Regression equation in matrix form
$y = X\beta + \varepsilon, \qquad \varepsilon \sim (0, \sigma^2 I)$
▶ The parameters to estimate are $\beta$ and $\sigma^2$.
LS Estimator
$\hat\beta = \arg\min_{\beta}\, (y - X\beta)'(y - X\beta)$
▶ This minimizes the “distance” between $y$ and $X\beta$.
▶ The minimizing $X\hat\beta$ always exists and is unique.
If $X$ has full column rank, then $\hat\beta = (X'X)^{-1}X'y$.
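As a minimal sketch of the closed form above, the following NumPy snippet computes $\hat\beta$ on simulated data. The sample size, the "true" coefficients, and the seed are illustrative assumptions, not part of the lecture. The normal equations are solved with `np.linalg.solve` rather than by forming the inverse explicitly, and the result is cross-checked against `np.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(0)          # arbitrary seed (assumption)
n = 200
# Design matrix: a constant column plus two simulated regressors
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 2.0, -0.5])  # hypothetical true coefficients
y = X @ beta_true + rng.normal(size=n)

# Solve the normal equations X'X b = X'y, i.e. b = (X'X)^{-1} X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's least-squares routine
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat)
print(np.allclose(beta_hat, beta_lstsq))
```

The two routes should agree up to floating-point error whenever $X$ has full column rank.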
Geometry of Ordinary Least Squares (OLS)
Least Squares Estimator
LS Estimator
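$\hat\sigma^2 = \frac{\sum_{i=1}^{n} \varepsilon_i^2}{n} = \frac{\varepsilon'\varepsilon}{n}$
▶ However, we cannot know what the true $\varepsilon_i$’s are.
▶ Instead, we substitute $\hat\varepsilon_i$ for $\varepsilon_i$:
$\hat\sigma^2 = \frac{\sum_{i=1}^{n} \hat\varepsilon_i^2}{n} = \frac{\hat\varepsilon'\hat\varepsilon}{n}.$
▶ Why should we minimize the sum of squared residuals?
⋆ No reason. It’s up to you.
⋆ It’s simply what the LS estimator is. That’s it.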
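The substitution of residuals for the unobservable errors can be sketched as follows. This is only an illustration on simulated data: the error variance of 4, the coefficients, and the seed are assumptions made here so that the true errors are available for comparison.

```python
import numpy as np

rng = np.random.default_rng(1)          # arbitrary seed (assumption)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
eps = rng.normal(scale=2.0, size=n)     # true errors with sigma^2 = 4 (assumption)
y = X @ np.array([0.5, 1.5]) + eps      # hypothetical true coefficients

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat                # residuals: the feasible stand-in for eps

sigma2_infeasible = eps @ eps / n       # uses the unobservable true errors
sigma2_hat = resid @ resid / n          # residual-based version

print(sigma2_infeasible, sigma2_hat)
```

Because $\hat\beta$ minimizes the sum of squared residuals, `sigma2_hat` cannot exceed `sigma2_infeasible` in any given sample.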
Geometry of LS
Decomposition
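$y = X\hat\beta + \hat\varepsilon$
$X\hat\beta = P_X y$, where $P_X = X(X'X)^{-1}X'$ is the orthogonal projection onto $R(X)$
$\hat\varepsilon = (I - P_X)y$
▶ $(I - P_X)$ is an “annihilator” of $X$: $(I - P_X)X = 0$.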
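The decomposition can be checked numerically. The sketch below takes $P_X = X(X'X)^{-1}X'$, the usual orthogonal projection onto the column space of $X$, and uses a small simulated data set (sizes and seed are arbitrary assumptions).

```python
import numpy as np

rng = np.random.default_rng(2)          # arbitrary seed (assumption)
n = 6
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = rng.normal(size=n)

P = X @ np.linalg.inv(X.T @ X) @ X.T    # projection matrix P_X
M = np.eye(n) - P                       # annihilator I - P_X

y_hat = P @ y                           # fitted values X beta_hat
resid = M @ y                           # residuals

print(np.allclose(P @ P, P))            # P_X is idempotent
print(np.allclose(M @ X, 0.0))          # (I - P_X) X = 0
print(np.allclose(y_hat + resid, y))    # y = X beta_hat + residuals
```

Forming the $n \times n$ matrices explicitly is fine for illustration but would be wasteful for large $n$.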
Decomposition of Sums of Squares
By the Pythagorean theorem, and since $P_X$ is symmetric and idempotent, we have
$y'y = y'P_X y + y'(I - P_X)y$
$= (P_X y)'(P_X y) + ((I - P_X)y)'((I - P_X)y)$
$= \hat y'\hat y + \hat\varepsilon'\hat\varepsilon.$
Hence,
TSS = ESS + RSS
where TSS stands for total sum of squares, ESS for explained sum of squares, and RSS for residual sum of squares.
▶ Some authors use ESS and RSS with exactly the opposite meanings: ESS is short for error sum of squares and RSS for regression sum of squares.
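A quick numerical check of the identity, using the slide's (uncentered) definitions of the three sums of squares. The simulated data, coefficients, and seed are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(3)          # arbitrary seed (assumption)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -1.0]) + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat
resid = y - y_hat

tss = y @ y            # total sum of squares, y'y
ess = y_hat @ y_hat    # explained sum of squares, y_hat'y_hat
rss = resid @ resid    # residual sum of squares

print(np.isclose(tss, ess + rss))
```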
Coefficient of Determination ($R^2$)
One of the statistics that indicates the overall fit of the regression is
$R^2 = \frac{y'P_X y}{y'y} = 1 - \frac{y'(I - P_X)y}{y'y}$
$R^2$ is bounded above by 1 and below by 0.
Note that $R^2$ increases as $R(X)$, the column space of $X$, grows.
▶ That explains why $R^2$ with cross-sectional data tends to be low (close to 0): the number of observations is far larger than the number of variables.
▶ We can infer that $R^2$ with time-series data tends to be high (close to 1), since $y$ is close to $R(X)$ in terms of distance.
▶ $R^2$ never decreases when we add a new variable.
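The effect of adding a variable can be illustrated with a short simulation. The helper name `r2_uncentered` is hypothetical, and the data (one relevant regressor plus one pure-noise regressor) are assumptions chosen for the example.

```python
import numpy as np

def r2_uncentered(X, y):
    """R^2 = 1 - y'(I - P_X)y / y'y, computed from least-squares residuals."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    return 1.0 - (resid @ resid) / (y @ y)

rng = np.random.default_rng(4)          # arbitrary seed (assumption)
n = 100
x1 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)
noise = rng.normal(size=n)              # regressor unrelated to y by construction

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([X_small, noise])

r2_small = r2_uncentered(X_small, y)
r2_big = r2_uncentered(X_big, y)

print(r2_small, r2_big)
```

Since the columns of `X_small` are contained in `X_big`, the residual sum of squares cannot go up, so `r2_big` is at least as large as `r2_small` even though the added regressor is irrelevant.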
Mean-Adjusted $R^2$ ($R_*^2$)
We can construct a similar indicator using de-meaned $y$’s.
$R_*^2 = 1 - \frac{y'(I - P_X)y}{y'(I - P_\iota)y}$
where $P_\iota$ is the orthogonal projection onto $R(\iota)$ with constant vector $\iota = (1 \; 1 \; \cdots \; 1)'$.
▶ $P_\iota$ averages the components of a vector (mean operator), whereas $(I - P_\iota)$ subtracts the average from each component (de-mean operator):
$P_\iota y = \frac{\iota\iota'}{n}\, y = (\bar y \; \cdots \; \bar y)'$ and $y'(I - P_\iota)y = \sum_{i=1}^{n} (y_i - \bar y)^2.$
▶ It’s possible to have $R_*^2 < 0$ if $X$ does NOT include the constant vector $\iota$, since
⋆ $R_*^2 < 0 \Leftrightarrow y'(I - P_X)y > y'(I - P_\iota)y.$
This is the R-squared reported by various statistical software packages.
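A negative value is easy to produce by hand. The sketch below uses a tiny made-up data set (an assumption for illustration) in which $y$ hovers around 10, so a regression without a constant fits far worse than the sample mean does. The helper name `r2_centered` is hypothetical.

```python
import numpy as np

def r2_centered(X, y):
    """R*^2 = 1 - y'(I - P_X)y / y'(I - P_iota)y."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    dev = y - y.mean()                  # (I - P_iota) y: de-meaned y
    return 1.0 - (resid @ resid) / (dev @ dev)

# Small made-up data set (illustrative assumption)
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.array([10.0, 9.0, 11.0, 10.0, 10.0])

X_with_const = np.column_stack([np.ones_like(x), x])
X_no_const = x.reshape(-1, 1)

r2_with = r2_centered(X_with_const, y)   # about 0.05
r2_without = r2_centered(X_no_const, y)  # about -249.95

print(r2_with, r2_without)
```

With the constant included, $R(\iota) \subset R(X)$, so the numerator can never exceed the denominator and the statistic stays in $[0, 1]$.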
Adjusted $R^2$ by Theil ($\bar R^2$)
As noted, $R^2$ or $R_*^2$ never decreases when a new variable is added.
▶ It doesn’t matter whether the added variable is related to $y$ or not.
To penalize adding irrelevant variables, Henri Theil suggested
$\bar R^2 = 1 - \frac{\frac{1}{n-m}\, y'(I - P_X)y}{\frac{1}{n-1}\, y'(I - P_\iota)y},$
where $m$ is the number of columns of $X$.
▶ It is called “adjusted R-squared” and most software packages report it.
$\bar R^2$ increases (decreases) when you add a new variable whose t-statistic is greater (less) than 1 in absolute value.
▶ Why is the magic number 1 important? No reason.
▶ In this sense, $\bar R^2$ is just an auxiliary index.
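Theil's formula can be computed directly, as in the sketch below. It assumes that $m$ is the number of columns of $X$ (constant included) and uses a tiny made-up data set; the helper name `adjusted_r2` is hypothetical.

```python
import numpy as np

def adjusted_r2(X, y):
    """Theil's adjusted R^2, taking m to be the number of columns of X."""
    n, m = X.shape
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    dev = y - y.mean()                  # de-meaned y
    return 1.0 - (resid @ resid / (n - m)) / (dev @ dev / (n - 1))

# Small made-up data set (illustrative assumption)
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.array([10.0, 9.0, 11.0, 10.0, 10.0])
X = np.column_stack([np.ones_like(x), x])

r2_bar = adjusted_r2(X, y)
print(r2_bar)
```

On these data the residual sum of squares is 1.9 and the de-meaned total is 2, so the degrees-of-freedom penalty pushes the statistic below zero even though a constant is included.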