Regression Analysis
Lecture 6: Regression Analysis
MIT 18.S096
Dr. Kempthorne
Fall 2013
Outline

1. Regression Analysis
   Linear Regression: Overview
   Ordinary Least Squares (OLS)
   Gauss-Markov Theorem
   Generalized Least Squares (GLS)
   Distribution Theory: Normal Regression Models
   Maximum Likelihood Estimation
   Generalized M Estimation
Multiple Linear Regression: Setup
Data Set
  n cases: i = 1, 2, . . . , n
  1 response (dependent) variable: y_i, i = 1, 2, . . . , n
  p explanatory (independent) variables: x_i = (x_{i,1}, x_{i,2}, . . . , x_{i,p})^T, i = 1, 2, . . . , n
Goal of Regression Analysis:
  Extract/exploit the relationship between y_i and x_i.
Examples
Prediction
Causal Inference
Approximation
Functional Relationships
General Linear Model: For each case i, the conditional
distribution [y_i | x_i] is given by
  y_i = ŷ_i + ε_i
where
  ŷ_i = β_1 x_{i,1} + β_2 x_{i,2} + · · · + β_p x_{i,p}
  β = (β_1, β_2, . . . , β_p)^T are the p regression parameters
  (constant over all cases)
  ε_i is the residual (error) variable
  (varies over the cases)
Extensive breadth of possible models:
  Polynomial approximation (x_{i,j} = (x_i)^j; the explanatory variables are different
  powers of the same variable x = x_i)
  Fourier series (x_{i,j} = sin(j x_i) or cos(j x_i); the explanatory variables are different
  sin/cos terms of a Fourier series expansion)
  Time series regressions (time indexed by i, and explanatory variables include
  lagged response values)
Note: Linearity of ŷ_i (in the regression parameters) is maintained even with non-linear x.
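As a concrete illustration of linearity in the parameters with non-linear explanatory variables, here is a short Python/NumPy sketch (hypothetical data and coefficients; the variable names are illustrative) that builds a cubic polynomial design matrix and fits it by ordinary least squares:

    import numpy as np

    # Hypothetical data: a single regressor x with a cubic trend plus noise.
    rng = np.random.default_rng(0)
    n = 100
    x = rng.uniform(-2, 2, size=n)
    y = 1.0 - 0.5 * x + 0.3 * x**3 + rng.normal(scale=0.2, size=n)

    # Polynomial explanatory variables: x_{i,j} = (x_i)^j, j = 0, 1, 2, 3.
    # The model stays linear in beta even though it is non-linear in x.
    X = np.column_stack([x**j for j in range(4)])     # n x 4 design matrix

    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS fit of the 4 parameters
    print(beta_hat)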
Steps for Fitting a Model
(1) Propose a model in terms of
Response variable Y (specify the scale)
Explanatory variables X1 , X2 , . . . Xp (include different
functions of explanatory variables if appropriate)
Assumptions about the distribution of ε over the cases
(2) Specify/define a criterion for judging different estimators.
(3) Characterize the best estimator and apply it to the given data.
(4) Check the assumptions in (1).
(5) If necessary modify model and/or assumptions and go to (1).
Specifying Assumptions in (1) for Residual Distribution
Gauss-Markov: zero mean, constant variance, uncorrelated
Normal-linear models: the ε_i are i.i.d. N(0, σ²) r.v.s
Generalized Gauss-Markov: zero mean, and a general covariance
matrix (possibly correlated, possibly heteroscedastic)
Non-normal/non-Gaussian distributions (e.g., Laplace, Pareto,
contaminated normal: some fraction (1 − δ) of the ε_i are i.i.d.
N(0, σ²) r.v.s and the remaining fraction (δ) follows some
contamination distribution).
Specifying Estimator Criterion in (2)
Least Squares
Maximum Likelihood
Robust (Contamination-resistant)
Bayes (assume βj are r.v.’s with known prior distribution)
Accommodating incomplete/missing data
Case Analyses for (4) Checking Assumptions
Residual analysis
Model errors ε_i are unobservable
Model residuals for fitted regression parameters β̃_j are:
  e_i = y_i − [β̃_1 x_{i,1} + β̃_2 x_{i,2} + · · · + β̃_p x_{i,p}]
Influence diagnostics (identify cases which are highly
‘influential’)
Outlier detection
Ordinary Least Squares Estimates
Least Squares Criterion: For β = (β_1, β_2, . . . , β_p)^T, define
  Q(β) = Σ_{i=1}^n [y_i − ŷ_i]²
       = Σ_{i=1}^n [y_i − (β_1 x_{i,1} + β_2 x_{i,2} + · · · + β_p x_{i,p})]²
The Ordinary Least-Squares (OLS) estimate β̂ minimizes Q(β).
Matrix Notation
  y = (y_1, y_2, . . . , y_n)^T is the n × 1 response vector,
  X is the n × p design matrix with (i, j) element x_{i,j} (row i is x_i^T),
  β = (β_1, . . . , β_p)^T is the p × 1 parameter vector.
Solving for OLS Estimate β̂
  ŷ = (ŷ_1, ŷ_2, . . . , ŷ_n)^T = Xβ and
  Q(β) = Σ_{i=1}^n (y_i − ŷ_i)² = (y − ŷ)^T(y − ŷ) = (y − Xβ)^T(y − Xβ)

OLS β̂ solves ∂Q(β)/∂β_j = 0, j = 1, 2, . . . , p

  ∂Q(β)/∂β_j = ∂/∂β_j Σ_{i=1}^n [y_i − (x_{i,1}β_1 + x_{i,2}β_2 + · · · + x_{i,p}β_p)]²
             = Σ_{i=1}^n 2(−x_{i,j})[y_i − (x_{i,1}β_1 + x_{i,2}β_2 + · · · + x_{i,p}β_p)]
             = −2(X_[j])^T(y − Xβ), where X_[j] is the jth column of X
Solving for OLS Estimate β̂
  ∂Q/∂β = (∂Q/∂β_1, ∂Q/∂β_2, . . . , ∂Q/∂β_p)^T
        = −2 (X_[1]^T(y − Xβ), X_[2]^T(y − Xβ), . . . , X_[p]^T(y − Xβ))^T
        = −2 X^T(y − Xβ)

So the OLS estimate β̂ solves the “Normal Equations”
  X^T(y − Xβ̂) = 0
  ⟺  X^T X β̂ = X^T y
  ⟹  β̂ = (X^T X)^{−1} X^T y

N.B. For β̂ to exist (uniquely),
  (X^T X) must be invertible
  ⟺  X must have full column rank.
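A minimal NumPy sketch of the Normal Equations (illustrative only; numerically it is usually preferable to solve the least-squares problem via a QR or SVD factorization rather than forming X^T X explicitly):

    import numpy as np

    def ols_normal_equations(X, y):
        """OLS estimate beta_hat solving the Normal Equations X'X beta = X'y.

        Assumes X (n x p) has full column rank so that X'X is invertible.
        """
        XtX = X.T @ X
        Xty = X.T @ y
        # np.linalg.solve factorizes X'X once; avoid computing its inverse.
        return np.linalg.solve(XtX, Xty)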
(Ordinary) Least Squares Fit
OLS Estimate:
  β̂ = (β̂_1, β̂_2, . . . , β̂_p)^T = (X^T X)^{−1} X^T y
Fitted Values:
  ŷ = (ŷ_1, ŷ_2, . . . , ŷ_n)^T, with ŷ_i = x_{i,1}β̂_1 + · · · + x_{i,p}β̂_p
  ŷ = Xβ̂ = X(X^T X)^{−1} X^T y = Hy
where
  H = X(X^T X)^{−1} X^T is the n × n “Hat Matrix”
(Ordinary) Least Squares Fit
The Hat Matrix H projects R^n onto the column space of X.
Residuals: ε̂_i = y_i − ŷ_i, i = 1, 2, . . . , n
  ε̂ = (ε̂_1, ε̂_2, . . . , ε̂_n)^T = y − ŷ = (I_n − H)y
Normal Equations: X^T(y − Xβ̂) = X^T ε̂ = 0_p
N.B. The least-squares residuals vector ε̂ is orthogonal to the
column space of X.
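A short numerical check of these facts (hypothetical data; H² = H because H is a projection, and X^T ε̂ = 0 is the orthogonality stated above):

    import numpy as np

    rng = np.random.default_rng(1)
    n, p = 50, 3
    X = rng.normal(size=(n, p))                       # hypothetical full-rank design
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

    H = X @ np.linalg.solve(X.T @ X, X.T)             # hat matrix H = X (X'X)^{-1} X'
    y_hat = H @ y                                     # fitted values
    resid = (np.eye(n) - H) @ y                       # residuals eps_hat = (I - H) y

    print(np.allclose(H @ H, H))                      # H is a projection: H^2 = H
    print(np.allclose(X.T @ resid, np.zeros(p)))      # residuals orthogonal to col(X)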
Gauss-Markov Theorem: Assumptions
Data y = (y_1, y_2, . . . , y_n)^T and X, the n × p matrix with (i, j) element x_{i,j},
follow a linear model satisfying the Gauss-Markov Assumptions
if y is an observation of a random vector Y = (Y_1, Y_2, . . . , Y_n)^T and
  E(Y | X, β) = Xβ, where β = (β_1, β_2, . . . , β_p)^T is the
  p-vector of regression parameters.
  Cov(Y | X, β) = σ² I_n, for some σ² > 0.
I.e., the random variables generating the observations are
uncorrelated and have constant variance σ² (conditional on X
and β).
Gauss-Markov Theorem
For known constants c_1, c_2, . . . , c_p, c_{p+1}, consider the problem of
estimating
  θ = c_1 β_1 + c_2 β_2 + · · · + c_p β_p + c_{p+1}.
Under the Gauss-Markov assumptions, the estimator
  θ̂ = c_1 β̂_1 + c_2 β̂_2 + · · · + c_p β̂_p + c_{p+1},
where β̂_1, β̂_2, . . . , β̂_p are the least squares estimates, is
1) an Unbiased Estimator of θ
2) a Linear Estimator of θ, that is,
   θ̂ = Σ_{i=1}^n b_i y_i, for some known (given X) constants b_i.
Theorem: Under the Gauss-Markov Assumptions, the estimator
θ̂ has the smallest (Best) variance among all Linear Unbiased
Estimators of θ, i.e., θ̂ is BLUE.
Gauss-Markov Theorem: Proof
Proof: Without loss of generality, assume c_{p+1} = 0 and
define c = (c_1, c_2, . . . , c_p)^T.
The Least Squares Estimate of θ = c^T β is:
  θ̂ = c^T β̂ = c^T(X^T X)^{−1} X^T y ≡ d^T y,
a linear estimate in y given by the coefficients d = (d_1, d_2, . . . , d_n)^T.
Consider an alternative linear estimate of θ:
  θ̃ = b^T y
with fixed coefficients given by b = (b_1, . . . , b_n)^T.
Define f = b − d and note that
  θ̃ = b^T y = (d + f)^T y = θ̂ + f^T y
If θ̃ is unbiased then, because θ̂ is unbiased,
  0 = E(f^T y) = f^T E(y) = f^T(Xβ) for all β ∈ R^p
  ⟹ f is orthogonal to the column space of X
  ⟹ f is orthogonal to d = X(X^T X)^{−1} c
If θ̃ is unbiased, the orthogonality of f to d implies
  Var(θ̃) = Var(b^T y) = Var(d^T y + f^T y)
          = Var(d^T y) + Var(f^T y) + 2 Cov(d^T y, f^T y)
          = Var(θ̂) + Var(f^T y) + 2 d^T Cov(y) f
          = Var(θ̂) + Var(f^T y) + 2 d^T(σ² I_n) f
          = Var(θ̂) + Var(f^T y) + 2σ² d^T f
          = Var(θ̂) + Var(f^T y) + 2σ² × 0
          ≥ Var(θ̂)
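The BLUE property can also be illustrated by simulation. The sketch below is entirely illustrative: the competing estimator (W^T X)^{-1} W^T y, with W a fixed function of X, is just one arbitrary linear unbiased alternative, and it compares sampling variances for θ = c^T β:

    import numpy as np

    rng = np.random.default_rng(2)
    n, sigma = 40, 1.0
    X = rng.normal(size=(n, 2))
    beta = np.array([1.0, 2.0])
    c = np.array([1.0, -1.0])                          # theta = c' beta

    W = np.column_stack([X[:, 0], X[:, 1] ** 3])       # a fixed function of X, W'X invertible

    theta_ols, theta_alt = [], []
    for _ in range(5000):
        y = X @ beta + sigma * rng.normal(size=n)
        b_ols = np.linalg.solve(X.T @ X, X.T @ y)      # OLS (BLUE)
        b_alt = np.linalg.solve(W.T @ X, W.T @ y)      # another linear unbiased estimator
        theta_ols.append(c @ b_ols)
        theta_alt.append(c @ b_alt)

    # Both are (approximately) unbiased, but OLS has the smaller variance.
    print(np.mean(theta_ols), np.mean(theta_alt))      # both near c' beta = -1
    print(np.var(theta_ols) <= np.var(theta_alt))      # True, up to Monte Carlo error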
Generalized Least Squares (GLS) Estimates
Consider generalizing the Gauss-Markov assumptions for the linear
regression model to
  Y = Xβ + ε, where the random n-vector ε satisfies E[ε] = 0_n and E[εε^T] = σ² Σ.
  σ² is an unknown scale parameter.
  Σ is a known (n × n) positive definite matrix specifying the
  relative variances and correlations of the component
  observations.
Transform the data (Y, X) to Y* = Σ^{−1/2} Y and X* = Σ^{−1/2} X, and
the model becomes
  Y* = X* β + ε*, where E[ε*] = 0_n and E[ε*(ε*)^T] = σ² I_n.
By the Gauss-Markov Theorem, the BLUE (‘GLS’) estimate of β is
  β̂ = [(X*)^T(X*)]^{−1}(X*)^T(Y*) = [X^T Σ^{−1} X]^{−1}(X^T Σ^{−1} Y)
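A minimal NumPy sketch of this computation (illustrative only; it assumes Σ is known and positive definite, and uses a Cholesky factor in place of the symmetric square root Σ^{-1/2}, which gives the same estimate):

    import numpy as np

    def gls(X, y, Sigma):
        """GLS estimate beta_hat = (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} y,
        computed by whitening the data and applying OLS to (X*, y*)."""
        L = np.linalg.cholesky(Sigma)                  # Sigma = L L'
        X_star = np.linalg.solve(L, X)                 # L^{-1} X plays the role of Sigma^{-1/2} X
        y_star = np.linalg.solve(L, y)                 # L^{-1} y
        beta_hat, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
        return beta_hat

Any factor C with Σ = C C^T works here, since (C^{-1}X)^T(C^{-1}X) = X^T Σ^{-1} X; the Cholesky factor is simply the cheapest such choice.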
Normal Linear Regression Models
Distribution Theory
  Y_i = x_{i,1}β_1 + x_{i,2}β_2 + · · · + x_{i,p}β_p + ε_i
      = μ_i + ε_i
Assume {ε_1, ε_2, . . . , ε_n} are i.i.d. N(0, σ²).
  ⟹ [Y_i | x_{i,1}, x_{i,2}, . . . , x_{i,p}, β, σ²] ~ N(μ_i, σ²),
  independent over i = 1, 2, . . . , n.
Conditioning on X, β, and σ²:
  Y = Xβ + ε, where ε = (ε_1, ε_2, . . . , ε_n)^T ~ N_n(0_n, σ² I_n)
Distribution Theory
  μ = (μ_1, . . . , μ_n)^T = E(Y | X, β, σ²) = Xβ
  Σ = Cov(Y | X, β, σ²) = diag(σ², σ², . . . , σ²) = σ² I_n
That is, Σ_{i,j} = Cov(Y_i, Y_j | X, β, σ²) = σ² × δ_{i,j}.
Apply Moment-Generating Functions (MGFs) to derive
  the joint distribution of Y = (Y_1, Y_2, . . . , Y_n)^T
  the joint distribution of β̂ = (β̂_1, β̂_2, . . . , β̂_p)^T.
MGF of Y
For the n-variate r.v. Y and a constant n-vector t = (t_1, . . . , t_n)^T,
  M_Y(t) = E(e^{t^T Y}) = E(e^{t_1 Y_1 + t_2 Y_2 + · · · + t_n Y_n})
         = E(e^{t_1 Y_1}) · E(e^{t_2 Y_2}) · · · E(e^{t_n Y_n})   (by independence)
         = M_{Y_1}(t_1) · M_{Y_2}(t_2) · · · M_{Y_n}(t_n)
         = Π_{i=1}^n e^{t_i μ_i + (1/2) t_i² σ²}
         = e^{Σ_{i=1}^n t_i μ_i + (1/2) Σ_{i,k=1}^n t_i Σ_{i,k} t_k} = e^{t^T μ + (1/2) t^T Σ t}
  ⟹ Y ~ N_n(μ, Σ),
Multivariate Normal with mean μ and covariance Σ.
MGF of β̂
For the p-variate r.v. β̂ and a constant p-vector τ = (τ_1, . . . , τ_p)^T,
  M_β̂(τ) = E(e^{τ^T β̂}) = E(e^{τ_1 β̂_1 + τ_2 β̂_2 + · · · + τ_p β̂_p})
Defining A = (X^T X)^{−1} X^T, we can express
  β̂ = (X^T X)^{−1} X^T y = AY
and
  M_β̂(τ) = E(e^{τ^T β̂})
          = E(e^{τ^T A Y})
          = E(e^{t^T Y}), with t = A^T τ
          = M_Y(t)
          = e^{t^T μ + (1/2) t^T Σ t}
MGF of β̂
For
  M_β̂(τ) = E(e^{τ^T β̂}) = e^{t^T μ + (1/2) t^T Σ t}
plug in:
  t = A^T τ = X(X^T X)^{−1} τ
  μ = Xβ
  Σ = σ² I_n
This gives:
  t^T μ = τ^T β
  t^T Σ t = τ^T(X^T X)^{−1} X^T [σ² I_n] X(X^T X)^{−1} τ
          = τ^T [σ²(X^T X)^{−1}] τ
So the MGF of β̂ is
  M_β̂(τ) = e^{τ^T β + (1/2) τ^T [σ²(X^T X)^{−1}] τ}
Marginal Distributions of Least Squares Estimates
Because
  β̂ ~ N_p(β, σ²(X^T X)^{−1}),
the marginal distribution of each β̂_j is
  β̂_j ~ N(β_j, σ² C_{j,j})
where C_{j,j} is the jth diagonal element of (X^T X)^{−1}.
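This sampling distribution is easy to verify by simulation. The sketch below (hypothetical design and parameter values) draws many response vectors from the model, recomputes β̂ each time, and compares the empirical mean and covariance of the draws with β and σ²(X^T X)^{-1}:

    import numpy as np

    rng = np.random.default_rng(3)
    n, sigma = 60, 0.5
    X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + one regressor
    beta = np.array([2.0, -1.0])

    draws = np.array([
        np.linalg.solve(X.T @ X, X.T @ (X @ beta + sigma * rng.normal(size=n)))
        for _ in range(20000)
    ])

    print(draws.mean(axis=0))                      # approx. beta (unbiased)
    print(np.cov(draws, rowvar=False))             # approx. sigma^2 (X'X)^{-1}
    print(sigma**2 * np.linalg.inv(X.T @ X))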
The Q-R Decomposition of X
Consider expressing the (n × p) matrix X of explanatory variables
as
X=Q·R
where
Q is an (n × p) orthonormal matrix, i.e., QT Q = Ip .
R is a (p × p) upper-triangular matrix.
The columns of Q = [Q[1] , Q[2] , . . . , Q[p] ] can be constructed by
performing the Gram-Schmidt Orthonormalization procedure on
the columns of X = [X[1] , X[2] , . . . , X[p] ]
If R is the p × p upper-triangular matrix with entries r_{k,j} (so r_{k,j} = 0 for k > j), then
  X_[1] = Q_[1] r_{1,1}
  ⟹ r_{1,1}² = X_[1]^T X_[1]
     Q_[1] = X_[1]/r_{1,1}
  X_[2] = Q_[1] r_{1,2} + Q_[2] r_{2,2}
  ⟹ Q_[1]^T X_[2] = Q_[1]^T Q_[1] r_{1,2} + Q_[1]^T Q_[2] r_{2,2}
                  = 1 · r_{1,2} + 0 · r_{2,2}
                  = r_{1,2}  (known since Q_[1] is specified)
With r_{1,2} and Q_[1] specified we can solve for r_{2,2}:
  Q_[2] r_{2,2} = X_[2] − Q_[1] r_{1,2}
Take the squared norm of both sides:
  r_{2,2}² = X_[2]^T X_[2] − 2 r_{1,2} Q_[1]^T X_[2] + r_{1,2}²
  (all terms on the RHS are known)
With r_{2,2} specified,
  ⟹ Q_[2] = (1/r_{2,2})(X_[2] − r_{1,2} Q_[1])
Etc. (solve for the elements of R and the columns of Q).
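The same recursion, continued for all p columns, is the classical Gram-Schmidt construction. A compact NumPy sketch (illustrative and not numerically hardened; production code would call numpy.linalg.qr instead) is:

    import numpy as np

    def gram_schmidt_qr(X):
        """Classical Gram-Schmidt Q-R decomposition of an n x p full-column-rank X."""
        n, p = X.shape
        Q = np.zeros((n, p))
        R = np.zeros((p, p))
        for j in range(p):
            v = X[:, j].copy()
            for k in range(j):
                R[k, j] = Q[:, k] @ X[:, j]       # r_{k,j} = Q_[k]' X_[j]
                v -= R[k, j] * Q[:, k]
            R[j, j] = np.linalg.norm(v)           # r_{j,j} = norm of the leftover component
            Q[:, j] = v / R[j, j]                 # Q_[j]
        return Q, R

    X = np.random.default_rng(4).normal(size=(6, 3))
    Q, R = gram_schmidt_qr(X)
    print(np.allclose(Q @ R, X), np.allclose(Q.T @ Q, np.eye(3)))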
With the Q-R Decomposition X = QR
(Q^T Q = I_p, and R is p × p upper-triangular):
  β̂ = (X^T X)^{−1} X^T y = R^{−1} Q^T y
  (plug in X = QR and simplify)
  Cov(β̂) = σ²(X^T X)^{−1} = σ² R^{−1}(R^{−1})^T
  H = X(X^T X)^{−1} X^T = QQ^T
  (giving ŷ = Hy and ε̂ = (I_n − H)y)
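These identities are also how least-squares fits are computed in practice. A minimal sketch using NumPy's QR routine and a triangular solve (SciPy assumed available):

    import numpy as np
    from scipy.linalg import solve_triangular

    def ols_via_qr(X, y):
        """OLS via X = QR: beta_hat = R^{-1} Q'y by back-substitution,
        fitted values y_hat = Q Q' y; avoids forming (X'X)^{-1} explicitly."""
        Q, R = np.linalg.qr(X)                     # reduced QR: Q is n x p, R is p x p
        beta_hat = solve_triangular(R, Q.T @ y)    # solve R beta = Q'y
        y_hat = Q @ (Q.T @ y)                      # H y with H = Q Q'
        return beta_hat, y_hat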
More Distribution Theory
Assume y = Xβ + ε, where the {ε_i} are i.i.d. N(0, σ²), i.e.,
  ε ~ N_n(0_n, σ² I_n), or y ~ N_n(Xβ, σ² I_n).
Theorem*: For any (m × n) matrix A of rank m ≤ n, the random
normal vector y transformed by A,
  z = Ay,
is also a random normal vector:
  z ~ N_m(μ_z, Σ_z)
where
  μ_z = A E(y) = AXβ, and
  Σ_z = A Cov(y) A^T = σ² AA^T.
Earlier, A = (X^T X)^{−1} X^T yields the distribution of β̂ = Ay.
With a different definition of A (and z) we give an easy proof of:
Theorem: For the normal linear regression model
  y = Xβ + ε,
where X (n × p) has rank p and ε ~ N_n(0_n, σ² I_n):
(a) β̂ = (X^T X)^{−1} X^T y and ε̂ = y − Xβ̂ are independent r.v.s
(b) β̂ ~ N_p(β, σ²(X^T X)^{−1})
(c) Σ_{i=1}^n ε̂_i² = ε̂^T ε̂ ~ σ² χ²_{n−p} (a scaled Chi-squared r.v.)
(d) For each j = 1, 2, . . . , p,
    t̂_j = (β̂_j − β_j)/(σ̂ √C_{j,j}) ~ t_{n−p} (t-distribution)
    where σ̂² = (1/(n − p)) Σ_{i=1}^n ε̂_i² and C_{j,j} = [(X^T X)^{−1}]_{j,j}
Proof: Note that (d) follows immediately from (a), (b), (c).
Define A = [Q, W]^T, the (n × n) matrix whose first p rows are Q^T and whose
last n − p rows are W^T, where
  A is an (n × n) orthogonal matrix (i.e., A^T = A^{−1})
  Q is the column-orthonormal matrix in a Q-R decomposition of X.
Note: W can be constructed by continuing the Gram-Schmidt
orthonormalization process (which was used to construct Q from
X) with X* = [ X I_n ].
Then, consider
  z = Ay, with z_Q = Q^T y (p × 1) stacked above z_W = W^T y ((n − p) × 1).
The distribution of z = Ay is N_n(μ_z, Σ_z), where
  μ_z = A[Xβ] = [Q, W]^T [Q · R · β]: its first p components are
        Q^T Q R β = R β and its last n − p components are
        W^T Q R β = 0_{n−p}, since Q^T Q = I_p and W^T Q = 0_{(n−p)×p};
  Σ_z = A · [σ² I_n] · A^T = σ²[AA^T] = σ² I_n,
        since A^T = A^{−1}.
Thus z = (z_Q, z_W)^T ~ N_n((Rβ, 0_{n−p})^T, σ² I_n)
  ⟹ z_Q ~ N_p(Rβ, σ² I_p),
     z_W ~ N_{n−p}(0_{n−p}, σ² I_{n−p}),
     and z_Q and z_W are independent.
The Theorem follows by showing:
(a*) β̂ = R^{−1} z_Q and ε̂ = W z_W
     (i.e., β̂ and ε̂ are functions of different, independent vectors).
(b*) Deducing the distribution of β̂ = R^{−1} z_Q by
     applying Theorem* with A = R^{−1} and “y” = z_Q.
(c*) ε̂^T ε̂ = z_W^T z_W
     = sum of (n − p) squared r.v.s which are i.i.d. N(0, σ²)
     ~ σ² χ²_{n−p}, a scaled Chi-squared r.v.
Proof of (a*)
β̂ = R^{−1} z_Q follows from
  β̂ = (X^T X)^{−1} X^T y and X = QR with Q^T Q = I_p.
ε̂ = y − ŷ = y − Xβ̂ = y − (QR) · (R^{−1} z_Q)
  = y − Q z_Q
  = y − QQ^T y = (I_n − QQ^T)y
  = WW^T y   (since I_n = A^T A = QQ^T + WW^T)
  = W z_W
Maximum-Likelihood Estimation
Consider the normal linear regression model:
  y = Xβ + ε, where the {ε_i} are i.i.d. N(0, σ²), i.e.,
  ε ~ N_n(0_n, σ² I_n), or y ~ N_n(Xβ, σ² I_n).
Definitions:
The likelihood function is
  L(β, σ²) = p(y | X, β, σ²)
where p(y | X, β, σ²) is the joint probability density function
(pdf) of the conditional distribution of y given the data X
(known) and the parameters (β, σ²) (unknown).
The maximum likelihood estimates of (β, σ²) are the values
maximizing L(β, σ²), i.e., those which make the observed
data y most likely in terms of its pdf.
Because the y_i are independent r.v.s with y_i ~ N(μ_i, σ²), where
μ_i = Σ_{j=1}^p β_j x_{i,j},
  L(β, σ²) = Π_{i=1}^n p(y_i | β, σ²)
           = Π_{i=1}^n (1/√(2πσ²)) e^{−(1/(2σ²))(y_i − Σ_{j=1}^p β_j x_{i,j})²}
           = (1/(2πσ²)^{n/2}) e^{−(1/2)(y − Xβ)^T(σ² I_n)^{−1}(y − Xβ)}
The maximum likelihood estimates (β̂, σ̂²) maximize the
log-likelihood function (dropping constant terms)
  log L(β, σ²) = −(n/2) log(σ²) − (1/2)(y − Xβ)^T(σ² I_n)^{−1}(y − Xβ)
               = −(n/2) log(σ²) − (1/(2σ²)) Q(β)
where Q(β) = (y − Xβ)^T(y − Xβ) (the “Least-Squares Criterion”!)
  ⟹ The OLS estimate β̂ is also the ML estimate.
The ML estimate of σ² solves
  ∂ log L(β̂, σ²)/∂(σ²) = 0, i.e., −(n/2)(1/σ²) − (1/2)(−1)(σ²)^{−2} Q(β̂) = 0
  ⟹ σ̂²_ML = Q(β̂)/n = (Σ_{i=1}^n ε̂_i²)/n   (biased!)
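The bias of σ̂²_ML is easy to see numerically; the sketch below (hypothetical data, true σ² = 2) compares the ML estimate Q(β̂)/n with the unbiased estimate Q(β̂)/(n − p) over repeated samples:

    import numpy as np

    rng = np.random.default_rng(5)
    n, p, sigma2 = 30, 4, 2.0
    X = rng.normal(size=(n, p))

    ml, unbiased = [], []
    for _ in range(10000):
        y = X @ np.ones(p) + np.sqrt(sigma2) * rng.normal(size=n)
        resid = y - X @ np.linalg.solve(X.T @ X, X.T @ y)
        Q = resid @ resid
        ml.append(Q / n)                      # ML estimate (biased downward)
        unbiased.append(Q / (n - p))          # divides by n - p instead

    print(np.mean(ml), np.mean(unbiased), sigma2)   # E[Q/n] = sigma^2 (n - p)/n < sigma^2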
Generalized M Estimation
For data y, X fit the linear regression model
  y_i = x_i^T β + ε_i, i = 1, 2, . . . , n,
by specifying β = β̂ to minimize
  Q(β) = Σ_{i=1}^n h(y_i, x_i, β, σ²)
The choice of the function h( ) distinguishes different estimators (a numerical sketch follows the list).
(1) Least Squares: h(y_i, x_i, β, σ²) = (y_i − x_i^T β)²
(2) Mean Absolute Deviation (MAD): h(y_i, x_i, β, σ²) = |y_i − x_i^T β|
(3) Maximum Likelihood (ML): Assume the y_i are independent
    with pdfs p(y_i | β, x_i, σ²);
    h(y_i, x_i, β, σ²) = −log p(y_i | β, x_i, σ²)
(4) Robust M-Estimator: h(y_i, x_i, β, σ²) = χ(y_i − x_i^T β),
    where χ( ) is even and monotone increasing on (0, ∞).
(5) Quantile Estimator: For τ: 0 < τ < 1, a fixed quantile,
    h(y_i, x_i, β, σ²) = τ |y_i − x_i^T β|, if y_i ≥ x_i^T β,
                       = (1 − τ)|y_i − x_i^T β|, if y_i < x_i^T β.
E.g., τ = 0.90 corresponds to the 90th quantile /
upper decile;
τ = 0.50 corresponds to the MAD Estimator.
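All of the estimators above fit a single computational template: choose h and minimize Q(β) = Σ_i h(y_i − x_i^T β) numerically. The sketch below (Python/NumPy with SciPy's general-purpose optimizer; the particular losses and the Huber threshold k are illustrative choices, not prescribed by the lecture) shows the template with a Huber-type χ and the quantile check function. In practice the non-smooth losses (MAD, quantile) are usually fit with specialized linear-programming algorithms rather than a generic optimizer.

    import numpy as np
    from scipy.optimize import minimize

    def m_estimate(X, y, h):
        """Generalized M-estimate: beta_hat minimizing Q(beta) = sum_i h(y_i - x_i' beta)."""
        Q = lambda beta: np.sum(h(y - X @ beta))
        beta0, *_ = np.linalg.lstsq(X, y, rcond=None)     # OLS as the starting value
        return minimize(Q, beta0, method='Nelder-Mead').x

    def h_huber(r, k=1.345):
        """A Huber-type chi: quadratic near 0, linear in the tails (downweights outliers)."""
        return np.where(np.abs(r) <= k, 0.5 * r**2, k * (np.abs(r) - 0.5 * k))

    def h_quantile(r, tau=0.9):
        """Check-function loss: tau*|r| if r >= 0, (1 - tau)*|r| if r < 0."""
        return np.where(r >= 0, tau, 1.0 - tau) * np.abs(r)

    # Example usage on hypothetical data with a few gross outliers:
    rng = np.random.default_rng(6)
    X = np.column_stack([np.ones(100), rng.normal(size=100)])
    y = X @ np.array([1.0, 2.0]) + rng.normal(size=100)
    y[:5] += 20.0                                          # contaminate a few cases
    print(m_estimate(X, y, h_huber), m_estimate(X, y, lambda r: r**2))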
MIT OpenCourseWare
http://ocw.mit.edu
18.S096 Topics in Mathematics with Applications in Finance
Fall 2013
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.