Regression Analysis II
Regression Analysis
MIT 18.443
Dr. Kempthorne
Spring 2015
Outline
1. Regression Analysis II
   Distribution Theory: Normal Regression Models
   Maximum Likelihood Estimation
   Generalized M Estimation
Marginal Distributions of Least Squares Estimates
Because
β̂ ∼ Np(β, σ²(XᵀX)⁻¹),
the marginal distribution of each β̂j is
β̂j ∼ N(βj, σ²Cj,j)
where Cj,j is the jth diagonal element of (XᵀX)⁻¹.
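A minimal numpy sketch of this result; the simulated design, coefficients, and noise level below are illustrative assumptions, not part of the lecture:

import numpy as np

rng = np.random.default_rng(0)

# Illustrative simulated data: n observations, p explanatory variables.
n, p = 100, 3
X = rng.normal(size=(n, p))
beta = np.array([1.0, -2.0, 0.5])
sigma = 1.5
y = X @ beta + rng.normal(scale=sigma, size=n)

# Least squares estimate and its covariance matrix sigma^2 (X^T X)^{-1}.
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
cov_beta_hat = sigma**2 * XtX_inv

# Marginal standard deviation of beta_hat_j is sigma * sqrt(C_{j,j}).
marginal_sd = np.sqrt(np.diag(cov_beta_hat))
print(beta_hat, marginal_sd)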
The Q-R Decomposition of X
Consider expressing the (n × p) matrix X of explanatory variables as
X = Q · R
where
  Q is an (n × p) orthonormal matrix, i.e., QᵀQ = Ip, and
  R is a (p × p) upper-triangular matrix.
The columns of Q = [Q[1], Q[2], . . . , Q[p]] can be constructed by performing
the Gram-Schmidt orthonormalization procedure on the columns of
X = [X[1], X[2], . . . , X[p]].
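In practice the factors Q and R are computed numerically; a quick check of their defining properties with numpy (the design matrix here is an arbitrary illustration):

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 3))          # illustrative (n x p) matrix of explanatory variables

# "Reduced" QR: Q is (n x p) with orthonormal columns, R is (p x p) upper-triangular.
Q, R = np.linalg.qr(X)

print(np.allclose(Q.T @ Q, np.eye(3)))   # Q^T Q = I_p
print(np.allclose(R, np.triu(R)))        # R is upper-triangular
print(np.allclose(Q @ R, X))             # X = Q R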
If
    ⎡ r1,1  r1,2  ⋯   r1,p−1    r1,p   ⎤
    ⎢  0    r2,2  ⋯   r2,p−1    r2,p   ⎥
R = ⎢  ⋮           ⋱    ⋮        ⋮    ⎥ ,   then
    ⎢  0     0    ⋯  rp−1,p−1  rp−1,p  ⎥
    ⎣  0     0    ⋯     0       rp,p   ⎦

X[1] = Q[1] r1,1
  ⟹  r1,1² = X[1]ᵀX[1]   and   Q[1] = X[1]/r1,1

X[2] = Q[1] r1,2 + Q[2] r2,2
  ⟹  Q[1]ᵀX[2] = Q[1]ᵀQ[1] r1,2 + Q[1]ᵀQ[2] r2,2
               = 1 · r1,2 + 0 · r2,2
               = r1,2   (known since Q[1] is specified)
With r1,2 and Q[1] specified we can solve for r2,2:
  Q[2] r2,2 = X[2] − Q[1] r1,2
Take the squared norm of both sides:
  r2,2² = X[2]ᵀX[2] − 2 r1,2 Q[1]ᵀX[2] + r1,2²
  (all terms on the RHS are known)
With r2,2 specified,
  Q[2] = (1/r2,2)(X[2] − r1,2 Q[1])
Etc. (solve successively for the elements of R and the columns of Q).
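The recursion translates directly into code. A small classical Gram-Schmidt sketch (numerically less stable than library QR routines; shown only to mirror the column-by-column construction above):

import numpy as np

def gram_schmidt_qr(X):
    """Classical Gram-Schmidt: build Q and R one column of X at a time."""
    n, p = X.shape
    Q = np.zeros((n, p))
    R = np.zeros((p, p))
    for j in range(p):
        v = X[:, j].astype(float).copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ X[:, j]      # r_{i,j} = Q[i]^T X[j]
            v -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(v)          # r_{j,j}: norm of what is left of X[j]
        Q[:, j] = v / R[j, j]
    return Q, R

rng = np.random.default_rng(2)
X = rng.normal(size=(10, 4))
Q, R = gram_schmidt_qr(X)
print(np.allclose(Q @ R, X), np.allclose(Q.T @ Q, np.eye(4)))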
With the Q-R Decomposition
X = QR   (QᵀQ = Ip, and R is p × p upper-triangular):
β̂ = (XᵀX)⁻¹Xᵀy = R⁻¹Qᵀy   (plug in X = QR and simplify)
Cov(β̂) = σ²(XᵀX)⁻¹ = σ²R⁻¹(R⁻¹)ᵀ
H = X(XᵀX)⁻¹Xᵀ = QQᵀ   (giving ŷ = Hy and ε̂ = (In − H)y)
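A sketch verifying these identities numerically (the simulated data are illustrative; scipy's triangular solver is used to apply R⁻¹):

import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(3)
n, p = 50, 4
X = rng.normal(size=(n, p))
y = X @ np.array([2.0, 0.0, -1.0, 0.5]) + rng.normal(size=n)

Q, R = np.linalg.qr(X)                                  # X = Q R

# beta_hat = R^{-1} Q^T y  (back-substitution, no normal equations)
beta_qr = solve_triangular(R, Q.T @ y)
beta_ls = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(beta_qr, beta_ls))

# Cov(beta_hat)/sigma^2 = (X^T X)^{-1} = R^{-1} (R^{-1})^T
R_inv = solve_triangular(R, np.eye(p))
print(np.allclose(R_inv @ R_inv.T, np.linalg.inv(X.T @ X)))

# Hat matrix H = X (X^T X)^{-1} X^T = Q Q^T
H = X @ np.linalg.inv(X.T @ X) @ X.T
print(np.allclose(H, Q @ Q.T))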
More Distribution Theory
Assume y = Xβ + ε, where the {εi} are i.i.d. N(0, σ²), i.e.,
ε ∼ Nn(0n, σ²In)   or   y ∼ Nn(Xβ, σ²In).
Theorem*: For any (m × n) matrix A of rank m ≤ n, the transformed random
normal vector
z = Ay
is also a random normal vector:
z ∼ Nm(µz, Σz)
where µz = A E(y) = AXβ and Σz = A Cov(y) Aᵀ = σ²AAᵀ.
Earlier, A = (XᵀX)⁻¹Xᵀ yielded the distribution of β̂ = Ay.
With a different definition of A (and z) we obtain an easy proof of the
following theorem.
Theorem: For the normal linear regression model
y = Xβ + ε,
where X (n × p) has rank p and ε ∼ Nn(0n, σ²In):
(a) β̂ = (XᵀX)⁻¹Xᵀy and ε̂ = y − Xβ̂ are independent r.v.'s.
(b) β̂ ∼ Np(β, σ²(XᵀX)⁻¹).
(c) Σ_{i=1}^n ε̂i² = ε̂ᵀε̂ ∼ σ²χ²_{n−p}   (a scaled chi-squared r.v.).
(d) For each j = 1, 2, . . . , p,
    t̂j = (β̂j − βj) / (σ̂ √Cj,j) ∼ t_{n−p}   (a t distribution),
    where σ̂² = (1/(n−p)) Σ_{i=1}^n ε̂i² and Cj,j = [(XᵀX)⁻¹]j,j.
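A sketch of (d): with simulated data and the true β known, each t̂j can be computed directly (the data-generating settings are illustrative):

import numpy as np

rng = np.random.default_rng(4)
n, p = 40, 3
X = rng.normal(size=(n, p))
beta = np.array([1.0, 0.0, -0.5])
y = X @ beta + rng.normal(scale=2.0, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat

sigma2_hat = resid @ resid / (n - p)               # unbiased estimate of sigma^2
se = np.sqrt(sigma2_hat * np.diag(XtX_inv))        # sigma_hat * sqrt(C_{j,j})
t_hat = (beta_hat - beta) / se                     # each entry ~ t_{n-p}
print(t_hat)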
Proof: Note that (d) follows immediately from (a), (b), and (c).
Define A = [Qᵀ; Wᵀ] (Qᵀ stacked on top of Wᵀ), where
  A is an (n × n) orthogonal matrix (i.e., Aᵀ = A⁻¹), and
  Q is the column-orthonormal matrix in a Q-R decomposition of X.
Note: W can be constructed by continuing the Gram-Schmidt
orthonormalization process (which was used to construct Q from X)
with X* = [ X  In ].
Then consider
z = Ay = (Qᵀy; Wᵀy) = (zQ; zW),
where zQ = Qᵀy is (p × 1) and zW = Wᵀy is ((n − p) × 1).
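One concrete way to obtain such a W in practice (an alternative to the Gram-Schmidt continuation described above) is a "complete" QR factorization, whose trailing columns span the orthogonal complement of the column space of X; a sketch with an illustrative design matrix:

import numpy as np

rng = np.random.default_rng(5)
n, p = 6, 2
X = rng.normal(size=(n, p))

# mode="complete": Q_full is n x n orthogonal; its first p columns give Q, the rest give W.
Q_full, _ = np.linalg.qr(X, mode="complete")
Q, W = Q_full[:, :p], Q_full[:, p:]

A = np.vstack([Q.T, W.T])                    # A = [Q^T; W^T]
print(np.allclose(A @ A.T, np.eye(n)))       # A is orthogonal: A A^T = I_n
print(np.allclose(W.T @ X, 0))               # columns of W are orthogonal to those of X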
The distribution of z = Ay is Nn(µz, Σz), where
µz = A (Xβ) = [Qᵀ; Wᵀ](Q · R · β)
            = [QᵀQ; WᵀQ](R · β)
            = [Ip; 0_{(n−p)×p}](R · β)
            = (Rβ; 0_{n−p})
Σz = A (σ²In) Aᵀ = σ²(AAᵀ) = σ²In,   since Aᵀ = A⁻¹.
Thus z = (zQ; zW) ∼ Nn((Rβ; 0_{n−p}), σ²In)
⟹
  zQ ∼ Np(Rβ, σ²Ip),
  zW ∼ N_{n−p}(0_{n−p}, σ²I_{n−p}),
  and zQ and zW are independent.
The Theorem follows by showing:
(a*) β̂ = R⁻¹zQ and ε̂ = W zW
     (i.e., β̂ and ε̂ are functions of different, independent vectors).
(b*) The distribution of β̂ = R⁻¹zQ follows by applying Theorem* with
     A = R⁻¹ and "y" = zQ.
(c*) ε̂ᵀε̂ = zWᵀzW = sum of (n − p) squared r.v.'s which are i.i.d. N(0, σ²),
     so ε̂ᵀε̂ ∼ σ²χ²_{n−p}, a scaled chi-squared r.v.
Proof of (a*):
β̂ = R⁻¹zQ follows from β̂ = (XᵀX)⁻¹Xᵀy and X = QR with QᵀQ = Ip.
ε̂ = y − ŷ = y − Xβ̂ = y − (QR)·(R⁻¹zQ)
  = y − QzQ
  = y − QQᵀy = (In − QQᵀ)y
  = WWᵀy   (since In = AᵀA = QQᵀ + WWᵀ)
  = W zW
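These identities are easy to check numerically; a sketch continuing the same style of simulated example:

import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(6)
n, p = 30, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

Q_full, _ = np.linalg.qr(X, mode="complete")
Q, W = Q_full[:, :p], Q_full[:, p:]
R = Q.T @ X                                   # upper-triangular factor in X = QR

z_Q, z_W = Q.T @ y, W.T @ y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat

print(np.allclose(beta_hat, solve_triangular(R, z_Q)))   # beta_hat = R^{-1} z_Q
print(np.allclose(resid, W @ z_W))                       # eps_hat = W z_W
print(np.allclose(resid @ resid, z_W @ z_W))             # RSS = z_W^T z_W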
Maximum-Likelihood Estimation
Consider the normal linear regression model:
y = Xβ + ε, where the {εi} are i.i.d. N(0, σ²), i.e.,
ε ∼ Nn(0n, σ²In)   or   y ∼ Nn(Xβ, σ²In).
Definitions:
The likelihood function is
L(β, σ²) = p(y | X, β, σ²),
where p(y | X, β, σ²) is the joint probability density function (pdf) of the
conditional distribution of y given the data X (known) and the parameters
(β, σ²) (unknown).
The maximum likelihood estimates of (β, σ²) are the values maximizing
L(β, σ²), i.e., those which make the observed data y most likely in terms
of its pdf.
Because the yi are independent r.v.'s with yi ∼ N(µi, σ²), where
µi = Σ_{j=1}^p βj xi,j,

L(β, σ²) = Π_{i=1}^n p(yi | β, σ²)
         = Π_{i=1}^n (1/√(2πσ²)) exp( −(yi − Σ_{j=1}^p βj xi,j)² / (2σ²) )
         = (2πσ²)^{−n/2} exp( −½ (y − Xβ)ᵀ(σ²In)⁻¹(y − Xβ) ).

The maximum likelihood estimates (β̂, σ̂²) maximize the log-likelihood
function (dropping constant terms):

log L(β, σ²) = −(n/2) log(σ²) − ½ (y − Xβ)ᵀ(σ²In)⁻¹(y − Xβ)
             = −(n/2) log(σ²) − (1/(2σ²)) Q(β),

where Q(β) = (y − Xβ)ᵀ(y − Xβ)   (the "Least-Squares Criterion"!).

The OLS estimate β̂ is also the ML estimate.
The ML estimate of σ² solves
∂ log L(β̂, σ²)/∂(σ²) = 0,   i.e.,   −(n/2)(1/σ²) − ½(−1)(σ²)⁻²Q(β̂) = 0
⟹ σ̂²_ML = Q(β̂)/n = (Σ_{i=1}^n ε̂i²)/n   (biased!)
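A sketch contrasting the biased ML estimate of σ² with the unbiased one, using the log-likelihood in σ² at β̂ (simulated data, constants dropped as above):

import numpy as np

rng = np.random.default_rng(7)
n, p = 200, 4
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -1.0, 0.5, 0.0]) + rng.normal(scale=3.0, size=n)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS estimate = ML estimate of beta
resid = y - X @ beta_hat
Q_min = resid @ resid                             # Q(beta_hat), the least-squares criterion

sigma2_ml = Q_min / n                             # maximizes the log-likelihood; biased
sigma2_unbiased = Q_min / (n - p)

def log_lik(sigma2):
    """log L(beta_hat, sigma^2), dropping the constant -(n/2) log(2*pi)."""
    return -0.5 * n * np.log(sigma2) - 0.5 * Q_min / sigma2

print(sigma2_ml, sigma2_unbiased)
print(log_lik(sigma2_ml) >= log_lik(sigma2_unbiased))   # True: the ML value is the maximizer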
Generalized M Estimation
For data y, X, fit the linear regression model
yi = xiᵀβ + εi,   i = 1, 2, . . . , n,
by specifying β = β̂ to minimize
Q(β) = Σ_{i=1}^n h(yi, xi, β, σ²).
The choice of the function h( ) distinguishes different estimators
(a numerical sketch of several choices follows item (5) below):
(1) Least Squares: h(yi, xi, β, σ²) = (yi − xiᵀβ)²
(2) Mean Absolute Deviation (MAD): h(yi, xi, β, σ²) = |yi − xiᵀβ|
(3) Maximum Likelihood (ML): assume the yi are independent with pdf's
    p(yi | β, xi, σ²); then h(yi, xi, β, σ²) = −log p(yi | β, xi, σ²)
(4) Robust M-Estimator: h(yi, xi, β, σ²) = χ(yi − xiᵀβ),
    where χ( ) is even and monotone increasing on (0, ∞).
(5) Quantile Estimator: for a fixed quantile τ, 0 < τ < 1,
    h(yi, xi, β, σ²) = τ|yi − xiᵀβ|          if yi ≥ xiᵀβ,
                     = (1 − τ)|yi − xiᵀβ|    if yi < xiᵀβ.
    E.g., τ = 0.90 corresponds to the 90th quantile / upper decile;
    τ = 0.50 corresponds to the MAD estimator.
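A sketch of the general recipe for (1), (2), (4), and (5): plug a choice of h(·) into Q(β) and minimize numerically. The Huber ρ stands in for a robust χ, scipy's general-purpose optimizer replaces the specialized algorithms one would use in practice, and the simulated heavy-tailed data are purely illustrative:

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(8)
n, p = 200, 2
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=n)])
y = X @ np.array([1.0, 0.5]) + rng.standard_t(df=3, size=n)   # heavy-tailed errors

def Q(beta, h):
    """Generalized M-estimation criterion: sum_i h(y_i - x_i^T beta)."""
    return np.sum(h(y - X @ beta))

h_ls = lambda r: r**2                            # (1) least squares
h_mad = lambda r: np.abs(r)                      # (2) mean absolute deviation
def h_huber(r, k=1.345):                         # (4) a robust, even, increasing choice (Huber)
    a = np.abs(r)
    return np.where(a <= k, 0.5 * r**2, k * a - 0.5 * k**2)
def h_quantile(r, tau=0.90):                     # (5) check loss for the 0.90 quantile
    return np.where(r >= 0, tau * r, (tau - 1.0) * r)

for h in (h_ls, h_mad, h_huber, h_quantile):
    fit = minimize(Q, x0=np.zeros(p), args=(h,), method="Nelder-Mead")
    print(fit.x)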
MIT OpenCourseWare
http://ocw.mit.edu
18.443 Statistics for Applications
Spring 2015
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.