14.387 Recitation 2
Probits, Logits, and 2SLS
Peter Hull
Fall 2014
Part 1: Probits, Logits, Tobits, and other Nonlinear CEFs
Going Latent (in Binary): Probits and Logits

Scalar Bernoulli $y_i$, vector $x_i$. Assume

$$y_i^* = x_i'\beta + \nu_i^*, \qquad y_i = 1\{y_i^* \geq 0\}$$

$y_i^*$ and $\nu_i^*$: latent (unobserved) random variables

What's the CEF ($E[y_i|x_i]$)? Depends on the (conditional) CDF of $\nu_i^*$:

$$\begin{aligned}
E[y_i|x_i] &= P(y_i^* \geq 0 \mid x_i) \\
&= P(\nu_i^* \geq -x_i'\beta \mid x_i) \\
&= 1 - F_{\nu^*}(-x_i'\beta) \\
&= F_{\nu^*}(x_i'\beta)
\end{aligned}$$

Last line follows by CDF symmetry (usually assumed)

Probit: $F_{\nu^*}(\cdot) = ?$ Logit: $F_{\nu^*}(\cdot) = ?$

Must the CEF actually be nonlinear?
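
A minimal simulation sketch of this latent-index setup (the coefficients and evaluation bin below are my own illustrative choices): with a standard normal $\nu_i^*$, the simulated conditional mean of $y_i$ should track the probit CEF $\Phi(x_i'\beta)$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 200_000
beta = np.array([0.5, 1.0])                   # hypothetical coefficients
x = np.column_stack([np.ones(n), rng.normal(size=n)])

# Latent index plus standard normal error, then threshold at zero
y = (x @ beta + rng.normal(size=n) >= 0).astype(float)

# Simulated E[y_i | x_i] in a narrow bin around x_1 = 1 vs. the probit CEF
mask = np.abs(x[:, 1] - 1.0) < 0.05
print(y[mask].mean())                         # roughly 0.93
print(norm.cdf(np.array([1.0, 1.0]) @ beta))  # Phi(1.5) ≈ 0.933
```
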
Nonlinear Estimation

Two ways (at least) that $\beta$ is (probably) identified (where "probably" $\equiv$ "given some innocuous technical conditions"):

1. Maximum Likelihood (MLE):

$$\beta^{MLE} = \arg\max_\beta \prod_i f_{y|x}(y_i|x_i, \beta) = \arg\max_\beta \prod_i F_{\nu^*}(x_i'\beta)^{y_i}(1 - F_{\nu^*}(x_i'\beta))^{1-y_i}$$

since $P(y_i = 1|x_i, \beta) = F_{\nu^*}(x_i'\beta)$ and $P(y_i = 0|x_i, \beta) = 1 - F_{\nu^*}(x_i'\beta)$

2. Nonlinear Least Squares (NLS):

$$\beta^{NLS} = \arg\min_\beta E\left[(y_i - F_{\nu^*}(x_i'\beta))^2\right]$$

since $E[y_i|x_i] = F_{\nu^*}(x_i'\beta) \implies y_i = F_{\nu^*}(x_i'\beta) + \varepsilon_i$ with $E[\varepsilon_i] = E[y_i - E[y_i|x_i]] = 0$

As with OLS, minimize expected squared prediction error, $E[\varepsilon_i^2]$
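
A sketch of the MLE route for the probit case, maximizing the sample log-likelihood with scipy (the simulated design and starting values are my own assumptions):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
n = 5_000
beta_true = np.array([0.5, 1.0])               # hypothetical truth
x = np.column_stack([np.ones(n), rng.normal(size=n)])
y = (x @ beta_true + rng.normal(size=n) >= 0).astype(float)

def neg_loglik(b):
    # Sample analog of the Bernoulli log-likelihood, clipped for stability
    p = norm.cdf(x @ b).clip(1e-10, 1 - 1e-10)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

beta_mle = minimize(neg_loglik, x0=np.zeros(2), method="BFGS").x
print(beta_mle)                                # close to beta_true
```
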
Maximum Likelihood

$$\beta^{MLE} = \arg\max_\beta \prod_i F_{\nu^*}(x_i'\beta)^{y_i}(1 - F_{\nu^*}(x_i'\beta))^{1-y_i} = \arg\max_\beta \sum_i y_i \ln(F_{\nu^*}(x_i'\beta)) + (1 - y_i)\ln(1 - F_{\nu^*}(x_i'\beta))$$

F.O.C.:

$$\begin{aligned}
0 &= \sum_i \left[ y_i \frac{f_{\nu^*}(x_i'\beta^{MLE})}{F_{\nu^*}(x_i'\beta^{MLE})} x_i - (1 - y_i)\frac{f_{\nu^*}(x_i'\beta^{MLE})}{1 - F_{\nu^*}(x_i'\beta^{MLE})} x_i \right] \\
&= \sum_i \left[ \frac{y_i}{F_{\nu^*}(x_i'\beta^{MLE})} - \frac{1 - y_i}{1 - F_{\nu^*}(x_i'\beta^{MLE})} \right] f_{\nu^*}(x_i'\beta^{MLE})\, x_i \\
&= \sum_i \frac{(y_i - F_{\nu^*}(x_i'\beta^{MLE}))\, f_{\nu^*}(x_i'\beta^{MLE})}{F_{\nu^*}(x_i'\beta^{MLE})(1 - F_{\nu^*}(x_i'\beta^{MLE}))}\, x_i
\end{aligned}$$

Plug-in estimator $\hat\beta^{MLE}$ solves this in the sample
Nonlinear Least Squares

$$\beta^{NLS} = \arg\min_\beta E\left[(y_i - F_{\nu^*}(x_i'\beta))^2\right]$$

F.O.C. (ignoring $-2$ factor):

$$0 = E\left[(y_i - F_{\nu^*}(x_i'\beta^{NLS}))\, f_{\nu^*}(x_i'\beta^{NLS})\, x_i\right]$$

Plug-in estimator $\hat\beta^{NLS}$ solves this in the sample:

$$0 = \frac{1}{N}\sum_i (y_i - F_{\nu^*}(x_i'\hat\beta^{NLS}))\, f_{\nu^*}(x_i'\hat\beta^{NLS})\, x_i$$

Look familiar?
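
And a matching sketch of the NLS route, fitting the same probit CEF by least squares (simulated design again my own assumption; NLS is consistent here but less efficient than MLE):

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.stats import norm

rng = np.random.default_rng(1)
n = 5_000
x = np.column_stack([np.ones(n), rng.normal(size=n)])
y = (x @ np.array([0.5, 1.0]) + rng.normal(size=n) >= 0).astype(float)

# Minimize the sum of squared CEF residuals y_i - Phi(x_i'b)
beta_nls = least_squares(lambda b: y - norm.cdf(x @ b), x0=np.zeros(2)).x
print(beta_nls)                                # close to (0.5, 1.0)
```
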
MLE as Weighted Nonlinear Least Squares

Weighted NLS (like weighted least squares):

$$\beta^{wNLS} = \arg\min_\beta E\left[W(x_i, y_i)(y_i - F_{\nu^*}(x_i'\beta))^2\right]$$

for some (known) weight function $W(x_i, y_i)$. F.O.C.?

$$0 = \sum_i W(x_i, y_i)(y_i - F_{\nu^*}(x_i'\hat\beta^{wNLS}))\, f_{\nu^*}(x_i'\hat\beta^{wNLS})\, x_i$$

Recall

$$0 = \sum_i \frac{(y_i - F_{\nu^*}(x_i'\hat\beta^{MLE}))\, f_{\nu^*}(x_i'\hat\beta^{MLE})}{F_{\nu^*}(x_i'\hat\beta^{MLE})(1 - F_{\nu^*}(x_i'\hat\beta^{MLE}))}\, x_i$$

$\hat\beta^{MLE}$ is a weighted NLS estimator! But with what weights?
MLE as Weighted Nonlinear Least Squares (cont.)

$$W^{MLE}(x_i, y_i) = \left[F_{\nu^*}(x_i'\hat\beta^{MLE})(1 - F_{\nu^*}(x_i'\hat\beta^{MLE}))\right]^{-1}$$

MLE infeasible as one-step wNLS estimator ($\hat\beta^{MLE}$ on both right and left of optimization)

But recall another infeasible estimator:

$$\beta^{GLS} = \arg\min_\beta \sum_i \frac{(y_i - x_i'\beta)^2}{V_\varepsilon(x_i)}$$

where $V_\varepsilon(x_i)$ is the conditional variance of $\varepsilon_i$ (depends on $\beta^{GLS}$)

We make GLS feasible by taking a first-step consistent estimate of $V_\varepsilon(x_i)$ (by, say, OLS), then solving

$$\hat\beta^{FGLS} = \arg\min_\beta \sum_i \frac{(y_i - x_i'\beta)^2}{\hat{V}_\varepsilon(x_i)}$$
MLE as Weighted Nonlinear Least Squares (cont.)

Because $y_i$ is Bernoulli,

$$W^{MLE}(x_i, y_i) = \frac{1}{F_{\nu^*}(x_i'\hat\beta^{MLE})(1 - F_{\nu^*}(x_i'\hat\beta^{MLE}))} = \frac{1}{\hat{V}_{\nu^*}(x_i)}$$

Can take a first-step consistent estimate of $W^{MLE}(x_i, y_i)$ (by, say, NLS), then solve the wNLS F.O.C. to get $\hat\beta^{MLE1}$

Use $\hat\beta^{MLE1}$ to get $\hat{W}^{MLE1} \to \hat\beta^{MLE2} \to \hat{W}^{MLE2} \to \dots$

Iterating to convergence gives $\hat\beta^{MLE}$
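
A sketch of this iterated weighted-NLS recipe for the probit case (the simulated design and convergence tolerance are my own choices): start from unweighted NLS, recompute the inverse-Bernoulli-variance weights, and re-solve until the coefficients settle; a fixed point of this loop satisfies the MLE first-order condition.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.stats import norm

rng = np.random.default_rng(2)
n = 5_000
x = np.column_stack([np.ones(n), rng.normal(size=n)])
y = (x @ np.array([0.5, 1.0]) + rng.normal(size=n) >= 0).astype(float)

# First step: unweighted NLS
b = least_squares(lambda c: y - norm.cdf(x @ c), x0=np.zeros(2)).x
for _ in range(100):
    p = norm.cdf(x @ b).clip(1e-6, 1 - 1e-6)
    w = 1.0 / (p * (1 - p))                    # W^MLE: inverse Bernoulli variance
    b_new = least_squares(lambda c: np.sqrt(w) * (y - norm.cdf(x @ c)), x0=b).x
    if np.max(np.abs(b_new - b)) < 1e-8:
        break
    b = b_new
print(b)                                       # the probit MLE, at convergence
```
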
Going Latent (with Truncation): Tobit

Assume

$$y_i = \max(0,\, x_i'\beta + \varepsilon_i), \qquad \varepsilon_i \sim N(0, \sigma^2)$$

Useful normal fact: if $w \sim N(\mu, \sigma^2)$ and $c$ fixed,

$$E[w|w > c] = \mu + \sigma\frac{\phi\!\left(\frac{\mu - c}{\sigma}\right)}{\Phi\!\left(\frac{\mu - c}{\sigma}\right)} \quad \text{and} \quad E[w|w < c] = \mu - \sigma\frac{\phi\!\left(\frac{c - \mu}{\sigma}\right)}{\Phi\!\left(\frac{c - \mu}{\sigma}\right)}$$

CEF:

$$\begin{aligned}
E[y_i|x_i] &= E[y_i|x_i, y_i = 0]P(y_i = 0|x_i) + E[y_i|x_i, y_i > 0]P(y_i > 0|x_i) \\
&= (x_i'\beta + E[\varepsilon_i|x_i, \varepsilon_i > -x_i'\beta])P(\varepsilon_i > -x_i'\beta|x_i) \\
&= \left(x_i'\beta + \sigma\frac{\phi(x_i'\beta/\sigma)}{\Phi(x_i'\beta/\sigma)}\right)\Phi(x_i'\beta/\sigma) \\
&= x_i'\beta\,\Phi(x_i'\beta/\sigma) + \sigma\phi(x_i'\beta/\sigma)
\end{aligned}$$
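
A simulation sketch checking the Tobit CEF formula (parameters and the evaluation bin are hypothetical choices of mine):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
n = 500_000
beta, sigma = np.array([0.5, 1.0]), 2.0        # hypothetical parameters
x1 = rng.normal(size=n)
y = np.maximum(0.0, beta[0] + beta[1] * x1 + sigma * rng.normal(size=n))

# Simulated E[y_i | x_i] near x_1 = 1 vs. x'b * Phi(x'b/sigma) + sigma * phi(x'b/sigma)
mask = np.abs(x1 - 1.0) < 0.05
xb = beta[0] + beta[1] * 1.0
print(y[mask].mean())                          # roughly 1.76
print(xb * norm.cdf(xb / sigma) + sigma * norm.pdf(xb / sigma))
```
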
Part 2: Some Facts about IV and 2SLS
Matrix-y IV

Setup: $n \times 1$ vector $Y$; $n \times r$ "endogenous" matrix $X_1$; $n \times s$ matrix of "controls" $X_2$; $n \times t$ matrix of "instruments" $Z_1$

$$X_1 = Z_1\pi_1 + X_2\pi_2 + \nu \quad (1)$$
$$Y = X_1\beta_1 + X_2\beta_2 + \varepsilon \quad (2)$$

Terminology: (1) is the first stage; (2) the second stage. Plugging (1) into (2) gives the reduced form:

$$Y = (Z_1\pi_1 + X_2\pi_2 + \nu)\beta_1 + X_2\beta_2 + \varepsilon = Z_1(\pi_1\beta_1) + X_2(\pi_2\beta_1 + \beta_2) + (\nu\beta_1 + \varepsilon)$$

Model is identified if $t \geq r$ (just-identified if $t = r$)

Exclusion restriction: $E[Z'\varepsilon] = 0$ (weak), $E[\varepsilon|Z] = 0$ (strong)
Matrix-y IV (cont.)

$$X_1 = Z_1\pi_1 + X_2\pi_2 + \nu$$
$$Y = X_1\beta_1 + X_2\beta_2 + \varepsilon$$

Define: $X \equiv [X_1\ X_2]$, $n \times (r + s)$; $Z \equiv [Z_1\ X_2]$, $n \times (t + s)$

Also define:

$$P_Z \equiv Z(Z'Z)^{-1}Z', \quad P_2 \equiv X_2(X_2'X_2)^{-1}X_2', \quad M_2 \equiv I - P_2$$

What's $P_Z Z = ?$ $P_Z X = ?$ $M_2 X_2 = ?$ $P_Z P_Z = ?$ $P_Z' = ?$
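
A quick numerical check of these projection-matrix facts (random matrices; the dimensions are my own picks):

```python
import numpy as np

rng = np.random.default_rng(4)
n, t, s = 200, 3, 2
Z1, X2 = rng.normal(size=(n, t)), rng.normal(size=(n, s))
Z = np.hstack([Z1, X2])

PZ = Z @ np.linalg.inv(Z.T @ Z) @ Z.T      # projection onto col(Z)
P2 = X2 @ np.linalg.inv(X2.T @ X2) @ X2.T
M2 = np.eye(n) - P2                         # annihilator for X2

print(np.allclose(PZ @ Z, Z))               # P_Z Z = Z
print(np.allclose(M2 @ X2, 0))              # M_2 X_2 = 0
print(np.allclose(PZ @ PZ, PZ))             # P_Z is idempotent
print(np.allclose(PZ, PZ.T))                # P_Z is symmetric
```
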
2SLS is an IV Estimator

IV Estimator:

$$\hat\beta^{IV} \equiv (W'X)^{-1}W'Y \quad \text{where} \quad W \equiv ZA$$

and $A$ is some (possibly random) $(t + s) \times (r + s)$ matrix.

Note that when we're just-identified ($t = r$), $A$ is (probably) invertible, so

$$\hat\beta^{IV} \equiv (A'Z'X)^{-1}A'Z'Y = (Z'X)^{-1}(A')^{-1}A'Z'Y = (Z'X)^{-1}Z'Y$$

$\implies$ all IV estimators are (numerically) equivalent when just-ID

Two-Stage Least Squares sets $A \equiv (Z'Z)^{-1}Z'X$. What's $W$?
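
A small just-identified simulation (design choices are mine) verifying the equivalence: the 2SLS choice of $A$ and the trivial choice $A = I$ give numerically identical estimates.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1_000
Z = np.column_stack([np.ones(n), rng.normal(size=n)])   # t = r: just-identified
nu = rng.normal(size=n)
X = np.column_stack([np.ones(n), Z[:, 1] + nu])         # endogenous via nu
Y = X @ np.array([1.0, 2.0]) + nu + rng.normal(size=n)

A = np.linalg.inv(Z.T @ Z) @ Z.T @ X                    # 2SLS choice of A
W = Z @ A                                                # = P_Z X
beta_2sls = np.linalg.solve(W.T @ X, W.T @ Y)
beta_simple = np.linalg.solve(Z.T @ X, Z.T @ Y)         # A = I
print(beta_2sls, beta_simple)                            # identical up to floating point
```
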
2SLS is a second-stage WLS/OLS regression

Two-Stage Least Squares is

$$\begin{aligned}
\hat\beta^{2SLS} &= ((Z(Z'Z)^{-1}Z'X)'X)^{-1}(Z(Z'Z)^{-1}Z'X)'Y \\
&= ((P_Z X)'X)^{-1}(P_Z X)'Y \\
&= (X'P_Z X)^{-1}X'P_Z Y \quad (3) \\
&= ((P_Z X)'P_Z X)^{-1}(P_Z X)'Y \quad (4)
\end{aligned}$$

(some kinda) Weighted Least Squares, by (3). What are the weights doing?

(some kinda) Ordinary Least Squares, by (4). What are the regressors?
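
An over-identified simulated example (my own design) confirming that the closed form (3) matches OLS of $Y$ on the first-stage fitted values $P_Z X$, as in (4):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1_000
Z = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # two instruments, one endogenous: over-identified
nu = rng.normal(size=n)
X = np.column_stack([np.ones(n), Z[:, 1] + 0.5 * Z[:, 2] + nu])
Y = X @ np.array([1.0, 2.0]) + nu + rng.normal(size=n)

PZ = Z @ np.linalg.inv(Z.T @ Z) @ Z.T
Xhat = PZ @ X                                               # first-stage fitted values
b_formula = np.linalg.solve(X.T @ PZ @ X, X.T @ PZ @ Y)     # closed form (3)
b_ols_on_fitted = np.linalg.solve(Xhat.T @ Xhat, Xhat.T @ Y)  # OLS of Y on P_Z X, as in (4)
print(b_formula, b_ols_on_fitted)                           # identical
```
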
Just-ID IV is "reduced-form over first-stage"

$\hat\beta^{2SLS}$ is OLS of $Y$ on $P_Z X$. When $r = 1$ (one endogenous regressor), $\hat\beta_1^{2SLS}$ is bivariate OLS of $Y$ on $M_2 P_Z X_1$ (call the $i$th observation of this residualized fitted value $\hat{x}_{1i}^*$):

$$\hat\beta_1^{2SLS} \xrightarrow{p} \frac{Cov(y_i, \hat{x}_{1i}^*)}{Var(\hat{x}_{1i}^*)} = \frac{Cov(y_i, \hat{x}_{1i}^*)}{Cov(x_{1i}^*, \hat{x}_{1i}^*)}$$

When $t = r$ (just-identified),

$$Var(\hat{x}_{1i}^*) = \pi_1^2 Var(Z_{1i}^*)$$

$$Cov(y_i, \hat{x}_{1i}^*) = Cov(Z_1(\pi_1\beta_1) + X_2(\pi_2\beta_1 + \beta_2) + (\nu\beta_1 + \varepsilon),\ \pi_1 Z_{1i}^*) = \pi_1^2\beta_1 Var(Z_{1i}^*)$$

so that

$$\hat\beta_1^{2SLS} \xrightarrow{p} \frac{\pi_1^2\beta_1\sigma^2_{Z^*}}{\pi_1^2\sigma^2_{Z^*}} = \frac{\overbrace{\pi_1\beta_1}^{\text{RF}}}{\underbrace{\pi_1}_{\text{FS}}} = \beta_1$$
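
With one instrument and one endogenous regressor (the coefficients below are hypothetical), the IV estimate is literally the reduced-form slope over the first-stage slope:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000
z = rng.normal(size=n)
nu = rng.normal(size=n)
x1 = 0.7 * z + nu                        # first stage, pi_1 = 0.7
y = 2.0 * x1 + nu + rng.normal(size=n)   # beta_1 = 2, endogeneity via nu

fs = np.cov(x1, z)[0, 1] / np.var(z)     # first-stage slope
rf = np.cov(y, z)[0, 1] / np.var(z)      # reduced-form slope
print(rf / fs)                           # ~ 2.0 = beta_1: IV = RF / FS
```
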
MIT OpenCourseWare
http://ocw.mit.edu
14.387 Applied Econometrics: Mostly Harmless Big Data
Fall 2014
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.