14.387 Recitation 2: Probits, Logits, and 2SLS
Peter Hull, Fall 2014

Part 1: Probits, Logits, Tobits, and Other Nonlinear CEFs

Going Latent (in Binary): Probits and Logits

Scalar Bernoulli $y_i$, vector $x_i$. Assume

$$y_i^* = x_i'\beta + \nu_i^*, \qquad y_i = 1\{y_i^* \ge 0\}$$

where $y_i^*$ and $\nu_i^*$ are latent (unobserved) random variables.

What's the CEF, $E[y_i \mid x_i]$? It depends on the (conditional) CDF of $\nu_i^*$:

$$E[y_i \mid x_i] = P(y_i^* \ge 0 \mid x_i) = P(\nu_i^* \ge -x_i'\beta \mid x_i) = 1 - F_{\nu^*}(-x_i'\beta) = F_{\nu^*}(x_i'\beta)$$

The last equality follows by CDF symmetry (usually assumed).

Probit: $F_{\nu^*}(\cdot) = ?$ Logit: $F_{\nu^*}(\cdot) = ?$ Must the CEF actually be nonlinear?

Nonlinear Estimation

Two ways (at least) that $\beta$ is (probably) identified, where "probably" $\equiv$ "given some innocuous technical conditions":

Maximum Likelihood (MLE):

$$\beta^{MLE} = \arg\max_\beta \prod_i f_{y|x}(y_i \mid x_i, \beta) = \arg\max_\beta \prod_i F_{\nu^*}(x_i'\beta)^{y_i} (1 - F_{\nu^*}(x_i'\beta))^{1 - y_i}$$

since $P(y_i = 1 \mid x_i, \beta) = F_{\nu^*}(x_i'\beta)$ and $P(y_i = 0 \mid x_i, \beta) = 1 - F_{\nu^*}(x_i'\beta)$.

Nonlinear Least Squares (NLS):

$$\beta^{NLS} = \arg\min_\beta E[(y_i - F_{\nu^*}(x_i'\beta))^2]$$

since $E[y_i \mid x_i] = F_{\nu^*}(x_i'\beta) \implies y_i = F_{\nu^*}(x_i'\beta) + \varepsilon_i$ with $E[\varepsilon_i] = E[y_i - E[y_i \mid x_i]] = 0$. As with OLS, minimize expected squared prediction error, $E[\varepsilon_i^2]$.

Maximum Likelihood

$$\beta^{MLE} = \arg\max_\beta \prod_i F_{\nu^*}(x_i'\beta)^{y_i}(1 - F_{\nu^*}(x_i'\beta))^{1 - y_i} = \arg\max_\beta \sum_i y_i \ln F_{\nu^*}(x_i'\beta) + (1 - y_i)\ln(1 - F_{\nu^*}(x_i'\beta))$$

F.O.C.:

$$0 = \sum_i \left[ y_i \frac{f_{\nu^*}(x_i'\beta^{MLE})}{F_{\nu^*}(x_i'\beta^{MLE})} - (1 - y_i)\frac{f_{\nu^*}(x_i'\beta^{MLE})}{1 - F_{\nu^*}(x_i'\beta^{MLE})} \right] x_i$$
$$= \sum_i \left( \frac{y_i}{F_{\nu^*}(x_i'\beta^{MLE})} - \frac{1 - y_i}{1 - F_{\nu^*}(x_i'\beta^{MLE})} \right) f_{\nu^*}(x_i'\beta^{MLE})\, x_i$$
$$= \sum_i \frac{y_i - F_{\nu^*}(x_i'\beta^{MLE})}{F_{\nu^*}(x_i'\beta^{MLE})(1 - F_{\nu^*}(x_i'\beta^{MLE}))}\, f_{\nu^*}(x_i'\beta^{MLE})\, x_i$$

The plug-in estimator $\hat\beta^{MLE}$ solves this in the sample.

Nonlinear Least Squares

$$\beta^{NLS} = \arg\min_\beta E[(y_i - F_{\nu^*}(x_i'\beta))^2]$$

F.O.C. (ignoring the factor of $-2$):

$$0 = E[(y_i - F_{\nu^*}(x_i'\beta^{NLS})) f_{\nu^*}(x_i'\beta^{NLS})\, x_i]$$

The plug-in estimator $\hat\beta^{NLS}$ solves this in the sample:

$$0 = \frac{1}{N}\sum_i (y_i - F_{\nu^*}(x_i'\hat\beta^{NLS})) f_{\nu^*}(x_i'\hat\beta^{NLS})\, x_i$$

Look familiar?

MLE as Weighted Nonlinear Least Squares

Weighted NLS (like weighted least squares):

$$\beta^{wNLS} = \arg\min_\beta E[W(x_i, y_i)(y_i - F_{\nu^*}(x_i'\beta))^2]$$

for some (known) weight function $W(x_i, y_i)$. F.O.C.?

$$0 = \sum_i W(x_i, y_i)(y_i - F_{\nu^*}(x_i'\hat\beta^{wNLS})) f_{\nu^*}(x_i'\hat\beta^{wNLS})\, x_i$$

Recall

$$0 = \sum_i \frac{y_i - F_{\nu^*}(x_i'\hat\beta^{MLE})}{F_{\nu^*}(x_i'\hat\beta^{MLE})(1 - F_{\nu^*}(x_i'\hat\beta^{MLE}))}\, f_{\nu^*}(x_i'\hat\beta^{MLE})\, x_i$$

$\hat\beta^{MLE}$ is a weighted NLLS estimator! But with what weights?

MLE as Weighted Nonlinear Least Squares (cont.)

$$W^{MLE}(x_i, y_i) = \left[ F_{\nu^*}(x_i'\hat\beta^{MLE})(1 - F_{\nu^*}(x_i'\hat\beta^{MLE})) \right]^{-1}$$

$\hat\beta^{MLE}$ is infeasible as a one-step wNLS estimator ($\hat\beta^{MLE}$ appears on both the right and the left of the optimization). But recall another infeasible estimator,

$$\beta^{GLS} = \arg\min_\beta \sum_i \frac{(y_i - x_i'\beta)^2}{V_\varepsilon(x_i)}$$

where $V_\varepsilon(x_i)$ is the conditional variance of $\varepsilon_i$ (which itself depends on $\beta^{GLS}$). We make GLS feasible by taking a first-step consistent estimate of $V_\varepsilon(x_i)$ (by, say, OLS), then solving

$$\hat\beta^{FGLS} = \arg\min_\beta \sum_i \frac{(y_i - x_i'\beta)^2}{\hat V_\varepsilon(x_i)}$$

MLE as Weighted Nonlinear Least Squares (cont.)

$$W^{MLE}(x_i, y_i) = \frac{1}{F_{\nu^*}(x_i'\hat\beta^{MLE})(1 - F_{\nu^*}(x_i'\hat\beta^{MLE}))} = \frac{1}{\hat V(y_i \mid x_i)}$$

because $y_i$ is Bernoulli, so $V(y_i \mid x_i) = F_{\nu^*}(x_i'\beta)(1 - F_{\nu^*}(x_i'\beta))$. We can therefore take a first-step consistent estimate of $W^{MLE}(x_i, y_i)$ (by, say, NLS), then solve the wNLS F.O.C. to get $\hat\beta^{MLE_1}$. Use $\hat\beta^{MLE_1}$ to get $\hat W^{MLE_1} \to \hat\beta^{MLE_2} \to \hat W^{MLE_2} \to \cdots$ Iterating to convergence gives $\hat\beta^{MLE}$.
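To make the iteration concrete, here is a minimal sketch of this feasible weighted-NLS route to the probit MLE on simulated data. It is an illustration, not code from the recitation: the design, the helper name `wnls`, and the tuning constants are all assumptions, and it relies on NumPy and SciPy.

```python
# A sketch of the iterated weighted-NLS route to the probit MLE:
# start from unweighted NLS, then alternate W -> beta -> W -> ...
# with weights 1/[F(1-F)] until the estimate stops moving.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
n, beta_true = 5_000, np.array([0.5, -1.0])      # simulated design (illustrative)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = (X @ beta_true + rng.normal(size=n) >= 0).astype(float)  # y_i = 1{x_i'b + v_i >= 0}

def wnls(beta_init, weights):
    """Minimize the weighted NLS objective mean(w * (y - Phi(x'b))^2) over b."""
    obj = lambda b: np.mean(weights * (y - norm.cdf(X @ b)) ** 2)
    return minimize(obj, beta_init, method="BFGS").x

beta = wnls(np.zeros(2), np.ones(n))             # first step: unweighted NLS
for _ in range(50):                              # beta^MLE_1 -> W^MLE_1 -> beta^MLE_2 -> ...
    F = np.clip(norm.cdf(X @ beta), 1e-6, 1 - 1e-6)
    beta_new = wnls(beta, 1.0 / (F * (1.0 - F))) # W^MLE = 1/[F(1-F)] = 1/V(y|x)
    if np.max(np.abs(beta_new - beta)) < 1e-8:   # converged: this is the probit MLE
        break
    beta = beta_new
print(beta)                                      # close to beta_true in large samples
```

At the fixed point, the wNLS first-order condition with weights $1/[F(1-F)]$ is exactly the MLE score above, which is why the iteration converges to the probit MLE itself rather than to an approximation of it.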
Going Latent (with Truncation): Tobit

Assume

$$y_i = \max(0, x_i'\beta + \varepsilon_i), \qquad \varepsilon_i \sim N(0, \sigma^2)$$

Useful normal fact: if $w \sim N(\mu, \sigma^2)$ and $c$ is fixed,

$$E[w \mid w > c] = \mu + \sigma \frac{\phi\left(\frac{\mu - c}{\sigma}\right)}{\Phi\left(\frac{\mu - c}{\sigma}\right)} \qquad \text{and} \qquad E[w \mid w < c] = \mu - \sigma \frac{\phi\left(\frac{c - \mu}{\sigma}\right)}{\Phi\left(\frac{c - \mu}{\sigma}\right)}$$

CEF:

$$E[y_i \mid x_i] = E[y_i \mid x_i, y_i = 0] P(y_i = 0 \mid x_i) + E[y_i \mid x_i, y_i > 0] P(y_i > 0 \mid x_i)$$
$$= \left( x_i'\beta + E[\varepsilon_i \mid x_i, \varepsilon_i > -x_i'\beta] \right) P(\varepsilon_i > -x_i'\beta \mid x_i)$$
$$= \left( x_i'\beta + \sigma \frac{\phi(x_i'\beta/\sigma)}{\Phi(x_i'\beta/\sigma)} \right)\Phi(x_i'\beta/\sigma) = x_i'\beta\, \Phi(x_i'\beta/\sigma) + \sigma \phi(x_i'\beta/\sigma)$$

Part 2: Some Facts about IV and 2SLS

Matrix-y IV

Setup: $n \times 1$ vector $Y$, $n \times r$ matrix $X_1$ of "endogenous" regressors, $n \times s$ matrix $X_2$ of "controls," $n \times t$ matrix $Z_1$ of "instruments."

$$X_1 = Z_1 \pi_1 + X_2 \pi_2 + \nu \quad (1)$$
$$Y = X_1 \beta_1 + X_2 \beta_2 + \varepsilon \quad (2)$$

Terminology: (1) is the first stage; (2) is the second stage. Plugging (1) into (2) gives the reduced form:

$$Y = (Z_1 \pi_1 + X_2 \pi_2 + \nu)\beta_1 + X_2 \beta_2 + \varepsilon = Z_1(\pi_1 \beta_1) + X_2(\pi_2 \beta_1 + \beta_2) + (\nu \beta_1 + \varepsilon)$$

The model is identified if $t \ge r$ (just-identified if $t = r$). Exclusion restriction: $E[Z'\varepsilon] = 0$ (weak), $E[\varepsilon \mid Z] = 0$ (strong).

Matrix-y IV (cont.)

$$X_1 = Z_1 \pi_1 + X_2 \pi_2 + \nu$$
$$Y = X_1 \beta_1 + X_2 \beta_2 + \varepsilon$$

Define $X \equiv [X_1\ X_2]$, which is $n \times (r + s)$, and $Z \equiv [Z_1\ X_2]$, which is $n \times (t + s)$. Also define $P_Z \equiv Z(Z'Z)^{-1}Z'$, $P_2 \equiv X_2(X_2'X_2)^{-1}X_2'$, and $M_2 \equiv I - P_2$.

What's $P_Z Z$? $P_Z X$? $M_2 X_2$? $P_Z P_Z$? $P_Z'$?

2SLS is an IV Estimator

IV estimator:

$$\hat\beta^{IV} \equiv (W'X)^{-1} W'Y, \qquad W \equiv ZA$$

where $A$ is some (possibly random) $(t + s) \times (r + s)$ matrix. Note that when we're just-identified ($t = r$), $A$ is (probably) invertible, so

$$\hat\beta^{IV} \equiv (A'Z'X)^{-1} A'Z'Y = (Z'X)^{-1}(A')^{-1}A'Z'Y = (Z'X)^{-1}Z'Y$$

$\implies$ all IV estimators are (numerically) equivalent when just-identified.

Two-Stage Least Squares sets $A \equiv (Z'Z)^{-1}Z'X$. What's $W$?

2SLS is a Second-Stage WLS/OLS Regression

Two-Stage Least Squares is

$$\hat\beta^{2SLS} = \left( (Z(Z'Z)^{-1}Z'X)'X \right)^{-1} (Z(Z'Z)^{-1}Z'X)'Y = ((P_Z X)'X)^{-1}(P_Z X)'Y$$
$$= (X'P_Z X)^{-1} X'P_Z Y \quad (3)$$
$$= ((P_Z X)'P_Z X)^{-1}(P_Z X)'Y \quad (4)$$

(Some kind of) Weighted Least Squares, by (3). What are the weights doing?
(Some kind of) Ordinary Least Squares, by (4). What are the regressors?

Just-ID IV is "Reduced-Form over First-Stage"

$\hat\beta^{2SLS}$ is OLS of $Y$ on $P_Z X$. When $r = 1$ (one endogenous regressor), $\hat\beta_1^{2SLS}$ is bivariate OLS of $Y$ on $\hat x_{1i}^* \equiv M_2 P_Z X_1$, where stars denote residuals from projection on $X_2$:

$$\hat\beta_1^{2SLS} \xrightarrow{p} \frac{\mathrm{Cov}(y_i, \hat x_{1i}^*)}{\mathrm{Var}(\hat x_{1i}^*)} = \frac{\mathrm{Cov}(y_i, \hat x_{1i}^*)}{\mathrm{Cov}(x_{1i}^*, \hat x_{1i}^*)}$$

When $t = r$ (just-identified),

$$\mathrm{Var}(\hat x_{1i}^*) = \pi_1^2\, \mathrm{Var}(z_{1i}^*)$$
$$\mathrm{Cov}(y_i, \hat x_{1i}^*) = \mathrm{Cov}\left( (\pi_1 \beta_1) z_{1i} + x_{2i}'(\pi_2 \beta_1 + \beta_2) + (\nu_i \beta_1 + \varepsilon_i),\ \pi_1 z_{1i}^* \right) = \pi_1^2 \beta_1\, \mathrm{Var}(z_{1i}^*)$$

so that

$$\hat\beta_1^{2SLS} \xrightarrow{p} \frac{\pi_1^2 \beta_1 \sigma^2_{z^*}}{\pi_1^2 \sigma^2_{z^*}} = \frac{\overbrace{\pi_1 \beta_1}^{\text{RF}}}{\underbrace{\pi_1}_{\text{FS}}} = \beta_1$$
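As a closing check on these identities, here is a small simulation, a sketch under an invented data-generating process: the coefficient values, variable names, and design below are illustrative assumptions, not from the notes. It verifies that (3), (4), and the just-identified formula $(Z'X)^{-1}Z'Y$ return the same number, and that this number equals the reduced-form coefficient over the first-stage coefficient.

```python
# Numerical check of the 2SLS identities in the just-identified scalar case (t = r = 1).
import numpy as np

rng = np.random.default_rng(0)
n = 2_000
x2 = np.column_stack([np.ones(n), rng.normal(size=n)])  # controls X_2 (with constant)
z1 = rng.normal(size=n)                                 # single instrument Z_1
u, e = rng.normal(size=n), rng.normal(size=n)
x1 = 1.5 * z1 + x2 @ np.array([0.2, 0.3]) + u           # first stage (pi_1 = 1.5)
y = 2.0 * x1 + x2 @ np.array([1.0, -0.5]) + u + e       # second stage (beta_1 = 2); shared u makes x1 endogenous

X = np.column_stack([x1, x2])                           # X = [X_1 X_2]
Z = np.column_stack([z1, x2])                           # Z = [Z_1 X_2]
PZ = Z @ np.linalg.solve(Z.T @ Z, Z.T)                  # P_Z = Z(Z'Z)^{-1}Z'

b_3 = np.linalg.solve(X.T @ PZ @ X, X.T @ PZ @ y)       # (X'P_Z X)^{-1} X'P_Z Y, eq. (3)
b_4 = np.linalg.lstsq(PZ @ X, y, rcond=None)[0]         # OLS of Y on P_Z X, eq. (4)
b_iv = np.linalg.solve(Z.T @ X, Z.T @ y)                # (Z'X)^{-1} Z'Y, valid since t = r
print(b_3[0], b_4[0], b_iv[0])                          # numerically identical, near beta_1 = 2

# Reduced-form over first-stage, after residualizing on X_2 (FWL):
M2 = np.eye(n) - x2 @ np.linalg.solve(x2.T @ x2, x2.T)  # M_2 = I - P_2
zs, ys, xs = M2 @ z1, M2 @ y, M2 @ x1
rf = (zs @ ys) / (zs @ zs)                              # reduced-form coefficient pi_1 * beta_1
fs = (zs @ xs) / (zs @ zs)                              # first-stage coefficient pi_1
print(rf / fs)                                          # equals b_3[0]
```

The agreement of (3) and (4) is just the projection algebra above at work: $P_Z' = P_Z$ and $P_Z P_Z = P_Z$, so $(P_Z X)'P_Z X = X'P_Z X$.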