Lecture 10 – Estimating Nonlinear Regression Models

References: Greene, Econometric Analysis, Chapter 10

Consider the following regression model:

$$y_t = f(x_t, \beta) + \varepsilon_t, \qquad t = 1,\dots,T$$

where $x_t$ is $k \times 1$ for each $t$, $\beta$ is an $r \times 1$ constant vector, $\varepsilon_t$ is an unobservable error process, and $f$ is a ("sufficiently well-behaved") function, $f: \mathbb{R}^k \times \mathbb{R}^r \to \mathbb{R}$. So each $y_t$ is a (fixed) function of $x_t$ and $\beta$ plus an additive error term, $\varepsilon_t$.

Example:

$$y_t = \beta_1 x_t^{\beta_2} + \varepsilon_t$$

The estimation problem: given $f$, $y_1,\dots,y_T$, and $x_1,\dots,x_T$, estimate $\beta$.

The solution: estimate $\beta$ by LS (NLS), ML, or GMM.

The "problem"? In contrast to the linear regression case, the FOCs are nonlinear, so in general numerical methods must be applied to obtain (consistent) point estimates. Also, the asymptotic variance matrix of $\hat\beta$ has a slightly more complicated form.

Nonlinear models are commonly encountered in applied economics, largely because advances in computational mathematics and desktop/laptop computer technology have made solving nonlinear optimization problems more feasible and more reliable.

Nonlinear Least Squares (NLS)

Choose $\hat\beta$ to minimize the SSR:

$$SSR(\hat\beta) = \sum_{t=1}^{T} \left( y_t - f(x_t, \hat\beta) \right)^2$$

FOCs:

$$g(\hat\beta) = \sum_{t=1}^{T} \frac{\partial f(x_t, \hat\beta)}{\partial \beta} \left( y_t - f(x_t, \hat\beta) \right) = 0$$

which form a set of $r$ nonlinear equations in the $r$ unknowns $\hat\beta_1,\dots,\hat\beta_r$. [In the case where $f$ is linear in the $\beta$'s, the derivative vector is $\partial f/\partial\beta = [x_{1t},\dots,x_{rt}]'$, with $r = k$.]

Example: for $y_t = \beta_1 x_t^{\beta_2} + \varepsilon_t$, the derivative vector is $\partial f/\partial\beta = [\,x_t^{\beta_2},\ \beta_1 x_t^{\beta_2}\ln x_t\,]'$, so the FOCs are

$$g(\hat\beta) = \sum_{t=1}^{T} \left( y_t - \hat\beta_1 x_t^{\hat\beta_2} \right) \left[\, x_t^{\hat\beta_2},\ \ \hat\beta_1 x_t^{\hat\beta_2} \ln x_t \,\right]' = 0$$

Computing the NLS Estimator

In general, these FOCs must be solved numerically to find the NLS estimator of $\beta$, $\hat\beta_{NLS}$. (E.g., the Gauss-Newton procedure described in Greene, 10.2.3.)

Some issues:

- choice of algorithm
- selecting an initial value for $\hat\beta$
- convergence criteria
- local vs. global minimum

Asymptotic Properties of the NLS Estimator

If

- the $x$'s are weakly exogenous,
- the errors are serially uncorrelated and homoskedastic,
- the function $f$ is sufficiently smooth, and
- the $\{x_t, \varepsilon_t\}$ process is sufficiently well-behaved,

then

$$T^{1/2} \left( \hat\beta_{T,NLS} - \beta \right) \xrightarrow{D} N(0, \sigma^2 Q^{-1})$$

where $\sigma^2 = \mathrm{var}(\varepsilon_t)$ and

$$Q = \mathrm{plim}\,(1/T)\,Q_T, \qquad Q_T = \sum_{t=1}^{T} \left[ \partial f(x_t, \hat\beta_T)/\partial\beta \right] \left[ \partial f(x_t, \hat\beta_T)/\partial\beta' \right]$$

The NLS estimator is (under appropriate conditions) consistent, asymptotically normal, and asymptotically efficient.

Inference: for large samples, act as though

$$\hat\beta_{NLS} \sim N(\beta, \hat\sigma^2 Q_T^{-1}), \qquad \hat\sigma^2 = (1/T) \sum_{t=1}^{T} \hat\varepsilon_t^2$$

where $\hat\varepsilon_t = y_t - f(x_t, \hat\beta_{NLS})$ are the NLS residuals.

If the disturbances are heteroskedastic and/or serially correlated, the NLS estimator will be consistent but not asymptotically efficient. Also, the correct form of the asymptotic variance matrix of the NLS estimator requires a heteroskedasticity and/or autocorrelation correction. Heteroskedasticity-consistent and HAC estimators of the variance-covariance matrix can be used if the exact forms of the heteroskedasticity and autocorrelation are not known.

If the form of the heteroskedasticity and/or serial correlation is known up to a small number of parameters (e.g., $\varepsilon_t$ is known to be an AR(1) process with unknown $\rho$), then nonlinear GLS or (quasi-)maximum likelihood will be asymptotically efficient estimators.

Example: GNLS

Suppose $E(\varepsilon\varepsilon') = \Sigma$. Then the GNLS estimator of $\beta$ is the value of $\hat\beta$ that minimizes the weighted SSR:

$$[y - f(x, \hat\beta)]'\, \Sigma^{-1}\, [y - f(x, \hat\beta)]$$

If $\Sigma$ is unknown, it can be replaced with a consistent estimator to obtain the FGNLS estimator. (What consistent estimator of $\Sigma$?)

If the regressors are correlated with the errors, none of these estimators is consistent (even if the errors are homoskedastic and serially uncorrelated).
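As a concrete illustration of the NLS computations discussed in these notes, the following is a minimal sketch (not Greene's code) of a Gauss-Newton iteration for the example model $y_t = \beta_1 x_t^{\beta_2} + \varepsilon_t$. Everything specific here is an assumption made for illustration: the data are simulated, the function names (`simulate`, `gauss_newton`, `nls_covariances`) are hypothetical, and the step-halving rule is an added safeguard rather than part of the basic procedure. The sketch also reports both the homoskedastic variance estimate $\hat\sigma^2 Q_T^{-1}$ and a White-type heteroskedasticity-robust sandwich estimate.

```python
import numpy as np


def f(x, beta):
    """Regression function f(x, beta) = beta1 * x**beta2."""
    return beta[0] * x ** beta[1]


def jacobian(x, beta):
    """T x 2 matrix with rows df/dbeta' = [x^b2, b1 * x^b2 * ln(x)]."""
    xb = x ** beta[1]
    return np.column_stack([xb, beta[0] * xb * np.log(x)])


def simulate(T=500, beta=(2.0, 0.5), sigma=0.1, seed=0):
    """Simulated data (an assumption of this sketch, not from the notes)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(1.0, 3.0, size=T)
    y = f(x, beta) + sigma * rng.normal(size=T)
    return y, x


def gauss_newton(y, x, beta0, tol=1e-10, max_iter=200):
    """Minimize the SSR by Gauss-Newton with step halving as a safeguard."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        e = y - f(x, beta)
        J = jacobian(x, beta)
        # Gauss-Newton step: regress the current residuals on the Jacobian.
        step = np.linalg.solve(J.T @ J, J.T @ e)
        if np.max(np.abs(step)) < tol:   # convergence criterion on the step
            break
        ssr_old, lam = e @ e, 1.0
        # Halve the step while it would increase the SSR.
        while np.sum((y - f(x, beta + lam * step)) ** 2) > ssr_old and lam > 1e-8:
            lam /= 2.0
        beta = beta + lam * step
    return beta


def nls_covariances(y, x, beta_hat):
    """Return (sigma2_hat * Q_T^{-1}, White-type robust sandwich)."""
    e = y - f(x, beta_hat)
    J = jacobian(x, beta_hat)
    QT_inv = np.linalg.inv(J.T @ J)          # Q_T = sum of gradient outer products
    V_homo = (e @ e / len(y)) * QT_inv       # valid under homoskedasticity
    meat = (J * e[:, None] ** 2).T @ J       # sum of e_t^2 * grad_t grad_t'
    V_robust = QT_inv @ meat @ QT_inv        # heteroskedasticity-robust
    return V_homo, V_robust


y, x = simulate()
beta_hat = gauss_newton(y, x, beta0=[1.0, 1.0])
V_homo, V_robust = nls_covariances(y, x, beta_hat)
print("beta_hat:", beta_hat)
print("se (homoskedastic):", np.sqrt(np.diag(V_homo)))
print("se (robust):", np.sqrt(np.diag(V_robust)))
```

The starting value `[1.0, 1.0]` is arbitrary. Rerunning from several starting values is one simple way to probe the local-versus-global-minimum issue listed among the computational concerns.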
A consistent, semi-parametrically efficient estimator that does not rely on knowledge of the form or existence of heteroskedasticity or autocorrelation, and that allows for endogenous regressors: nonlinear GMM.

In addition, GMM provides a semi-parametric alternative to MLE for nonlinear models that do not fit the nonlinear regression format.

GMM in the nonlinear regression model

Consider the population moment conditions:

$$E[\, w_t \, (y_t - f(x_t, \beta)) \,] = 0 \quad \text{for all } t$$

where $w_t$ is an instrument vector.

The GMM estimator: choose $\hat\beta$ to make the corresponding sample moments

$$\frac{1}{T} \sum_{t=1}^{T} w_t \left( y_t - f(x_t, \hat\beta) \right)$$

close to zero. As in the linear case, this will involve minimizing an optimally weighted quadratic form in these moments.

GMM in a more general nonlinear setting

Hansen and Singleton's (Econometrica, 1982) Consumption-Based Asset Pricing Model

At the start of each time period $t$, a representative agent chooses consumption and saving to maximize expected discounted utility:

$$E\!\left[ \sum_{i=0}^{\infty} \delta^i \, U(c_{t+i}) \,\Big|\, \Omega_t \right], \qquad U(c_t) = \frac{c_t^{\gamma}}{\gamma}, \quad 0 < \gamma < 1$$

where $\Omega_t$ is the information set available at the start of period $t$.

At the start of $t$, the agent can allocate income to purchase the consumption good or $N$ assets with maturities $1,2,\dots,N$ according to the sequence of budget constraints

$$c_t + \sum_{j=1}^{N} p_{j,t}\, q_{j,t} \le \sum_{j=1}^{N} r_{j,t}\, q_{j,t-j} + w_t$$

where

- $p_{j,t}$ = price of a unit of asset $j$ (i.e., an asset that matures in $t+j$) in period $t$
- $q_{j,t}$ = units of asset $j$ purchased in period $t$
- $r_{j,t}$ = payoff in period $t$ of asset $j$ purchased in $t-j$
- $w_t$ = labor income in period $t$ (not the instrument vector of the previous section)

Unknown parameters in this model: $\delta$, $\gamma$

The optimal consumption path must satisfy the sequence of Euler equations:

$$E\!\left[ \delta^j \, (r_{j,t+j} / p_{j,t}) \, (c_{t+j} / c_t)^{\gamma - 1} - 1 \,\Big|\, \Omega_t \right] = 0$$

Let $z_t$ be any vector of variables in $\Omega_t$. Then the Euler equations imply the following set of moment conditions, which form the basis for estimating $\delta$ and $\gamma$ by GMM:

$$E\!\left[ \left( \delta^j \, (r_{j,t+j} / p_{j,t}) \, (c_{t+j} / c_t)^{\gamma - 1} - 1 \right) z_t \right] = 0 \quad \text{for all } t \text{ and } j = 1,\dots,N$$

GMM: choose $\delta$ and $\gamma$ to make the sample moments

$$\frac{1}{T} \sum_{t=1}^{T} \left( \delta^j \, (r_{j,t+j} / p_{j,t}) \, (c_{t+j} / c_t)^{\gamma - 1} - 1 \right) z_t, \qquad j = 1,\dots,N$$

close to zero.
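The Euler-equation moment conditions can be turned into an estimator along the following lines. This is a sketch under several assumptions that are not in the notes: a single asset with $j = 1$, so that $r_{1,t+1}/p_{1,t}$ is a gross return `R`; an exponent of $\gamma - 1$ on consumption growth (i.e., $U(c) = c^\gamma/\gamma$); instruments $z_t = (1, G_t, R_t)$ with `G` denoting gross consumption growth; and a simulated data-generating process constructed so that the Euler equation holds at $(\delta, \gamma) = (0.95, 0.5)$. The function names are hypothetical. It uses a standard two-step procedure: identity weighting first, then the inverse of the estimated moment covariance.

```python
import numpy as np
from scipy.optimize import least_squares


def simulate(T=3000, delta0=0.95, gamma0=0.5, seed=0):
    """Hypothetical DGP in which the one-period Euler equation holds."""
    rng = np.random.default_rng(seed)
    n = T + 1
    g = np.empty(n)                         # log consumption growth, AR(1)
    g[0] = 0.05
    shocks = 0.1 * rng.normal(size=n)
    for s in range(1, n):
        g[s] = 0.02 + 0.6 * g[s - 1] + shocks[s]
    # eta > 0 with E[eta] = 1, independent of past information
    eta = np.exp(0.05 * rng.normal(size=n) - 0.5 * 0.05 ** 2)
    G = np.exp(g)
    # By construction delta0 * R * G**(gamma0 - 1) = eta, so E[. - 1 | past] = 0.
    R = G ** (1.0 - gamma0) * eta / delta0
    return G, R


def sample_moments(theta, G, R):
    """T x 3 array with rows u_{t+1}(theta) * z_t, where z_t = (1, G_t, R_t)."""
    delta, gamma = theta
    u = delta * R[1:] * G[1:] ** (gamma - 1.0) - 1.0    # Euler-equation error
    Z = np.column_stack([np.ones(len(u)), G[:-1], R[:-1]])
    return Z * u[:, None]


def gmm(G, R, theta0=(1.0, 1.0)):
    """Two-step GMM for (delta, gamma); returns the estimate and the J statistic."""
    def fit(W, start):
        L = np.linalg.cholesky(W)           # W = L L', so |L' gbar|^2 = gbar' W gbar
        resid = lambda th: L.T @ sample_moments(th, G, R).mean(axis=0)
        return least_squares(resid, start).x

    theta1 = fit(np.eye(3), np.asarray(theta0, dtype=float))   # step 1: W = I
    m = sample_moments(theta1, G, R)
    # One-period Euler errors are serially uncorrelated under the model,
    # so this sketch uses no HAC correction when estimating S.
    S = m.T @ m / len(m)
    W = np.linalg.inv(S)
    theta2 = fit(W, theta1)                                    # step 2: W = S^{-1}
    gbar = sample_moments(theta2, G, R).mean(axis=0)
    J_stat = len(m) * gbar @ W @ gbar       # overidentification test statistic
    return theta2, J_stat


G, R = simulate()
theta_hat, J_stat = gmm(G, R)
print("delta_hat, gamma_hat:", theta_hat)
print("J statistic:", J_stat)
```

With three moments and two parameters the model is overidentified by one restriction, so under the simulated model the J statistic is asymptotically chi-squared with one degree of freedom. For assets with $j > 1$ the Euler errors overlap across periods, and a HAC estimate of the moment covariance would be the natural replacement for `S`.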
The alternative is MLE: specify the joint distribution for $\{(r_{j,t+j}/p_{j,t},\ c_{t+j}/c_t),\ j = 1,2,\dots,N\}$, then maximize the corresponding likelihood function subject to the Euler equations. (See, e.g., Hansen and Singleton, JPE, 1983.)
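To indicate what "subject to the Euler equations" can mean in practice, here is a sketch of the restriction under one distributional assumption that is not spelled out in these notes: joint conditional lognormality with constant conditional variance, the kind of setting associated with Hansen and Singleton (1983). The symbols $X_{t+j}$, $R_{j,t+j}$, and $\sigma_j^2$ are introduced here for illustration, and the Euler equation is taken in the form with exponent $\gamma - 1$ on consumption growth.

```latex
% Notation (introduced for this sketch):
%   X_{t+j}   = \ln(c_{t+j}/c_t)            log consumption growth
%   R_{j,t+j} = \ln(r_{j,t+j}/p_{j,t})      log return on asset j
\begin{align*}
% Euler equation written in exponential form:
E\!\left[\exp\!\big(j\ln\delta + (\gamma-1)X_{t+j} + R_{j,t+j}\big)
   \,\big|\, \Omega_t\right] &= 1 \\[4pt]
% If (\gamma-1)X_{t+j} + R_{j,t+j} is conditionally normal with
% constant variance \sigma_j^2, then E[e^Y] = \exp(E[Y] + \sigma_j^2/2), so
j\ln\delta + (\gamma-1)\,E[X_{t+j}\mid\Omega_t]
   + E[R_{j,t+j}\mid\Omega_t] + \tfrac{1}{2}\sigma_j^2 &= 0 \\[4pt]
% Rearranged as a restriction on the conditional mean of the log return:
E[R_{j,t+j}\mid\Omega_t] &=
   -\,j\ln\delta \;-\; (\gamma-1)\,E[X_{t+j}\mid\Omega_t]
   \;-\; \tfrac{1}{2}\sigma_j^2
\end{align*}
```

Under these assumptions the Euler equation becomes a linear restriction linking the conditional means of log returns and log consumption growth, which can be imposed when maximizing a Gaussian likelihood for $(X_{t+j}, R_{j,t+j})$. Unlike GMM, the resulting estimates depend on the distributional assumption being correct.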