The Linear Regression Model with Autocorrelated Disturbances: Large Sample Theory

(References – Greene, Chapter 13; White, Chapter 6)

Now we will consider estimation and inference in the linear regression model with autocorrelated disturbances from the point of view of large sample theory, which is the more modern way to think about these problems.

First, consider the problem of testing for autocorrelation. In the finite sample case we have a single approach (the DW test), which relies on strictly exogenous regressors and homoskedastic disturbances. Not surprisingly, evidence of autocorrelation is much easier to find in very large samples. Consequently, we have a variety of asymptotically valid approaches, which allow for predetermined or even, in some cases, simply orthogonal regressors and, in some cases, heteroskedastic disturbances.

Testing for serially correlated disturbances in regressions with strictly exogenous regressors and homoskedastic disturbances –

- Durbin-Watson test
- Box-Pierce and Ljung-Box Q tests
- Fit the regression $\hat{\varepsilon}_t = \rho\hat{\varepsilon}_{t-1} + u_t$ or $\hat{\varepsilon}_t = \rho_1\hat{\varepsilon}_{t-1} + \cdots + \rho_p\hat{\varepsilon}_{t-p} + u_t$ by OLS, where the $\hat{\varepsilon}_t$'s are the OLS residuals from the regression of y on x, then apply a standard t-test or F-test of $H_0: \rho = 0$ or $H_0: \rho_1 = \cdots = \rho_p = 0$. (Heteroskedasticity-robust versions of this test exist.)

Testing for serially correlated disturbances in regressions with predetermined (or strictly exogenous) regressors and homoskedastic disturbances –

- Durbin h-test
- Modified Box-Pierce Q test
- (Durbin-) Breusch-Godfrey test

Breusch-Godfrey Test – Fit the regression

$$\hat{\varepsilon}_t = x_t'b + \rho\hat{\varepsilon}_{t-1} + u_t$$

Under the null of no serial correlation in the ε's, the t-statistic associated with the OLS estimator of ρ is asymptotically N(0,1). (Durbin)

Or, fit the regression

$$\hat{\varepsilon}_t = x_t'b + \rho_1\hat{\varepsilon}_{t-1} + \cdots + \rho_p\hat{\varepsilon}_{t-p} + u_t$$

Under the null of no serial correlation in the ε's, the LM-statistic, $(T-p)R^2$, is asymptotically $\chi^2(p)$. (Heteroskedasticity-robust versions of this test exist. A code sketch of this version of the test appears below, after the Newey-West estimator.)

Next we turn to estimation and inference. Suppose we conclude that the disturbances are autocorrelated. Then we can either apply OLS and use a properly adjusted variance matrix for $\hat{\beta}_{OLS}$, or apply an asymptotically efficient procedure, i.e., FGLS.

Correcting the OLS estimator for autocorrelation –

When the regressors and disturbances are autocorrelated but meet appropriate moment and memory conditions (e.g., orthogonality of regressors and disturbances, stationarity, asymptotically uncorrelated disturbances, the existence of k-th moments for sufficiently large k), then

$$\sqrt{T}(\hat{\beta}_T - \beta) \xrightarrow{d} N(0, \operatorname{avar}(\hat{\beta}))$$

where

$$\operatorname{avar}(\hat{\beta}) = \Sigma_{XX}^{-1} S \Sigma_{XX}^{-1}, \qquad \Sigma_{XX} = E(x_t x_t')$$

$$S = E(\varepsilon_t^2 x_t x_t') + \sum_{i=1}^{\infty} E[(\varepsilon_t\varepsilon_{t-i})(x_t x_{t-i}' + x_{t-i} x_t')]$$

{Note that the difference between this and the asymptotic variance matrix we derived earlier under the i.n.i.d. assumption, i.e., the assumption that the ε's form a conditionally heteroskedastic m.d.s., is the second part of S.}

To apply this result we need a consistent estimator of the asymptotic variance matrix. Under appropriate conditions we know that we can apply an LLN (e.g., the Ergodic Theorem) to show that

$$\frac{1}{T}\sum_{t=1}^{T} x_t x_t' \rightarrow \Sigma_{XX} \quad \text{(a.s. or plim)}$$

We also need a consistent estimator of S. A popular and commonly available consistent nonparametric estimator of S is the Newey-West heteroskedasticity and autocorrelation consistent (HAC) estimator:

$$\hat{S}_{NW} = \frac{1}{T}\left\{\sum_{t=1}^{T}\hat{\varepsilon}_t^2 x_t x_t' + \sum_{m=1}^{M}\sum_{t=m+1}^{T} w_m \hat{\varepsilon}_t\hat{\varepsilon}_{t-m}(x_t x_{t-m}' + x_{t-m} x_t')\right\}$$

where $w_m = 1 - m/(1+M)$, m = 1, …, M.

Practical issue – selecting the proper M. (This is similar to the lag length selection problem in fitting AR's and VAR's.)
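To make the mechanics of the LM version of the Breusch-Godfrey test concrete, here is a minimal numpy sketch. The function name and interface are illustrative, not from the notes; statsmodels ships a packaged version, statsmodels.stats.diagnostic.acorr_breusch_godfrey, which operates on a fitted OLS results object.

    import numpy as np

    def breusch_godfrey_lm(y, X, p):
        """LM form of the Breusch-Godfrey test for AR(p) serial correlation.

        y is the (T,) response, X the (T, k) regressor matrix (with a
        constant column). Returns (T - p) * R^2 from the auxiliary
        regression, asymptotically chi^2(p) under H0 of no serial correlation.
        """
        T = len(y)
        # OLS residuals from the original regression of y on X.
        e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
        # Auxiliary regression of e_t on x_t and e_{t-1}, ..., e_{t-p}, t = p+1..T.
        lags = np.column_stack([e[p - j : T - j] for j in range(1, p + 1)])
        Z = np.column_stack([X[p:], lags])
        u = e[p:] - Z @ np.linalg.lstsq(Z, e[p:], rcond=None)[0]
        R2 = 1.0 - (u @ u) / np.sum((e[p:] - e[p:].mean()) ** 2)
        return (T - p) * R2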
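Similarly, here is a minimal numpy sketch of $\hat{S}_{NW}$ and the resulting sandwich estimate of $\operatorname{avar}(\hat{\beta})$, taking the truncation lag M as given (the helper names are illustrative):

    import numpy as np

    def newey_west_S(e, X, M):
        """Newey-West HAC estimate of S from OLS residuals e and regressors X.

        Bartlett weights w_m = 1 - m/(M+1) downweight the lag-m cross
        products, which keeps the estimate positive semidefinite.
        """
        T = len(e)
        Xe = X * e[:, None]                 # row t is eps_t * x_t'
        S = Xe.T @ Xe / T                   # lag-0 term: (1/T) sum e_t^2 x_t x_t'
        for m in range(1, M + 1):
            w = 1.0 - m / (M + 1.0)
            Gamma = Xe[m:].T @ Xe[:-m] / T  # (1/T) sum_{t=m+1}^T e_t e_{t-m} x_t x_{t-m}'
            S += w * (Gamma + Gamma.T)
        return S

    def hac_avar(y, X, M):
        """Sandwich estimate Sxx^{-1} S Sxx^{-1} of avar(beta_OLS).

        Standard errors are sqrt(diag(hac_avar(y, X, M)) / T).
        """
        e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
        Sxx_inv = np.linalg.inv(X.T @ X / len(y))
        return Sxx_inv @ newey_west_S(e, X, M) @ Sxx_inv

In statsmodels the same correction is available directly, e.g., sm.OLS(y, X).fit(cov_type='HAC', cov_kwds={'maxlags': M}).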
Note that the Newey-West/HAC approach is nonparametric – it does not require us to specify a parametric model of the disturbance process, nor does it require exogenous regressors. However, this approach is not asymptotically efficient.

An asymptotically efficient estimator when the regressors are strictly exogenous is the FGLS estimator, which relies on the (correct) specification of a parametric model of the disturbance process. Recall that the FGLS estimator of β is

$$\hat{\beta}_{FGLS} = (X'\hat{\Omega}^{-1}X)^{-1} X'\hat{\Omega}^{-1}Y$$

where $\hat{\Omega}$ is any consistent estimator of Ω (Σ = σ²Ω = E(εε'), where σ² is an arbitrary constant). Under appropriate additional conditions on the regressors and the disturbances, the FGLS estimator is consistent, asymptotically normal, and asymptotically efficient, with

$$\operatorname{avar}(\hat{\beta}_{FGLS}) = \hat{\sigma}^2 (X'\hat{\Omega}^{-1}X/T)^{-1}$$

In the FGLS approach, a parametric model of the ε's is formulated and estimated to obtain $\hat{\Omega}$. That is, Ω = Ω(γ), γ an unknown parameter vector, and $\hat{\Omega} = \Omega(\hat{\gamma})$.

Suppose, for example, that the error process is a stationary AR(1) process, i.e., $\varepsilon_t = \rho\varepsilon_{t-1} + v_t$, |ρ| < 1, where $v_t$ is a white noise process with variance $\sigma_v^2$. Without loss of generality, let's assume that there is a single regressor, so that:

$$y_t = \beta_0 + \beta_1 x_t + \varepsilon_t$$
$$\varepsilon_t = \rho\varepsilon_{t-1} + v_t, \quad |\rho| < 1$$
$$v_t \sim wn(0, \sigma_v^2)$$

In this case, it can be shown that

$$\Omega = \frac{1}{1-\rho^2}\begin{pmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{T-1} \\ \rho & 1 & \rho & \cdots & \rho^{T-2} \\ \rho^2 & \rho & 1 & \cdots & \rho^{T-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho^{T-1} & \rho^{T-2} & \rho^{T-3} & \cdots & 1 \end{pmatrix}$$

and C'C = Ω⁻¹, where

$$C = C(\rho) = \begin{pmatrix} (1-\rho^2)^{1/2} & 0 & 0 & \cdots & 0 & 0 \\ -\rho & 1 & 0 & \cdots & 0 & 0 \\ 0 & -\rho & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & -\rho & 1 \end{pmatrix}$$

Consequently, the GLS estimator of β is the OLS estimator applied to the transformed data:

$$\tilde{y} = Cy = \begin{pmatrix} (1-\rho^2)^{1/2} y_1 \\ y_2 - \rho y_1 \\ y_3 - \rho y_2 \\ \vdots \\ y_T - \rho y_{T-1} \end{pmatrix}, \qquad \tilde{x} = Cx = \begin{pmatrix} (1-\rho^2)^{1/2} & (1-\rho^2)^{1/2} x_1 \\ 1-\rho & x_2 - \rho x_1 \\ 1-\rho & x_3 - \rho x_2 \\ \vdots & \vdots \\ 1-\rho & x_T - \rho x_{T-1} \end{pmatrix}$$

That is, we transform the data matrix by "quasi-differencing" observations 2, …, T (and the first observation is simply multiplied by $(1-\rho^2)^{1/2}$).

To estimate the model by FGLS, we need a consistent estimator of ρ. A consistent estimator of ρ is found from the regression of $\hat{\varepsilon}_t$ on $\hat{\varepsilon}_{t-1}$, where $\hat{\varepsilon}_t$ is the OLS residual from the regression of y on 1, x. So, in this case, the FGLS estimator is a sequence of three regressions (see the sketch following the notes below):

1. Regress $y_t$ on 1, $x_t$ to obtain $\hat{\varepsilon}_t$.
2. Regress $\hat{\varepsilon}_t$ on $\hat{\varepsilon}_{t-1}$ to obtain $\hat{\rho}$.
3. Regress $\tilde{y}(\hat{\rho})$ on $\tilde{x}(\hat{\rho})$ to obtain $\hat{\beta}$.

This is also sometimes referred to as the ("two-step") Prais-Winsten estimator of β.

Notes –

1. Sometimes, the first observations of $\tilde{y}(\hat{\rho})$ and $\tilde{x}(\hat{\rho})$ are dropped in Step 3 for convenience. The resulting estimator is called the Cochrane-Orcutt estimator. It is asymptotically equivalent to the Prais-Winsten estimator but may not do as well in modest samples, especially when ρ is close to 1.

2. Iterative versions of the P-W and C-O estimators are sometimes applied: First, follow steps 1-3 to obtain $\hat{\beta}$. Then use $\hat{\beta}$ to construct new $\hat{\varepsilon}_t$'s. Repeat steps 2 and 3. … Continue until the estimates converge.

3. These ideas extend in a straightforward way to the case where the ε's follow a higher-order AR process (and/or there are additional explanatory variables). For example, if p = 2, the second step of the C-O estimator would be: Regress $\hat{\varepsilon}_t$ on $\hat{\varepsilon}_{t-1}$ and $\hat{\varepsilon}_{t-2}$ to get $\hat{\rho}_1$ and $\hat{\rho}_2$. The third step would be: Regress $(y_t - \hat{\rho}_1 y_{t-1} - \hat{\rho}_2 y_{t-2})$ on $(1 - \hat{\rho}_1 - \hat{\rho}_2)$ and $(x_t - \hat{\rho}_1 x_{t-1} - \hat{\rho}_2 x_{t-2})$ for t = 3, …, T to get $\hat{\beta}_0$ and $\hat{\beta}_1$. For p > 2, the C-O estimator is usually preferred in practice to the P-W estimator because of the increasingly complicated form of the transformations for the first p observations of y and x.
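The three-regression recipe translates almost line for line into code. Here is a minimal numpy sketch of the two-step (and, optionally, iterated) Prais-Winsten estimator; the function name and the iterate/tol/max_iter arguments are illustrative:

    import numpy as np

    def prais_winsten(y, X, iterate=False, tol=1e-8, max_iter=50):
        """Two-step (optionally iterated) Prais-Winsten FGLS for AR(1) errors.

        X is the (T, k) regressor matrix including a constant column.
        Returns (beta_hat, rho_hat).
        """
        beta = np.linalg.lstsq(X, y, rcond=None)[0]        # step 1: OLS
        rho_old = np.inf
        while True:
            e = y - X @ beta
            # Step 2: regress e_t on e_{t-1} (no constant) to estimate rho.
            rho = (e[:-1] @ e[1:]) / (e[:-1] @ e[:-1])
            # Step 3: quasi-difference rows 2..T; rescale row 1 by sqrt(1 - rho^2).
            yt = np.concatenate(([np.sqrt(1 - rho**2) * y[0]],
                                 y[1:] - rho * y[:-1]))
            Xt = np.vstack([np.sqrt(1 - rho**2) * X[0],
                            X[1:] - rho * X[:-1]])
            beta = np.linalg.lstsq(Xt, yt, rcond=None)[0]  # step 3: OLS on transformed data
            if not iterate or abs(rho - rho_old) < tol or max_iter == 0:
                return beta, rho
            rho_old, max_iter = rho, max_iter - 1

Dropping the first row of (yt, Xt) before the final OLS gives the Cochrane-Orcutt variant; statsmodels' GLSAR class implements a related iterative procedure.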
If the regressors are predetermined but not strictly exogenous, the estimator of ρ given in step 2 of the C-O and P-W estimators will generally be inconsistent (and, therefore, the FGLS estimator of β will be inconsistent, too). There are at least a couple of ways to proceed in this case to construct a consistent and asymptotically efficient estimator of β. These involve the joint estimation of ρ and β to minimize the sum of squared v's:

1. Nonlinear least squares
2. Maximum likelihood

The NLS Estimator –

Let $y_t = \beta_0 + \beta_1 x_t + \varepsilon_t$, $\varepsilon_t = \rho\varepsilon_{t-1} + v_t$. Then $\rho y_{t-1} = \rho\beta_0 + \rho\beta_1 x_{t-1} + \rho\varepsilon_{t-1}$, and so

$$y_t - \rho y_{t-1} = (1-\rho)\beta_0 + \beta_1 x_t - \rho\beta_1 x_{t-1} + (\varepsilon_t - \rho\varepsilon_{t-1})$$

or, rearranging,

$$y_t = (1-\rho)\beta_0 + \beta_1 x_t - \rho\beta_1 x_{t-1} + \rho y_{t-1} + v_t$$

Minimize $\sum_{t=2}^{T} v_t^2(\rho, \beta_0, \beta_1)$ with respect to ρ, β0, β1. (A code sketch of both this estimator and the QMLE follows at the end of this section.)

The (quasi-) maximum likelihood estimator –

$$\ln L = \text{constant} + \frac{1}{2}\ln(1-\rho^2) - \frac{T}{2}\ln\sigma_v^2 - \frac{1}{2\sigma_v^2}\sum_{t=1}^{T} v_t^2$$

where

$$v_1 = (1-\rho^2)^{1/2}(y_1 - \beta_0 - \beta_1 x_1)$$
$$v_t = y_t - (1-\rho)\beta_0 - \beta_1 x_t + \rho\beta_1 x_{t-1} - \rho y_{t-1} \quad \text{for } t > 1$$

and

$$\hat{\sigma}_v^2 = \frac{1}{T}\sum_{t=1}^{T} v_t^2$$

Maximize ln L with respect to β0, β1, and ρ.
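Here is a minimal scipy sketch of both estimators for the AR(1) single-regressor setup above. The function names and starting values are illustrative; the bound on ρ in the QMLE keeps ln(1 − ρ²) defined during the search:

    import numpy as np
    from scipy.optimize import minimize

    def v_resid(theta, y, x):
        """v_t, t = 2..T, from y_t = (1-rho)b0 + b1 x_t - rho b1 x_{t-1} + rho y_{t-1} + v_t."""
        rho, b0, b1 = theta
        return y[1:] - (1 - rho) * b0 - b1 * x[1:] + rho * b1 * x[:-1] - rho * y[:-1]

    def nls_ar1(y, x):
        """Joint NLS: minimize sum_{t=2}^T v_t^2 over (rho, b0, b1)."""
        def ssr(theta):
            v = v_resid(theta, y, x)
            return v @ v
        res = minimize(ssr, x0=np.array([0.0, np.mean(y), 0.0]),
                       method="Nelder-Mead")
        return res.x  # (rho_hat, b0_hat, b1_hat)

    def qmle_ar1(y, x):
        """(Quasi-) MLE: minimize the negative concentrated log-likelihood."""
        def negloglik(theta):
            rho, b0, b1 = theta
            v1 = np.sqrt(1.0 - rho**2) * (y[0] - b0 - b1 * x[0])  # rescaled first obs
            v = np.concatenate(([v1], v_resid(theta, y, x)))
            s2 = (v @ v) / len(y)            # sigma_v^2 concentrated out
            return -0.5 * np.log(1.0 - rho**2) + 0.5 * len(y) * np.log(s2)
        res = minimize(negloglik, x0=np.array([0.0, np.mean(y), 0.0]),
                       method="L-BFGS-B",
                       bounds=[(-0.999, 0.999), (None, None), (None, None)])
        return res.x  # (rho_hat, b0_hat, b1_hat)

Note that the two objectives differ only in the ½ln(1 − ρ²) term and the treatment of the first observation.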