Econ107 Applied Econometrics
Topic 8: Autocorrelation (Serial Correlation)
(Studenmund, Chapter 9)

I. Definitions and Problems

This is the third in our list of common regression problems. We define it by starting with the ‘absence’ of serial correlation. It is a common problem in time series analysis. Cross-sectional regressions may involve spatial correlations.

No serial correlation exists if the disturbances are uncorrelated for any two observations in the sample:

$$\mathrm{Cov}(\varepsilon_t, \varepsilon_s) = 0 \quad \text{for all } t \neq s$$

[Figure: residuals plotted against time, with no discernible pattern]

If no serial correlation exists, then no ‘discernible pattern’ exists in either the disturbances or the residuals. These ‘draws’ from the distribution of disturbances appear to be independent of one another. For example, a positive residual in one observation doesn’t affect the probability that the next residual will be positive (or negative): they are independent, random draws from the same distribution.

Serial correlation exists if any two disturbances are correlated:

$$\mathrm{Cov}(\varepsilon_t, \varepsilon_s) \neq 0 \quad \text{for some } t \neq s$$

We can represent this serial correlation in a plot of the disturbances or residuals against time.

[Figure: residuals plotted against time, with an upward linear trend]

In this case, we observe an upward linear trend. The residuals are not independent of one another. Negative values occur in adjacent observations, and so do positive values: they ‘clump up’ at either end of the time series. We could reverse the order above, and get a negative linear trend. Or we could get a ‘cyclical’ pattern in the disturbances or residuals.

[Figure: residuals plotted against time, with a cyclical pattern]

Serial correlation could be related to an unobserved variable that has ‘persistent effects’ on the dependent variable over a number of periods (e.g., shocks that are difficult to quantify in a macro model and may ‘spill over’ into future periods). An example would be expectations of changes in government policies toward superannuation that could affect household savings behaviour over a number of quarters. Another would be business confidence, which could influence capital investment.
The idea is that there is a lot of ‘inertia’ or ‘sluggishness’ in time series. There is a certain momentum built into economic aggregates like GDP. It takes time to ‘slow down’ or ‘speed up’. All of this results in serial correlation.

Serial Correlation or Specification Error?

True serial correlation exists when a regression model is appropriately specified, and yet the disturbances are correlated.

False serial correlation exists when the disturbances are correlated in an inappropriately specified model, but would be uncorrelated in the ‘correct’ regression model. In other words, the serial correlation comes entirely from the misspecification.

EXAMPLE: Suppose we estimate:

$$\ln L_t = \beta_0 + \beta_1 \ln W_t + \beta_2 \ln Q_t + v_t$$

where
$\ln L_t$ = log of aggregate employment in year t,
$\ln W_t$ = log wage in year t,
$\ln Q_t$ = log output in year t.

But the ‘true’ model is:

$$\ln L_t = \beta_0 + \beta_1 \ln W_t + \beta_2 \ln Q_t + \beta_3 \ln r_t + \varepsilon_t$$

where $\ln r_t$ = log of the rental cost of capital in year t.

The disturbance term in the estimated regression is a ‘composite’ of the omitted variable and the disturbance term from the true regression. This could be written ‘approximately’ as:

$$v_t = \beta_3 \ln r_t + \varepsilon_t$$

We have omitted variable bias. We forgot, or were unable, to include the variable on the price of capital. If the omitted regressor is correlated over time, then the observed residuals may indicate serial correlation. For example, suppose it goes through a cyclical pattern, high during peaks and low during troughs in the business cycle.

Obviously, the solution here is to include the omitted variable. This is not really an issue of serial correlation per se. It just indicates another ‘type’ of regression problem. In practice, it’s difficult to distinguish between ‘true’ and ‘false’ serial correlation.

Types of serial correlation

We need to look at specific forms of serial correlation. In the general case, any two disturbances may be correlated.
The simplest form is where ‘adjacent’ disturbances are correlated, i.e., those of observations in successive time periods. Specify a 2-variable regression:

$$Y_t = \beta_0 + \beta_1 X_t + \varepsilon_t$$

First-Order Serial Correlation - AR(1). This is the simplest form of serial correlation:

$$\varepsilon_t = \rho \varepsilon_{t-1} + u_t$$

where $\rho$ measures the strength of the serial correlation, and $u_t$ is a stochastic disturbance term where:

$$E(u_t) = 0, \qquad \mathrm{Var}(u_t) = \sigma^2, \qquad \mathrm{Cov}(u_t, u_s) = 0 \quad \text{for all } t \neq s$$

We’ll refer to this as a classical (or ‘nicely behaved’) disturbance term: zero mean, constant variance (i.e., no heteroskedasticity) and no serial correlation.

This is known as an ‘autoregressive’ scheme because the specification is itself a regression model: the disturbance is regressed on its own lagged value, and ρ is the slope coefficient. If ρ>0, then we have ‘positive’ serial correlation. If ρ<0, then we have ‘negative’ serial correlation.

Alternatives to AR(1). The idea is that AR(1) may be too simplistic in some situations to capture the relationship among the disturbances.

Second-Order Serial Correlation - AR(2)

$$\varepsilon_t = \rho_1 \varepsilon_{t-1} + \rho_2 \varepsilon_{t-2} + u_t$$

Note that there are 2 separate coefficients in the autoregressive scheme. Normally we’d expect the magnitude of the linear relationship to dissipate with time (i.e., |ρ1| > |ρ2|). The correlation weakens as the gap between the periods increases.

Nth-Order Serial Correlation - AR(n)

$$\varepsilon_t = \rho_1 \varepsilon_{t-1} + \rho_2 \varepsilon_{t-2} + \dots + \rho_n \varepsilon_{t-n} + u_t$$

This is a lagged-n relationship.

The GLS Procedure

Suppose you use OLS to estimate a regression model with known serial correlation. What are the consequences?

(1) The estimated OLS coefficients are still unbiased. An absence of serial correlation is not a necessary condition for unbiasedness.

(2) But these OLS estimators are not efficient. They’re no longer minimum variance. An absence of serial correlation is a necessary condition for estimators to be BLUE. In addition, the variances or standard errors of the estimated coefficients, as computed by OLS under the classical assumptions, are biased.
In general, we don’t know the direction of the bias. We might be overestimating the standard errors, or underestimating them. As a result, our t-ratios may be too small or too large. We might reject a null hypothesis that a particular slope coefficient is equal to zero, when we shouldn’t. Or we might not reject it, when we should. Thus, statistical inference is unreliable.

Since the consequences are severe, what do we do about it? Begin by considering the alternative estimation procedure: Generalized Least Squares (GLS). Keep things simple by assuming a 2-variable model with AR(1).

$$Y_t = \beta_0 + \beta_1 X_t + \varepsilon_t \qquad (1)$$

where

$$\varepsilon_t = \rho \varepsilon_{t-1} + u_t$$

Note that the same functional form holds for period t-1. Write (1) for period t-1 and multiply both sides by the constant ρ:

$$\rho Y_{t-1} = \rho \beta_0 + \rho \beta_1 X_{t-1} + \rho \varepsilon_{t-1} \qquad (2)$$

Now subtract (2) from (1):

$$Y_t - \rho Y_{t-1} = \beta_0 (1 - \rho) + \beta_1 (X_t - \rho X_{t-1}) + (\varepsilon_t - \rho \varepsilon_{t-1})$$

or

$$Y_t^* = \beta_0^* + \beta_1 X_t^* + u_t \qquad (3)$$

where $Y_t^* = Y_t - \rho Y_{t-1}$, $X_t^* = X_t - \rho X_{t-1}$ and $\beta_0^* = \beta_0 (1 - \rho)$.

Note that the disturbance term is now $u_t$, which is ‘nicely behaved’ (i.e., zero expected value, constant variance and no serial correlation). The reason is that the disturbance term above can be written:

$$\varepsilon_t - \rho \varepsilon_{t-1} = \rho \varepsilon_{t-1} + u_t - \rho \varepsilon_{t-1} = u_t$$

where we rely on the known structure of the serial correlation.

Equation (3) is known as a Generalized Difference Equation. Run OLS on the transformed data (i.e., Equation 3). The coefficient estimates will be BLUE. GLS estimation of (1) is defined to be OLS estimation of (3).

II. Detection

There are a variety of ways of diagnosing the presence of serial correlation. Check the existing empirical literature, and see it coming. The rest are ‘post-mortems’: run OLS and see whether you shouldn’t have.

1. Graphical Methods.

A. Plot residuals across time. Look for a ‘detectable pattern’.

B. Plot residuals and lagged values in a 4-quadrant diagram.

This is just an alternative to the time sequence plot. Measure the current residuals along the vertical axis, and the lagged residuals along the horizontal.
If the data points end up mostly in the first and third quadrants, then you’ve got evidence of ‘positive’ serial correlation. Positive residuals in one period are generally followed by positive residuals in the next period. The same holds for negative residuals.

The opposite case would be where the residuals are predominantly in the second and fourth quadrants. This suggests negative serial correlation. Positive residuals are generally followed by negative values, etc.

Of course, with an absence of serial correlation, the data points would be evenly distributed across all four quadrants.

2. Durbin-Watson d Statistic

This is the most commonly used diagnostic test of serial correlation. Define this test statistic as:

$$d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$$

This test statistic can only be used under a number of conditions. Most importantly, it assumes that the autoregressive structure is AR(1). In addition, the regression model must include an intercept term, and it must not include a lagged dependent variable as a regressor.

Now with a little algebra and some simplifying approximations, we can show the extreme limits on the Durbin-Watson d statistic, and relate it back to the ‘structure’ of the serial correlation. Expand the numerator:

$$d = \frac{\sum_{t=2}^{n} e_t^2 + \sum_{t=2}^{n} e_{t-1}^2 - 2 \sum_{t=2}^{n} e_t e_{t-1}}{\sum_{t=1}^{n} e_t^2}$$

Note the following ‘approximations’:

$$\sum_{t=2}^{n} e_t^2 \approx \sum_{t=2}^{n} e_{t-1}^2, \qquad \sum_{t=1}^{n} e_t^2 \approx \sum_{t=2}^{n} e_t^2$$

Thus, we can rewrite this expression:

$$d \approx \frac{2 \left( \sum_{t=1}^{n} e_t^2 - \sum_{t=2}^{n} e_t e_{t-1} \right)}{\sum_{t=1}^{n} e_t^2}$$

or:

$$d \approx 2 \left( 1 - \frac{\sum_{t=2}^{n} e_t e_{t-1}}{\sum_{t=1}^{n} e_t^2} \right)$$

Since the estimated first-order autocorrelation coefficient (the estimate of ρ) is:

$$\hat{\rho} = \frac{\sum_{t=2}^{n} e_t e_{t-1}}{\sum_{t=1}^{n} e_t^2}$$

we can finally write:

$$d \approx 2 (1 - \hat{\rho})$$

Since $-1 \le \hat{\rho} \le 1$, then $0 \le d \le 4$.

If $\hat{\rho}$ = 0, then d ≈ 2 (absence of serial correlation).
If $\hat{\rho}$ = 1, then d ≈ 0 (perfect positive serial correlation).
If $\hat{\rho}$ = -1, then d ≈ 4 (perfect negative serial correlation).
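As a rough illustration of the formulas above, the following sketch computes d and the estimated ρ from a list of residuals in plain Python. The function names `durbin_watson` and `rho_hat` and the residual values are made up for this example; they do not come from Studenmund or from any software package. Without the two approximations, the exact relation is d = 2(1 - ρ̂) - (e_1² + e_n²)/Σe_t², so the approximation is good when the first and last residuals are small relative to the total sum of squares.

```python
def durbin_watson(e):
    """d = sum_{t=2}^n (e_t - e_{t-1})^2 / sum_{t=1}^n e_t^2."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(x ** 2 for x in e)
    return num / den


def rho_hat(e):
    """rho_hat = sum_{t=2}^n e_t e_{t-1} / sum_{t=1}^n e_t^2."""
    num = sum(e[t] * e[t - 1] for t in range(1, len(e)))
    den = sum(x ** 2 for x in e)
    return num / den


# Hypothetical residuals: long runs of same-signed values, as one would
# expect under positive serial correlation.
residuals = [0.1, 0.6, 0.9, 0.7, 0.3, -0.2, -0.6, -0.8, -0.5, -0.1]

d = durbin_watson(residuals)
r = rho_hat(residuals)

print(f"d               = {d:.3f}")
print(f"rho_hat         = {r:.3f}")
print(f"2 (1 - rho_hat) = {2 * (1 - r):.3f}")
```

Here d comes out well below 2, in line with the positive value of the estimated ρ. A perfectly alternating series such as 1, -1, 1, -1 would instead push d toward 4.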
The test procedure entails computing the d statistic (although software packages will do this for you), and comparing it to the relevant critical value. One problem is that there is no ‘unique’ critical value for the d statistic. Instead, we have ‘upper’ and ‘lower’ bounds. The reason is that the probability distribution of the d statistic is not easily derived. It depends on the values of the explanatory variables, which vary from sample to sample. This means that there is no single critical value as in the t or F tests.

Three-step procedure:

(1) Run OLS. Suppose you obtain d=1.73. The null hypothesis is H0: No serial correlation. The alternative hypothesis is H1: Positive serial correlation.

(2) Determine dL and dU. Suppose n=100 and k=5, and we want a 5% significance level. Table B4 (p. 617 of Studenmund) gives us dL=1.57 and dU=1.78. (k = the number of explanatory variables excluding the constant term, 5 in this case).

(3) Apply the decision rule, using the following diagram.

[Figure: the range of d from 0 to 4, with the rejection, inconclusive and non-rejection regions marked by dL and dU]

The 2 darkened areas are often known as the ‘zones of indecision’ or ‘regions of ignorance’. We can’t reach any conclusion about serial correlation if dL ≤ d ≤ dU. If d>dU, then you can’t reject the null. If d<dL, you can reject the null in favour of positive serial correlation.

In this numerical example, the DW statistic falls within this ‘zone of indecision’ (1.57 ≤ 1.73 ≤ 1.78). No conclusion is possible.

Like other statistical tests, these ‘areas’ and critical values depend on the choice of the significance level from the outset. If we’d chosen a 1% significance level, the critical values would be dL=1.44 and dU=1.65. Now the d statistic exceeds the upper critical value. We couldn’t reject the null hypothesis of no serial correlation.

In other regression models, our alternative hypothesis may be negative, rather than positive, serial correlation. This testing regime is the mirror image of the other. Here the critical values are 4-dU and 4-dL, obtained from the same tables. Follow similar procedures: reject the null in favour of negative serial correlation if d>4-dL, don’t reject it if d<4-dU, and reach no conclusion in between.

III. Remedial Measures

1.
GLS

Structure of serial correlation is known: Run OLS on the following transformed model (the generalized difference equation (3)):

$$Y_t^* = \beta_0^* + \beta_1 X_t^* + u_t$$

ρ is not known (Feasible GLS or FGLS):

KEY: Almost the same as GLS. The difference is that we need an estimate of ρ.

(i). The Cochrane-Orcutt Iterative Procedure.

Probably the most commonly used procedure for coming up with an estimate of ρ. It is part of most software packages. Again, use a 3-step procedure:

(1) Run OLS on Model (1). Retain the residuals.

(2) Estimate the following auxiliary regression:

$$e_t = \rho e_{t-1} + v_t$$

(3) Using the ‘first estimate’ of ρ from Step 2, transform the data and run OLS:

$$Y_t - \hat{\rho} Y_{t-1} = \beta_0 (1 - \hat{\rho}) + \beta_1 (X_t - \hat{\rho} X_{t-1}) + u_t$$

Repeat the second step, using the new residuals:

$$e_t^* = Y_t - \hat{\beta}_0^* - \hat{\beta}_1^* X_t$$

where the asterisks mark the estimates of $\beta_0$ and $\beta_1$ obtained from Step 3 (the estimate of $\beta_0$ is the Step 3 constant divided by $1 - \hat{\rho}$).

Step 2' is now:

$$e_t^* = \rho e_{t-1}^* + w_t$$

Step 3' is now:

$$Y_t - \hat{\rho}^{(2)} Y_{t-1} = \beta_0 (1 - \hat{\rho}^{(2)}) + \beta_1 (X_t - \hat{\rho}^{(2)} X_{t-1}) + u_t$$

where $\hat{\rho}^{(2)}$ is the ‘second estimate’ of ρ from Step 2'.

This is an ‘iterative’ procedure. Begin with a relatively inefficient estimate of the β coefficients and ρ. Use this to transform the data and get a ‘better’ estimate of both the βs and ρ. Each iteration keeps improving our estimates. We keep going until successive estimates of ρ change by less than some predetermined value (e.g., .001). The idea is that the estimates eventually ‘settle down’ to some specific value.

(ii). Simultaneous estimation of $\beta_0$, $\beta_1$ and $\rho$.

2. Using OLS with Newey-West standard errors.

Serial correlation does not bias the OLS coefficient estimates, but it does affect the standard errors. The Newey-West technique directly adjusts the standard errors of the OLS estimates to take account of serial correlation, leaving the coefficient estimates themselves unchanged.

IV. Questions for Discussion: Q9.11, Q9.13

V. Computing Exercise: Johnson, Ch 9
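To make the Cochrane-Orcutt iteration from Section III concrete, here is a minimal sketch in plain Python for the 2-variable model. It is only an illustration under stated assumptions, not the routine any particular software package uses. The helper names (`ols`, `estimate_rho`, `cochrane_orcutt`), the simulated data and the parameter values (β0 = 2, β1 = 0.5, ρ = 0.7) are all invented for the example. The first observation is dropped in the transformation, and the loop stops when successive estimates of ρ differ by less than .001.

```python
import random


def ols(x, y):
    """Two-variable OLS; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def estimate_rho(e):
    """Slope of the no-intercept auxiliary regression e_t = rho e_{t-1} + v_t."""
    num = sum(e[t] * e[t - 1] for t in range(1, len(e)))
    den = sum(e[t - 1] ** 2 for t in range(1, len(e)))
    return num / den


def cochrane_orcutt(x, y, tol=1e-3, max_iter=50):
    """Iterate Steps 2 and 3 until rho settles down; returns (b0, b1, rho)."""
    b0, b1 = ols(x, y)  # Step 1: OLS on the original model
    rho = 0.0
    for _ in range(max_iter):
        # Residuals from the ORIGINAL model at the current coefficient estimates
        e = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
        new_rho = estimate_rho(e)  # Step 2
        # Step 3: generalized differences (the first observation is lost)
        y_star = [y[t] - new_rho * y[t - 1] for t in range(1, len(y))]
        x_star = [x[t] - new_rho * x[t - 1] for t in range(1, len(x))]
        const, b1 = ols(x_star, y_star)
        b0 = const / (1 - new_rho)  # the constant estimates beta0 (1 - rho)
        converged = abs(new_rho - rho) < tol
        rho = new_rho
        if converged:
            break
    return b0, b1, rho


# Simulated data with AR(1) disturbances (all values are assumptions)
random.seed(1)
n, beta0, beta1, true_rho = 400, 2.0, 0.5, 0.7
x = [random.uniform(0, 10) for _ in range(n)]
eps = [0.0] * n
for t in range(1, n):
    eps[t] = true_rho * eps[t - 1] + random.gauss(0, 1)
y = [beta0 + beta1 * x[t] + eps[t] for t in range(n)]

b0, b1, rho = cochrane_orcutt(x, y)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, rho = {rho:.3f}")
```

With a sample of this size the estimates should land reasonably close to the values used in the simulation, although the exact numbers depend on the random seed.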