Serial Autocorrelation: Why it’s there, what it does, and how to get rid of it.
By Ronald U. Mendoza
Fordham Department of Economics

What is serial autocorrelation?

When using time series data in regressions, one must always check that all the assumptions of the classical linear regression model are satisfied. One of these assumptions is that the disturbance term relating to any observation is not influenced by the disturbance term relating to any other observation. Put simply, the error terms of the Ordinary Least Squares (OLS) estimating equation must be distributed independently of each other, and hence the covariance between any pair of error or residual terms must be zero. Should this covariance be non-zero, the residuals are said to be autocorrelated and a relationship between present and past values can be observed. Serial autocorrelation therefore refers to the existence of a linear relationship among the residuals of the regression. In a typical regression of Y on X, as in equation 1, the presence of first order serial autocorrelation in the residuals is expressed by equation 2. Notice that because v is not independently distributed, a necessary assumption of the OLS procedure has been violated.

Y_t = α + βX_t + v_t    (1)

v_t = ρv_{t-1} + ε_t    (2)

What causes serial autocorrelation?

There are many causes of serial autocorrelation in regressions involving time series data. The most significant cause is that there is usually momentum built into most time series. For instance, GNP data can show high interdependence between successive values because total national output tends to follow the so-called business cycle. Because of this, a regression involving GNP could produce error terms which are also highly interdependent. Other examples of time series that exhibit high interdependence include the Consumer Price Index (CPI), production, employment, unemployment, exports, imports, etc.[1]

[1] If you are interested in an intuitive explanation of other causes of serial autocorrelation, Gujarati's Basic Econometrics, 3rd Edition (1995) is a good reference.

What does serial autocorrelation do to the OLS regression?

The presence of autocorrelation in the error terms of an OLS regression still results in unbiased coefficient estimates. However, these estimates are not the most efficient in the class of all linear unbiased estimators; in other words, another unbiased and more efficient estimator can be found. This inefficiency manifests itself in the t-statistics generated from the coefficients. Because the t-statistic is nothing more than the coefficient estimate divided by the square root of the variance of that coefficient estimate, serial autocorrelation results in dubious t-statistics. Ergo, reliable inferences cannot be made from the regression results.

How does one test for serial autocorrelation?

There is a large body of literature on tests for serial autocorrelation. However, the standard test included in most software packages is the Durbin-Watson test. It is defined simply as:

d = Σ_{t=2}^{n} (v_t − v_{t−1})² / Σ_{t=1}^{n} v_t²

where v is the estimated residual from the OLS regression and n is the sample size.
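As a minimal sketch of this calculation in SAS (the data set name temp and the variable names y and x are only illustrative stand-ins for equation 1), the residuals can be saved from PROC REG and the two sums in the formula accumulated in a data step. The DW option on the MODEL statement also reports d directly.

proc reg data=temp;
   model y = x / dw;             /* the DW option prints the Durbin-Watson d */
   output out=resids r=vhat;     /* save the estimated residuals v */
run;

data _null_;
   set resids end=last;
   vlag = lag(vhat);                          /* v(t-1) */
   if _n_ > 1 then num + (vhat - vlag)**2;    /* numerator: sum over t = 2,...,n */
   den + vhat**2;                             /* denominator: sum over t = 1,...,n */
   if last then do;
      d = num / den;
      put 'Durbin-Watson d = ' d;             /* written to the SAS log */
   end;
run;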
Once d is calculated from the residuals of a typical OLS regression (and this can be done easily in Excel), one can then use the following rules to test for the presence of serial autocorrelation:

Durbin-Watson d test: Decision Rules

Null Hypothesis                              Condition*                           Decision
No positive autocorrelation                  0 < d < d(lower)                     Reject null. (Autocorrelation!)
No positive autocorrelation                  d(lower) <= d <= d(upper)            No decision.
No negative autocorrelation                  4 - d(lower) < d < 4                 Reject null. (Autocorrelation!)
No negative autocorrelation                  4 - d(upper) <= d <= 4 - d(lower)    No decision.
No autocorrelation (positive or negative)    d(upper) < d < 4 - d(upper)          Do not reject null.

*The upper critical value is d(upper) and the lower critical value is d(lower). Also, "<=" is read as "less than or equal to."

Attached are the Durbin-Watson critical values for models with up to five regressors, taken from the appendix of Greene's Econometric Analysis, 3rd Edition (1997). Note that the appropriate sample size must be used in order to identify the relevant critical values for the upper and lower bounds of the DW statistic.

How does one correct for serial autocorrelation?

Most statistical and econometrics software packages now enable the user to correct for serial autocorrelation automatically. One such package, SAS, allows for the correction using a simple two-step procedure. First, an estimate of ρ in equation 2 is made using a maximum likelihood procedure. The intuition behind this first step is that ρ is estimated so that the resulting error terms are independently distributed. The second step incorporates ρ into the estimation of equation 1. The objective of this augmentation is for the modified equation 1 to exhibit independently and identically distributed residuals.

The derivation of the corrected model is relatively straightforward. First, we take the one-period lag of equation 1, shown below as equation 3.

Y_{t-1} = α + βX_{t-1} + v_{t-1}    (3)

Both sides are then multiplied by ρ in order to get equation 4.

ρY_{t-1} = αρ + βρX_{t-1} + ρv_{t-1}    (4)

Equation 4 is then subtracted from equation 1 in order to arrive at a white-noise, independently and identically distributed residual ε_t. Equation 5 represents equation 1 corrected for serial autocorrelation of the first order. Note that by renaming these terms (Y*_t = Y_t − ρY_{t-1}, X*_t = X_t − ρX_{t-1}, α* = α(1 − ρ), and β* = β), equation 5 can be expressed as equation 6, which provides best linear unbiased estimates (BLUE) using OLS.

Y_t − ρY_{t-1} = α(1 − ρ) + β(X_t − ρX_{t-1}) + ε_t    (5)

Y*_t = α* + β*X*_t + ε_t    (6)
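PROC AUTOREG carries out this correction automatically, as in the program below, but the two steps can also be sketched by hand. The following sketch is only illustrative: the data set temp and the variables y and x are hypothetical, and ρ is estimated with a simple regression of the residual on its own lag (in the spirit of a Cochrane-Orcutt correction) rather than the maximum likelihood procedure described above.

proc reg data=temp;                      /* OLS estimate of equation 1 */
   model y = x;
   output out=resid1 r=vhat;             /* save the residuals v */
run;

data resid1;
   set resid1;
   vlag = lag(vhat);                     /* v(t-1) */
run;

proc reg data=resid1 outest=rhohat noprint;
   model vhat = vlag / noint;            /* step 1: the slope is the estimate of rho in equation 2 */
run;

data quasi;                              /* step 2: quasi-difference the data as in equation 5 */
   if _n_ = 1 then set rhohat(keep=vlag rename=(vlag=rho));
   set resid1;
   ystar = y - rho*lag(y);               /* Y* */
   xstar = x - rho*lag(x);               /* X* */
run;

proc reg data=quasi;                     /* OLS on the transformed equation 6 */
   model ystar = xstar;                  /* the first observation drops out of the sample */
run;

The slope estimate from the last regression is an estimate of β itself, and the original intercept can be recovered afterwards as α = α*/(1 − ρ).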
The complete SAS program for a typical export demand function[2] in Dr. Schwalbenberg's International Economic Policy class is written below:

SAS PROGRAM:

data temp;
    Gives the entire data set a name.
infile 'a:\data.prn';
    Reads the data file, which is saved on a disk. Note that it is saved as a space-delimited file in Excel.
input lnrx lnrr lngdp;
    Gives each column of data a name. Be sure to remember which one is which in Excel!
proc reg;
    Calls for the regression procedure.
model lnrx=lnrr lngdp;
    Models the regression. At this point SAS gives you results which are possibly autocorrelated. You have to check the Durbin-Watson statistic in order to test for the presence of serial autocorrelation.
proc autoreg data=temp;
    Calls for the autoregressive correction procedure.
model lnrx=lnrr lngdp /nlag=1 dw=1 dwprob;
    Models the regression with the information that the residuals have first order autocorrelation.[3] The results will include those for the uncorrected and the corrected versions of the regression. The DW statistic of the corrected regression, as well as its probability, will also be shown.
run;

References:

Greene, William H. Econometric Analysis, 3rd Edition. Prentice Hall, Upper Saddle River, NJ. 1997.

Gujarati, Damodar. Basic Econometrics, 3rd Edition. McGraw-Hill, New York. 1995.

SAS/ETS Software Applications Guide 2: Econometric Modeling, Simulation, and Forecasting. Version 6, 1st Edition. SAS Institute, Cary, NC. 1993.

[2] The export demand function is the regression of the country's real exports (in log form) against the log transform of the real exchange rate and the log transform of the real GDP of the country's trading partner.

[3] Note that first order serial autocorrelation refers to the existence of a relationship between the present value and only the first lagged value. The possibility of higher order serial autocorrelation, though less common, must still be considered. The DW statistic of the corrected regression should show a rejection of autocorrelation; if not, another correction is necessary.
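As an illustration of such a further correction (the option values here are only a sketch, not part of the original program), a higher-order error structure could be requested from PROC AUTOREG, for instance:

proc autoreg data=temp;
   model lnrx=lnrr lngdp /nlag=2 dw=2 dwprob;   /* allow for second order autocorrelation */
run;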