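Regression with time series errors

Regression in matrix form: $Y = X\beta + Z$, where $Z = (z_1, \ldots, z_n)'$ is the error vector. Here the errors are usually assumed to be white noise, but in time series there is often autocorrelation.

Example: One row of X and Y per month. Y = column of monthly sales. X has columns: (1) a column of all 1s, (2) a column of advertising expenditures A, (3) a column of competitor's sales C. The coefficient vector is $\beta' = $ (intercept, coefficient of advertising, coefficient of competitor sales) $= (\beta_0, \beta_1, \beta_2)$:

$$
\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}
=
\begin{pmatrix} 1 & A_1 & C_1 \\ 1 & A_2 & C_2 \\ \vdots & \vdots & \vdots \\ 1 & A_n & C_n \end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{pmatrix}
+
\begin{pmatrix} z_1 \\ z_2 \\ \vdots \\ z_n \end{pmatrix}
$$

V = variance matrix of the error vector (variances on the diagonal, covariance between $z_i$ and $z_j$ in positions (i,j) and (j,i)). Here are some examples for 4 errors:

Example: AR(1), $z_t = \rho z_{t-1} + e_t$ where $e_t$ is white noise:

$$
V = \sigma^2 \begin{pmatrix} 1 & \rho & \rho^2 & \rho^3 \\ \rho & 1 & \rho & \rho^2 \\ \rho^2 & \rho & 1 & \rho \\ \rho^3 & \rho^2 & \rho & 1 \end{pmatrix}
$$

($\sigma^2$ = variance of z = variance of e divided by $1-\rho^2$).

Example:

$$
V = \begin{pmatrix} 8 & 4 & 2 & 1 \\ 4 & 8 & 4 & 2 \\ 2 & 4 & 8 & 4 \\ 1 & 2 & 4 & 8 \end{pmatrix}
$$

so $\sigma^2 = 8$ and $\rho = 4/8 = 0.5$ (check: $8\rho^2 = 2$ and $8\rho^3 = 1$).

Example: MA(1), $z_t = e_t - \theta e_{t-1}$:

$$
V = \sigma^2 \begin{pmatrix} 1 & \rho & 0 & 0 \\ \rho & 1 & \rho & 0 \\ 0 & \rho & 1 & \rho \\ 0 & 0 & \rho & 1 \end{pmatrix}
$$

where $\sigma^2$ = variance of z = $(1+\theta^2)$(variance of e) and $\rho = -\theta/(1+\theta^2)$ is the lag 1 autocorrelation.

Example:

$$
V = \begin{pmatrix} 10 & 4 & 0 & 0 \\ 4 & 10 & 4 & 0 \\ 0 & 4 & 10 & 4 \\ 0 & 0 & 4 & 10 \end{pmatrix}
$$

implies $\sigma^2 = 10$ and $\rho = 4/10 = 0.5/1.25$, so $\theta = -0.5$.

Example: White noise:

$$
V = \sigma^2 \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
$$

where $\sigma^2$ = variance of z (the error variance). For

$$
V = \begin{pmatrix} 10 & 0 & 0 & 0 \\ 0 & 10 & 0 & 0 \\ 0 & 0 & 10 & 0 \\ 0 & 0 & 0 & 10 \end{pmatrix}
$$

we have $\sigma^2 = 10$.

The best estimate of $\beta$ is always

$$
\hat\beta = (X'V^{-1}X)^{-1}(X'V^{-1}Y),
$$

which is called the vector of “generalized least squares estimates,” and the variance-covariance matrix of these estimates is always $(X'V^{-1}X)^{-1}$. The standard errors of the estimates (denominators for t) are the square roots of the diagonal elements of this $(X'V^{-1}X)^{-1}$ matrix (with V estimated). In the special case of white noise, $V^{-1}$ is just $\sigma^{-2}$ times an identity matrix, and the formulas reduce to the “least squares estimates” $\hat\beta = (X'X)^{-1}(X'Y)$; the square roots of the diagonal elements of $(X'X)^{-1}\sigma^2$ give the standard errors, where $\sigma^2$ is estimated by the error mean square.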
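The covariance matrices and the generalized least squares formula above can be checked numerically. The following is a minimal pure-Python sketch (no NumPy, so it runs anywhere; a real analysis would use a linear algebra library). The function names `ar1_cov`, `ma1_cov` and `gls` are hypothetical, and the matrix inverse is a plain Gauss-Jordan elimination that is only meant for small, well-conditioned matrices.

```python
def ar1_cov(n, rho, var_e):
    """AR(1) errors z_t = rho*z_{t-1} + e_t: V[i][j] = sigma^2 * rho^|i-j|,
    where sigma^2 = var(z) = var_e / (1 - rho^2)."""
    sigma2 = var_e / (1.0 - rho ** 2)
    return [[sigma2 * rho ** abs(i - j) for j in range(n)] for i in range(n)]


def ma1_cov(n, theta, var_e):
    """MA(1) errors z_t = e_t - theta*e_{t-1}: var(z) = (1 + theta^2) var_e,
    lag-1 covariance = -theta * var_e, zero beyond lag 1."""
    sigma2 = (1.0 + theta ** 2) * var_e
    lag1 = -theta * var_e
    return [[sigma2 if i == j else lag1 if abs(i - j) == 1 else 0.0
             for j in range(n)] for i in range(n)]


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(a):
    """Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < 1e-12:
            raise ValueError("singular matrix")
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for r in range(n):
            if r != c and m[r][c] != 0.0:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


def gls(X, Y, V):
    """Generalized least squares: beta_hat = (X'V^-1 X)^-1 (X'V^-1 Y),
    cov(beta_hat) = (X'V^-1 X)^-1, standard errors = sqrt of its diagonal."""
    xt_vinv = matmul(transpose(X), inverse(V))
    cov = inverse(matmul(xt_vinv, X))
    beta = [r[0] for r in matmul(cov, matmul(xt_vinv, [[y] for y in Y]))]
    se = [cov[i][i] ** 0.5 for i in range(len(cov))]
    return beta, cov, se
```

For example, `ar1_cov(4, 0.5, 6.0)` reproduces the 8, 4, 2, 1 matrix above (σ² = 6/0.75 = 8), and `ma1_cov(4, -0.5, 8.0)` reproduces the 10, 4, 0 matrix (σ² = 1.25 · 8 = 10, lag 1 covariance 0.5 · 8 = 4). Passing `V = 10 I` to `gls` returns `10 (X'X)^-1` as its covariance, which is the white-noise special case.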
These are the formulas that all least squares regression programs use, but in time series we often see evidence of autocorrelation. Mistakenly using the least squares formulas, which assume independent errors e, when in reality you have autocorrelated errors z would give
(1) less than the best estimates (but still unbiased and still pretty good),
(2) incorrect standard errors, and thus
(3) incorrect t tests and
(4) wrong p-values.

If you ignore autocorrelation and just run least squares (e.g. PROC REG), the computer will assume no correlation and use the diagonal elements of $(X'X)^{-1}\sigma^2$ (replacing $\sigma^2$ with the error mean square) to get the standard errors, etc. [The least squares estimates actually have variance matrix

$$
(X'X)^{-1}(X'VX)(X'X)^{-1},
$$

which is quite different from what the least squares program gives unless V is an identity matrix times $\sigma^2$, in which case

$$
(X'X)^{-1}(X'I\sigma^2X)(X'X)^{-1} = (X'X)^{-1}(X'IX)(X'X)^{-1}\sigma^2 = (X'X)^{-1}\sigma^2.
$$

The formula $(X'X)^{-1}(X'VX)(X'X)^{-1}$ is also different from the variance matrix $(X'V^{-1}X)^{-1}$ that you would get if you had used the correct estimate $\hat\beta = (X'V^{-1}X)^{-1}(X'V^{-1}Y)$.]

How do we solve this problem? PROC AUTOREG does it in steps. AUTOREG is limited to autoregressive error terms and does no differencing within the procedure. The process is:
(1) Fit the ordinary least squares estimates $\hat\beta = (X'X)^{-1}(X'Y)$, which are good but not the best, and output the residuals. Do not trust any of the tests at this point.
(2) Fit a possibly long autoregressive model to those residuals, optionally eliminating insignificant lags.
(3) Compute the V matrix associated with the step 2 model.
(4) Compute the generalized least squares estimates and standard errors using that estimated V matrix. Now you can trust the test statistics and p-values.

Syntax (further statements and options may be added as needed):

    PROC AUTOREG;
       MODEL Y = X1 X2 X3 X4 / NLAG=20 BACKSTEP;
    RUN;

Here NLAG=20 requests the step 2 autoregressive model with lags up to 20, and BACKSTEP eliminates the insignificant lags. PROC AUTOREG will automatically carry out the four-step method described above.
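The four steps can be sketched in pure Python, simplified to AR(1) errors. This is not SAS's algorithm: PROC AUTOREG fits longer autoregressive models (NLAG=), can drop lags (BACKSTEP), and uses its own estimation method, whereas here step 2 is a plain least squares AR(1) fit to the residuals. The function names and the simulated data (Y = 2 + 3t + z, ρ = 0.7, the seed) are hypothetical. The sketch also compares the naive covariance σ²(X'X)⁻¹ that a least squares program reports with the true OLS covariance (X'X)⁻¹(X'VX)(X'X)⁻¹ from the bracketed remark.

```python
import random


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(a):
    """Gauss-Jordan elimination with partial pivoting (small matrices only)."""
    n = len(a)
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < 1e-12:
            raise ValueError("singular matrix")
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for r in range(n):
            if r != c and m[r][c] != 0.0:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


def ols_cov_naive_vs_true(X, V, sigma2):
    """Naive sigma^2 (X'X)^-1 versus true (X'X)^-1 (X'VX) (X'X)^-1."""
    xt = transpose(X)
    xtx_inv = inverse(matmul(xt, X))
    naive = [[sigma2 * v for v in row] for row in xtx_inv]
    true = matmul(matmul(xtx_inv, matmul(matmul(xt, V), X)), xtx_inv)
    return naive, true


def four_step_ar1(X, Y):
    n = len(Y)
    xt = transpose(X)
    y = [[v] for v in Y]
    # Step 1: ordinary least squares; keep the residuals, ignore its tests.
    b_ols = [r[0] for r in matmul(inverse(matmul(xt, X)), matmul(xt, y))]
    res = [yi - sum(x * b for x, b in zip(row, b_ols)) for row, yi in zip(X, Y)]
    # Step 2: least squares AR(1) fit to the residuals, r_t = rho*r_{t-1} + e_t.
    rho = (sum(res[t] * res[t - 1] for t in range(1, n))
           / sum(r * r for r in res[:-1]))
    if abs(rho) >= 1:
        raise ValueError("estimated rho is not stationary")
    var_e = sum((res[t] - rho * res[t - 1]) ** 2 for t in range(1, n)) / (n - 1)
    # Step 3: V implied by the fitted AR(1), V[i][j] = sigma^2 * rho^|i-j|.
    sigma2 = var_e / (1 - rho ** 2)
    V = [[sigma2 * rho ** abs(i - j) for j in range(n)] for i in range(n)]
    # Step 4: GLS with the estimated V; covariance (X'V^-1 X)^-1.
    xt_vinv = matmul(xt, inverse(V))
    cov = inverse(matmul(xt_vinv, X))
    b_gls = [r[0] for r in matmul(cov, matmul(xt_vinv, y))]
    se = [cov[i][i] ** 0.5 for i in range(len(cov))]
    return b_gls, se, rho


def simulate_ar1_regression(n, rho, seed):
    """Hypothetical data: Y_t = 2 + 3t + z_t with AR(1) errors z_t."""
    rng = random.Random(seed)
    z, prev = [], 0.0
    for _ in range(n):
        prev = rho * prev + rng.gauss(0.0, 1.0)
        z.append(prev)
    X = [[1.0, float(t)] for t in range(1, n + 1)]
    Y = [2.0 + 3.0 * row[1] + zt for row, zt in zip(X, z)]
    return X, Y
```

For example, `four_step_ar1(*simulate_ar1_regression(60, 0.7, seed=1))` returns GLS estimates with a slope near 3, their standard errors, and the estimated ρ. With `V = σ² I`, `ols_cov_naive_vs_true` returns two equal matrices. With an autocorrelated V they differ, which is the source of the incorrect standard errors listed above.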
When reading PROC AUTOREG output, just be careful not to trust the p-values in the step 1 (ordinary least squares) output.

PROC ARIMA approach: You can also do this in PROC ARIMA and use moving average error components as well as differencing, neither of which is available in PROC AUTOREG. Further, PROC ARIMA has an outlier detector. A nice example of this is energy usage at NC State University (personal communication from the facilities operations division). Indicator variables distinguish 3 kinds of days: class days, work days that are not class days, and non-work days (weekends and holidays). Temperature variables and a sine and cosine that complete one cycle through the year are also used as predictors. Day of the week adds about .0021 to R² and is ignored here.
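To make the predictor set concrete, here is a minimal sketch of building one row of X for an energy model of this kind. The day-type labels, the single temperature column (the actual model used several temperature variables), the 365.25-day period and the function name `design_row` are all illustrative assumptions, not the NC State specification. Non-work days serve as the baseline category, so only two indicators appear alongside the intercept; three indicators plus an intercept would be perfectly collinear.

```python
import math
from datetime import date

# Hypothetical labels for the three kinds of days; "nonwork" is the baseline.
DAY_TYPES = ("class", "work", "nonwork")


def design_row(day, day_type, temperature, period=365.25):
    """One row of X: intercept, two day-type indicators, temperature,
    and a sine/cosine pair that completes one cycle per year."""
    if day_type not in DAY_TYPES:
        raise ValueError(f"unknown day type: {day_type!r}")
    t = day.toordinal()  # day count from a fixed origin
    angle = 2 * math.pi * t / period
    return [
        1.0,                                   # intercept
        1.0 if day_type == "class" else 0.0,   # class day indicator
        1.0 if day_type == "work" else 0.0,    # non-class work day indicator
        temperature,
        math.sin(angle),
        math.cos(angle),
    ]


if __name__ == "__main__":
    print(design_row(date(2020, 1, 1), "nonwork", 45.0))
```

Because the sine and cosine enter as a pair, the regression can fit an annual cycle that peaks at any time of year, so the choice of day origin (`toordinal()` here) does not matter.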