
Regression with time series errors
Regression in matrix form: Y = Xβ + e. Here e is usually assumed to be white noise, but in time series
there is often autocorrelation.
Example:
One row of X and Y per month,
Y = column of monthly sales,
X columns: (1) Column of all 1s, (2) column of advertising expenditures A, (3) column of competitor’s
sales C
β is a column vector:
β′ = (intercept, coefficient of advertising, coefficient of competitor sales) = (β0, β1, β2)
| Y1 |   | 1  A1  C1 |          | z1 |
| Y2 | = | 1  A2  C2 | | β0 |   | z2 |
| :  |   | :   :   : | | β1 | + | :  |
| Yn |   | 1  An  Cn | | β2 |   | zn |
V = variance matrix of the error vector (variances on the diagonal, covariance between zi and zj in positions (i,j)
and (j,i)). Here are some examples for 4 errors:
Example: AR(1)
zt = ρzt−1 + et where et is white noise.

        | 1   ρ   ρ²  ρ³ |
V = σ²  | ρ   1   ρ   ρ² |     where σ² = variance of z = (variance of e)/(1 − ρ²).
        | ρ²  ρ   1   ρ  |
        | ρ³  ρ²  ρ   1  |

Example:
     | 8  4  2  1 |
V =  | 4  8  4  2 |     so σ² = 8, ρ = 0.5.
     | 2  4  8  4 |
     | 1  2  4  8 |

Example: MA(1)
zt = et − θet−1.

        | 1  ρ  0  0 |
V = σ²  | ρ  1  ρ  0 |     where σ² = variance of z = (1 + θ²)(variance of e) and
        | 0  ρ  1  ρ |     ρ = −θ/(1 + θ²) is the lag 1 autocorrelation.
        | 0  0  ρ  1 |

Example:
     | 10   4   0   0 |
V =  |  4  10   4   0 |     implies σ² = 10, ρ = 4/10 = .5/1.25, so θ = −0.5.
     |  0   4  10   4 |
     |  0   0   4  10 |

Example: White noise

        | 1  0  0  0 |
V = σ²  | 0  1  0  0 |     where σ² = variance of z (the error variance).
        | 0  0  1  0 |
        | 0  0  0  1 |

For
     | 10   0   0   0 |
V =  |  0  10   0   0 |     we have σ² = 10.
     |  0   0  10   0 |
     |  0   0   0  10 |
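These covariance patterns are easy to check numerically. A small sketch in Python (NumPy is assumed; the numbers are the ones from the examples above):

```python
import numpy as np

n = 4

# AR(1): V[i,j] = sigma2 * rho**|i-j|, with sigma2 = (variance of e)/(1 - rho**2)
sigma2, rho = 8.0, 0.5
V_ar1 = sigma2 * rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
print(V_ar1)  # first row is [8, 4, 2, 1], matching the numeric example

# MA(1): sigma2 = (1 + theta**2)*(variance of e), rho = -theta/(1 + theta**2)
theta, var_e = -0.5, 8.0          # var_e chosen so sigma2 comes out to 10
sigma2_ma = (1 + theta ** 2) * var_e
rho_ma = -theta / (1 + theta ** 2)
V_ma1 = sigma2_ma * (np.eye(n) + rho_ma * (np.eye(n, k=1) + np.eye(n, k=-1)))
print(V_ma1)  # first row is [10, 4, 0, 0], matching the numeric example
```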
The best estimate of β is always β̂ = (X′V⁻¹X)⁻¹(X′V⁻¹Y), which is called the vector of "generalized
least squares estimates," and the variance-covariance matrix of these estimates is always (X′V⁻¹X)⁻¹.
The standard errors of the estimates (denominators for t) are the square roots of the diagonals of this
(X′V⁻¹X)⁻¹ matrix (with V estimated). In the special case of white noise, V is just σ² times an
identity matrix and the formulas reduce to the "least squares estimates" β̂ = (X′X)⁻¹(X′Y), and the
square roots of the diagonal elements of (X′X)⁻¹σ² give the standard errors, where σ² is estimated by
the error mean square. These are the formulas that least squares regression programs all use, but in
time series we often see evidence of autocorrelation. Mistakenly using the least
squares formulas that assume independent errors e when in reality you have autocorrelated errors z
would give
(1) Less than the best estimates (but still unbiased and still pretty good)
(2) Incorrect standard errors and thus
(3) Incorrect t tests and
(4) Wrong p-values
If you ignore autocorrelation and just run least squares (e.g. PROC REG) then the computer will assume
no correlation and use the diagonal elements of (X′X)⁻¹σ² (replacing σ² with the error mean square)
to get the standard errors etc.
[The least squares estimates actually have variance matrix (X′X)⁻¹(X′VX)(X′X)⁻¹, which is quite
different from what the least squares program gives unless V is an identity matrix times σ², in which case
(X′X)⁻¹(X′σ²IX)(X′X)⁻¹ = (X′X)⁻¹(X′IX)(X′X)⁻¹σ² = (X′X)⁻¹σ². The formula
(X′X)⁻¹(X′VX)(X′X)⁻¹ is also different from what you would get if you had used the correct
estimate β̂ = (X′V⁻¹X)⁻¹(X′V⁻¹Y).]
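The contrast between the formulas can be seen numerically. The sketch below (Python with NumPy; the simulated design, coefficients, and AR(1) error model are purely illustrative assumptions) computes the GLS estimate, the OLS estimate, and both variance matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Made-up design: intercept, "advertising", "competitor sales"
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([5.0, 2.0, -1.0])

# AR(1) errors z with rho = 0.7, so V[i,j] = sigma2 * rho**|i-j|
rho = 0.7
e = rng.normal(size=n)
z = np.zeros(n)
for t in range(1, n):
    z[t] = rho * z[t - 1] + e[t]
Y = X @ beta_true + z

sigma2 = 1.0 / (1.0 - rho ** 2)  # variance of z
V = sigma2 * rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
Vinv = np.linalg.inv(V)

# Generalized least squares: (X'V^-1 X)^-1 (X'V^-1 Y), variance (X'V^-1 X)^-1
beta_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ Y)
var_gls = np.linalg.inv(X.T @ Vinv @ X)

# Ordinary least squares: unbiased, but not best under autocorrelation
beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)

# True variance matrix of the OLS estimate: (X'X)^-1 (X'VX) (X'X)^-1
XtX_inv = np.linalg.inv(X.T @ X)
var_ols_true = XtX_inv @ (X.T @ V @ X) @ XtX_inv
```

The diagonal of var_gls is never larger than the diagonal of var_ols_true, which is the sense in which the generalized least squares estimates are "best."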
How do we solve this problem? PROC AUTOREG does this in steps. AUTOREG is limited to
autoregressive error terms and no differencing within the procedure. The process is
(1) Fit the ordinary least squares estimates β̂ = (X′X)⁻¹(X′Y), which are good but not the best, and
output the residuals. Do not trust any of the tests at this point.
(2) Fit a possibly long autoregressive model to those residuals, optionally eliminating insignificant lags.
(3) Compute the V matrix associated with the step 2 model.
(4) Compute the generalized least squares estimates and standard errors using that estimated V matrix.
Now you can trust the test statistics and p-values.
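The four steps can be sketched by hand in Python (NumPy; the simulated data, the AR(1)-only error model, and the simple lag-1 estimate of ρ are illustrative assumptions, not AUTOREG's exact computations):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one predictor
beta_true = np.array([3.0, 1.5])

# Simulate AR(1) errors with rho = 0.6
rho_true = 0.6
e = rng.normal(size=n)
z = np.zeros(n)
for t in range(1, n):
    z[t] = rho_true * z[t - 1] + e[t]
Y = X @ beta_true + z

# Step 1: ordinary least squares; keep the residuals, don't trust the tests
beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)
resid = Y - X @ beta_ols

# Step 2: fit an autoregressive model to the residuals (just AR(1) here;
# AUTOREG starts from a longer lag list and BACKSTEP drops weak lags)
rho_hat = np.sum(resid[1:] * resid[:-1]) / np.sum(resid[:-1] ** 2)

# Step 3: build the V matrix implied by the step-2 model
sigma2_hat = np.var(resid)
V = sigma2_hat * rho_hat ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))

# Step 4: generalized least squares with the estimated V; these standard
# errors (and the resulting t tests and p-values) can be trusted
Vinv = np.linalg.inv(V)
beta_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ Y)
se_gls = np.sqrt(np.diag(np.linalg.inv(X.T @ Vinv @ X)))
```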
Syntax
PROC AUTOREG;
MODEL Y = X1 X2 X3 X4/NLAG=20 BACKSTEP;
(etc.)
PROC AUTOREG will automatically do the four-step method described above. Just be careful not to trust
the p-values when looking at the step 1 output.
PROC ARIMA approach:
You can also do this in PROC ARIMA and use moving average error components as well as differencing, neither of
which is available in PROC AUTOREG. Further, PROC ARIMA has an outlier detector.
A nice example of this is energy usage at NC State University (personal communication from the facilities
operations division). Indicator variables distinguish 3 kinds of days: class days, work days that are not
class days, and non-work days (weekends & holidays). Temperature variables and a sine and cosine that
give one cycle through the year are also used as predictors. Day of the week adds about .0021 to R² and
is ignored here.