Gujarati Chapter 6-- Statistical Inference and Hypothesis Testing

UNC-Wilmington
Department of Economics and Finance
ECN 377
Dr. Chris Dumas
Regression Analysis--Autocorrelation and WLS
Recall that one of the assumptions of the OLS method is that the errors for the individuals in the
population (and therefore, in the sample) are independent of one another. That is, the size of one
individual’s error does not affect the size of another individual’s error. The Autocorrelation Problem
occurs when this assumption is violated and the errors are somehow dependent on one another; that is, the
errors affect one another in some way.
Autocorrelation occurs more often in time-series data than in cross-section data, because often a large
error in one time period will have "lingering effects" on later time periods, causing the errors in the two
time periods to be related to one another (instead of being independent of one another).
There are many forms of autocorrelation, but we will focus in this handout on one of the most commonly encountered types: “First Order Serial Autocorrelation.” First Order Serial
Autocorrelation typically occurs in Time Series datasets. In Time Series datasets, each
observation/individual/row of data corresponds to a particular point in time. For example, each row of
data may refer to a particular month, or quarter, or year. In First Order Serial Autocorrelation, the error in
one time period (row of data) lingers to affect the error in the next time period (row of data). The term
“First Order” means that the lingering effect lasts for just one time period (In Second Order Serial
Autocorrelation, the lingering effect can last for two time periods, etc.). The term “Serial” describes how
the lingering effect occurs over and over again, one error affecting the next, for all time periods in the
data set. The term “Autocorrelation” refers to the fact that the errors are affecting themselves (the “Auto”
or “self-affecting” part), and this causes the errors to be correlated with one another.
Mathematically, we can represent First Order Serial Autocorrelation as follows:
$$e_t = \rho \, e_{t-1} + v_t$$
where vt is a random error term.
The equation above says that the error in time period t, et, is equal to a random error, vt, plus a fraction, ρ,
of the error from the preceding time period et-1. The key thing is that a fraction, ρ, of the error from the
preceding time period gets incorporated into the error of the current time period—THIS IS WHAT
MAKES THE TWO ERRORS CORRELATED AND WHAT CAUSES THE AUTOCORRELATION
PROBLEM. (Recall that one of the assumptions of OLS is that the errors are NOT correlated in this
way.)
In the equation above, rho (“ρ”) is the COEFFICIENT OF AUTOCORRELATION. The value of rho
lies between -1 and 1, that is: -1 < ρ < 1.
• Positive Autocorrelation: If rho is positive, a large (positive) error in one period increases the
error in the next period.
• Negative Autocorrelation: If rho is negative, a large (positive) error in one period decreases the
error in the next period.
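To see how rho links one error to the next, the equation above can be simulated. The following is a minimal Python sketch, not part of the course's SAS code. The function names, the seed, the sample size of 2000, the choice of rho = 0.8 and rho = -0.8, and the starting error of zero are all assumptions made for illustration.

```python
import random


def simulate_ar1_errors(rho, n, seed=0):
    """Generate n errors that follow e_t = rho * e_{t-1} + v_t."""
    rng = random.Random(seed)
    errors = []
    previous = 0.0  # assumption: the error before the first period is zero
    for _ in range(n):
        v = rng.gauss(0.0, 1.0)       # the purely random part, v_t
        current = rho * previous + v  # a fraction rho of the last error lingers
        errors.append(current)
        previous = current
    return errors


def lag1_correlation(errors):
    """Sample correlation between each error and the error one period earlier."""
    earlier = errors[:-1]
    later = errors[1:]
    mean_earlier = sum(earlier) / len(earlier)
    mean_later = sum(later) / len(later)
    cov = sum((a - mean_earlier) * (b - mean_later) for a, b in zip(earlier, later))
    var_earlier = sum((a - mean_earlier) ** 2 for a in earlier)
    var_later = sum((b - mean_later) ** 2 for b in later)
    return cov / (var_earlier * var_later) ** 0.5


positive = simulate_ar1_errors(0.8, 2000, seed=1)   # hypothetical rho = 0.8
negative = simulate_ar1_errors(-0.8, 2000, seed=1)  # hypothetical rho = -0.8
print(round(lag1_correlation(positive), 2))
print(round(lag1_correlation(negative), 2))
```

With a positive rho the printed correlation should be positive (neighboring errors tend to share a sign), and with a negative rho it should be negative (neighboring errors tend to alternate in sign).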
Problems Caused by Autocorrelation
1. Although the estimates of the coefficients (𝛽̂’s) are still unbiased, the estimates of the s.e.’s of
the 𝛽̂’s are biased downward; as a result, we may incorrectly conclude that X variables affect
Y when in fact they do not.
2. The S.E.R. is biased downward; as a result, we may conclude that the model fits the data
better than it actually does.
Detecting Autocorrelation
We can detect Autocorrelation using:
1. A Residual Plot of the regression residuals, the “ehats,” against time. If Autocorrelation is
present, the ehats will show a systematic pattern over time instead of scattering randomly around zero. Figure 1 shows an
example of no Autocorrelation. Figure 2 shows an example of Autocorrelation.
Figure 1. No Autocorrelation (ehats plotted against time). Variation of ehats remains the same over time.
Figure 2. Autocorrelation Present (ehats plotted against time). Variation of ehats changes (in this example, cycles up and down) over time.
2. The Durbin-Watson (DW) “d” statistic test can also be used to detect Autocorrelation,
especially when it is difficult to determine whether Autocorrelation is present from looking at
the residual (ehat, 𝑒̂ ) plots against time. The Durbin-Watson “d” statistic tests the following
null hypothesis:
H0: autocorrelation is not present
H1: autocorrelation is present
The Durbin-Watson “d” statistic is calculated from the regression residuals, the “ehats,” ( 𝑒̂ 's)
according to the following formula:
$$d_{test} = \frac{\sum_{t=2}^{n} (\hat{e}_t - \hat{e}_{t-1})^2}{\sum_{t=1}^{n} \hat{e}_t^{\,2}},$$
where “t” denotes the observation (the time period) and there are n total observations. IT IS
ASSUMED THAT THE DATA ARE A TIME SERIES STARTING AT t = 1 AND ENDING
AT t = n. Notice that in the numerator the difference between each residual at time period "t"
and the residual one time period before it (at time period "t -1") is squared. Because the
residual at time t = 1 has no residual “before” it in time, it is not included in the summation in
the numerator. In the denominator, each residual is simply squared before being summed,
including the residual at t = 1.
The value of dtest lies in the range 0 < dtest < 4:
dtest = 0 ==> perfect POSITIVE autocorrelation
dtest = 2 ==> no autocorrelation
dtest = 4 ==> perfect NEGATIVE autocorrelation
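The d statistic formula can be sketched in a few lines of Python. This is an illustration only, not the course's SAS code. The function name and the two small residual lists are made up for the example.

```python
def durbin_watson(residuals):
    """Durbin-Watson d: sum of squared changes in the residuals (t = 2..n)
    divided by the sum of squared residuals (t = 1..n)."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    # Numerator starts at the second residual, because the first one
    # has no residual before it in time.
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    # Denominator includes every residual, the first one too.
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


slow_drift = [2.0, 1.0, -1.0, -2.0]    # hypothetical ehats that change sign slowly
alternating = [1.0, -1.0, 1.0, -1.0]   # hypothetical ehats that flip sign every period
print(durbin_watson(slow_drift))
print(durbin_watson(alternating))
```

For the slowly drifting residuals the numerator is 1 + 4 + 1 = 6 and the denominator is 4 + 1 + 1 + 4 = 10, so d = 0.6 (toward the positive autocorrelation end). For the alternating residuals the numerator is 4 + 4 + 4 = 12 and the denominator is 4, so d = 3 (toward the negative autocorrelation end).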
The dtest statistic calculated using the formula above is compared to “dcritical” values from the
Durbin-Watson d-statistic table (handed out in lecture). The Durbin-Watson test actually uses
two “dcritical” values, a “dcritical-upper” and a “dcritical-lower.” Both critical values depend on
sample size, n, and the number of X variables in your model, (k-1). The two dcrit values from
the table are used to calculate two more critical values: (4 - dcrit-lower) and (4 – dcrit-upper).
So, a total of four critical values are used in the DW test. See the example below.
Example of the Durbin-Watson Test: Suppose that in a study of time series data we have a
sample size n = 40, the number of X variables in the model is (k-1) = 4, and you choose an α =
5% significance level for the Durbin-Watson test. The Durbin-Watson d-table shows that “dcrit-lower” = 1.285 and “dcrit-upper” = 1.721. The dtest value is calculated using the formula above
and is compared to dcrit-upper, dcrit-lower, (4 - dcrit-upper), and (4 - dcrit-lower) on the d-axis scale
below to determine whether to accept or reject H0. The d-axis scale is divided into several regions.
Several outcomes are possible, depending on the region into which dtest falls.
d axis
4
negative autocorrelation
2.715 = (4 - dcrit-lower)
test inconclusive
2.279 = (4 - dcrit-upper)
2
no autocorrelation
1.721 = dcrit-upper
test inconclusive
1.285 = dcrit-lower
positive autocorrelation
0
For example, if dtest = 0.83, then we would conclude that we have positive autocorrelation.
If dtest = 3.24, then we would conclude that we have negative autocorrelation.
If dtest = 1.88, then we would conclude that we have no autocorrelation.
If dtest = 2.53, then the test is inconclusive, and we remain unsure about whether or not there is
autocorrelation.
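The decision regions on the d axis can be written out as a short Python sketch. The function name is made up for illustration. How to treat a dtest value that lands exactly on a critical value is an assumption here (it is counted as inconclusive), since the handout does not say.

```python
def durbin_watson_decision(d_test, d_lower, d_upper):
    """Place d_test into one of the regions on the d axis."""
    if not 0 <= d_test <= 4:
        raise ValueError("d_test must lie between 0 and 4")
    if d_test < d_lower:
        return "positive autocorrelation"
    if d_test <= d_upper:          # assumption: boundary values count as inconclusive
        return "test inconclusive"
    if d_test < 4 - d_upper:
        return "no autocorrelation"
    if d_test <= 4 - d_lower:      # assumption: boundary values count as inconclusive
        return "test inconclusive"
    return "negative autocorrelation"


# Critical values from the example: n = 40, (k-1) = 4, alpha = 5%.
D_LOWER, D_UPPER = 1.285, 1.721
for d in (0.83, 3.24, 1.88, 2.53):
    print(d, "->", durbin_watson_decision(d, D_LOWER, D_UPPER))
```

Run on the four dtest values from the example, the sketch gives the same four conclusions as the text: positive, negative, none, and inconclusive.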
The following additional assumptions must be met for the d statistic to be valid:
1. the regression model includes an intercept
2. the regression does not use lagged values of Y as an X variable
Correcting Autocorrelation Using Weighted Least Squares (WLS) Regression:
Okay, assuming that the autocorrelation is FIRST-ORDER SERIAL AUTOCORRELATION, we
can estimate the Coefficient of Autocorrelation (rho) from the Durbin-Watson dtest statistic using
the following formula:
$$\hat{\rho} = 1 - \frac{d_{test}}{2},$$
where ρ̂, “rho hat,” is our estimate of rho, based on our data. (That is, we calculate ρ̂ based on
dtest, but recall that we calculated dtest based on our ehats, and the ehats are based on our X and Y
data.)
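The rho hat formula is simple enough to check by hand, but a small Python sketch shows how it lines up with the three benchmark values of dtest (0, 2, and 4). The function name is made up for illustration, and the value 0.83 is reused from the Durbin-Watson example only as sample input.

```python
def rho_hat_from_durbin_watson(d_test):
    """Estimate rho from the Durbin-Watson statistic: rho_hat = 1 - d_test / 2."""
    return 1.0 - d_test / 2.0


for d in (0.0, 2.0, 4.0, 0.83):
    print(d, "->", round(rho_hat_from_durbin_watson(d), 3))
```

A dtest of 0 maps to a rho hat of 1, a dtest of 2 maps to 0, and a dtest of 4 maps to -1, which matches the interpretation of the d axis. A dtest of 0.83 gives a rho hat of 1 - 0.415 = 0.585.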
We use ρ̂ to weight (adjust) our X and Y data and the intercept of the model as follows:
For time period t = 2 and all later time periods:
$$Y_t^* = Y_t - \hat{\rho} \, Y_{t-1}$$
$$X_t^* = X_t - \hat{\rho} \, X_{t-1}$$
$$\beta_0^* = (1 - \hat{\rho}) \, \beta_0$$
We need to use special formulas for the first time period, because there is no time period t – 1
before the first time period in our data set.
For time period t = 1 only:
$$Y_1^* = \sqrt{1 - \hat{\rho}^2} \; Y_1$$
$$X_1^* = \sqrt{1 - \hat{\rho}^2} \; X_1$$
Finally, run the regression:
$$Y_t^* = \beta_0^* + \beta_x X_t^* + e_t$$
This is another example of Weighted Least Squares (WLS) regression. In this example, we are
using ρ̂, “rho hat,” to weight the data in such a way as to remove the effects of autocorrelation.
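The weighting step can be sketched in Python as follows. This is an illustration under stated assumptions, not what PROC AUTOREG does internally. The function name, the three-period data values, and the rho hat of 0.5 are all made up for the example.

```python
import math


def weight_series(values, rho_hat):
    """Weight a time series with rho_hat.

    First period:   value* = sqrt(1 - rho_hat^2) * value
    Later periods:  value*_t = value_t - rho_hat * value_{t-1}
    """
    if not values:
        return []
    first = math.sqrt(1.0 - rho_hat ** 2) * values[0]
    rest = [values[t] - rho_hat * values[t - 1] for t in range(1, len(values))]
    return [first] + rest


y = [10.0, 12.0, 15.0]   # hypothetical Y data for t = 1, 2, 3
x = [1.0, 2.0, 3.0]      # hypothetical X data for t = 1, 2, 3
rho_hat = 0.5            # hypothetical estimate of rho

y_star = weight_series(y, rho_hat)
x_star = weight_series(x, rho_hat)
print(y_star)
print(x_star)
```

For t = 2 and t = 3 the weighted Y values are 12 - 0.5(10) = 7 and 15 - 0.5(12) = 9. The weighted series would then be passed to an ordinary regression routine, and because the weighted intercept equals (1 - rho hat) times the original intercept, the original intercept could be recovered by dividing the estimated intercept by (1 - rho hat).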
Autocorrelation in SAS:
Suppose we have a time series data set with three variables: TIME (year), RWAGES (real wages) and
PRODUCT (national product as measured by GDP). We want to run a regression to determine the
relationship between real wages and national product. However, because we are working with a time
series data set, we want to test for autocorrelation and correct for it, if it is present. If autocorrelation is
present, we can use PROC AUTOREG in SAS to conduct a WLS regression to correct for the
autocorrelation.
/* SOFTWARE: SAS Statistical Software program, version 9.2
AUTHOR: Dr. Chris Dumas, UNC-Wilmington, Spring, 2013.
TITLE: Program to perform weighted least squares (WLS) regression
to correct for autocorrelation. */
proc import datafile="v:\ECN377\timeseriesdata.xls" dbms=xls out=dataset01
replace;
run;
/* proc reg below conducts a regression of variable RWAGES (real wages)
against variable PRODUCT (national product as measured by GDP). The "dw"
option on the model command requests that SAS calculate the durbin-watson
statistic. */
proc reg data=dataset01;
model rwages = product / dw;
output out=dataset02 r=ehat;
run;
/* The proc plot below graphs the residuals (ehat's) against time to check
for autocorrelation. The pattern in the residuals indicates that
autocorrelation DOES appear to be present. Also, the durbin-watson dtest
statistic calculated above is 0.214, which indicates positive
autocorrelation. */
proc plot data=dataset02;
plot ehat*time;
run;
/* The PROC AUTOREG command below corrects for autocorrelation. First, PROC
AUTOREG gives the results for an uncorrected OLS regression (this repeats
what we did above). Next, PROC AUTOREG estimates rho based on the residuals
from the OLS regression. Under "Estimates of Autoregressive Parameters" in
the output window, SAS shows rho to be -0.814743, BUT SAS ALWAYS GIVES THE
NEGATIVE OF THE TRUE RHO, SO THE ACTUAL ESTIMATE OF RHO IS +0.814743. Next,
SAS uses rho to weight the Y and X variables. The "nlag=1" option in the
model command below tells SAS that we are correcting for FIRST ORDER serial
autocorrelation--the error in period t depends on the error one time period
earlier. (For other types of autocorrelation, nlag is greater than 1, but we
won't get into that here.) Finally, the "output" command saves the ehats from
the autocorrelation-corrected regression and names them "ehatnew". We want
to save the ehatnew's so that we can plot them against time and check whether
the autocorrelation correction worked. */
proc autoreg data=dataset01;
model rwages = product / nlag=1;
output out=dataset03 residual=ehatnew;
run;
/* The proc plot command below graphs the ehatnew’s against time to
see whether a pattern still exists after we adjusted for autocorrelation.
Very little pattern remains, so it looks as if the autocorrelation correction
worked pretty well. */
proc plot data=dataset03;
plot ehatnew*time;
run;