Econ107 Applied Econometrics
Topic 8: Autocorrelation (Serial Correlation)
(Studenmund, Chapter 9)
Definitions and Problems
3rd in our list of common regression problems. Define this problem by starting
with the ‘absence’ of serial correlation. Common problem with time series
analysis. Cross sectional regression may involve spatial correlations.
No serial correlation exists if the disturbances are uncorrelated for any two
observations in the sample.
$$\mathrm{Cov}(\varepsilon_t, \varepsilon_s) = 0 \quad \text{for any } t \neq s$$
We can show this situation in this diagram. If no serial correlation exists, then no ‘pattern’ exists in either the disturbances or the residuals. These ‘draws’ from the distribution of $\varepsilon$ appear to be independent of one another. For example, a positive residual in one observation doesn’t affect the probability that the next residual will be positive (or negative): they are independent, random draws from the same distribution.
Serial correlation exists if the disturbances for some pair of observations are correlated:
$$\mathrm{Cov}(\varepsilon_t, \varepsilon_s) \neq 0 \quad \text{for some } t \neq s$$
We can represent this serial correlation in a plot of the disturbances or residuals against time. In this case, we observe an upward linear trend. The residuals are not independent of one another. Positive and negative values each occur in adjacent periods. They ‘clump up’ at either end of the time series. We could reverse the order above, and get a negative linear trend.
Or we could get a ‘cyclical’ pattern in the disturbances or residuals, as in this diagram.
Serial correlation could be related to an unobserved variable that has ‘persistent
effects’ on the dependent variable over a number of periods (e.g., shocks that are
difficult to quantify in a macro model and may ‘spill over’ into future periods).
An example would be expectations for changes in government policies toward
superannuation that could affect household savings behaviour over a number of
quarters. Or business confidence that could influence capital investment. The
idea is that there is a lot of ‘inertia’ or ‘sluggishness’ in time series. There is a
certain momentum that is built into economic aggregates like GDP. It takes time
to ‘slow down’ or ‘speed up’. All of this results in serial correlation.
Serial Correlation or Specification Error?
True serial correlation exists when a regression model is appropriately specified,
and yet the disturbances are correlated.
False serial correlation exists when the disturbances are correlated in an
inappropriately specified model, and yet the disturbances are uncorrelated in the
‘correct’ regression model. In other words, serial correlation comes entirely from
the misspecification.
EXAMPLE: Suppose we estimate:
$$\ln L_t = \beta_0 + \beta_1 \ln W_t + \beta_2 \ln Q_t + v_t$$
where $\ln L_t$ = log of aggregate employment in year t.
$\ln W_t$ = log wage in year t.
$\ln Q_t$ = log output in year t.
But the 'true' model is:
$$\ln L_t = \beta_0 + \beta_1 \ln W_t + \beta_2 \ln Q_t + \beta_3 \ln r_t + \varepsilon_t$$
where $\ln r_t$ = log of the rental cost of capital in year t.
The disturbance term in the estimated regression is a ‘composite’ of the omitted
variable and the disturbance term from the true regression. This could be written
‘approximately’ as:
$$v_t = \beta_3 \ln r_t + \varepsilon_t$$
We have omitted variable bias. We forgot or were unable to include the variable
on the price of capital. If the omitted regressor is correlated over time, then the
observed residuals may indicate serial correlation. For example, suppose it goes
through a cyclical pattern, high during peaks and low during troughs in the
business cycle.
Obviously, the solution here is to include the omitted variable. Not really an issue
of serial correlation per se. It just indicates another ‘type’ of regression problem.
In practice, it’s difficult to distinguish between 'true' and 'false' serial correlation.
Types of serial correlation
We need to look at specific forms of serial correlation. In the most general case, any pair of disturbances may be correlated. The simplest form is where ‘adjacent’ disturbances are correlated, i.e., those for observations in successive time periods.
Specify a 2-variable regression:
$$Y_t = \beta_0 + \beta_1 X_t + \varepsilon_t$$
First-Order Serial Correlation - AR(1). This is the simplest form of serial correlation:
$$\varepsilon_t = \rho \varepsilon_{t-1} + u_t$$
where ρ measures the strength of the serial correlation, and $u_t$ is a stochastic disturbance term where:
$$E(u_t) = 0$$
$$\mathrm{Var}(u_t) = \sigma^2$$
$$\mathrm{Cov}(u_t, u_s) = 0 \quad \text{for all } t \neq s$$
We’ll refer to this as a classical (or ‘nicely behaved’) disturbance term. Zero mean,
constant variance (i.e., no heteroskedasticity) and no serial correlation.
This is known as an ‘autoregressive’ scheme because this specification of the
functional form is itself a regression model. ρ is a slope coefficient.
If ρ>0, then we have ‘positive’ serial correlation.
If ρ<0, then we have ‘negative’ serial correlation.
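The sign of ρ can be illustrated with a small simulation. This is only a sketch: the helper names `simulate_ar1` and `lag1_autocorr`, the starting value of zero, the normal distribution for $u_t$ and the chosen values of ρ are all assumptions made for illustration, not part of the notes.

```python
import random

def simulate_ar1(rho, n, sigma=1.0, seed=0):
    """Draw n disturbances from eps_t = rho * eps_{t-1} + u_t, u_t ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    eps = []
    prev = 0.0  # assumed starting value for the recursion
    for _ in range(n):
        prev = rho * prev + rng.gauss(0.0, sigma)
        eps.append(prev)
    return eps

def lag1_autocorr(e):
    """Sample analogue of rho: sum(e_t * e_{t-1}) / sum(e_t^2)."""
    num = sum(e[t] * e[t - 1] for t in range(1, len(e)))
    den = sum(v * v for v in e)
    return num / den

positive = simulate_ar1(rho=0.8, n=2000)    # 'positive' serial correlation
negative = simulate_ar1(rho=-0.8, n=2000)   # 'negative' serial correlation
print(round(lag1_autocorr(positive), 2), round(lag1_autocorr(negative), 2))
```

With ρ>0 neighbouring disturbances tend to share a sign, so the sample correlation between $\varepsilon_t$ and $\varepsilon_{t-1}$ is positive. With ρ<0 the signs tend to alternate and it is negative.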
Alternatives to AR(1).
The idea is that AR(1) may be too simplistic in some situations to capture the
relationship among the disturbances.
Second-Order Serial Correlation - AR(2)
$$\varepsilon_t = \rho_1 \varepsilon_{t-1} + \rho_2 \varepsilon_{t-2} + u_t$$
Note that there are 2 separate coefficients in the autoregressive scheme. Normally
we’d expect the magnitude of the linear relationship to dissipate with time (i.e.,
|ρ1| > |ρ2|). The correlation weakens as the gap between periods widens.
Nth-Order Serial Correlation - AR(n)
$$\varepsilon_t = \rho_1 \varepsilon_{t-1} + \rho_2 \varepsilon_{t-2} + \dots + \rho_n \varepsilon_{t-n} + u_t$$
Lagged-n relationship.
The GLS Procedure
Suppose you use OLS to estimate a regression model with known serial
correlation. What are the consequences?
(1) The estimated OLS coefficients are still unbiased. An absence of serial
correlation is not a necessary condition for unbiasedness.
(2) But these OLS estimators are not efficient. They’re no longer minimum
variance. An absence of serial correlation is a necessary condition for estimators
to be BLUE.
In addition, the variances and standard errors of the estimated coefficients computed by the usual OLS formulas (which rely on the classical assumptions) are biased. In general, we don’t know the direction of the bias. We might be overestimating or underestimating them. As a result, our t-ratios may be too small or too large. We might reject a null hypothesis that a particular slope coefficient is equal to zero when we shouldn’t, or might not reject it when we should. Thus, statistical inference is unreliable.
Since the consequences are severe, what do we do about it?
Begin by considering the alternative estimation procedure – Generalized Least
Squares (GLS).
Keep things simple by assuming a 2-variable model with AR(1).
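$$Y_t = \beta_0 + \beta_1 X_t + \varepsilon_t \quad \ldots (1)$$
$$\varepsilon_t = \rho \varepsilon_{t-1} + u_t$$
Note that the same functional form holds for period t-1. Write (1) for period t-1 and multiply both sides by the constant ρ:
$$\rho Y_{t-1} = \rho \beta_0 + \rho \beta_1 X_{t-1} + \rho \varepsilon_{t-1} \quad \ldots (2)$$
Now subtract (2) from (1).
$$Y_t - \rho Y_{t-1} = \beta_0 (1 - \rho) + \beta_1 (X_t - \rho X_{t-1}) + (\varepsilon_t - \rho \varepsilon_{t-1})$$
$$Y_t^* = \beta_0^* + \beta_1 X_t^* + u_t \quad \ldots (3)$$
where $Y_t^* = Y_t - \rho Y_{t-1}$, $X_t^* = X_t - \rho X_{t-1}$ and $\beta_0^* = \beta_0 (1 - \rho)$.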
Note that disturbance term is now ut, which is ‘nicely behaved’ (i.e., zero expected
value, constant variance and no serial correlation).
The reason is that the disturbance term above can be written:
$$\varepsilon_t - \rho \varepsilon_{t-1} = \rho \varepsilon_{t-1} + u_t - \rho \varepsilon_{t-1} = u_t$$
where we rely on the known structure of the serial correlation.
This is known as a Generalized Difference Equation. Run OLS on the transformed data (i.e., Equation 3). Coefficient estimates will be BLUE.
GLS estimation of (1) is defined to be OLS estimation of (3).
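The generalized difference transformation can be sketched in a few lines. This is a minimal illustration under stated assumptions: ρ is treated as known, the first observation is simply dropped, and the helper names (`ols`, `generalized_difference`, `gls_known_rho`) and the simulated data (β0 = 2, β1 = 0.5, ρ = 0.7) are made up for the example.

```python
import random

def ols(x, y):
    """Simple (2-variable) OLS: returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def generalized_difference(series, rho):
    """Return z_t - rho * z_{t-1} for t = 2..n (the first observation is dropped)."""
    return [series[t] - rho * series[t - 1] for t in range(1, len(series))]

def gls_known_rho(x, y, rho):
    """Run OLS on the generalized difference equation, then recover beta_0."""
    b0_star, b1 = ols(generalized_difference(x, rho), generalized_difference(y, rho))
    return b0_star / (1.0 - rho), b1  # the transformed intercept is beta_0 * (1 - rho)

# Hypothetical data: Y_t = 2 + 0.5 X_t + eps_t with AR(1) disturbances, rho = 0.7.
rng = random.Random(42)
beta0, beta1, rho = 2.0, 0.5, 0.7
x, y, eps = [], [], 0.0
for _ in range(500):
    eps = rho * eps + rng.gauss(0.0, 1.0)
    xi = rng.uniform(0.0, 10.0)
    x.append(xi)
    y.append(beta0 + beta1 * xi + eps)

b0_gls, b1_gls = gls_known_rho(x, y, rho)
print(round(b0_gls, 2), round(b1_gls, 2))
```

Note that the intercept estimated from the transformed data is an estimate of β0(1 - ρ), so it is divided by (1 - ρ) to get back to β0. The slope needs no adjustment.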
There are a variety of ways of diagnosing the presence of serial correlation. First, check the existing empirical literature, so that you can see it coming. The rest are ‘post-mortems’: run OLS and then check whether you shouldn’t have.
1. Graphical Methods.
A. Plot residuals across time. Look for a 'detectable pattern'.
B. Plot residuals and lagged values in 4-quadrant Diagram.
This is just an alternative to the time sequence plot. Measure the current residuals along the vertical axis, and the lagged residuals along the horizontal. If the data points end up in mostly the first and the third quadrants, then you’ve got evidence of ‘positive’ serial correlation. Positive residuals in one period are generally followed by positive residuals in the next period. Same for negative residuals.
The opposite case would be where the residuals are predominately in the second and fourth quadrants. This suggests negative serial correlation. Positive residuals are generally followed by negative values, etc. Of course, with an absence of serial correlation, the data points would be evenly distributed across all four quadrants.
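The 4-quadrant check can also be done by counting rather than by eye. The sketch below is illustrative only: `quadrant_shares` is a hypothetical helper, and the residuals are made-up numbers chosen to ‘clump’ by sign.

```python
def quadrant_shares(residuals):
    """Share of (e_{t-1}, e_t) points in quadrants I-IV; points on an axis are skipped."""
    counts = {"I": 0, "II": 0, "III": 0, "IV": 0}
    for t in range(1, len(residuals)):
        lagged, current = residuals[t - 1], residuals[t]  # horizontal, vertical axis
        if lagged > 0 and current > 0:
            counts["I"] += 1
        elif lagged < 0 and current > 0:
            counts["II"] += 1
        elif lagged < 0 and current < 0:
            counts["III"] += 1
        elif lagged > 0 and current < 0:
            counts["IV"] += 1
    total = sum(counts.values())
    return {q: c / total for q, c in counts.items()}

# Made-up residuals: a run of positive values, then negative, then positive again.
example = [1.0, 2.0, 1.5, 0.5, -0.5, -1.0, -2.0, -0.5, 0.5, 1.0]
shares = quadrant_shares(example)
print(shares)
```

Here 7 of the 9 points fall in quadrants I and III, which is the pattern associated with positive serial correlation.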
2. Durbin-Watson d Statistic
This is the most commonly used diagnostic test of serial correlation.
Define this test statistic as:
$$d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$$
This test statistic can only be used under a number of conditions. Most
importantly, it assumes that the autoregressive structure is AR(1). In addition, the
regression model must include an intercept term, and it must not include a lagged
dependent variable as a regressor.
Now with a little algebra and some simplifying approximations, we can show the
extreme limits on the Durbin-Watson d statistic, and relate it back to the
‘structure’ of the serial correlation. Rewrite this as:
d = t = 2
et + t =2 et -1 - 2 t =2 et et -1
t =1 et
Note the following ‘approximations’:
$$\sum_{t=2}^{n} e_t^2 \approx \sum_{t=2}^{n} e_{t-1}^2$$
$$\sum_{t=1}^{n} e_t^2 \approx \sum_{t=2}^{n} e_t^2$$
Thus, we can rewrite this expression:
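$$d \approx \frac{2 \left( \sum_{t=1}^{n} e_t^2 - \sum_{t=2}^{n} e_t e_{t-1} \right)}{\sum_{t=1}^{n} e_t^2}$$
$$d \approx 2 \left( 1 - \frac{\sum_{t=2}^{n} e_t e_{t-1}}{\sum_{t=1}^{n} e_t^2} \right)$$
Since the estimated coefficient of autocovariance ($\hat{\rho}$) is:
$$\hat{\rho} = \frac{\sum_{t=2}^{n} e_t e_{t-1}}{\sum_{t=1}^{n} e_t^2}$$
we can finally write:
$$d \approx 2 (1 - \hat{\rho})$$
$$-1 \le \hat{\rho} \le 1 \quad \Rightarrow \quad 0 \le d \le 4$$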
If $\hat{\rho}$ = 0, then d ≈ 2 (absence of serial correlation).
If $\hat{\rho}$ = 1, then d ≈ 0 (perfect positive serial correlation).
If $\hat{\rho}$ = -1, then d ≈ 4 (perfect negative serial correlation).
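The approximation d ≈ 2(1 - ρ̂) is easy to check numerically. The sketch below assumes a simulated AR(1) series (ρ = 0.8, n = 500) as a stand-in for regression residuals. The function names are illustrative and not taken from any package.

```python
import random

def durbin_watson(e):
    """d = sum_{t=2}^n (e_t - e_{t-1})^2 / sum_{t=1}^n e_t^2."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(v * v for v in e)
    return num / den

def rho_hat(e):
    """rho_hat = sum_{t=2}^n e_t e_{t-1} / sum_{t=1}^n e_t^2."""
    num = sum(e[t] * e[t - 1] for t in range(1, len(e)))
    den = sum(v * v for v in e)
    return num / den

# Stand-in 'residuals': an AR(1) series with rho = 0.8 (an assumption for illustration).
rng = random.Random(1)
e, prev = [], 0.0
for _ in range(500):
    prev = 0.8 * prev + rng.gauss(0.0, 1.0)
    e.append(prev)

d = durbin_watson(e)
r = rho_hat(e)
print(round(d, 3), round(2 * (1 - r), 3))
```

The two printed numbers differ only by (e_1² + e_n²) / Σ e_t², which is exactly what the two approximations drop. This gap shrinks as the sample grows.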
The test procedure entails computing the d statistic (although software packages will do this for you), and comparing it to the relevant critical value.
One problem is that there is no ‘unique’ critical value with the d statistic. Instead,
we have 'upper' and 'lower' bounds. The reason is that the probability distribution
for the d statistic is not easily derived. It depends on the values of the explanatory
variables, which vary from sample to sample. This means that there is no single
critical value as in the t or F tests.
Three-step procedure:
(1) Run OLS. Suppose you obtain d=1.73. Null Hypothesis is H0: No serial
correlation. Alternative hypothesis is H1: Positive serial correlation.
(2) Determine dL and dU. Suppose n=100 and k=5, and we want a 5% significance
level. Table B4 (p. 617 of Studenmund) gives us dL=1.57 and dU=1.78. (k = the
number of explanatory variables excluding the constant term, 5 in this case).
(3) Apply decision rule. Use the following diagram.
The 2 darkened areas are often known as the ‘zones of indecision’ or ‘regions of ignorance’. We can’t reach any conclusions about serial correlation if dL ≤ d ≤ dU. If d>dU, then you can’t reject the null. If d<dL, you can reject the null in favour of positive serial correlation.
In this numerical example, the DW statistic falls within this ‘zone of indecision’ (1.57 < 1.73 < 1.78). No conclusion is possible.
Like other statistical tests these ‘areas’ and critical values depend on the choice of
the significance level from the outset. If we’d chosen a 1% significance level, the
critical values would be 1.44 and 1.65. Now the d statistic exceeds the upper
critical value. We couldn’t reject the null hypothesis of no serial correlation.
In other regression models, our alternative hypothesis may be negative, rather than
positive serial correlation. This testing regime is the mirror image of the other.
Here the critical values are 4-dU and 4-dL. Get these from the same tables. Follow
similar procedures.
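The decision rule can be written out as a small function. This is a sketch only: `dw_decision` is a hypothetical helper, the bounds still have to be looked up in the tables, and the negative case is handled by comparing 4 - d with the same dL and dU (equivalent to comparing d with 4-dU and 4-dL).

```python
def dw_decision(d, d_lower, d_upper, alternative="positive"):
    """Durbin-Watson bounds test of H0: no serial correlation."""
    if alternative == "negative":
        d = 4.0 - d  # mirror image: d > 4 - dL is the same as 4 - d < dL
    elif alternative != "positive":
        raise ValueError("alternative must be 'positive' or 'negative'")
    if d < d_lower:
        return "reject H0"
    if d > d_upper:
        return "do not reject H0"
    return "inconclusive"

# The worked example from the text: d = 1.73, n = 100, k = 5.
print(dw_decision(1.73, 1.57, 1.78))  # 5% level: inconclusive
print(dw_decision(1.73, 1.44, 1.65))  # 1% level: do not reject H0
# A made-up d of 2.60 tested against negative serial correlation at the 5% level.
print(dw_decision(2.60, 1.57, 1.78, alternative="negative"))  # reject H0
```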
III. Remedial Measures
1. GLS
Structure of serial correlation is known:
Run OLS on the following transformed model:
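$$Y_t^* = \beta_0^* + \beta_1 X_t^* + u_t$$
where the starred terms are the generalized differences, $Y_t^* = Y_t - \rho Y_{t-1}$ and $X_t^* = X_t - \rho X_{t-1}$, and $\beta_0^* = \beta_0 (1 - \rho)$.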
ρ is not known (Feasible GLS or FGLS):
KEY: Almost the same as GLS. The difference is that we need an estimate of ρ.
(i). The Cochrane-Orcutt Iterative Procedure.
Probably the most commonly used procedure for coming up with an estimate of
ρ. Part of most software packages.
Again, use a 3-Step procedure:
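(1) Run OLS on Model (1). Retain residuals.
(2) Estimate the following auxiliary regression:
$$e_t = \rho e_{t-1} + v_t$$
(3) Using the 'first estimate' of ρ from Step 2, transform the data and run OLS on:
$$Y_t - \hat{\rho} Y_{t-1} = \beta_0 (1 - \hat{\rho}) + \beta_1 (X_t - \hat{\rho} X_{t-1}) + u_t$$
Repeat the second step, where the residuals are now computed from the Step 3 coefficient estimates and the original (untransformed) data:
$$e_t = Y_t - \hat{\beta}_0 - \hat{\beta}_1 X_t$$
Step 2' is now:
$$e_t = \rho e_{t-1} + w_t$$
Step 3' is now:
$$Y_t - \hat{\rho}_2 Y_{t-1} = \beta_0 (1 - \hat{\rho}_2) + \beta_1 (X_t - \hat{\rho}_2 X_{t-1}) + u_t$$
where $\hat{\rho}_2$ is the 'second estimate' of ρ from Step 2' (not the second coefficient of an AR(2) scheme).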
This is an 'iterative' procedure. Begin with a relatively inefficient estimate of β
coefficients and ρ. Use this to transform data and get a ‘better’ estimate of both
βs and ρ. Each iteration keeps improving our estimates.
We keep going until successive estimates of ρ change by something less than
some predetermined value (e.g., .001). Idea is that the estimates eventually ‘settle
down’ to some specific value.
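The iteration can be sketched as follows for the 2-variable model. This is a rough illustration rather than what any particular software package does: the function names, the stopping tolerance of .001, the cap on iterations and the simulated data (β0 = 2, β1 = 0.5, ρ = 0.7) are all assumptions.

```python
import random

def ols(x, y):
    """Simple (2-variable) OLS: returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def cochrane_orcutt(x, y, tol=1e-3, max_iter=50):
    """Iterate Steps 2 and 3 until successive estimates of rho differ by less than tol."""
    b0, b1 = ols(x, y)  # Step 1: OLS on the original model
    rho = 0.0
    for _ in range(max_iter):
        # Residuals always use the original data and the latest coefficient estimates.
        e = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
        # Step 2: regress e_t on e_{t-1} (no intercept) to estimate rho.
        num = sum(e[t] * e[t - 1] for t in range(1, len(e)))
        den = sum(e[t - 1] ** 2 for t in range(1, len(e)))
        new_rho = num / den
        # Step 3: OLS on the generalized differences built with the current rho.
        x_star = [x[t] - new_rho * x[t - 1] for t in range(1, len(x))]
        y_star = [y[t] - new_rho * y[t - 1] for t in range(1, len(y))]
        b0_star, b1 = ols(x_star, y_star)
        b0 = b0_star / (1.0 - new_rho)  # transformed intercept is beta_0 * (1 - rho)
        converged = abs(new_rho - rho) < tol
        rho = new_rho
        if converged:
            break
    return b0, b1, rho

# Hypothetical data with AR(1) disturbances.
rng = random.Random(3)
x, y, eps = [], [], 0.0
for _ in range(500):
    eps = 0.7 * eps + rng.gauss(0.0, 1.0)
    xi = rng.uniform(0.0, 10.0)
    x.append(xi)
    y.append(2.0 + 0.5 * xi + eps)

b0_co, b1_co, rho_co = cochrane_orcutt(x, y)
print(round(b0_co, 2), round(b1_co, 2), round(rho_co, 2))
```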
(ii). Simultaneous estimation of β0, β1 and ρ.
2. Using OLS with Newey-West standard errors.
Serial correlation does not cause bias of OLS estimates but impacts the standard
errors. Newey-West technique directly adjusts the standard errors of OLS
estimates to take account of serial correlation.
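To give a feel for the adjustment, the sketch below computes a Newey-West standard error for the slope in the 2-variable model. Several things here are assumptions for illustration: Bartlett weights 1 - l/(L+1), no small-sample correction, a maximum lag of 8 picked arbitrarily, simulated data, and the hypothetical function name `newey_west_slope_se`. Software packages may differ in these details.

```python
import math
import random

def newey_west_slope_se(x, y, max_lag):
    """OLS slope and its Newey-West standard error (Bartlett weights)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    z = [xi - x_bar for xi in x]  # X in deviations from its mean
    sxx = sum(zi * zi for zi in z)
    slope = sum(zi * (yi - y_bar) for zi, yi in zip(z, y)) / sxx
    intercept = y_bar - slope * x_bar
    e = [yi - intercept - slope * xi for xi, yi in zip(x, y)]  # OLS residuals
    g = [zi * ei for zi, ei in zip(z, e)]
    s = sum(gi * gi for gi in g)  # lag-0 term only: heteroskedasticity-robust (White)
    for lag in range(1, max_lag + 1):
        weight = 1.0 - lag / (max_lag + 1.0)  # weights decline linearly with the lag
        s += 2.0 * weight * sum(g[t] * g[t - lag] for t in range(lag, n))
    return slope, math.sqrt(s) / sxx

# Hypothetical data: both X and the disturbance follow AR(1) with coefficient 0.8.
rng = random.Random(7)
x, y, x_prev, eps = [], [], 0.0, 0.0
for _ in range(500):
    x_prev = 0.8 * x_prev + rng.gauss(0.0, 1.0)
    eps = 0.8 * eps + rng.gauss(0.0, 1.0)
    x.append(x_prev)
    y.append(1.0 + 0.5 * x_prev + eps)

slope, se_white = newey_west_slope_se(x, y, max_lag=0)  # ignores serial correlation
_, se_nw = newey_west_slope_se(x, y, max_lag=8)         # allows for it
print(round(slope, 3), round(se_white, 3), round(se_nw, 3))
```

The slope estimate is the same in both calls; only the standard error changes. With positively correlated disturbances and a persistent regressor, as simulated here, the adjusted standard error comes out larger than the unadjusted one.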
IV. Questions for Discussion: Q9.11, Q9.13
V. Computing Exercise: Johnson, Ch 9