Lecture 14: Heteroskedasticity and Serial Correlation

Heteroskedasticity (Chapter 10.6–10.7)
Serial Correlation (Chapter 11.1–11.3)
Copyright © 2006 Pearson Addison-Wesley. All rights reserved.
Agenda
• Review
• Feasible GLS (Chapter 10.6)
• White Robust Standard Errors (Chapter 10.7)
• Serial Correlation (Chapter 11.1)
• OLS and Serial Correlation (Chapter 11.2)
• Newey–West Estimated Standard Errors
(Chapter 11.2)
• Testing for Serial Correlation (Chapter 11.3)
Review
• In the last lecture, we began relaxing the
Gauss–Markov assumptions, starting with the
assumption of homoskedasticity.
• Under heteroskedasticity, Var(ε_i) = σ²d_i²
– OLS is still unbiased.
– OLS is no longer efficient.
– OLS e.s.e.'s are incorrect, so C.I.'s, t-statistics, and F-statistics are incorrect.
Review (cont.)
• Under heteroskedasticity,
Var(β̂) = σ² Σ w_i² d_i²
• For a straight line through the origin,
Var(β̂_OLS) = σ² Σ X_i² d_i² / (Σ X_i²)²
Review (cont.)
• We can use squared residuals to test
for heteroskedasticity.
• In the White test, we regress the
squared residuals against all
explanators, squares of explanators,
and interactions of explanators. The
nR² of the auxiliary equation is
distributed Chi-squared.
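The auxiliary regression and nR² statistic are easy to compute by hand. Below is a minimal numpy sketch on simulated data (the data-generating process, seed, and all variable names are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
X = rng.uniform(1, 10, n)
y = 2 + 3 * X + rng.normal(0, X)        # error s.d. grows with X: heteroskedastic

# Original regression: OLS of y on a constant and X, saving residuals
Z = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(Z, y, rcond=None)
e = y - Z @ b

# White auxiliary regression: e^2 on a constant, X, and X^2
# (with a single explanator there are no cross-product interaction terms)
A = np.column_stack([np.ones(n), X, X**2])
g, *_ = np.linalg.lstsq(A, e**2, rcond=None)
u = e**2 - A @ g
r2 = 1 - u.var() / (e**2).var()         # R^2 of the auxiliary regression
white_stat = n * r2                     # chi-squared (2 df here) under the null
```

For data generated this way, white_stat far exceeds the 5% chi-squared critical value (5.99 with 2 degrees of freedom), so the test rejects homoskedasticity.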
Review (cont.)
• The Breusch–Pagan test is similar, but
the econometrician chooses the
explanators for the auxiliary equation.
Review (cont.)
• Under heteroskedasticity, the BLUE Estimator
is Generalized Least Squares.
• To implement GLS:
1. Divide all variables by d_i.
2. Perform OLS on the transformed variables.
• If we have used the correct d_i, the transformed data are homoskedastic.
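The two GLS steps can be sketched in a few lines of numpy, assuming Var(ε_i) = σ²d_i² with d_i known; the data and names below are hypothetical:

```python
import numpy as np

def gls_known_d(y, X, d):
    """GLS when Var(eps_i) = sigma^2 * d_i^2 and d_i is known:
    step 1 divides every variable (including the constant) by d_i;
    step 2 runs OLS on the transformed variables."""
    Z = np.column_stack([np.ones(len(y)), X])       # intercept and explanator
    b, *_ = np.linalg.lstsq(Z / d[:, None], y / d, rcond=None)
    return b                                        # (intercept, slope) estimates

# Hypothetical rent-income style data with d_i = income_i
rng = np.random.default_rng(1)
income = rng.uniform(20, 100, 400)
rent = 100 + 3.0 * income + rng.normal(0, income)   # error s.d. proportional to income
b_gls = gls_known_d(rent, income, d=income)
```

Note that dividing the constant column by d_i turns the intercept into a 1/d_i regressor, exactly as in the transformed rent-income equation.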
Review (cont.)
• For example, consider the relationship
rent_i = β₀ + β₁income_i + ε_i
• We are concerned that Var(ε_i) may vary with income.
• We need to make an assumption about how Var(ε_i) varies with income.
Review (cont.)
• An initial guess: Var(ε_i) = σ²·income_i²
• d_i = income_i
• If we have modeled heteroskedasticity correctly, then the BLUE Estimator is
rent_i/income_i = β₀(1/income_i) + β₁ + v_i
Review (cont.)
• If we have the correct model of
heteroskedasticity, then OLS with the
transformed data should be homoskedastic.
rent_i/income_i = β₀(1/income_i) + β₁ + v_i
• Using a White test, we reject the null
hypothesis of homoskedasticity of the model
with transformed data.
GLS: An Example
• Our first guess didn’t work very well.
• Let's try Var(ε_i) = σ²·income_i
rent_i/√income_i = β₀(1/√income_i) + β₁·√income_i + v_i
• This time, we fail to reject the null
hypothesis of homoskedasticity.
Feasible GLS (Chapter 10.6)
• We usually do NOT know d_i, so GLS is infeasible.
• We can, however, ESTIMATE d_i.
• We call GLS with estimates of d_i "Feasible Generalized Least Squares."
Feasible GLS (cont.)
• To begin, we need to assume some
model for the heteroskedasticity.
• Then we estimate the parameter(s) of the model.
Feasible GLS (cont.)
• One reasonable model for the error terms
could be that the variance is proportional to
some power of the explanator.
Var(ε_i) = σ²X_i^h
• For example, in the rent-income example, we tried both
Var(ε_i) = σ²·income_i²  (h = 2)
and Var(ε_i) = σ²·income_i  (h = 1)
Feasible GLS (cont.)
• To implement FGLS, we have assumed
Var(ε_i) = σ²X_i^h
• To estimate this equation using linear
regression methods, we can take
advantage of the properties of logs:
ln(a^b) = b·ln(a) AND ln(ab) = ln(a) + ln(b)
• Regress
ln(e_i²) = ln(σ²) + h·ln(X_i) + ν_i
Feasible GLS (cont.)
1. Estimate the regression with OLS.
2. Regress ln(e_i²) = ln(σ²) + h·ln(X_i) + ν_i
3. Divide every variable by d_i = √(X_i^ĥ) = X_i^(ĥ/2)
4. Apply OLS to the transformed data.
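The four steps translate directly into numpy. In this sketch the simulated data (true h = 2) and every name are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 600
X = rng.uniform(1, 10, n)
y = 5 + 2 * X + X * rng.normal(size=n)      # Var(eps_i) = sigma^2 * X_i^2, so h = 2

# Steps 1-2: OLS, then regress ln(e^2) on a constant and ln(X); the slope estimates h
Z = np.column_stack([np.ones(n), X])
b_ols, *_ = np.linalg.lstsq(Z, y, rcond=None)
e = y - Z @ b_ols
A = np.column_stack([np.ones(n), np.log(X)])
g, *_ = np.linalg.lstsq(A, np.log(e**2), rcond=None)
h_hat = g[1]

# Steps 3-4: divide every variable by d_i = X_i^(h_hat/2), then OLS on transformed data
d = X ** (h_hat / 2)
b_fgls, *_ = np.linalg.lstsq(Z / d[:, None], y / d, rcond=None)
```

The log regression is noisy observation by observation, but its slope still recovers h well enough to make the transformed data roughly homoskedastic.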
Feasible GLS (cont.)
• FGLS is not a mechanistic procedure.
• The econometrician may prefer other methods of estimating d_i.
Feasible GLS (cont.)
• Applying FGLS to the rent-income
example, our estimated h value is 1.21
• We should divide all our variables by
income_i^0.605. This is very close to dividing
by the square root of income, as we did
in the second part of the example.
TABLE 10.7 ln(Squared Residual) vs.
ln(Income) Following RENT vs. INCOME by OLS
White Robust Standard Errors
(Chapter 10.7)
• Heteroskedasticity is a common problem.
• We may not always be happy making the
FGLS assumptions, especially if we don’t
really need that extra efficiency.
• OLS is unbiased. OLS may yield a sufficiently
small standard error to allow reasonably
precise estimates.
White Robust Standard Errors (cont.)
• The main problem in applying OLS
under heteroskedasticity is that our
e.s.e. formula is incorrect.
• White’s brilliant idea: use OLS and fix
the estimated standard errors!
White Robust Standard Errors (cont.)
• For OLS with an intercept and a single explanator, Y_i = β₀ + β₁X_i + ε_i, we have derived the formula for the e.s.e.:
e.s.e.(β̂₁) = √( Σe_i² / [(n−2)Σx_i²] )
• However, we really used the homoskedasticity
assumption only to simplify this formula.
White Robust Standard Errors (cont.)
• If we do not impose homoskedasticity,
we get a slightly more complicated
formula:
White e.s.e.(β̂₁) = √( Σx_i²e_i² / (Σx_i²)² )
• The computer can easily perform
this calculation.
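A minimal numpy version of that calculation, on simulated data (all names hypothetical):

```python
import numpy as np

def white_se_slope(X, y):
    """White robust e.s.e. for the slope of y = b0 + b1*X + eps:
    sqrt( sum(x_i^2 * e_i^2) / (sum x_i^2)^2 ), where x_i = X_i - mean(X)
    and e_i are the OLS residuals."""
    Z = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    e = y - Z @ b
    x = X - X.mean()
    return np.sqrt(np.sum(x**2 * e**2) / np.sum(x**2) ** 2)

# With homoskedastic errors the robust e.s.e. should nearly match the
# conventional one (about 1/sqrt(n) for standard-normal X and errors here)
rng = np.random.default_rng(3)
n = 2000
X = rng.normal(size=n)
y = 1 + 0.5 * X + rng.normal(size=n)
se_white = white_se_slope(X, y)
```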
White Robust Standard Errors (cont.)
• White Heteroskedastic Consistent standard errors (commonly called "robust" standard errors) correct for possible heteroskedasticity.
• Software packages often provide White e.s.e.'s as an option.
• If errors are homoskedastic, White e.s.e.'s are less efficient than OLS e.s.e.'s.
TABLE 10.8 OLS Estimates of the Rent–
Income Relationship with Robust Standard
Errors
White Robust Standard Errors
• Applying White estimated standard errors is a
very easy fix for possible heteroskedasticity.
• Some economists simply use White
e.s.e.’s routinely.
• This fix comes with a cost in efficiency:
– OLS is not BLUE under heteroskedasticity.
– White e.s.e.’s are inefficient under
homoskedasticity.
White Robust Standard Errors (cont.)
• Note: It is CRUCIAL, when you
present your own results, that you clarify
which e.s.e. you have used. If you do use
White standard errors, you MUST say so.
• For example, many tables of results include
the footnote “White standard errors in
parentheses” or “Robust standard errors
in parentheses.”
Heteroskedasticity
• Heteroskedasticity is not, in practice, a
burdensome complication.
• Econometricians have easy-to-apply tests
to detect heteroskedasticity (White tests,
Breusch–Pagan tests, or Goldfeld–
Quandt tests).
• If there is heteroskedasticity, econometricians
have a number of options available.
Heteroskedasticity (cont.)
• If econometricians know the exact nature of the heteroskedasticity (i.e., if they know the d_i), then they can simply divide all variables by d_i and apply GLS.
• If the d_i are unknown, but econometricians are willing to make some assumptions about their functional form, then the d_i can be estimated by FGLS.
Heteroskedasticity (cont.)
• If econometricians are unwilling to
make assumptions about the nature of
the heteroskedasticity, they can
implement OLS to get unbiased, but
inefficient, estimates.
• Then they must correct the estimated
standard errors using White Robust
Standard Errors.
Serial Correlation (Chapter 11.1)
• Now let’s relax a different Gauss–
Markov assumption.
• What if the error terms are correlated with
one another?
• If I know something about the error term for
one observation, I also know something about
the error term for another observation.
• Our observations are NOT independent!
Serial Correlation (cont.)
• Serial Correlation frequently arises when
using time series data (so we will index our
observations with t instead of i).
• The error term includes all variables not
explicitly included in the model.
• If a change occurs to one of these
unobserved variables in 1969, it is quite
plausible that some of that change will still be
evident in 1970.
Serial Correlation (cont.)
• In this lecture, we will consider a
particular form of correlation among
error terms.
• Error terms are correlated more heavily
with “nearby” observations than with
“distant” observations.
• E.g., Cov(ε_1969, ε_1970) > Cov(ε_1969, ε_1990)
Serial Correlation (cont.)
• For example, inflation in the United States
has been positively serially correlated for at
least a century. We expect above average
inflation in a given period if there was above
average inflation in the preceding period.
• Let’s look at DEVIATIONS in US inflation
from its mean from 1923–1952 and from
1973–2002. There is greater serial correlation
in the more recent sample.
Figure 11.1
U.S. Inflation’s
Deviations from
Its Mean
Serial Correlation: A DGP
• We assume that covariances depend only on
the distance between two time periods, |t-t’|
Y_t = β₀ + β₁X_1t + … + β_kX_kt + ε_t
E(ε_t) = 0
Var(ε_t) = σ²
Cov(ε_t, ε_t') = γ_tt',  γ_tt' ≠ 0 for some t ≠ t'
Specifically: γ_tt' = γ_|t−t'| for all t, t'
X's fixed across samples
OLS and Serial Correlation (Chapter 11.2)
• The implications of serial correlation
for OLS are similar to those of
heteroskedasticity:
– OLS is still unbiased.
– OLS is inefficient.
– The OLS formula for estimated standard errors is incorrect.
• "Fixes" are more complicated.
OLS and Serial Correlation (cont.)
• The Gauss–Markov covariance assumption:
Cov(ε_t, ε_t') = 0 for t ≠ t'
• The expectation of a linear estimator is NOT
affected by the covariance assumption.
• The variance of a linear estimator is greatly
affected by the covariance assumption.
OLS and Serial Correlation (cont.)
Var(β̂_S) = Var(Σ w_tY_t)
= Σ_{t=1}^T Var(w_tY_t) + Σ Σ_{t'≠t} Cov(w_tY_t, w_t'Y_t')
= Σ_{t=1}^T w_t² Var(Y_t) + Σ Σ_{t'≠t} w_tw_t' Cov(Y_t, Y_t')
= σ² Σ_{t=1}^T w_t² + Σ Σ_{t'≠t} w_tw_t' γ_|t−t'|
OLS and Serial Correlation (cont.)
• We can write Var(ε_t) as Cov(ε_t, ε_t) = γ_0
• This notation lets us simplify the expression
for the variance of a linear estimator:
Var(β̂) = Σ_{t=1}^T Σ_{t'=1}^T w_tw_t' γ_|t−t'|
• Note the change in the bounds of the sums.
OLS and Serial Correlation (cont.)
• The BLUE Estimator solves
min over the w_t of Σ_{t=1}^T Σ_{t'=1}^T w_tw_t' γ_|t−t'|
such that
Σ w_tX_Rt = 0 for R ≠ S and Σ w_tX_St = 1
• OLS is NOT the solution to this problem.
OLS and Serial Correlation (cont.)
• As with heteroskedasticity, we have
two choices:
1. We can transform the data so that the
Gauss–Markov conditions are met, and
OLS is BLUE; OR
2. We can disregard efficiency, apply OLS
anyway, and “fix” our formula for
estimated standard errors.
Newey–West E.S.E.’s
• We will first consider the strategy of
“fixing” the estimated standard errors
from OLS.
• Newey–West Serial Correlation
Consistent Standard Errors
Newey–West E.S.E.’s (cont.)
Var(β̂) = Σ_{t=1}^T Σ_{t'=1}^T w_tw_t' γ_|t−t'|
For the case Y_t = β₀ + β₁X_t + ε_t,
ŵ_t = x_t / Σ_{s=1}^T x_s²
Var(β̂₁) = Σ_{t=1}^T Σ_{t'=1}^T [ x_tx_t' / (Σ_{s=1}^T x_s²)² ] γ_|t−t'|
Newey–West E.S.E.’s (cont.)
To estimate Var(β̂₁) = Σ_{t=1}^T Σ_{t'=1}^T [ x_tx_t' / (Σ_{s=1}^T x_s²)² ] γ_|t−t'|,
we want to replace γ_|t−t'| with an estimate, e_te_t'.
However, there are far too many covariances to estimate.
Newey–West E.S.E.’s (cont.)
We need to estimate Var(β̂₁) = Σ_{t=1}^T Σ_{t'=1}^T [ x_tx_t' / (Σ_{s=1}^T x_s²)² ] γ_|t−t'|
• Whitney Newey and Ken West
suggested a simplification. Instead of
estimating ALL the covariances, Newey
and West suggested estimating only the
most important covariances.
Newey–West E.S.E.’s (cont.)
We need to estimate Var(β̂₁) = Σ_{t=1}^T Σ_{t'=1}^T [ x_tx_t' / (Σ_{s=1}^T x_s²)² ] γ_|t−t'|
• Remember, observations are more correlated
with each other the closer they are.
• Cov(1969 ,1970) > Cov(1969 ,1990)
• As |t-t’| grows large, |t-t’| approaches 0.
Newey–West E.S.E.’s (cont.)
We need to estimate Var(β̂₁) = Σ_{t=1}^T Σ_{t'=1}^T [ x_tx_t' / (Σ_{s=1}^T x_s²)² ] γ_|t−t'|
• As |t−t'| grows large, γ_|t−t'| approaches 0.
• Newey–West e.s.e.'s require econometricians to make a judgment about the distance |t−t'| after which they can ignore γ_|t−t'|.
Newey–West E.S.E.’s (cont.)
We need to estimate Var(β̂₁) = Σ_{t=1}^T Σ_{t'=1}^T [ x_tx_t' / (Σ_{s=1}^T x_s²)² ] γ_|t−t'|
• The first step in estimating Newey–West
Standard Errors is to choose a lag, L
• Then we assume γ_|t−t'| ≈ 0 for all |t−t'| > L
• The choice of L is a judgment call.
Checking Understanding
• You are working with time series data about a
company’s investment and profits. You have
quarterly data (adjusted for seasonality). You
are worried that a shock to the company’s
profitability in one quarter could continue into
the next several quarters, so you decide to
use Newey–West Standard Errors.
• What L should you choose if you believe any
shocks will dissipate within one year? Within
two years?
Checking Understanding (cont.)
• You are working with quarterly data
(adjusted for seasonality).
• If shocks will dissipate within one year:
– With quarterly data, a shock will dissipate within 4 periods, so you need to set L = 4. (You could set L > 4 to be safe, but each added lag makes your e.s.e.'s slightly less efficient.)
• If shocks will dissipate within two years:
– L = 8 (at least)
Newey–West E.S.E.’s
We need to estimate Var(β̂₁) = Σ_{t=1}^T Σ_{t'=1}^T [ x_tx_t' / (Σ_{s=1}^T x_s²)² ] γ_|t−t'|
• We assume γ_|t−t'| ≈ 0 for all |t−t'| > L
• The choice of L is a judgment call.
• L = 4, L = 8, and L = 12 are typical choices.
• Choosing L requires you to consider the
ECONOMICS of the problem.
Newey–West E.S.E.’s (cont.)
To estimate Var(β̂₁) = Σ_{t=1}^T Σ_{t'=1}^T [ x_tx_t' / (Σ_{s=1}^T x_s²)² ] γ_|t−t'|:
replace γ_|t−t'| with e_te_t' if |t − t'| ≤ L,
replace γ_|t−t'| with 0 if |t − t'| > L.
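That replacement rule can be coded directly. The sketch below implements the unweighted truncation described here (production implementations typically also down-weight longer lags, e.g. with Bartlett weights, to guarantee a positive variance estimate); the simulated AR(1) data and names are hypothetical:

```python
import numpy as np

def newey_west_se_slope(X, y, L):
    """Newey-West e.s.e. for the slope of y = b0 + b1*X + eps:
    keep the covariance estimates e_t * e_t' only when |t - t'| <= L
    and replace the rest with 0 (unweighted textbook version)."""
    T = len(y)
    Z = np.column_stack([np.ones(T), X])
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    e = y - Z @ b
    x = X - X.mean()
    v = 0.0
    for t in range(T):
        for tp in range(max(0, t - L), min(T, t + L + 1)):
            v += x[t] * x[tp] * e[t] * e[tp]
    return np.sqrt(v / np.sum(x**2) ** 2)

# Persistent explanator plus first-order serially correlated errors
rng = np.random.default_rng(4)
T = 500
X, eps = np.zeros(T), np.zeros(T)
for t in range(1, T):
    X[t] = 0.8 * X[t - 1] + rng.normal()
    eps[t] = 0.7 * eps[t - 1] + rng.normal()
y = 1 + 0.5 * X + eps
se_nw = newey_west_se_slope(X, y, L=4)    # allow correlation up to 4 lags
se_l0 = newey_west_se_slope(X, y, L=0)    # L = 0 collapses to White's formula
```

With positively autocorrelated X and errors, the L = 4 e.s.e. comes out noticeably larger than the L = 0 (White) e.s.e., which ignores the serial correlation.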
Newey–West E.S.E.’s (cont.)
Newey–West e.s.e.'s correct for serial correlation:
e.s.e.(β̂₁) = √( Σ_{t=1}^T Σ_{t'=t−L}^{t+L} [ x_tx_t' / (Σ_{s=1}^T x_s²)² ] e_te_t' )
White Robust e.s.e.'s correct for heteroskedasticity:
e.s.e.(β̂₁) = √( Σx_t²e_t² / (Σx_t²)² )
Question: How can you correct e.s.e.'s for BOTH serial correlation AND heteroskedasticity?
Newey–West E.S.E.’s (cont.)
Question: How can you correct e.s.e.'s for
BOTH serial correlation AND heteroskedasticity?
Answer: Use Newey–West e.s.e.'s.
Newey–West ALSO corrects for heteroskedasticity!
Checking Understanding
To see that Newey–West e.s.e.'s ALSO correct for heteroskedasticity, consider the case L = 0:
e.s.e.(β̂₁) = √( Σ_{t=1}^T Σ_{t'=t−L}^{t+L} [ x_tx_t' / (Σ_{s=1}^T x_s²)² ] e_te_t' )
= √( Σ_{t=1}^T [ x_t² / (Σ_{s=1}^T x_s²)² ] e_t² )
= √( Σx_t²e_t² / (Σx_s²)² )
which is the formula for White Robust e.s.e.'s!
Newey–West E.S.E.’s
• How do we implement Newey–West
e.s.e.’s using our software?
Newey–West E.S.E.’s (cont.)
• Using Newey–West e.s.e.’s, we simply
conduct OLS as before, but tell the computer
to use the Newey–West formula for
estimating standard errors.
• One drawback: OLS is not efficient. There
exists an unbiased linear estimator with a
lower variance.
• Another drawback: we have to choose the
number of lags, L, to include in the model.
Durbin–Watson Test (Chapter 11.3)
• How do we test for serial correlation?
• As with Newey–West e.s.e.’s, we need to
limit the number of correlations we handle.
• James Durbin and G.S. Watson proposed
testing for correlation in the error terms
between adjacent observations.
• In our DGP, we assume the
strongest correlation exists between
adjacent observations.
Durbin–Watson Test (cont.)
• Correlation between adjacent disturbances is
called “first-order serial correlation.”
• To test for first-order serial correlation, we ask whether adjacent ε's are correlated.
• As usual, we'll use residuals to proxy for the ε's.
• The trick is constructing a test statistic for
which we know the distribution (so we can
calculate the probability of observing the data,
given the null hypothesis).
Durbin–Watson Test (cont.)
• We end up with a somewhat
opaque test statistic for first-order
serial correlation
d = Σ_{t=2}^T (e_t − e_{t−1})² / Σ_{t=1}^T e_t²
Durbin–Watson Test (cont.)
• To interpret the test statistic, it is helpful to expand the numerator:
d = Σ_{t=2}^T (e_t − e_{t−1})² / Σ_{t=1}^T e_t²
= [ Σ_{t=2}^T e_t² + Σ_{t=2}^T e_{t−1}² − 2 Σ_{t=2}^T e_te_{t−1} ] / Σ_{t=1}^T e_t²
Durbin–Watson Test (cont.)
• In large samples, we can divide the numerator by (T−k−2) and the denominator by (T−k−1) without creating much of a bias:
d ≈ [ (1/(T−k−2)) Σ_{t=2}^T e_t² + (1/(T−k−2)) Σ_{t=2}^T e_{t−1}² − (2/(T−k−2)) Σ_{t=2}^T e_te_{t−1} ] / [ (1/(T−k−1)) Σ_{t=1}^T e_t² ]
Durbin–Watson Test (cont.)
In large samples,
(1/(T−k−2)) Σ_{t=2}^T e_t²,  (1/(T−k−2)) Σ_{t=2}^T e_{t−1}²,  and (1/(T−k−1)) Σ_{t=1}^T e_t²
all approximate s², an estimate of the variance of the error term.
Durbin–Watson Test (cont.)
• In large samples,
(1/(T−k−2)) Σ_{t=2}^T e_te_{t−1}
approximately estimates the covariance between adjacent error terms. If there is no first-order serial correlation, this term will collapse to 0.
Durbin–Watson Test (cont.)
• In large samples, the Durbin–Watson statistic approximates
d ≈ [ 2σ² − 2Cov(ε_t, ε_{t−1}) ] / σ² = 2 − 2Cov(ε_t, ε_{t−1}) / σ²
• Under the null hypothesis of no first-order serial correlation, d ≈ 2.
Durbin–Watson Test (cont.)
• When the Durbin–Watson statistic, d, gives a value far from 2, it suggests the covariance term is not 0 after all.
• I.e., a value of d far from 2 suggests the presence of first-order serial correlation.
• At the most extreme, Cov(ε_t, ε_{t−1}) is bounded by −σ² and σ².
• d is bounded between 0 and 4.
Durbin–Watson Test (cont.)
• It seems a bit roundabout to estimate
d ≈ 2 − 2Cov(ε_t, ε_{t−1}) / σ²
when what we care about directly is Cov(ε_t, ε_{t−1}).
• We estimate d because we know something about its distribution.
Durbin–Watson Test (cont.)
• Next time, we will see how to interpret
the Durbin–Watson statistic.
Review (cont.)
• Under heteroskedasticity,
Var(β̂) = σ² Σ w_i² d_i²
• For a straight line with an unknown intercept,
Var(β̂_OLS) = σ² Σ x_i² d_i² / (Σ x_i²)²
Review (cont.)
• We can correct for heteroskedasticity by dividing our variables through by d_i (implementing Generalized Least Squares).
• The catch is that we don't observe d_i.
• We can guess what d_i is, and support our conjecture using White or Breusch–Pagan tests on the GLS model.
Review (cont.)
• Alternatively, we can estimate the d_i through Feasible Generalized Least Squares.
• FGLS requires us to write down a specific model for the heteroskedastic error terms, but we let the data choose the key parameter(s).
• We learned one important FGLS model.
Review (cont.)
• To implement FGLS, we have assumed
Var(ε_i) = σ²X_i^h
• To estimate this equation using linear regression methods, we can take advantage of the properties of logs:
ln(a^b) = b·ln(a) AND ln(ab) = ln(a) + ln(b)
• Regress
ln(e_i²) = ln(σ²) + h·ln(X_i) + ν_i
Review (cont.)
1. Estimate the regression with OLS.
2. Regress ln(e_i²) = ln(σ²) + h·ln(X_i) + ν_i
3. Divide every variable by d_i = √(X_i^ĥ) = X_i^(ĥ/2)
4. Apply OLS to the transformed data.
Review (cont.)
• Heteroskedasticity is pretty common.
• We may not always be happy making the
FGLS assumptions, especially if we don’t
really need that extra efficiency.
• OLS is unbiased. With a reasonable
sample size, OLS may yield a sufficiently
small standard error to allow reasonably
precise estimates, if we use White robust
standard errors.
Review (cont.)
• For example, for the case of a line with
only one explanator and an intercept,
White e.s.e.(β̂₁) = √( Σx_i²e_i² / (Σx_i²)² )
• The computer can easily perform this
calculation instead of the simpler,
homoskedastic version.
Review (cont.)
• Our serial correlation DGP assumes
that covariances depend only on |t-t’|
Y_t = β₀ + β₁X_1t + … + β_kX_kt + ε_t
E(ε_t) = 0
Var(ε_t) = σ²
Cov(ε_t, ε_t') = γ_tt',  γ_tt' ≠ 0 for some t ≠ t'
Specifically: γ_tt' = γ_|t−t'| for all t, t'
X's fixed across samples
Review (cont.)
• The implications of serial correlation
for OLS are similar to those of
heteroskedasticity:
– OLS is still unbiased.
– OLS is inefficient.
– The OLS formula for estimated standard
errors is incorrect.
• “Fixes” are more complicated.
Review (cont.)
• As with heteroskedasticity, we have
two choices:
1. We can transform the data so that the
Gauss–Markov conditions are met, and
OLS is BLUE; OR
2. We can disregard efficiency, apply OLS
anyway, and “fix” our formula for
estimated standard errors.
Review (cont.)
• We first consider the strategy of
“fixing” the estimated standard errors
from OLS.
• We can get “correct” e.s.e.’s by
estimating “Newey–West Serial
Correlation Consistent Standard Errors.”
Review (cont.)
For the case Y_t = β₀ + β₁X_t + ε_t, to estimate
Var(β̂₁) = Σ_{t=1}^T Σ_{t'=1}^T [ x_tx_t' / (Σ_{s=1}^T x_s²)² ] γ_|t−t'|
we want to replace γ_|t−t'| with an estimate, e_te_t'.
However, there are far too many covariances to estimate.
Review (cont.)
We need to estimate Var(β̂₁) = Σ_{t=1}^T Σ_{t'=1}^T [ x_tx_t' / (Σ_{s=1}^T x_s²)² ] γ_|t−t'|
• The first step in estimating Newey–West Standard Errors is to choose a lag, L.
• Then we assume γ_|t−t'| ≈ 0 for all |t−t'| > L.
• The choice of L is a judgment call.
• L = 4, L = 8, and L = 12 are typical choices.
Review (cont.)
• The Durbin–Watson test checks for
first-order serial correlation:
d = Σ_{t=2}^T (e_t − e_{t−1})² / Σ_{t=1}^T e_t²
Review (cont.)
• In large samples, the Durbin–Watson statistic approximates
d ≈ [ 2σ² − 2Cov(ε_t, ε_{t−1}) ] / σ² = 2 − 2Cov(ε_t, ε_{t−1}) / σ²
• Under the null hypothesis of no first-order serial correlation, d ≈ 2.