Chapter 9
Heteroskedasticity
Copyright © 2014 McGraw-Hill Education. All rights reserved. No reproduction or distribution without the prior written consent of McGraw-Hill Education.
Learning Objectives
• Understand methods for detecting
heteroskedasticity
• Correct for heteroskedasticity
What is Heteroskedasticity?
Heteroskedasticity is when the error term has a nonconstant variance, or $\mathrm{Var}(\varepsilon_i) = \sigma_i^2$.
Homoskedasticity is when the error term has a constant variance, or $\mathrm{Var}(\varepsilon_i) = \sigma^2$.
Notice that for homoskedasticity there is no i
subscript, so the variance is constant, while for
heteroskedasticity the i subscript denotes that the
variance can change from one observation to the next.
A Picture of Homoskedasticity
Versus Heteroskedasticity
The Issues And Consequences
Associated With Heteroskedastic Data
Problem:
Heteroskedasticity violates assumption M6, which
states that the error term must have constant variance.
Consequences:
Under heteroskedasticity, parameter estimates are
still unbiased.
Parameter estimates are no longer minimum variance among
all unbiased estimators.
Estimated standard errors are incorrect, and all
measures of precision based on the estimated standard
errors are also incorrect.
Goals of this Chapter
An Important Caveat before Continuing
• With more advanced statistical packages, many
researchers include a very simple command asking
their chosen statistical program to provide standard
error estimates that automatically correct for
heteroskedasticity (White’s heteroskedastic consistent
standard errors).
• Even though correcting for heteroskedasticity is
straightforward, it is important to first work through the
more “old-school” examples that we do below before
learning how to calculate White’s heteroskedastic
consistent standard errors.
Understand Methods For
Detecting Heteroskedasticity
Informal methods
- Graphs
Formal methods using statistical tests
- Breusch-Pagan test
- General White’s Test
- Modified White’s Test
- Goldfeld-Quandt Test
Informal Method
Graph any one of the following:
(1) The dependent variable against each
independent variable…
(2) The residuals against each independent
variable…
(3) The residuals squared against each independent
variable…
(4) The standardized residuals against each
independent variable…
and look for a pattern in the dispersion of the
observations. If a pattern exists then that is
evidence of heteroskedasticity.
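When a plotting tool is not at hand, the same visual check can be approximated numerically. The sketch below is a minimal illustration in plain Python. It uses made-up toy data (not the Olympic medal data) in which the error size is assumed to grow with x, and the helper names `fit_line` and `spread_by_group` are hypothetical. It averages the squared residuals within four equal-sized groups ordered by x. A steadily rising average is the same pattern one would look for in graph (3).

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b1 * xbar, b1


def spread_by_group(x, y, groups=4):
    """Average squared residual within equal-sized groups ordered by x."""
    b0, b1 = fit_line(x, y)
    pairs = sorted((xi, (yi - b0 - b1 * xi) ** 2) for xi, yi in zip(x, y))
    size = len(pairs) // groups
    return [
        sum(e2 for _, e2 in pairs[g * size:(g + 1) * size]) / size
        for g in range(groups)
    ]


# Toy data (an assumption for illustration): the error size grows with x.
x = [float(i) for i in range(1, 41)]
u = [0.3 * xi * (1 if i % 2 == 0 else -1) for i, xi in enumerate(x)]
y = [2.0 + 0.5 * xi + ui for xi, ui in zip(x, u)]

for g, avg in enumerate(spread_by_group(x, y), start=1):
    print(f"group {g}: average squared residual = {avg:.2f}")
```

As with the graphs, this summary does not say how large a difference between groups is large enough, which is the gap the formal tests are meant to fill.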
Regression of Number of Olympic Medals on
per capita GDP by Country
Notice how the variance increases
as the independent variable
increases. This is evidence of
heteroskedasticity.
This residual plot is obtained by
checking the residual plot option
in Excel when running a
regression.
As in the previous slide, notice
how the variance increases as the
independent variable (GDP per
Capita) increases. This is evidence
of heteroskedasticity.
The primary drawback of the informal method is
that it is not clear how much of a pattern needs
to exist to lead us to the conclusion that the
model is heteroskedastic.
This leads us to the need for formal tests of
heteroskedasticity.
Formal Methods for Detecting
Heteroskedasticity
The formal methods that we consider are all
based on statistical tests of the following general
null and alternative hypotheses:
$H_0$: the error term is homoskedastic
$H_1$: the error term is heteroskedastic
Testing for Heteroskedasticity
(1) Breusch-Pagan Test
(2) Modified White’s Test
(3) Goldfeld-Quandt Test
Breusch-Pagan Test
How to do it:
(1) Estimate the population regression model $y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki} + \varepsilon_i$ and obtain
the residuals, $e_i$.
(2) Square the residuals to obtain $e_i^2$.
(3) Estimate the population regression model $e_i^2 = \gamma_0 + \gamma_1 x_{1i} + \gamma_2 x_{2i} + \cdots + \gamma_k x_{ki} + \varphi_i$.
(4) Perform an F-test for overall significance to see if
the squared residuals are statistically related to any
of the independent variables.
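The four steps can be sketched in plain Python for the single-regressor case (k = 1). This is a minimal illustration rather than a packaged routine: the toy data are made up, with error size assumed to grow with x, and the function names `fit_line` and `breusch_pagan_f` are hypothetical. The overall F-statistic is computed from the R-squared of the squared-residual regression.

```python
def fit_line(x, y):
    """OLS for y = b0 + b1*x; returns (b0, b1, residuals, R-squared)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    sst = sum((yi - ybar) ** 2 for yi in y)
    r2 = 1.0 - sum(e * e for e in resid) / sst
    return b0, b1, resid, r2


def breusch_pagan_f(x, y):
    """F-statistic from regressing squared residuals on x (k = 1)."""
    n, k = len(x), 1
    _, _, resid, _ = fit_line(x, y)               # step 1: residuals
    e2 = [e * e for e in resid]                   # step 2: squared residuals
    _, _, _, r2 = fit_line(x, e2)                 # step 3: auxiliary regression
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))  # step 4: overall F


# Toy data (an assumption for illustration): the error size grows with x.
x = [float(i) for i in range(1, 41)]
u = [0.3 * xi * (1 if i % 2 == 0 else -1) for i, xi in enumerate(x)]
y = [2.0 + 0.5 * xi + ui for xi, ui in zip(x, u)]

f_stat = breusch_pagan_f(x, y)
# Compare with the 5% critical value of F(1, 38), roughly 4.1 in standard tables.
print(f"Breusch-Pagan F = {f_stat:.1f} on (1, {len(x) - 2}) degrees of freedom")
```

With more than one independent variable, step 3 becomes a multiple regression, but the logic of the F-test is unchanged.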
Breusch-Pagan Test
Why It Works:
If the squared residuals are found to be
statistically related to the independent variables
then we conclude that the data are
heteroskedastic and we should take the
appropriate steps to correct for the problem.
Breusch-Pagan Test for Olympic Medal vs GDP per
Capita Data Dependent Variable is Residuals Squared
The Significance F (the p-value of the overall
F-test) is much less than 0.05 (or 0.01 for that
matter), so we reject the null hypothesis of
homoskedasticity and conclude that the
model is heteroskedastic.
Modified White’s Test
How to do it:
(1) Estimate the population regression model $y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki} + \varepsilon_i$ and obtain
the residuals, $e_i$, and predicted values, $\hat{y}_i$.
(2) Square the residuals.
(3) Estimate the population regression model
$e_i^2 = \delta_0 + \delta_1 \hat{y}_i + \delta_2 \hat{y}_i^2 + u_i$
(4) Perform an F-test for overall significance to see if
the squared residuals are statistically related to the
$\hat{y}_i$ and $\hat{y}_i^2$ variables.
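A minimal sketch of these steps in plain Python follows. It regresses the squared residuals on the fitted values and their squares. The toy data are made up, with error size assumed to grow with x, and the helpers `solve`, `ols`, and `modified_white_f` are hypothetical names written for this illustration. A small Gaussian-elimination solver stands in for a statistics package.

```python
def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][j] * z[j] for j in range(r + 1, n))) / m[r][r]
    return z


def ols(rows, y):
    """OLS via the normal equations; each row of `rows` starts with a 1."""
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]
    return beta, fitted


def modified_white_f(x, y):
    """Overall F-statistic from regressing e^2 on y-hat and y-hat squared."""
    n = len(y)
    _, y_hat = ols([[1.0, xi] for xi in x], y)               # step 1
    e2 = [(yi - fi) ** 2 for yi, fi in zip(y, y_hat)]        # step 2
    _, e2_hat = ols([[1.0, f, f * f] for f in y_hat], e2)    # step 3
    mean_e2 = sum(e2) / n
    sst = sum((v - mean_e2) ** 2 for v in e2)
    ssr = sum((v - f) ** 2 for v, f in zip(e2, e2_hat))
    r2 = 1.0 - ssr / sst
    return (r2 / 2) / ((1.0 - r2) / (n - 3))                 # step 4


# Toy data (an assumption for illustration): the error size grows with x.
x = [float(i) for i in range(1, 41)]
u = [0.3 * xi * (1 if i % 2 == 0 else -1) for i, xi in enumerate(x)]
y = [2.0 + 0.5 * xi + ui for xi, ui in zip(x, u)]

print(f"Modified White F = {modified_white_f(x, y):.1f} on (2, {len(x) - 3}) degrees of freedom")
```

The auxiliary regression always has exactly two slope terms, however many independent variables the original model contains, which is why the F-test here uses 2 and n − 3 degrees of freedom.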
Modified White’s Test
Why It Works:
This test works for the same reason that the
Breusch-Pagan test works. The primary difference is
that the $\hat{y}_i$ and $\hat{y}_i^2$ variables are functions of the
independent variables, the independent variables
squared, and the cross-products of the independent
variables, meaning that including those terms in the
squared residual regression tests whether the
squared residuals are a function of all of those
terms rather than a function of the independent
variables alone.
Modified White’s Test for Olympic Medal vs GDP per
Capita Data Dependent Variable is Residuals Squared
The Significance F (the p-value of the overall
F-test) is much less than 0.05 (or 0.01 for that
matter), so we reject the null hypothesis of
homoskedasticity and conclude that the
model is heteroskedastic.
Goldfeld-Quandt Test
How to do it:
(1) Identify which independent variable is suspected of
contributing towards heteroskedasticity and sort the data
from smallest to largest on that variable.
(2) Omit the middle $c$ observations.
(3) Run two regressions with the remaining $(n - c)$ observations.
(4) Form the test statistic $GQ = \frac{USS_2}{USS_1}$, where $USS_2$ is the larger
value (because the $F$-statistic must be greater than or
equal to 1).
(5) Reject the null hypothesis of homoskedasticity if $GQ > F_{n_1 - k_1,\, n_2 - k_2,\, 0.05}$.
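The steps above can be sketched in plain Python for a single independent variable. The toy data are made up, with error size assumed to grow with x. The function names `uss` and `goldfeld_quandt` and the choice c = 12 are assumptions made for this illustration.

```python
def uss(x, y):
    """Unexplained (residual) sum of squares from regressing y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))


def goldfeld_quandt(x, y, c):
    """GQ statistic after sorting on x and omitting the middle c observations."""
    data = sorted(zip(x, y))              # step 1: sort on x
    m = (len(data) - c) // 2              # observations in each regression
    low, high = data[:m], data[-m:]       # step 2: drop the middle c
    uss_low = uss(*zip(*low))             # step 3: two separate regressions
    uss_high = uss(*zip(*high))
    # step 4: the larger USS goes in the numerator so that GQ >= 1
    return max(uss_low, uss_high) / min(uss_low, uss_high)


# Toy data (an assumption for illustration): the error size grows with x.
x = [float(i) for i in range(1, 41)]
u = [0.3 * xi * (1 if i % 2 == 0 else -1) for i, xi in enumerate(x)]
y = [2.0 + 0.5 * xi + ui for xi, ui in zip(x, u)]

gq = goldfeld_quandt(x, y, c=12)
# Each regression uses 14 observations and estimates 2 parameters, so step 5
# compares GQ with F(12, 12) at the 5% level, about 2.69 in standard tables.
print(f"GQ = {gq:.2f}")
```

Because both sub-samples have the same number of observations here, the ratio of the two USS values equals the ratio of the two estimated error variances.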
Goldfeld-Quandt Test
Why It Works:
This test works when the suspected
heteroskedasticity is of the type that the error
variances either increase (or decrease) with the
value of a given independent variable. If we find
that the unexplained sum of squares for the largest
values is “large” relative to the unexplained sum of
squares for the smallest values, then we conclude
that the error variance changes significantly with
the value of the independent variable, suggesting
that the data are heteroskedastic.
Goldfeld-Quandt Test
How to do it:
For the Olympic Medal Data, there are 408
observations. Dividing the data into thirds, the first
regression should contain the 136 (408/3) observations
with the smallest GDP per capita, and the second regression
should contain the 136 observations with the largest GDP per capita.
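USS1 (regression output for the 136 observations with the smallest GDP per capita)
USS2 (regression output for the 136 observations with the largest GDP per capita)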
Goldfeld-Quandt Test Example
$GQ = \frac{USS_2}{USS_1} = \frac{63{,}534.37}{18{,}545.19} = 3.4259$
Critical Value = $F_{\infty,\infty,0.05} = 1$
Because 3.4259 > 1 we reject the null hypothesis of
homoskedasticity and conclude that the model is
heteroskedastic.
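The arithmetic on this slide can be checked in a few lines of Python. The two USS values and the critical value of 1 are taken directly from the slide. The variable names are chosen here for illustration.

```python
uss_1 = 18_545.19     # USS from the regression on the smallest 136 observations
uss_2 = 63_534.37     # USS from the regression on the largest 136 observations
critical_value = 1.0  # the slide's F critical value at the 5% level

gq = uss_2 / uss_1
print(f"GQ = {gq:.4f}")                                  # GQ = 3.4259
print("Reject homoskedasticity:", gq > critical_value)   # True
```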
Correcting for Heteroskedasticity
(1) Weighted least squares
(2) White’s heteroskedastic consistent standard
errors
Weighted Least Squares
How to Do It:
(1) Assume the form of heteroskedasticity, say
$\mathrm{Var}(\varepsilon) = \sigma^2 h(x)$.
(2) Create new variables by dividing through by the
square root of $h(x)$:
$y_i^* = y_i / \sqrt{h(x)}$, $x_0^* = 1 / \sqrt{h(x)}$, $x_{1i}^* = x_{1i} / \sqrt{h(x)}$,
$x_{2i}^* = x_{2i} / \sqrt{h(x)}$, …, $x_{ki}^* = x_{ki} / \sqrt{h(x)}$.
(3) Estimate the population regression model $y_i^* = \beta_0 x_0^* + \beta_1 x_{1i}^* + \cdots + \beta_k x_{ki}^* + \varepsilon_i^*$.
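A minimal sketch of the transformation and the resulting regression, for one independent variable, is given below in plain Python. The toy data are made up so that the error size is proportional to x, which is why the assumed form h(x) = x² is used here. The function name `wls` is hypothetical. Note that the transformed model has no separate constant term: the old intercept becomes the coefficient on the transformed variable 1/√h(x).

```python
from math import sqrt


def wls(x, y, h):
    """Weighted least squares for y = b0 + b1*x with Var(error) = sigma^2 * h(x)."""
    w = [sqrt(h(xi)) for xi in x]
    y_star = [yi / wi for yi, wi in zip(y, w)]     # y* = y / sqrt(h(x))
    x0_star = [1.0 / wi for wi in w]               # x0* = 1 / sqrt(h(x))
    x1_star = [xi / wi for xi, wi in zip(x, w)]    # x1* = x1 / sqrt(h(x))
    # Normal equations for the two-variable regression without a constant.
    s00 = sum(a * a for a in x0_star)
    s01 = sum(a * b for a, b in zip(x0_star, x1_star))
    s11 = sum(b * b for b in x1_star)
    t0 = sum(a * v for a, v in zip(x0_star, y_star))
    t1 = sum(b * v for b, v in zip(x1_star, y_star))
    det = s00 * s11 - s01 * s01
    b0 = (s11 * t0 - s01 * t1) / det
    b1 = (s00 * t1 - s01 * t0) / det
    return b0, b1


# Toy data (an assumption for illustration): the error size is proportional to x.
x = [float(i) for i in range(1, 41)]
u = [0.3 * xi * (1 if i % 2 == 0 else -1) for i, xi in enumerate(x)]
y = [2.0 + 0.5 * xi + ui for xi, ui in zip(x, u)]

b0, b1 = wls(x, y, h=lambda v: v * v)
print(f"WLS estimates: intercept = {b0:.3f}, slope = {b1:.3f}")
```

The coefficients keep their original interpretation: `b0` and `b1` still estimate the intercept and slope of the untransformed model.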
Weighted Least Squares
Why It Works:
Weighted least squares changes the model from one
that was initially heteroskedastic into one that is
homoskedastic.
The new error term $\varepsilon^* = \varepsilon / \sqrt{h(x)}$ has variance
$\mathrm{Var}(\varepsilon^*) = \sigma^2 h(x) / (\sqrt{h(x)})^2 = \sigma^2$.
This only works as long as the assumed form of
heteroskedasticity is correct.
Weighted Least Squares Example
Assume that the form of heteroskedasticity is
$\mathrm{Var}(\varepsilon_i) = \sigma^2 \, \mathit{GDPperCapita}_i$
so that
$h(x) = \mathit{GDPperCapita}_i$
$\sqrt{h(x)} = \sqrt{\mathit{GDPperCapita}_i}$
The transformed variables are
$\mathit{Medals}_i^* = \frac{\mathit{Medals}_i}{\sqrt{\mathit{GDPperCapita}_i}}$
Weighted Least Squares Example
The transformed variables are
$\mathit{Medals}_i^* = \frac{\mathit{Medals}_i}{\sqrt{\mathit{GDPperCapita}_i}}$
$\mathit{intercept}^* = \frac{1}{\sqrt{\mathit{GDPperCapita}_i}}$
$\mathit{GDPperCapita}_i^* = \frac{\mathit{GDPperCapita}_i}{\sqrt{\mathit{GDPperCapita}_i}}$
Weighted Least Squares Example
Excel Results
Breusch-Pagan Test of Transformed
Weighted Least Squares Data
Unfortunately, even after
the transformation, this
model still suffers from
heteroskedasticity.
Robust Standard Errors
The preferred method to correct for heteroskedasticity is
to use White’s heteroskedastic consistent standard
errors.
The coefficient estimates are still unbiased, so the only
things that need to be corrected are the standard errors.
In STATA, the command is
reg y x1 x2 x3, robust
The , robust option (or even , r) is the portion of the command
that corrects the standard errors.
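To show what such a correction computes, the sketch below works out both the conventional and the White standard error of the slope in a one-variable regression, in plain Python. It is a minimal illustration on made-up toy data with error size assumed to grow with x, and the function name `slope_standard_errors` is hypothetical. It uses the basic, unscaled form of White's formula. Statistical packages often add a small-sample scaling such as n/(n − k), so their reported values can differ slightly.

```python
from math import sqrt


def slope_standard_errors(x, y):
    """Slope estimate with conventional and White (unscaled) standard errors."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    dev = [xi - xbar for xi in x]
    sxx = sum(d * d for d in dev)
    b1 = sum(d * (yi - ybar) for d, yi in zip(dev, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    # Conventional formula: assumes one common error variance.
    conventional = sqrt(sum(e * e for e in resid) / (n - 2) / sxx)
    # White's formula: each observation contributes its own squared residual.
    white = sqrt(sum(d * d * e * e for d, e in zip(dev, resid))) / sxx
    return b1, conventional, white


# Toy data (an assumption for illustration): the error size grows with x.
x = [float(i) for i in range(1, 41)]
u = [0.3 * xi * (1 if i % 2 == 0 else -1) for i, xi in enumerate(x)]
y = [2.0 + 0.5 * xi + ui for xi, ui in zip(x, u)]

b1, se_conventional, se_white = slope_standard_errors(x, y)
print(f"slope = {b1:.4f}")
print(f"conventional SE = {se_conventional:.4f}, White SE = {se_white:.4f}")
```

The slope estimate is the same in both cases. Only the standard error changes, which matches the point that the coefficient estimates themselves do not need correcting.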
STATA Results with Original Standard Errors
STATA Results with Robust Standard Errors