Topic 7: Heteroskedasticity
Advanced Econometrics (I)
Dong Chen
School of Economics, Peking University
Introduction
If the disturbance variance is not constant across observations, the regression is
heteroskedastic. That is,

Var(ε_i) = σ_i², i = 1, ..., n.   (1)
We continue to assume that disturbances are pairwise uncorrelated. This implies
that

E(εε′) = σ²Ω = diag(σ_1², σ_2², ..., σ_n²).   (2)
Heteroskedasticity may arise in many applications, especially in cross-sectional
data.
Example 1: (i) The variation in profits of large firms may be greater than that
of small ones, even after accounting for differences in firm size.
(ii) The variation of expenditure on certain commodity groups may be higher
for high-income families than for low-income ones.
(iii) When estimating the return to education, ability is unobservable and thus
enters the disturbance. It is possible that the variance of ability varies with
the level of education.
(iv) Sometimes heteroskedasticity is a consequence of aggregation (e.g., taking averages) of data.
By eyeballing the patterns of residuals from OLS estimation, we may find
some evidence of heteroskedasticity.
Example 2: Consider the following model:

EXP = β_1 + β_2 AGE + β_3 INCOME + β_4 INCOME² + β_5 OWNER + ε,   (3)

where EXP is credit card expenditure and OWNER is a dummy variable indicating whether an individual owns a house. Model (3) is estimated by
OLS and the residuals are saved. In Figure 1 the residuals are plotted against
INCOME, and in Figure 2 against AGE. In Figure 1, the spread of the residuals
becomes wider for higher income, while in Figure 2 the distribution of the
residuals is largely random. Figures 1 and 2 suggest that a common cause of
heteroskedasticity is that the variance of the disturbance term may depend
on some of the x variables, i.e., σ_i² = h(x_i). In this case, it appears that σ_i² is
positively related to INCOME.

Fig. 1: Plot of the OLS Residuals against INCOME
STATA Tips: To obtain graphs like those in Figures 1 and 2, use the following
commands in STATA.
reg exp age income income2 owner
predict e, resid
graph twoway scatter e income, msymbol(oh) yline(0)
2 Consequences
Recall from our previous discussion that if we use OLS when Var (ε) = σ 2 Ω,
then
(i) b is unbiased;
(ii) b is inefficient, while the GLS estimator, β̂, is BLUE;
(iii) Var(b) = σ²(X′X)⁻¹X′ΩX(X′X)⁻¹. So the use of σ²(X′X)⁻¹ is incorrect and it leads to incorrect standard errors and unreliable inferences about population parameters.
3 Robust Estimation of Asymptotic Covariance Matrix
3
Fig. 2: Plot of the OLS Residuals against AGE
3
Robust Estimation of Asymptotic Covariance Matrix
The above discussions suggest that if we are to continue using OLS in the
presence of heteroskedasticity, then we should at least use the correct formula
for Var (b). Note that in the expression for Var (b), σ 2 and Ω are both unknown.
To estimate Var (b), we need to estimate the matrix σ 2 X0 ΩX. White (1980,
Econometrica) shows that under very general conditions, the matrix,
S_0 = (1/n) Σ_{i=1}^n e_i² x_i x_i′,   (4)

is a consistent estimator of

Σ = (1/n) σ²X′ΩX = (1/n) Σ_{i=1}^n σ_i² x_i x_i′,   (5)

where e_i is the OLS residual for observation i and x_i′ = (x_{i1}, x_{i2}, ..., x_{iK}).
Therefore, we can obtain a consistent estimator of Var (b), which is given by
Est.Asy.Var(b) = (X′X)⁻¹ [Σ_{i=1}^n e_i² x_i x_i′] (X′X)⁻¹.   (6)
This is usually called the White heteroskedastic-consistent/robust estimator of the covariance matrix of b. Note that in forming this estimator, we
don’t have to assume any specific form of heteroskedasticity. So it’s a very
useful result. The asymptotic properties of the estimator are unambiguous, but
its usefulness in small samples is open to question. Some Monte Carlo studies
suggest that in small samples the White estimator tends to underestimate the
variance matrix.
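As an illustration, the sandwich formula in (6) is easy to compute directly. The following is a minimal numpy sketch on simulated data; the data-generating process and all variable names are hypothetical, not taken from the example above.

```python
import numpy as np

def white_robust_cov(X, y):
    """OLS coefficients and White heteroskedasticity-robust covariance, per eq. (6)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y                 # OLS estimate
    e = y - X @ b                         # OLS residuals
    S = (X * e[:, None] ** 2).T @ X       # sum_i e_i^2 x_i x_i'
    V = XtX_inv @ S @ XtX_inv             # sandwich: (X'X)^-1 S (X'X)^-1
    return b, V

# Simulated heteroskedastic data (hypothetical example)
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(1, 5, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(0, x)      # disturbance s.d. grows with x_i
b, V = white_robust_cov(X, y)
se = np.sqrt(np.diag(V))                  # robust standard errors
```

Note that, exactly as in the text, no specific form of heteroskedasticity is assumed anywhere in the computation.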
Remark 1: With the White robust estimator of the covariance matrix, we can construct the t statistic as usual, which is called the heteroskedasticity-robust t
statistic. Note that this robust statistic follows a t distribution only asymptotically. In small samples, its sampling distribution is unknown.
Remark 2: We cannot use the F test for testing exact linear restrictions because
the distributional assumption of the F statistic requires homoskedasticity. But
we can use a Wald test. The statistic is

W = (Rb − q)′ {R [Est.Asy.Var(b)] R′}⁻¹ (Rb − q) ∼ χ²_J under H_0: Rβ = q.   (7)

That is, the statistic is asymptotically distributed as χ² with degrees of freedom
equal to the number of restrictions.
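Once a robust covariance estimate is in hand, the Wald statistic in (7) is simple to assemble. The sketch below uses numpy only; the coefficient vector and covariance matrix are hypothetical stand-ins for OLS output.

```python
import numpy as np

def wald_stat(b, V, R, q):
    """Wald statistic (7) for H0: R beta = q, with V a robust covariance estimate."""
    d = R @ b - q
    # asymptotically chi-squared with rows(R) = J degrees of freedom under H0
    return d @ np.linalg.inv(R @ V @ R.T) @ d

# Hypothetical example: jointly test beta_2 = 0 and beta_3 = 0 (J = 2)
b = np.array([1.2, 0.05, -0.03, 2.1])
V = np.diag([0.04, 0.01, 0.01, 0.09])           # stand-in for Est.Asy.Var(b)
R = np.array([[0., 1., 0., 0.],
              [0., 0., 1., 0.]])
q = np.zeros(2)
W = wald_stat(b, V, R, q)                        # compare with a chi2(2) critical value
```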
STATA Tips: In STATA, to obtain the White estimator, we simply add the
option “robust” to the “regress” command. For example,
reg y x1 x2 x3, robust
Then the output will report standard errors computed from the White estimator
of the covariance matrix of b.
4 Testing for Heteroskedasticity
Among others, three tests are common in practice to detect heteroskedasticity.
They are: (1) White's general test; (2) the Goldfeld-Quandt test; and (3) the Breusch-Pagan LM test. These tests are based on the following strategy. The OLS estimator
of β is consistent even in the presence of heteroskedasticity. Therefore, the OLS
residuals will mimic the heteroskedasticity of the true disturbances. Hence, tests
designed to detect heteroskedasticity are applied to the OLS residuals.
4.1 White's General Test
The hypotheses under examination are

H_0: σ_i² = σ² vs. H_1: not H_0.

Note that to conduct the White test, we do not have to assume any specific form
of heteroskedasticity.
The White test is motivated by the observation that if the model does not have
heteroskedasticity, then ε_i² should be uncorrelated with the regressors, their
squares, and their cross products. A simple operational version of the White test
is carried out by obtaining nR² from the auxiliary regression of e_i² on a constant
and all unique variables contained in x_i, together with all the squares and cross
products of the variables in x_i.
Example 3: Suppose we have four regressors: x_1, x_2, x_3, and a constant term.
Then the White test is carried out by first obtaining the residuals, e_i, from OLS of
the original model and then estimating an auxiliary regression of e_i² on a constant
and x_1, x_2, x_3, x_1², x_2², x_3², x_1 x_2, x_1 x_3, x_2 x_3. Finally, record the R² from the
auxiliary regression and construct the test statistic nR².
The test statistic, nR2 , is asymptotically distributed as chi-squared with
P − 1 degrees of freedom, where P is the number of regressors in the auxiliary
regression, including the constant.
nR² ∼ᵃ χ²_{P−1}.   (8)
Remark 3: The White test is very general in that it does not specify any particular form
of heteroskedasticity.
Remark 4: Due to its generality, the White test may simply pick up some other
specification error (such as the omission of x² from a simple regression) instead
of heteroskedasticity.
Remark 5: The power of the White test may be low in some cases.
Remark 6: The White test is nonconstructive: if we reject the null hypothesis,
the result of the test does not provide any guidance for the next step.
STATA Tips: To perform the White test in STATA, you can either manually construct the test statistic as in (8) or use the "whitetst" command following a
"regress" command on the original model. "whitetst" is not an official STATA
command and has to be downloaded. Type "findit whitetst" in STATA and
follow the link, and the command will be installed automatically.
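The nR² recipe can also be coded by hand. The sketch below is a minimal numpy implementation under the assumption that X holds the non-constant regressors; the data and names are hypothetical.

```python
import numpy as np

def white_test_stat(X, e):
    """White test statistic nR^2 from the auxiliary regression of e_i^2 on a
    constant, the regressors, their squares, and their cross products.
    X is n x K and is assumed to exclude the constant column."""
    n, K = X.shape
    cols = [np.ones(n)]
    for j in range(K):                       # levels
        cols.append(X[:, j])
    for j in range(K):                       # squares and cross products
        for k in range(j, K):
            cols.append(X[:, j] * X[:, k])
    Z = np.column_stack(cols)                # auxiliary regressor matrix, P columns
    y = e ** 2
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    fit = Z @ coef
    r2 = 1.0 - np.sum((y - fit) ** 2) / np.sum((y - y.mean()) ** 2)
    return n * r2                            # compare with chi2(P - 1)

# Hypothetical residuals whose variance depends on the first regressor
rng = np.random.default_rng(1)
n = 400
X = rng.normal(size=(n, 2))
e = rng.normal(size=n) * (1 + np.abs(X[:, 0]))
stat = white_test_stat(X, e)
```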
4.2 Goldfeld-Quandt Test
The Goldfeld-Quandt test assumes some particular form of heteroskedasticity. It
tests E(ε_i²) = σ² h(x_{ik}), e.g., σ² x_{ik}². This test is applicable if one of the x
variables is thought to cause the heteroskedasticity.
Steps:
1. Reorder observations by values of xk .
2. Omit c central observations and we are left with two samples of (n − c) /2
observations.
3. Let σ12 (σ22 ) be the error variance of the first (second) sample. Test
H0 : σ12 = σ22 vs. H1 : σ22 > σ12 .
4. Estimate the regression y = Xβ + ε in each sub-sample (which requires
that (n − c) /2 > K). Obtain e01 e1 and e02 e2 , where e1 and e2 are the
residual vectors from the two sub-samples respectively.
5. Form R = e02 e2 /e01 e1 .
It can be shown that under H_0,

R ∼ F(n∗, n∗),   (9)

where n∗ = (n − c − 2K)/2.
Remark 7: c can be zero. Introducing c is intended to increase the power of the
test. However, as c increases, (n − c)/2 decreases, which lowers the degrees of
freedom in the estimation on each sub-sample, and this tends to diminish the
power of the test. So there is a trade-off in choosing the appropriate c. Some
studies suggest that no more than a third of the observations should be dropped.
One choice is c ≃ n/3 − 2K.
Remark 8: The Goldfeld-Quandt statistic is exactly distributed as F under H_0 if the
disturbances are normally distributed. If not, the F distribution is only an
approximation.
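Steps 1-5 above translate directly into code. The following numpy sketch orders by x_k, drops the central observations, and forms the ratio of residual sums of squares; the simulated example is hypothetical.

```python
import numpy as np

def goldfeld_quandt(y, X, xk, c):
    """Goldfeld-Quandt statistic R = e2'e2 / e1'e1: sort by x_k, drop the
    c central observations, and fit OLS on each sub-sample separately."""
    order = np.argsort(xk)
    y, X = y[order], X[order]
    n = len(y)
    m = (n - c) // 2                    # observations in each sub-sample

    def rss(ys, Xs):                    # residual sum of squares from OLS
        b, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
        e = ys - Xs @ b
        return e @ e

    # under H0 (and normality), R ~ F(m - K, m - K)
    return rss(y[n - m:], X[n - m:]) / rss(y[:m], X[:m])

# Hypothetical example with disturbance variance rising in x
rng = np.random.default_rng(3)
n = 300
x = rng.uniform(1, 3, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 0.5 * x + rng.normal(0, x ** 2)   # s.d. proportional to x^2
R = goldfeld_quandt(y, X, x, c=60)
```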
4.3 Breusch-Pagan LM Test
The Goldfeld-Quandt test is reasonably powerful if we know or are able to identify
correctly the variable to use in separating the sample. This limits its generality. For
example, what if a set of regressors jointly determines the nature of heteroskedasticity? In this regard, the Breusch-Pagan LM test is more general. Assume

σ_i² = h(z_i′ α),

where h(·) is some function, α is a coefficient vector unrelated to β, and z_i is a
vector of variables causing heteroskedasticity, with the first element being 1.
Within this framework, if α2 = α3 = · · · = αP = 0, then σi2 = h (α1 ) = σ 2 ,
i.e., homoskedasticity. Therefore, we are to test
H0 : α2 = α3 = · · · = αP = 0 vs. H1 : not H0 .
Steps:
1. Regress y on X. Obtain OLS residual vector e.
2. Compute σ̂² = e′e/n and g_i = e_i²/σ̂² − 1.
3. Estimate, by OLS, an auxiliary regression

g_i = α_1 + α_2 z_{i2} + α_3 z_{i3} + ⋯ + α_P z_{iP} + v_i.   (10)
4. Compute the regression sum of squares (SSR),

SSR = Σ_{i=1}^n (ĝ_i − ḡ)², where ḡ = (1/n) Σ_{i=1}^n g_i.   (11)

Under H_0,

LM = SSR/2 ∼ᵃ χ²_{P−1}.   (12)
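The four steps can be sketched compactly in numpy; the simulated residuals below are hypothetical, with variance deliberately made to depend on z.

```python
import numpy as np

def breusch_pagan_lm(e, Z):
    """Breusch-Pagan LM statistic, following steps 1-4 above."""
    n = len(e)
    sig2 = e @ e / n                        # sigma-hat^2 = e'e/n
    g = e ** 2 / sig2 - 1.0                 # g_i = e_i^2 / sigma-hat^2 - 1
    Zc = np.column_stack([np.ones(n), Z])   # auxiliary regressors (1, z_i)
    a, *_ = np.linalg.lstsq(Zc, g, rcond=None)
    ghat = Zc @ a
    ssr = np.sum((ghat - g.mean()) ** 2)    # regression sum of squares, eq. (11)
    return ssr / 2.0                        # LM = SSR/2 ~ chi2(P-1) under H0

# Hypothetical example
rng = np.random.default_rng(4)
n = 400
z = rng.uniform(0, 1, n)
e = rng.normal(0, 1 + 2 * z)                # residual s.d. depends on z
lm = breusch_pagan_lm(e, z)
```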
STATA Tips: To perform the Breusch-Pagan LM test in STATA, you can use the
"hettest" or the "bpagan" command following the "regress" command on the
original model. "bpagan" is unofficial and thus needs to be downloaded. The
syntax is the following.

hettest var_list

where var_list specifies z_i without the 1. The same syntax applies to bpagan.
5 Generalized Least Squares Estimator

5.1 Weighted Least Squares when Ω Is Known
Suppose the variance matrix of ε is given by (2), where Ω is known. Without
loss of generality, we may write

σ_i² = σ² ω_i.   (13)

So,

Ω = diag(ω_1, ω_2, ..., ω_n).
Now consider a "weight" matrix, P, as follows:

P = diag(1/√ω_1, 1/√ω_2, ..., 1/√ω_n).   (14)

Hence, P′P = Ω⁻¹, and

Py = (y_1/√ω_1, y_2/√ω_2, ..., y_n/√ω_n)′ and PX has i-th row x_i′/√ω_i.   (15)
Regressing Py on PX using OLS gives the GLS estimator,

β̂ = (X′P′PX)⁻¹ X′P′Py = (X′Ω⁻¹X)⁻¹ X′Ω⁻¹y = [Σ_{i=1}^n w_i x_i x_i′]⁻¹ [Σ_{i=1}^n w_i x_i y_i],   (16)

where w_i = 1/ω_i. In this case, β̂ is also called the weighted least squares
(WLS) estimator.
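Equation (16) can be computed either from the weighted normal equations or, equivalently, by OLS on the transformed data (Py, PX). The numpy sketch below does both as a consistency check; the data-generating process is hypothetical.

```python
import numpy as np

def wls(y, X, omega):
    """Weighted least squares per eq. (16), with weights w_i = 1/omega_i."""
    w = 1.0 / omega
    XtWX = X.T @ (X * w[:, None])        # sum_i w_i x_i x_i'
    XtWy = X.T @ (w * y)                 # sum_i w_i x_i y_i
    return np.linalg.solve(XtWX, XtWy)

# Hypothetical example with Var(eps_i) = sigma^2 * x_i^2
rng = np.random.default_rng(2)
n = 200
x = rng.uniform(1, 4, n)
X = np.column_stack([np.ones(n), x])
omega = x ** 2
y = 0.5 + 1.5 * x + rng.normal(0, np.sqrt(omega))
beta_wls = wls(y, X, omega)

# Sanity check: identical to OLS on the P-transformed data (eq. (15))
sw = 1.0 / np.sqrt(omega)
beta_check, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
```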
A common specification is that the variance is proportional to one of the
regressors or its square. For example, if

σ_i² = σ² x_{ik}²   (17)

for some k, then the transformed regression model for GLS (or WLS) is

y/x_k = β_k + β_1 (x_1/x_k) + β_2 (x_2/x_k) + ⋯ + β_K (x_K/x_k) + ε/x_k.   (18)

If the variance is proportional to x_k instead of x_k², then the weight applied to
each observation is 1/√x_k instead of 1/x_k.
STATA Tips: In STATA, you can perform WLS either by manually transforming the data and then running OLS, or by using the "aweight" feature in the
"regress" command. The syntax is as follows.

regress y x1 x2 ... xk [aweight=var_name]

The weight to be used is 1/ω_i. For example, if σ_i² = σ² x_{ik}², then you should
first generate a variable, say w, which equals 1/x_{ik}², and then write [aweight=w]
in the "regress" command. If σ_i² = σ² x_{ik}, then w should be 1/x_{ik}.
5.2 Estimation when Ω Is Unknown
It’s rare that the form of Ω is known, so usually it has to be estimated. The
general form of the heteroskedastic regression model has too many parameters to
estimate. Typically, the model is restricted by formulating σ 2 Ω as a function of
a few parameters, α. Write this function as Ω (α). FGLS based on a consistent
estimator of Ω (α) is asymptotically equivalent to full GLS.
Recall that for the heteroskedastic model, the GLS estimator is

β̂ = [Σ_{i=1}^n (1/σ_i²) x_i x_i′]⁻¹ [Σ_{i=1}^n (1/σ_i²) x_i y_i].   (19)
Basically, we first need to obtain estimates of σ_i², say σ̂_i², usually using some
function of the OLS residuals. Then we can compute β̂ from (19) using σ̂_i².
Note that E(ε_i²) = σ_i², so

ε_i² = σ_i² + v_i,   (20)

where v_i is the difference between ε_i² and its expectation. Since ε_i is unobservable, we use the least squares residuals, for which

e_i = ε_i − x_i′(b − β) = ε_i + u_i.   (21)

Then

e_i² = ε_i² + u_i² + 2ε_i u_i.   (22)

However, we know that b is consistent, i.e., b →ᵖ β. Therefore, the terms in u_i
become negligible, and thus approximately we have

e_i² = σ_i² + v_i*.   (23)
The above reasoning leads to the following estimation strategy. If σ_i² =
h(z_i′ α), where z_i may or may not coincide with x_i, then we can obtain a consistent estimator of α by estimating

e_i² = h(z_i′ α) + v_i*.   (24)

Obtaining the fitted values of e_i², say ê_i², we can use them in place of σ_i² in (19) to
construct β̂, the feasible generalized least squares (FGLS) estimator. This
estimation method is called two-step estimation.
A common functional form for h(·) is exponential. Suppose we have the model

y_i = β_1 + β_2 x_{2i} + ⋯ + β_K x_{Ki} + ε_i, where ε_i ∼ (0, σ_i²).   (25)

We may write

σ_i² = exp(α_1 + α_2 z_{2i} + ⋯ + α_P z_{Pi}) v_i,   (26)

where v_i is uncorrelated with the z's and has an expectation of 1. Then

ln σ_i² = α_1 + α_2 z_{2i} + ⋯ + α_P z_{Pi} + v_i*.   (27)
In this case, the procedure for obtaining the FGLS estimator is the following.
1. Regress y on (1, x_2, ..., x_K) and obtain the residuals e_i.
2. Compute ln e_i² and use it as the dependent variable in model (27). Obtain the fitted value, \widehat{ln e_i²}.
3. Compute ĥ_i = exp(\widehat{ln e_i²}) and its reciprocal, w_i = 1/ĥ_i.
4. Use w_i as the weight to compute the weighted least squares estimator of β.
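The four steps above can be sketched as a single function; the multiplicative-heteroskedasticity simulation below is a hypothetical illustration.

```python
import numpy as np

def fgls_exponential(y, X, Z):
    """Two-step FGLS assuming sigma_i^2 = exp(z_i' alpha), per the steps above."""
    n = len(y)
    # Step 1: OLS residuals
    b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b_ols
    # Step 2: regress ln e_i^2 on (1, z_i) and keep the fitted values
    Zc = np.column_stack([np.ones(n), Z])
    a, *_ = np.linalg.lstsq(Zc, np.log(e ** 2), rcond=None)
    # Step 3: fitted variance functions h_i and weights w_i = 1/h_i
    h = np.exp(Zc @ a)
    w = 1.0 / h
    # Step 4: weighted least squares
    XtWX = X.T @ (X * w[:, None])
    XtWy = X.T @ (w * y)
    return np.linalg.solve(XtWX, XtWy)

# Hypothetical example: sigma_i^2 = exp(0.2 + 0.8 z_i)
rng = np.random.default_rng(5)
n = 500
z = rng.uniform(0, 2, n)
X = np.column_stack([np.ones(n), z])
sig = np.exp(0.2 + 0.8 * z) ** 0.5
y = 1.0 + 2.0 * z + rng.normal(0, sig)
beta_fgls = fgls_exponential(y, X, z)
```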