Principles of Econometrics – class of October 14th
Notes by José Mário Lopes¹
FEUNL
1. Heteroskedasticity in a cross-section framework (examples from chapter 8)
Heteroskedasticity happens whenever $\mathrm{Var}(u_i \mid x_1, x_2, \dots)$ is not constant across observations.
Last class, you saw how to robustify your standard errors when you suspect you are in the presence of heteroskedasticity. In the multiple regression model
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u,$$
you would have to compute
$$\widehat{\mathrm{Var}}(\hat\beta_j) = \frac{\sum_{i=1}^{n} \hat r_{ij}^{\,2}\, \hat u_i^{\,2}}{SSR_j^{\,2}},$$
where $\hat r_{ij}$ denotes the ith residual from regressing $x_j$ on all the other independent variables and $SSR_j$ is the sum of squared residuals from that regression (see section 8.2).
In EViews, this can be done by choosing the White standard errors in the equation's "Options" menu.
¹ If you find any typo in these notes, please e-mail me so I can correct it.
Robust standard errors and t statistics are appropriate as the sample size increases. We don't always use these robust standard errors because, in small samples, the robust t statistics can depart a lot from the t distribution.
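If you want to reproduce the robust standard errors outside EViews, here is a minimal sketch in Python with statsmodels; the file name smoke.csv and the column names are assumptions standing in for the book's cigarette-demand workfile:

import pandas as pd
import statsmodels.api as sm

# Hypothetical file and column names standing in for the book's SMOKE workfile
df = pd.read_csv("smoke.csv")
df["agesq"] = df["age"] ** 2
X = sm.add_constant(df[["lincome", "lcigpric", "educ", "age", "agesq", "restaurn"]])
y = df["cigs"]

usual = sm.OLS(y, X).fit()                  # usual (non-robust) standard errors
robust = sm.OLS(y, X).fit(cov_type="HC0")   # White heteroskedasticity-robust SEs
print(robust.summary())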
Hence, it is important to know whether or not there is heteroskedasticity in our sample. Let's work through a few examples from the book. Take the example on the demand for cigarettes, from chapter 8. Open the corresponding workfile. We wish to estimate the demand for cigarettes, measured by the number of cigarettes smoked per day, as a function of income, the price of a pack of cigarettes, education, age, squared age and the presence of a restaurant smoking ban in the state where the surveyed person lives.
We get the following results:
- neither income nor cigarette price is significant, and their impacts would be small anyway (e.g., if income increases by 10%, cigs increases by (0.880/100)*10 = 0.088 cigarettes per day);
- education reduces smoking;
- smoking increases with age up until approx. 42.83 years (basically, maximize cigarettes smoked with respect to age: differentiate the part involving age and age squared and set it equal to zero, as spelled out below). After that, it falls.
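Spelling out that turning-point calculation, with the age coefficients (0.771 and -0.0090) taken from the book's OLS estimates:
$$\frac{\partial\, \widehat{cigs}}{\partial\, age} = \hat\beta_{age} + 2\,\hat\beta_{age^2}\, age = 0 \quad\Longrightarrow\quad age^* = -\frac{\hat\beta_{age}}{2\,\hat\beta_{age^2}} = \frac{0.771}{2(0.0090)} \approx 42.83.$$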
But now, a very important question: is there heteroskedasticity? If so, the usual standard errors and t statistics will be wrong and OLS will not be efficient. We will perform just a couple of tests to check for heteroskedasticity. See the other tests available in EViews.
First, let's run the Breusch-Pagan test for heteroskedasticity:
1) Estimate the model by OLS and keep the squared OLS residuals.
2) Run an auxiliary regression of the squared OLS residuals on the independent variables. Keep the R-squared from this regression.
3) Form either the F statistic (which follows an F(k, n-k-1) under the null) or the LM statistic, LM = n·R² (which follows a chi-square with k degrees of freedom). If the p-value is greater than 5%, we do not reject the null of homoskedasticity.
In EViews, this is very easy to do.
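For reference, the same test outside EViews, as a minimal Python sketch (file and column names are again assumptions):

import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

df = pd.read_csv("smoke.csv")            # hypothetical file name
df["agesq"] = df["age"] ** 2
X = sm.add_constant(df[["lincome", "lcigpric", "educ", "age", "agesq", "restaurn"]])
res = sm.OLS(df["cigs"], X).fit()

# het_breuschpagan runs the auxiliary regression of the squared residuals on X
# and returns (LM statistic, LM p-value, F statistic, F p-value)
lm, lm_pval, f, f_pval = het_breuschpagan(res.resid, X)
print(f"LM = {lm:.2f} (p = {lm_pval:.4f}), F = {f:.2f} (p = {f_pval:.4f})")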
Behold how many options you have for running a heteroskedasticity test!
For a BP test, both the F test and the LM test (Obs*R-squared of the auxiliary regression) conclude for the rejection of the null of homoskedasticity.
You should check that EViews is doing this right. How? Generate the residuals yourself
and perform the regression as usual (New Object/Equation, etc.). You will get the same
output as above.
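A sketch of that manual check in Python, under the same naming assumptions:

import pandas as pd
import statsmodels.api as sm
from scipy import stats

df = pd.read_csv("smoke.csv")            # hypothetical file name
df["agesq"] = df["age"] ** 2
X = sm.add_constant(df[["lincome", "lcigpric", "educ", "age", "agesq", "restaurn"]])
res = sm.OLS(df["cigs"], X).fit()

# auxiliary regression of the squared residuals on the regressors
aux = sm.OLS(res.resid ** 2, X).fit()
n = len(df)
k = X.shape[1] - 1                       # number of slope coefficients
lm = n * aux.rsquared                    # LM = n*R^2, chi-square(k) under the null
print("F =", aux.fvalue, "p =", aux.f_pvalue)
print("LM =", lm, "p =", stats.chi2.sf(lm, k))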
The White test for heteroskedasticity takes into account the possibility that the variance structure is richer: the squares and cross-products of the independent variables are also included on the right-hand side. Alternatively, whenever you have too many independent variables, you can use the fitted values of the dependent variable and their squares (that is, regress the squared residuals on ŷ and ŷ² only).
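A sketch of both versions in Python (same naming assumptions). Note that statsmodels' het_white builds the squares and cross-products itself; since age and age² already generate collinear terms, its degrees of freedom may not match the EViews output exactly:

import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

df = pd.read_csv("smoke.csv")            # hypothetical file name
df["agesq"] = df["age"] ** 2
X = sm.add_constant(df[["lincome", "lcigpric", "educ", "age", "agesq", "restaurn"]])
res = sm.OLS(df["cigs"], X).fit()

# full White test: levels, squares and cross-products of the regressors
lm, lm_pval, f, f_pval = het_white(res.resid, X)
print(f"White LM = {lm:.2f} (p = {lm_pval:.4f})")

# special case for many regressors: regress u^2 on yhat and yhat^2 only
Z = sm.add_constant(pd.DataFrame({"yhat": res.fittedvalues,
                                  "yhatsq": res.fittedvalues ** 2}))
aux = sm.OLS(res.resid ** 2, Z).fit()
print(f"Special-case F = {aux.fvalue:.2f} (p = {aux.f_pvalue:.4f})")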
In our case, you get
Heteroskedasticity Test: White

F-statistic             2.159258    Prob. F(25,781)         0.0009
Obs*R-squared           52.17245    Prob. Chi-Square(25)    0.0011
Scaled explained SS     110.0813    Prob. Chi-Square(25)    0.0000
This means that the null of homoskedasticity is rejected.
From this point, we can correct the standard errors using the White robust standard
errors.
Or, we can transform the model and run OLS on this transformed model. How?
Feasible Generalized Least Squares procedure:
- generate the squared OLS residuals (the residuals from the model $y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u$);
- regress the log of the squared residuals on the independent variables (why the log?) and obtain the fitted values of this regression, $\hat g$;
- exponentiate the fitted values to get $\hat h = \exp(\hat g)$;
- estimate the equation $y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u$ by WLS, using $1/\hat h$ as weights.
Since we have to estimate $\hat h$, FGLS will not be unbiased, but it is consistent and asymptotically more efficient than OLS.
If cigs_residsq stands for the estimated $\hat h$, we have to divide the model through by $\sqrt{\hat h}$, i.e., weight each observation by $1/\sqrt{\hat h}$. Why? See the book (there is a univariate example there for savings; start from there).
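The whole FGLS procedure, as a minimal Python sketch under the same naming assumptions:

import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("smoke.csv")            # hypothetical file name
df["agesq"] = df["age"] ** 2
X = sm.add_constant(df[["lincome", "lcigpric", "educ", "age", "agesq", "restaurn"]])
y = df["cigs"]

ols = sm.OLS(y, X).fit()

# regress log(u^2) on the regressors, then exponentiate the fitted values
g = sm.OLS(np.log(ols.resid ** 2), X).fit().fittedvalues
h_hat = np.exp(g)

# WLS with weights 1/h_hat = OLS on the equation divided through by sqrt(h_hat)
fgls = sm.WLS(y, X, weights=1.0 / h_hat).fit()
print(fgls.summary())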
We will get
Dependent Variable: CIGS/SQR(CIGS_RESIDSQF)
Method: Least Squares
Date: 10/12/09   Time: 17:06
Sample: 1 807
Included observations: 807

Variable                            Coefficient   Std. Error   t-Statistic    Prob.
1/SQR(CIGS_RESIDSQF)                   5.635471     17.80314      0.316544   0.7517
LOG(INCOME)/SQR(CIGS_RESIDSQF)         1.295239     0.437012      2.963855   0.0031
LOG(CIGPRIC)/SQR(CIGS_RESIDSQF)       -2.940314     4.460145     -0.659242   0.5099
EDUC/SQR(CIGS_RESIDSQF)               -0.463446     0.120159     -3.856953   0.0001
AGE/SQR(CIGS_RESIDSQF)                 0.481948     0.096808      4.978378   0.0000
AGE^2/SQR(CIGS_RESIDSQF)              -0.005627     0.000939     -5.989706   0.0000
RESTAURN/SQR(CIGS_RESIDSQF)           -3.461064     0.795505     -4.350776   0.0000

R-squared             0.002751    Mean dependent var      0.966192
Adjusted R-squared   -0.004728    S.D. dependent var      1.574979
S.E. of regression    1.578698    Akaike info criterion   3.759715
Sum squared resid     1993.831    Schwarz criterion       3.800425
Log likelihood       -1510.045    Hannan-Quinn criter.    3.775347
Durbin-Watson stat    2.049719
We could also use the "Options" menu / WLS and write down the appropriate weighting scheme.
2. A little bit on time series – just a few issues (examples from chapters 10 to 12)
2.1 Take the workfile about housing investment and prices.
There are a lot of interesting things you can do now. You can take a series and study its evolution over time. Take the housing price index, for instance.
You can actually see several graphs at the same time if you select a Group of variables.
Let’s estimate a simple model, now.
The log of the price seems to be significant. You may think this is OK, but it is not: both variables are trending throughout the sample. If you take a look at the residuals, you can see whether what you're doing makes sense or not.
They are not stationary (there are formal tests for this, namely unit root tests like the Dickey-Fuller or Phillips-Perron tests, and you can always look at the correlogram of the residuals). This means we should rethink our specification. Our previous regression was spurious.
We now add a linear trend to take account of the trending behaviour of LINVPC.
LPRICE is no longer significant. We conclude that there are other factors beyond the price, captured by the linear trend, that seem to be important.² Notice that these other factors are not modelled just by adding a linear trend. Moreover, the fact that a linear trend appears to be informative shouldn't prompt you to get carried away and start obsessively adding a huge train of trend terms (linear, quadratic, ...).
What we just did (adding a linear trend) has a detrending interpretation: it is equivalent to regressing all variables on a constant and a linear trend, saving the residuals, and then regressing the residuals of the dependent-variable regression on the residuals of the independent-variable regressions (see book).
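You can verify this equivalence directly; a minimal Python sketch, assuming a file hseinv.csv with columns linvpc and lprice (hypothetical names for the housing workfile):

import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("hseinv.csv")           # hypothetical file name
trend = np.arange(len(df), dtype=float)
T = sm.add_constant(trend)               # constant plus linear trend

# (i) regression with the trend included directly
X = sm.add_constant(pd.DataFrame({"lprice": df["lprice"], "t": trend}))
direct = sm.OLS(df["linvpc"], X).fit()

# (ii) detrend both variables, then regress residuals on residuals
y_dt = sm.OLS(df["linvpc"], T).fit().resid
x_dt = sm.OLS(df["lprice"], T).fit().resid
fw = sm.OLS(y_dt, x_dt).fit()            # residuals have mean zero: no constant

print(direct.params["lprice"], fw.params.iloc[0])   # the two slopes coincide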
2.2 Important assumptions and problems in a Time Series framework:
The Gauss-Markov theorem requires both homoskedasticity and absence of
serially correlated errors. Otherwise, the OLS estimator will not be BLUE and the
usual standard errors and t-statistics will no longer be valid.
How do we test for the presence of serial correlation?
Let’s see a few possibilities available in EViews.
You can take a look at the Durbin-Watson statistic, which appears at the bottom of the results.
Dependent Variable: LINVPC
Method: Least Squares
Date: 11/26/03   Time: 08:47
Sample: 1 42
Included observations: 42

Variable     Coefficient   Std. Error   t-Statistic    Prob.
C              -0.913060     0.135613     -6.732815   0.0000
LPRICE         -0.380961     0.678835     -0.561198   0.5779
T               0.009829     0.003512      2.798445   0.0079

R-squared             0.340765    Mean dependent var     -0.666155
Adjusted R-squared    0.306959    S.D. dependent var      0.172543
S.E. of regression    0.143641    Akaike info criterion  -0.974252
Sum squared resid     0.804675    Schwarz criterion      -0.850133
Log likelihood        23.45930    F-statistic             10.07976
Durbin-Watson stat    1.048727    Prob(F-statistic)       0.000296
The Durbin-Watson test, valid under the classical assumptions, is based on the OLS residuals, and one can show that DW is approximately $2(1-\hat\rho)$, where $\hat\rho$ is the first-order correlation coefficient between the residuals at t and the residuals at t-1. If the Durbin-Watson statistic is near 2, the correlation coefficient will be near 0. Hence, we are looking for a value significantly below 2 (for a positive correlation coefficient) or significantly above 2 (for a negative correlation coefficient). Imagine you were testing whether $\rho$ is close to zero (DW close to 2) against the alternative hypothesis that $\rho$ is bigger than zero (DW smaller than 2). There are two critical values, dL and dU, tabulated by Savin and White (1977), which depend on the number of observations and the number of regressors. This means that, if DW falls between dL and dU, the test is inconclusive.

² You should always test the residuals to see if they're well-behaved. In this case, they are still nonstationary. In practical work, you should keep looking for a correct specification.
After you estimate your model, you have a Serial correlation menu under Residual
Tests.
This is the Breusch-Godfrey test. The null is absence of autocorrelation. Here, we reject this null: there is evidence of autocorrelation. Basically, we keep the residuals of the regression and regress $\hat u_t$ on $\hat u_{t-1}, \hat u_{t-2}, \dots$ and the regressors. If the F statistic rejects the joint insignificance of the lagged residuals, we conclude that there is autocorrelation.³
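A minimal Python sketch of the Breusch-Godfrey test (same hypothetical housing-file names; the lag order of 2 is an arbitrary choice):

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

df = pd.read_csv("hseinv.csv")           # hypothetical file name
X = sm.add_constant(pd.DataFrame({"lprice": df["lprice"],
                                  "t": np.arange(len(df), dtype=float)}))
res = sm.OLS(df["linvpc"], X).fit()

# regresses the OLS residuals on their own lags plus the original regressors;
# returns (LM statistic, LM p-value, F statistic, F p-value)
lm, lm_pval, f, f_pval = acorr_breusch_godfrey(res, nlags=2)
print(f"LM = {lm:.2f} (p = {lm_pval:.4f}), F = {f:.2f} (p = {f_pval:.4f})")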
Once you find out that there is first-order serial correlation, you can transform the model to take this into account:
1 – estimate the original model and take the estimated residuals;
2 – run the regression of $\hat u_t$ on $\hat u_{t-1}$ to compute the correlation coefficient $\hat\rho$;
3 – for every variable $x_t$ (and for the dependent variable), compute the quasi-differenced variable $x_t - \hat\rho\, x_{t-1}$;
4 – apply OLS to the equation with the quasi-differenced variables. The usual standard errors, t statistics and F are asymptotically valid.⁴ (A sketch of this procedure follows the footnotes below.)
³ The regressors appear because we are assuming away strict exogeneity of the regressors. If we had strict exogeneity, we would only need to regress the residuals on their lagged values; the regressors wouldn't be needed. See the book on this.
⁴ This is known as the Cochrane-Orcutt estimation, which omits the first observation. If you transform the first equation so as to include the first observation in the regression, you call this the Prais-Winsten estimation.
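A minimal Python sketch of these four steps, under the same hypothetical file names:

import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("hseinv.csv")           # hypothetical file name
y = df["linvpc"].to_numpy()
X = sm.add_constant(pd.DataFrame({"lprice": df["lprice"],
                                  "t": np.arange(len(df), dtype=float)})).to_numpy()

# steps 1-2: OLS residuals, then rho from regressing u_t on u_(t-1)
u = sm.OLS(y, X).fit().resid
rho = sm.OLS(u[1:], u[:-1]).fit().params[0]

# step 3: quasi-difference every variable (the constant column becomes 1 - rho)
y_star = y[1:] - rho * y[:-1]
X_star = X[1:] - rho * X[:-1]

# step 4: OLS on the transformed equation (Cochrane-Orcutt, first obs dropped)
co = sm.OLS(y_star, X_star).fit()
print("rho =", rho)
print(co.params)

statsmodels also provides sm.GLSAR, which iterates this same procedure to convergence.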
Alternatively, you can estimate the model as usual but correct the standard errors at the end. This may be better than simple FGLS. Just pick the option "Newey-West": you will be correcting for both heteroskedasticity and autocorrelation.
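In Python, the analogue is the HAC covariance option; a minimal sketch (maxlags = 3 is an arbitrary choice):

import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("hseinv.csv")           # hypothetical file name
X = sm.add_constant(pd.DataFrame({"lprice": df["lprice"],
                                  "t": np.arange(len(df), dtype=float)}))

# Newey-West (HAC) standard errors: robust to heteroskedasticity and to
# autocorrelation up to the chosen lag length
res = sm.OLS(df["linvpc"], X).fit(cov_type="HAC", cov_kwds={"maxlags": 3})
print(res.summary())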
Notice that you can always test for heteroskedasticity as in the cross-section case (just check the options under "Heteroskedasticity tests"; they are the same ones as before). However, for these tests to be valid, the errors should not be autocorrelated; also, for the F statistic of the Breusch-Pagan test to be valid, the residuals of the auxiliary regression should themselves be serially uncorrelated and homoskedastic.
2.3 Famous time series processes
Open the programs that generate an AR(1) and an MA(1).
AR(1)
We can write it as $y_t = \rho\, y_{t-1} + \varepsilon_t$, where the error is white noise (constant variance and mean zero). If $\rho = 1$, you have what is known as a random walk. It is a typically nonstationary, highly persistent process. We say that this process has a stochastic trend, as opposed to a deterministic trend, which appears whenever a linear trend directly establishes a trend in the variable.
Compare the highly persistent random walk above to a stationary, mildly persistent AR(1):
If you add a constant to the random walk, you get the random walk with drift:
$$y_t = a + y_{t-1} + \varepsilon_t$$
Notice how the drift $a$ defines a linear trend behaviour in the series!
MA(1)
Here, we model the residual part and make it richer than it was before:
$$u_t = \varepsilon_t + \theta\, \varepsilon_{t-1}$$
The graph comes as follows.
It is clearly stationary (once again, this can be tested through the so-called unit root
tests). Actually, a pure MA process is always stationary.
You can create other processes yourself, e.g., combining AR and MA parts to get ARMA models; a simulation sketch follows.
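If you want to generate these processes outside EViews, here is a minimal numpy sketch (the values of rho, theta, the drift and the seed are arbitrary choices):

import numpy as np

rng = np.random.default_rng(0)           # arbitrary seed, for reproducibility
T = 200
eps = rng.standard_normal(T)             # white noise

# stationary AR(1): y_t = rho*y_(t-1) + eps_t with |rho| < 1
rho = 0.5
ar1 = np.zeros(T)
for t in range(1, T):
    ar1[t] = rho * ar1[t - 1] + eps[t]

# random walk (rho = 1) and random walk with drift a
rw = np.cumsum(eps)
a = 0.2
rw_drift = np.cumsum(a + eps)

# MA(1): u_t = eps_t + theta*eps_(t-1) -- always stationary
theta = 0.8
ma1 = eps.copy()
ma1[1:] += theta * eps[:-1]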