Econ107 Applied Econometrics
Topic 5: Specification: Choosing Independent Variables
(Studenmund, Chapter 6)
Specification errors that we will deal with: wrong independent variable; wrong
functional form. This lecture deals with wrong independent variables, which may
be due to i) omitted variables, ii) redundant variables (irrelevant variables).
Use the following example under both types:

ln Wi = β0 + β1 Si + β2 OJTi + εi

where Wi = wage rate of worker i,
Si = years of formal education of worker i,
OJTi = effective years of on-the-job training of worker i.

The idea is that we have two forms of human capital: general human capital obtained through formal education, and specific human capital obtained through vocational education, apprenticeship programmes, etc. Both may increase wages (i.e., β1 > 0 and β2 > 0), but not at the same rate (i.e., β1 ≠ β2).
I. Omitting a Relevant Variable
This is one of the most common problems in regression analysis. It may stem from the ignorance of the researcher (i.e., the variable is available but not used). More likely, the data are unavailable (e.g., in the Household Economic Survey).
Estimate the following model instead:

ln Wi = β0 + β1 Si + εi*

so that the true error in the above regression is

εi* = β2 OJTi + εi

Assumption 2 does not hold because E(εi*) = β2 OJTi ≠ 0. More importantly, in the case where OJT and S are correlated, it looks like Assumption 3 does not hold because Cov(εi*, Si) ≠ 0. As a result, the Gauss-Markov theorem does not apply. In general, the OLS estimate of the regression coefficient is biased, i.e.,

E(β̂1*) ≠ β1
And the bias is

bias(β̂1*) = E(β̂1*) − β1 = β2 b12

where

b12 = Cov(Si, OJTi) / Var(Si)

is the slope from the auxiliary regression of OJT on S. Suppose that b12 > 0; then, since β2 > 0,

E(β̂1*) > β1

and the estimated coefficient is biased upward.
The bias is zero when the coefficient of the omitted variable is zero, or when the included and omitted variables are uncorrelated.
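The bias formula can be checked with a small simulation. The data-generating process and coefficient values below (β1 = 0.10, β2 = 0.05, and the correlation structure) are hypothetical choices for illustration, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
beta0, beta1, beta2 = 1.0, 0.10, 0.05   # hypothetical 'true' coefficients

# S and OJT are positively correlated, so b12 = Cov(S, OJT)/Var(S) > 0
S = rng.normal(12, 2, n)
OJT = 0.5 * S + rng.normal(0, 1, n)
lnW = beta0 + beta1 * S + beta2 * OJT + rng.normal(0, 0.5, n)

# Misspecified 'short' regression: lnW on S only, omitting OJT
X_short = np.column_stack([np.ones(n), S])
b_short, *_ = np.linalg.lstsq(X_short, lnW, rcond=None)

# Auxiliary slope b12 from regressing OJT on S
b12 = np.cov(S, OJT)[0, 1] / np.var(S, ddof=1)

print(b_short[1])               # roughly beta1 + beta2*b12, i.e. biased upward
print(beta1 + beta2 * b12)
```

The estimated slope on S lands near β1 + β2 b12 rather than β1, which is exactly the omitted-variable bias derived above.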
In addition, the standard errors on these estimated coefficients will be biased. In
the misspecified model:
Var(β̂1*) = σ² / Σ si²

But the variance of the 'true' estimator is:

Var(β̂1) = σ² / [Σ si² (1 − r12²)]

where si = Si − S̄ and r12 is the correlation coefficient between S and OJT. This means that:

if r12² > 0, then Var(β̂1*) < Var(β̂1)

The variance of the estimated coefficient is also biased. We are placing 'too much' confidence in our coefficient estimates. The result is that the t test will be misleading (this is true even if r12 = 0, because our estimate of σ² will also be biased).
The remedial measure is easy IF we know which variable has been omitted and this omitted variable is available: include it in the model. If the omitted variable is not available, we might try to find a proxy variable that is closely related to the missing variable (e.g., use information on the average OJT of people in a particular industry and occupation). Or, at least, sign the direction of the bias and estimate its potential magnitude.
The above remedy works in theory. In practice, it is sometimes difficult to know whether a variable has been omitted. To detect the problem of omitting a relevant variable, one common practice is to examine the signs of the estimated coefficients and see whether they meet our expectations or economic theory. If not, it is very likely that relevant variables have been omitted. The next step is to use the direction of the bias to look for the relevant variables.
II. Including an Irrelevant Variable
Suppose the true model does not contain OJTi. This is consistent with some theoretical models that predict that this form of human capital will not affect wages, because employers, rather than workers, are more likely to pay for it. Thus, the correct regression model is:
ln Wi = β0 + β1 Si + εi

but we estimate:

ln Wi = β0 + β1 Si + β2 OJTi + εi**

The problems here are less severe than those from omitting a relevant variable. The true error in the above regression is

εi** = εi − β2 OJTi
If OJT is irrelevant, β2 should be zero and hence Assumption 2 holds. Assumption 3 holds too. What are the properties of the OLS estimates?
(i) The estimated coefficients are unbiased and consistent:

E(β̂1**) = β1
(ii) t test is valid if the correct standard error is used.
(iii) The only problem is that the estimated coefficients are inefficient.
Under the 'false' model:

Var(β̂1**) = σ² / [Σ si² (1 − r12²)]

Under the 'true' model:

Var(β̂1) = σ² / Σ si²

Since Var(β̂1) < Var(β̂1**) whenever r12² > 0, we are placing 'too little' confidence in our coefficient estimates (i.e., the standard error on the estimated coefficient is larger than it should be). This makes the t-ratio smaller than it should be, and makes it more likely that we will fail to reject the null when we should.
This is an easy one to solve in theory. If the variable shouldn’t be in the regression,
eliminate it from the outset. But in practice, this isn’t so easy. The theory in this
example says that both specifications might be right. If an independent variable
may be relevant, include it.
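The variance cost of an irrelevant regressor can be seen directly. In this sketch (hypothetical numbers; β2 = 0 in the true model), the variance of β̂1 in the over-specified model is inflated by the factor 1/(1 − r12²) relative to the true model:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
S = rng.normal(12, 2, n)
OJT = 0.5 * S + rng.normal(0, 1, n)   # correlated with S, but irrelevant for wages
lnW = 1.0 + 0.10 * S + rng.normal(0, 0.5, n)   # true model: beta2 = 0

def ols_var_beta1(X, y):
    # Classical OLS variance estimate for the coefficient on the second column
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    s2 = resid @ resid / (n - X.shape[1])
    return s2 * np.linalg.inv(X.T @ X)[1, 1]

X_true = np.column_stack([np.ones(n), S])           # correct specification
X_false = np.column_stack([np.ones(n), S, OJT])     # includes irrelevant OJT

r12 = np.corrcoef(S, OJT)[0, 1]
ratio = ols_var_beta1(X_false, lnW) / ols_var_beta1(X_true, lnW)
print(ratio, 1 / (1 - r12**2))   # the two should be close
```

The ratio of the two variances matches 1/(1 − r12²): the stronger the correlation between the included and the irrelevant variable, the larger the efficiency loss.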
III. How to Decide Whether to Include a Variable or Not?
1. Graphic method to detect the problem of omitting a relevant variable
Plot the residuals and look for a 'distinct pattern'. Take the earlier example on the functional form of the regression. We estimate:

ln Wi = β0 + β1 Si + εi*

but the 'true' model is:

ln Wi = β0 + β1 Si + β2 Si² + ui

so that

εi* = β2 Si² + ui
A plot of the residuals against Si would produce a 'detectable' pattern (i.e., curved
downward).
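Rather than eyeballing a plot, the same pattern can be detected numerically. In this sketch (hypothetical quadratic data with β2 < 0), the residuals from the misspecified linear fit are strongly correlated with the part of Si² that the linear model cannot absorb:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
S = rng.uniform(0, 16, n)
# Hypothetical 'true' quadratic model with b2 < 0 (diminishing returns to schooling)
lnW = 0.5 + 0.20 * S - 0.006 * S**2 + rng.normal(0, 0.1, n)

# Fit the misspecified linear model: lnW on a constant and S only
X = np.column_stack([np.ones(n), S])
b, *_ = np.linalg.lstsq(X, lnW, rcond=None)
resid = lnW - X @ b

# Residuals are orthogonal to S by construction, but they inherit the omitted
# b2*S^2 term: correlate them with S^2 purged of its linear part in S
S2_perp = S**2 - np.polyval(np.polyfit(S, S**2, 1), S)
corr = np.corrcoef(resid, S2_perp)[0, 1]
print(corr)   # strongly negative: the residuals curve downward in S
```

A clearly nonzero correlation here is the numerical counterpart of the curved residual pattern described above; random scatter would give a correlation near zero.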
2. Four criteria

- Economic theory: is there any sound theory?
- t statistic: is the coefficient significant in the correct direction?
- Has R̄² improved?
- Do other coefficients change sign when the variable is included?

Include the variable if the answers are positive. Do not necessarily drop insignificant variables: an insignificant finding can be an important result.
Example:

Ĉoffee = 9.1 + 7.8 Pbc + 2.4 Pt − 0.0035 Yd
se:           (15.6)     (1.2)    (0.001)
t:             0.5        2.0     −3.5
n = 25,  R̄² = 0.60

where Coffee = demand for Brazilian coffee in the US,
Pbc = price of Brazilian coffee,
Pt = price of tea,
Yd = disposable income in the US.

What happens if you drop Pbc?

Ĉoffee = 9.3 + 2.6 Pt − 0.0036 Yd
se:           (1.0)    (0.0009)
t:             2.6     −4.0
n = 25,  R̄² = 0.61

What happens if you add another variable, the price of Colombian coffee, Pcc?

Ĉoffee = 10 + 8.0 Pcc − 5.6 Pbc + 2.6 Pt − 0.0030 Yd
se:          (4.0)      (2.0)      (1.3)    (0.001)
t:            2.0       −2.8       2.0     −3.0
n = 25,  R̄² = 0.65
3. Three incorrect techniques for choosing variables
1) Data mining: simultaneously try a whole series of possible regression formulations and then choose the equation that conforms most closely to what the researcher wants the results to look like. Doing econometrics = making sausages.
2) Stepwise regression technique: a systematic way of selecting variables based on R̄². The computer program is given a "shopping list" of possible independent variables, and then builds the equation in steps. At each step it adds to the regression model the variable that increases R̄² the most. Problem: the candidate independent variables could be correlated with one another, so this procedure can easily settle on the wrong specification.
3) Sequential specification search: add and drop variables sequentially (i.e., estimate an undisclosed number of regressions) but present only the final choice, as if it were the only specification estimated. Every time you test a model, there is a probability of a Type I error. If you estimate and test too many models, the Type I errors accumulate.
IV. Lagged Independent Variables
Consider the following regressions:
Yt = β0 + β1 X1t + β2 X2t + εt            (1)
Yt = β0 + β1 X1,t−1 + β2 X2t + εt         (2)

where t = 1, …, n. That is, we have a sample of n time-series observations. Note the change of notation from i to t to emphasise time-series data.

In equation (1), the effect of X1 on Y is instantaneous. In equation (2), the effect is felt one period later. As long as X1 is exogenous (not influenced by Y), the lagged structure of the equation poses no problem. Of course, the interpretation of the slope coefficient is different.
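Estimating equation (2) just requires aligning the series so the regressor is lagged one period; lagging costs one observation. A minimal sketch with made-up coefficients (intercept 2.0, slope 0.8, one exogenous regressor for simplicity):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
X1 = rng.normal(size=n)            # exogenous driver
eps = rng.normal(0, 0.1, n)

# Hypothetical data-generating process: Y_t responds to X1 one period later
Y = np.empty(n)
Y[0] = 0.0
Y[1:] = 2.0 + 0.8 * X1[:-1] + eps[1:]

# Align the series: regress Y_t on X1_{t-1}, losing the first observation
X = np.column_stack([np.ones(n - 1), X1[:-1]])
b, *_ = np.linalg.lstsq(X, Y[1:], rcond=None)
print(b)   # close to [2.0, 0.8]
```

Because X1 here is exogenous by construction, OLS on the lag-aligned data recovers the coefficients; the slope now measures the effect of last period's X1 on this period's Y.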
V. Akaike’s Information Criterion and Schwarz Criterion
In general, the more variables included in the regression, the smaller the RSS will be. But if a variable contributes only marginally to the reduction of the RSS, it should not be included. The AIC and SC (also known as the BIC) measure the RSS with a penalty for additional parameters. For regression models with K independent variables they are defined as:

AIC = ln(RSS/n) + 2(K+1)/n
SC = ln(RSS/n) + ln(n)(K+1)/n

You select the model that minimises the AIC or SC. These are called model selection criteria. Note that R̄² is also a model selection criterion: you choose the model that maximises R̄². Compared with the AIC or SC, R̄² tends to select a model with irrelevant variables.
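The two criteria are easy to compute from any fitted regression. A sketch with hypothetical data (the helper function names and the coefficient values are mine), comparing a model with and without an irrelevant regressor:

```python
import numpy as np

def aic_sc(rss, n, K):
    """AIC and SC as defined above; K = number of independent variables."""
    aic = np.log(rss / n) + 2 * (K + 1) / n
    sc = np.log(rss / n) + np.log(n) * (K + 1) / n
    return aic, sc

rng = np.random.default_rng(4)
n = 100
S = rng.normal(12, 2, n)
junk = rng.normal(size=n)                   # irrelevant variable
y = 1.0 + 0.10 * S + rng.normal(0, 0.5, n)  # hypothetical true model: S only

def rss(X):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ b
    return r @ r

X1 = np.column_stack([np.ones(n), S])         # K = 1
X2 = np.column_stack([np.ones(n), S, junk])   # K = 2

print(aic_sc(rss(X1), n, 1))
print(aic_sc(rss(X2), n, 2))   # SC penalises the extra parameter more (ln n > 2)
```

Adding the irrelevant regressor always lowers the RSS a little, but the penalty term usually more than offsets this, so the criteria favour the smaller model; the SC penalty, ln(n)(K+1)/n, grows with n and is the stricter of the two for n ≥ 8.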
VI. Questions for Discussion: Q6.3, Q6.9
VII. Computing Exercise: Q6.5 (Johnson, Ch 6), Q6.15, Johnson Ch 6: AIC