
CHOICE OF EXPLANATORY VARIABLES
VARIABLE SPECIFICATION ERROR
The classical linear regression model assumes that all important variables have been included and
all unimportant variables have been excluded. If we omit an important variable or include an
unimportant variable, then the model is misspecified. If the model is misspecified, then it may not
be a reasonable approximation of the data generation process. There are two possible types of
variable specification errors.
1. Omission of an important variable.
2. Inclusion of an unimportant variable.
OMISSION OF AN IMPORTANT VARIABLE
If the omitted variable has an effect on the dependent variable and is correlated with any of the
included explanatory variables, then it is a confounding variable. In this situation, the OLS
estimator for the slope coefficients will be biased and inconsistent. This is called omitted variable
bias.
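The effect can be illustrated with a small simulation. The sketch below is only an illustration on made-up data: the variable names (x1, x2), the coefficient values, the correlation rho, and the helper functions ols and simulate_ovb are all assumptions introduced here, not part of the notes. It fits the model once with the confounding variable x2 included and once with x2 omitted.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients for y = X b + e (X already contains a constant)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def simulate_ovb(n=5000, rho=0.6, beta1=2.0, beta2=3.0, seed=0):
    """Simulate data with a confounder x2 and return the slope on x1
    from the full model and from the model that omits x2."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    # x2 is correlated with x1 (correlation rho) and also affects y
    x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)
    y = 1.0 + beta1 * x1 + beta2 * x2 + rng.normal(size=n)
    const = np.ones(n)
    full = ols(np.column_stack([const, x1, x2]), y)   # x2 included
    short = ols(np.column_stack([const, x1]), y)      # x2 omitted
    return full[1], short[1]

b1_full, b1_short = simulate_ovb()
print(f"slope on x1 with x2 included: {b1_full:.3f}")
print(f"slope on x1 with x2 omitted:  {b1_short:.3f}")
```

With these assumed values, the slope from the full model should be close to the true value 2, while the slope from the short model should be close to 2 + 3(0.6) = 3.8, because x1 picks up part of the effect of the omitted x2. Increasing n does not remove the gap, which is what inconsistency means here.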
INCLUSION OF AN UNIMPORTANT VARIABLE
If the unimportant included variable is correlated with any other included explanatory variables,
then the variances and standard errors of the OLS estimator for the regression coefficients will
increase. In this situation, the OLS estimator will be less precise. The probability of a Type II
error increases, and therefore it is more difficult to detect the effect of X on Y if one exists.
Formula for Precision of OLS Estimator
The formula for the variance of the OLS estimator $\hat{\beta}_i$ is

$$\operatorname{Var}(\hat{\beta}_i) = \frac{\sigma^2}{\sum_t (X_{it} - \bar{X}_i)^2 \, (1 - R_i^2)}$$

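Here $R_i^2$ is the R-squared from a regression of $X_i$ on the other explanatory variables in the
model. Including an unimportant variable that is correlated with $X_i$ increases $R_i^2$, and
therefore increases $\operatorname{Var}(\hat{\beta}_i)$ and decreases the precision of the estimate.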
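The variance formula can be checked numerically. The following sketch rests on assumptions made for illustration: the data are simulated, the error variance sigma2 is treated as known, and the function name var_beta_formula and the variables x1 and x_irrelevant are hypothetical. It computes the variance of the coefficient on x1 with and without a correlated, unimportant regressor, and compares the formula with the matrix expression sigma^2 (X'X)^-1.

```python
import numpy as np

def var_beta_formula(X, i, sigma2):
    """Var(beta_i^) = sigma^2 / [ sum_t (X_it - Xbar_i)^2 * (1 - R_i^2) ].

    X holds the explanatory variables only (no constant column).
    R_i^2 comes from regressing column i on a constant and the other columns.
    Returns the variance and R_i^2.
    """
    n = X.shape[0]
    xi = X[:, i]
    others = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
    fitted = others @ np.linalg.lstsq(others, xi, rcond=None)[0]
    sst = np.sum((xi - xi.mean()) ** 2)
    r2_i = 1.0 - np.sum((xi - fitted) ** 2) / sst
    return sigma2 / (sst * (1.0 - r2_i)), r2_i

rng = np.random.default_rng(1)
n, sigma2 = 200, 1.0                      # sigma2 assumed known for the illustration
x1 = rng.normal(size=n)
# An unimportant regressor that is strongly correlated with x1
x_irrelevant = 0.9 * x1 + np.sqrt(1 - 0.9**2) * rng.normal(size=n)

var_alone, r2_alone = var_beta_formula(x1[:, None], 0, sigma2)
var_with, r2_with = var_beta_formula(np.column_stack([x1, x_irrelevant]), 0, sigma2)

# Cross-check against the matrix expression sigma^2 (X'X)^{-1}
Z = np.column_stack([np.ones(n), x1, x_irrelevant])
var_matrix = sigma2 * np.linalg.inv(Z.T @ Z)[1, 1]

print(f"R_1^2 without / with the extra variable: {r2_alone:.3f} / {r2_with:.3f}")
print(f"Var(b1) without / with the extra variable: {var_alone:.5f} / {var_with:.5f}")
print(f"Var(b1) from sigma^2 (X'X)^-1:             {var_matrix:.5f}")
```

When x1 is the only regressor, R_1^2 is zero and the formula reduces to sigma^2 divided by the sum of squared deviations of x1. Adding the correlated variable raises R_1^2 and inflates the variance by the factor 1 / (1 - R_1^2), often called the variance inflation factor.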
CHOOSING EXPLANATORY VARIABLES
Omitting important variables and including unimportant variables in a statistical model can lead
to the serious consequences of bias and imprecision in parameter estimates. Therefore, we need a
systematic approach that we can use to choose the appropriate set of explanatory variables to
include in our model.
Methodologies for Choosing Explanatory Variables
Two alternative methodologies are used to choose the set of explanatory variables for a
statistical model.
1. Kitchen sink methodology.
2. Theory/testing methodology.
Kitchen Sink Methodology
This methodology suggests that you should include as an explanatory variable any variable that
is even remotely related to the dependent variable. By using this methodology, you can greatly
reduce the likelihood of omitted variable bias. However, you will most likely include irrelevant
explanatory variables in the model. This will increase the variances and hence decrease the
precision of the estimates of all parameters in the model. Because of this, the kitchen sink
methodology is not recommended.
Theory/Testing Methodology
This methodology involves the following steps.
1. Use theory and past empirical studies to identify the set of potential explanatory variables.
2. Divide the set of potential variables into two subsets:
   i) Explanatory variables that are likely to be important.
   ii) Explanatory variables that may or may not be important.
3. Include variables in subset #1 in the model.
4. Conduct a statistical test to determine which variables in subset #2 should be included in the
model.
5. To carry out step #4, choose one of three approaches:
   i) Testing-down approach
   ii) Testing-up approach
   iii) Model selection criterion approach
The approach used most often by economists is the testing-down approach.
Testing-Down Approach
When using the testing-down approach, you begin with a general model. This general model
includes the set of all potential explanatory variables. That is, it includes explanatory variables
that are likely to be important and those that may or may not be important. You then test whether
the variables in subset #2 should be dropped from the model. If one or more of these variables
should be dropped, then you get a more specific model. Thus, you test from the general to the
specific. Three types of statistical tests are consistent with the testing-down approach: 1) t-test, 2)
F-test, 3) Wald test. These three tests begin with a general (unrestricted) model and ask
whether a more specific (restricted) model is more appropriate. In fact, it is possible to do
an F-test or a Wald test by estimating the unrestricted model only. It should be noted that the
t-test and the F-test are small sample tests: under the classical assumptions, including normally
distributed errors, their test statistics have known, exact sampling distributions. A Wald test is a
large sample test, and therefore its test statistic has only an approximate sampling distribution in
finite samples. Thus, if possible, always use a t-test or F-test.
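A testing-down F-test can be sketched as follows. Everything in the example is an assumption made for illustration: the data are simulated, x1 stands for a variable that is likely to be important, x2 and x3 stand for variables that may or may not be important, and the helper ssr is hypothetical. The F statistic is computed in two ways: from the restricted and unrestricted sums of squared residuals, and from the unrestricted model only.

```python
import numpy as np

def ssr(X, y):
    """Sum of squared residuals from an OLS fit of y on X."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    return resid @ resid

rng = np.random.default_rng(2)
n = 120
x1, x2, x3 = rng.normal(size=(3, n))
# Simulated data: x2 and x3 have no effect on y by construction
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

X_u = np.column_stack([np.ones(n), x1, x2, x3])  # unrestricted (general) model
X_r = X_u[:, :2]                                 # restricted (specific) model
q = 2                                            # number of restrictions tested
df = n - X_u.shape[1]                            # residual degrees of freedom

# Version 1: compare restricted and unrestricted sums of squared residuals
F_ssr = ((ssr(X_r, y) - ssr(X_u, y)) / q) / (ssr(X_u, y) / df)

# Version 2: unrestricted model only,
# F = (R b)' [R (X'X)^-1 R']^-1 (R b) / (q s^2), with H0: R beta = 0
b_u = np.linalg.lstsq(X_u, y, rcond=None)[0]
s2 = ssr(X_u, y) / df
R = np.array([[0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
Rb = R @ b_u
middle = R @ np.linalg.inv(X_u.T @ X_u) @ R.T
F_unres = Rb @ np.linalg.solve(middle, Rb) / (q * s2)

print(f"F from restricted vs. unrestricted SSR: {F_ssr:.4f}")
print(f"F from the unrestricted model only:     {F_unres:.4f}")
```

The two versions agree up to rounding error. To reach a decision, the statistic is compared with a critical value of the F distribution with q and df degrees of freedom (for instance, a p-value from scipy.stats.f.sf(F_ssr, q, df)). If the null hypothesis is not rejected, x2 and x3 are dropped and the more specific model is used.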