CHOICE OF EXPLANATORY VARIABLES

VARIABLE SPECIFICATION ERROR

The classical linear regression model assumes that all important variables have been included and all unimportant variables have been excluded. If we omit an important variable or include an unimportant variable, then the model is misspecified, and it may not be a reasonable approximation of the data generation process. There are two possible types of variable specification error.

1. Omission of an important variable.
2. Inclusion of an unimportant variable.

OMISSION OF AN IMPORTANT VARIABLE

If the omitted variable has an effect on the dependent variable and is correlated with any of the included explanatory variables, then it is a confounding variable. In this situation, the OLS estimator for the slope coefficients will be biased and inconsistent. This is called omitted variable bias.

INCLUSION OF AN UNIMPORTANT VARIABLE

If the unimportant included variable is correlated with any of the other included explanatory variables, then the variances and standard errors of the OLS estimators of the regression coefficients will increase. In this situation, the OLS estimator will be less precise. The probability of a Type II error increases, and therefore it is more difficult to detect the effect of X on Y if one exists.

Formula for Precision of OLS Estimator

The formula for the variance of the OLS estimator $\hat{\beta}_i$ is

$$\mathrm{Var}(\hat{\beta}_i) = \frac{\sigma^2}{\sum_t (X_{it} - \bar{X}_i)^2 \, (1 - R_i^2)}$$

where $\sigma^2$ is the error variance and $R_i^2$ is the $R^2$ from a regression of $X_i$ on the other explanatory variables in the model. Including an unimportant variable that is correlated with $X_i$ increases $R_i^2$, and therefore increases $\mathrm{Var}(\hat{\beta}_i)$ and decreases the precision of the estimate.

CHOOSING EXPLANATORY VARIABLES

Omitting important variables and including unimportant variables in a statistical model can lead to the serious consequences of bias and imprecision in parameter estimates. Therefore, we need a systematic approach for choosing the appropriate set of explanatory variables to include in our model.
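
The two consequences described above, omitted variable bias and loss of precision, can be illustrated numerically. The following is a minimal sketch using only the Python standard library. The data generating process (true slopes of 2 and 3, and X3 = 0.5 X2 + noise), the sample size, the seed, and all function names are hypothetical choices made for this illustration, not part of the notes.

```python
import random

def centered_sum_of_squares(x):
    """Sum of squared deviations of x from its sample mean."""
    x_bar = sum(x) / len(x)
    return sum((xi - x_bar) ** 2 for xi in x)

def simple_ols_slope(x, y):
    """Slope from an OLS regression of y on an intercept and x."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xy / centered_sum_of_squares(x)

def slope_variance(sigma2, x, r2):
    """Var(beta_i hat) = sigma^2 / [sum_t (X_it - X_i bar)^2 * (1 - R_i^2)]."""
    return sigma2 / (centered_sum_of_squares(x) * (1.0 - r2))

random.seed(0)  # arbitrary seed, used only for reproducibility
n = 5000

# Hypothetical data generating process: X3 is correlated with X2,
# and both affect Y (assumed true slopes: 2 on X2, 3 on X3).
x2 = [random.gauss(0, 1) for _ in range(n)]
x3 = [0.5 * a + random.gauss(0, 0.5) for a in x2]
y = [1 + 2 * a + 3 * b + random.gauss(0, 1) for a, b in zip(x2, x3)]

# Omitting X3: the slope on X2 absorbs part of the effect of X3.
b_short = simple_ols_slope(x2, y)
# Large-sample value: true slope + (effect of X3) * (slope of X3 on X2).
b_limit = 2 + 3 * 0.5
print(f"slope on X2 with X3 omitted: {b_short:.3f}")
print(f"true slope: 2, large-sample value of the biased estimator: {b_limit}")

# Precision: the same regressor with R_i^2 = 0.5 versus R_i^2 = 0.
# The ratio of the two variances is 1 / (1 - 0.5).
inflation = slope_variance(1.0, x2, 0.5) / slope_variance(1.0, x2, 0.0)
print(f"variance ratio when R_i^2 rises from 0 to 0.5: {inflation:.2f}")
```

In this sketch the estimated slope settles near the biased value rather than the true slope of 2, and more data does not remove the gap, which is what inconsistency means. The variance ratio depends only on the factor 1 / (1 - R_i^2) from the formula above.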
Methodologies for Choosing Explanatory Variables

Two alternative methodologies are used to choose the set of explanatory variables for a statistical model.

1) Kitchen sink methodology.
2) Theory/testing methodology.

Kitchen Sink Methodology

This methodology suggests that you should include as an explanatory variable any variable that is even remotely related to the dependent variable. By using this methodology, you can greatly reduce the likelihood of omitted variable bias. However, you will most likely include irrelevant explanatory variables in the model. This will increase the variances, and hence decrease the precision, of the estimates of all parameters in the model. Because of this, the kitchen sink methodology is not recommended.

Theory/Testing Methodology

This methodology involves the following steps.

1. Use theory and past empirical studies to identify the set of potential explanatory variables.
2. Divide the set of potential variables into two subsets:
   i) Subset #1: explanatory variables that are likely to be important.
   ii) Subset #2: explanatory variables that may or may not be important.
3. Include the variables in subset #1 in the model.
4. Conduct statistical tests to determine which variables in subset #2 should be included in the model.
5. To carry out step #4, choose one of three approaches:
   i) Testing-down approach
   ii) Testing-up approach
   iii) Model selection criterion approach

The approach used most often by economists is the testing-down approach.

Testing-Down Approach

When using the testing-down approach, you begin with a general model. This general model includes the set of all potential explanatory variables. That is, it includes both the explanatory variables that are likely to be important and those that may or may not be important. You then test whether the variables in subset #2 should be dropped from the model. If one or more of these variables should be dropped, then you get a more specific model. Thus, you test from the general to the specific.
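
As an illustration of testing from the general to the specific, suppose (purely hypothetically) that theory places $X_2$ and $X_3$ in subset #1 and $X_4$ and $X_5$ in subset #2. The variable labels and the error term $\mu_t$ are assumptions made for this example only. The general model, the hypothesis being tested, and the specific model that results if the hypothesis is not rejected would then be:

```latex
\begin{align*}
\text{General model:}\quad
  & Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t}
        + \beta_4 X_{4t} + \beta_5 X_{5t} + \mu_t \\
\text{Null hypothesis:}\quad
  & H_0\colon \beta_4 = \beta_5 = 0 \\
\text{Specific model:}\quad
  & Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + \mu_t
\end{align*}
```

If $H_0$ is rejected, at least one of the subset #2 variables belongs in the model, and the general model is retained or only partly simplified.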
Three types of statistical tests are consistent with the testing-down approach: 1) t-test, 2) F-test, 3) Wald test. These three tests begin with a general (unrestricted) model and ask whether a more specific (restricted) model is more appropriate. In fact, it is possible to do an F-test or a Wald test by estimating the unrestricted model only. Note that the t-test and the F-test are small-sample tests: under the classical assumptions with normally distributed errors, their test statistics have known, exact sampling distributions. The Wald test is a large-sample test, so its test statistic has only an approximate sampling distribution in finite samples. Thus, whenever possible, use a t-test or an F-test.
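
A testing-down F-test can be sketched as follows. This is one common way to compute the statistic, not the only one: it estimates both the unrestricted and the restricted model and compares their sums of squared residuals, F = [(SSR_r - SSR_u) / J] / [SSR_u / (n - k)], where J is the number of restrictions and k is the number of parameters in the unrestricted model. The sketch uses only the Python standard library, so OLS is solved by hand from the normal equations. The simulated data, the variable names x2 to x5, and the function names are hypothetical. The p-value uses a closed form that holds only for J = 2 restrictions.

```python
import random

def ols_ssr(y, columns):
    """Sum of squared residuals from regressing y on an intercept plus columns."""
    n = len(y)
    X = [[1.0] + [col[t] for col in columns] for t in range(n)]
    k = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(X[t][i] * X[t][j] for t in range(n)) for j in range(k)]
         + [sum(X[t][i] * y[t] for t in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    b = [A[i][k] / A[i][i] for i in range(k)]
    return sum((y[t] - sum(bi * xi for bi, xi in zip(b, X[t]))) ** 2
               for t in range(n))

def f_test_two_restrictions(y, kept, dropped):
    """F-test of H0: the coefficients on the two dropped variables are zero."""
    if len(dropped) != 2:
        raise ValueError("the closed-form p-value below requires exactly 2 restrictions")
    n = len(y)
    j = len(dropped)                   # number of restrictions
    k = 1 + len(kept) + len(dropped)   # parameters in the unrestricted model
    ssr_u = ols_ssr(y, kept + dropped)  # general (unrestricted) model
    ssr_r = ols_ssr(y, kept)            # specific (restricted) model
    f_stat = ((ssr_r - ssr_u) / j) / (ssr_u / (n - k))
    # For an F(2, d) distribution, P(F > f) = (1 + 2 f / d) ** (-d / 2).
    p_value = (1.0 + 2.0 * f_stat / (n - k)) ** (-(n - k) / 2.0)
    return f_stat, p_value

random.seed(1)  # arbitrary seed, used only for reproducibility
n = 200
x2, x3, x4, x5 = ([random.gauss(0, 1) for _ in range(n)] for _ in range(4))
# Hypothetical data generating process: x4 and x5 do not affect y.
y = [1 + 2 * a - b + random.gauss(0, 1) for a, b in zip(x2, x3)]

f_stat, p_value = f_test_two_restrictions(y, kept=[x2, x3], dropped=[x4, x5])
print(f"F = {f_stat:.3f}, p-value = {p_value:.3f}")
```

A large p-value means the restrictions are not rejected, so the variables in subset #2 can be dropped. A small p-value means at least one of them should stay in the model. In applied work, a regression library would normally replace the hand-written solver.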