CHAPTER 7
REGRESSION DIAGNOSTIC IV:
MODEL SPECIFICATION ERRORS
Damodar Gujarati
Econometrics by Example
MODEL SPECIFICATION ERRORS
 One of the assumptions of the classical linear regression model
(CLRM) is that the model is correctly specified.
 By correct specification we mean that the model satisfies the
following conditions:
 1. The model does not exclude any “core” variables.
 2. The model does not include superfluous variables.
 3. The functional form of the model is suitably chosen.
 4. There are no errors of measurement in the regressand and
regressors.
 5. Outliers in the data, if any, are taken into account.
 6. The probability distribution of the error term is well specified.
 7. The regressors are nonstochastic.
OMISSION OF RELEVANT VARIABLES
If we omit a relevant variable because we do not
have the data, or because we have not studied
the underlying economic theory carefully, or
because we have not studied prior research in
the area thoroughly, or just due to carelessness,
we are underfitting a model.
CONSEQUENCES
 1. If the omitted variables are correlated with the variables included in the
model, the coefficients of the estimated model are biased.
 This bias does not disappear as the sample size gets larger (i.e., the estimated coefficients
of the misspecified model are also inconsistent).
 2. Even if the incorrectly excluded variables are not correlated with the
variables included in the model, the intercept of the estimated model is
biased.
 3. The disturbance variance is incorrectly estimated.
 4. The variances of the estimated coefficients of the misspecified model are
biased.
 5. In consequence, the usual confidence intervals and hypothesis-testing
procedures become suspect, leading to misleading conclusions about the
statistical significance of the estimated parameters.
 6. Furthermore, forecasts based on the incorrect model and the forecast
confidence intervals based on it will be unreliable.
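The first of these consequences can be seen in a minimal simulation sketch (all variable names and parameter values below are illustrative, not taken from the text): when a regressor that is correlated with an included variable is omitted, the estimated slope picks up part of its effect, and the distortion does not shrink as the sample grows.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)          # omitted variable, correlated with x1
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

full = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
short = sm.OLS(y, sm.add_constant(x1)).fit()  # x2 wrongly omitted

print(full.params[1])   # close to the true slope, 2
print(short.params[1])  # close to 2 + 3*0.8 = 4.4: biased, and the bias persists as n grows
```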
F TEST TO COMPARE TWO MODELS
 If the original model is the “restricted” model, and the model
with the added (previously omitted) variable – which could also
be a squared term or an interaction term – is the “unrestricted”
model, we can compare the two using an F test:
F = \frac{(R^2_{ur} - R^2_r)/m}{(1 - R^2_{ur})/(n - k)}
where m = number of restrictions (or omitted variables), n =
number of observations, and k = number of parameters in the
unrestricted model.
 A rejection of the null suggests that the omitted variables belong
in the model.
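A minimal sketch of this F test on simulated data (all names and values are illustrative); statsmodels' compare_f_test gives the same result from the residual sums of squares.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(1)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1 + 2 * x1 + 3 * x2 + rng.normal(size=n)

res_r = sm.OLS(y, sm.add_constant(x1)).fit()                          # restricted
res_ur = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()  # unrestricted

m, k = 1, 3   # one added regressor; three parameters in the unrestricted model
F = ((res_ur.rsquared - res_r.rsquared) / m) / ((1 - res_ur.rsquared) / (n - k))
print(F, stats.f.sf(F, m, n - k))      # F statistic and its p value
print(res_ur.compare_f_test(res_r))    # same test via statsmodels
```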
DETECTION OF OMISSION OF VARIABLES
Ramsey’s Regression Specification Error
(RESET) Test
Lagrange Multiplier (LM) test
RAMSEY’S RESET TEST
 1. From the (incorrectly) estimated model, we first obtain the
estimated, or fitted, values of the dependent variable, Ŷi.
 2. Reestimate the original model including Ŷi² and Ŷi³ (and possibly
higher powers of the estimated dependent variable) as additional
regressors.
 3. The initial model is the restricted model and the model in Step 2 is
the unrestricted model.
 4. Under the null hypothesis that the restricted (i.e., the original)
model is correct, we can use the previously mentioned F test.
 5. If the F test in Step 4 is statistically significant, we can reject the
null hypothesis. That is, the restricted model is not appropriate in the
present situation. By the same token, if the F statistic is statistically
insignificant, we do not reject the original model.
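A sketch of these five steps on simulated data in which the true relationship is quadratic but only a linear term is fitted (names and values are illustrative; recent versions of statsmodels also provide a ready-made linear_reset function in statsmodels.stats.diagnostic).

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 200
x = rng.normal(size=n)
y = 1 + 2 * x + 1.5 * x**2 + rng.normal(size=n)   # true model is nonlinear in x

restricted = sm.OLS(y, sm.add_constant(x)).fit()             # Step 1
yhat = restricted.fittedvalues
X_aug = sm.add_constant(np.column_stack([x, yhat**2, yhat**3]))
unrestricted = sm.OLS(y, X_aug).fit()                        # Step 2

f_stat, p_value, _ = unrestricted.compare_f_test(restricted) # Steps 3-5
print(f_stat, p_value)   # a small p value rejects the restricted (original) model
```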
LAGRANGE MULTIPLIER TEST
 1. From the original model, we obtain the estimated residuals, ei.
 2. If in fact the original model is the correct model, then the residuals ei
obtained from this model should not be related to the regressors omitted from
that model.
 3. We now regress ei on the regressors in the original model and the omitted
variables from the original model. This is the auxiliary regression.
 4. If the sample size is large, it can be shown that n (the sample size) times
the R2 obtained from the auxiliary regression follows the chi-square
distribution with df equal to the number of regressors omitted from the
original regression.
 5. If the computed chi-square value exceeds the critical chi-square value at
the chosen level of significance, or if its p value is sufficiently low, we reject
the original (or restricted) regression. That is to say, the original model
was misspecified.
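A sketch of the LM version on simulated data (illustrative names; x2 plays the role of the omitted regressor).

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(3)
n = 300
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1 + 2 * x1 + 3 * x2 + rng.normal(size=n)

original = sm.OLS(y, sm.add_constant(x1)).fit()   # x2 wrongly omitted
e = original.resid                                # Step 1

# Step 3: auxiliary regression of the residuals on included AND omitted regressors
aux = sm.OLS(e, sm.add_constant(np.column_stack([x1, x2]))).fit()

lm_stat = n * aux.rsquared                        # Step 4: n times R^2 of the auxiliary regression
p_value = stats.chi2.sf(lm_stat, df=1)            # df = number of omitted regressors
print(lm_stat, p_value)                           # Step 5: a small p value signals misspecification
```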
INCLUSION OF IRRELEVANT OR
UNNECESSARY VARIABLES
Sometimes researchers add variables in the
hope that the R2 value of their model will
increase, in the mistaken belief that the higher
the R2, the better the model. This is called
overfitting a model. But if the added variables
are not economically meaningful and relevant,
such a strategy is not recommended.
CONSEQUENCES
 1. The OLS estimators of the “incorrect” or overfitted model are
all unbiased and consistent.
 2. The error variance is correctly estimated.
 3. The usual confidence interval and hypothesis testing
procedures remain valid.
 4. However, the estimated coefficients of such a model are
generally inefficient (their variances will be larger than those of
the true model).
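Consequence 4 in a minimal simulation sketch (illustrative names and values): an irrelevant regressor that is correlated with the included one leaves the slope estimate unbiased but inflates its standard error.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 200
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.3 * rng.normal(size=n)   # irrelevant, but highly correlated with x1
y = 1 + 2 * x1 + rng.normal(size=n)        # true model involves x1 only

true_fit = sm.OLS(y, sm.add_constant(x1)).fit()
over_fit = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

print(true_fit.params[1], true_fit.bse[1])   # unbiased, smaller standard error
print(over_fit.params[1], over_fit.bse[1])   # still unbiased, but larger standard error
```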
MISSPECIFICATION OF THE FUNCTIONAL
FORM OF A REGRESSION MODEL
Sometimes researchers mistakenly do not
account for the nonlinear nature of variables in
a model. Moreover, some dependent variables
(such as wage, which tends to be skewed to the
right) are more appropriately entered in natural
log form.
COMPARING MODELS ON THE BASIS OF R2
 We can transform the models as follows, as in Chapter 2:
 1. Compute the geometric mean (GM) of the dependent variable, call it Y*.
 2. Divide Yi by Y* to obtain the scaled variable Ỹi = Yi / Y*.
 3. Estimate the equation with ln Yi as the dependent variable using Ỹi in
lieu of Yi as the dependent variable (i.e., use ln Ỹi as the dependent
variable), and obtain its residual sum of squares, RSS1.
 4. Estimate the equation with Yi as the dependent variable using Ỹi as the
dependent variable instead of Yi, and obtain its residual sum of squares, RSS2.
 5. Compute the following statistic, putting the larger RSS value in the numerator:
\lambda = \frac{n}{2}\ln\left(\frac{RSS_1}{RSS_2}\right)
which follows the chi-square distribution with 1 df. If this statistic is
significant, the model with the lower RSS value is better.
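A sketch of Steps 1-5 on simulated right-skewed data (illustrative names and values; RSS is taken from the ssr attribute of the fitted statsmodels results).

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(5)
n = 500
x = rng.normal(size=n)
y = np.exp(1 + 0.5 * x + 0.3 * rng.normal(size=n))   # right-skewed regressand

y_star = np.exp(np.mean(np.log(y)))   # Step 1: geometric mean of Y
y_tilde = y / y_star                  # Step 2: scaled dependent variable

X = sm.add_constant(x)
rss1 = sm.OLS(np.log(y_tilde), X).fit().ssr   # Step 3: log-linear model
rss2 = sm.OLS(y_tilde, X).fit().ssr           # Step 4: linear model

lam = (n / 2) * np.log(max(rss1, rss2) / min(rss1, rss2))   # Step 5
print(lam, stats.chi2.sf(lam, df=1))   # significant => prefer the lower-RSS model
```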
ERRORS OF MEASUREMENT
One of the assumptions of CLRM is that the
model used in the analysis is correctly specified.
Although not explicitly spelled out, this
presumes that the values of the regressand as
well as regressors are accurate. That is, they are
not guessed at, extrapolated, interpolated, or
rounded off in any systematic manner, or
recorded with errors.
CONSEQUENCES
 Consequences for Errors of Measurement in the
Regressand:
 1. The OLS estimators are still unbiased.
 2. The variances and standard errors of OLS estimators are
still unbiased.
 3. But the estimated variances, and ipso facto the standard
errors, are larger than in the absence of such errors.
 In short, errors of measurement in the regressand do not
pose a very serious threat to OLS estimation.
CONSEQUENCES
 Consequences for Errors of Measurement in the Regressor:
 1. OLS estimators are biased as well as inconsistent.
 2. Errors in a single regressor can lead to biased and inconsistent
estimates of the coefficients of the other regressors in the model.
 It is not easy to establish the size and direction of bias in the estimated
coefficients.
 It is often suggested that we use instrumental or proxy variables for
variables suspected of having measurement errors.
 The proxy variables must satisfy two requirements: they must be highly
correlated with the variables for which they are a proxy, and they must be
uncorrelated with the usual equation error as well as the measurement error.
 But such proxies are not easy to find.
 We should thus be very careful in collecting the data and making sure
that some obvious errors are eliminated.
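A minimal simulation sketch of the bias caused by measurement error in a regressor (illustrative values): with classical measurement error in a single regressor, the OLS slope is attenuated toward zero, and the attenuation does not vanish even in large samples.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 10000
x_true = rng.normal(size=n)
x_obs = x_true + rng.normal(size=n)          # regressor recorded with error
y = 1 + 2 * x_true + rng.normal(size=n)

print(sm.OLS(y, sm.add_constant(x_true)).fit().params[1])  # close to 2
print(sm.OLS(y, sm.add_constant(x_obs)).fit().params[1])   # close to 1: attenuated, despite large n
```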
OUTLIERS, LEVERAGE, AND INFLUENTIAL DATA POINTS
OLS gives equal weight to every observation in
the sample.
This may create problems if we have
observations that may not be “typical” of the
rest of the sample.
Such observations, or data points, are known as
outliers, leverage points, or influential points.
OUTLIERS, LEVERAGE, AND INFLUENTIAL DATA POINTS
 Outliers: In the context of regression analysis, an outlier is an
observation with a large residual (ei), large in comparison with the
residuals of the rest of the observations.
 Leverage: An observation is said to exert (high) leverage if it is
disproportionately distant from the bulk of the sample
observations. In this case such observation(s) can pull the
regression line towards itself, which may distort the slope of the
regression line.
 Influential point: If a high-leverage observation in fact pulls the
regression line toward itself, it is called an influential point. The
removal of such a data point(s) from the sample can dramatically
change the slope of the estimated regression line.
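A sketch of the standard diagnostics for these three concepts using statsmodels' influence measures, on simulated data with one planted extreme point (names and values are illustrative).

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 50
x = rng.normal(size=n)
y = 1 + 2 * x + rng.normal(size=n)
x[0], y[0] = 8.0, -20.0                  # one point far from the bulk of the data

res = sm.OLS(y, sm.add_constant(x)).fit()
infl = res.get_influence()

student_resid = infl.resid_studentized_internal   # large values flag outliers
leverage = infl.hat_matrix_diag                   # large values flag high-leverage points
cooks_d = infl.cooks_distance[0]                  # combines both: influential points
print(student_resid[0], leverage[0], cooks_d[0])
```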
PROBABILITY DISTRIBUTION
OF THE ERROR TERM
The classical normal linear regression model
(CNLRM), an extension of CLRM, assumes
that the error term ui in the regression model is
normally distributed.
This assumption is critical if the sample size is
relatively small, for the commonly used tests of
significance, such as t and F, are based on the
normality assumption.
JARQUE-BERA (JB) TEST OF NORMALITY
 This is a large sample test.
 The test statistic is as follows:
JB = n\left[\frac{S^2}{6} + \frac{(K - 3)^2}{24}\right]
where n is the sample size, S = skewness coefficient, K = kurtosis
coefficient.
 For a normally distributed variable S = 0 and K= 3. When this is the case, the
JB statistic is zero.
 Therefore, the closer the value of JB is to zero, the more plausible the normality assumption.
 Since in practice we do not observe the true error term, we use its proxy, ei.
The null hypothesis is the joint hypothesis that S = 0 and K = 3. Jarque and Bera
have shown that the statistic follows the chi-square distribution with 2 df (because
we are imposing two restrictions, namely, that skewness is zero and kurtosis
is 3). If the computed JB statistic exceeds the critical chi-square value, we
reject the hypothesis that the error term is normally distributed.
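A sketch of the JB calculation from OLS residuals on simulated data with fat-tailed errors (illustrative values); scipy's built-in jarque_bera gives the same statistic.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(8)
n = 500
x = rng.normal(size=n)
y = 1 + 2 * x + rng.standard_t(df=3, size=n)   # fat-tailed (non-normal) errors

e = sm.OLS(y, sm.add_constant(x)).fit().resid
S = stats.skew(e)
K = stats.kurtosis(e, fisher=False)            # raw kurtosis; equals 3 under normality
JB = n * (S**2 / 6 + (K - 3)**2 / 24)
print(JB, stats.chi2.sf(JB, df=2))             # small p value rejects normality
print(stats.jarque_bera(e))                    # same test, built into scipy
```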
RANDOM OR STOCHASTIC REGRESSORS
 The CLRM assumes that the regressand is random but the
regressors are non-stochastic or fixed—that is, we keep the
values of the regressors fixed and draw several random
samples of the dependent variable.
 Although the assumption of fixed regressors may be valid in
several economic situations, it may not be tenable for all
economic data. In that case, we assume that both Y (the
dependent variable) and the Xs (the regressors) are drawn
randomly. This is the case of stochastic or random
regressors.
THE SIMULTANEITY PROBLEM
 There are many situations in which a unidirectional relationship between Y
and the Xs cannot be maintained, since some Xs affect Y but Y in turn also
affects one or more Xs.
 In other words, there may be a feedback relationship between the Y and X variables.
 Simultaneous equation regression models are models that take into account
feedback relationships among variables.
 Endogenous variables are variables whose values are determined in the
model.
 Exogenous variables are variables whose values are not determined in the
model.
 Sometimes, exogenous variables are called predetermined variables, for their values are
determined independently or fixed, such as the tax rates fixed by the government.
 The parameters of such models can be estimated by the method of indirect least
squares (ILS) or the method of two-stage least squares (2SLS).
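A minimal two-stage least squares sketch done with two OLS passes on simulated data with one endogenous regressor x and one instrument z (all names and values are illustrative); in practice a dedicated IV routine should be used, since the naive second-stage standard errors below are not valid.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
n = 5000
z = rng.normal(size=n)                           # instrument: exogenous
u = rng.normal(size=n)                           # structural error
x = 1 + 0.8 * z + 0.9 * u + rng.normal(size=n)   # x is correlated with u: endogenous
y = 2 + 1.5 * x + u

print(sm.OLS(y, sm.add_constant(x)).fit().params[1])       # OLS: biased away from 1.5

x_hat = sm.OLS(x, sm.add_constant(z)).fit().fittedvalues   # stage 1: regress x on z
print(sm.OLS(y, sm.add_constant(x_hat)).fit().params[1])   # stage 2: close to 1.5
```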