MULTIPLE CLASSICAL LINEAR REGRESSION MODEL
The multiple classical linear regression model (MCLRM) is an extension of the SCLRM. It
allows for more than one explanatory variable.
ADVANTAGES OF THE MCLRM
The MCLRM has 3 major advantages.
1. We can control for observable confounding variables
2. We can estimate the independent causal effects of two or more explanatory variables
3. We can include additional variables to increase the predictive ability of the model
SPECIFICATION OF THE MODEL
Once again, the statistical relationship between two or more economic variables can be written in
general functional form as Yt = ƒ(Xt1, Xt2…Xtk) + μt. For the MCLRM, we let Xt1 = 1 for each
unit, so Xt1 plays the role of a constant. The MCLRM allows the economic relationship to have a
dependent variable (Y) and k – 1 explanatory variables (Xt2…Xtk). All factors other than Xt2…Xtk
that affect Y are included in the error term, μ. To specify the MCLRM, we make three
types of assumptions: 1) Variables. 2) Functional form. 3) Error term.
VARIABLES
The MCLRM assumes that all variables that have an important effect on Y are included in the
model, and all variables that are unimportant are excluded from the model. The unimportant
variables are represented by the error term.
Possible Violations of Assumption
This assumption is violated if one or more variables other than Xt2…Xtk have an important effect
on Y, or one or more of the included variables Xt2…Xtk are unimportant. If this assumption is
violated, then the MCLRM may not be a reasonable approximation of the true data generation
process.
FUNCTIONAL FORM
The MCLRM assumes that the conditional mean function (population regression function) is
linear. Assuming there are 3 explanatory variables, we have E(Y|X1,X2,X3,X4) = ƒ(X1,X2,X3,X4) =
β1 + β2X2 + β3X3 + β4X4. The parameters are β1, β2, β3, β4. We are assuming that the average value
of Y is a linear function of the X’s in the population.
Interpretation of Parameters
The parameter β1 is the intercept of the regression surface. It measures the average value of Y
when the X’s equal zero. This parameter is usually not of primary interest. The parameters β2, β3,
β4 are the slope parameters. They are given by βi = Δ E(Y| X1,X2,X3,X4) / Δ Xi, holding all other
X’s constant; for i = 2, 3, 4. βi measures the change in the average value of Y when Xi changes by one unit,
holding all other X’s constant. It is the marginal effect of Xi on Y. The slope parameters are of
primary interest. The 3 important questions we want to address can be restated as follows.
1) Does Xi affect Y? (Is βi zero or nonzero?). 2) What is the direction of the effect?
(Is the algebraic sign of βi positive or negative?). 3) What is the size of the effect? (What is the
magnitude of βi?)
Elasticity
Suppose we want to address the following question. “Which explanatory variable (X2, X3, …, Xk)
has the biggest effect on Y?” To answer this question, we need a unit-free measure of marginal
effect. This allows us to compare the marginal effects of two or more variables that are measured
in different units. In economics, the unit-free measure of marginal effect used most often is
elasticity. The elasticity of Y with respect to Xi is defined as εyx = %∆Y / %∆Xi, holding all other
X’s constant; for i = 2, 3, 4. Applying this formula to the conditional mean function we have εyx
= βi (Xi / Y), holding all other X’s constant; for i = 2, 3, 4.
Possible Violations of Assumption
If the assumption of linearity is violated, then the MCLRM may not be a reasonable
approximation of the true data generation process.
ERROR TERM
The statistical relationship between Yt and the Xt’s is given by Yt = β1 + β2Xt2 + β3Xt3 + β4Xt4 + μt.
The error term is a random variable that represents the “net effect” of all factors other than the
Xt’s that affect Yt for the tth unit in the population. By definition, the error term measures the
deviation between Yt and the conditional mean of Yt for the tth unit. That is, μt = Yt – (β1 + β2Xt2
+ β3Xt3 + β4Xt4).
Assumptions
We describe the behavior of the random variable μ by a conditional probability distribution
ƒ(μ| X1,X2,X3,X4). For each set of values of the X’s there is a probability distribution for μ. The
assumptions about the error term for the MCLRM are the same as the SCLRM. They are
extended to two or more X’s.
1. Error term has mean zero: E(μt|Xt1,Xt2,Xt3,Xt4) = 0.
2. Error term is uncorrelated with each explanatory variable: Cov(μt, Xti) = 0 for i = 2, 3, 4.
3. Error term has constant variance: Var(μt|Xt1,Xt2,Xt3,Xt4) = σ2.
4. Errors are independent: Cov(μt, μs) = 0 for t ≠ s.
5. Error term has a normal distribution: μt ~ N.
The possible violations of these assumptions are similar to those we discussed for the SCLRM.
MCLRM CONCISELY STATED
The MCLRM can be written concisely in any of the following 4 ways.
1. Yt = β1 + β2Xt2 + β3Xt3 + β4Xt4 + μt
   E(μt|Xt1,Xt2,Xt3,Xt4) = 0
   Var(μt|Xt1,Xt2,Xt3,Xt4) = σ2
   Cov(μt, μs) = 0
   μt ~ N
2. E(Yt|Xt1,Xt2,Xt3,Xt4) = β1 + β2Xt2 + β3Xt3 + β4Xt4
   Var(Yt|Xt1,Xt2,Xt3,Xt4) = σ2
   Cov(Yt, Ys) = 0
   Yt ~ N
3. Yt = β1 + β2Xt2 + β3Xt3 + β4Xt4 + μt
   μt ~ iid N(0, σ2)
4. Yt ~ iid N(β1 + β2Xt2 + β3Xt3 + β4Xt4, σ2)
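As a concrete illustration of this data generation process, here is a minimal Python sketch (the notes use SAS; Python with NumPy is used here only for illustration). The parameter values β1 = 5, β2 = 2, β3 = –1, β4 = 0.5, σ = 3, the sample size, and the distributions of the X’s are assumptions chosen for the example.

import numpy as np

rng = np.random.default_rng(0)
n = 200                                  # sample size (assumed for the example)
beta = np.array([5.0, 2.0, -1.0, 0.5])   # assumed values of β1, β2, β3, β4

# Explanatory variables X2, X3, X4 (their joint distribution is left unrestricted by the model)
X2 = rng.uniform(0, 10, size=n)
X3 = rng.normal(20, 4, size=n)
X4 = rng.uniform(1, 5, size=n)
X = np.column_stack([np.ones(n), X2, X3, X4])   # Xt1 = 1 plays the role of the constant

# Error term: iid N(0, σ2), independent of the X's
sigma = 3.0
u = rng.normal(0.0, sigma, size=n)

# The data generation process: Yt = β1 + β2*Xt2 + β3*Xt3 + β4*Xt4 + μt
Y = X @ beta + u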
ESTIMATION
ESTIMATOR FOR β1, β2, β3, β4
To obtain estimates of the regression coefficients, β1, β2, β3, β4, we use the ordinary least squares
(OLS) estimator. The ordinary least squares estimator tells us to choose as our estimates of β1, β2,
β3, β4 the numbers that minimize the residual sum of squares for the sample.
Residual Sum of Squares Function
The residual sum of squares function is given by
ESS(β1^, β2^, β3^, β4^) = ∑(Yt – β1^ – β2^Xt2 – β3^Xt3 – β4^Xt4)2
Deriving the OLS Estimators for β1, β2, β3, β4
To derive the OLS estimators for β1, β2, β3, β4, we find the values of β1^, β2^, β3^, β4^ that minimize
the function ESS. This is an unconstrained minimization problem in 4 unknowns. To find the
values that minimize this function, we find the 4 partial derivative functions and set them equal to
zero. This yields a system of 4 equations in 4 unknowns, called the normal equations.
Solving this system of 4 equations in 4 unknowns yields 4 solution expressions, one for each
unknown parameter. These equations are the OLS estimators β1^, β2^, β3^, β4^. They are a set of
rules that tell us how to use the data to obtain estimates of the 4 regression coefficients.
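The sketch below, continuing the simulated sample above, carries out this calculation: the 4 normal equations can be written compactly as (X'X)β^ = X'Y, and solving this system yields the OLS estimates.

# Solve the normal equations (X'X) b_hat = X'Y for the 4 unknowns
XtX = X.T @ X
XtY = X.T @ Y
b_hat = np.linalg.solve(XtX, XtY)       # OLS estimates of β1, β2, β3, β4

# Residuals and the minimized residual sum of squares ESS(β1^, β2^, β3^, β4^)
resid = Y - X @ b_hat
ESS = resid @ resid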
Sampling Distributions for β1^, β2^, β3^, β4^
Given the assumptions of the MCLRM, the sampling distribution for βi^ is given by
βi^ ~ N(Mean, Variance), where
Mean = E(βi^) = βi
Variance = Var(βi^) = σ2 / [ ∑(Xti – XiBAR)2 (1 – Ri2) ]
Standard Error = s.e.(βi^) = √{ σ2 / [ ∑(Xti – XiBAR)2 (1 – Ri2) ] }
This variance formula applies to β2^, β3^, β4^; the variance for β1^ is somewhat different. Ri2 is the R2 statistic
from a regression of the explanatory variable Xi on all other explanatory variables in the model. It
measures the degree of correlation among the explanatory variables.
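For example, the Ri2 for X2 (i = 2) could be computed with an auxiliary regression of X2 on a constant, X3, and X4, as in this sketch (continuing the example above):

# Auxiliary regression of X2 on the other explanatory variables
Z = np.column_stack([np.ones(n), X3, X4])
g_hat = np.linalg.solve(Z.T @ Z, Z.T @ X2)
e = X2 - Z @ g_hat                                  # auxiliary residuals
R2_i = 1.0 - (e @ e) / np.sum((X2 - X2.mean())**2)  # Ri2 for Xi = X2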
Variance/Covariance Matrix of Estimates
The variance/covariance matrix summarizes all variances and covariances for the OLS estimators
β1^, β2^, β3^, β4^.
Small Sample Properties
Given the assumptions of the MCLRM, the OLS estimators are unbiased and efficient in small
samples.
Large Sample Properties
Given the assumptions of the MCLRM, the OLS estimators are consistent in large samples.
Conclusions about the OLS Estimators
If the assumptions of the CLRM are satisfied, and therefore the CLRM is a reasonable
approximation of the true data generation process, then the OLS estimators are the best
estimators. This is because we can’t find an alternative estimator that produces more accurate
and reliable estimates than OLS; that is, estimates that will consistently come closer to the true
values of the population parameters. However, the following caveats must be noted.
Best May Not Be Very Good
The OLS estimator will be more reliable than any other estimator. However, this may not be very
reliable. The smaller the variation in Xi, the higher the correlation between Xi and the other X’s,
and the larger the error variance, the bigger the variance of the sampling distribution of βi^. The
bigger the variance of the sampling distributions of βi^, the less reliable the OLS estimator, and
therefore the less precise the estimate.
Bias in the OLS Estimator
The OLS estimator will be biased if the error term does not have mean zero. The error term will
not have mean zero if it is correlated with an explanatory variable. The error term will be
correlated with an explanatory variable if there are omitted confounding variables, reverse
causation, sample selection problems, or measurement error in an explanatory variable.
ESTIMATOR FOR σ2
The estimator for the error variance is the same as for the SCLRM: σ2^ = ESS / df = ESS / (n – k).
Standard Error of the Regression
The standard error of the regression is the square root of the estimate of the error variance.
ESTIMATOR FOR THE VARIANCE, COVARIANCE, AND STANDARD ERROR OF THE
OLS ESTIMATORS β1^, β2^, β3^, β4^
The estimators for the variances and standard errors of the OLS estimators β2^, β3^, β4^ are
Var(βi^)^ = 2^ / [ ∑(Xt – XBAR)2 (1 – Ri2) ]
______________________
s.e.(βi^)^ = √ 2^ / [ ∑(Xt – XBAR)2 (1 – Ri2) ]
for i = 2,3,4
for i = 2,3,4
There are similar formulas for the estimators Var(β1^)^, s.e.(β1^)^, and the covariances.
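Continuing the running example, the sketch below computes σ2^, the standard error of the regression, and the estimated standard errors. The matrix expression σ2^(X'X)⁻¹ used here is the estimated variance/covariance matrix of the OLS estimators; for the slope coefficients β2^, β3^, β4^ its diagonal elements agree with the formula above (β1^ has a somewhat different formula, as noted earlier).

k = X.shape[1]                          # number of regression coefficients (4)
sigma2_hat = ESS / (n - k)              # estimator of the error variance σ2
ser = np.sqrt(sigma2_hat)               # standard error of the regression

# Estimated variance/covariance matrix of the OLS estimators and their standard errors
vcov_hat = sigma2_hat * np.linalg.inv(XtX)
se = np.sqrt(np.diag(vcov_hat))         # s.e.(β1^)^, s.e.(β2^)^, s.e.(β3^)^, s.e.(β4^)^

# Equivalent slope formula, shown here for β2^ using R2_i from the auxiliary regression
var_b2_hat = sigma2_hat / (np.sum((X2 - X2.mean())**2) * (1.0 - R2_i))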
ESTIMATOR FOR ELASTICITY
The estimator for the average elasticity of Y with respect to Xi is: εyx^ = βi^ (XiBAR / YBAR), where
βi^ is the OLS estimator of βi; XiBAR is the sample mean of Xi; YBAR is the sample mean of Y. The
estimated standard error of the elasticity estimate is: s.e.( εyx^)^ = (XiBAR / YBAR) • s.e.( βi^)^,
where s.e.( βi^)^ is the estimated standard error of the estimate βi^.
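A short sketch of the elasticity estimator and its standard error, shown for X2 using the estimates from the running example:

# Estimated average elasticity of Y with respect to X2 and its standard error
elasticity_2 = b_hat[1] * X2.mean() / Y.mean()
se_elasticity_2 = (X2.mean() / Y.mean()) * se[1]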
INTERVAL ESTIMATES OF β1, β2, β3, β4 AND σ2
Same as SCLRM
HYPOTHESIS TESTING
In a multiple regression model, there are 4 major types of hypotheses we can test. These are:
1. Hypothesis about an individual parameter (fixed value restriction).
2. Joint hypothesis about two or more individual parameters (joint fixed value
restriction).
3. Linear hypothesis (linear restriction).
4. Nonlinear hypothesis (nonlinear restriction).
Hypothesis about an Individual Parameter
Same as for the SCLRM. The most often tested hypothesis is the hypothesis that the value of a
single parameter is zero. This is the hypothesis that an explanatory variable has no effect on the
dependent variable.
Joint Hypothesis about Two or More Individual Parameters
A joint hypothesis is a hypothesis that the values of two or more individual parameters are jointly
zero. This is a hypothesis about whether two or more explanatory variables have no joint effect
on the dependent variable. The alternative hypothesis is that at least one explanatory variable has
an effect on the dependent variable.
Linear Hypothesis
A linear hypothesis is a hypothesis that one parameter is a linear function of one or more other
parameters.
Nonlinear Hypothesis
A nonlinear hypothesis is a hypothesis that one parameter is a nonlinear function of one or more
other parameters.
HYPOTHESIS ABOUT AN INDIVIDUAL PARAMETER
Testing a hypothesis about an individual parameter is the same as for the SCLRM. We can use 3
alternative approaches. 1) Level of significance approach. 2) Confidence interval approach. 3) P-value approach. The appropriate statistical test is the t-test. The p-value can be interpreted as a
measure of the strength of evidence for an effect.
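As a sketch of the t-test of the hypothesis that a single parameter is zero (here H0: β2 = 0, using the running example and SciPy’s t distribution for the p-value):

from scipy import stats

t_stat = b_hat[1] / se[1]                           # t-statistic for H0: β2 = 0
p_value = 2.0 * stats.t.sf(abs(t_stat), df=n - k)   # two-sided p-value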
JOINT AND LINEAR HYPOTHESES
To test joint and linear hypotheses we will use the level of significance approach. The 5 basic
steps involved in the level of significance approach are the same as for the SCLRM.
Appropriate Statistical Test
To test a joint hypothesis about two or more individual parameters, you use an F-test. To test a
single linear hypothesis, you can use either a t-test or an F-test. In this class, we will use the F-test
since it is the easiest to implement. Note that the author of your book calls an F-test a Wald test.
We will not use this terminology. The t-test and the F-test are small (finite) sample tests. This is
because the test statistic for each of these tests has a known, exact sampling distribution in finite
samples.
F-Test
Like all statistical tests, the F-test has a test statistic that follows a sampling distribution under the
null hypothesis. The test statistic is the F-statistic. The sampling distribution of the F-statistic is
an F distribution.
Test Statistic
The F-statistic can be calculated from information obtained from estimating two models. 1)
Unrestricted Model. 2) Restricted Model. The unrestricted model imposes no restrictions on the
parameters of the model. The restricted model imposes the restriction(s) on the parameters of the
statistical model that define the null-hypothesis. The F-statistic is given by
F = [(RSSR – RSSU) / (DFR – DFU)] / [RSSU / DFU],
where RSSR is the residual sum of squares for the restricted model; RSSU is the residual sum of
squares for the unrestricted model; DFR is the degrees of freedom for the restricted model; DFU is
the degrees of freedom for the unrestricted model. Note the divisor in the numerator, (DFR – DFU),
is always equal to the number of restrictions being tested, denoted J. The divisor in the
denominator is DFU = n – k, where n is the sample size and k is the number of regression
coefficients in the unrestricted model. Therefore, we can write the F-statistic equivalently as
F = [(RSSR – RSSU) / J] / [RSSU / (n – k)].
Sampling Distribution of the Test Statistic
Under the null-hypothesis, the F-statistic has an F-distribution with J degrees of freedom in the
numerator, and n – k degrees of freedom in the denominator. This is because the F-statistic is the
ratio of two random variables. We can write this as follows,
F = [(RSSR – RSSU) / J] / [RSSU / (n – k)]
~ F(J , n – k)
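A sketch of the F-test in the running example: suppose the null hypothesis is the joint hypothesis β3 = β4 = 0 (J = 2 restrictions). The restricted model drops X3 and X4, and the F-statistic compares its residual sum of squares with that of the unrestricted model; the p-value comes from SciPy’s F distribution.

from scipy import stats

# Restricted model: impose β3 = 0 and β4 = 0 by dropping X3 and X4 (J = 2 restrictions)
X_r = np.column_stack([np.ones(n), X2])
b_r = np.linalg.solve(X_r.T @ X_r, X_r.T @ Y)
resid_r = Y - X_r @ b_r

RSS_U = ESS                  # residual sum of squares, unrestricted model
RSS_R = resid_r @ resid_r    # residual sum of squares, restricted model
J = 2

F_stat = ((RSS_R - RSS_U) / J) / (RSS_U / (n - k))
p_value_F = stats.f.sf(F_stat, J, n - k)   # upper-tail p-value from the F(J, n - k) distribution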
Calculating the F-Statistic
When using a statistical package like SAS, there are 3 ways to calculate the actual value of the F-statistic for a sample of data. 1) Derive the restricted model, and tell SAS to estimate the
unrestricted and restricted models. Use the information to calculate the F-statistic. 2) Tell SAS to
derive the restricted model, and estimate the unrestricted and restricted models. Use the
information to calculate the F-statistic. 3) Tell SAS to calculate the F-statistic.
PREDICTION AND GOODNESS OF FIT
To use X2, X3, and X4 to predict Y, we substitute values of these variables into the sample
regression function and calculate the corresponding value of Y. This calculated value is the predicted
value of Y.
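A minimal sketch of this calculation, using the estimates from the running example and assumed values X2 = 4, X3 = 22, X4 = 3:

# Predicted value of Y at hypothetical values of the explanatory variables
x_new = np.array([1.0, 4.0, 22.0, 3.0])   # leading 1 for the constant term
Y_pred = x_new @ b_hat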
Measures of Goodness of Fit
The 3 most often used measures of goodness of fit for the MCLRM are the following. 1) Standard
error of the regression. 2) R2 statistic. 3) Adjusted R2 statistic.
Standard Error of the Regression
Same as SCLRM
R2 Statistic
Same as SCLRM
Adjusted R2 Statistic
To penalize the “fishing expedition” of adding variables to the model to increase R2, economists
most often use a measure of goodness of fit called the adjusted R2. The adjusted R2 is the R2
statistic adjusted for degrees of freedom. The adjusted R2 statistic is
Adjusted R2 = 1 – [RSS / (n – k)] / [TSS / (n – 1)] = 1 – (unexplained variance of Y / total variance of Y)
or equivalently,
Adjusted R2 = 1 – [RSS / TSS][(n – 1) / (n – k)]
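In the running example, R2 and the adjusted R2 for the unrestricted model can be computed as in this sketch (here ESS from the OLS sketch plays the role of RSS in the formulas above):

TSS = np.sum((Y - Y.mean())**2)                      # total sum of squares
R2 = 1.0 - ESS / TSS                                 # R2 statistic
adj_R2 = 1.0 - (ESS / (n - k)) / (TSS / (n - 1))     # adjusted R2 statistic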
When you add an additional variable to the model, this has 2 opposing effects on the Adjusted R2.
1) RSS decreases, (RSS / TSS) decreases, and hence Adjusted R2 increases. This is a measure of
the benefit of adding the extra variable to the model. 2) k increases, n – k decreases, (n – 1) / (n –
k) increases, and hence Adjusted R2 decreases. This is the penalty or a measure of the cost of
adding an additional variable to the model. To conclude, adding an additional variable to the
model can either increase or decrease the Adjusted R2 statistic. The following points should be
noted about the Adjusted R2 statistic.
1. Adding an additional variable to the model can either increase or decrease the Adjusted R2.
2. The Adjusted R2 statistic can never be larger than the R2 statistic for the same model.
3. The Adjusted R2 statistic can be negative. A negative Adjusted R2 statistic indicates that the
statistical model does not adequately describe the economic data generation process.
4. If the t-statistic for a coefficient is one or greater, then dropping the variable from the model
will decrease the adjusted R2 statistic. If the t-statistic is less than one, then dropping the
variable from the model will increase the adjusted R2.
DRAWING CONCLUSIONS FROM THE STUDY
Same as SCLRM