CLASSICAL LINEAR REGRESSION MODEL
INTRODUCTION
The classical linear regression model is a statistical model that describes a data generation
process.
SPECIFICATION
The specification of the classical linear regression model is defined by the following set of
assumptions.
Assumptions
1. The functional form is linear in parameters.
Yt = 1Xt1 + 2Xt2 + … + kXtk + t
2. The error term has mean zero.
E(t) = 0 for t = 1, 2, …, T
3. The error term has constant variance.
Var(t) = E(t2) = 2 for t = 1, 2, …, T
4. The errors are uncorrelated.
Cov(t,s) = E(t  s) = 0 for all t  s
5. The error term has a normal distribution.
εt ~ Normal for t = 1, 2, …, T
6. The error term is uncorrelated with each explanatory variable.
Cov(t,Xti) = E(t  Xti) = 0 for t = 1, 2, …, T and i = 1, 2, …, K
7. The explanatory variables are nonrandom variables.
Classical Linear Regression Model Concisely Stated
The sample of T multivariate observations (Yt, Xt1, Xt2, …, XtK) is generated by a process
described as follows.
Yt = 1Xt1 + 2Xt2 + … + kXtk + t
εt ~ N(0, σ²)
for t = 1, 2, …, T
or alternatively,
Yt ~ N( 1Xt1 + 2Xt2 + … + kXtk , 2) for t = 1, 2, …, T
Classical Linear Regression Model in Matrix Format
The sample of T multivariate observations (Yt, Xt1, Xt2, …, XtK) is generated by a process
described by the following system of T equations.
Observation 1: Y1 = β1X11 + β2X12 + … + βKX1K + ε1
Observation 2: Y2 = β1X21 + β2X22 + … + βKX2K + ε2
…
Observation T: YT = β1XT1 + β2XT2 + … + βKXTK + εT
Note the following. 1) There is one equation for each multivariate observation. 2) The
parameters are constants, and therefore have the same value for each multivariate
observation. 3) The system of T equations can be written equivalently in matrix format as
follows.
y = X + 
y is a Tx1 column vector of observations on the dependent variable. X is a TxK matrix of
observations on the K-1 explanatory variables X2, X3, …, XK. The first column of the matrix X is a
column of 1’s representing the constant (intercept) term. The matrix X is called the data matrix
or the design matrix. β is a Kx1 column vector of the parameters β1, β2, …, βK. ε is a Tx1 column
vector of disturbances (errors).
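As a small illustration, the vector y and the data matrix X can be assembled as follows; the numerical values are made up purely to show the shapes involved.

```python
import numpy as np

# Sketch of the matrix format: y is the vector of T observations on the
# dependent variable, and X is the TxK data (design) matrix whose first
# column is a column of 1's for the intercept. Values are illustrative.
x2 = np.array([1.0, 2.0, 3.0, 4.0])
x3 = np.array([0.5, 1.5, 2.5, 3.5])
y = np.array([3.1, 4.0, 4.8, 6.2])

T = len(y)
X = np.column_stack([np.ones(T), x2, x3])   # T x K, here K = 3

print(X.shape)   # (4, 3): one row per observation, one column per parameter
```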
Assumptions in Matrix Format
1. The functional form is linear in parameters.
y = X + 
2. The mean vector of disturbances is a Tx1 null vector.
E() = 0
3. The disturbances are spherical. (The variance-covariance matrix of the disturbances is the TxT
diagonal matrix σ²I, with the same variance σ² in every diagonal position and zero covariances
off the diagonal.)
Cov(ε) = E(εεT) = σ²I
where superscript T denotes transpose and I is a TxT identity matrix.
4. The disturbance vector has a multivariate normal distribution.
ε ~ Multivariate Normal
5. The disturbance vector is uncorrelated with the data matrix.
Cov (,X) = 0
6. The data matrix is a nonstochastic matrix.
Classical Linear Regression Model Concisely Stated in Matrix Format
The sample of T multivariate observations (Yt, Xt1, Xt2, …, XtK) is generated by a process
described as follows.
y = X + ,
ε ~ N(0, σ²I)
or alternatively
y ~ N(X, 2I)
ESTIMATION
For the classical linear regression model, there are K+1 parameters to estimate: the K regression
coefficients β1, β2, …, βK, and the error variance (the conditional variance of Y) σ².
Choosing an Estimator for β1, β2, …, βK
To obtain estimates of the parameters, you need to choose an estimator. To choose an
estimator, you choose an estimation procedure. You then apply the estimation procedure to
your statistical model. This yields an estimator. In econometrics, the estimation procedures
used most often are:
1. Least squares estimation procedure
2. Maximum likelihood estimation procedure
Least Squares Estimation Procedure
When you apply the least squares estimation procedure to the classical linear regression model,
you get the ordinary least squares (OLS) estimator. The least squares estimation procedure
tells you to choose as your estimates of the unknown parameters those values that minimize
the residual sum of squares function for the sample of data. For the classical linear regression
model, the residual sum of squares function is
RSS(1^, 2^ … k^) = (Yt - 1^ - 2^ X12 - … - k^ X1k)2
Or in matrix format,
RSS( ^) = (y - X ^)T(y - X^)
The first-order necessary conditions for a minimum are
XTX^ = XTy
These are called the normal equations. If the inverse of the KxK matrix XTX exists, then you can
find the solution vector β^. The solution vector is given by
β^ = (XTX)-1XTy
where ^ is a Kx1 column vector of estimates for the K-parameters of the model. This formula
is the OLS estimator. It is a rule that tells you how to use the sample of data to obtain
estimates of the population parameters.
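A minimal numerical sketch of the OLS formula is given below; the data are simulated and the variable names are illustrative. Solving the normal equations directly (rather than forming the inverse of XTX explicitly) computes the same estimates and is numerically more stable.

```python
import numpy as np

# Minimal OLS sketch: solve the normal equations (X'X) b = X'y.
# The simulated data and parameter values are illustrative only.
rng = np.random.default_rng(1)
T = 200
X = np.column_stack([np.ones(T), rng.uniform(0, 10, T)])   # intercept + one regressor
beta_true = np.array([2.0, 1.5])
y = X @ beta_true + rng.normal(0, 0.5, T)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # OLS estimates
print(beta_hat)                                # close to [2.0, 1.5]
```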
Maximum Likelihood Estimation Procedure
When you apply the maximum likelihood estimation procedure to the classical linear regression
model, you get the maximum likelihood estimator. The maximum likelihood estimation
procedure tells you to choose as your estimates of the unknown parameters those values that
maximize the likelihood function for the sample of data. For the classical linear regression
model, the maximum likelihood estimator of the regression coefficients is
β^ = (XTX)-1XTy
Thus, for the classical linear regression model the maximum likelihood estimator is the same as
the OLS estimator.
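The equivalence can be checked numerically. The sketch below maximizes the normal log-likelihood with a general-purpose optimizer and compares the result with the OLS formula; the data, starting values, and optimizer choice are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

# Sketch: maximize the normal likelihood numerically and compare the
# coefficient estimates with OLS. Data and settings are illustrative.
rng = np.random.default_rng(2)
T = 200
X = np.column_stack([np.ones(T), rng.uniform(0, 10, T)])
y = X @ np.array([2.0, 1.5]) + rng.normal(0, 0.5, T)

def neg_log_likelihood(params):
    beta, log_sigma = params[:-1], params[-1]
    sigma2 = np.exp(2 * log_sigma)              # keeps the variance positive
    resid = y - X @ beta
    return 0.5 * T * np.log(2 * np.pi * sigma2) + resid @ resid / (2 * sigma2)

result = minimize(neg_log_likelihood, x0=np.zeros(3), method="BFGS")
beta_mle = result.x[:2]
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_mle, beta_ols)   # the two coefficient vectors agree (up to optimizer tolerance)
```

Note that while the maximum likelihood and OLS estimators of the regression coefficients coincide, the maximum likelihood estimator of the error variance is RSS/T, which differs from the degrees-of-freedom-corrected estimator RSS/(T – K) discussed next.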
Choosing an Estimator for σ²
To obtain an estimate of the error variance, the preferred estimator is
σ²^ = RSS / (T – K)
where RSS is the residual sum of squares, T is the number of observations, and K is the number
of estimated regression coefficients.
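A short sketch of this estimator, using simulated data for illustration:

```python
import numpy as np

# Sketch of the error-variance estimator: RSS divided by the degrees of
# freedom T - K. The simulated data below are illustrative (true variance 0.25).
rng = np.random.default_rng(3)
T = 200
X = np.column_stack([np.ones(T), rng.uniform(0, 10, T)])
y = X @ np.array([2.0, 1.5]) + rng.normal(0, 0.5, T)

K = X.shape[1]
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
RSS = resid @ resid
sigma2_hat = RSS / (T - K)
print(sigma2_hat)   # close to the true error variance 0.25
```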
Properties of the OLS Estimator
1. Linear Estimator
The OLS estimator is a linear estimator; that is, it is a linear function of the observations on the
dependent variable.
2. Sampling Distribution of the OLS Estimator
The OLS estimator has a multivariate normal sampling distribution.
3. Mean of the OLS Estimator
The mean vector of the OLS estimator gives the mean of the sampling distribution of the
estimator for each of the K parameters. To derive the mean vector of the OLS estimator, you
need to make two assumptions:
1. The error term has mean zero.
2. The error term is uncorrelated with each explanatory variable.
If these two assumptions are satisfied, then it can be shown that the mean vector of the OLS
estimator is
E(^) = 
That is, the mean vector of the OLS estimator is equal to the true values of the population
parameters being estimated. This tells us that for the classical linear regression model the OLS
estimator is an unbiased estimator.
4. Variance-Covariance Matrix of Estimates
The variance-covariance matrix of estimates gives the variances and covariances of the
sampling distributions of the estimators of the K parameters. To derive the variance-covariance matrix of estimates, you need to make four assumptions:
1. The error term has mean zero.
2. The error term is uncorrelated with each explanatory variable.
3. The error term has constant variance.
4. The errors are uncorrelated.
If these four assumptions are satisfied, then it can be shown that the variance-covariance
matrix of estimates is
Cov( ^) = 2(XTX)-1
For the classical linear regression model, it can be shown that each diagonal element (variance)
of the variance-covariance matrix of OLS estimates is less than or equal to the corresponding
element of the variance-covariance matrix of any alternative linear unbiased estimator; therefore,
for the classical linear regression model the OLS estimator is an efficient estimator.
5. Sampling Distribution of the OLS Estimator Written Concisely
^ ~ N(, 2(XTX)-1)
The OLS estimator has a multivariate normal distribution with mean vector  and variancecovariance matrix 2(XTX)-1.
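These properties can be illustrated with a small Monte Carlo sketch: hold the data matrix fixed, redraw the errors many times, and look at the simulated sampling distribution of the OLS estimates. The design, parameter values, and number of replications are illustrative choices.

```python
import numpy as np

# Monte Carlo sketch of the sampling distribution of the OLS estimator.
# X is held fixed; the errors (and hence y) are redrawn in each replication.
rng = np.random.default_rng(4)
T, sigma = 100, 0.5
beta = np.array([2.0, 1.5])
X = np.column_stack([np.ones(T), rng.uniform(0, 10, T)])   # fixed design matrix

draws = []
for _ in range(5000):
    y = X @ beta + rng.normal(0, sigma, T)
    draws.append(np.linalg.solve(X.T @ X, X.T @ y))
draws = np.array(draws)

print(draws.mean(axis=0))                      # close to beta (unbiasedness)
print(np.cov(draws.T))                         # close to sigma^2 (X'X)^-1
print(sigma ** 2 * np.linalg.inv(X.T @ X))     # the theoretical variance-covariance matrix
```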
Summary of Small Sample Properties
Gauss-Markov Theorem - For the classical linear regression model, the OLS estimator is the
best linear unbiased estimator (BLUE) of the population parameters.
Summary of Large Sample Properties
For the classical linear regression model, the OLS estimator is asymptotically unbiased,
consistent, and asymptotically efficient.
Estimating the Variance-Covariance Matrix of Estimates
The true variance-covariance matrix of estimates, σ²(XTX)-1, is unknown. This is because the true
error variance σ² is unknown. Therefore, the variance-covariance matrix of estimates must be
estimated using the sample of data. To obtain an estimate of the variance-covariance matrix,
you replace σ² with its estimate σ²^ = RSS / (T – K). This yields the estimated variance-covariance
matrix of estimates
Cov^(β^) = σ²^(XTX)-1
HYPOTHESIS TESTING
The following statistical tests can be used to test hypotheses in the classical linear regression
model.
1. t-test
2. F-test
3. Likelihood ratio test
4. Wald test
5. Lagrange multiplier test
You must choose the appropriate test to test the hypothesis in which you are interested.
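As one illustration, the sketch below carries out two-sided t-tests of the hypothesis that each coefficient is zero, using the OLS estimates and their standard errors; the data are simulated, and scipy is used only to obtain p-values from the t distribution.

```python
import numpy as np
from scipy import stats

# Sketch of t-tests of H0: beta_i = 0 against a two-sided alternative.
# The simulated data are illustrative only.
rng = np.random.default_rng(6)
T = 200
X = np.column_stack([np.ones(T), rng.uniform(0, 10, T)])
y = X @ np.array([2.0, 1.5]) + rng.normal(0, 0.5, T)

K = X.shape[1]
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (T - K)
std_errors = np.sqrt(np.diag(sigma2_hat * np.linalg.inv(X.T @ X)))

t_stats = beta_hat / std_errors                          # test statistics for H0: beta_i = 0
p_values = 2 * stats.t.sf(np.abs(t_stats), df=T - K)     # two-sided p-values
print(t_stats, p_values)
```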
GOODNESS-OF-FIT
If our objective is to use the explanatory variable(s) to predict the dependent variable, then we should
measure the goodness of fit of the model. Goodness-of-fit refers to how well the model fits the sample
data. The better the model fits the data, the higher the predictive validity of the model, and therefore
the better values of X should predict values of Y. The statistical measure that is used most often to
measure the accuracy of a classical linear regression model is the R-squared (R2) statistic.
R-Squared Statistic
The coefficient of determination measures the proportion of the variation in the dependent variable
that is explained by the variation in the explanatory variables. It can take any value between 0 and 1. If
the R2 statistic is equal to zero, then the explanatory variables explain none of the variation in the
dependent variable. If the R2 is equal to one, then the explanatory variables explain all of the variation
in the dependent variable. The R2 statistic is a measure of goodness of fit. This is because it measures
how well the sample regression line fits the data. If the R2 is equal to one, then all of the data points lie
on the sample regression line. If the R2 is equal to zero, then the data points are highly scattered around
the regression line, which is a horizontal line. The higher the R2 statistic, the better the explanatory
variables explain the dependent variable, and using this criterion the better the model.
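A short numerical sketch of the R2 statistic, with simulated data for illustration; it also verifies that R2 equals the squared correlation between the actual and fitted values of Y when the model contains a constant.

```python
import numpy as np

# Sketch of the R-squared statistic: R^2 = 1 - RSS/TSS. In a model with a
# constant this equals the squared correlation between Y and the fitted values.
rng = np.random.default_rng(7)
T = 200
X = np.column_stack([np.ones(T), rng.uniform(0, 10, T)])
y = X @ np.array([2.0, 1.5]) + rng.normal(0, 2.0, T)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_fit = X @ beta_hat
RSS = np.sum((y - y_fit) ** 2)          # residual (unexplained) sum of squares
TSS = np.sum((y - y.mean()) ** 2)       # total sum of squares

r_squared = 1 - RSS / TSS
r_squared_alt = np.corrcoef(y, y_fit)[0, 1] ** 2
print(r_squared, r_squared_alt)         # the two values coincide
```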
Important Points About the Coefficient of Determination
1. The R2 can take any value between zero and one.
2. The closer the data points to the sample regression line, the better the line fits the data. The better
the line fits the data, the lower the residual sum of squares and the higher the R2 statistic.
3. If R2 =1, then all the data points lie on the sample regression line.
4. If R2 = 0, then the regression line is horizontal at the sample mean of Y. In this case, the simple mean
of Y predicts Y as well as the conditional mean of Y.
5. The R2 statistic can be computed by finding the correlation coefficient between the actual values
of the dependent variable (Yt) and the corresponding fitted values (Yt^), and squaring this
correlation coefficient. This is true regardless of the number of explanatory variables in the model.
6. The OLS estimator fits a line to the data that minimizes the residual sum of squares. By doing this,
the OLS estimator fits a line to the data that minimizes the unexplained variation in Y, and therefore
maximizes the explained variation in Y. Thus, the OLS estimator fits a line to the data that
maximizes the R2 statistic.
7. There is no rule-of-thumb to decide whether the R2 statistic is high or low. When a model is
estimated with time-series data, the R2 statistic is usually high. This is because with time-series
data, the variables tend to have underlying trends that make them highly correlated. When a model
is estimated with cross-section data, the R2 statistic is usually lower. Therefore, an R2 statistic of 0.5
may be considered relatively low for a model estimated with time-series data, and relatively high for
a model estimated with cross-section data.
8. Peter Kennedy says, “In general econometricians are interested in obtaining good parameter
estimates, where “good” is not defined in terms of R2. Consequently, the measure of R2 is not of
much importance in econometrics. Unfortunately, many practitioners act as though it is important,
for reasons that are not entirely clear.” Cramer states, “Measures of goodness of fit have a fatal
attraction. Although it is generally conceded among insiders that they do not mean a thing, high
values are still a source of pride and satisfaction to their authors, however hard they may try to
conceal these feelings.”
Major Shortcoming of the R-Squared Statistic
The R2 statistic has a major deficiency. When you add additional independent variables to the model,
the R2 cannot decrease and will most likely increase. Thus, it may be tempting to engage in a “fishing
expedition” to increase the R2.
Adjusted R-Squared Statistic
To penalize the “fishing expedition” for variables that increase the R2, economists most often use a
measure called the adjusted R2. The adjusted R2 is the R2 statistic adjusted for degrees of freedom. The
following points should be noted about the Adjusted R2 statistic.
1. Adding an additional variable to the model can either increase or decrease the Adjusted R2.
2. The Adjusted R2 statistic can never be larger than the R2 statistic for the same model.
3. The Adjusted R2 statistic can be negative. A negative Adjusted R2 statistic indicates that the
statistical model does not adequately describe the economic data generation process.
4. If the t-statistic for a coefficient is one or greater, then dropping the variable from the model
will decrease the adjusted R2 statistic. If the t-statistic is less than one, then dropping the
variable from the model will increase the adjusted R2.
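The degrees-of-freedom adjustment can be sketched as follows; the data are simulated for illustration, and the adjusted R2 is computed from the usual formula 1 – (1 – R2)(T – 1)/(T – K).

```python
import numpy as np

# Sketch of the adjusted R-squared: the R-squared penalized for the number
# of estimated parameters. Simulated data are illustrative only.
rng = np.random.default_rng(8)
T = 200
X = np.column_stack([np.ones(T), rng.uniform(0, 10, T)])
y = X @ np.array([2.0, 1.5]) + rng.normal(0, 2.0, T)

K = X.shape[1]
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
r_squared = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

adj_r_squared = 1 - (1 - r_squared) * (T - 1) / (T - K)
print(r_squared, adj_r_squared)   # the adjusted R^2 is never larger than R^2
```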
R2 Statistic When the Model Does Not Include a Constant
If the statistical model does not include a constant, then the R2 does not measure the proportion of the
variation in the dependent variable explained by the explanatory variables, and therefore should not be
used.
PREDICTION
Oftentimes, the objective of an empirical study is to make predictions about the dependent variable.
This is also called forecasting. In general, the better the model fits the sample data, the better it will
predict the dependent variable. Said another way, the larger the amount of variation in the dependent
variable that the model explains, the better it will predict the dependent variable. Thus, if your
objective is prediction, then you would place more emphasis on the R2 statistic. This is because the
higher the R2 statistic, the greater the predictive ability of the model over the sample observations.
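A minimal prediction sketch: apply the estimated coefficients to new values of the explanatory variables. The estimation sample and the new regressor values are made up for illustration.

```python
import numpy as np

# Sketch of prediction with the estimated model: y_new^ = X_new * beta_hat.
# The estimation data and the new X values are illustrative only.
rng = np.random.default_rng(9)
T = 200
X = np.column_stack([np.ones(T), rng.uniform(0, 10, T)])
y = X @ np.array([2.0, 1.5]) + rng.normal(0, 0.5, T)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

x2_new = np.array([3.0, 7.5])                             # new values of the explanatory variable
X_new = np.column_stack([np.ones(len(x2_new)), x2_new])   # include the intercept column
y_pred = X_new @ beta_hat                                 # point predictions of the dependent variable
print(y_pred)
```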