CLASSICAL LINEAR REGRESSION MODEL

INTRODUCTION

The classical linear regression model is a statistical model that describes a data generation process.

SPECIFICATION

The specification of the classical linear regression model is defined by the following set of assumptions.

Assumptions
1. The functional form is linear in parameters.
   Yt = β1Xt1 + β2Xt2 + … + βkXtk + εt
2. The error term has mean zero.
   E(εt) = 0 for t = 1, 2, …, T
3. The error term has constant variance.
   Var(εt) = E(εt²) = σ² for t = 1, 2, …, T
4. The errors are uncorrelated.
   Cov(εt, εs) = E(εt εs) = 0 for all t ≠ s
5. The error term has a normal distribution.
   εt ~ N(0, σ²) for t = 1, 2, …, T
6. The error term is uncorrelated with each explanatory variable.
   Cov(εt, Xti) = E(εt Xti) = 0 for t = 1, 2, …, T and i = 1, 2, …, K
7. The explanatory variables are nonrandom variables.

Classical Linear Regression Model Concisely Stated
The sample of T multivariate observations (Yt, Xt1, Xt2, …, Xtk) is generated by a process described as follows.
   Yt = β1Xt1 + β2Xt2 + … + βkXtk + εt,   εt ~ N(0, σ²) for t = 1, 2, …, T
or alternatively,
   Yt ~ N(β1Xt1 + β2Xt2 + … + βkXtk, σ²) for t = 1, 2, …, T

Classical Linear Regression Model in Matrix Format
The sample of T multivariate observations (Yt, Xt1, Xt2, …, Xtk) is generated by a process described by the following system of T equations.
   Observation 1:  Y1 = β1X11 + β2X12 + … + βkX1k + ε1
   Observation 2:  Y2 = β1X21 + β2X22 + … + βkX2k + ε2
   …
   Observation T:  YT = β1XT1 + β2XT2 + … + βkXTk + εT
Note the following.
1) There is one equation for each multivariate observation.
2) The parameters are constants, and therefore have the same value for each multivariate observation.
3) The system of T equations can be written equivalently in matrix format as follows.
   y = Xβ + ε
y is a Tx1 column vector of observations on the dependent variable. X is a TxK matrix of observations on the K-1 explanatory variables X2, X3, …, Xk. The first column of the matrix X is a column of 1's representing the constant (intercept) term. The matrix X is called the data matrix or the design matrix. β is a Kx1 column vector of parameters β1, β2, …, βk. ε is a Tx1 column vector of disturbances (errors).

Assumptions in Matrix Format
1. The functional form is linear in parameters.
   y = Xβ + ε
2. The mean vector of disturbances is a Tx1 null vector.
   E(ε) = 0
3. The disturbances are spherical. (The variance-covariance matrix of disturbances is a TxT diagonal matrix with a constant on the diagonal.)
   Cov(ε) = E(εεᵀ) = σ²I
   where superscript ᵀ denotes transpose and I is a TxT identity matrix.
4. The disturbance vector has a multivariate normal distribution.
   ε ~ N(0, σ²I)
5. The disturbance vector is uncorrelated with the data matrix.
   Cov(ε, X) = 0
6. The data matrix is a nonstochastic matrix.

Classical Linear Regression Model Concisely Stated in Matrix Format
The sample of T multivariate observations (Yt, Xt1, Xt2, …, Xtk) is generated by a process described as follows.
   y = Xβ + ε,   ε ~ N(0, σ²I)
or alternatively,
   y ~ N(Xβ, σ²I)
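To make the matrix-form specification concrete, here is a minimal sketch in Python (with NumPy) of how one sample from this data generation process might be simulated. The code is not part of the original notes; the sample size, regressors, and parameter values are hypothetical choices made only for illustration.

```python
import numpy as np

# Minimal sketch (illustration only): simulate one sample from
# y = X*beta + epsilon, epsilon ~ N(0, sigma^2 * I).
rng = np.random.default_rng(0)

T, K = 50, 3                          # T observations, K regression coefficients (hypothetical)
beta = np.array([2.0, 0.5, -1.0])     # hypothetical true parameter vector
sigma = 1.5                           # hypothetical true error standard deviation

# Data (design) matrix: first column of 1's for the constant term,
# remaining K-1 columns are regressors held fixed for the simulation
X = np.column_stack([np.ones(T), rng.normal(size=(T, K - 1))])

# Spherical disturbances: independent draws from N(0, sigma^2)
eps = rng.normal(loc=0.0, scale=sigma, size=T)

# Dependent variable generated by the classical linear regression model
y = X @ beta + eps
```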
ESTIMATION

For the classical linear regression model, there are K+1 parameters to estimate: the K regression coefficients β1, β2, …, βk, and the error variance (conditional variance of Y) σ².

Choosing an Estimator for β1, β2, …, βk
To obtain estimates of the parameters, you need to choose an estimator. To choose an estimator, you choose an estimation procedure. You then apply the estimation procedure to your statistical model. This yields an estimator. In econometrics, the estimation procedures used most often are:
1. Least squares estimation procedure
2. Maximum likelihood estimation procedure

Least Squares Estimation Procedure
When you apply the least squares estimation procedure to the classical linear regression model, you get the ordinary least squares (OLS) estimator. The least squares estimation procedure tells you to choose as your estimates of the unknown parameters those values that minimize the residual sum of squares function for the sample of data. For the classical linear regression model, the residual sum of squares function is
   RSS(β1^, β2^, …, βk^) = Σt (Yt - β1^ - β2^Xt2 - … - βk^Xtk)²
or in matrix format,
   RSS(β^) = (y - Xβ^)ᵀ(y - Xβ^)
The first-order necessary conditions for a minimum are
   XᵀXβ^ = Xᵀy
These are called the normal equations. If the inverse of the KxK matrix XᵀX exists, then you can find the solution vector β^. The solution vector is given by
   β^ = (XᵀX)⁻¹Xᵀy
where β^ is a Kx1 column vector of estimates for the K parameters of the model. This formula is the OLS estimator. It is a rule that tells you how to use the sample of data to obtain estimates of the population parameters.

Maximum Likelihood Estimation Procedure
When you apply the maximum likelihood estimation procedure to the classical linear regression model, you get the maximum likelihood estimator. The maximum likelihood estimation procedure tells you to choose as your estimates of the unknown parameters those values that maximize the likelihood function for the sample of data. For the classical linear regression model, the maximum likelihood estimator of β is
   β^ = (XᵀX)⁻¹Xᵀy
Thus, for the classical linear regression model the maximum likelihood estimator is the same as the OLS estimator.

Choosing an Estimator for σ²
To obtain an estimate of the error variance, the preferred estimator is
   σ²^ = RSS / (T - K) = (y - Xβ^)ᵀ(y - Xβ^) / (T - K)
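Continuing the hypothetical simulated sample above, the following sketch shows how the OLS estimates and the error-variance estimate might be computed. It solves the normal equations directly rather than forming (XᵀX)⁻¹ explicitly, which yields the same β^ but is numerically more stable.

```python
import numpy as np

# Minimal sketch: OLS estimates from the normal equations X'X b = X'y,
# and the preferred estimator of the error variance, RSS / (T - K).
# X, y, T, K are the hypothetical objects defined in the previous sketch.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # same result as (X'X)^(-1) X'y

residuals = y - X @ beta_hat                   # e = y - X*beta_hat
RSS = residuals @ residuals                    # residual sum of squares e'e
sigma2_hat = RSS / (T - K)                     # estimate of sigma^2
```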
Properties of the OLS Estimator
1. Linear Estimator
The OLS estimator is a linear estimator.
2. Sampling Distribution of the OLS Estimator
The OLS estimator has a multivariate normal sampling distribution.
3. Mean of the OLS Estimator
The mean vector of the OLS estimator gives the mean of the sampling distribution of the estimator for each of the K parameters. To derive the mean vector of the OLS estimator, you need to make two assumptions:
   1. The error term has mean zero.
   2. The error term is uncorrelated with each explanatory variable.
If these two assumptions are satisfied, then it can be shown that the mean vector of the OLS estimator is
   E(β^) = β
That is, the mean vector of the OLS estimator is equal to the true values of the population parameters being estimated. This tells us that for the classical linear regression model the OLS estimator is an unbiased estimator.
4. Variance-Covariance Matrix of Estimates
The variance-covariance matrix of estimates gives the variances and covariances of the sampling distributions of the estimators of the K parameters. To derive the variance-covariance matrix of estimates, you need to make four assumptions:
   1. The error term has mean zero.
   2. The error term is uncorrelated with each explanatory variable.
   3. The error term has constant variance.
   4. The errors are uncorrelated.
If these four assumptions are satisfied, then it can be shown that the variance-covariance matrix of estimates is
   Cov(β^) = σ²(XᵀX)⁻¹
For the classical linear regression model, it can be shown that the diagonal elements (the variances) of the variance-covariance matrix of OLS estimates are less than or equal to the corresponding elements of the variance-covariance matrix for any alternative linear unbiased estimator; therefore, for the classical linear regression model the OLS estimator is an efficient estimator.
5. Sampling Distribution of the OLS Estimator Written Concisely
   β^ ~ N(β, σ²(XᵀX)⁻¹)
The OLS estimator has a multivariate normal distribution with mean vector β and variance-covariance matrix σ²(XᵀX)⁻¹.

Summary of Small Sample Properties
Gauss-Markov Theorem: For the classical linear regression model, the OLS estimator is the best linear unbiased estimator (BLUE) of the population parameters.

Summary of Large Sample Properties
For the classical linear regression model, the OLS estimator is asymptotically unbiased, consistent, and asymptotically efficient.

Estimating the Variance-Covariance Matrix of Estimates
The true variance-covariance matrix of estimates, σ²(XᵀX)⁻¹, is unknown. This is because the true error variance σ² is unknown. Therefore, the variance-covariance matrix of estimates must be estimated using the sample of data. To obtain an estimate of the variance-covariance matrix, you replace σ² with its estimate σ²^ = RSS / (T - K). This yields the estimated variance-covariance matrix of estimates
   Cov^(β^) = σ²^(XᵀX)⁻¹

HYPOTHESIS TESTING

The following statistical tests can be used to test hypotheses in the classical linear regression model.
1. t-test
2. F-test
3. Likelihood ratio test
4. Wald test
5. Lagrange multiplier test
You must choose the appropriate test for the hypothesis in which you are interested.

GOODNESS-OF-FIT

If our objective is to use the explanatory variable(s) to predict the dependent variable, then we should measure the goodness of fit of the model. Goodness of fit refers to how well the model fits the sample data. The better the model fits the data, the higher the predictive validity of the model, and therefore the better values of X should predict values of Y. The statistical measure that is used most often to measure the accuracy of a classical linear regression model is the R-squared (R²) statistic.

R-Squared Statistic
The coefficient of determination measures the proportion of the variation in the dependent variable that is explained by the variation in the explanatory variables. It can take any value between 0 and 1. If the R² statistic is equal to zero, then the explanatory variables explain none of the variation in the dependent variable. If the R² is equal to one, then the explanatory variables explain all of the variation in the dependent variable. The R² statistic is a measure of goodness of fit because it measures how well the sample regression line fits the data. If the R² is equal to one, then all of the data points lie on the sample regression line. If the R² is equal to zero, then the data points are highly scattered around the regression line, which is a horizontal line. The higher the R² statistic, the better the explanatory variables explain the dependent variable, and by this criterion the better the model.
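Continuing the same hypothetical example, here is a short sketch of how the estimated variance-covariance matrix, the standard errors, the t-statistics, and the R² statistic might be computed from the quantities defined in the earlier sketches.

```python
import numpy as np

# Minimal sketch, using X, y, beta_hat, RSS, sigma2_hat from the sketches above.
XtX_inv = np.linalg.inv(X.T @ X)
cov_beta_hat = sigma2_hat * XtX_inv            # Cov^(beta^) = sigma^2^ (X'X)^(-1)
std_errors = np.sqrt(np.diag(cov_beta_hat))    # square roots of the diagonal elements
t_stats = beta_hat / std_errors                # t-statistics for H0: beta_i = 0

TSS = np.sum((y - y.mean()) ** 2)              # total variation in Y about its mean
R2 = 1.0 - RSS / TSS                           # proportion of variation explained
```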
Important Points About the Coefficient of Determination
1. The R² can take any value between zero and one.
2. The closer the data points are to the sample regression line, the better the line fits the data. The better the line fits the data, the lower the residual sum of squares and the higher the R² statistic.
3. If R² = 1, then all the data points lie on the sample regression line.
4. If R² = 0, then the regression line is horizontal at the sample mean of Y. In this case, the simple mean of Y predicts Y as well as the conditional mean of Y.
5. The R² statistic can be computed by finding the correlation coefficient between the actual values of the dependent variable (Yt) and the corresponding fitted values (Yt^), and squaring this correlation coefficient. This is true regardless of the number of explanatory variables in the model.
6. The OLS estimator fits a line to the data that minimizes the residual sum of squares. By doing this, the OLS estimator fits a line to the data that minimizes the unexplained variation in Y, and therefore maximizes the explained variation in Y. Thus, the OLS estimator fits a line to the data that maximizes the R² statistic.
7. There is no rule of thumb for deciding whether the R² statistic is high or low. When a model is estimated with time-series data, the R² statistic is usually high. This is because with time-series data, the variables tend to have underlying trends that make them highly correlated. When a model is estimated with cross-section data, the R² statistic is usually lower. Therefore, an R² statistic of 0.5 may be considered relatively low for a model estimated with time-series data, and relatively high for a model estimated with cross-section data.
8. Peter Kennedy says, "In general econometricians are interested in obtaining good parameter estimates, where 'good' is not defined in terms of R². Consequently, the measure of R² is not of much importance in econometrics. Unfortunately, many practitioners act as though it is important, for reasons that are not entirely clear." Cramer states, "Measures of goodness of fit have a fatal attraction. Although it is generally conceded among insiders that they do not mean a thing, high values are still a source of pride and satisfaction to their authors, however hard they may try to conceal these feelings."

Major Shortcoming of the R-Squared Statistic
The R² statistic has a major deficiency. When you add additional independent variables to the model, the R² cannot decrease and will most likely increase. Thus, it may be tempting to engage in a "fishing expedition" to increase the R².

Adjusted R-Squared Statistic
To penalize such "fishing expeditions" for variables that increase the R², economists most often use a measure called the adjusted R². The adjusted R² is the R² statistic adjusted for degrees of freedom. The following points should be noted about the adjusted R² statistic.
1. Adding an additional variable to the model can either increase or decrease the adjusted R².
2. The adjusted R² statistic can never be larger than the R² statistic for the same model.
3. The adjusted R² statistic can be negative. A negative adjusted R² statistic indicates that the statistical model does not adequately describe the economic data generation process.
4. If the t-statistic for a coefficient is one or greater, then dropping the variable from the model will decrease the adjusted R² statistic. If the t-statistic is less than one, then dropping the variable from the model will increase the adjusted R².
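As a rough sketch, the adjusted R² can be computed with the usual degrees-of-freedom correction. The formula below is the standard one for a model that includes a constant; it is assumed here rather than taken from the notes above.

```python
# Minimal sketch: adjusted R-squared for the hypothetical example above.
# A new regressor raises this measure only if it improves the fit enough
# to offset the degree of freedom it uses up.
adj_R2 = 1.0 - (1.0 - R2) * (T - 1) / (T - K)
```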
R² Statistic When the Model Does Not Include a Constant
If the statistical model does not include a constant, then the R² statistic does not measure the proportion of the variation in the dependent variable explained by the explanatory variables, and therefore should not be used.

PREDICTION

Often, the objective of an empirical study is to make predictions about the dependent variable. This is also called forecasting. In general, the better the model fits the sample data, the better it will predict the dependent variable. Said another way, the larger the amount of variation in the dependent variable that the model explains, the better it will predict the dependent variable. Thus, if your objective is prediction, then you would place more emphasis on the R² statistic. This is because the higher the R² statistic, the greater the predictive ability of the model over the sample observations.
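As a final sketch under the same hypothetical setup, predictions for new observations are obtained by applying the fitted coefficients to a new data matrix with the same column layout as X, including the leading column of 1's for the constant term.

```python
import numpy as np

# Minimal sketch: predictions for a few hypothetical new observations,
# using beta_hat and K from the earlier sketches.
rng_new = np.random.default_rng(1)
X_new = np.column_stack([np.ones(5), rng_new.normal(size=(5, K - 1))])
y_pred = X_new @ beta_hat                      # y^ = X_new * beta^
```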