MULTIPLE CLASSICAL LINEAR REGRESSION MODEL

The multiple classical linear regression model (MCLRM) is an extension of the SCLRM. It allows for more than one explanatory variable.

ADVANTAGES OF THE MCLRM

The MCLRM has 3 major advantages.
1. We can control for observable confounding variables.
2. We can estimate the independent causal effects of two or more explanatory variables.
3. We can include additional variables to increase the predictive ability of the model.

SPECIFICATION OF THE MODEL

Once again, the statistical relationship between two or more economic variables can be written in general functional form as Yt = ƒ(Xt1, Xt2, …, Xtk) + μt. For the MCLRM, we let Xt1 = 1 for each unit, so Xt1 plays the role of a constant. The MCLRM therefore allows the economic relationship to have a dependent variable (Y) and k – 1 explanatory variables (Xt2, …, Xtk). All factors other than Xt2, …, Xtk that affect Y are included in the error term, μ. To specify the MCLRM, we make three types of assumptions.
1) Variables.
2) Functional form.
3) Error term.

VARIABLES

The MCLRM assumes that all variables that have an important effect on Y are included in the model, and all variables that are unimportant are excluded from the model. The unimportant variables are represented by the error term.

Possible Violations of Assumption
This assumption is violated if one or more variables other than Xt2, …, Xtk have an important effect on Y, or if one or more of the included variables Xt2, …, Xtk are unimportant. If this assumption is violated, then the MCLRM may not be a reasonable approximation of the true data generation process.

FUNCTIONAL FORM

The MCLRM assumes that the conditional mean function (population regression function) is linear. Assuming there are 3 explanatory variables (so k = 4, with Xt1 = 1 playing the role of the constant), we have E(Y|X1,X2,X3,X4) = ƒ(X1,X2,X3,X4) = β1 + β2X2 + β3X3 + β4X4. The parameters are β1, β2, β3, β4. We are assuming that the average value of Y is a linear function of the X’s in the population.

Interpretation of Parameters
The parameter β1 is the intercept of the regression surface. It measures the average value of Y when the X’s equal zero. This parameter is usually not of primary interest. The parameters β2, β3, β4 are the slope parameters. They are given by βi = ΔE(Y|X1,X2,X3,X4) / ΔXi, holding all other X’s constant, for i = 2, 3, 4. βi measures the change in the average value of Y when Xi changes by one unit, holding all other X’s constant. It is the marginal effect of Xi on Y. The slope parameters are of primary interest. The 3 important questions we want to address can be restated as follows (the restatement in terms of βi is given in parentheses).
1) Does Xi affect Y? (Is βi zero or nonzero?)
2) What is the direction of the effect? (Is the algebraic sign of βi positive or negative?)
3) What is the size of the effect? (What is the magnitude of βi?)

Elasticity
Suppose we want to address the following question: “Which explanatory variable (X2, X3, …, Xk) has the biggest effect on Y?” To answer this question, we need a unit-free measure of marginal effect. This allows us to compare the marginal effects of two or more variables that are measured in different units. In economics, the unit-free measure of marginal effect used most often is elasticity. The elasticity of Y with respect to Xi is defined as εYXi = %ΔY / %ΔXi, holding all other X’s constant, for i = 2, 3, 4. Applying this definition to the conditional mean function, and noting that %ΔY / %ΔXi = (ΔY / ΔXi)(Xi / Y), we have εYXi = βi (Xi / Y), holding all other X’s constant, for i = 2, 3, 4.
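To make the elasticity formula concrete, here is a minimal sketch in Python (the class itself uses SAS; the numbers for βi, Xi, and Y below are purely hypothetical) that evaluates εYXi = βi (Xi / Y) at a chosen point.

```python
# Hypothetical values: beta_i is the slope on X_i from the conditional mean function;
# Xi and Y are the values at which the elasticity is evaluated (often the sample means).
beta_i = 0.8
Xi, Y = 12.0, 48.0

elasticity = beta_i * (Xi / Y)   # % change in average Y for a 1% change in Xi, other X's held constant
print(elasticity)                # 0.2: a 1% increase in Xi raises average Y by about 0.2%
```

Because the elasticity depends on the point (Xi, Y) at which it is evaluated, it is usually reported at the sample means, as in the estimator given later in these notes.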
Possible Violations of Assumption
If the assumption of linearity is violated, then the MCLRM may not be a reasonable approximation of the true data generation process.

ERROR TERM

The statistical relationship between Yt and the Xt’s is given by Yt = β1 + β2Xt2 + β3Xt3 + β4Xt4 + μt. The error term is a random variable that represents the “net effect” of all factors other than the Xt’s that affect Yt for the tth unit in the population. By definition, the error term measures the deviation between Yt and the conditional mean of Yt for the tth unit. That is, μt = Yt – (β1 + β2Xt2 + β3Xt3 + β4Xt4).

Assumptions
We describe the behavior of the random variable μ by a conditional probability distribution ƒ(μ|X1,X2,X3,X4). For each set of values of the X’s there is a probability distribution for μ. The assumptions about the error term for the MCLRM are the same as for the SCLRM, extended to two or more X’s.
1. Error term has mean zero: E(μt|Xt1,Xt2,Xt3,Xt4) = 0.
2. Error term is uncorrelated with each explanatory variable: Cov(μt, Xti) = 0 for i = 2, 3, 4.
3. Error term has constant variance: Var(μt|Xt1,Xt2,Xt3,Xt4) = σ².
4. Errors are independent: Cov(μt, μs) = 0 for t ≠ s.
5. Error term has a normal distribution: μt ~ N.
The possible violations of these assumptions are similar to those we discussed for the SCLRM.

MCLRM CONCISELY STATED

The MCLRM can be written concisely in any of the following 4 ways.
1) Yt = β1 + β2Xt2 + β3Xt3 + β4Xt4 + μt, with E(μt|Xt1,Xt2,Xt3,Xt4) = 0, Var(μt|Xt1,Xt2,Xt3,Xt4) = σ², Cov(μt, μs) = 0, and μt ~ N.
2) Yt = β1 + β2Xt2 + β3Xt3 + β4Xt4 + μt, with E(Yt|Xt1,Xt2,Xt3,Xt4) = β1 + β2Xt2 + β3Xt3 + β4Xt4, Var(Yt|Xt1,Xt2,Xt3,Xt4) = σ², Cov(Yt, Ys) = 0, and Yt ~ N.
3) Yt = β1 + β2Xt2 + β3Xt3 + β4Xt4 + μt, with μt ~ iid N(0, σ²).
4) Yt ~ iid N(β1 + β2Xt2 + β3Xt3 + β4Xt4, σ²).

ESTIMATION

ESTIMATOR FOR β1, β2, β3, β4

To obtain estimates of the regression coefficients β1, β2, β3, β4, we use the ordinary least squares (OLS) estimator. The ordinary least squares estimator tells us to choose as our estimates of β1, β2, β3, β4 the numbers that minimize the residual sum of squares for the sample.

Residual Sum of Squares Function
The residual sum of squares function is given by ESS(β1^, β2^, β3^, β4^) = ∑(Yt – β1^ – β2^Xt2 – β3^Xt3 – β4^Xt4)².

Deriving the OLS Estimators for β1, β2, β3, β4
To derive the OLS estimators for β1, β2, β3, β4, we find the values of β1^, β2^, β3^, β4^ that minimize the function ESS. This is an unconstrained minimization problem in 4 unknowns. To find the values that minimize this function, we find the 4 partial derivative functions and set them equal to zero. This yields a system of 4 equations in 4 unknowns, called the normal equations. Solving this system of 4 equations in 4 unknowns yields 4 solution expressions, one for each unknown parameter. These equations are the OLS estimators β1^, β2^, β3^, β4^. They are a set of rules that tell us how to use the data to obtain estimates of the 4 regression coefficients.

Sampling Distributions for β1^, β2^, β3^, β4^
Given the assumptions of the MCLRM, the sampling distribution for βi^ is given by βi^ ~ N(Mean, Variance), where
Mean = E(βi^) = βi
Variance = Var(βi^) = σ² / [ ∑(Xti – XiBAR)² (1 – Ri²) ]
Standard Error = s.e.(βi^) = √{ σ² / [ ∑(Xti – XiBAR)² (1 – Ri²) ] }
This variance formula applies to β2^, β3^, β4^; the variance for β1^ is somewhat different. Ri² is the R² statistic from a regression of the explanatory variable Xi on all other explanatory variables in the model. It measures the degree of correlation among the explanatory variables.
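The variance formula above can be checked numerically. The following is a minimal sketch in Python with numpy (not part of the course, which uses SAS; the data are simulated and the variable names are illustrative): the auxiliary R² for X2 is obtained by regressing X2 on the other explanatory variables, and the formula is compared with the corresponding element of the OLS variance/covariance matrix, σ²(X′X)⁻¹, a matrix expression used here only for the check.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x2 = rng.normal(10, 2, n)
x3 = 0.5 * x2 + rng.normal(0, 1, n)      # deliberately correlated with x2
x4 = rng.normal(5, 1, n)
sigma2 = 4.0                             # true error variance (assumed known for this check)
y = 1.0 + 0.8 * x2 - 0.3 * x3 + 0.5 * x4 + rng.normal(0, np.sqrt(sigma2), n)

X = np.column_stack([np.ones(n), x2, x3, x4])
V = sigma2 * np.linalg.inv(X.T @ X)      # variance/covariance matrix of the OLS estimators

# Formula route for beta2^: sigma^2 / [ sum((x2 - x2bar)^2) * (1 - R2^2) ],
# where R2^2 comes from regressing x2 on the other explanatory variables.
Z = np.column_stack([np.ones(n), x3, x4])
x2_fit = Z @ np.linalg.lstsq(Z, x2, rcond=None)[0]
R2_aux = 1 - np.sum((x2 - x2_fit) ** 2) / np.sum((x2 - x2.mean()) ** 2)
var_beta2 = sigma2 / (np.sum((x2 - x2.mean()) ** 2) * (1 - R2_aux))

print(V[1, 1], var_beta2)                # the two numbers agree
```

The factor 1 / (1 – Ri²) shows how correlation among the explanatory variables inflates the variance of the slope estimators.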
Variance/Covariance Matrix of Estimates
The variance/covariance matrix summarizes all variances and covariances for the OLS estimators β1^, β2^, β3^, β4^.

Small Sample Properties
Given the assumptions of the MCLRM, the OLS estimators are unbiased and efficient in small samples.

Large Sample Properties
Given the assumptions of the MCLRM, the OLS estimators are consistent in large samples.

Conclusions about the OLS Estimators
If the assumptions of the MCLRM are satisfied, and therefore the MCLRM is a reasonable approximation of the true data generation process, then the OLS estimators are the best estimators. This is because we can’t find an alternative estimator that produces more accurate and reliable estimates than OLS; that is, estimates that will consistently come closer to the true values of the population parameters. However, the following caveats must be noted.

Best May Not Be Very Good
The OLS estimator will be more reliable than any other estimator. However, it may still not be very reliable. The smaller the variation in Xi, the higher the correlation between Xi and the other X’s, and the larger the error variance, the bigger the variance of the sampling distribution of βi^. The bigger the variance of the sampling distribution of βi^, the less reliable the OLS estimator, and therefore the less precise the estimate.

Bias in the OLS Estimator
The OLS estimator will be biased if the error term does not have mean zero. The error term will not have mean zero if it is correlated with an explanatory variable. The error term will be correlated with an explanatory variable if there are omitted confounding variables, reverse causation, sample selection problems, or measurement error in the explanatory variable.

ESTIMATOR FOR σ²

The estimator for the error variance is the same as for the SCLRM: σ²^ = ESS / df = ESS / (n – k).

Standard Error of the Regression
The standard error of the regression is the square root of the estimate of the error variance.

ESTIMATORS FOR THE VARIANCES, COVARIANCES, AND STANDARD ERRORS OF THE OLS ESTIMATORS

The estimators for the variances and standard errors of the OLS estimators β2^, β3^, β4^ are
Var(βi^)^ = σ²^ / [ ∑(Xti – XiBAR)² (1 – Ri²) ]   for i = 2, 3, 4
s.e.(βi^)^ = √{ σ²^ / [ ∑(Xti – XiBAR)² (1 – Ri²) ] }   for i = 2, 3, 4
There are similar formulas for the estimators Var(β1^)^, s.e.(β1^)^, and the covariances.

ESTIMATOR FOR ELASTICITY

The estimator for the average elasticity of Y with respect to Xi is εYXi^ = βi^ (XiBAR / YBAR), where βi^ is the OLS estimator of βi, XiBAR is the sample mean of Xi, and YBAR is the sample mean of Y. The estimated standard error of the elasticity estimate is s.e.(εYXi^)^ = (XiBAR / YBAR) · s.e.(βi^)^, where s.e.(βi^)^ is the estimated standard error of the estimate βi^. (See the sketch below.)

INTERVAL ESTIMATES FOR β1, β2, β3, β4 AND σ²

Same as the SCLRM.
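As a minimal sketch of these estimators (Python with statsmodels rather than the SAS used in class; the data are simulated and all numbers are illustrative only), the code below fits a three-regressor model by OLS and then computes σ²^, the standard error of the regression, and the elasticity of Y with respect to X2 evaluated at the sample means, together with its estimated standard error.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
x2, x3, x4 = rng.normal(10, 2, n), rng.normal(5, 1, n), rng.normal(3, 1, n)
y = 1 + 0.8 * x2 - 0.3 * x3 + 0.5 * x4 + rng.normal(0, 2, n)

X = sm.add_constant(np.column_stack([x2, x3, x4]))   # column 0 is the constant
fit = sm.OLS(y, X).fit()

sigma2_hat = fit.ssr / fit.df_resid          # ESS / (n - k)
ser = np.sqrt(sigma2_hat)                    # standard error of the regression

beta2_hat, se_beta2_hat = fit.params[1], fit.bse[1]
elasticity_hat = beta2_hat * (x2.mean() / y.mean())        # elasticity evaluated at the sample means
se_elasticity_hat = (x2.mean() / y.mean()) * se_beta2_hat  # its estimated standard error

print(sigma2_hat, ser, elasticity_hat, se_elasticity_hat)
```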
HYPOTHESIS TESTING

In a multiple regression model, there are 4 major types of hypotheses we can test. These are:
1. Hypothesis about an individual parameter (fixed value restriction).
2. Joint hypothesis about two or more individual parameters (joint fixed value restriction).
3. Linear hypothesis (linear restriction).
4. Nonlinear hypothesis (nonlinear restriction).

Hypothesis about an Individual Parameter
Same as for the SCLRM. The most often tested hypothesis is the hypothesis that the value of a single parameter is zero. This is the hypothesis that an explanatory variable has no effect on the dependent variable.

Joint Hypothesis about Two or More Individual Parameters
A joint hypothesis is a hypothesis that the values of two or more individual parameters are jointly zero. This is a hypothesis about whether two or more explanatory variables have no joint effect on the dependent variable. The alternative hypothesis is that at least one of these explanatory variables has an effect on the dependent variable.

Linear Hypothesis
A linear hypothesis is a hypothesis that one parameter is a linear function of one or more other parameters.

Nonlinear Hypothesis
A nonlinear hypothesis is a hypothesis that one parameter is a nonlinear function of one or more other parameters.

HYPOTHESIS ABOUT AN INDIVIDUAL PARAMETER

Testing a hypothesis about an individual parameter is the same as for the SCLRM. We can use 3 alternative approaches.
1) Level of significance approach.
2) Confidence interval approach.
3) P-value approach.
The appropriate statistical test is the t-test. The p-value can be interpreted as a measure of the strength of evidence for an effect.

JOINT AND LINEAR HYPOTHESES

To test joint and linear hypotheses we will use the level of significance approach. The 5 basic steps involved in the level of significance approach are the same as for the SCLRM.

Appropriate Statistical Test
To test a joint hypothesis about two or more individual parameters, you use an F-test. To test a single linear hypothesis, you can use either a t-test or an F-test. In this class, we will use the F-test since it is the easiest to implement. Note that the author of your book calls an F-test a Wald test. We will not use this terminology. The t-test and the F-test are small (finite) sample tests. This is because the test statistic for each of these tests has a known, exact sampling distribution in finite samples.

F-Test
Like all statistical tests, the F-test has a test statistic that follows a sampling distribution under the null hypothesis. The test statistic is the F-statistic. The sampling distribution of the F-statistic is an F distribution.

Test Statistic
The F-statistic can be calculated from information obtained from estimating two models.
1) Unrestricted model.
2) Restricted model.
The unrestricted model imposes no restrictions on the parameters of the model. The restricted model imposes on the parameters the restriction(s) that define the null hypothesis. The F-statistic is given by
F = [(RSSR – RSSU) / (DFR – DFU)] / [RSSU / DFU],
where RSSR is the residual sum of squares for the restricted model; RSSU is the residual sum of squares for the unrestricted model; DFR is the degrees of freedom for the restricted model; DFU is the degrees of freedom for the unrestricted model. Note that the divisor in the numerator, DFR – DFU, is always equal to the number of restrictions being tested, denoted J. The divisor in the denominator is DFU = n – k, where n is the sample size and k is the number of regression coefficients in the unrestricted model. Therefore, we can write the F-statistic equivalently as F = [(RSSR – RSSU) / J] / [RSSU / (n – k)].

Sampling Distribution of the Test Statistic
Under the null hypothesis, the F-statistic has an F-distribution with J degrees of freedom in the numerator and n – k degrees of freedom in the denominator. This is because the F-statistic is the ratio of two independent chi-square random variables, each divided by its degrees of freedom. We can write this as follows,
F = [(RSSR – RSSU) / J] / [RSSU / (n – k)] ~ F(J, n – k).
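Here is a minimal sketch of the F-test (Python with statsmodels rather than SAS; simulated data, illustrative variable names) for the joint null hypothesis β3 = β4 = 0: the restricted model drops X3 and X4, the F-statistic is built from the two residual sums of squares exactly as in the formula above, and the result is checked against statsmodels' own restricted-versus-unrestricted comparison.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 100
x2, x3, x4 = rng.normal(size=(3, n))
y = 1 + 0.5 * x2 + rng.normal(size=n)                   # true model: X3 and X4 have no effect

X_u = sm.add_constant(np.column_stack([x2, x3, x4]))    # unrestricted model
X_r = sm.add_constant(x2)                               # restricted model: beta3 = beta4 = 0 imposed

fit_u = sm.OLS(y, X_u).fit()
fit_r = sm.OLS(y, X_r).fit()

J = fit_r.df_resid - fit_u.df_resid                     # number of restrictions (here 2)
F = ((fit_r.ssr - fit_u.ssr) / J) / (fit_u.ssr / fit_u.df_resid)

print(F)
print(fit_u.compare_f_test(fit_r))                      # (F-statistic, p-value, J) computed by statsmodels
```

Under the null hypothesis the statistic follows F(J, n – k); a large F (small p-value) leads us to reject the hypothesis that X3 and X4 have no joint effect on Y.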
Calculating the F-Statistic
When using a statistical package like SAS, there are 3 ways to calculate the actual value of the F-statistic for a sample of data.
1) Derive the restricted model yourself, and tell SAS to estimate the unrestricted and restricted models. Use this information to calculate the F-statistic.
2) Tell SAS to derive the restricted model and estimate the unrestricted and restricted models. Use this information to calculate the F-statistic.
3) Tell SAS to calculate the F-statistic.

PREDICTION AND GOODNESS OF FIT

To use X2, X3, and X4 to predict Y, we substitute values of these variables into the sample regression function and calculate the corresponding value of Y. This value is the predicted value of Y.

Measures of Goodness of Fit
The 3 most often used measures of goodness of fit for the MCLRM are the following.
1) Standard error of the regression.
2) R² statistic.
3) Adjusted R² statistic.

Standard Error of the Regression
Same as the SCLRM.

R² Statistic
Same as the SCLRM.

Adjusted R² Statistic
To penalize the “fishing expedition” of adding variables to the model to increase R², economists most often use a measure of goodness of fit called the adjusted R². The adjusted R² is the R² statistic adjusted for degrees of freedom. The adjusted R² statistic is
Adjusted R² = 1 – [RSS / (n – k)] / [TSS / (n – 1)] = 1 – (estimated unexplained variance of Y / estimated total variance of Y),
or equivalently,
Adjusted R² = 1 – [RSS / TSS][(n – 1) / (n – k)].
When you add an additional variable to the model, this has 2 opposing effects on the adjusted R².
1) RSS decreases, so RSS / TSS decreases, and hence the adjusted R² increases. This is a measure of the benefit of adding the extra variable to the model.
2) k increases, so n – k decreases, (n – 1) / (n – k) increases, and hence the adjusted R² decreases. This is the penalty, or a measure of the cost, of adding an additional variable to the model.
To conclude, adding an additional variable to the model can either increase or decrease the adjusted R² statistic. The following points should be noted about the adjusted R² statistic (a short numerical illustration appears at the end of these notes).
1. Adding an additional variable to the model can either increase or decrease the adjusted R².
2. The adjusted R² statistic can never be larger than the R² statistic for the same model.
3. The adjusted R² statistic can be negative. A negative adjusted R² statistic indicates that the statistical model does not adequately describe the economic data generation process.
4. If the t-statistic for a coefficient is greater than one in absolute value, then dropping the variable from the model will decrease the adjusted R² statistic. If the t-statistic is less than one in absolute value, then dropping the variable from the model will increase the adjusted R².

DRAWING CONCLUSIONS FROM THE STUDY

Same as the SCLRM.
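The following minimal sketch (Python with statsmodels rather than SAS; simulated data, illustrative names) illustrates the adjusted R² trade-off described above: adding a variable that is unrelated to Y always raises R² a little, but it can lower the adjusted R², and the hand-computed formula matches the value reported by the package.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 50
x2 = rng.normal(size=n)
x_noise = rng.normal(size=n)             # an irrelevant variable, unrelated to y
y = 2 + 1.5 * x2 + rng.normal(size=n)

for X in (sm.add_constant(x2),                                   # model without the irrelevant variable
          sm.add_constant(np.column_stack([x2, x_noise]))):      # model with it added
    fit = sm.OLS(y, X).fit()
    n_obs, k = X.shape
    adj = 1 - (fit.ssr / (n_obs - k)) / (fit.centered_tss / (n_obs - 1))   # 1 - [RSS/(n-k)] / [TSS/(n-1)]
    print(fit.rsquared, fit.rsquared_adj, adj)
    # R2 always rises when a variable is added; the adjusted R2 rises only if
    # the drop in RSS (the benefit) outweighs the loss of a degree of freedom (the penalty).
```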