CHAPTER 6
Multiple Regression

1. The Model
2. The Least Squares Method of Determining Sample Regression Coefficients
3. Using Matrix Algebra to Solve the System of Least Squares Normal Equations
4. Standard Error of Estimate
5. Sampling Properties of the Least Squares Estimators
   5.1. The Variances and Covariances of the Least Squares Estimators
6. Inferences About the Population Regression Parameters
   6.1. Interval Estimates for the Coefficients of the Regression
   6.2. Confidence Interval for the Mean Value of y for Given Values of xᵢ
   6.3. Prediction Interval for the Individual Value of y for Given Values of xᵢ
   6.4. Interval Estimation for a Linear Combination of Coefficients
   6.5. Hypothesis Testing for a Single Coefficient
        6.5.1. Two-Tail Test of Significance
        6.5.2. One-Tailed Tests
               6.5.2.1. Example 1: Testing for Price Elasticity of Demand
               6.5.2.2. Example 2: Testing for the Effectiveness of Advertising
               6.5.2.3. Example 3: Testing a Linear Combination of Coefficients
7. Extension of the Regression Model
   7.1. The Optimal Level of Advertising
   7.2. Interaction Variables
   7.3. Log-Linear Model
8. Measuring Goodness of Fit — R²
   8.1. Adjusted R²

1. The Model

In multiple regression models we may include two or more independent variables. We will start with a model that includes two independent variables, x₂ and x₃. The main features of, and assumptions regarding, the simple regression model extend to multiple regression models. Here the dependent variable is influenced by variations in the independent variables through the three parameters β₁, β₂, and β₃. The disturbance term u is still present, because variations in y continue to be influenced by the random component u. Thus,

    yᵢ = β₁ + β₂x₂ᵢ + β₃x₃ᵢ + uᵢ

Given that E(uᵢ) = 0, the following holds (dropping the i subscript):

    E(y) = β₁ + β₂x₂ + β₃x₃

The sample regression equation is

    y = b₁ + b₂x₂ + b₃x₃ + e

To estimate the parameters from sample data we need formulas for the estimators of the population parameters. The estimators are the sample regression coefficients b₁, b₂, and b₃.

2. The Least Squares Method of Determining Sample Regression Coefficients

We want to find values for b₁, b₂, and b₃ such that the sum of squared deviations of the observed y from the fitted plane is minimized. The deviation takes the same familiar form as in the simple regression model:

    e = y − ŷ

where ŷ is the predicted value, which lies on the regression plane:

    ŷ = b₁ + b₂x₂ + b₃x₃

Substituting for ŷ in the residual equation above, squaring both sides, and summing over all i, we have

    e = y − b₁ − b₂x₂ − b₃x₃
    Σe² = Σ(y − b₁ − b₂x₂ − b₃x₃)²

Taking three partial derivatives, one for each coefficient, and setting them equal to zero, we obtain three normal equations:

    ∂Σe²/∂b₁ = −2Σ(y − b₁ − b₂x₂ − b₃x₃) = 0
    ∂Σe²/∂b₂ = −2Σx₂(y − b₁ − b₂x₂ − b₃x₃) = 0
    ∂Σe²/∂b₃ = −2Σx₃(y − b₁ − b₂x₂ − b₃x₃) = 0

The normal equations are:

    Σy − nb₁ − b₂Σx₂ − b₃Σx₃ = 0
    Σx₂y − b₁Σx₂ − b₂Σx₂² − b₃Σx₂x₃ = 0
    Σx₃y − b₁Σx₃ − b₂Σx₂x₃ − b₃Σx₃² = 0

To find the solutions for the three b's, we can rewrite the normal equations by taking the terms not involving the b's to the right-hand side:

    nb₁ + (Σx₂)b₂ + (Σx₃)b₃ = Σy
    (Σx₂)b₁ + (Σx₂²)b₂ + (Σx₂x₃)b₃ = Σx₂y
    (Σx₃)b₁ + (Σx₂x₃)b₂ + (Σx₃²)b₃ = Σx₃y

Here we have a system of three equations in three unknowns: b₁, b₂, and b₃.
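As a quick numeric illustration, the normal equations can be assembled and solved directly. The following is a minimal Python sketch with made-up data (the arrays y, x2, x3 are hypothetical, not the chapter's data set); it also cross-checks the answer against NumPy's built-in least squares routine.

    import numpy as np

    # Hypothetical sample data (for illustration only)
    y  = np.array([10.0, 12.0, 9.0, 15.0, 13.0])
    x2 = np.array([2.0, 1.5, 2.5, 1.0, 1.2])
    x3 = np.array([3.0, 4.0, 2.0, 5.0, 4.5])
    n  = len(y)

    # Left-hand-side matrix and right-hand-side vector of the normal equations
    X = np.array([[n,         x2.sum(),      x3.sum()],
                  [x2.sum(), (x2**2).sum(), (x2*x3).sum()],
                  [x3.sum(), (x2*x3).sum(), (x3**2).sum()]])
    w = np.array([y.sum(), (x2*y).sum(), (x3*y).sum()])

    b = np.linalg.solve(X, w)          # b1, b2, b3 from the normal equations

    # Cross-check: least squares on the design matrix gives the same b
    D = np.column_stack([np.ones(n), x2, x3])
    b_ols, *_ = np.linalg.lstsq(D, y, rcond=None)
    print(b, b_ols)                    # identical up to rounding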
Now we need a method for finding the solution for these unknowns. For this, a brief introduction to matrix algebra is called for. (See the appendix at the end of this chapter.)

3. Using Matrix Algebra to Solve the System of Least Squares Normal Equations

We can write the system of normal equations in matrix format:

    [ n      Σx₂      Σx₃   ] [b₁]   [ Σy   ]
    [ Σx₂    Σx₂²     Σx₂x₃ ] [b₂] = [ Σx₂y ]
    [ Σx₃    Σx₂x₃    Σx₃²  ] [b₃]   [ Σx₃y ]

with the shorthand notation

    Xb = w

where

    X = [ n      Σx₂      Σx₃   ]      b = [b₁]      w = [ Σy   ]
        [ Σx₂    Σx₂²     Σx₂x₃ ]          [b₂]          [ Σx₂y ]
        [ Σx₃    Σx₂x₃    Σx₃²  ]          [b₃]          [ Σx₃y ]

The solution for b is obtained by premultiplying w by the inverse of X:

    b = X⁻¹w

Example: We want to obtain a regression of monthly SALES (y), in $1,000s, on PRICE (x₂), in dollars, and ADVERT (x₃), in $1,000s, for a fast food restaurant. The data are contained in the Excel file "CH6 DATA" in the worksheet tab "burger".

Use Excel to compute the values of the elements of X and w:

    X = [  75       426.5      138.3  ]      w = [  5803.1 ]
        [ 426.5    2445.707    787.381]          [ 32847.7 ]
        [ 138.3     787.381    306.21 ]          [ 10789.6 ]

Thus our system of normal equations can be written as:

    [  75       426.5      138.3  ] [b₁]   [  5803.1 ]
    [ 426.5    2445.707    787.381] [b₂] = [ 32847.7 ]
    [ 138.3     787.381    306.21 ] [b₃]   [ 10789.6 ]

Since b = X⁻¹w, we need to find X⁻¹. Knowing what the inverse of a matrix means and how to find it, we can have Excel do the hard work for us. In Excel, use the array-type function =MINVERSE(). First highlight a block of 3 × 3 cells, then call up =MINVERSE(). When it asks for "array", "lasso" the block of cells containing the elements of the X matrix and press the Ctrl-Shift-Enter keys together. The result is the matrix X⁻¹:

     1.689828   -0.28462   -0.03135
    -0.28462     0.050314  -0.00083
    -0.03135    -0.00083    0.019551

Then, premultiplying w by X⁻¹, we have

    [b₁]   [ 1.689828   -0.28462   -0.03135 ] [  5803.1 ]   [ 118.9136  ]
    [b₂] = [-0.28462     0.050314  -0.00083 ] [ 32847.7 ] = [  -7.90785 ]
    [b₃]   [-0.03135    -0.00083    0.019551] [ 10789.6 ]   [   1.862584]

    ŷ = 118.9136 − 7.90785x₂ + 1.862584x₃
    SALES-hat = 118.914 − 7.908·PRICE + 1.863·ADVERT

b₂ = −7.908 implies that for each dollar increase in price (advertising held constant) monthly sales would fall by $7,908; equivalently, a 10¢ increase in price would result in a decrease in monthly sales of $790.80. b₃ = 1.863 implies that (holding price constant) each additional $1,000 of advertising would increase sales by $1,863.

The following Excel regression output shows the estimated coefficients.

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.669521
R Square            0.448258
Adjusted R Square   0.432932
Standard Error      4.886124
Observations        75

ANOVA
             df    SS           MS          F           Significance F
Regression    2    1396.5389    698.26946   29.247859   5.04086E-10
Residual     72    1718.9429     23.874207
Total        74    3115.4819

            Coefficients   Standard Error   t Stat     P-value   Lower 95%    Upper 95%
Intercept   118.91361      6.35164          18.72172   0.00000   106.25185    131.57537
PRICE        -7.90785      1.09599          -7.21524   0.00000   -10.09268     -5.72303
ADVERT        1.86258      0.68320           2.72628   0.00804     0.50066      3.22451
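The same arithmetic is easy to reproduce outside Excel. Here is a minimal NumPy sketch using the X and w values computed above; np.linalg.solve plays the role of =MINVERSE() followed by the matrix product.

    import numpy as np

    # Normal-equations matrix X and right-hand side w from the burger data
    X = np.array([[ 75.0,    426.5,    138.3  ],
                  [426.5,   2445.707,  787.381],
                  [138.3,    787.381,  306.21 ]])
    w = np.array([5803.1, 32847.7, 10789.6])

    print(np.linalg.inv(X))        # matches Excel's =MINVERSE() output
    b = np.linalg.solve(X, w)      # equivalent to X^(-1) w, but more stable
    print(b)                       # [118.91..., -7.907..., 1.862...]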
4. Standard Error of Estimate

In simple regression we learned that the disturbance term u for each given x is normally distributed about the population regression line, with expected value E(u) = 0 and variance var(u). The unbiased estimator of var(u), denoted var(e), was obtained from the least squares residuals e = y − ŷ:

    var(e) = Σe²/(n − 2) = Σ(y − ŷ)²/(n − 2)

The square root of the estimated variance was called the standard error of estimate, se(e). Here, with two independent variables, the least squares residuals take the same form:

    e = y − ŷ = y − (b₁ + b₂x₂ + b₃x₃)

Thus the estimated variance of the disturbance or error term is

    var(e) = Σe²/(n − k) = Σ(y − ŷ)²/(n − k)

where k is the number of parameters of the regression being estimated. Here k = 3, for estimating β₁, β₂, and β₃.

Given the estimated regression equation in the last example,

    ŷ = 118.9136 − 7.90785x₂ + 1.862584x₃

the predicted value of SALES (ŷ) for price x₂ = $6.20 and advertising x₃ = $3.0 (observation #10) is

    ŷ = 118.9136 − 7.90785(6.2) + 1.862584(3) = $75.473 (thousand)

The observed value of sales is y₁₀ = $76.4. Thus the residual is e = 76.4 − 75.473 = 0.927. Computing such residuals for all 75 observations and finding the sum of squared residuals, the estimated variance and the standard error of estimate are, respectively,

    var(e) = Σ(y − ŷ)²/(n − k) = 1718.943/72 = 23.874
    se(e) = 4.886

5. Sampling Properties of the Least Squares Estimators

As in simple regression, the least squares estimators — the coefficients of the estimated regression — are sample statistics obtained from a random sample. Thus they are all random variables, each with its own expected value and variance. Again, like the simple regression coefficients, these estimators are BLUE:

•  b₁, b₂, and b₃ are each a linear function of the dependent variable y.
•  The coefficients are unbiased estimators: E(b₁) = β₁, E(b₂) = β₂, and E(b₃) = β₃.
•  var(b₁), var(b₂), and var(b₃) are all minimum variances.

Once again, the linearity assumption is important for inferences about the parameters of the regression. If the disturbance terms u, and correspondingly the y values, are normally distributed about the regression plane, then the coefficients, being linearly related to y, are also normally distributed. If the disturbance terms are not normal, then according to the central limit theorem the coefficients will be approximately normal for large samples.

5.1. The Variances and Covariances of the Least Squares Estimators

Since the least squares estimators are random variables, their variances show how closely or widely they tend to scatter around their respective population parameters. In a multiple regression model with three parameters β₁, β₂, and β₃, the variance of each estimator b₁, b₂, and b₃ is estimated from the sample data. We can determine the variances and covariances of the least squares estimators using elementary matrix algebra. Using the inverse matrix X⁻¹ determined above, we can generate a covariance matrix as follows:

    [ var(b₁)         covar(b₁, b₂)   covar(b₁, b₃) ]
    [ covar(b₁, b₂)   var(b₂)         covar(b₂, b₃) ] = var(e)X⁻¹
    [ covar(b₁, b₃)   covar(b₂, b₃)   var(b₃)       ]

                           [ 1.6898  -0.2846  -0.0313 ]   [ 40.3433   -6.7951   -0.7484 ]
    var(e)X⁻¹ = 23.87421 × [-0.2846   0.0503  -0.0008 ] = [ -6.7951    1.2012   -0.0197 ]
                           [-0.0313  -0.0008   0.0196 ]   [ -0.7484   -0.0197    0.4668 ]

From the covariance matrix on the right-hand side we have the following variances and covariances:

    var(b₁) = 40.3433      covar(b₁, b₂) = −6.7951
    var(b₂) = 1.2012       covar(b₁, b₃) = −0.7484
    var(b₃) = 0.4668       covar(b₂, b₃) = −0.0197
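A short sketch tying these two calculations together — the standard error of estimate and the covariance matrix — with the SSE and X values taken from the text:

    import numpy as np

    X = np.array([[ 75.0,    426.5,    138.3  ],
                  [426.5,   2445.707,  787.381],
                  [138.3,    787.381,  306.21 ]])
    SSE, n, k = 1718.943, 75, 3

    var_e = SSE / (n - k)              # 23.874
    se_e = np.sqrt(var_e)              # 4.886

    V = var_e * np.linalg.inv(X)       # covariance matrix of (b1, b2, b3)
    print(np.round(V, 4))              # diagonal: 40.3433, 1.2012, 0.4668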
6. Inferences About the Population Regression Parameters

To build an interval estimate or perform a hypothesis test about the β's in the population regression, the least squares estimators of these parameters must be normally distributed. Normality is established if the disturbance terms — and correspondingly the y values for the various values of the explanatory variables — are normally distributed. And as long as the least squares estimators are linear functions of y, the estimators are then also normally distributed.

6.1. Interval Estimates for the Coefficients of the Regression

Following the same reasoning as in simple regression, the interval estimate for each β parameter, for a given confidence level 1 − α, takes the following form:

    For β₁:  L, U = b₁ ± t(α/2, df)·se(b₁)
    For β₂:  L, U = b₂ ± t(α/2, df)·se(b₂)
    For β₃:  L, U = b₃ ± t(α/2, df)·se(b₃)

The standard error of each regression coefficient is obtained from the variance-covariance matrix. The 95% interval estimates for the slope coefficients in the above example are based on the following information:

    b₁ = 118.914    se(b₁) = 6.352
    b₂ = −7.908     se(b₂) = 1.096
    b₃ = 1.863      se(b₃) = 0.683
    t(0.025, 72) = 1.993

The 95% interval estimate for the response of SALES to a PRICE change, β₂, is

    L, U = b₂ ± t(0.025, n−3)·se(b₂)
    L, U = −7.908 ± (1.993)(1.096) = [−10.093, −5.723]

This implies that a $1 decrease in price will result in an increase in revenue of somewhere between $5,723 and $10,093.

The 95% interval estimate for the response of SALES to a change in ADVERT, β₃, is

    L, U = b₃ ± t(0.025, n−3)·se(b₃)
    L, U = 1.863 ± (1.993)(0.683) = [0.501, 3.225]

An increase in advertising expenditure of $1,000 would increase sales by between $501 and $3,225. This is a relatively wide interval and therefore does not convey good information about the range of responses of sales to advertising. The wide interval arises from the fact that the sampling variability of the b₃ coefficient is large. One way to reduce this variability is to increase the sample size, but in many cases this is impractical because of limitations on data availability. An alternative is to introduce some kind of non-sample information on the coefficient, to be explained later.

6.2. Confidence Interval for the Mean Value of y for Given Values of xᵢ

The mean value of y for given values of the independent variables is ŷ₀, the predicted value of y for the given values x₀₂, x₀₃, .... The confidence interval for the mean value of y is therefore

    L, U = ŷ₀ ± t(α/2, n−k)·se(ŷ₀)

To determine se(ŷ₀), start with var(ŷ₀). For a model with two independent variables,

    ŷ₀ = b₁ + b₂x₀₂ + b₃x₀₃
    var(ŷ₀) = var(b₁ + b₂x₀₂ + b₃x₀₃)
    var(ŷ₀) = var(b₁) + x₀₂²var(b₂) + x₀₃²var(b₃) + 2x₀₂covar(b₁, b₂) + 2x₀₃covar(b₁, b₃) + 2x₀₂x₀₃covar(b₂, b₃)

To build a 95% interval for the mean value of SALES in the model under consideration,

    SALES-hat = 118.914 − 7.908·PRICE + 1.863·ADVERT

let x₀₂ = PRICE₀ = 5 and x₀₃ = ADVERT₀ = 3. Then

    ŷ₀ = 118.914 − 7.908(5) + 1.863(3) = 84.96

From the covariance matrix,

    [ 40.3433   -6.7951   -0.7484 ]
    [ -6.7951    1.2012   -0.0197 ]
    [ -0.7484   -0.0197    0.4668 ]

we have

    var(ŷ₀) = 40.3433 + (5²)(1.2012) + (3²)(0.4668) + 2(5)(−6.7951) + 2(3)(−0.7484) + 2(5)(3)(−0.0197)
    var(ŷ₀) = 1.5407
    se(ŷ₀) = √1.5407 = 1.241

    L, U = 84.96 ± (1.993)(1.241) = [82.49, 87.44]
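The var(ŷ₀) expression above is a quadratic form c′Vc with c = (1, x₀₂, x₀₃), which makes it a one-liner in NumPy. A minimal sketch with the values from the text (small rounding differences arise because the printed covariance matrix is itself rounded):

    import numpy as np

    V = np.array([[40.3433, -6.7951, -0.7484],
                  [-6.7951,  1.2012, -0.0197],
                  [-0.7484, -0.0197,  0.4668]])
    b = np.array([118.914, -7.908, 1.863])

    c = np.array([1.0, 5.0, 3.0])      # (1, PRICE0, ADVERT0)
    y0_hat = c @ b                     # 84.96
    var_y0 = c @ V @ c                 # ≈ 1.54 (text: 1.5407 with unrounded inputs)
    se_y0 = np.sqrt(var_y0)

    t_crit = 1.993                     # t(0.025, 72)
    print(y0_hat - t_crit*se_y0, y0_hat + t_crit*se_y0)   # ≈ [82.49, 87.44]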
6.3. Prediction Interval for the Individual Value of y for Given Values of xᵢ

For given values x₀ᵢ of the xᵢ, the difference between an individual value y₀ and the mean value ŷ₀ is the error term:

    y₀ − ŷ₀ = e

Therefore,

    var(y₀) = var(e) + var(ŷ₀)

In the above example,

    var(y₀) = 23.8742 + 1.5407 = 25.4149
    se(y₀) = 5.0413
    L, U = 84.96 ± (1.993)(5.0413) = [74.91, 95.01]

6.4. Interval Estimation for a Linear Combination of Coefficients

In the above example, suppose we want to increase advertising expenditure by $800 and reduce the price by $0.40. Build a 95% interval estimate for the change in mean (expected) sales. The change in mean sales is:

    Δŷ = ŷ₁ − ŷ₀
    Δŷ = [b₁ + b₂(x₀₂ − 0.4) + b₃(x₀₃ + 0.8)] − [b₁ + b₂x₀₂ + b₃x₀₃]
    Δŷ = −0.4b₂ + 0.8b₃

The interval estimate is:

    L, U = Δŷ ± t(0.025, df)·se(Δŷ)
    L, U = (−0.4b₂ + 0.8b₃) ± t(0.025, df)·se(−0.4b₂ + 0.8b₃)

    var(−0.4b₂ + 0.8b₃) = 0.4²var(b₂) + 0.8²var(b₃) − 2(0.4)(0.8)covar(b₂, b₃)
    var(−0.4b₂ + 0.8b₃) = 0.4²(1.2012) + 0.8²(0.4668) − 2(0.4)(0.8)(−0.0197)
    var(−0.4b₂ + 0.8b₃) = 0.5036
    se(−0.4b₂ + 0.8b₃) = 0.7096

    Δŷ = −0.4b₂ + 0.8b₃ = −0.4(−7.908) + 0.8(1.8626) = 4.653

    L, U = 4.653 ± 1.993 × 0.7096 = [3.239, 6.068]

We are 95% confident that the mean increase in sales will be between $3,239 and $6,068.

6.5. Hypothesis Testing for a Single Coefficient

6.5.1. Two-Tail Test of Significance

Generally, computer output such as the Excel regression output provides the t test statistic for the test of the null hypothesis H₀: βⱼ = 0 against H₁: βⱼ ≠ 0. If the null hypothesis is not rejected, we conclude that the independent variable xⱼ does not influence y. The test statistic for such a test is

    TS = |t| = |bⱼ/se(bⱼ)|

And since this is a two-tail test, the critical value, for a level of significance α, is CV = t(α/2, n−k). For example,

    H₀: β₂ = 0
    H₁: β₂ ≠ 0

TS = 7.908/1.096 = 7.215, and CV = t(0.025, 72) = 1.993. Since TS > CV, we reject the null hypothesis and conclude that revenue is related to price. Also note that the probability value for the test statistic is 2 × P(t > 7.215) ≈ 0. Since this is less than α = 0.05, we reject the null.

The test for β₃ is as follows:

    H₀: β₃ = 0
    H₁: β₃ ≠ 0

TS = 1.863/0.683 = 2.726 and CV = 1.993. Since TS > CV, we reject the null hypothesis and conclude that revenue is related to advertising expenditure. The probability value for the test statistic is 2 × P(t > 2.726) = 0.0080. All these figures appear in the Excel regression output shown above.
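A compact sketch of the two calculations just shown — the linear-combination interval and the two-tail t tests — using scipy.stats for the critical value and p-values (input values from the text):

    import numpy as np
    from scipy import stats

    df = 72
    var_b2, var_b3, cov_b23 = 1.2012, 0.4668, -0.0197
    b2, b3, se_b2, se_b3 = -7.908, 1.8626, 1.096, 0.683

    # 95% CI for -0.4*b2 + 0.8*b3 (price cut of $0.40, advertising up $800)
    d_yhat = -0.4*b2 + 0.8*b3                                    # 4.653
    var_d = 0.4**2*var_b2 + 0.8**2*var_b3 + 2*(-0.4)*0.8*cov_b23
    se_d = np.sqrt(var_d)                                        # 0.7096
    t_crit = stats.t.ppf(0.975, df)                              # 1.993
    print(d_yhat - t_crit*se_d, d_yhat + t_crit*se_d)            # [3.239, 6.068]

    # Two-tail significance tests for beta2 and beta3
    for b, se in [(b2, se_b2), (b3, se_b3)]:
        t_stat = b / se
        p = 2 * stats.t.sf(abs(t_stat), df)
        print(t_stat, p)    # -7.215, p ≈ 0.000 ; 2.727, p ≈ 0.008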
6.5.2. One-Tailed Tests

We will use the current example to illustrate one-tailed tests of hypothesis in regression.

6.5.2.1. Example 1: Testing for Price Elasticity of Demand

Suppose we want to test the hypothesis that demand is price inelastic against the alternative hypothesis that demand is price elastic. According to the total-revenue test of elasticity, if demand is inelastic, then a decrease in price leads to a decrease in revenue; if demand is elastic, a decrease in price leads to an increase in revenue.

    H₀: β₂ ≥ 0  (when demand is price inelastic, price and revenue change in the same direction)
    H₁: β₂ < 0  (when demand is price elastic, price and revenue change in opposite directions)

Note that here we are giving the benefit of the doubt to "inelasticity": if demand is elastic, we want strong evidence of it. Also note that the regression result already provides some evidence that demand is elastic (b₂ = −7.908 < 0). But is this evidence significant? The test statistic is

    TS = (b₂ − (β₂)₀)/se(b₂)

Since under the null hypothesis (β₂)₀ is equal to (or greater than) zero, the test statistic simplifies to TS = b₂/se(b₂). However, since this is a one-tailed test, the critical value is CV = t(α, n−k).

    |TS| = 7.908/1.096 = 7.215 > CV = t(0.05, 72) = 1.666

This leads us to reject the null hypothesis and conclude that there is strong evidence that β₂ is negative, and hence that demand is elastic.

6.5.2.2. Example 2: Testing for the Effectiveness of Advertising

We can also perform a test of the effectiveness of advertising. Does an increase in advertising expenditure bring an increase in total revenue above the amount spent on advertising? That is, is ∂y/∂x₃ > 1? Again, the sample regression provides some evidence of this: b₃ = 1.863 > 1. But we want to determine whether this evidence is significant. Thus,

    H₀: β₃ ≤ 1
    H₁: β₃ > 1

The test statistic is

    t = (b₃ − (β₃)₀)/se(b₃) = (1.8626 − 1)/0.683 = 1.263

Since t = 1.263 < CV = t(0.05, 72) = 1.666, do not reject the null hypothesis. The p-value for the test is P(t > 1.263) = 0.1053, which exceeds α = 0.05. The test does not show that advertising is effective; that is, β₃ is not significantly greater than 1.

6.5.2.3. Example 3: Testing a Linear Combination of Coefficients

Test the hypothesis that dropping the price by $0.20 will be more effective for increasing sales revenue than increasing advertising expenditure by $500; that is, −0.2β₂ > 0.5β₃. Note that the regression model provides some evidence of this: −0.2b₂ = 1.582 > 0.5b₃ = 0.931. The test is to determine whether this difference is significant. Therefore, the null and alternative hypotheses are:

    H₀: −0.2β₂ − 0.5β₃ ≤ 0
    H₁: −0.2β₂ − 0.5β₃ > 0

The test statistic is

    t = [−0.2b₂ − 0.5b₃ − (−0.2β₂ − 0.5β₃)₀] / se(−0.2b₂ − 0.5b₃)

which, under the null, reduces to

    TS = (−0.2b₂ − 0.5b₃)/se(−0.2b₂ − 0.5b₃)

The problem here is to find the standard error of the linear combination of the two coefficients in the denominator. For that, first determine var(−0.2b₂ − 0.5b₃):

    var(−0.2b₂ − 0.5b₃) = (−0.2)²var(b₂) + (−0.5)²var(b₃) + 2(−0.2)(−0.5)covar(b₂, b₃)

From the covariance matrix above, we obtain the following:

    var(−0.2b₂ − 0.5b₃) = (0.2)²(1.2012) + (0.5)²(0.4668) + 2(0.2)(0.5)(−0.0197)
    var(−0.2b₂ − 0.5b₃) = 0.1608
    se(−0.2b₂ − 0.5b₃) = 0.4010

Then,

    t = [(−0.2)(−7.9079) − (0.5)(1.8626)]/0.4010 = 1.622

    CV = t(0.05, 72) = 1.666

Since t = 1.622 < CV = 1.666, do not reject the null hypothesis. The p-value for the test is P(t > 1.622) = 0.055, which exceeds α = 0.05. The test does not establish that −0.2β₂ > 0.5β₃.
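The three one-tailed tests, collected in one short scipy sketch (test statistics and p-values as computed above):

    import numpy as np
    from scipy import stats

    df = 72
    cv = stats.t.ppf(0.95, df)                       # 1.666

    # Example 1: H1: beta2 < 0 (left tail)
    t1 = -7.908 / 1.096                              # -7.215
    p1 = stats.t.cdf(t1, df)                         # ≈ 0: reject H0

    # Example 2: H1: beta3 > 1
    t2 = (1.8626 - 1) / 0.683                        # 1.263
    p2 = stats.t.sf(t2, df)                          # 0.105: do not reject

    # Example 3: H1: -0.2*beta2 - 0.5*beta3 > 0
    var_c = 0.2**2*1.2012 + 0.5**2*0.4668 + 2*0.2*0.5*(-0.0197)
    t3 = (-0.2*(-7.9079) - 0.5*1.8626) / np.sqrt(var_c)   # 1.622
    p3 = stats.t.sf(t3, df)                          # 0.055: do not reject
    print(cv, t1, p1, t2, p2, t3, p3)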
7. Extension of the Regression Model

We can extend a regression model by transforming the existing independent variables and treating the transformed terms as new variables. Let's use the current sales-price-advertising model. We want to address the fact that sales do not rise indefinitely, and at a constant rate, in response to increases in advertising expenditure. As expenditure on advertising rises, revenue may rise at a decreasing (rather than a constant) rate, implying diminishing returns to advertising expenditure. To take the impact of diminishing returns into account, we include the squared value of advertising, x₃², in the model:

    y = β₁ + β₂x₂ + β₃x₃ + β₄x₃² + u
    E(y) = β₁ + β₂x₂ + β₃x₃ + β₄x₃²

Thus,

    ∂E(y)/∂x₃ = β₃ + 2β₄x₃

We expect revenue to increase with each additional unit increase in advertising expenditure; therefore β₃ > 0. We also expect the rate of increase in revenue to decrease with each additional unit increase in advertising; therefore β₄ < 0.

Having noted the characteristics of the extended model, we can treat x₃² as a new variable x₄. Using the same data, the Excel regression output is shown below.

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.7129061
R Square            0.5082352
Adjusted R Square   0.4874564
Standard Error      4.645283
Observations        75

ANOVA
             df    SS             MS          F           Significance F
Regression    3    1583.397408    527.79914   24.459316   5.59996E-11
Residual     71    1532.084459     21.578654
Total        74    3115.481867

            Coefficients   Standard Error   t Stat    P-value     Lower 95%   Upper 95%
Intercept   109.719        6.799            16.137    1.87E-25    96.162      123.276
PRICE        -7.640        1.046            -7.304    3.236E-10   -9.726       -5.554
ADVERT       12.151        3.556             3.417    0.0010516    5.060       19.242
ADVERT²      -2.768        0.941            -2.943    0.0043927   -4.644       -0.892

From the regression table, the coefficient of ADVERT², b₄ = −2.768. It has the anticipated sign, and it is also significantly different from zero (p-value = 0.0044 < 0.05). There are diminishing returns to advertising.

To interpret the roles of the coefficients b₃ = 12.151 and b₄ = −2.768, consider the following table, where ŷ (SALES) is computed for a fixed value of x₂ (PRICE) = $5 and for various values of x₃ (ADVERT). Starting from (x₃)₀ = $1.8 and increasing advertising by an increment of Δx₃ = 0.10 to x₃ = 1.9, predicted sales increase by Δŷ = $0.19. As advertising expenditure is increased by the same increment, the increment in predicted sales decreases to 0.14 and then to 0.08.

    (x₃)₀     ŷ        Δŷ
    1.8      84.42
    1.9      84.61    0.19
    2.0      84.75    0.14
    2.1      84.83    0.08

7.1. The Optimal Level of Advertising

What is the optimal level of advertising? Optimality in economics always requires that the marginal benefit of an action equal its marginal cost. If marginal benefit exceeds marginal cost, the action should be expanded; if marginal benefit is less than marginal cost, the action should be curtailed. The optimum, therefore, is where the two are equal.

The marginal benefit of advertising is the contribution of each additional $1,000 of advertising expenditure to total revenue. From the model

    y = β₁ + β₂x₂ + β₃x₃ + β₄x₃² + u

the marginal benefit of advertising is:

    ∂E(y)/∂x₃ = β₃ + 2β₄x₃

Ignoring the cost of producing the additional units sold, the marginal cost is each additional $1 (thousand) spent on advertising. Thus the optimality requirement MB = MC is

    β₃ + 2β₄x₃ = 1

Using the estimated least squares coefficients, we thus have

    12.151 + 2(−2.768)x₃ = 1

yielding x₃ = $2.014 thousand.
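A minimal sketch of the diminishing-returns table and the MB = MC solution, using the estimated coefficients above:

    import numpy as np

    b1, b2, b3, b4 = 109.719, -7.640, 12.151, -2.768   # quadratic model estimates

    def sales(price, advert):
        return b1 + b2*price + b3*advert + b4*advert**2

    # Diminishing returns: increments in predicted SALES shrink (0.19, 0.14, 0.08)
    x3 = np.array([1.8, 1.9, 2.0, 2.1])
    y_hat = sales(5.0, x3)
    print(np.round(y_hat, 2), np.round(np.diff(y_hat), 2))

    # Optimal advertising: solve b3 + 2*b4*x3 = 1 (marginal benefit = marginal cost)
    x3_opt = (1 - b3) / (2*b4)
    print(x3_opt)                                      # ≈ 2.014 ($ thousand)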
We want to build an interval estimate for the optimal level of advertising:

    (x₃)ₒ = (1 − β₃)/(2β₄)

Note that the sample statistic (1 − b₃)/(2b₄) is a nonlinear combination of the two coefficients b₃ and b₄. Therefore we cannot use the same formula we used to find the variance of a linear combination of two coefficients. The (approximate) variance of a nonlinear combination of two random variables is obtained using the delta method. Let

    λ = (1 − b₃)/(2b₄)

Then

    var(λ) = (∂λ/∂b₃)²var(b₃) + (∂λ/∂b₄)²var(b₄) + 2(∂λ/∂b₃)(∂λ/∂b₄)covar(b₃, b₄)

Using partial derivatives,

    ∂λ/∂b₃ = −1/(2b₄)    and    ∂λ/∂b₄ = −(1 − b₃)/(2b₄²)

Hence,

    var(λ) = (−1/(2b₄))²var(b₃) + (−(1 − b₃)/(2b₄²))²var(b₄) + 2(−1/(2b₄))(−(1 − b₃)/(2b₄²))covar(b₃, b₄)

We can obtain var(b₃) and var(b₄) by simply squaring the standard errors from the regression output. Unfortunately, the Excel regression output does not provide the covariance value. We can, however, still use Excel to compute covar(b₃, b₄). Recall that using matrix algebra we can determine the variance-covariance matrix var(e)X⁻¹. Having added x₃² to the model, we now have three independent variables. The solution for this problem is simple, because the X matrix can be expanded to incorporate any number of independent variables. For a 3-variable model we have

    X = [ n      Σx₂      Σx₃      Σx₄   ]
        [ Σx₂    Σx₂²     Σx₂x₃    Σx₂x₄ ]
        [ Σx₃    Σx₂x₃    Σx₃²     Σx₃x₄ ]
        [ Σx₄    Σx₂x₄    Σx₃x₄    Σx₄²  ]

Using Excel we can compute these quantities as the elements of the X matrix:

     75.0     426.5    138.3     306.2
    426.5    2445.7    787.4    1746.5
    138.3     787.4    306.2     755.0
    306.2    1746.5    755.0    1982.6

Then determine the inverse matrix X⁻¹:

     2.1423   -0.2978   -0.5376    0.1362
    -0.2978    0.0507    0.0139   -0.0040
    -0.5376    0.0139    0.5861   -0.1524
     0.1362   -0.0040   -0.1524    0.0410

and find the variance-covariance matrix by finding the product var(e)X⁻¹ = 21.579X⁻¹:

     46.227    -6.426   -11.601    2.939
     -6.426     1.094     0.300   -0.086
    -11.601     0.300    12.646   -3.289
      2.939    -0.086    -3.289    0.885

Thus,

    var(λ) = (1/(2×2.768))²(12.646) + ((1 − 12.151)/(2×2.768²))²(0.885)
             + 2(−1/(2×(−2.768)))(−(1 − 12.151)/(2×2.768²))(−3.289)

    var(λ) = 0.41265 + 0.46857 + 2(0.18064)(0.72773)(−3.28875) = 0.01657
    se(λ) = 0.12872

The 95% confidence interval for the optimal level of advertising is then

    L, U = λ ± t(α/2, n−4)·se(λ)
    L, U = 2.014 ± (1.994)(0.12872) = ($1.757, $2.271)
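The delta-method arithmetic, written as a gradient-times-covariance quadratic form (variance and covariance values from the 4 × 4 matrix above):

    import numpy as np

    b3, b4 = 12.151, -2.768
    lam = (1 - b3) / (2*b4)                       # point estimate, ≈ 2.014

    # Gradient of lambda = (1 - b3)/(2*b4) with respect to (b3, b4)
    g = np.array([-1/(2*b4), -(1 - b3)/(2*b4**2)])    # [0.1806, 0.7277]

    # Covariance block for (b3, b4) from var(e) X^(-1)
    V = np.array([[12.646, -3.289],
                  [-3.289,  0.885]])

    var_lam = g @ V @ g                           # ≈ 0.0166
    se_lam = np.sqrt(var_lam)                     # ≈ 0.129
    t_crit = 1.994                                # t(0.025, 71)
    print(lam - t_crit*se_lam, lam + t_crit*se_lam)   # ≈ (1.757, 2.271)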
7.2. Interaction Variables

An example illustrating the use of interaction variables is the life-cycle model of consumption behavior. Here the model involves the response of PIZZA consumption (y) to the AGE (x₂) and INCOME (x₃) of the consumer. First consider the simple model without an interaction variable:

    y = β₁ + β₂x₂ + β₃x₃ + u

Using the data in the Excel file "CH6 DATA" in the tab "pizza", the estimated regression equation is

    ŷ = 342.88 − 7.576x₂ + 1.832x₃

Thus, holding INCOME constant, for each additional year of AGE the expenditure on PIZZA decreases by $7.58:

    b₂ = ∂ŷ/∂x₂ = −7.576

And, holding AGE constant, for each additional $1,000 increase in INCOME, expenditure on PIZZA rises by $1.83.

However, we would not expect people of different ages to spend a similar amount of each additional $1,000 of income on pizza. It is reasonable to expect that older persons spend a smaller amount of additional income on pizza than younger persons. Thus, we expect an interaction between the age and income variables. This gives rise to an interaction variable in the model, represented by the product of the variables AGE and INCOME: AGE × INCOME (x₂x₃).

    y = β₁ + β₂x₂ + β₃x₃ + β₄x₂x₃ + u

The estimated regression equation is now:

    ŷ = 161.465 − 2.977x₂ + 6.980x₃ − 0.123x₂x₃

The marginal effects are

    ∂ŷ/∂x₂ = b₂ + b₄x₃ = −2.977 − 0.123x₃
    ∂ŷ/∂x₃ = b₃ + b₄x₂ = 6.98 − 0.123x₂

For x₃ = 30 ($1,000):   ∂ŷ/∂x₂ = −2.977 − 0.123(30) = −6.67
When income is $30,000, each additional year of age reduces pizza expenditure by $6.67.

For x₃ = 80 ($1,000):   ∂ŷ/∂x₂ = −2.977 − 0.123(80) = −12.84
When income is $80,000, each additional year of age reduces pizza expenditure by $12.84.

For x₂ = 25 years:   ∂ŷ/∂x₃ = 6.98 − 0.123(25) = 3.90
When age is 25, each additional $1,000 of income increases pizza expenditure by $3.90.

For x₂ = 50 years:   ∂ŷ/∂x₃ = 6.98 − 0.123(50) = 0.82
When age is 50, each additional $1,000 of income increases pizza expenditure by $0.82.

7.3. Log-Linear Model

Let's start with a model where the log of WAGE (y) is a stochastic function of two independent variables, years of EDUCATION (x₂) and years of EXPERIENCE (x₃):

    ln(y) = β₁ + β₂x₂ + β₃x₃ + u

Now, if we believe that the effect of each additional year of experience also depends on the level of education, we may include the interaction variable EDUCATION × EXPERIENCE (x₂x₃) in the model as a third variable:

    ln(y) = β₁ + β₂x₂ + β₃x₃ + β₄x₂x₃ + u

Using the data in the tab "wage", the estimated regression equation is

    ln(ŷ) = 1.3923 + 0.09494x₂ + 0.00633x₃ − 0.0000364x₂x₃

The effect of another year of EDUCATION, holding EXPERIENCE constant, is

    (1/ŷ)(∂ŷ/∂x₂) = b₂ + b₄x₃

For example, given EXPERIENCE x₃ = 5 years, the proportional increase in WAGE from an extra year of EDUCATION is about 9.48%:

    (∂ŷ/∂x₂)/ŷ = 0.09494 − 0.0000364(5) = 0.09476 (9.476%)

At a higher level of EXPERIENCE, say x₃ = 10 years, the percentage increase in WAGE for an additional year of EDUCATION decreases slightly, to about 9.46%:

    (∂ŷ/∂x₂)/ŷ = 0.09494 − 0.0000364(10) = 0.09457 (9.457%)

Note that these percentage changes in WAGE are the result of a very small change in EDUCATION. The results of a discrete change in EDUCATION, where Δx₂ = 1, are shown in the following table. They show that at a higher level of EXPERIENCE, an additional year of education has a slightly smaller impact on WAGE.

                   Intercept   EDUC (x₂)   EXPER (x₃)   EDEX (x₂x₃)   ln(ŷ)    ŷ        Δŷ      (Δŷ/ŷ)%
    coefficients   1.39232     0.09494     0.00633      -0.0000364
    A   x₀         1           16          5            80            2.94     18.917
        x₁         1           17          5            85            3.03     20.797   1.880   9.94%
    B   x₀         1           16          10           160           2.97     19.469
        x₁         1           17          10           170           3.06     21.400   1.931   9.92%

The effect of another year of EXPERIENCE, holding EDUCATION constant, is

    (1/ŷ)(∂ŷ/∂x₃) = b₃ + b₄x₂

Holding EDUCATION constant at x₂ = 8,

    (∂ŷ/∂x₃)/ŷ = 0.00633 − 0.0000364(8) = 0.00604 (0.604%)

Holding EDUCATION constant at x₂ = 16,

    (∂ŷ/∂x₃)/ŷ = 0.00633 − 0.0000364(16) = 0.00575 (0.575%)
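Both interaction models' marginal effects fit in one short sketch (coefficients from the pizza and wage regressions above):

    # Pizza model: y_hat = 161.465 - 2.977*AGE + 6.980*INCOME - 0.123*AGE*INCOME
    b2_p, b3_p, b4_p = -2.977, 6.980, -0.123
    for income in (30, 80):
        print(income, b2_p + b4_p*income)    # dy/dAGE: -6.67, -12.84
    for age in (25, 50):
        print(age, b3_p + b4_p*age)          # dy/dINCOME: 3.90, 0.82

    # Wage model: ln(y_hat) = 1.3923 + 0.09494*EDUC + 0.00633*EXPER
    #                         - 0.0000364*EDUC*EXPER
    b2_w, b3_w, b4_w = 0.09494, 0.00633, -0.0000364
    for exper in (5, 10):
        print(exper, b2_w + b4_w*exper)      # % effect of EDUC: 0.09476, 0.09458
    for educ in (8, 16):
        print(educ, b3_w + b4_w*educ)        # % effect of EXPER: 0.00604, 0.00575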
The results of a discrete change in EXPERIENCE, where Δx₃ = 1, are shown in the following table.

                   Intercept   EDUC (x₂)   EXPER (x₃)   EDEX (x₂x₃)   ln(ŷ)    ŷ        Δŷ      (Δŷ/ŷ)%
    coefficients   1.39232     0.09494     0.00633      -0.0000364
    A   x₀         1           8           10           80            2.212    9.136
        x₁         1           8           11           88            2.218    9.192    0.055   0.617%
    B   x₀         1           16          10           160           2.969    19.469
        x₁         1           16          11           176           2.975    19.585   0.116   0.598%

The greater the number of years of education, the less valuable is an extra year of experience.

8. Measuring Goodness of Fit — R²

The coefficient of determination R² in a multiple regression measures the combined effect of all independent variables on y. It simply measures the proportion of the total variation in y that is explained by the regression model:

    R² = SSR/SST = Σ(ŷ − ȳ)²/Σ(y − ȳ)²

These quantities are easily calculated, and they are also shown in the ANOVA part of the regression output for the BURGER example.

ANOVA
             df    SS
Regression    2    1396.539
Residual     72    1718.943
Total        74    3115.482

    R² = 1396.539/3115.482 = 0.4483

This implies that nearly 45% of the variation in y (monthly sales) is explained by the variations in price and advertising expenditure.

In Chapter 3 it was shown that R² is a measure of goodness of fit, that is, how well the estimated regression fits the data: R² = r²(y, ŷ), the squared correlation between the observed and predicted values of y. The same argument applies here: a high R² value means there is a close association between the predicted and observed values of y.

8.1. Adjusted R²

In multiple regression, R² is affected by the number of independent variables. As we add more explanatory variables to the model, R² will increase, which can artificially "improve" the model. To see this, consider the regression model

    ŷ = b₁ + b₂x₂ + b₃x₃

and the formula

    R² = 1 − Σ(y − ŷ)²/Σ(y − ȳ)²

If we add another variable to the regression model, the quantity SSE = Σ(y − ŷ)² becomes smaller, and R² = 1 − SSE/SST becomes larger. An alternative measure devised to address this problem is the adjusted R²:

    R²_A = 1 − [SSE/(n − k)] / [SST/(n − 1)]

For our example,

    R²_A = 1 − (1718.943/72)/(3115.482/74) = 0.4329

This figure is shown in the computer regression output directly below the regular R². Note that R² = 1 − SSE/SST; dividing the numerator and denominator of the quotient on the right-hand side by their respective degrees of freedom yields R²_A. To show why R²_A does not automatically increase with k (the number of estimated parameters), we can rewrite it as

    R²_A = 1 − (SSE/SST)·(n − 1)/(n − k) = 1 − (1 − R²)·(n − 1)/(n − k)

As k increases, the penalty factor (n − 1)/(n − k) rises, which works against the accompanying fall in SSE and pulls R²_A down.
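The goodness-of-fit arithmetic from the ANOVA table, as a sketch:

    SSR, SSE, SST = 1396.539, 1718.943, 3115.482
    n, k = 75, 3

    r2 = SSR / SST                                   # 0.4483
    r2_adj = 1 - (SSE/(n - k)) / (SST/(n - 1))       # 0.4329
    # Equivalent penalty form: 1 - (1 - r2)*(n - 1)/(n - k)
    print(r2, r2_adj, 1 - (1 - r2)*(n - 1)/(n - k))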
Appendix
A Brief Introduction to Matrix Algebra

1. Introduction
   1.1. The Algebra of Matrices
        1.1.1. Addition and Subtraction of Matrices
               1.1.1.1. Scalar Multiplication
               1.1.1.2. Matrix Multiplication
        1.1.2. Identity Matrices
        1.1.3. Transpose of a Matrix
        1.1.4. Inverse of a Matrix
               1.1.4.1. How to Find the Inverse of a Matrix
                        1.1.4.1.1. The Determinant of a Matrix
                        1.1.4.1.2. Properties of Determinants
        1.1.5. How to Use the Inverse of Matrix A to Find the Solution of an Equation System

1. Introduction

Matrix algebra enables us to:

•  write an equation system in a compact way,
•  develop a method to test the existence of a solution by evaluating a "determinant", and
•  devise a method to find the solution.

For example, consider the following equation system with three variables x₁, x₂, and x₃:

    6x₁ + 3x₂ + 1x₃ = 22
    1x₁ + 4x₂ − 2x₃ = 12
    4x₁ − 1x₂ + 5x₃ = 10

This equation system can be written in matrix format as follows:

    [6   3   1] [x₁]   [22]
    [1   4  −2] [x₂] = [12]
    [4  −1   5] [x₃]   [10]

The lead matrix on the left-hand side is the coefficient matrix, denoted by A. The lag matrix is the variable matrix x, and the matrix on the right-hand side is the matrix of the constant terms, d.

    A = [6   3   1]      x = [x₁]      d = [22]
        [1   4  −2]          [x₂]          [12]
        [4  −1   5]          [x₃]          [10]

Thus the shorthand version of an equation system is:

    Ax = d

Any given matrix can be described by its dimension: the number of rows (m) and the number of columns (n). The dimension of A is 3 × 3; those of x and d are both 3 × 1.

1.1. The Algebra of Matrices

1.1.1. Addition and Subtraction of Matrices

Two matrices can be added if and only if they are conformable for addition, that is, they have the same dimension. For example:

    A = [4   5]    B = [6  10]    A + B = [4+6    5+10]   = [10  15]
        [6  12]        [3  15]            [6+3   12+15]     [ 9  27]

    C = [ 6   8]   D = [8   5]    C − D = [6−8     8−5 ]  = [−2   3]
        [12  19]       [3  11]            [12−3   19−11]    [ 9   8]

1.1.1.1. Scalar Multiplication

When every element of a matrix is multiplied by a number (scalar), we are performing scalar multiplication:

    4 [4   5 ] = [16  20]
      [6  12 ]   [24  48]

1.1.1.2. Matrix Multiplication

To multiply two matrices, they must be conformable for multiplication. This requires that the number of columns of the lead matrix equal the number of rows of the lag matrix. For example, let the dimension of A be 3 × 2 and that of B be 2 × 1. Then A and B are conformable for multiplication because A has two columns and B has two rows. The resulting product matrix will have dimension 3 × 1:

    A(3×2) × B(2×1) = AB(3×1)

The following example shows how the multiplication rule applies to two matrices:

    A = [a₁₁  a₁₂]    B = [b₁₁]    AB = [a₁₁b₁₁ + a₁₂b₂₁]
        [a₂₁  a₂₂]        [b₂₁]         [a₂₁b₁₁ + a₂₂b₂₁]
        [a₃₁  a₃₂]                      [a₃₁b₁₁ + a₃₂b₂₁]

    A = [5  4]    B = [7]    AB = [5(7) + 4(5)]   [55]
        [6  1]        [5]         [6(7) + 1(5)] = [47]
        [2  9]                    [2(7) + 9(5)]   [59]

Note that if B is used as the lead matrix and A as the lag matrix, the two are no longer conformable for multiplication: B is 2 × 1 and A is 3 × 2, so the number of columns of the lead matrix is not the same as the number of rows of the lag matrix. Even if switching the lead and lag matrices preserved conformability, the resulting product matrix would still not be the same. That is, AB ≠ BA: matrix multiplication is not commutative. Another example:

    A(2×3) = [1  2  −1]    B(3×2) = [−2   5]
             [3  1   4]             [ 4  −3]
                                    [ 2   1]

    AB(2×2) = [4  −2]    BA(3×3) = [13   1   22]
              [6  16]              [−5   5  −16]
                                   [ 5   5    2]

Also note that we have used the matrix multiplication rule to write an equation system in matrix format:

    [6   3   1] [x₁]   [22]        6x₁ + 3x₂ + 1x₃ = 22
    [1   4  −2] [x₂] = [12]   →    1x₁ + 4x₂ − 2x₃ = 12
    [4  −1   5] [x₃]   [10]        4x₁ − 1x₂ + 5x₃ = 10

    A(3×3) x(3×1) = d(3×1)
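These rules are easy to verify in NumPy, using the matrices from the example above:

    import numpy as np

    A = np.array([[1, 2, -1],
                  [3, 1,  4]])          # 2 x 3
    B = np.array([[-2,  5],
                  [ 4, -3],
                  [ 2,  1]])            # 3 x 2

    print(A @ B)    # 2 x 2: [[4, -2], [6, 16]]
    print(B @ A)    # 3 x 3: [[13, 1, 22], [-5, 5, -16], [5, 5, 2]]
    # A @ B != B @ A: matrix multiplication is not commutative.
    # A + B raises ValueError: the matrices are not conformable for addition.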
1.1.2. Identity Matrices

An identity matrix is a matrix with the same number of rows and columns (a square matrix) which has 1's on its principal diagonal and 0's everywhere else. The identity matrix is denoted by I; a numeric subscript, if shown, indicates the dimension.

    I₂ = [1  0]      I₃ = [1  0  0]
         [0  1]           [0  1  0]
                          [0  0  1]

Pre- or post-multiplying a matrix by an identity matrix leaves the matrix unchanged:

    AI = IA = A

Example:

    A = [2  4  6]
        [3  2  3]
        [1  4  9]

    AI = [2  4  6] [1  0  0]   [2  4  6]
         [3  2  3] [0  1  0] = [3  2  3] = A
         [1  4  9] [0  0  1]   [1  4  9]

    IA = [1  0  0] [2  4  6]   [2  4  6]
         [0  1  0] [3  2  3] = [3  2  3] = A
         [0  0  1] [1  4  9]   [1  4  9]

1.1.3. Transpose of a Matrix

A matrix is transposed by interchanging its rows and columns. The transpose of the matrix A is denoted by A′.

    A = [a₁₁  a₁₂]      A′ = [a₁₁  a₂₁  a₃₁]
        [a₂₁  a₂₂]           [a₁₂  a₂₂  a₃₂]
        [a₃₁  a₃₂]

For example,

    A = [5  4]      A′ = [5  6  2]
        [6  1]           [4  1  9]
        [2  9]

1.1.4. Inverse of a Matrix

The inverse of the square matrix A (if it exists) is another matrix, denoted by A⁻¹, such that if A is pre- or post-multiplied by A⁻¹, the resulting product is an identity matrix:

    AA⁻¹ = A⁻¹A = I

If A does not have an inverse, it is called a singular matrix; otherwise it is a nonsingular matrix.

1.1.4.1. How to Find the Inverse of a Matrix

Finding the inverse of a matrix is a complicated process. First we must understand several concepts required in determining the inverse of a matrix.

1.1.4.1.1. The Determinant of a Matrix

The determinant of a matrix A is a scalar quantity (a number) and is denoted by |A|. It is obtained by summing various products of the elements of A. For example, the determinant of a 2 × 2 matrix is defined to be:

    |A| = |a₁₁  a₁₂| = a₁₁a₂₂ − a₁₂a₂₁
          |a₂₁  a₂₂|

The determinant of a 3 × 3 matrix is obtained as follows:

    |A| = |a₁₁  a₁₂  a₁₃|
          |a₂₁  a₂₂  a₂₃| = a₁₁|a₂₂  a₂₃| − a₁₂|a₂₁  a₂₃| + a₁₃|a₂₁  a₂₂|
          |a₃₁  a₃₂  a₃₃|      |a₃₂  a₃₃|      |a₃₁  a₃₃|      |a₃₁  a₃₂|

    |A| = a₁₁a₂₂a₃₃ − a₁₁a₂₃a₃₂ − a₁₂a₂₁a₃₃ + a₁₂a₂₃a₃₁ + a₁₃a₂₁a₃₂ − a₁₃a₂₂a₃₁

In the latter case, each element in the top row is multiplied by a "sub-determinant". The first sub-determinant,

    |a₂₂  a₂₃|
    |a₃₂  a₃₃|

multiplied by a₁₁, is obtained by eliminating the first row and the first column. The sub-determinant associated with an element is called the minor of that element and is denoted by M. The minor of a₁₁ is M₁₁; the minor of a₁₂, M₁₂, is obtained by eliminating the first row and the second column; and so on. If the sum of the subscripts of the element is odd, the term enters with a negative sign. The calculation of the determinant of A can then be presented as:

    |A| = a₁₁M₁₁ − a₁₂M₁₂ + a₁₃M₁₃

    |2  1  3|
    |4  5  6| = 2|5  6| − 1|4  6| + 3|4  5| = 2(5×9 − 6×8) − (4×9 − 6×7) + 3(4×8 − 5×7) = −9
    |7  8  9|     |8  9|    |7  9|     |7  8|

Here the determinant is obtained by expanding along the first row. The same determinant can be found by expanding along any other row or column. In the previous example, find the determinant by expanding along the third column:

    |A| = a₁₃M₁₃ − a₂₃M₂₃ + a₃₃M₃₃

    |2  1  3|
    |4  5  6| = 3|4  5| − 6|2  1| + 9|2  1| = 3(4×8 − 5×7) − 6(2×8 − 1×7) + 9(2×5 − 1×4) = −9
    |7  8  9|     |7  8|    |7  8|     |4  5|

Now let's introduce a concept related to the minor, called the cofactor. A cofactor, denoted by C_ij, is a minor with a prescribed algebraic sign attached. If the sum of the two subscripts in the minor M_ij is odd, then the cofactor is negative:

    C_ij = (−1)^(i+j) M_ij

In short, the value of a determinant |A| of order n can be found by the expansion of any row or any column as follows:

    |A| = Σⱼ a_ij C_ij    [expansion by the i-th row]
    |A| = Σᵢ a_ij C_ij    [expansion by the j-th column]

For example, for n = 3:

    Expansion by the first row:      |A| = a₁₁C₁₁ + a₁₂C₁₂ + a₁₃C₁₃
    Expansion by the first column:   |A| = a₁₁C₁₁ + a₂₁C₂₁ + a₃₁C₃₁
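A quick NumPy check of the cofactor expansion against the built-in determinant, using the 3 × 3 example above:

    import numpy as np

    A = np.array([[2, 1, 3],
                  [4, 5, 6],
                  [7, 8, 9]], dtype=float)

    def minor(M, i, j):
        # Determinant of the submatrix with row i and column j removed
        sub = np.delete(np.delete(M, i, axis=0), j, axis=1)
        return np.linalg.det(sub)

    # Cofactor expansion along the first row
    det = sum((-1)**(0 + j) * A[0, j] * minor(A, 0, j) for j in range(3))
    print(det, np.linalg.det(A))    # both ≈ -9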
1.1.4.1.2. Properties of Determinants

1)  Transposing a matrix does not affect the value of the determinant: |A| = |A′|.

    |1  2| = 1×4 − 2×3 = −2        |A′| = |1  3| = 1×4 − 2×3 = −2
    |3  4|                                |2  4|

2)  The interchange of any two rows (or any two columns) will alter the sign, leaving the absolute value of the determinant unchanged.

    |1  2| = −2        |3  4| = 3×2 − 4×1 = 2
    |3  4|             |1  2|

3)  Multiplying any one row (or any one column) by a scalar will change the value of the determinant by the multiple of that scalar.

    |1×5  2×5| = 5×4 − 10×3 = −10
    | 3    4 |

4)  The addition (subtraction) of a multiple of any row (or column) to another row (or column) will leave the value of the determinant unchanged. In the determinant |1 2; 3 4|, multiply the first row by −3 and add it to the second row:

    |  1     2  |   |1   2|
    |3−3   4−6 | = |0  −2| = −2

5)  If one row (or column) is a multiple of another row (or column), the value of the determinant is zero; the determinant will vanish. In other words, if two rows (or columns) are linearly dependent, the determinant vanishes. In the following example the second row is the first row multiplied by 4:

    |1  2| = 8 − 2×4 = 0
    |4  8|

A very important conclusion relates the value of the determinant to the existence of a unique solution for an equation system. Consider the equation system shown at the beginning of the discussion of matrices:

    6x₁ + 3x₂ + 1x₃ = 22
    1x₁ + 4x₂ − 2x₃ = 12
    4x₁ − 1x₂ + 5x₃ = 10

The coefficient matrix of the equation system is

    A = [6   3   1]
        [1   4  −2]
        [4  −1   5]

This equation system has a unique solution because the determinant does not vanish: |A| = 52. The matrix A is a nonsingular matrix. Now consider the following equation system:

    6x₁ + 3x₂ + 1x₃ = 22
    12x₁ + 6x₂ + 2x₃ = 11
    4x₁ − 1x₂ + 5x₃ = 10

The determinant of the coefficient matrix is

    |A| = |6   3  1|
          |12  6  2| = 6(6×5 − 2×(−1)) − 3(12×5 − 2×4) + (12×(−1) − 6×4) = 192 − 156 − 36 = 0
          |4  −1  5|

The determinant vanishes because rows 1 and 2 are linearly dependent: the second row is the first row multiplied by 2. Thus the equation system will not have a unique solution.

6)  The expansion of a determinant by alien cofactors (the cofactors of a "wrong" row or column) always yields a value of zero. Expand the following determinant using the first-row elements but the cofactors of the second-row elements:

    A = |6   3   1|
        |1   4  −2|
        |4  −1   5|

    C₂₁ = −[3×5 − (−1)×1] = −16
    C₂₂ = 6×5 − 1×4 = 26
    C₂₃ = −[6×(−1) − 3×4] = 18

    a₁₁ = 6,  a₁₂ = 3,  a₁₃ = 1
    a₁₁C₂₁ = −96,  a₁₂C₂₂ = 78,  a₁₃C₂₃ = 18

    Σⱼ a₁ⱼC₂ⱼ = a₁₁C₂₁ + a₁₂C₂₂ + a₁₃C₂₃ = −96 + 78 + 18 = 0
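Both facts — the vanishing determinant of the linearly dependent system and the zero alien-cofactor expansion — check out numerically:

    import numpy as np

    # Rows 1 and 2 are linearly dependent, so the determinant vanishes
    S = np.array([[ 6,  3, 1],
                  [12,  6, 2],
                  [ 4, -1, 5]], dtype=float)
    print(np.linalg.det(S))          # ≈ 0: no unique solution exists

    # Alien-cofactor expansion: first-row elements, second-row cofactors
    A = np.array([[6, 3, 1],
                  [1, 4, -2],
                  [4, -1, 5]], dtype=float)
    C21, C22, C23 = -16, 26, 18      # second-row cofactors (from the text)
    print(6*C21 + 3*C22 + 1*C23)     # 0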
This last property of determinants finally leads us to the method for finding the inverse of a matrix. Finding the inverse of A involves the following steps.

    A = [a₁₁  a₁₂  a₁₃]
        [a₂₁  a₂₂  a₂₃]
        [a₃₁  a₃₂  a₃₃]

1)  Replace each element a_ij of A by its cofactor C_ij:

    C = [C₁₁  C₁₂  C₁₃]
        [C₂₁  C₂₂  C₂₃]
        [C₃₁  C₃₂  C₃₃]

2)  Find the transpose of C. This transpose matrix, C′, is called the adjoint of A:

    C′ ≡ adjoint A = [C₁₁  C₂₁  C₃₁]
                     [C₁₂  C₂₂  C₃₂]
                     [C₁₃  C₂₃  C₃₃]

3)  Multiply A by adjoint A:

    AC′ = [a₁₁  a₁₂  a₁₃] [C₁₁  C₂₁  C₃₁]   [Σa₁ⱼC₁ⱼ  Σa₁ⱼC₂ⱼ  Σa₁ⱼC₃ⱼ]
          [a₂₁  a₂₂  a₂₃] [C₁₂  C₂₂  C₃₂] = [Σa₂ⱼC₁ⱼ  Σa₂ⱼC₂ⱼ  Σa₂ⱼC₃ⱼ]
          [a₃₁  a₃₂  a₃₃] [C₁₃  C₂₃  C₃₃]   [Σa₃ⱼC₁ⱼ  Σa₃ⱼC₂ⱼ  Σa₃ⱼC₃ⱼ]

Now note that each element on the principal diagonal of the product matrix simply provides the determinant of A, while every element off the principal diagonal is an expansion of the determinant by alien cofactors. Thus, they are all zeros:

    [Σa₁ⱼC₁ⱼ  Σa₁ⱼC₂ⱼ  Σa₁ⱼC₃ⱼ]   [|A|   0    0 ]         [1  0  0]
    [Σa₂ⱼC₁ⱼ  Σa₂ⱼC₂ⱼ  Σa₂ⱼC₃ⱼ] = [ 0   |A|   0 ] = |A| · [0  1  0] = |A|·I
    [Σa₃ⱼC₁ⱼ  Σa₃ⱼC₂ⱼ  Σa₃ⱼC₃ⱼ]   [ 0    0   |A|]         [0  0  1]

Thus,

    AC′ = |A|·I

Now divide both sides of the equation by the determinant |A|:

    AC′/|A| = I

Premultiply both sides by A⁻¹:

    A⁻¹AC′/|A| = A⁻¹I

Since A⁻¹A = I, IC′ = C′, and A⁻¹I = A⁻¹, then:

    A⁻¹ = C′/|A|

The inverse of matrix A is obtained by dividing the adjoint of A by the determinant.

Find the inverse of A = [6 3 1; 1 4 −2; 4 −1 5]. First find |A|:

    |A| = 6|4  −2| − 3|1  −2| + |1   4| = 6(18) − 3(13) + (−17) = 52
           |−1  5|    |4   5|    |4  −1|

Next find the cofactor matrix:

    C = [  |4 −2; −1 5|   −|1 −2; 4 5|    |1 4; 4 −1|  ]   [ 18  −13  −17]
        [ −|3 1; −1 5|     |6 1; 4 5|    −|6 3; 4 −1|  ] = [−16   26   18]
        [  |3 1; 4 −2|    −|6 1; 1 −2|    |6 3; 1 4|   ]   [−10   13   21]

Find the adjoint of A:

    C′ = [ 18  −16  −10]
         [−13   26   13]
         [−17   18   21]

    A⁻¹ = (1/|A|)C′ = (1/52) [ 18  −16  −10]
                             [−13   26   13]
                             [−17   18   21]

With each element of A⁻¹ rounded to three decimal places,

    A⁻¹ = [ 0.346  −0.308  −0.192]
          [−0.250   0.500   0.250]
          [−0.327   0.346   0.404]

1.1.5. How to Use the Inverse of Matrix A to Find the Solution of an Equation System

Note that in the previous example the matrix A was the coefficient matrix of the equation system

    6x₁ + 3x₂ + 1x₃ = 22
    1x₁ + 4x₂ − 2x₃ = 12
    4x₁ − 1x₂ + 5x₃ = 10

    [6   3   1] [x₁]   [22]
    [1   4  −2] [x₂] = [12]        Ax = d
    [4  −1   5] [x₃]   [10]

Now premultiply both sides of the matrix form of the equation system by A⁻¹:

    A⁻¹Ax = A⁻¹d

which results in

    x = A⁻¹d

    [x₁]   [ 0.346  −0.308  −0.192] [22]   [2]
    [x₂] = [−0.250   0.500   0.250] [12] = [3]
    [x₃]   [−0.327   0.346   0.404] [10]   [1]

Thus x₁ = 2, x₂ = 3, and x₃ = 1.
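The whole procedure condenses into a short NumPy sketch — the cofactor/adjoint construction from the steps above, followed by the solution of the system:

    import numpy as np

    A = np.array([[6, 3, 1],
                  [1, 4, -2],
                  [4, -1, 5]], dtype=float)
    d = np.array([22, 12, 10], dtype=float)

    # Cofactor matrix -> adjoint -> inverse, as in the steps above
    C = np.zeros((3, 3))
    for i in range(3):
        for j in range(3):
            sub = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1)**(i + j) * np.linalg.det(sub)

    A_inv = C.T / np.linalg.det(A)                # adjoint divided by |A|
    print(np.allclose(A_inv, np.linalg.inv(A)))   # True

    print(A_inv @ d)                              # [2. 3. 1.]
    # In practice one would simply call np.linalg.solve(A, d).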