Multiple Regression with Two Independent Variables

CHAPTER 6
Multiple Regression
1. The model
2. The Least Squares Method of Determining Sample Regression Coefficients
3. Using Matrix Algebra to Solve the System of Least Squares Normal Equations
4. Standard Error of Estimate
5. Sampling Properties of the Least Squares Estimators
   5.1. The Variances and Covariances of the Least Squares Estimators
6. Inferences about the Population Regression Parameters
   6.1. Interval Estimates for the Coefficients of the Regression
   6.2. Interval Estimate for the Mean Value of y for Given Values of xⱼ
   6.3. Prediction Interval for the Individual Value of y for a Given x
   6.4. Interval Estimation for a Linear Combination of Coefficients
   6.5. Hypothesis Testing for a Single Coefficient
        6.5.1. Two-Tail Test of Significance
        6.5.2. One-Tailed Tests
               6.5.2.1. Example 1: Testing for Price Elasticity of Demand
               6.5.2.2. Example 2: Testing for Effectiveness of Advertising
               6.5.2.3. Example 3: Testing for a Linear Combination of Coefficients
7. Extension of the Regression Model
   7.1. The Optimal Level of Advertising
   7.2. Interaction Variables
   7.3. Log-Linear Model
8. Measuring Goodness of Fit—R²
   8.1. Adjusted R²
1. The model
In multiple regression models we may include two or more independent variables. We will start with the
model that includes two independent variables π‘₯2 and π‘₯3 . The main features of and assumptions regarding
the simple regression model can be extended to multiple regression models. Here the dependent variable is
influenced by the variations in the independent variables through the three parameters 𝛽1 , 𝛽2 , and 𝛽3 . The
disturbance term u is still present, because the variations in y continue to be influenced by the random
component 𝑒. Thus,
𝑦𝑖 = 𝛽1 + 𝛽2 π‘₯2𝑖 + 𝛽3 π‘₯3𝑖 + 𝑒𝑖
Given that 𝐸(𝑒𝑖 ) = 0, the following holds (dropping the 𝑖 subscript),
𝐸(𝑦) = 𝛽1 + 𝛽2 π‘₯2 + 𝛽3 π‘₯3
The sample regression equation is
𝑦 = 𝑏1 + 𝑏2 π‘₯2 + 𝑏3 π‘₯3 + 𝑒
To estimate the parameters from the sample data we need formulas to determine values for the estimators of
the population parameters. The estimators are the sample regression coefficients 𝑏1 , 𝑏2 , and 𝑏3 .
2. The Least Squares Method of Determining Sample Regression Coefficients
We want to find the values for 𝑏1 , 𝑏2 , and 𝑏3 such that the sum of the squared deviation of the observed 𝑦
from the fitted plane is minimized. The deviation takes on the same familiar form as in the simple regression
model
𝑒 = 𝑦 − 𝑦̂
where 𝑦̂ is the predicted value which lies on the regression plane.
𝑦̂ = 𝑏1 + 𝑏2 π‘₯2 + 𝑏3 π‘₯3
Substituting for 𝑦̂ in the residual equation above, squaring both sides, and then summing for all 𝑖, we have,
𝑒 = 𝑦 − 𝑏1 − 𝑏2 π‘₯2 − 𝑏3 π‘₯3
∑𝑒2 = ∑(𝑦 − 𝑏1 − 𝑏2 π‘₯2 − 𝑏3 π‘₯3 )2
Taking three partial derivatives, one for each coefficient, and then setting them equal to zero, we obtain three
normal equations.
πœ• ∑𝑒 2
= −2∑(𝑦 − 𝑏1 − 𝑏2 π‘₯2 − 𝑏3 π‘₯3 ) = 0
πœ•π‘1
πœ• ∑𝑒 2
= −2∑π‘₯2 (𝑦 − 𝑏1 − 𝑏2 π‘₯2 − 𝑏3 π‘₯3 ) = 0
πœ•π‘2
πœ• ∑𝑒 2
= −2∑π‘₯3 (𝑦 − 𝑏1 − 𝑏2 π‘₯2 − 𝑏3 π‘₯3 ) = 0
πœ•π‘3
The normal equations are:
∑𝑦 − 𝑛𝑏1 − 𝑏2 ∑π‘₯2 − 𝑏3 ∑π‘₯3 = 0
∑π‘₯2 𝑦 − 𝑏1 ∑π‘₯2 − 𝑏2 ∑π‘₯22 − 𝑏3 ∑π‘₯2 π‘₯3 = 0
∑π‘₯3 𝑦 − 𝑏1 ∑π‘₯3 − 𝑏2 ∑π‘₯2 π‘₯3 − 𝑏3 ∑π‘₯23 = 0
To find the solutions for the three 𝑏’s, we can write the normal equations in the following way, by taking the
terms not involving the 𝑏𝑗 to the right-hand-side.
𝑛𝑏1 + (∑π‘₯2 )𝑏2 +
(∑π‘₯3 )𝑏3 = ∑𝑦
(∑π‘₯2 )𝑏1 + (∑π‘₯22 )𝑏2 + (∑π‘₯2 π‘₯3 )𝑏3 = ∑π‘₯2 𝑦
(∑π‘₯3 )𝑏1 + (∑π‘₯2 π‘₯3 )𝑏2 + (∑π‘₯32 )𝑏3 = ∑π‘₯3 𝑦
Here we have a system of three equations with three unknowns, 𝑏1 , 𝑏2 , and 𝑏3 . Now we need to develop a
method to find the solution for these unknowns. For this, a brief introduction to matrix algebra is called for.
(See the appendix at the end of this chapter.)
3. Using Matrix Algebra to Solve the System of Least Squares Normal Equations
We can write the system of normal equations in the matrix format:
[ n     ∑x₂     ∑x₃   ] [b₁]   [ ∑y   ]
[ ∑x₂   ∑x₂²    ∑x₂x₃ ] [b₂] = [ ∑x₂y ]
[ ∑x₃   ∑x₂x₃   ∑x₃²  ] [b₃]   [ ∑x₃y ]
with the following shorthand notation
𝐗𝐛 = 𝐜
where,
    [ n     ∑x₂     ∑x₃   ]         [ b₁ ]         [ ∑y   ]
X = [ ∑x₂   ∑x₂²    ∑x₂x₃ ]     b = [ b₂ ]     c = [ ∑x₂y ]
    [ ∑x₃   ∑x₂x₃   ∑x₃²  ]         [ b₃ ]         [ ∑x₃y ]
Solutions for 𝑏𝑗 are obtained by finding the product of the inverse matrix of 𝐗, 𝐗 −𝟏 , times 𝐜.
𝐛 = 𝐗 −𝟏 𝐜
Example:
We want to obtain a regression of π‘šπ‘œπ‘›π‘‘β„Žπ‘™π‘¦ π‘ π‘Žπ‘™π‘’π‘  (𝑦), in $1,000's, on π‘π‘Ÿπ‘–π‘π‘’ (π‘₯2 ), in dollars, and
π‘Žπ‘‘π‘£π‘’π‘Ÿπ‘‘π‘–π‘ π‘–π‘›π‘” (π‘₯3 ), in $1,000's, of a fast food restaurant. The data is contained in the Excel file “CH6 DATA” in
worksheet tab “burger”.
Use Excel to compute the values for the elements of the matrix X and the vector c:
[ n     ∑x₂     ∑x₃   ]   [ 75      426.5      138.3   ]
[ ∑x₂   ∑x₂²    ∑x₂x₃ ] = [ 426.5   2445.707   787.381 ]
[ ∑x₃   ∑x₂x₃   ∑x₃²  ]   [ 138.3   787.381    306.21  ]

[ ∑y   ]   [  5803.1 ]
[ ∑x₂y ] = [ 32847.7 ]
[ ∑x₃y ]   [ 10789.6 ]
Thus, our system of normal equations can be written as:
[ 75      426.5      138.3   ] [b₁]   [  5803.1 ]
[ 426.5   2445.707   787.381 ] [b₂] = [ 32847.7 ]
[ 138.3   787.381    306.21  ] [b₃]   [ 10789.6 ]
Since b = X⁻¹c, we need to find X⁻¹. Now that we know what the inverse of a matrix means and how to find it, we can have Excel do the hard work for us. In Excel, use the array-type function =MINVERSE(). First highlight a block of 3 × 3 cells, then call up the function =MINVERSE(). When it asks for "array", select ("lasso") the block of cells containing the elements of the X matrix and then press the Ctrl-Shift-Enter keys together. The result is the following:
[  1.689828   −0.28462   −0.03135  ]
[ −0.28462     0.050314  −0.00083  ]
[ −0.03135    −0.00083    0.019551 ]
which is the matrix X⁻¹. Then, premultiplying c by X⁻¹, we have

[b₁]   [  1.689828   −0.28462   −0.03135  ] [  5803.1 ]   [ 118.9136  ]
[b₂] = [ −0.28462     0.050314  −0.00083  ] [ 32847.7 ] = [  −7.90785 ]
[b₃]   [ −0.03135    −0.00083    0.019551 ] [ 10789.6 ]   [   1.862584 ]
𝑦̂ = 118.9136 − 7.90785π‘₯2 + 1.862584π‘₯3
SALES-hat = 118.914 − 7.908 PRICE + 1.863 ADVERT
b₂ = −7.908 implies that for each $1 increase in price (advertising held constant), monthly sales would fall by $7,908. Equivalently, a 10¢ increase in price would result in a decrease in monthly sales of about $790.80. b₃ = 1.863 implies that (holding price constant) for each additional $1,000 of advertising, sales would increase by $1,863.
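The same solution can be reproduced outside Excel. The following is a minimal Python sketch (assuming NumPy is installed; the matrix elements are the sums reported above):

```python
import numpy as np

# Elements of X and c computed from the burger data (values as reported above)
X = np.array([[75.0,   426.5,     138.3],
              [426.5,  2445.707,  787.381],
              [138.3,  787.381,   306.21]])
c = np.array([5803.1, 32847.7, 10789.6])

# b = X^(-1) c; np.linalg.solve avoids forming the inverse explicitly
b = np.linalg.solve(X, c)
print(b)                      # approximately [118.914, -7.908, 1.863]

# The explicit inverse, analogous to Excel's MINVERSE()
print(np.linalg.inv(X) @ c)   # same coefficients
```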
The following is the Excel regression output showing the estimated coefficients.
SUMMARY OUTPUT

Regression Statistics
  Multiple R           0.669521
  R Square             0.448258
  Adjusted R Square    0.432932
  Standard Error       4.886124
  Observations         75

ANOVA
              df    SS          MS          F           Significance F
  Regression   2    1396.5389   698.26946   29.247859   5.04086E-10
  Residual    72    1718.9429   23.874207
  Total       74    3115.4819

              Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
  Intercept   118.91361      6.35164          18.72172   0.00000   106.25185   131.57537
  PRICE        -7.90785      1.09599          -7.21524   0.00000   -10.09268    -5.72303
  ADVERT        1.86258      0.68320           2.72628   0.00804     0.50066     3.22451
4. Standard Error of Estimate
In simple regression we learned that the disturbance term e for each given x is normally distributed about the population regression line, with expected value E(e) = 0 and variance var(e). The unbiased estimator of var(e) was obtained by using the least squares residuals e = y − ŷ.
var(e) = ∑e²⁄(n − 2) = ∑(y − ŷ)²⁄(n − 2)
The square root of the estimated variance was called the standard error of estimate, se(𝑒). Here with two
independent variables, the least squares residuals take the same form:
𝑒 = 𝑦 − 𝑦̂ = 𝑦 − (𝑏1 + 𝑏2 π‘₯2 + 𝑏3 π‘₯3 )
Thus the estimated variance of the disturbance or error term is:
var(e) = ∑e²⁄(n − k) = ∑(y − ŷ)²⁄(n − k)
where π‘˜ is number of parameters of the regression being estimated. Here π‘˜ = 3, for estimating 𝛽1 , 𝛽2 , and 𝛽3 .
Given the estimated regression equation in the last example,
𝑦̂ = 118.9136 − 7.90785π‘₯2 + 1.862584π‘₯3
the predicted value of 𝑆𝐴𝐿𝐸𝑆 (𝑦̂), for price π‘₯2 = $6.20 and advertising π‘₯3 = $3.0 (observation #10), is:
𝑦̂ = 118.9136 − 7.90785(6.2) + 1.862584(3) = $75.473 (thousand)
The observed value of sales is 𝑦10 = $76.4. Thus, the residual is calculated as 𝑒 = 76.4 − 75.473 = 0.927.
Computing such residuals for all 75 observations and finding the sum of squared residuals, the estimated
variance and the standard error of estimate are, respectively
var(e) = ∑(y − ŷ)²⁄(n − k) = 1718.943⁄72 = 23.874
se(𝑒) = 4.886
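The same arithmetic can be scripted. A small Python sketch, using only the values quoted above (the coefficients, observation #10, and the sum of squared residuals):

```python
import numpy as np

b1, b2, b3 = 118.9136, -7.90785, 1.862584

# Fitted value and residual for observation #10 (price = 6.2, advertising = 3.0)
y10 = 76.4
y10_hat = b1 + b2 * 6.2 + b3 * 3.0       # about 75.473
e10 = y10 - y10_hat                      # about 0.927

# Standard error of estimate from the sum of squared residuals over all 75 observations
sse, n, k = 1718.943, 75, 3
var_e = sse / (n - k)                    # about 23.874
se_e = np.sqrt(var_e)                    # about 4.886
print(y10_hat, e10, var_e, se_e)
```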
5. Sampling Properties of the Least Squares Estimators
As they were in simple regression, the least squares estimators, or the coefficients of the estimated
regression, are sample statistics obtained from a random sample. Thus they are all random variables each
with its own expected value and variance. Again, like the simple regression coefficients, these estimators are
BLUE.
• b₁, b₂, and b₃ are each a linear function of the dependent variable y.
• The coefficients are unbiased estimators: E(b₁) = β₁, E(b₂) = β₂, and E(b₃) = β₃.
• var(b₁), var(b₂), and var(b₃) are all minimum variances.
Once again, the linearity assumption is important for inferences about the parameters of the regression. If
the disturbance terms 𝑒 , and correspondingly the 𝑦 values, are normally distributed about the regression
plane, then linearly related coefficients are also normally distributed. If the disturbance terms are not
normal, then according to the central limit theorem, the coefficients will be approximately normal for large
samples.
5.1. The Variances and Covariances of the Least Squares Estimators
Since the least squares estimators are random variables, their variances show how closely or widely
dispersed they tend to scatter around their respective population parameters. In a multiple regression model
with three parameters 𝛽1 , 𝛽2 , and 𝛽3 , the variance of each estimator 𝑏1 , 𝑏2 , and 𝑏3 is estimated from the
sample data. We can determine the variances and covariances of the least square estimators using
elementary matrix algebra. Using the inverse matrix X −1 determined above, we can generate a covariance
matrix as follows:
[ var(b₁)         covar(b₁, b₂)   covar(b₁, b₃) ]
[ covar(b₁, b₂)   var(b₂)         covar(b₂, b₃) ] = var(e) X⁻¹
[ covar(b₁, b₃)   covar(b₂, b₃)   var(b₃)       ]

var(e) X⁻¹ = 23.87421 × [  1.6898   −0.2846   −0.0313 ]   [ 40.3433   −6.7951   −0.7484 ]
                        [ −0.2846    0.0503   −0.0008 ] = [ −6.7951    1.2012   −0.0197 ]
                        [ −0.0313   −0.0008    0.0196 ]   [ −0.7484   −0.0197    0.4668 ]
From the covariance matrix on the extreme right hand side we have the following variances and covariances:
var(𝑏1 ) = 40.3433
var(𝑏2 ) = 1.2012
var(𝑏3 ) = 0.4668
covar(𝑏1 , 𝑏2 ) = −6.7951
covar(𝑏1 , 𝑏3 ) = −0.7484
covar(𝑏2 , 𝑏3 ) = −0.0197
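A quick way to reproduce this matrix is to scale X⁻¹ by the estimated error variance. A minimal NumPy sketch, using the values shown above:

```python
import numpy as np

var_e = 23.87421                          # estimated var(e) from Section 4
X_inv = np.array([[ 1.689828, -0.28462, -0.03135],
                  [-0.28462,   0.050314, -0.00083],
                  [-0.03135,  -0.00083,   0.019551]])

cov_b = var_e * X_inv                     # variance-covariance matrix of b1, b2, b3
print(cov_b)                              # diagonal approx. [40.34, 1.20, 0.467]
print(np.sqrt(np.diag(cov_b)))            # standard errors approx. [6.35, 1.10, 0.683]
```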
6. Inferences about the Population Regression Parameters
To build an interval estimate or perform a hypothesis test about the 𝛽’s in the population regression, the least
square estimators of these parameters must be normally distributed. The normality is established if the
disturbance terms and, correspondingly, the 𝑦 values for the various values of the explanatory variables are
normally distributed. And, as long as the least squares estimators are linear functions of the y values, the estimators are also normally distributed.
6.1. Interval Estimates for the Coefficients of the Regression
Following the same reasoning as in simple regression, the interval estimates for each β parameter take the following form, for a given confidence level 1 − α:
For 𝛽1 : 𝐿, π‘ˆ = 𝑏1 ± 𝑑α⁄2,𝑑𝑓 se(𝑏1 )
For 𝛽2 : 𝐿, π‘ˆ = 𝑏2 ± 𝑑α⁄2,𝑑𝑓 se(𝑏2 )
For 𝛽3 : 𝐿, π‘ˆ = 𝑏3 ± 𝑑α⁄2,𝑑𝑓 se(𝑏3 )
The standard error for each regression coefficient can be obtained from the variance-covariance matrix. The
95% interval estimates for the slope coefficients in the above example, based on the following information,
are given below.
b₁ = 118.914        se(b₁) = 6.352
b₂ = −7.908         se(b₂) = 1.096
b₃ = 1.863          se(b₃) = 0.683
t₀.₀₂₅,(72) = 1.993
The 95% interval estimate for the response of 𝑆𝐴𝐿𝐸𝑆 to a 𝑃𝑅𝐼𝐢𝐸 change, 𝛽2 , is
𝐿, π‘ˆ = 𝑏2 ± 𝑑0.025,(𝑛−3) se(𝑏2 )
𝐿, π‘ˆ = −7.908 ± (1.993)(1.096) = [−10.093, −5.723]
This implies that a $1 decrease in price will result in an increase in revenue somewhere between $5,723 and
$10,093.
The 95% interval estimate for the response of 𝑆𝐴𝐿𝐸𝑆 to a change in 𝐴𝐷𝑉𝐸𝑅𝑇, 𝛽3 , is
𝐿, π‘ˆ = 𝑏3 ± 𝑑0.025,(𝑛−3) se(𝑏3 )
𝐿, π‘ˆ = 1.863 ± (1.993)(0.683) = [0.501,3.225]
An increase in advertising expenditure of $1,000 would increase sales between $501 and $3,225. This is a
relatively wide interval and, therefore, does not convey good information about the range of responses in
sales to advertising. This wide interval arises from the fact that the sampling variability of the 𝑏3 coefficient is
large. One way to reduce this variability is to increase the sample size. But this in many cases is impractical
because of the limitations on data availability. An alternative way is to introduce some kind of non-sample
information on the coefficient, to be explained later.
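These interval estimates are easy to script. A minimal sketch (assuming SciPy is available for the t critical value; the coefficient estimates and standard errors are the ones listed above):

```python
from scipy import stats

b  = {"b1": 118.914, "b2": -7.908, "b3": 1.863}
se = {"b1": 6.352,   "b2": 1.096,  "b3": 0.683}
n, k, alpha = 75, 3, 0.05

t_crit = stats.t.ppf(1 - alpha / 2, df=n - k)   # about 1.993
for name in b:
    lo = b[name] - t_crit * se[name]
    hi = b[name] + t_crit * se[name]
    print(f"{name}: [{lo:.3f}, {hi:.3f}]")
# b2: roughly [-10.09, -5.72]; b3: roughly [0.50, 3.22], matching the text
```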
6.2. Interval Estimate for the Mean Value of y for Given Values of 𝒙𝒋
The mean value of y for given values of the independent variables is ŷ₀, the predicted value of y for the given values x₀₂, x₀₃, …. The confidence interval for the mean value of y is therefore,
𝐿, π‘ˆ = 𝑦̂0 ± 𝑑α⁄2,(𝑛−π‘˜) se(𝑦̂0 )
To determine se(𝑦̂0 ), start with var(𝑦̂0 ). For a model with two independent variables,
𝑦̂0 = 𝑏1 + 𝑏2 π‘₯02 + 𝑏3 π‘₯03
var(𝑦̂0 ) = var(𝑏1 + 𝑏2 π‘₯02 + 𝑏3 π‘₯03 )
var(ŷ₀) = var(b₁) + x₀₂² var(b₂) + x₀₃² var(b₃) + 2x₀₂ covar(b₁, b₂) + 2x₀₃ covar(b₁, b₃) + 2x₀₂x₀₃ covar(b₂, b₃)
To build a 95% interval for the mean value of 𝑆𝐴𝐿𝐸𝑆 in the model under consideration,
SALES-hat = 118.914 − 7.908 PRICE + 1.863 ADVERT
Let π‘₯02 = 𝑃𝑅𝐼𝐢𝐸0 = 5 and π‘₯03 = 𝐴𝐷𝑉𝐸𝑅𝑇0 = 3. Then 𝑦̂0 = 84.96.
𝑦̂0 = 118.914 − 7.908(5) + 1.863(3) = 84.96
From the covariance matrix,
[ 40.3433   −6.7951   −0.7484 ]
[ −6.7951    1.2012   −0.0197 ]
[ −0.7484   −0.0197    0.4668 ]
we have,
var(𝑦̂0 ) = 40.3433 + (52 )(1.2012) + (32 )(0.4668) + 2(5)(−6.7951) + 2(3)(−0.7484) + 2(5)(3)(−0.0197)
var(𝑦̂0 ) = 1.5407
se(𝑦̂0 ) = √1.5407 = 1.241
𝐿, π‘ˆ = 84.96 ± (1.993)(1.241) = [82.49,87.44]
6.3. Prediction Interval for the Individual Value of π’š for a Given 𝒙
For the given value of π‘₯𝑗 , π‘₯0𝑗 , the difference between an individual value 𝑦0 and the mean value 𝑦̂0 is the error
term,
𝑦0 − 𝑦̂0 = 𝑒
Therefore,
var(𝑦0 ) = var(𝑒) + var(𝑦̂0 )
In the above example,
var(𝑦0 ) = 23.8742 + 1.5407 = 25.4149
se(𝑦0 ) = 5.0413
𝐿, π‘ˆ = 84.96 ± (1.993)(5.0413) = [74.91,95.01]
6.4. Interval Estimation for a Linear Combination of Coefficients
In the above example, suppose we want to increase advertising expenditure by $800 and reduce the price by
$0.40. Build a 95% interval estimate for the change in the mean (expected) sales. The change in the mean
sales is:
βˆ†π‘¦Μ‚ = 𝑦̂1 − 𝑦̂0
βˆ†π‘¦Μ‚ = [𝑏1 + 𝑏2 (π‘₯02 − 0.4) + 𝑏3 (π‘₯03 + 0.8)] − [𝑏1 + 𝑏2 π‘₯02 + 𝑏3 π‘₯03 ]
βˆ†π‘¦Μ‚ = −0.4𝑏2 + 0.8𝑏3
The interval estimate is:
𝐿, π‘ˆ = βˆ†π‘¦Μ‚ ± 𝑑0.025,𝑑𝑓 se(βˆ†π‘¦Μ‚)
𝐿, π‘ˆ = (−0.4𝑏2 + 0.8𝑏3 ) ± 𝑑0.025,𝑑𝑓 se(−0.4𝑏2 + 0.8𝑏3 )
var(−0.4b₂ + 0.8b₃) = (−0.4)² var(b₂) + (0.8)² var(b₃) + 2(−0.4)(0.8) cov(b₂, b₃)
var(−0.4b₂ + 0.8b₃) = (0.4)²(1.2012) + (0.8)²(0.4668) − 2(0.4)(0.8)(−0.0197)
var(−0.4𝑏2 + 0.8𝑏3 ) = 0.5036
se(−0.4𝑏2 + 0.8𝑏3 ) = 0.7096
βˆ†π‘¦Μ‚ = −0.4𝑏2 + 0.8𝑏3 = −0.4(−7.908) + 0.8(1.8626) = 4.653
𝐿, π‘ˆ = 4.653 ± 1.993 × 0.7096 = [3.239,6.068]
We are 95% confident the mean increase in sales will be between $3,239 and $6,068.
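A sketch of the same calculation for the linear combination Δŷ = −0.4b₂ + 0.8b₃ (the variance and covariance values are taken from the matrix above):

```python
import numpy as np
from scipy import stats

cov_b23 = np.array([[1.2012, -0.0197],
                    [-0.0197, 0.4668]])    # var/cov of (b2, b3)
a   = np.array([-0.4, 0.8])                # weights of the linear combination
b23 = np.array([-7.908, 1.8626])

delta    = a @ b23                          # about 4.653
se_delta = np.sqrt(a @ cov_b23 @ a)         # about 0.7096
t_crit   = stats.t.ppf(0.975, df=72)
print(delta - t_crit * se_delta, delta + t_crit * se_delta)  # about (3.24, 6.07)
```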
6.5. Hypothesis Testing for a Single Coefficient
6.5.1. Two-Tail Test of Significance
Generally, a computer output such as the Excel regression output provides the 𝑑 test statistic for the test of
null hypothesis 𝐻0 : 𝛽𝑗 = 0 against 𝐻1 : 𝛽𝑗 ≠ 0. If the null hypothesis is not rejected, then we conclude that the
independent variable π‘₯𝑗 does not influence 𝑦. The test statistic for such a test is
TS = |t| = | bⱼ ⁄ se(bⱼ) |
And, since this is a two-tail test, the critical value, for a level of significance α, is 𝐢𝑉 = 𝑑α⁄2,(𝑛−π‘˜) . For example,
𝐻0 : 𝛽2 = 0
𝐻1 : 𝛽2 ≠ 0
𝑇𝑆 = 7.908⁄1.096 = 7.215, and 𝐢𝑉 = 𝑑0.025,(72) = 1.993. Since 𝑇𝑆 > 𝐢𝑉, we reject the null hypothesis and
conclude that revenue is related to price. Also note that the probability value for the test statistic is 2 ×
P(𝑑 > 7.215) ≈ 0. Since this is less than α = 0.05, we reject the null. The test for 𝛽3 is as follows:
𝐻0 : 𝛽3 = 0
𝐻1 : 𝛽3 ≠ 0
𝑇𝑆 = 1.863⁄0.683 = 2.726 and 𝐢𝑉 = 1.993. Since 𝑇𝑆 > 𝐢𝑉, we reject the null hypothesis and conclude that
revenue is related to advertising expenditure. The probability value for the test statistic is 2 × P(𝑑 > 2.726) =
0.0080.
All these figures are presented in the Excel regression output shown above.
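The t statistics and p-values in that output can be reproduced as follows (a minimal sketch assuming SciPy; the coefficients and standard errors are those reported above):

```python
from scipy import stats

n, k = 75, 3
tests = {"PRICE":  (-7.90785, 1.09599),
         "ADVERT": ( 1.86258, 0.68320)}

for name, (b, se) in tests.items():
    t_stat = b / se
    p_value = 2 * stats.t.sf(abs(t_stat), df=n - k)   # two-tail p-value
    print(name, round(t_stat, 3), round(p_value, 5))
# PRICE: t about -7.215, p about 0.0000; ADVERT: t about 2.726, p about 0.008
```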
6.5.2. One-Tailed Tests
We will use the current example to illustrate one-tailed hypothesis tests in regression.
6.5.2.1. Example 1: Testing for Price Elasticity of Demand
Suppose we are interested to test the hypothesis that the demand is price inelastic against the alternative
hypothesis that demand is price elastic. According to the total-revenue test of elasticity, if demand is
inelastic, then a decrease in price would lead to a decrease in revenue. If demand is elastic, a decrease in
price will lead to an increase in revenue.
𝐻0 : 𝛽2 ≥ 0
(When demand is price inelastic, price and revenue change in the same direction.)
𝐻1 : 𝛽2 < 0
(When demand is price elastic, price and revenue change in the opposite direction.)
Note that here we are giving the benefit of the doubt to “inelasticity”. If demand is elastic, we want to have
strong evidence of that. Also note that the regression result already provides some evidence that demand is
elastic (𝑏2 = −7.908 < 0). But is this evidence significant?
The test statistic is
TS = [b₂ − (β₂)₀] ⁄ se(b₂)

Since by the null hypothesis (β₂)₀ is equal to (or greater than) zero, the test statistic simplifies to TS = b₂⁄se(b₂). However, since this is a one-tailed (left-tail) test, the critical value is −CV = −tα,(n−k).

TS = −7.908⁄1.096 = −7.215 < −CV = −t₀.₀₅,(72) = −1.666
This leads us to reject the null hypothesis and conclude that there is strong evidence that 𝛽2 is negative, and
hence demand is elastic.
6.5.2.2. Example 2: Testing for Effectiveness of Advertising
We can also perform a test for the effectiveness of advertising. Does an increase in advertising expenditure
bring an increase in total revenue above that spent on advertising? That is, is 𝑑𝑦⁄𝑑π‘₯3 > 1? Again, the sample
regression provides that b₃ = 1.863 > 1. But we want to determine whether this evidence is statistically significant. Thus,
𝐻0 : 𝛽3 ≤ 1
𝐻1 : 𝛽3 > 1
The test statistic is,
t = [b₃ − (β₃)₀] ⁄ se(b₃) = (1.863 − 1) ⁄ 0.683 = 1.263
Since 𝑑 = 1.263 < 𝐢𝑉 = 𝑑0.05,(72) = 1.666, do not reject the null hypothesis. The p-value for the test is P(𝑑 >
1.263) = 0.1053, which exceeds 𝛼 = 0.05. The test does not prove that advertising is effective. That is, 𝛽3 is
not significantly greater than 1.
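A short sketch of this one-tailed test (assuming SciPy; the estimates are those used above):

```python
from scipy import stats

b3, se_b3, beta3_0 = 1.8626, 0.683, 1.0
t_stat = (b3 - beta3_0) / se_b3            # about 1.263
cv = stats.t.ppf(0.95, df=72)              # about 1.666
p_value = stats.t.sf(t_stat, df=72)        # about 0.105
print(t_stat, cv, p_value)                 # fail to reject H0: beta3 <= 1
```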
6.5.2.3. Example 3: Testing for a Linear Combination of Coefficients
Test the hypothesis that dropping price by $0.20 will be more effective for increasing sales revenue than
increasing advertising expenditure by $500, that is: −0.20𝛽2 > 0.5𝛽3
Note that the regression model provides that −0.2b₂ = 1.582 > 0.5b₃ = 0.931. The test is to determine whether this difference is statistically significant. Therefore, the null and alternative hypotheses are:
𝐻0 : − 0.2𝛽2 − 0.5𝛽3 ≤ 0
𝐻1 : − 0.2𝛽2 − 0.5𝛽3 > 0
The test statistic is,
t = [−0.2b₂ − 0.5b₃ − (−0.2β₂ − 0.5β₃)₀] ⁄ se(−0.2b₂ − 0.5b₃)

Since under the null hypothesis (−0.2β₂ − 0.5β₃)₀ = 0, this simplifies to

TS = (−0.2b₂ − 0.5b₃) ⁄ se(−0.2b₂ − 0.5b₃)

The problem here is to find the standard error of the linear combination of the two coefficients in the denominator. For that, first determine var(−0.2b₂ − 0.5b₃):
var(−0.2𝑏2 − 0.5𝑏3 ) = (−0.2)2 var(𝑏2 ) + (−0.5)2 var(𝑏3 ) + 2(−0.2)(−0.5)covar(𝑏2 , 𝑏3 )
From the covariance matrix above, we obtain the following:
var(−0.2𝑏2 − 0.5𝑏3 ) = (0.2)2 (1.2012) + (0.5)2 (0.4668) + 2(0.2)(0.5)(−0.0197)
var(−0.2𝑏2 − 0.5𝑏3 ) = 0.1608
se(−0.2𝑏2 − 0.5𝑏3 ) = 0.4010
Then,
t = [(−0.2)(−7.9079) − (0.5)(1.8626)] ⁄ 0.4010 = 1.622
𝐢𝑉 = 𝑑0.05,(72) = 1.666
Since t = 1.622 < CV = t₀.₀₅,(72) = 1.666, do not reject the null hypothesis. The p-value for the test is P(t > 1.622) = 0.055, which exceeds α = 0.05. The test does not prove that −0.2β₂ > 0.5β₃.
7. Extension of the Regression Model
We can extend a regression model by altering the existing independent variables and treating them as new variables. Let's use the current example of the sales-price and advertising expenditure model. We want to address the issue that sales do not rise indefinitely, and at a constant rate, in response to increases in advertising expenditure. As expenditure on advertising rises, revenues may rise at a decreasing (rather than a constant) rate, implying diminishing returns to advertising expenditure. One way to take the impact of these diminishing returns into account is to include the squared value of advertising, x₃², in the model.
𝑦 = 𝛽1 + 𝛽2 π‘₯2 + 𝛽3 π‘₯3 + 𝛽4 π‘₯32 + 𝑒
E(𝑦) = 𝛽1 + 𝛽2 π‘₯2 + 𝛽3 π‘₯3 + 𝛽4 π‘₯32
Thus,
πœ•E(𝑦)
= 𝛽3 + 2𝛽4 π‘₯3
πœ•π‘₯3
We expect revenues to increase with each additional unit increase in advertising expenditure. Therefore,
𝛽3 > 0. We also expect the rate of increase in revenues to decrease with each additional unit increase in
advertising. Therefore, 𝛽4 < 0.
Once we point out the characteristics of the extended model, we can treat x₃² as a new variable x₄. Using the same data, the Excel regression output is shown below.
SUMMARY OUTPUT

Regression Statistics
  Multiple R           0.7129061
  R Square             0.5082352
  Adjusted R Square    0.4874564
  Standard Error       4.645283
  Observations         75

ANOVA
              df    SS            MS          F           Significance F
  Regression   3    1583.397408   527.79914   24.459316   5.59996E-11
  Residual    71    1532.084459   21.578654
  Total       74    3115.481867

              Coefficients   Standard Error   t Stat    P-value     Lower 95%   Upper 95%
  Intercept   109.719        6.799            16.137    1.87E-25     96.162     123.276
  PRICE        -7.640        1.046            -7.304    3.236E-10    -9.726      -5.554
  ADVERT       12.151        3.556             3.417    0.0010516     5.060      19.242
  ADVERT²      -2.768        0.941            -2.943    0.0043927    -4.644      -0.892
From the regression table, the coefficient of ADVERT², b₄ = −2.768. It has the anticipated sign, and it is also significantly different from zero (p-value = 0.0044 < 0.05). There are diminishing returns to advertising.
To interpret the role of the coefficients b₃ = 12.151 and b₄ = −2.768, consider the following table, where ŷ (SALES) is computed for a fixed value of x₂ (PRICE) = $5 and for various values of x₃ (ADVERT). Starting from (x₃)₀ = $1.8 and increasing advertising by an increment of Δx₃ = 0.10 to x₃ = 1.9, predicted sales increase by Δŷ = $0.19. As advertising expenditure is increased by the same increment, the increment in predicted sales decreases to 0.14 and then to 0.08.
(π‘₯3 )0
1.8
1.9
2.0
2.1
7.1.
𝑦̂
84.42
84.61
84.75
84.83
βˆ†π‘¦Μ‚
0.19
0.14
0.08
The Optimal Level of Advertising
What is the optimum level of advertising? Optimality in economics always implies marginal benefit of an
action be equal to its marginal cost. If marginal benefit exceeds the marginal cost, the action should be taken.
If marginal benefit is less than the marginal cost, the action should be curtailed. The optimum is, therefore,
where the two are equal.
The marginal benefit of advertising is the contribution of each additional unit ($1,000) of advertising expenditure to total revenue. From the model,
𝑦 = 𝛽1 + 𝛽2 π‘₯2 + 𝛽3 π‘₯3 + 𝛽4 π‘₯32 + 𝑒
the marginal benefit of advertising is:
πœ•E(𝑦)
= 𝛽3 + 2𝛽4 π‘₯3
πœ•π‘₯3
Ignoring the cost of producing the additional units sold, the marginal cost of advertising is simply the additional $1 (thousand) spent on advertising. Thus, the optimality requirement MB = MC is:
𝛽3 + 2𝛽4 π‘₯3 = $1
Using the estimated least squares coefficients, we thus have:
12.151 + 2(−2.768)π‘₯3 = 1
Yielding, π‘₯3 = $2.014 thousand.
We want to build an interval estimate for the optimal level of advertising:
(π‘₯3 )π‘œ =
1 − 𝛽3
2𝛽4
Note that the sample statistic
1−𝑏3
2𝑏4
is obtained as a nonlinear combination of the two coefficients 𝑏3 and 𝑏4 .
Therefore, we cannot use the same formula that we used to find the variance of the linear combination of the
two coefficients. The (approximate) variance of the nonlinear combination of two random variables is
obtained using the delta method. Let

λ = (1 − b₃) ⁄ (2b₄)
Then,
πœ•π‘‘ 2
πœ•π‘‘ 2
πœ•π‘‘
πœ•π‘‘
var(𝑑) = (
) var(𝑏3 ) + (
) var(𝑏4 ) + 2 (
)(
) covar(𝑏3 , 𝑏4 )
πœ•π‘3
πœ•π‘4
πœ•π‘3 πœ•π‘4
Using partial derivatives,
πœ•π‘‘
1
=−
πœ•π‘3
2𝑏4
and
πœ•π‘‘
1 − 𝑏3
=−
πœ•π‘4
2𝑏42
Hence,
2
1 2
1 − 𝑏3
1
1 − 𝑏3
var(𝑑) = (−
) var(𝑏3 ) + (−
) var(𝑏4 ) + 2 (−
) (−
) covar(𝑏3 , 𝑏4 )
2𝑏4
2𝑏4
2𝑏42
2𝑏42
We can obtain var(𝑏3 ) and var(𝑏4 ) by simply squaring the standard errors from the regression output.
Unfortunately, the Excel regression output does not provide the covariance value. We can, however, still use
Excel to compute covar(𝑏3 , 𝑏4 ). If you recall, using matrix algebra we can determine the variance-covariance
matrix: var(e)X⁻¹. Adding x₃² to the model, we now have three independent variables. The solution for
this problem is simple because the 𝑋 matrix can be expanded to incorporate any number of independent
variables. For a 3-variable model we have,
∑π‘₯2
∑π‘₯2 ∑π‘₯22
𝐗=
∑π‘₯3 ∑π‘₯2 π‘₯3
[∑π‘₯4 ∑π‘₯2 π‘₯4
𝑛
∑π‘₯3
∑π‘₯2 π‘₯3
∑π‘₯32
∑π‘₯3 π‘₯4
∑π‘₯4
∑π‘₯2 π‘₯4
∑π‘₯3 π‘₯4
∑π‘₯42 ]
Using Excel we can compute these quantities as the elements of the X matrix:
[  75.0    426.5    138.3    306.2  ]
[ 426.5   2445.7    787.4   1746.5  ]
[ 138.3    787.4    306.2    755.0  ]
[ 306.2   1746.5    755.0   1982.6  ]
Then determine the inverse matrix 𝑋 −1 :
[  2.1423   −0.2978   −0.5376    0.1362 ]
[ −0.2978    0.0507    0.0139   −0.0040 ]
[ −0.5376    0.0139    0.5861   −0.1524 ]
[  0.1362   −0.0040   −0.1524    0.0410 ]
and find the variance-covariance matrix by finding the product
var(𝑒)𝑋 −1 = 21.579𝑋 −1
[  46.227   −6.426   −11.601    2.939 ]
[  −6.426    1.094     0.300   −0.086 ]
[ −11.601    0.300    12.646   −3.289 ]
[   2.939   −0.086    −3.289    0.885 ]
Thus,

∂λ⁄∂b₃ = −1⁄(2b₄) = −1⁄(2 × (−2.768)) = 0.18064
∂λ⁄∂b₄ = −(1 − b₃)⁄(2b₄²) = −(1 − 12.151)⁄(2 × (−2.768)²) = 0.72773

var(λ) = (0.18064)²(12.646) + (0.72773)²(0.885) + 2(0.18064)(0.72773)(−3.28875) = 0.01657

se(λ) = 0.12872

The 95% confidence interval for the optimal level of advertising then is,

L, U = λ ± tα⁄2,(n−4) se(λ)

L, U = 2.014 ± (1.994)(0.12872) = ($1.757, $2.271)
7.2. Interaction Variables
An example to illustrate the use of interaction variables is the life-cycle model of consumption behavior. Here
the model involves the response of 𝑃𝐼𝑍𝑍𝐴 consumption (𝑦) to 𝐴𝐺𝐸 (π‘₯2 ) and 𝐼𝑁𝐢𝑂𝑀𝐸 (π‘₯3 ) of the consumer.
First consider the simple model without the interaction variables.
𝑦 = 𝛽1 + 𝛽2 π‘₯2 + 𝛽3 π‘₯3 + 𝑒
Using the data in the Excel file “CH6 DATA” in the tab “pizza”, the estimated regression equation is
Chapter 6—Multiple Regression
14 of 26
𝑦̂ = 342.88 − 7.576π‘₯2 + 1.832π‘₯3
Thus, holding 𝐼𝑁𝐢𝑂𝑀𝐸 constant, for each additional year of 𝐴𝐺𝐸 the expenditure on 𝑃𝐼𝑍𝑍𝐴 decreases by
$7.58:
b₂ = ∂ŷ⁄∂x₂ = −7.576
And, holding 𝐴𝐺𝐸 constant, for each additional $1,000 increase in 𝐼𝑁𝐢𝑂𝑀𝐸 expenditure on 𝑃𝐼𝑍𝑍𝐴 rises by
$1.83.
However, we would not expect people of different ages to spend similar amounts on pizza for each additional $1,000 increase in income. It is reasonable to expect that older persons spend a smaller amount of additional income on pizza than younger persons. Thus, we expect there to be an interaction between the age and income variables. This gives rise to the interaction variable in the model, represented by the product of the variables AGE and INCOME: AGE × INCOME (x₂x₃).
𝑦 = 𝛽1 + 𝛽2 π‘₯2 + 𝛽3 π‘₯3 + 𝛽4 π‘₯2 π‘₯3 + 𝑒
The estimated regression equation is now:
𝑦̂ = 161.465 − 2.977π‘₯2 + 6.980π‘₯3 − 0.123π‘₯2 π‘₯3
πœ•π‘¦Μ‚
= 𝑏2 + 𝑏4 π‘₯3
πœ•π‘₯2
πœ•π‘¦Μ‚
= −2.977 − 0.123π‘₯3
πœ•π‘₯2
πœ•π‘¦Μ‚
= 𝑏3 + 𝑏4 π‘₯2
πœ•π‘₯3
πœ•π‘¦Μ‚
= 6.98 − 0.123π‘₯2
πœ•π‘₯2
For π‘₯3 = 30 ($1,000)
For π‘₯2 = 25 years
πœ•π‘¦Μ‚
= −2.977 − 0.123(30) = −6.67
πœ•π‘₯2
πœ•π‘¦Μ‚
= 6.98 − 0.123(25) = 3.90
πœ•π‘₯2
When income is $30,000, for each additional year,
expenditure on pizza is reduced by $6.67
When age is 25, for each additional $1,000 income,
expenditure on pizza is increased by $3.90.
For π‘₯3 = 80 ($1,000)
For π‘₯3 = 50 years
πœ•π‘¦Μ‚
= −2.977 − 0.123(80) = −12.84
πœ•π‘₯2
πœ•π‘¦Μ‚
= 6.98 − 0.123(50) = 0.82
πœ•π‘₯2
When income is $80,000, for each additional year,
expenditure on pizza is reduced by $12.84
When age is 50, for each additional $1,000 income,
expenditure on pizza is increased by $0.82.
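The marginal effects above come straight from the estimated coefficients. A minimal sketch (coefficients as reported above; because the coefficients are rounded, the results may differ from the text in the last digit):

```python
# Marginal effects in the PIZZA model with the AGE x INCOME interaction
b2, b3, b4 = -2.977, 6.980, -0.123

def d_pizza_d_age(income):
    """Effect of one more year of AGE, holding INCOME (in $1,000s) fixed."""
    return b2 + b4 * income

def d_pizza_d_income(age):
    """Effect of $1,000 more INCOME, holding AGE fixed."""
    return b3 + b4 * age

print(d_pizza_d_age(30), d_pizza_d_age(80))        # roughly -6.7 and -12.8
print(d_pizza_d_income(25), d_pizza_d_income(50))  # roughly 3.9 and 0.8
```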
7.3. Log-Linear Model
Let's start with a model where the natural logarithm of WAGE (y) is a stochastic function of two independent variables, EDUCATION (x₂) and years of EXPERIENCE (x₃):
ln(𝑦) = 𝛽1 + 𝛽2 π‘₯2 + 𝛽3 π‘₯3 + 𝑒
Now if we believe that the effect of each additional year of experience also depends on the level of education, then we may include the interaction variable EDUCATION × EXPERIENCE (x₂x₃) in the model as a third variable.
ln(𝑦) = 𝛽1 + 𝛽2 π‘₯2 + 𝛽3 π‘₯3 + 𝛽4 π‘₯2 π‘₯3 + 𝑒
Using the data in the tab “π‘€π‘Žπ‘”π‘’”, the estimated regression equation is
ln(y)-hat = 1.3923 + 0.09494x₂ + 0.006329514x₃ − 0.0000364x₂x₃
The effect of another year of πΈπ·π‘ˆπΆπ΄π‘‡πΌπ‘‚π‘, holding 𝐸𝑋𝑃𝐸𝑅𝐼𝐸𝑁𝐢𝐸 constant, is,
(1⁄ŷ) dŷ⁄dx₂ = b₂ + b₄x₃
For example, given 𝐸𝑋𝑃𝐸𝑅𝐼𝐸𝑁𝐢𝐸, π‘₯3 = 5 years, the increase in π‘Šπ΄πΊπΈ from an extra year of πΈπ·π‘ˆπΆπ΄π‘‡πΌπ‘‚π‘ is
9.48%.
𝑑𝑦̂⁄𝑑π‘₯2
= 0.09494 − 0.000036(5) = 0.09476 (9.476%)
𝑦̂
At a higher level of 𝐸𝑋𝑃𝐸𝑅𝐼𝐸𝑁𝐢𝐸, say π‘₯3 = 10 years, the percentage increase in π‘Šπ΄πΊπΈ for an additional year
of πΈπ·π‘ˆπΆπ΄π‘‡πΌπ‘‚π‘ decreases slightly to 9.46%.
𝑑𝑦̂⁄𝑑π‘₯2
= 0.09494 − 0.000036(10) = 0.09457 (9.457%)
𝑦̂
Note that these percentage changes in π‘Šπ΄πΊπΈ are the result of a very small change in the variable
πΈπ·π‘ˆπΆπ΄π‘‡πΌπ‘‚π‘. The results of a discrete change in the variable πΈπ·π‘ˆπΆπ΄π‘‡πΌπ‘‚π‘, where βˆ†π‘₯2 = 1, are shown in the
calculations in the following table. The results show that at a higher level of 𝐸𝑋𝑃𝐸𝑅𝐼𝐸𝑁𝐢𝐸, any additional
year of education has a slightly smaller impact on π‘Šπ΄πΊπΈ.
              bⱼ           A: x₀     A: x₁     B: x₀     B: x₁
Intercept     1.39232      1         1         1         1
EDUC (x₂)     0.09494      16        17        16        17
EXPER (x₃)    0.00633      5         5         10        10
EDEX (x₂x₃)   −0.000036    80        85        160       170
ln(y)-hat                  2.94      3.03      2.97      3.06
ŷ                          18.917    20.797    19.469    21.400
Δŷ                                   1.880               1.931
(Δŷ⁄ŷ)%                              9.94%               9.92%
The effect of another year of 𝐸𝑋𝑃𝐸𝑅𝐼𝐸𝑁𝐢𝐸, holding πΈπ·π‘ˆπΆπ΄π‘‡πΌπ‘‚π‘ constant, is
(1⁄ŷ) dŷ⁄dx₃ = b₃ + b₄x₂
Holding πΈπ·π‘ˆπΆπ΄π‘‡πΌπ‘‚π‘constant at π‘₯2 = 8,
𝑑𝑦̂⁄𝑑π‘₯3
= 0.00633 − 0.000036(8) = 0.00604 (0.604%)
𝑦̂
Holding πΈπ·π‘ˆπΆπ΄π‘‡πΌπ‘‚π‘constant at π‘₯2 = 16,
𝑑𝑦̂⁄𝑑π‘₯3
= 0.00633 − 0.000036(16) = 0.00575 (0.575%)
𝑦̂
The results of a discrete change in the variable EXPERIENCE, where Δx₃ = 1, are shown in the calculations in the following table.
              bⱼ           A: X₀     A: X₁     B: X₀     B: X₁
Intercept     1.39232      1         1         1         1
EDUC (x₂)     0.09494      8         8         16        16
EXPER (x₃)    0.00633      10        11        10        11
EDEX (x₂x₃)   −0.000036    80        88        160       176
ln(y)-hat                  2.212     2.218     2.969     2.975
ŷ                          9.136     9.192     19.469    19.585
Δŷ                                   0.055               0.116
(Δŷ⁄ŷ)%                              0.617%              0.598%
The greater the number of years of education, the less valuable is an extra year of experience.
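The discrete-change calculations in the two tables can be reproduced as follows (a sketch using the estimated coefficients above):

```python
import numpy as np

# Coefficients of the estimated log-linear wage equation (from the text)
b1, b2, b3, b4 = 1.3923, 0.09494, 0.00633, -0.0000364

def wage_hat(educ, exper):
    """Predicted WAGE from the fitted ln(WAGE) equation."""
    return np.exp(b1 + b2 * educ + b3 * exper + b4 * educ * exper)

# Discrete effect of one more year of education at 5 vs. 10 years of experience
for exper in (5, 10):
    w0, w1 = wage_hat(16, exper), wage_hat(17, exper)
    print(exper, round(100 * (w1 - w0) / w0, 2))   # roughly 9.9% in both cases
```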
8. Measuring Goodness of Fit—π‘ΉπŸ
The coefficient of determination 𝑅2 in a multiple regression measures the combined effects of all independent
variables on 𝑦. It simply measures the proportion of total variation in 𝑦 that is explained by the regression
model.
R² = SSR⁄SST = ∑(ŷ − ȳ)² ⁄ ∑(y − ȳ)²
These quantities are easily calculated and they are also shown in the ANOVA part of the regression output for
the π΅π‘ˆπ‘…πΊπΈπ‘… example.
ANOVA
              df    SS
  Regression   2    1396.539
  Residual    72    1718.943
  Total       74    3115.482

R² = 1396.539 ⁄ 3115.482 = 0.4483
This implies that nearly 45% of the variation in y (monthly sales) is explained by the variations in price and advertising expenditure.
In Chapter 3 it was shown that R² is a measure of goodness of fit, that is, how well the estimated regression fits the data: R² equals the square of the simple correlation coefficient between the observed y and the predicted ŷ. The same argument applies here. A high R² value means there is a close association between the predicted and observed values of y.
8.1. Adjusted R²
In multiple regression 𝑅2 is affected by the number of independent variables. As we add more explanatory
variables to the model, 𝑅2 will increase. This would artificially “improve” the model. To see this, consider the
regression model
𝑦̂ = 𝑏1 + 𝑏2 π‘₯2 + 𝑏3 π‘₯3
According to the formula

R² = 1 − ∑(y − ŷ)² ⁄ ∑(y − ȳ)²

Now, if we add another variable to the regression model, then the quantity SSE = ∑(y − ŷ)² becomes smaller and R² = 1 − SSE⁄SST becomes larger.
An alternative measure devised to address this problem is the adjusted 𝑅2 :
Rₐ² = 1 − [SSE⁄(n − k)] ⁄ [SST⁄(n − 1)]
For our example,
Rₐ² = 1 − (1718.943⁄72) ⁄ (3115.482⁄74) = 0.4329
This figure is shown in the computer regression output right below the regular 𝑅2 .
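Both measures follow directly from SSE and SST. A minimal sketch using the ANOVA values above:

```python
sse, sst = 1718.943, 3115.482
n, k = 75, 3

r2 = 1 - sse / sst                                  # about 0.4483
r2_adj = 1 - (sse / (n - k)) / (sst / (n - 1))      # about 0.4329
# Equivalent form: 1 - (1 - r2) * (n - 1) / (n - k)
print(r2, r2_adj)
```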
Note that R² = 1 − SSE⁄SST. Divide the numerator and denominator of the quotient on the right-hand side by their respective degrees of freedom. Thus,

Rₐ² = 1 − [SSE⁄(n − k)] ⁄ [SST⁄(n − 1)]

To show why Rₐ² does not increase with the increase in k (the number of independent variables), we can rewrite it as:

Rₐ² = 1 − (SSE⁄SST) × (n − 1)⁄(n − k) = 1 − (1 − R²)(n − 1)⁄(n − k)

As k increases, the negative adjustment on the right-hand side rises, reducing Rₐ².
Appendix
A Brief Introduction to Matrix Algebra
1. Introduction
   1.1. The Algebra of Matrices
        1.1.1. Addition and subtraction of matrices
               1.1.1.1. Scalar Multiplication
               1.1.1.2. Matrix Multiplication
        1.1.2. Identity Matrices
        1.1.3. Transpose of a Matrix
        1.1.4. Inverse of a Matrix
               1.1.4.1. How to Find the Inverse of a Matrix
                        1.1.4.1.1. The Determinant of a Matrix
                        1.1.4.1.2. Properties of Determinants
        1.1.5. How to use the Inverse of Matrix A to find the solutions for an equation system
1. Introduction
Matrix algebra enables us to:
• write an equation system in a compact way,
• develop a method to test the existence of a solution by evaluation of a "determinant", and
• devise a method to find the solution.
For example, consider the following equation system with three variables π‘₯1 , π‘₯2 , and π‘₯3 :
6π‘₯1 + 3π‘₯2 + 1π‘₯3 = 22
1π‘₯1 + 4π‘₯2 − 2π‘₯3 = 12
4π‘₯1 − 1π‘₯2 + 5π‘₯3 = 10
This equation system can be written in the matrix format as follows:
[ 6   3   1 ] [x₁]   [22]
[ 1   4  −2 ] [x₂] = [12]
[ 4  −1   5 ] [x₃]   [10]
The lead matrix on the left hand side is the coefficient matrix, denoted by 𝐴. The lag matrix is the variable
matrix π‘₯ and the matrix on the right hand side is the matrix of the constant terms, 𝑑.
    [ 6   3   1 ]         [ x₁ ]         [ 22 ]
A = [ 1   4  −2 ]     x = [ x₂ ]     d = [ 12 ]
    [ 4  −1   5 ]         [ x₃ ]         [ 10 ]
Thus, the short hand version of an equation system is:
𝐴π‘₯ = 𝑑
Any given matrix can be defined by its dimension: The number of rows (π‘š) and number of columns (𝑛). The
dimension of 𝐴 is 3 × 3, that of π‘₯ and 𝑑 are both 3 × 1.
1.1. The Algebra of Matrices
1.1.1. Addition and subtraction of matrices
Two matrices can be added if and only if they are conformable for addition. That is, they have the same
dimension. For example:
A = [ 4   5 ]     B = [ 6  10 ]     C = [  6   8 ]     D = [ 8   5 ]
    [ 6  12 ]         [ 3  15 ]         [ 12  19 ]         [ 3  11 ]

A + B = [ 4+6    5+10  ]   [ 10  15 ]
        [ 6+3   12+15  ] = [  9  27 ]

C − D = [ 6−8     8−5   ]   [ −2   3 ]
        [ 12−3   19−11  ] = [  9   8 ]
1.1.1.1. Scalar Multiplication

When every element of a matrix is multiplied by a number (scalar), then we are performing a scalar multiplication.

4 [ 4   5 ]   [ 16  20 ]
  [ 6  12 ] = [ 24  48 ]
1.1.1.2. Matrix Multiplication
To multiply two matrices, they must be conformable for multiplication. This requires that the number of
columns of the lead matrix be equal to the number of rows of the lag matrix. For example, let dimension of 𝐴
be 3 × 2 and that of 𝐡, 2 × 1, then 𝐴 and 𝐡 are conformable for multiplication because 𝐴 has two columns and
𝐡 has 2 rows. The resulting product matrix will have a dimension of 3 × 1.
A (3×2) × B (2×1) = AB (3×1)
The following example shows how the multiplication rule applies to two matrices.
π‘Ž11
𝐴 = [π‘Ž21
π‘Ž31
5
𝐴 = [6
2
π‘Ž12
π‘Ž22 ] 𝐡 = [𝑏11 ]
𝑏21
π‘Ž32
4
1]
9
π‘Ž11 𝑏11 + π‘Ž12 𝑏21
𝐴𝐡 = [ π‘Ž21 𝑏11 + π‘Ž22 𝑏21 ]
π‘Ž31 𝑏𝑏11 + π‘Ž32 𝑏21
5(7) + 4(5)
55
𝐴𝐡 = [6(7) + 1(5)] = [47]
59
2(7) + 9(5)
7
𝐡=[ ]
5
Note that if 𝐡 is used as the lead matrix and 𝐴 as the lag matrix, the two are no longer conformable for
multiplication. 𝐡: 2 × 1 and 𝐴: 3 × 2. The number of columns of the lead matrix is not the same as the
number of rows of the lag matrix. Even if switching the lead and lag matrices preserved the conformability,
still the resulting product matrix would not be the same. That is: 𝐴𝐡 ≠ 𝐡𝐴. Matrix multiplication is not
commutative.
Another example,
A (2×3) = [ 1  2  −1 ]     B (3×2) = [ −2   5 ]
          [ 3  1   4 ]               [  4  −3 ]
                                     [  2   1 ]

AB (2×2) = [ 4  −2 ]       BA (3×3) = [ 13   1   22 ]
           [ 6  16 ]                  [ −5   5  −16 ]
                                      [  5   5    2 ]
Also note that we have used the matrix multiplication rule to write an equation system in the matrix format:
[ 6   3   1 ] [x₁]   [22]          6x₁ + 3x₂ + 1x₃ = 22
[ 1   4  −2 ] [x₂] = [12]    →     1x₁ + 4x₂ − 2x₃ = 12
[ 4  −1   5 ] [x₃]   [10]          4x₁ − 1x₂ + 5x₃ = 10
A (3×3)  x (3×1)  =  d (3×1)
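The two multiplication examples above are easy to verify numerically. A minimal NumPy sketch:

```python
import numpy as np

A = np.array([[1, 2, -1],
              [3, 1,  4]])        # 2 x 3
B = np.array([[-2,  5],
              [ 4, -3],
              [ 2,  1]])          # 3 x 2

print(A @ B)    # 2 x 2: [[ 4, -2], [ 6, 16]]
print(B @ A)    # 3 x 3: [[13, 1, 22], [-5, 5, -16], [5, 5, 2]]
# AB and BA differ (here even in dimension): matrix multiplication is not commutative.
```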
1.1.2. Identity Matrices
An identity matrix is a matrix with the same number of rows and columns (a square matrix) which has 1’s in
its principal diagonal and 0’s everywhere else. The identity matrix is denoted by 𝐼. The numeric subscript, if
shown, indicates the dimension.
I₂ = [ 1  0 ]        I₃ = [ 1  0  0 ]
     [ 0  1 ]             [ 0  1  0 ]
                          [ 0  0  1 ]
Pre or post multiplying a matrix by an identity matrix leaves the matrix unchanged.
𝐴𝐼 = 𝐼𝐴 = 𝐴
Example
A = [ 2  4  6 ]
    [ 3  2  3 ]
    [ 1  4  9 ]

AI = [ 2  4  6 ] [ 1  0  0 ]   [ 2  4  6 ]
     [ 3  2  3 ] [ 0  1  0 ] = [ 3  2  3 ] = A
     [ 1  4  9 ] [ 0  0  1 ]   [ 1  4  9 ]

IA = [ 1  0  0 ] [ 2  4  6 ]   [ 2  4  6 ]
     [ 0  1  0 ] [ 3  2  3 ] = [ 3  2  3 ] = A
     [ 0  0  1 ] [ 1  4  9 ]   [ 1  4  9 ]
1.1.3. Transpose of a Matrix
A matrix is transposed by interchanging its rows and columns. The transpose of the matrix 𝐴 is denoted by
𝐴′.
π‘Ž11
𝐴 = [π‘Ž21
π‘Ž31
π‘Ž12
π‘Ž22 ]
π‘Ž32
π‘Ž11
𝐴′ = [π‘Ž
12
π‘Ž21
π‘Ž22
π‘Ž31
π‘Ž32 ]
For example,
    [ 5  4 ]          A′ = [ 5  6  2 ]
A = [ 6  1 ]               [ 4  1  9 ]
    [ 2  9 ]
1.1.4. Inverse of a Matrix
The inverse of the square matrix 𝐴 (if it exists) is another matrix, denoted by 𝐴−1 , such that if 𝐴 is pre or post
multiplied by 𝐴−1 , the resulting product is an identity matrix.
𝐴𝐴−1 = 𝐴−1 𝐴 = 𝐼
If 𝐴 does not have an inverse, then it is called a singular matrix. Otherwise it is a nonsingular matrix.
1.1.4.1. How to Find the Inverse of a Matrix
Finding the inverse of a matrix is a complicated process. First we must understand several concepts required
in determining the inverse of a matrix.
1.1.4.1.1. The Determinant of a Matrix
The determinant of a matrix 𝐴 is a scalar quantity (a number) and is denoted by |𝐴|. It is obtained by
summing various products of the elements of 𝐴. For example, the determinant of a 2 × 2 matrix is defined to
be:
π‘Ž
|𝐴| = |π‘Ž11
21
π‘Ž12
π‘Ž22 | = π‘Ž11 π‘Ž22 − π‘Ž12 π‘Ž21
The determinant of a 3 × 3 matrix is obtained as follows:
π‘Ž11
|𝐴| = |π‘Ž21
π‘Ž31
π‘Ž12
π‘Ž22
π‘Ž32
π‘Ž13
π‘Ž
π‘Ž23 | = π‘Ž11 | 22
π‘Ž32
π‘Ž33
π‘Ž23
π‘Ž21
π‘Ž33 | − π‘Ž12 |π‘Ž31
π‘Ž23
π‘Ž21
π‘Ž33 | + π‘Ž13 |π‘Ž31
π‘Ž22
π‘Ž32 |
|𝐴| = π‘Ž11 π‘Ž22 π‘Ž33 − π‘Ž11 π‘Ž23 π‘Ž32 − π‘Ž12 π‘Ž21 π‘Ž33 + π‘Ž12 π‘Ž23 π‘Ž31 + π‘Ž13 π‘Ž21 π‘Ž32 − π‘Ž13 π‘Ž22 π‘Ž31
In the latter case, each element in the top row is multiplied by a "sub-determinant". The first sub-determinant,

| a₂₂  a₂₃ |
| a₃₂  a₃₃ |

multiplied by a₁₁, is obtained by eliminating the first row and the first column. The sub-determinant associated with a₁₁ is called the minor of that element and is denoted by m₁₁. The minor of a₁₂, m₁₂, is obtained by eliminating the first row and the second column, and so on. If the sum of the subscripts of the element is odd, then the sign is negative. The calculation of the determinant of A then can be presented as:

|A| = a₁₁m₁₁ − a₁₂m₁₂ + a₁₃m₁₃
| 2  1  3 |       | 5  6 |       | 4  6 |       | 4  5 |
| 4  5  6 | = 2 · | 8  9 | − 1 · | 7  9 | + 3 · | 7  8 |
| 7  8  9 |

            = 2(5 × 9 − 6 × 8) − (4 × 9 − 6 × 7) + 3(4 × 8 − 5 × 7) = −9
Here the determinant is obtained by expanding the first row. The same determinant can be found by
expanding any other row or column. In the previous example, find the determinant by expanding the third
column:
|𝐴| = π‘Ž13 π‘š13 − π‘Ž23 π‘š23 + π‘Ž33 π‘š33
2
|4
7
1
5
8
3
4
6| = 3 |
7
9
2
5
| −6|
7
8
1
2
| + 9|
8
4
1
| = 3(4 × 8 − 5 × 7) − 6(2 × 8 − 1 × 7) + 9(2 × 5 − 1 × 4) = −9
5
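The cofactor expansion can be checked numerically (a minimal NumPy sketch using the same matrix):

```python
import numpy as np

A = np.array([[2, 1, 3],
              [4, 5, 6],
              [7, 8, 9]])

# Cofactor expansion along the first row, as in the text
det_by_expansion = (2 * (5*9 - 6*8)
                    - 1 * (4*9 - 6*7)
                    + 3 * (4*8 - 5*7))
print(det_by_expansion)            # -9
print(round(np.linalg.det(A)))     # -9 (up to floating-point rounding)
```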
Now let’s introduce a related concept to the minor, called the cofactor. A cofactor, denoted by 𝑐𝑖𝑗 , is a minor
with a prescribed algebraic sign attached. If the sum of the two subscripts in the minor π‘šπ‘–π‘— is odd then the
cofactor is negative:
𝑐𝑖𝑗 = (−1)(𝑖+𝑗) π‘šπ‘–π‘—
In short, the value of determinant |𝐴| of order 𝑛 can be found by the expansion of any row or any column as
follows:
|A| = ∑ aᵢⱼ cᵢⱼ, summing over j = 1, …, n      [expansion by the i-th row]

|A| = ∑ aᵢⱼ cᵢⱼ, summing over i = 1, …, n      [expansion by the j-th column]

For example, for n = 3,

Expansion by the first row:      |A| = a₁₁c₁₁ + a₁₂c₁₂ + a₁₃c₁₃

Expansion by the first column:   |A| = a₁₁c₁₁ + a₂₁c₂₁ + a₃₁c₃₁
1.1.4.1.2. Properties of Determinants

1) Transposing a matrix does not affect the value of the determinant: |A| = |A′|.

   |A| = | 1  2 | = 1 × 4 − 2 × 3 = −2        |A′| = | 1  3 | = 1 × 4 − 2 × 3 = −2
         | 3  4 |                                    | 2  4 |

2) The interchange of any two rows (or any two columns) will alter the sign, leaving the absolute value of the determinant unchanged.

   | 1  2 | = 1 × 4 − 2 × 3 = −2        | 3  4 | = 3 × 2 − 4 × 1 = 2
   | 3  4 |                             | 1  2 |

3) Multiplying any one row (or any one column) by a scalar will change the value of the determinant by the multiple of that scalar.

   | 1×5  2×5 | = 5 × 4 − 10 × 3 = −10
   | 3    4   |

4) The addition (subtraction) of a multiple of any row (or column) to another row (or column) will leave the value of the determinant unchanged.

   In the determinant | 1  2; 3  4 | = −2, multiply the first row by −3 and add it to the second row:

   | 1      2   |   | 1   2 |
   | 3−3   4−6  | = | 0  −2 | = −2

5) If one row (or column) is a multiple of another row (or column), the value of the determinant is zero; the determinant will vanish. In other words, if two rows (or columns) are linearly dependent, the determinant vanishes.

   In the following example the second row is the first row multiplied by 4.

   | 1  2 | = 8 − 2 × 4 = 0
   | 4  8 |

   A very important conclusion relates the existence of a non-vanishing determinant to the existence of a unique solution for an equation system:
Consider the equation system shown at beginning of the discussion of matrices:
6π‘₯1 + 3π‘₯2 + 1π‘₯3 = 22
1π‘₯1 + 4π‘₯2 − 2π‘₯3 = 12
4π‘₯1 − 1π‘₯2 + 5π‘₯3 = 10
The coefficient matrix of the equation system is
    [ 6   3   1 ]
A = [ 1   4  −2 ]
    [ 4  −1   5 ]
This equation system has a unique solution because there is a non-vanishing determinant |𝐴| = 52.
The matrix A is a nonsingular matrix.
Now consider the following equation system:
6x₁ + 3x₂ + 1x₃ = 22
12x₁ + 6x₂ + 2x₃ = 11
4x₁ − 1x₂ + 5x₃ = 10
The determinant of the coefficient matrix is
      | 6    3   1 |
|A| = | 12   6   2 | = 6(6 × 5 + 2) − 3(12 × 5 − 2 × 4) + (−1 × 12 − 6 × 4) = 0
      | 4   −1   5 |
The determinant vanishes because rows 1 and 2 are linearly dependent: The second row is first row
multiplied by 2. Thus the equation system will not have a unique solution.
6) The expansion of the determinant by alien cofactors (the cofactors of a "wrong" row or column) always yields a value of zero.
Expand the following determinant by using the first-row elements but the cofactors of the second-row elements:

    | 6   3   1 |
A = | 1   4  −2 |
    | 4  −1   5 |

c₂₁ = −[3 × 5 − (−1) × 1] = −16
c₂₂ = (6 × 5 − 1 × 4) = 26
c₂₃ = −[6 × (−1) − 3 × 4] = 18

a₁₁ = 6        a₁₁c₂₁ = −96
a₁₂ = 3        a₁₂c₂₂ = 78
a₁₃ = 1        a₁₃c₂₃ = 18

∑ a₁ⱼ c₂ⱼ = a₁₁c₂₁ + a₁₂c₂₂ + a₁₃c₂₃ = −96 + 78 + 18 = 0      (summing over j = 1, 2, 3)
This last property of determinants finally leads us to the method for finding the inverse of a matrix.
Finding the inverse of A involves the following steps:
π‘Ž11
𝐴 = [π‘Ž21
π‘Ž31
π‘Ž12
π‘Ž22
π‘Ž32
π‘Ž13
π‘Ž23 ]
π‘Ž33
1) Replace each element aᵢⱼ of A by its cofactor cᵢⱼ:

       [ c₁₁  c₁₂  c₁₃ ]
   C = [ c₂₁  c₂₂  c₂₃ ]
       [ c₃₁  c₃₂  c₃₃ ]

2) Find the transpose of C. This transpose matrix, C′, is called the adjoint matrix of A:

   C′ ≡ adjoint A = [ c₁₁  c₂₁  c₃₁ ]
                    [ c₁₂  c₂₂  c₃₂ ]
                    [ c₁₃  c₂₃  c₃₃ ]

3) Multiply A by adjoint A:

   AC′ = [ a₁₁  a₁₂  a₁₃ ] [ c₁₁  c₂₁  c₃₁ ]   [ ∑a₁ⱼc₁ⱼ  ∑a₁ⱼc₂ⱼ  ∑a₁ⱼc₃ⱼ ]
         [ a₂₁  a₂₂  a₂₃ ] [ c₁₂  c₂₂  c₃₂ ] = [ ∑a₂ⱼc₁ⱼ  ∑a₂ⱼc₂ⱼ  ∑a₂ⱼc₃ⱼ ]
         [ a₃₁  a₃₂  a₃₃ ] [ c₁₃  c₂₃  c₃₃ ]   [ ∑a₃ⱼc₁ⱼ  ∑a₃ⱼc₂ⱼ  ∑a₃ⱼc₃ⱼ ]
Now note that the elements in principal diagonal of the product matrix simply provide the determinant of 𝐴.
All other elements outside the principal diagonal are the expansions of the determinant by an alien cofactor.
Thus, they are all zeros.
∑π‘Ž1𝑗 𝑐1𝑗
[∑π‘Ž2𝑗 𝑐1𝑗
∑π‘Ž3 𝑐1𝑗
∑π‘Ž1𝑗 𝑐2𝑗
∑π‘Ž2𝑗 𝑐2𝑗
∑π‘Ž3𝑗 𝑐2𝑗
∑π‘Ž1𝑗 𝑐3𝑗
|𝐴|
∑π‘Ž2𝑗 𝑐3𝑗 ] = [ 0
0
∑π‘Ž3𝑗 𝑐3𝑗
0
|𝐴|
0
0
1
0 ] = |𝐴| [0
|𝐴|
0
0
1
0
1
0] = |𝐴|𝐼
1
Thus,
𝐴𝐢 ′ = |𝐴|𝐼
Now divide both sides of the equation by the determinant |𝐴|
AC′ ⁄ |A| = I
Pre multiply both sides by 𝐴−1 ,
A⁻¹AC′ ⁄ |A| = A⁻¹ I
Since 𝐴−1 𝐴 = 𝐼, 𝐼𝐢 ′ = 𝐢′, and 𝐴−1 𝐼 = 𝐴−1 , then:
A⁻¹ = C′ ⁄ |A|
The inverse of matrix A is obtained by dividing the π‘Žπ‘‘π‘—π‘œπ‘–π‘›π‘‘ A by the determinant.
                        [ 6   3   1 ]
Find the inverse of A = [ 1   4  −2 ]
                        [ 4  −1   5 ]
First find the |𝐴|
|A| = 6·[4 × 5 − (−2)(−1)] − 3·[1 × 5 − (−2)(4)] + [1 × (−1) − 4 × 4] = 6(18) − 3(13) + (−17) = 52
Next find the cofactor matrix
Next find the cofactor matrix, where each cofactor is the signed 2 × 2 minor of the corresponding element:

c₁₁ =  (4)(5) − (−2)(−1) = 18        c₁₂ = −[(1)(5) − (−2)(4)] = −13      c₁₃ =  (1)(−1) − (4)(4) = −17
c₂₁ = −[(3)(5) − (1)(−1)] = −16      c₂₂ =  (6)(5) − (1)(4) = 26          c₂₃ = −[(6)(−1) − (3)(4)] = 18
c₃₁ =  (3)(−2) − (1)(4) = −10        c₃₂ = −[(6)(−2) − (1)(1)] = 13       c₃₃ =  (6)(4) − (3)(1) = 21

    [  18   −13   −17 ]
C = [ −16    26    18 ]
    [ −10    13    21 ]
Find adjoint 𝐴
     [  18   −16   −10 ]
C′ = [ −13    26    13 ]
     [ −17    18    21 ]

                             [  18   −16   −10 ]
A⁻¹ = (1⁄|A|) C′ = (1⁄52) ×  [ −13    26    13 ]
                             [ −17    18    21 ]
With each element in 𝐴−1 rounded to three decimal points,
      [  0.346   −0.308   −0.192 ]
A⁻¹ = [ −0.250    0.500    0.250 ]
      [ −0.327    0.346    0.404 ]
1.1.5. How to use the Inverse of Matrix A to find the solutions for an equation system
Note that in the previous example the matrix A was the coefficient matrix of the equation system
6π‘₯1 + 3π‘₯2 + 1π‘₯3 = 22
1π‘₯1 + 4π‘₯2 − 2π‘₯3 = 12
4π‘₯1 − 1π‘₯2 + 5π‘₯3 = 10
[ 6   3   1 ] [x₁]   [22]
[ 1   4  −2 ] [x₂] = [12]
[ 4  −1   5 ] [x₃]   [10]
𝐴π‘₯ = 𝑑
Now premultiply both sides of the matrix form of the equation system by A⁻¹:

A⁻¹Ax = A⁻¹d
which results in
π‘₯ = 𝐴−1 𝑑
π‘₯1
0.346
[π‘₯2 ] = [−0.250
π‘₯3
−0.327
−0.308
0.500
0.346
−0.192 22
2
0.250] [12] = [3]
0.404 10
1
Thus, π‘₯1 = 2, π‘₯2 = 3, and π‘₯3 = 1.