# Lesson #4

## Chapter 6: Multiple Linear Regression Analysis

Copyright © 2014 McGraw-Hill Education. All rights reserved. No reproduction or distribution without the prior written consent of McGraw-Hill Education.
### Learning Objectives

- Understand the goals of multiple linear regression analysis
- Understand the "holding all other variables constant" condition in multiple linear regression analysis
- Understand the multiple linear regression assumptions required for OLS to be BLUE
- Interpret multiple linear regression output in Excel
- Assess the goodness-of-fit of the estimated sample regression function
- Perform hypothesis tests for the overall significance of the estimated sample regression function
- Perform hypothesis tests for the individual significance of an estimated slope coefficient
- Perform hypothesis tests for the joint significance of a subset of estimated slope coefficients
- Perform the Chow test for structural differences between two subsets of data
### The Multiple Regression Model

Idea: examine the linear relationship between one dependent variable, y, and two or more independent variables, x1, x2, …, xk.

Population model (β0 is the y-intercept, β1, …, βk are the population slopes, and ε is the random error):

y = β0 + β1x1 + β2x2 + … + βkxk + ε

Estimated multiple regression model (ŷ is the estimated, or predicted, value of y; β̂0 is the estimated intercept; β̂1, …, β̂k are the estimated slope coefficients):

ŷ = β̂0 + β̂1x1 + β̂2x2 + … + β̂kxk
### A Visual Depiction of the Estimated Sample Multiple Linear Regression Function

[Figure omitted.]

### A Visual Depiction of the Predicted Value of ŷ and the Calculated Residual for a Given Observation

[Figure omitted.]

### A Visual Depiction of the Predicted Values of ŷ and the Calculated Residuals for Multiple Observations

[Figure omitted.]
### How are the Multiple Linear Regression Estimates Obtained?

Minimize the sum of squared residuals:

min Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)² = Σᵢ₌₁ⁿ eᵢ² = Σᵢ₌₁ⁿ (yᵢ − β̂0 − β̂1x1,i − β̂2x2,i − … − β̂kxk,i)²

Unlike in simple linear regression, there is no formula in summation notation for the intercept and slope coefficient estimates.
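Although there is no summation formula, the minimizing coefficients solve the normal equations (X′X)b = X′y, which software solves numerically. A minimal pure-Python sketch with made-up data (the function name and the data are illustrative, not from the slides):

```python
# Hypothetical illustration: OLS as the solution of the normal equations.
def ols(X, y):
    """Solve (X'X) b = X'y by Gaussian elimination; each row of X starts with 1."""
    k = len(X[0])
    # Build X'X and X'y.
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    # Forward elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        b[c], b[p] = b[p], b[c]
        for r in range(c + 1, k):
            f = A[r][c] / A[c][c]
            for j in range(c, k):
                A[r][j] -= f * A[c][j]
            b[r] -= f * b[c]
    # Back substitution.
    coef = [0.0] * k
    for c in range(k - 1, -1, -1):
        coef[c] = (b[c] - sum(A[c][j] * coef[j] for j in range(c + 1, k))) / A[c][c]
    return coef

# Data generated exactly from y = 2 + 3*x1 - 1*x2, so OLS recovers those values.
X = [[1, x1, x2] for x1, x2 in [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8), (6, 2)]]
y = [2 + 3 * x1 - 1 * x2 for _, x1, x2 in X]
print([round(c, 6) for c in ols(X, y)])  # [2.0, 3.0, -1.0]
```
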
### Understand the “Holding All Other Independent Variables Constant” Condition

The idea behind holding all other factors constant (or ceteris paribus) is that we want to isolate the effect of a specific x on the dependent variable without any other factor changing either the independent variable or the dependent variable.
### A Venn Diagram of the Estimated Linear Relationship between y and x1, Assuming No Factors in the Error Term Affect x1

[Figure omitted.]

### A Venn Diagram of the Estimated Multiple Linear Relationship between y, x1, and x2

[Figure omitted.]

### A Comparison of the Estimated Marginal Effect of x1 on y in the Two Venn Diagrams

This is omitted variable bias: the bias in the estimated coefficient on x1 that results from x2 being left out of the model while x2 is related to both x1 and y.
### When is Omitted Variable Bias Not Present?

1. If x2 is not related to y, then x2 is not in the error term and does not have to be held constant when x1 changes.
2. If x2 is not related to x1, then x2 will not change when x1 changes.
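Omitted variable bias can be demonstrated with a toy calculation. In this made-up example, x2 is an exact function of x1 (which would be perfect multicollinearity in a joint regression; it is used here only to make the bias arithmetic exact):

```python
# Hypothetical numbers: true model is y = 1 + 2*x1 + 3*x2 with no noise.
# Because x2 moves with x1, regressing y on x1 alone attributes part of
# x2's effect to x1.

def slope(x, y):
    """Simple-regression slope: sum((x - mean(x))(y - mean(y))) / sum((x - mean(x))^2)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

x1 = [1, 2, 3, 4, 5, 6]
x2 = [0.5 * v for v in x1]               # x2 is deliberately tied to x1 here
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]

# Short regression of y on x1 alone: slope = 2 + 3 * d(x2)/d(x1) = 2 + 3*0.5
print(slope(x1, y))  # 3.5, not the true partial effect of 2
```
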
### Understand the Multiple Linear Regression Assumptions Required for OLS to be the Best Linear Unbiased Estimator

Assumptions required for OLS to be unbiased:

- Assumption S1: The model is linear in the parameters.
- Assumption S2: The data are collected through independent, random sampling.
- Assumption S3: The data are not perfectly multicollinear.
- Assumption S4: The error term has zero mean.
- Assumption S5: The error term is uncorrelated with each independent variable and all functions of each independent variable.

Additional assumption required for OLS to be BLUE:

- Assumption S6: The error term has constant variance.

Note that these assumptions are theoretical and typically can't be proven or disproven.
### Assumption S1: Linear in the Parameters

This assumption states that for OLS to be unbiased, the population model must be correctly specified as linear in the parameters:

yᵢ = β0 + β1x1,i + β2x2,i + … + βkxk,i + εᵢ
### When is Assumption S1 Violated?

1. If the population regression model is nonlinear in the parameters, e.g.

   yᵢ = β0 + (ln β1)x1,i + β2x2,i + … + βkxk,i + εᵢ

2. If the true population model is not specified correctly, i.e., if the true model is

   yᵢ = β0 + β1(ln x1,i) + β2x2,i + … + βkxk,i + εᵢ

   but the model on the previous slide is the one that is estimated.
### Assumption S2: The Data are Collected through Simple Random Sampling

This assumption states that for OLS to be unbiased, the data must be obtained through simple random sampling.

This assumption ensures that the observations are statistically independent of each other across the units of observation.
### When is Assumption S2 Violated?

1. If the data are time-series data, such as GDP and interest rates for the US collected over time. In this circumstance, observations from one time period are likely related to observations in previous time periods.
2. If there is some type of selection bias in the sampling. For example, if individuals opt in to a job training program, choose to go to college, or the response rate for a survey is low.
### Assumption S3: The Data are Not Perfectly Multicollinear

This assumption states that for OLS to be unbiased, no independent variable can take the same value for all observations, or

Σᵢ (xj,i − x̄j)² ≠ 0 for j = 1, …, k

This assumption also states that no independent variable is a linear combination of another independent variable.

This assumption ensures that the slope estimators are defined. A common way it is violated is when the model falls into the dummy variable trap.
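Perfect multicollinearity can be seen numerically: if one regressor is an exact linear combination of another, X′X is singular, so the normal equations have no unique solution and the slope estimates are undefined. A small sketch with made-up data:

```python
# Hypothetical data: x2 = 2*x1 exactly, so X'X is not invertible.
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

x1 = [1, 2, 3, 4, 5]
x2 = [2 * v for v in x1]                  # perfectly collinear with x1
X = [[1, a, b] for a, b in zip(x1, x2)]   # columns: intercept, x1, x2

xtx = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
print(det3(xtx))  # 0, so X'X has no inverse
```
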
### Assumption S4: The Error Term has Zero Mean

This assumption states that for OLS to be unbiased, the average value of the population error term is zero, or

E(ε) = 0

This assumption will hold as long as an intercept is included in the model, because if the average value of the error term equals a value other than zero, the estimated intercept adjusts accordingly.
### Assumption S5: The Error Term is Not Correlated with Each Independent Variable or Any Function of Each Independent Variable

This assumption states that for OLS to be unbiased, the error term is uncorrelated with each independent variable and all functions of each independent variable:

E(ε | xj,i) = 0 for all j

This is read as "the expected value of ε given xj,i is equal to 0."
### How to Determine if Assumption S5 is Violated

1. Think of all the factors that affect the dependent variable but are not specified in the model. For the salary vs. education example, variables in the error term include experience, ability, job type, gender, and many other factors.
2. If any of these factors, say ability, are related to any of the independent variables, say education, then S5 is violated.

Note that the error term is never observed, so determining whether S5 is violated is only a thought experiment.
### The Importance of S1 through S5

If assumptions S1 through S5 hold, then the OLS estimates are unbiased.

These assumptions are less likely to be violated in multiple linear regression analysis than in simple linear regression analysis, but for non-experimental data (i.e., the type of data economists use) these assumptions almost always fail, and therefore the OLS estimates are typically biased.
### Assumption S6: The Error Term has Constant Variance

This assumption states that the error term has a constant variance, or in equation form

VAR(ε) = σ²

This is called homoskedasticity.

If this assumption fails, then the error term is heteroskedastic, i.e., the error term has a non-constant variance:

VAR(εᵢ) = σᵢ²
### How to Determine if Assumption S6 is Violated

Create a scatter plot of y against each x and decide whether the points are scattered in a constant manner around the line.

Heteroskedasticity does not have to look like the graph on the right on the next slide; there just has to be a non-constant spread of the data points along the line.

Chapter 9 gives a more in-depth coverage of this topic.
### Visual Depiction of Homoskedasticity versus Heteroskedasticity

[Figure omitted.]
### The Importance of S1 through S6

If assumptions S1 through S6 hold, then the OLS estimates are BLUE, the Best Linear Unbiased Estimators.

In this context, "best" means minimum variance: among all linear unbiased estimators of the population slopes and population intercept, the OLS estimates have the lowest variance.

As with simple linear regression analysis, in economics these assumptions rarely all hold.
### Interpret Multiple Linear Regression in Excel: Data Set

Model: housepriceᵢ = β0 + β1sqfeetᵢ + β2bedroomsᵢ + εᵢ
### Scatter Diagrams

From these scatter diagrams it is evident that both square feet and bedrooms have a positive linear association with the price of a house.

[Figure omitted: two scatter plots, House Price vs. Square Feet and House Price vs. Bedrooms.]
### Interpret Multiple Linear Regression in Excel: Regression Output

Estimated sample regression function:

housepriceᵢ = 89,267.43 + 56.11sqfeetᵢ + 30,606.62bedroomsᵢ

(β̂0 = 89,267.43; β̂1 = 56.11 is the coefficient on x1 = sqfeet; β̂2 = 30,606.62 is the coefficient on x2 = bedrooms)
### Interpret Multiple Linear Regression in Excel: Interpreting the Output

- β̂0: On average, if square feet and bedrooms are 0, the predicted house price is $89,267.43.
- β̂1: On average, holding bedrooms constant, if square footage increases by one square foot, the price of the house increases by $56.11.
- β̂2: On average, holding square footage constant, if the number of bedrooms increases by one, the price of the house increases by $30,606.62.
### Interpret Multiple Linear Regression in Excel: Obtaining a Predicted Value

Estimated sample regression function:

housepriceᵢ = 89,267.43 + 56.11sqfeetᵢ + 30,606.62bedroomsᵢ

Suppose we wish to predict the price of a house with 2,000 square feet and 3 bedrooms:

housepriceᵢ = 89,267.43 + (56.11)(2,000) + (30,606.62)(3) ≈ $293,309.71

The predicted price of the house is $293,309.71 (computed with the unrounded coefficient estimates; the rounded coefficients shown give $293,307.29).
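The prediction arithmetic is easy to check. Note that the slide's $293,309.71 comes from the unrounded coefficient estimates; the rounded coefficients land about $2.42 lower:

```python
# Checking the prediction using the rounded coefficients from the slide.
b0, b1, b2 = 89_267.43, 56.11, 30_606.62

def predict(sqfeet, bedrooms):
    """Predicted house price from the estimated sample regression function."""
    return b0 + b1 * sqfeet + b2 * bedrooms

print(round(predict(2_000, 3), 2))  # 293307.29
```
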
### Assess the Goodness of Fit of the Sample Multiple Linear Regression Function: R²

R² = ExplainedSS / TotalSS = 21,998,347,856 / 32,600,500,000 = 0.6748

The R² means that 67.48% of the variation in housing price can be explained by square feet and bedrooms.
### Assess the Goodness of Fit of the Sample Multiple Linear Regression Function: Adjusted R²

Adjusted R² = 1 − [UnexplainedSS/(n − k − 1)] / [TotalSS/(n − 1)] = 1 − (10,602,152,154/7) / (32,600,500,000/9) = 0.5819

The adjusted R² imposes a penalty for adding additional explanatory variables: as k goes up, n − k − 1 shrinks, so the numerator grows and (if UnexplainedSS is held constant) the adjusted R² goes down.
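These goodness-of-fit figures can be reproduced from the sums of squares reported on the slides (n = 10 observations, k = 2 regressors):

```python
# Verifying the slides' R-squared and adjusted R-squared.
total_ss = 32_600_500_000
explained_ss = 21_998_347_856
unexplained_ss = 10_602_152_154   # as reported on the slides
n, k = 10, 2

r2 = explained_ss / total_ss
adj_r2 = 1 - (unexplained_ss / (n - k - 1)) / (total_ss / (n - 1))
print(round(r2, 4), round(adj_r2, 4))  # 0.6748 0.5819
```
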
### Assess the Goodness of Fit of the Sample Multiple Linear Regression Function: Standard Error of the Regression Model

s(y|x) = √[UnexplainedSS/(n − k − 1)] = √(10,602,152,154/7) = 38,917.77

The standard error of the regression can also be calculated by taking the square root of MSUnexplained.
### Perform Hypothesis Tests for the Overall Significance of the Sample Regression Function

- The F-test for overall significance of the model shows whether there is a linear relationship between any of the independent variables, considered together, and the dependent variable y.
- Use the F test statistic.
- Hypotheses:
  - H0: β1 = β2 = … = βk = 0 (no linear relationship)
  - H1: at least one βi ≠ 0 (at least one independent variable affects y)
### F-Statistic for Overall Significance

Test statistic:

F-Stat = [ExplainedSS/k] / [UnexplainedSS/(n − k − 1)] = MSExplained / MSUnexplained

where F has D1 = k (numerator) and D2 = n − k − 1 (denominator) degrees of freedom.
### Rejection Rules for the F-Test for the Overall Significance of the Regression Model

Critical value: reject H0 if F-Stat > Fα, k, n−k−1

P-value: reject H0 if p-value < α (the p-value for this test is found under Significance F in the ANOVA table in Excel)
### F-Test for Overall Significance

F = MSExplained / MSUnexplained = 10,999,173,923 / 1,514,593,165 = 7.2621

with 2 and 7 degrees of freedom. The p-value for this F-test appears under Significance F in the Excel ANOVA output.
### F-Test for Overall Significance

H0: β1 = β2 = 0
H1: β1 and β2 not both zero
α = .05, df1 = 2, df2 = 7

Test statistic: F = MSExplained / MSUnexplained = 7.2621
p-value = 0.0196

Rejection rule: reject H0 if F-stat > F.05, 2, 7 = 4.737, or reject H0 if p-value < .05.

Conclusion: Because 7.2621 > 4.737 (or alternatively because 0.0196 < .05), we reject H0 and conclude that at least one of square footage or bedrooms affects the price of a house.
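A quick check of the slide's F-statistic and decision, using its reported mean squares (the critical value 4.737 is taken from the slide's F table rather than computed here):

```python
# Reproducing the overall-significance F-test with the slide's numbers.
ms_explained = 10_999_173_923
ms_unexplained = 1_514_593_165

f_stat = ms_explained / ms_unexplained
critical_value = 4.737            # F at alpha = .05 with 2 and 7 df, from the slide

print(round(f_stat, 4), f_stat > critical_value)  # 7.2621 True -> reject H0
```
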
### Are Individual Independent Variables Significant?

- Use t-tests of the individual variable slopes.
- Shows whether there is a linear relationship between the variable xi and y.
- Hypotheses:
  - H0: βi = 0 (no linear relationship)
  - H1: βi ≠ 0 (a linear relationship does exist between xi and y)
### Are Individual Independent Variables Significant?

H0: βi = 0 (no linear relationship)
H1: βi ≠ 0 (a linear relationship does exist between xi and y)

Three ways to test this hypothesis:

1. Confidence interval
2. Critical value
3. p-value
### Using a Confidence Interval to Test Individual Statistical Significance

H0: βi = 0 (no linear relationship between xi and y)
H1: βi ≠ 0 (a linear relationship exists between xi and y)

β̂i ± t(α/2, n−k−1) · s(β̂i)

Reject H0 if 0 is not within the confidence interval. Here α is 1 minus the confidence level; the confidence level is usually 95%, so α = .05.
### Confidence Interval Estimate for the Slope

Confidence interval for the population slope β1 (the effect of changes in square feet on house prices):

β̂1 ± t(.05/2, 10−2−1) · s(β̂1) = 56.1112 ± (2.36)(48.8592) = (−59.1965, 171.4189)

Decision: This confidence interval includes 0, so we fail to reject H0 and conclude that square feet does not have a statistically significant effect on the price of a house at the 5% level. The interval differs slightly from the Excel output due to rounding.
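The interval arithmetic can be checked directly with the slide's numbers (t critical value 2.36 with 7 degrees of freedom, as rounded on the slide):

```python
# Confidence interval for the square-feet slope, from the slide's estimates.
b1, se_b1, t_crit = 56.1112, 48.8592, 2.36

low = b1 - t_crit * se_b1
high = b1 + t_crit * se_b1
contains_zero = low < 0 < high    # if True, fail to reject H0: beta1 = 0

print(round(low, 4), round(high, 4), contains_zero)  # -59.1965 171.4189 True
```
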
### Using Critical Values to Test Individual Statistical Significance

H0: βi = 0 (no linear relationship)
H1: βi ≠ 0 (a linear relationship exists between xi and y)

Test statistic: t-statistic = (β̂i − 0) / s(β̂i)

Rejection rule: reject H0 if |t-statistic| > t(α/2, n−k−1)
### Using Critical Values to Test for Individual Significance of Square Feet (x1)

t-statistic = (β̂1 − 0) / s(β̂1) = (56.1112 − 0) / 48.8592 = 1.1484, and t(.025, 7) = 2.36

Rejection rule: reject H0 if |t-statistic| > 2.36.

Decision: Because 1.1484 < 2.36, we fail to reject H0 and conclude that square feet does not have a statistically significant effect on the price of a house at the 5% level.
### Using p-values to Test Individual Statistical Significance

H0: βi = 0 (no linear relationship)
H1: βi ≠ 0 (a linear relationship exists between xi and y)

Test statistic: t-statistic = (β̂i − 0) / s(β̂i)

p-value = 2 · P(t(n−k−1) ≥ |t-statistic|)

(Usually the p-value is found on the Excel output.)

Rejection rule: reject H0 if p-value < α
### Using p-values to Test for Individual Significance of Square Feet (x1)

t-statistic = (β̂1 − 0) / s(β̂1) = (56.1112 − 0) / 48.8592 = 1.1484, and p-value = 0.2885

Rejection rule: reject H0 if p-value < .05.

Decision: Because 0.2885 > .05, we fail to reject H0 and conclude that square feet does not have a statistically significant effect on the price of a house at the 5% level.
### Things to Note about the Different Methods for Tests of Individual Significance

1. All three methods yield the same conclusions.
2. To test for the individual significance of bedrooms instead of square footage, follow the same process but use the row below square footage in the regression output.

Using any of the three methods, we see that bedrooms is also statistically insignificant at the 5% level.
### What is Multicollinearity?*

Multicollinearity is when two of the independent variables are highly linearly related.

Note that multicollinearity is not perfect multicollinearity. Perfect multicollinearity implies that the correlation coefficient is 1 in absolute value; multicollinearity means that the correlation coefficient between two independent variables is high but not perfect.

*Note: This material is not covered in the textbook.
### Venn Diagram Explanation of Multicollinearity

[Figure omitted.]
### What are the Implications of Multicollinearity?

- Unlike with perfect multicollinearity, OLS estimates can still be obtained.
- The OLS estimates are still unbiased.
- Standard errors are large, because very little independent variation in each regressor goes into the estimation of its slope.
### Perform Hypothesis Tests for the Joint Significance of a Subset of Slope Coefficients

The original regression model is

yᵢ = β0 + β1x1,i + β2x2,i + β3x3,i + εᵢ

After testing for individual significance, x2 and x3 are individually statistically insignificant at the 5% level. The researcher would like to know if x2 and x3 are jointly statistically significant.
### Perform Hypothesis Tests for a Subset of Explanatory Variables

This is an F-test for joint statistical significance.

Hypotheses:
H0: β2 = β3 = 0 (no linear relationship)
H1: at least one of β2 or β3 explains y

Unrestricted model (the original model):

yᵢ = β0 + β1x1,i + β2x2,i + β3x3,i + εᵢ

Restricted model (the model with the null hypothesis imposed, in this case β2 = β3 = 0):

yᵢ = β0 + β1x1,i + εᵢ*
### F-Statistic for Joint Significance

Test statistic:

F-Statistic = [(UnexplainedSS_restricted − UnexplainedSS_unrestricted)/q] / [UnexplainedSS_unrestricted/(n − k − 1)]

or

F-Statistic = [(R²_unrestricted − R²_restricted)/q] / [(1 − R²_unrestricted)/(n − k − 1)]

where q is the number of restrictions (the number of equal signs in the null hypothesis, in this case 2).
### Rejection Rules for the F-Test for the Joint Significance of a Subset of Slope Coefficients

Critical value: reject H0 if F-Stat > Fα, q, n−k−1

For this test, it is necessary to run two regressions:

1. Unrestricted regression
2. Restricted regression
### For the Housing Price Example

The original (unrestricted) model is

housepriceᵢ = β0 + β1lotsizeᵢ + β2sqfeetᵢ + β3bedroomsᵢ + εᵢ

and its residual sum of squares is UnexplainedSS_unrestricted.

Using the p-values, lot size is individually statistically significant at the 5% level, but square feet and bedrooms are statistically insignificant at the 5% level.
### Testing if Square Feet and Bedrooms are Jointly Equal to 0

Hypotheses:
H0: β2 = β3 = 0 (no joint linear relationship)
H1: at least one of β2 or β3 explains y

Restricted model (the model with the null hypothesis imposed, in this case β2 = β3 = 0):

housepriceᵢ = β0 + β1lotsizeᵢ + εᵢ*

Its residual sum of squares is UnexplainedSS_restricted.
### F-Statistic for Joint Significance

Test statistic:

F-Statistic = [(621,726,567 − 456,205,475)/2] / [456,205,475/6] = 1.0885

Reject H0 if F-Stat > F.05, 2, 6 = 5.143.

Decision: Because 1.0885 is not greater than 5.143, we fail to reject H0 and conclude that square feet and bedrooms do not jointly affect house price.
### F-Statistic for Joint Significance Using R²

Test statistic:

F-Statistic = [(R²_unrestricted − R²_restricted)/q] / [(1 − R²_unrestricted)/(n − k − 1)] = [(0.9860 − 0.9809)/2] / [(1 − 0.9860)/6] ≈ 1.0885

Reject H0 if F-Stat > F.05, 2, 6 = 5.143.

Decision: Because 1.0885 is not greater than 5.143, we fail to reject H0 and conclude that square feet and bedrooms do not jointly affect house price.

Notice that we obtain the same F-statistic using SSUnexplained as we do using R² (up to rounding of the reported R² values).
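The joint F-statistic can be checked from the slides' own residual sums of squares (q = 2 restrictions, n − k − 1 = 6):

```python
# Joint-significance F-statistic from the restricted and unrestricted USS.
uss_restricted = 621_726_567
uss_unrestricted = 456_205_475
q, df = 2, 6

f_stat = ((uss_restricted - uss_unrestricted) / q) / (uss_unrestricted / df)
print(round(f_stat, 4), f_stat > 5.143)  # 1.0885 False -> fail to reject H0
```
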
### Chow Test

Use to test whether there are statistical differences between two groups, such as men and women, those who have graduated from college and those who haven't, etc.

For the Chow test, run three regressions:

1. The entire data set together; its USS is UnexplainedSS_restricted.
2. One subset of the data (e.g., only the men); its USS is UnexplainedSS1.
3. The other subset of the data (e.g., only the women); its USS is UnexplainedSS2.
### The Hypothesis, Test Statistic, and Rejection Rule

H0: There are no differences between the two groups
H1: There is at least one difference between the two groups

F-Statistic = [(USS_restricted − USS1 − USS2)/(k + 1)] / [(USS1 + USS2)/(2(n − k − 1))]

Rejection rule: reject H0 if F-Stat > Fα, k+1, 2(n−k−1)

If the null hypothesis is rejected, we conclude that a difference exists between the two groups, either in the intercepts, the slopes, or both.
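The Chow-test arithmetic from the formula above, with made-up residual sums of squares (the USS values, k, and n here are all hypothetical, not from the slides):

```python
# Hypothetical Chow test: k = 2 regressors, n = 20 observations per group.
uss_pooled = 1_000.0   # restricted: both groups pooled into one regression
uss_1 = 380.0          # group 1 regression alone
uss_2 = 400.0          # group 2 regression alone
k, n = 2, 20

f_stat = ((uss_pooled - uss_1 - uss_2) / (k + 1)) / ((uss_1 + uss_2) / (2 * (n - k - 1)))
print(round(f_stat, 4))  # 3.1966, compared against F(alpha, k+1, 2(n-k-1))
```
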
### Creating a Confidence Interval Around a Prediction in Multiple Linear Regression*

The formula for the confidence interval is

ŷp ± t(α/2, n−k−1) · s(ŷp)

where ŷp is the predicted value, t(α/2, n−k−1) is the critical value from the t-table, and s(ŷp) is the standard error of the prediction. The only component that we don't know how to obtain is the standard error of the prediction.

*Note: This material is not covered in the textbook.
### Finding the Standard Error of the Prediction

There is no straightforward formula for the standard error of the prediction like there is in simple linear regression. To find this standard error, we need to create new variables and run an additional regression: for each observation and each independent variable, subtract off the value at which you want to predict.
### Original Regression Results

[Excel output omitted.]
### The Housing Price Example from Before

Estimated sample regression function:

housepriceᵢ = 89,267.43 + 56.11sqfeetᵢ + 30,606.62bedroomsᵢ

Suppose we wish to predict the price of a house with 2,000 square feet and 3 bedrooms:

housepriceᵢ = 89,267.43 + (56.11)(2,000) + (30,606.62)(3) ≈ $293,309.71

Say we want to put a confidence interval around this prediction.
### An Example of How to Find the Standard Error of the Prediction

Create two new variables in Excel by subtracting 2,000 from each square-feet observation and 3 from each bedroom observation, and then run a regression with price as the dependent variable and the two new variables as the independent variables. The intercept of this re-centered regression is the predicted value, and its standard error is the standard error of the mean prediction.
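The re-centering trick can be verified in miniature with one regressor, where the least-squares formulas are simple. The data below are made up; the point is only that the intercept of the re-centered regression equals the prediction at the chosen point:

```python
# Hypothetical data: after subtracting x0 from x, the fitted intercept
# equals the original regression's prediction at x0.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
x0 = 3.5  # point at which we want the prediction

def fit(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b1 = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)
    return my - b1 * mx, b1

b0, b1 = fit(x, y)
prediction = b0 + b1 * x0

c0, _ = fit([a - x0 for a in x], y)   # re-centered regression
print(abs(c0 - prediction) < 1e-9)    # True: the new intercept IS the prediction
```
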
### Example of Making New Independent Variables

(The dependent variable stays the same.)

[Excel output omitted.]
### Excel Regression Results to Find a 95% Prediction Interval for a Mean Value

Predicted value: $293,309.71 (note this is the same value we found earlier)

Standard error of the mean prediction: $22,932.24

95% confidence interval for the mean prediction: ($239,083.58, $347,535.84)
### Excel Regression Results to Find a 95% Prediction Interval for an Individual Value

Predicted value: $293,309.71
Critical value: 2.36
Standard error of an individual prediction: $22,932.24 + $38,917.77 = $61,850.01

95% prediction interval for an individual value:

293,309.71 ± (2.36)(61,850.01) = 293,309.71 ± 145,966.02 = ($147,343.69, $439,275.73)

Notice how much wider this interval is than the interval for the mean.
### How to Test if Two Coefficient Estimates are Equal*

Say the original regression model is

yᵢ = β0 + β1x1,i + β2x2,i + β3x3,i + εᵢ

and you want to test whether β1 is equal to β2:

H0: β1 = β2, or β1 − β2 = 0
H1: β1 ≠ β2, or β1 − β2 ≠ 0

This is a t-test:

t-statistic = (β̂1 − β̂2 − 0) / s(β̂1 − β̂2)

and s(β̂1 − β̂2) is difficult to obtain in Excel.

*Note: This material is not covered in the textbook.
### How to Obtain s(β̂1 − β̂2) in Excel

1. Set β1 − β2 = θ and solve for β1: β1 = β2 + θ.
2. Substitute β2 + θ for β1 in the regression model and collect terms:

   yᵢ = β0 + β1x1,i + β2x2,i + β3x3,i + εᵢ
   yᵢ = β0 + (β2 + θ)x1,i + β2x2,i + β3x3,i + εᵢ
   yᵢ = β0 + θx1,i + β2(x1,i + x2,i) + β3x3,i + εᵢ

3. Create a new variable (x1,i + x2,i) and regress y on x1,i, (x1,i + x2,i), and x3,i. The coefficient on x1,i is θ = β1 − β2, so the t-statistic and p-value for this test are in the row for x1,i.
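The substitution in step 2 is pure algebra and can be checked numerically. A small sketch with arbitrary made-up coefficient values (none of these numbers come from the slides):

```python
# With theta = b1 - b2, the reparameterized model reproduces the original
# fitted value exactly, so the two regressions describe the same fit.
b0, b1, b2, b3 = 5.0, 2.5, 1.5, -0.75
theta = b1 - b2

for x1, x2, x3 in [(1.0, 2.0, 3.0), (4.0, -1.0, 0.5), (2.2, 0.0, -3.0)]:
    original = b0 + b1 * x1 + b2 * x2 + b3 * x3
    rewritten = b0 + theta * x1 + b2 * (x1 + x2) + b3 * x3
    assert abs(original - rewritten) < 1e-12

print("identical fitted values")
```
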
### Original Regression

housepriceᵢ = 892,383.27 + 32,416.39bedroomsᵢ − 4,257.54bathroomsᵢ + 61.61sqfeetᵢ

Say we want to test whether the coefficients on bedrooms and bathrooms are equal.

Point estimate of β̂1 − β̂2: 32,416.39 − (−4,257.54) = $36,673.93
### Create a New Variable (Bedrooms + Bathrooms)

(The dependent variable stays the same.)

[Excel output omitted.]
### Excel Regression Results for the Coefficient Equality Test

Point estimate: β̂1 − β̂2 = $36,673.93
Standard error: s(β̂1 − β̂2) = $67,466.01
t-statistic for this test: 36,673.93 / 67,466.01 = 0.5436
p-value = 0.6063

Because 0.6063 > .05, we fail to reject H0; we cannot reject that β1 = β2.