Chapter 14, Multiple Regression Analysis

Ch. 14: The Multiple Regression Model Building
Idea: examine the linear relationship between one dependent variable (Y) and two or more independent variables (Xi).
Multiple Regression Model with k Independent Variables:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$$

where β0 is the Y-intercept, β1, …, βk are the population slopes, and εi is the random error.
• The coefficients of the multiple regression model are estimated using sample data with k independent variables:

$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki}$$

where Ŷi is the estimated (or predicted) value of Y, b0 is the estimated intercept, and b1, …, bk are the estimated slope coefficients.
• Interpretation of the slopes (each is referred to as a net regression coefficient):
– b1 = the change in the mean of Y per unit change in X1, taking into account the effect of X2 (i.e., net of X2)
– b0 = the Y intercept, interpreted the same way as in simple regression
Graph of a Two-Variable Model
• In three dimensions, with axes Y, X1, and X2, the fitted equation Ŷ = b0 + b1X1 + b2X2 describes a plane.
Example:
• Simple Regression Results

                  Coefficients   Standard Error   t Stat
  Intercept (b0)  165.0333581    16.50316094      10.000106
  Lotsize (b1)      6.931792143   2.203156234      3.1463008

  F-Value = 9.89   Adjusted R Square = 0.108   Standard Error = 36.34
• Multiple Regression Results

              Coefficients   Standard Error   t Stat
  Intercept   59.32299284    20.20765695      2.935669
  Lotsize      3.580936283    1.794731507     1.995249
  Rooms       18.25064446     2.681400117     6.806386

  F-Value = 31.23   Adjusted R Square = 0.453   Standard Error = 28.47
• Check the size and significance level of the coefficients, the F-value, the R-Square, etc. Comparing the two outputs shows the "net of" effects: once Rooms is included, the Lotsize coefficient drops from 6.93 to 3.58.
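A minimal sketch of producing this kind of output in Python with statsmodels follows; the `houses` DataFrame and its five rows are made-up illustrative numbers, not the chapter's appraisal data, so the printed coefficients will not reproduce the tables above.

```python
# Fitting a two-predictor regression with statsmodels (hypothetical data).
import pandas as pd
import statsmodels.api as sm

houses = pd.DataFrame({
    "Value":   [204.6, 231.0, 178.9, 251.3, 215.4],  # appraised value ($000s)
    "Lotsize": [7.1,   8.5,   5.9,   9.2,   7.0],    # lot size (000s sq ft)
    "Rooms":   [7,     8,     6,     9,     7],      # number of rooms
})

X = sm.add_constant(houses[["Lotsize", "Rooms"]])  # adds the intercept column
model = sm.OLS(houses["Value"], X).fit()

print(model.params)        # b0, b1, b2
print(model.fvalue)        # overall F statistic
print(model.rsquared_adj)  # adjusted r-square
```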
Using The Equation to Make Predictions
• Predict the appraised value at the average lot size (7.24) and average number of rooms (7.12):

$$\widehat{\text{App. Val.}} = 59.32 + 3.58(7.24) + 18.25(7.12) = 215.18, \text{ i.e. } \$215{,}180$$

• What is the total effect of a 2,000 sq ft increase in lot size and 2 additional rooms? Since lot size is measured in thousands of square feet, a 2,000 sq ft increase is 2 units:

$$\text{Increase in app. value} = (3.58)(2) + (18.25)(2) = 43.66, \text{ i.e. } \$43{,}660$$
Coefficient of Multiple Determination, r² and Adjusted r²
• r² reports the proportion of total variation in Y explained by all X variables taken together (the model):

$$r^2_{Y.12\ldots k} = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}}$$
• Adjusted r²
– r² never decreases when a new X variable is added to the model; this can be a disadvantage when comparing models
– What is the net effect of adding a new variable? We lose a degree of freedom when a new X variable is added. Did the new X variable add enough explanatory power to offset the loss of that degree of freedom?
– Shows the proportion of variation in Y explained by all X variables, adjusted for the number of X variables used:

$$r^2_{adj} = 1 - (1 - r^2_{Y.12\ldots k})\left(\frac{n-1}{n-k-1}\right)$$

(where n = sample size, k = number of independent variables)
– Penalizes excessive use of unimportant independent variables
– Smaller than r²
– Useful for comparing models
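As a quick sketch, the adjusted r² formula is easy to implement directly; the sample values passed in below are hypothetical.

```python
# A small helper implementing the adjusted r-square formula above.
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted r^2 for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# With the same r^2, the model using more predictors is penalized harder:
print(adjusted_r2(0.50, n=30, k=2))  # ≈ 0.463
print(adjusted_r2(0.50, n=30, k=8))  # ≈ 0.310
```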
Multiple Regression Assumptions
• The errors are normally distributed
• Errors have a constant variance
• The model errors are independent

Errors (residuals) from the regression model:

$$e_i = Y_i - \hat{Y}_i$$
• These residual plots are used in multiple regression:
– Residuals vs. Ŷi (the predicted values)
– Residuals vs. X1i
– Residuals vs. X2i
– Residuals vs. time (if time-series data)
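A sketch of the first three plots with matplotlib, reusing the hypothetical `model` and `houses` from the earlier fitting sketch (the time plot is omitted because that made-up data is not a time series):

```python
# Residual plots: residuals vs. predicted values and vs. each X variable.
import matplotlib.pyplot as plt

resid = model.resid
fig, axes = plt.subplots(1, 3, figsize=(12, 3))

axes[0].scatter(model.fittedvalues, resid)
axes[0].set(xlabel="Predicted Y", ylabel="Residual")
axes[1].scatter(houses["Lotsize"], resid)
axes[1].set(xlabel="Lotsize (X1)", ylabel="Residual")
axes[2].scatter(houses["Rooms"], resid)
axes[2].set(xlabel="Rooms (X2)", ylabel="Residual")

for ax in axes:
    ax.axhline(0, linestyle="--")  # residuals should scatter randomly about 0
plt.tight_layout()
plt.show()
```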
Two-Variable Model (figure)
The fitted plane Ŷ = b0 + b1X1 + b2X2 sits in (X1, X2, Y) space; for a sample observation (x1i, x2i, Yi), the residual is the vertical distance ei = Yi − Ŷi from the observation to the plane. The best-fit equation is found by minimizing the sum of squared errors, Σe².
Are Individual Variables Significant?
• Use t tests of the individual variable slopes
• Shows whether there is a linear relationship between the variable Xi and Y
• Hypotheses:
  H0: βi = 0 (no linear relationship)
  H1: βi ≠ 0 (a linear relationship does exist between Xi and Y)
• Test statistic (t with n − k − 1 degrees of freedom):

$$t_{n-k-1} = \frac{b_i - 0}{S_{b_i}}$$

• Confidence interval for the population slope βi:

$$b_i \pm t_{n-k-1} S_{b_i}$$
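Using the Lotsize row of the multiple-regression output above, the t statistic and interval can be reproduced by hand; note that n below is a hypothetical sample size, since the slides do not state it.

```python
# t test and confidence interval for one slope coefficient.
from scipy import stats

b1, sb1 = 3.580936283, 1.794731507   # Lotsize coefficient and its std. error
n, k = 79, 2                         # n is hypothetical; k = 2 predictors

t_stat = (b1 - 0) / sb1
print(t_stat)                        # ≈ 1.9952, matching the printed t Stat

t_crit = stats.t.ppf(0.975, df=n - k - 1)      # two-sided, alpha = 0.05
print(b1 - t_crit * sb1, b1 + t_crit * sb1)    # 95% CI for beta_1
```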
Is the Overall Model Significant?
• F-Test for Overall Significance of the Model
• Shows if there is a linear relationship between all of the X
variables considered together and Y
• Use F test statistic; Hypotheses:
H0: β1 = β2 = … = βk = 0 (no linear relationship)
H1: at least one βi ≠ 0 (at least one independent
variable affects Y)
• Test statistic:

$$F = \frac{MSR}{MSE} = \frac{SSR/k}{SSE/(n-k-1)}$$
Testing Portions of the Multiple Regression Model
• To find out whether including an individual Xj, or a set of Xs, significantly improves the model, given that the other independent variables are already included
• Two measures:
  1. The partial F-test criterion
  2. The coefficient of partial determination
Contribution of a Single Independent Variable Xj

SSR(Xj | all variables except Xj) = SSR(all variables) − SSR(all variables except Xj)

• Measures the contribution of Xj in explaining the total variation in Y (SST)
• For example, in a three-variable model:

SSR(X1 | X2 and X3) = SSR(X1, X2, X3) − SSR(X2, X3)

Here SSR(X1, X2, X3) is the sum of squares of the unrestricted model (SSR_UR), and SSR(X2, X3) is that of the restricted model (SSR_R).
The Partial F-Test Statistic
• Consider the hypothesis test:
H0: variable Xj does not significantly improve the model after all
other variables are included
H1: variable Xj significantly improves the model after all other
variables are included
$$F = \frac{(SSR_{UR} - SSR_R)/(\text{number of restrictions})}{MSE_{UR}}, \qquad MSE_{UR} = \frac{SSE_{UR}}{n-k-1}$$

Note that the numerator is the contribution of Xj to the regression. If the computed F statistic exceeds the critical F, reject H0: adding Xj does improve the model.
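The same comparison expressed as a sketch; all sums of squares below are hypothetical placeholders for the restricted and unrestricted model fits.

```python
# Partial F test: does adding X_j (the restriction being tested) help?
def partial_f(ssr_ur: float, ssr_r: float, sse_ur: float,
              n: int, k: int, restrictions: int = 1) -> float:
    mse_ur = sse_ur / (n - k - 1)                 # MSE of unrestricted model
    return (ssr_ur - ssr_r) / restrictions / mse_ur

f = partial_f(ssr_ur=320.0, ssr_r=280.0, sse_ur=680.0, n=50, k=3)
print(f)   # compare against the critical F with (1, n-k-1) df
```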
Coefficient of Partial Determination for
one or a set of variables
• Measures the proportion of total variation in the dependent
variable (SST) that is explained by Xj while controlling for
(holding constant) the other explanatory variables
$$r^2_{Yj.(\text{all variables except } j)} = \frac{SSR_{UR} - SSR_R}{SST - SSR_R}$$
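With the same hypothetical sums of squares as the partial F sketch, the coefficient of partial determination is one line:

```python
# Proportion of the variation left unexplained by the other Xs
# that X_j accounts for (hypothetical numbers).
ssr_ur, ssr_r, sst = 320.0, 280.0, 1000.0
r2_partial = (ssr_ur - ssr_r) / (sst - ssr_r)
print(r2_partial)   # ≈ 0.056: X_j explains ~5.6% of the remaining variation
```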
Using Dummy Variables
• A dummy variable is a categorical
explanatory variable with two levels:
– yes or no, on or off, male or female
– coded as 0 or 1
• Regression intercepts are different if the
variable is significant
• Assumes equal slopes for other variables
• If more than two levels, the number of
dummy variables needed is (number of
levels - 1)
• Different intercepts, same slope:

$$\hat{Y} = b_0 + b_1 X_1 + b_2(1) = (b_0 + b_2) + b_1 X_1 \quad \text{(fire place, } X_2 = 1\text{)}$$
$$\hat{Y} = b_0 + b_1 X_1 + b_2(0) = b_0 + b_1 X_1 \quad \text{(no fire place, } X_2 = 0\text{)}$$

The two fitted lines are parallel with slope b1 and intercepts b0 + b2 and b0. If H0: β2 = 0 is rejected, then "fire place" has a significant effect on values.
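A sketch of fitting the fireplace model with the statsmodels formula API; the `houses` data, including its FirePlace column, is again made up for illustration.

```python
# Dummy-variable regression: same slope on Lotsize, shifted intercept.
import pandas as pd
import statsmodels.formula.api as smf

houses = pd.DataFrame({
    "Value":     [204.6, 231.0, 178.9, 251.3, 215.4, 190.2],  # $000s
    "Lotsize":   [7.1,   8.5,   5.9,   9.2,   7.0,   6.3],    # 000s sq ft
    "FirePlace": [1,     1,     0,     1,     0,     0],      # 1 = has one
})

fit = smf.ols("Value ~ Lotsize + FirePlace", data=houses).fit()
print(fit.params)
# Intercept = b0; FirePlace = b2: homes with a fireplace share slope b1
# but their line starts b2 higher, at (b0 + b2).
```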
Interaction Between Explanatory
Variables
• Hypothesizes interaction between pairs of X variables
• Response to one X variable may vary at different levels of
another X variable
• Contains two-way cross product terms
Ŷ  b 0  b1X1  b 2 X 2  b 3 X 3
 b 0  b1X1  b 2 X 2  b 3 (X1 X 2 )
• Effect of Interaction
– Without interaction term, effect of X1 on Y is measured by β1
– With interaction term, effect of X1 on Y is measured by β1 + β3 X2
– Effect changes as X2 changes
• Example: suppose X2 is a dummy variable and the estimated regression equation is

$$\hat{Y} = 1 + 2X_1 + 3X_2 + 4X_1X_2$$

  X2 = 0: Ŷ = 1 + 2X1 (slope = 2)
  X2 = 1: Ŷ = (1 + 3) + (2 + 4)X1 = 4 + 6X1 (slope = 6)

The slopes are different because the effect of X1 on Y depends on the value of X2.
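Evaluating the slide's estimated equation at both levels of the dummy makes the two slopes explicit:

```python
# Y-hat = 1 + 2*X1 + 3*X2 + 4*X1*X2 evaluated at X2 = 0 and X2 = 1.
def y_hat(x1: float, x2: float) -> float:
    return 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2

for x2 in (0, 1):
    slope = y_hat(1, x2) - y_hat(0, x2)   # change in Y-hat per unit of X1
    print(f"X2 = {x2}: intercept = {y_hat(0, x2)}, slope = {slope}")
# X2 = 0: intercept 1, slope 2;  X2 = 1: intercept 4, slope 6
```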