Basic Business Statistics
(9th Edition)
Chapter 14
Introduction to Multiple Regression
© 2004 Prentice-Hall, Inc.
Chap 14-1
Chapter Topics
The Multiple Regression Model
Residual Analysis
Testing for the Significance of the Regression Model
Inferences on the Population Regression Coefficients
Testing Portions of the Multiple Regression Model
Dummy-Variables and Interaction Terms
Logistic Regression Model

The Multiple Regression Model
Relationship between 1 dependent and 2 or more independent variables is a linear function
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$$
$Y_i$: dependent (response) variable
$X_{1i}, \ldots, X_{ki}$: independent (explanatory) variables
$\beta_0$: population Y-intercept
$\beta_1, \ldots, \beta_k$: population slopes
$\varepsilon_i$: random error

Multiple Regression Model
[Figure: response plane over the $X_1$ and $X_2$ axes for the population model, with one observed point at $(X_{1i}, X_{2i})$ lying a distance $\varepsilon_i$ from the plane.]
Observed Y: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$
Response plane: $\mu_{Y|X} = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}$

Multiple Regression Equation
[Figure: fitted response plane over the $X_1$ and $X_2$ axes for the sample, with one observed point at $(X_{1i}, X_{2i})$ lying a distance $e_i$ from the plane.]
Observed Y: $Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + e_i$
Multiple regression equation: $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$

Multiple Regression Equation
Too complicated by hand! Ouch!

Interpretation of Estimated Coefficients
Slope ($b_j$)
  The average value of Y is estimated to change by $b_j$ for each 1-unit increase in $X_j$, holding all other variables constant (ceteris paribus)
  Example: If $b_1 = -2$, then fuel oil usage (Y) is expected to decrease by an estimated 2 gallons for each 1-degree increase in temperature ($X_1$), given the inches of insulation ($X_2$)
Y-Intercept ($b_0$)
  The estimated average value of Y when all $X_j = 0$

Multiple Regression Model: Example
Develop a model for estimating heating oil used for a single-family home in the month of January, based on average temperature and amount of insulation in inches.

Oil (Gal)   Temp (°F)   Insulation (in)
275.30      40          3
363.80      27          3
164.30      40          10
40.80       73          6
94.30       64          6
230.90      34          6
366.70      9           6
300.60      8           10
237.80      23          10
121.40      63          3
31.40       65          10
203.50      41          6
441.10      21          3
323.00      38          3
52.50       58          10

Multiple Regression Equation: Example
$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki}$$
Excel Output
                Coefficients
Intercept       562.1510092
X Variable 1    -5.436580588
X Variable 2    -20.01232067
$$\hat{Y}_i = 562.151 - 5.437 X_{1i} - 20.012 X_{2i}$$
For each 1-degree increase in temperature, the estimated average amount of heating oil used decreases by 5.437 gallons, holding insulation constant.
For each 1-inch increase in insulation, the estimated average use of heating oil decreases by 20.012 gallons, holding temperature constant.

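The Excel coefficients can be cross-checked without a spreadsheet. The sketch below is one possible way to do it, not the method Excel or PHStat uses internally: it builds the normal equations from the 15 heating oil observations and solves them by Gauss-Jordan elimination in plain Python. The names `DATA`, `solve` and `fit_ols` are illustrative and do not come from the slides.

```python
# Heating oil data from the example: (oil in gallons, temperature in °F, insulation in inches)
DATA = [
    (275.30, 40, 3), (363.80, 27, 3), (164.30, 40, 10), (40.80, 73, 6),
    (94.30, 64, 6), (230.90, 34, 6), (366.70, 9, 6), (300.60, 8, 10),
    (237.80, 23, 10), (121.40, 63, 3), (31.40, 65, 10), (203.50, 41, 6),
    (441.10, 21, 3), (323.00, 38, 3), (52.50, 58, 10),
]


def solve(a, b):
    """Solve the square linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry of this column up.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows):
    """Return [b0, b1, ..., bk] for rows of the form (y, x1, ..., xk)."""
    design = [[1.0] + [float(x) for x in row[1:]] for row in rows]  # 1.0 = intercept column
    y = [row[0] for row in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)  # normal equations: (X'X) b = X'y


b0, b1, b2 = fit_ols(DATA)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```

The printed values should agree with the Excel output above to the displayed precision. For larger or ill-conditioned problems a library least-squares routine would be a safer choice than solving the normal equations directly.
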
Multiple Regression in PHStat
PHStat | Regression | Multiple Regression …
Excel spreadsheet for the heating oil example

Venn Diagrams and Explanatory Power of Regression
[Venn diagram: two overlapping circles, Oil and Temp.]
Overlap of Oil and Temp: variations in Oil explained by Temp, or variations in Temp used in explaining variation in Oil (SSR)
Oil outside the overlap: variations in Oil explained by the error term (SSE)
Temp outside the overlap: variations in Temp not used in explaining variation in Oil

Venn Diagrams and Explanatory Power of Regression (continued)
[Venn diagram: two overlapping circles, Oil and Temp.]
$$r^2 = \frac{SSR}{SSR + SSE}$$

Venn Diagrams and Explanatory Power of Regression
[Venn diagram: three overlapping circles, Oil, Temp and Insulation.]
Variation NOT explained by Temp nor Insulation (SSE)
Overlapping variation in both Temp and Insulation is used in explaining the variation in Oil but NOT in the estimation of $\beta_1$ nor $\beta_2$

Coefficient of Multiple Determination
Proportion of Total Variation in Y Explained by All X Variables Taken Together
$$r^2_{Y \cdot 12 \cdots k} = \frac{SSR}{SST} = \frac{\text{Explained Variation}}{\text{Total Variation}}$$
Never Decreases When a New X Variable is Added to Model
  Disadvantage when comparing among models

Venn Diagrams and Explanatory Power of Regression
[Venn diagram: three overlapping circles, Oil, Temp and Insulation.]
$$r^2_{Y \cdot 12} = \frac{SSR}{SSR + SSE}$$

Adjusted Coefficient of Multiple Determination
Proportion of Variation in Y Explained by All the X Variables Adjusted for the Sample Size and the Number of X Variables Used
$$r^2_{adj} = 1 - \left[\left(1 - r^2_{Y \cdot 12 \cdots k}\right) \frac{n - 1}{n - k - 1}\right]$$
Penalizes excessive use of independent variables
Smaller than $r^2_{Y \cdot 12 \cdots k}$
Useful in comparing among models
Can decrease if an insignificant new X variable is added to the model

Coefficient of Multiple Determination
Excel Output
Regression Statistics
Multiple R           0.982654757
R Square             0.965610371
Adjusted R Square    0.959878766
Standard Error       26.01378323
Observations         15
R Square is $r^2_{Y \cdot 12} = SSR / SST$
Adjusted $r^2$
  reflects the number of explanatory variables and sample size
  is smaller than $r^2$

Interpretation of Coefficient of Multiple Determination
$$r^2_{Y \cdot 12} = \frac{SSR}{SST} = .9656$$
96.56% of the total variation in heating oil can be explained by temperature and amount of insulation
$$r^2_{adj} = .9599$$
95.99% of the total variation in heating oil can be explained by temperature and amount of insulation after adjusting for the number of explanatory variables and sample size

Simple and Multiple Regression Compared
The slope coefficient in a simple regression picks up the impact of the independent variable plus the impacts of other variables that are excluded from the model, but are correlated with the included independent variable and the dependent variable
Coefficients in a multiple regression net out the impacts of other variables in the equation
  Hence, they are called the net regression coefficients
  They still pick up the effects of other variables that are excluded from the model, but are correlated with the included independent variables and the dependent variable

Simple and Multiple Regression Compared: Example
Two Simple Regressions:
$$\text{Oil} = \beta_0 + \beta_1\,\text{Temp} + \varepsilon$$
$$\text{Oil} = \beta_0 + \beta_2\,\text{Insulation} + \varepsilon$$
Multiple Regression:
$$\text{Oil} = \beta_0 + \beta_1\,\text{Temp} + \beta_2\,\text{Insulation} + \varepsilon$$
The three $\beta_0$'s do not have the same value
The two $\beta_1$'s do not have the same value
The two $\beta_2$'s do not have the same value
The three $\varepsilon$'s are different

Simple and Multiple Regression Compared: Slope Coefficients
$$\text{Oil} = b_0 + b_1\,\text{Temp} + b_2\,\text{Insulation} + e$$
              Coefficients
Intercept     562.1510092
Temp          -5.436580588
Insulation    -20.01232067
$$\text{Oil} = b_0 + b_1\,\text{Temp} + e$$
              Coefficients
Intercept     436.4382299
Temp          -5.462207697
$$\text{Oil} = b_0 + b_2\,\text{Insulation} + e$$
              Coefficients
Intercept     345.3783784
Insulation    -20.35027027
Temp slope: $-5.4366 \neq -5.4622$
Insulation slope: $-20.0123 \neq -20.3503$
The three e's are different

Simple and Multiple Regression Compared: $r^2$
$$\text{Oil} = b_0 + b_1\,\text{Temp} + b_2\,\text{Insulation} + e$$
Regression Statistics
Multiple R           0.982654757
R Square             0.965610371
Adjusted R Square    0.959878766
Standard Error       26.01378323
Observations         15
$$\text{Oil} = b_0 + b_1\,\text{Temp} + e$$
Regression Statistics
Multiple R           0.86974117
R Square             0.756449704
Adjusted R Square    0.737715065
Standard Error       66.51246564
Observations         15
$$\text{Oil} = b_0 + b_2\,\text{Insulation} + e$$
Regression Statistics
Multiple R           0.465082527
R Square             0.216301757
Adjusted R Square    0.156017277
Standard Error       119.3117327
Observations         15
The $r^2$ of the multiple regression is not the sum of the two simple-regression $r^2$ values:
$$0.96561 \neq 0.75645 + 0.21630 = 0.97275$$

Example: Adjusted $r^2$ Can Decrease
$$\text{Oil} = \beta_0 + \beta_1\,\text{Temp} + \beta_2\,\text{Insulation} + \varepsilon$$
Regression Statistics
Multiple R           0.982654757
R Square             0.965610371
Adjusted R Square    0.959878766
Standard Error       26.01378323
Observations         15
Try a 3rd explanatory variable
$$\text{Oil} = \beta_0 + \beta_1\,\text{Temp} + \beta_2\,\text{Insulation} + \beta_3\,\text{Rainfall} + \varepsilon$$
Regression Statistics
Multiple R           0.983482856
R Square             0.967238528
Adjusted R Square    0.958303581
Standard Error       25.72417272
Observations         15
Adjusted $r^2$ decreases when k increases from 2 to 3
Rainfall is not useful in explaining the variation in oil consumption.

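The drop in adjusted $r^2$ can be reproduced directly from the adjustment formula. The sketch below applies it to the two R Square values reported in the Excel output, with n = 15 and k = 2 or 3. The function name `adjusted_r2` is illustrative only.

```python
def adjusted_r2(r2, n, k):
    """Adjusted coefficient of multiple determination for n observations and k X variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# R Square values taken from the two Excel outputs in the example.
adj_two_vars = adjusted_r2(0.965610371, n=15, k=2)    # Temp and Insulation
adj_three_vars = adjusted_r2(0.967238528, n=15, k=3)  # Temp, Insulation and Rainfall

print(f"k = 2: {adj_two_vars:.6f}")
print(f"k = 3: {adj_three_vars:.6f}")
print("adjusted r2 decreased:", adj_three_vars < adj_two_vars)
```

Although R Square rises slightly when Rainfall is added, the factor (n - 1)/(n - k - 1) grows from 14/12 to 14/11, which is enough to pull the adjusted value down.
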
Using the Regression Equation to Make Predictions
Predict the amount of heating oil used for a home if the average temperature is 30°F and the insulation is 6 inches.
$$\hat{Y}_i = 562.151 - 5.437 X_{1i} - 20.012 X_{2i} = 562.151 - 5.437(30) - 20.012(6) = 278.969$$
The predicted heating oil used is 278.97 gallons.

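The same prediction as a small function, using the rounded coefficients from the fitted equation. This is only a convenience sketch; the name `predict_oil` is illustrative.

```python
def predict_oil(temp_f, insulation_in):
    """Predicted January heating oil use (gallons) from the fitted equation."""
    return 562.151 - 5.437 * temp_f - 20.012 * insulation_in


prediction = predict_oil(temp_f=30, insulation_in=6)
print(f"{prediction:.2f} gallons")
```

Predictions are only trustworthy for temperatures and insulation levels within the range of the sample data.
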
Predictions in PHStat
PHStat | Regression | Multiple Regression …
  Check the “Confidence and Prediction Interval Estimate” box
Excel spreadsheet for the heating oil example

Residual Plots
Residuals Vs $\hat{Y}$
  May need to transform Y variable
Residuals Vs $X_1$
  May need to transform $X_1$ variable
Residuals Vs $X_2$
  May need to transform $X_2$ variable
Residuals Vs Time
  May have autocorrelation

Residual Plots: Example
[Figure: Temperature Residual Plot, residuals against temperature. Maybe some nonlinear relationship.]
[Figure: Insulation Residual Plot, residuals against insulation. No discernible pattern.]

Testing for Overall Significance
Shows if Y Depends Linearly on All of the X Variables Together as a Group
Use F Test Statistic
Hypotheses:
  $H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$ (No linear relationship)
  $H_1$: At least one $\beta_j \neq 0$ (At least one independent variable affects Y)
The Null Hypothesis is a Very Strong Statement
The Null Hypothesis is Almost Always Rejected

Testing for Overall Significance (continued)
Test Statistic:
$$F = \frac{MSR}{MSE} = \frac{SSR / k}{SSE / (n - k - 1)}$$
Where F has k numerator and (n - k - 1) denominator degrees of freedom

Test for Overall Significance
Excel Output: Example
ANOVA
              df    SS          MS          F           Significance F
Regression    2     228014.6    114007.3    168.4712    1.65411E-09
Residual      12    8120.603    676.7169
Total         14    236135.2
Regression df: k = 2, the number of explanatory variables
Total df: n - 1
F: MSR / MSE = F Test Statistic
Significance F: p-value

Test for Overall Significance: Example Solution
$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$
$H_1$: At least one $\beta_j \neq 0$
$\alpha = .05$
df = 2 and 12
Critical Value: 3.89
[Figure: F distribution with the upper-tail rejection region, $\alpha = 0.05$, to the right of 3.89.]
Test Statistic: F = 168.47 (Excel Output)
Decision: Reject at $\alpha = 0.05$.
Conclusion: There is evidence that at least one independent variable affects Y.

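The F statistic in the ANOVA table follows from the sums of squares alone. The sketch below recomputes it from the SSR and SSE reported by Excel and compares it with the critical value 3.89 quoted above. The function name `overall_f` is illustrative, and the critical value is taken from the slide rather than computed.

```python
def overall_f(ssr, sse, n, k):
    """F statistic for overall significance: (SSR / k) / (SSE / (n - k - 1))."""
    msr = ssr / k
    mse = sse / (n - k - 1)
    return msr / mse


SSR = 228014.6263   # regression sum of squares (Excel ANOVA table)
SSE = 8120.603016   # residual sum of squares (Excel ANOVA table)
F_CRITICAL = 3.89   # upper 5% point of F with 2 and 12 df, as quoted on the slide

f_stat = overall_f(SSR, SSE, n=15, k=2)
reject_h0 = f_stat > F_CRITICAL
print(f"F = {f_stat:.4f}, reject H0: {reject_h0}")
```
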
Test for Significance: Individual Variables
Show If Y Depends Linearly on a Single $X_j$ Individually While Holding the Effects of Other X's Fixed
Use t Test Statistic
Hypotheses:
  $H_0: \beta_j = 0$ (No linear relationship)
  $H_1: \beta_j \neq 0$ (Linear relationship between $X_j$ and Y)

t Test Statistic
Excel Output: Example
              Coefficients     Standard Error    t Stat       P-value
Intercept     562.1510092      21.09310433       26.65094     4.77868E-12
Temp          -5.436580588     0.336216167       -16.1699     1.64178E-09
Insulation    -20.01232067     2.342505227       -8.543127    1.90731E-06
$$t = \frac{b_j}{S_{b_j}}$$
Temp row: t Test Statistic for $X_1$ (Temperature)
Insulation row: t Test Statistic for $X_2$ (Insulation)

t Test: Example Solution
Does temperature have a significant effect on monthly consumption of heating oil? Test at $\alpha = 0.05$.
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
df = 12
Critical Values: -2.1788 and 2.1788 (.025 in each tail; reject $H_0$ beyond either value)
Test Statistic: t = -16.1699
Decision: Reject $H_0$ at $\alpha = 0.05$.
Conclusion: There is evidence of a significant effect of temperature on oil consumption holding constant the effect of insulation.

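Venn Diagrams and Estimation of Regression Model
[Venn diagram: three overlapping circles, Oil, Temp and Insulation.]
Region shared by Oil and Temp alone: only this information is used in the estimation of $\beta_1$
Region shared by Oil and Insulation alone: only this information is used in the estimation of $\beta_2$
Region shared by all three circles: this information is NOT used in the estimation of $\beta_1$ nor $\beta_2$
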
Confidence Interval Estimate for the Slope
Provide the 95% confidence interval for the population slope $\beta_1$ (the effect of temperature on oil consumption).
$$b_1 \pm t_{n-k-1} S_{b_1}$$
              Coefficients    Lower 95%        Upper 95%
Intercept     562.151009      516.1930837      608.108935
Temp          -5.4365806      -6.169132673     -4.7040285
Insulation    -20.012321      -25.11620102     -14.90844
$$-6.169 \leq \beta_1 \leq -4.704$$
We are 95% confident that the average consumption of oil is reduced by between 4.70 and 6.17 gallons for each increase of 1°F, holding insulation constant.
We can also perform the test for the significance of individual variables, $H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$, using this confidence interval: because the interval does not contain 0, $H_0$ is rejected at $\alpha = 0.05$.

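Both the t statistics and the confidence limits come from the coefficient and its standard error. The sketch below recomputes them from the Excel output, using the critical value 2.1788 (12 df, two-tailed, $\alpha = 0.05$) quoted in the t test example. The helper names are illustrative.

```python
T_CRITICAL = 2.1788  # t with n - k - 1 = 12 df, .025 in each tail (from the slide)


def t_statistic(b, se):
    """t test statistic for a single slope: b_j / S_bj."""
    return b / se


def confidence_interval(b, se, t_crit=T_CRITICAL):
    """Return (lower, upper) limits of b_j +/- t * S_bj."""
    half_width = t_crit * se
    return b - half_width, b + half_width


# Coefficients and standard errors from the Excel output.
t_temp = t_statistic(-5.436580588, 0.336216167)
t_insulation = t_statistic(-20.01232067, 2.342505227)
lower, upper = confidence_interval(-5.436580588, 0.336216167)

print(f"t (Temp) = {t_temp:.4f}, t (Insulation) = {t_insulation:.4f}")
print(f"95% CI for beta_1: {lower:.3f} to {upper:.3f}")
print("interval excludes 0:", not (lower <= 0.0 <= upper))
```

Because the critical value is rounded to four decimals, the limits may differ from Excel's in the later decimal places.
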
Contribution of a Single Independent Variable $X_j$
Let $X_j$ Be the Independent Variable of Interest
$$SSR(X_j \mid \text{all others except } X_j) = SSR(\text{all}) - SSR(\text{all others except } X_j)$$
  Measures the additional contribution of $X_j$ in explaining the total variation in Y with the inclusion of all the remaining independent variables

Contribution of a Single Independent Variable $X_k$
Measures the additional contribution of $X_1$ in explaining Y with the inclusion of $X_2$ and $X_3$.
$$SSR(X_1 \mid X_2 \text{ and } X_3) = SSR(X_1, X_2 \text{ and } X_3) - SSR(X_2 \text{ and } X_3)$$
$SSR(X_1, X_2 \text{ and } X_3)$: from ANOVA section of regression for $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{3i}$
$SSR(X_2 \text{ and } X_3)$: from ANOVA section of regression for $\hat{Y}_i = b_0 + b_2 X_{2i} + b_3 X_{3i}$
Note: the values of the coefficients $b_0$, $b_2$, and $b_3$ change between the two regression equations.

Coefficient of Partial Determination of $X_j$
$$r^2_{Yj \cdot \text{all others}} = \frac{SSR(X_j \mid \text{all others})}{SST - SSR(\text{all}) + SSR(X_j \mid \text{all others})}$$
  Measures the proportion of variation in the dependent variable that is explained by $X_j$ while controlling for (holding constant) the other independent variables

Coefficient of Partial Determination for $X_j$ (continued)
Example: Model with two independent variables
$$r^2_{Y1 \cdot 2} = \frac{SSR(X_1 \mid X_2)}{SST - SSR(X_1, X_2) + SSR(X_1 \mid X_2)}$$

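As a worked instance of the two-variable formula, the sketch below plugs in the heating oil sums of squares that appear in the chapter's ANOVA tables (SST, SSR for Temp and Insulation together, and SSR for Insulation alone). The function name `partial_r2` is illustrative, and the result is computed here rather than quoted from the slides.

```python
def partial_r2(sst, ssr_all, ssr_without_j):
    """Coefficient of partial determination of X_j, given all other X variables."""
    ssr_j_given_others = ssr_all - ssr_without_j
    return ssr_j_given_others / (sst - ssr_all + ssr_j_given_others)


SST = 236135.2293        # total sum of squares
SSR_X1_X2 = 228014.6263  # SSR with Temp (X1) and Insulation (X2)
SSR_X2 = 51076.47        # SSR with Insulation (X2) only

r2_y1_given_2 = partial_r2(SST, SSR_X1_X2, SSR_X2)
print(f"r^2 (Y1.2) = {r2_y1_given_2:.4f}")
```

The denominator is the variation left unexplained by Insulation alone, so the ratio reads as the share of that remaining variation that Temp accounts for.
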
Venn Diagrams and Coefficient of Partial Determination for $X_j$
[Venn diagram: three overlapping circles, Oil, Temp and Insulation, with the region for $SSR(X_1 \mid X_2)$ marked.]
$$r^2_{Y1 \cdot 2} = \frac{SSR(X_1 \mid X_2)}{SST - SSR(X_1, X_2) + SSR(X_1 \mid X_2)}$$

Coefficient of Partial Determination in PHStat
PHStat | Regression | Multiple Regression …
  Check the “Coefficient of Partial Determination” box
Excel spreadsheet for the heating oil example

Contribution of a Subset of Independent Variables
Let $X_s$ Be the Subset of Independent Variables of Interest
$$SSR(X_s \mid \text{all others except } X_s) = SSR(\text{all}) - SSR(\text{all others except } X_s)$$
  Measures the contribution of the subset $X_s$ in explaining SST with the inclusion of the remaining independent variables

Contribution of a Subset of Independent Variables: Example
Let $X_s$ be $X_1$ and $X_3$
$$SSR(X_1 \text{ and } X_3 \mid X_2) = SSR(X_1, X_2 \text{ and } X_3) - SSR(X_2)$$
$SSR(X_1, X_2 \text{ and } X_3)$: from ANOVA section of regression for $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{3i}$
$SSR(X_2)$: from ANOVA section of regression for $\hat{Y}_i = b_0 + b_2 X_{2i}$

Testing Portions of Model
Examines the Contribution of a Subset $X_s$ of Explanatory Variables to the Relationship with Y
Null Hypothesis:
  Variables in the subset do not improve the model significantly when all other variables are included
Alternative Hypothesis:
  At least one variable in the subset is significant when all other variables are included

Testing Portions of Model (continued)
One-Tailed Rejection Region
Requires Comparison of Two Regressions
  One regression includes everything
  Another regression includes everything except the portion to be tested

Partial F Test for the Contribution of a Subset of X Variables
Hypotheses:
  $H_0$: Variables $X_s$ do not significantly improve the model given all other variables included
  $H_1$: Variables $X_s$ significantly improve the model given all others included
Test Statistic:
$$F = \frac{SSR(X_s \mid \text{all others}) / m}{MSE(\text{all})}$$
  with df = m and (n - k - 1)
  m = # of variables in the subset $X_s$

Partial F Test for the Contribution of a Single $X_j$
Hypotheses:
  $H_0$: Variable $X_j$ does not significantly improve the model given all others included
  $H_1$: Variable $X_j$ significantly improves the model given all others included
Test Statistic:
$$F = \frac{SSR(X_j \mid \text{all others})}{MSE(\text{all})}$$
  with df = 1 and (n - k - 1)
  m = 1 here

Testing Portions of Model: Example
Test at the $\alpha = .05$ level to determine if the variable of average temperature significantly improves the model, given that insulation is included.

Testing Portions of Model: Example
$H_0$: $X_1$ (temperature) does not improve model with $X_2$ (insulation) included
$H_1$: $X_1$ does improve model
$\alpha = .05$, df = 1 and 12
Critical Value = 4.75
ANOVA (For $X_1$ and $X_2$)
              SS             MS
Regression    228014.6263    114007.313
Residual      8120.603016    676.716918
Total         236135.2293
ANOVA (For $X_2$)
              SS
Regression    51076.47
Residual      185058.8
Total         236135.2
$$F = \frac{SSR(X_1 \mid X_2)}{MSE(X_1, X_2)} = \frac{228{,}015 - 51{,}076}{676.717} = 261.47$$
Conclusion: Reject $H_0$; $X_1$ does improve model.

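The partial F computation can be sketched in a few lines using the two ANOVA tables above. The last lines also compare the result with the square of the t statistic for Temp (-16.1699) from the earlier Excel output, since a partial F test on a single variable and the t test on its slope are equivalent. The names `partial_f`, `SSR_FULL` and so on are illustrative.

```python
def partial_f(ssr_full, ssr_reduced, mse_full, m):
    """Partial F statistic for m variables added to the reduced model."""
    return (ssr_full - ssr_reduced) / m / mse_full


SSR_FULL = 228014.6263    # SSR(X1, X2) from the full-model ANOVA
SSR_REDUCED = 51076.47    # SSR(X2) from the insulation-only ANOVA
MSE_FULL = 676.716918     # MSE(X1, X2) from the full-model ANOVA
F_CRITICAL = 4.75         # critical value for 1 and 12 df, as quoted on the slide
T_TEMP = -16.1699         # t statistic for Temp from the Excel output

f_temp = partial_f(SSR_FULL, SSR_REDUCED, MSE_FULL, m=1)
print(f"partial F = {f_temp:.2f}, reject H0: {f_temp > F_CRITICAL}")
print(f"t squared = {T_TEMP ** 2:.2f}")
```

The two printed statistics should agree up to rounding in the reported inputs.
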
Testing Portions of Model in PHStat
PHStat | Regression | Multiple Regression …
  Check the “Coefficient of Partial Determination” box
Excel spreadsheet for the heating oil example

Do We Need to Do This for One Variable?
The F Test for the Contribution of a Single Variable After All Other Variables are Included in the Model is IDENTICAL to the t Test of the Slope for that Variable
The Only Reason to Perform an F Test is to Test Several Variables Together

Dummy-Variable Models
Categorical Explanatory Variable with 2 or More Levels
  Two Level Examples: Yes or No, On or Off
  Use Dummy-Variable (Coded as 0 or 1)
Only Intercepts are Different
Assumes Equal Slopes Across Categories
The Number of Dummy-Variables Needed is (# of Levels - 1)
Regression Model Has Same Form:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$$

Dummy-Variable Models (with 2 Levels)
$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$$
Given:
Y = Assessed Value of House
$X_1$ = Square Footage of House
$X_2$ = Desirability of Neighborhood = 0 if undesirable, 1 if desirable
Desirable ($X_2 = 1$):
$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2(1) = (b_0 + b_2) + b_1 X_{1i}$$
Undesirable ($X_2 = 0$):
$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2(0) = b_0 + b_1 X_{1i}$$
Same slopes

Dummy-Variable Models (with 2 Levels) (continued)
[Figure: Y (Assessed Value) against $X_1$ (Square footage). Two parallel lines with the same slope $b_1$ and different intercepts, $b_0$ and $b_0 + b_2$.]

Interpretation of the Dummy-Variable Coefficient (with 2 Levels)
Example:
$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} = 20 + 5 X_{1i} + 6 X_{2i}$$
Y: Annual salary of college graduate in thousand $
$X_1$: GPA
$X_2$: 0 non-business degree, 1 business degree
With the same GPA, college graduates with a business degree are making an estimated 6 thousand dollars more than graduates with a non-business degree, on average.

Dummy-Variable Models (with 3 Levels)
Given:
Y = Assessed Value of the House (1000 $)
$X_1$ = Square Footage of the House
Style of the House = Split-level, Ranch, Tudor (3 Levels; Need 2 Dummy Variables)
$X_2$ = 1 if Split-level, 0 if not
$X_3$ = 1 if Ranch, 0 if not
$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{3i}$$

Interpretation of the Dummy-Variable Coefficients (with 3 Levels)
Given the Estimated Model:
$$\hat{Y}_i = 20.43 + 0.045 X_{1i} + 18.84 X_{2i} + 23.53 X_{3i}$$
For Split-level ($X_2 = 1$):
$$\hat{Y}_i = 20.43 + 0.045 X_{1i} + 18.84$$
For Ranch ($X_3 = 1$):
$$\hat{Y}_i = 20.43 + 0.045 X_{1i} + 23.53$$
For Tudor:
$$\hat{Y}_i = 20.43 + 0.045 X_{1i}$$
With the same footage, a Split-level will have an estimated average assessed value of 18.84 thousand dollars more than a Tudor.
With the same footage, a Ranch will have an estimated average assessed value of 23.53 thousand dollars more than a Tudor.

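The three-level coding can be sketched as a small function: the house style is turned into the two dummy variables, with Tudor as the baseline level that gets zeros in both. The plus signs on the coefficients are an assumption, since the operators are not legible in the extracted equation; the positive dummy coefficients match the interpretation text. The function name and the style strings are illustrative.

```python
def assessed_value(square_feet, style):
    """Estimated assessed value (thousand $) from the 3-level dummy-variable model."""
    x2 = 1 if style == "split-level" else 0  # dummy for Split-level
    x3 = 1 if style == "ranch" else 0        # dummy for Ranch
    # Tudor is the baseline: both dummies are 0.
    return 20.43 + 0.045 * square_feet + 18.84 * x2 + 23.53 * x3


footage = 2000  # hypothetical square footage, for illustration only
tudor = assessed_value(footage, "tudor")
split_premium = assessed_value(footage, "split-level") - tudor
ranch_premium = assessed_value(footage, "ranch") - tudor
print(f"Split-level vs Tudor: {split_premium:.2f}")
print(f"Ranch vs Tudor: {ranch_premium:.2f}")
```

Whatever footage is chosen, the differences stay the same, which is the equal-slopes assumption of the dummy-variable model.
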
Regression Model Containing an Interaction Term
Hypothesizes Interaction between a Pair of X Variables
  Response to one X variable varies at different levels of another X variable
Contains a Cross-Product Term
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i} + \varepsilon_i$$
Can Be Combined with Other Models
  E.g., Dummy-Variable Model

Effect of Interaction
Given:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i} + \varepsilon_i$$
Without Interaction Term, Effect of $X_1$ on Y is Measured by $\beta_1$
With Interaction Term, Effect of $X_1$ on Y is Measured by $\beta_1 + \beta_3 X_2$
Effect Changes as $X_2$ Changes

Interaction Example
$$Y = 1 + 2X_1 + 3X_2 + 4X_1 X_2$$
With $X_2 = 1$: $Y = 1 + 2X_1 + 3(1) + 4X_1(1) = 4 + 6X_1$
With $X_2 = 0$: $Y = 1 + 2X_1 + 3(0) + 4X_1(0) = 1 + 2X_1$
[Figure: Y against $X_1$ for $X_1$ from 0 to 1.5, showing the two lines $4 + 6X_1$ and $1 + 2X_1$.]
Effect (slope) of $X_1$ on Y depends on $X_2$ value

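The example equation can be checked numerically. The sketch below evaluates the model and the slope of $X_1$, which from the equation is $2 + 4X_2$, at the two $X_2$ values used above. The function names are illustrative.

```python
def y_value(x1, x2):
    """Y = 1 + 2*X1 + 3*X2 + 4*X1*X2, the interaction example from the slide."""
    return 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2


def slope_of_x1(x2):
    """Change in Y per unit change in X1 at a given X2: 2 + 4*X2."""
    return 2 + 4 * x2


for x2 in (0, 1):
    # The slope read off the line should match the rise over a unit step in X1.
    rise = y_value(1, x2) - y_value(0, x2)
    print(f"X2 = {x2}: intercept = {y_value(0, x2)}, slope = {slope_of_x1(x2)}, rise = {rise}")
```
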
Interaction Regression Model Worksheet
Case, i    Yi    X1i    X2i    X1i X2i
1          1     1      3      3
2          4     8      5      40
3          1     3      2      6
4          3     5      6      30
:          :     :      :      :
Multiply X1 by X2 to get X1X2
Run regression with Y, X1, X2, X1X2

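Building the cross-product column is a one-line transformation. The sketch below does it for the four worksheet cases shown above; the resulting rows are what would then be passed to a regression routine. The variable names are illustrative.

```python
# Worksheet cases from the slide: (Y, X1, X2)
cases = [(1, 1, 3), (4, 8, 5), (1, 3, 2), (3, 5, 6)]

# Append the interaction column X1 * X2 to every case.
worksheet = [(y, x1, x2, x1 * x2) for y, x1, x2 in cases]
interaction_column = [row[3] for row in worksheet]

for row in worksheet:
    print(row)
```
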
Interpretation When There Are 3+ Levels
Consider the effects of gender (male or female) and working status (working part-time, working full-time or not working) on income (Y).
$$Y = \beta_0 + \beta_1\,\text{Male} + \beta_2\,\text{Part-time} + \beta_3\,\text{Full-time} + \beta_4\,\text{Male} \cdot \text{Part-time} + \beta_5\,\text{Male} \cdot \text{Full-time} + \varepsilon$$
Male = 0 if female; 1 if male
Part-time = 1 if working part-time; 0 if working full-time or not working
Full-time = 1 if working full-time; 0 if working part-time or not working
Male•Part-time = 1 if male and working part-time; 0 otherwise = (Male times Part-time)
Male•Full-time = 1 if male and working full-time; 0 otherwise = (Male times Full-time)

Interpretation When There Are 3+ Levels (continued)
$$Y = \beta_0 + \beta_1\,\text{Male} + \beta_2\,\text{Part-time} + \beta_3\,\text{Full-time} + \beta_4\,\text{Male} \cdot \text{Part-time} + \beta_5\,\text{Male} \cdot \text{Full-time} + \varepsilon$$
          Not-working            Part-time                                   Full-time
Female    $\beta_0$              $\beta_0 + \beta_2$                         $\beta_0 + \beta_3$
Male      $\beta_0 + \beta_1$    $\beta_0 + \beta_1 + \beta_2 + \beta_4$     $\beta_0 + \beta_1 + \beta_3 + \beta_5$

Interpreting Results
               Female                 Male                                        Difference
Not-working    $\beta_0$              $\beta_0 + \beta_1$                         $\beta_1$
Part-time      $\beta_0 + \beta_2$    $\beta_0 + \beta_1 + \beta_2 + \beta_4$     $\beta_1 + \beta_4$
Full-time      $\beta_0 + \beta_3$    $\beta_0 + \beta_1 + \beta_3 + \beta_5$     $\beta_1 + \beta_5$
Main Effects: Male, Part-time and Full-time
Interaction Effects: Male•Part-time and Male•Full-time

Evaluating the Presence of Interaction with Dummy-Variable
Suppose $X_1$ and $X_2$ are Numerical Variables and $X_3$ is a Dummy-Variable
To Test if the Slopes of Y with $X_1$ and/or $X_2$ are the Same for the Two Levels of $X_3$
Model:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{1i} X_{3i} + \beta_5 X_{2i} X_{3i} + \varepsilon_i$$
Hypotheses:
  $H_0: \beta_4 = \beta_5 = 0$ (No Interaction between $X_1$ and $X_3$ or $X_2$ and $X_3$)
  $H_1$: $\beta_4$ and/or $\beta_5 \neq 0$ ($X_1$ and/or $X_2$ Interacts with $X_3$)
Perform a Partial F Test, where $X_4 = X_1 X_3$ and $X_5 = X_2 X_3$ denote the interaction terms:
$$F = \frac{\left[SSR(X_1, X_2, X_3, X_4, X_5) - SSR(X_1, X_2, X_3)\right] / 2}{MSE(X_1, X_2, X_3, X_4, X_5)}$$

Evaluating the Presence of Interaction with Numerical Variables
Suppose $X_1$, $X_2$ and $X_3$ are Numerical Variables
To Test If the Independent Variables Interact with Each Other
Model:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{1i} X_{2i} + \beta_5 X_{1i} X_{3i} + \beta_6 X_{2i} X_{3i} + \varepsilon_i$$
Hypotheses:
  $H_0: \beta_4 = \beta_5 = \beta_6 = 0$ (no interaction among $X_1$, $X_2$ and $X_3$)
  $H_1$: at least one of $\beta_4, \beta_5, \beta_6 \neq 0$ (at least one pair of $X_1$, $X_2$, $X_3$ interact with each other)
Perform a Partial F Test, where $X_4 = X_1 X_2$, $X_5 = X_1 X_3$ and $X_6 = X_2 X_3$ denote the interaction terms:
$$F = \frac{\left[SSR(X_1, X_2, X_3, X_4, X_5, X_6) - SSR(X_1, X_2, X_3)\right] / 3}{MSE(X_1, X_2, X_3, X_4, X_5, X_6)}$$

Logistic Regression Model
Enables the Use of Regression Model to Predict the Probability of a Particular Categorical Response for a Given Set of Explanatory Variables
Based on the Odds Ratio
  Represents the probability of a success compared with the probability of failure
$$\text{Odds ratio} = \frac{\text{probability of success}}{1 - \text{probability of success}}$$

Logistic Regression Model (continued)
Logistic Regression Model
$$\ln(\text{odds ratio}) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$$
Logistic Regression Equation
$$\ln(\text{estimated odds ratio}) = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki}$$
Estimated Odds Ratio
$$\text{estimated odds ratio} = e^{\ln(\text{estimated odds ratio})}$$
Estimated Probability of Success
$$\text{estimated probability of success} = \frac{\text{estimated odds ratio}}{1 + \text{estimated odds ratio}}$$

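The three steps above (log of the odds ratio, odds ratio, probability) chain together as in the sketch below. The coefficients and X values are hypothetical, chosen only to illustrate the arithmetic; in practice they would come from statistical software. The function names are illustrative as well.

```python
import math


def log_odds(coefficients, x_values):
    """ln(estimated odds ratio) = b0 + b1*X1 + ... + bk*Xk."""
    b0, slopes = coefficients[0], coefficients[1:]
    return b0 + sum(b * x for b, x in zip(slopes, x_values))


def probability_of_success(ln_odds):
    """Convert ln(estimated odds ratio) to the estimated probability of success."""
    odds = math.exp(ln_odds)  # estimated odds ratio
    return odds / (1.0 + odds)


# Hypothetical fitted coefficients (b0, b1, b2) and one observation (X1, X2).
b = [-1.5, 0.8, 0.3]
x = [2.0, 1.0]

ln_odds = log_odds(b, x)
p = probability_of_success(ln_odds)
print(f"ln(odds) = {ln_odds:.3f}, odds = {math.exp(ln_odds):.3f}, probability = {p:.3f}")
```

A log-odds of 0 corresponds to an odds ratio of 1 and a probability of 0.5, which makes a handy sanity check.
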
Interpretation of Estimated Slope Coefficients
Logistic Regression Equation Has to be Estimated Using Computer Statistical Software, e.g. Minitab®
The Estimated Slope Coefficient $b_j$ Measures the Estimated Change in the Natural Logarithm of the Odds Ratio as a Result of a One Unit Change in the Independent Variable $X_j$ Holding Constant the Effects of all the Other Independent Variables

The Deviance Statistic
Use to Test whether the Logistic Regression is a Good-Fitting Model
Hypotheses
  $H_0$: The model is a good-fitting model
  $H_1$: The model is not a good-fitting model
Test Statistic
  The deviance statistic has a $\chi^2$ distribution with (n - k - 1) degrees of freedom
  The rejection region is always in the upper tail

Testing Significance of an Independent Variable
Hypotheses
  $H_0: \beta_j = 0$ ($X_j$ is not significant)
  $H_1: \beta_j \neq 0$ ($X_j$ is significant)
Test Statistic
  The Wald statistic is normally distributed
  A two-tail test with left and right-tail rejection regions

Chapter Summary
Developed the Multiple Regression Model
Discussed Residual Plots
Addressed Testing the Significance of the Multiple Regression Model
Discussed Inferences on Population Regression Coefficients
Addressed Testing Portions of the Multiple Regression Model
Discussed Dummy-Variables and Interaction Terms
Addressed Logistic Regression Model