Interaction in Regression (Continued)
Session 25
Course: I0174 – Analisis Regresi (Regression Analysis)
Year: Odd semester 2007/2008
Session 25: Interaction in Regression (Continued)
• Test of equality of regression slope coefficients
• A worked computational example
Bina Nusantara
The Multiple Regression Model
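The relationship between 1 dependent variable and 2 or more independent variables is a linear function:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$$
where $Y_i$ is the dependent (response) variable, $X_{1i}, \ldots, X_{ki}$ are the independent (explanatory) variables, $\beta_0$ is the population Y-intercept, $\beta_1, \ldots, \beta_k$ are the population slopes, and $\varepsilon_i$ is the random error.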
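Multiple Regression Model
Bivariate model:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$$
$$\mu_{Y|X} = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}$$
[Figure: response plane over the X1 and X2 axes, with Y on the vertical axis. It marks the intercept $\beta_0$, an observed Y at the point $(X_{1i}, X_{2i})$, and the error $\varepsilon_i$ between the observed Y and the plane.]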
Multiple Regression Equation
Bivariate model:
$$Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + e_i$$
$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$$
[Figure: estimated response plane over the X1 and X2 axes, with Y on the vertical axis. It marks the intercept $b_0$, an observed Y at the point $(X_{1i}, X_{2i})$, and the residual $e_i$ between the observed Y and the plane.]
Interpretation of Estimated Coefficients
• Slope (bj)
– The average value of Y is estimated to change by bj for each 1-unit increase in Xj, holding all other variables constant (ceteris paribus)
– Example: If b1 = -2, then fuel oil usage (Y) is expected to decrease by an estimated 2 gallons for each 1 degree increase in temperature (X1), given the inches of insulation (X2)
• Y-Intercept (b0)
– The estimated average value of Y when all Xj = 0
Multiple Regression Model: Example
Develop a model for estimating heating oil used for a single family home in the month of January, based on average temperature and amount of insulation in inches.
Oil (Gal)   Temp (°F)   Insulation (in)
275.30      40          3
363.80      27          3
164.30      40          10
40.80       73          6
94.30       64          6
230.90      34          6
366.70      9           6
300.60      8           10
237.80      23          10
121.40      63          3
31.40       65          10
203.50      41          6
441.10      21          3
323.00      38          3
52.50       58          10
Multiple Regression Equation: Example
$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki}$$
Excel Output
               Coefficients
Intercept      562.1510092
X Variable 1   -5.436580588
X Variable 2   -20.01232067
$$\hat{Y}_i = 562.151 - 5.437 X_{1i} - 20.012 X_{2i}$$
For each degree increase in temperature, the estimated average amount of heating oil used is decreased by 5.437 gallons, holding insulation constant.
For each increase in one inch of insulation, the estimated average use of heating oil is decreased by 20.012 gallons, holding temperature constant.
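The fitted equation can be reproduced without Excel. The following is a minimal sketch, not the method used in the slides: it solves the normal equations $(X'X)b = X'y$ with a small hand-written Gaussian elimination in plain Python. The helper names `solve` and `fit_ols` are illustrative assumptions; the data are the 15 observations from the heating oil table.

```python
# Heating oil data from the example: (oil in gallons, temperature in °F, insulation in inches)
DATA = [
    (275.30, 40, 3), (363.80, 27, 3), (164.30, 40, 10), (40.80, 73, 6),
    (94.30, 64, 6), (230.90, 34, 6), (366.70, 9, 6), (300.60, 8, 10),
    (237.80, 23, 10), (121.40, 63, 3), (31.40, 65, 10), (203.50, 41, 6),
    (441.10, 21, 3), (323.00, 38, 3), (52.50, 58, 10),
]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(y, columns):
    """Return least-squares coefficients [b0, b1, ...] from the normal equations."""
    rows = [[1.0] + list(xs) for xs in zip(*columns)]  # prepend the intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


oil = [d[0] for d in DATA]
temp = [d[1] for d in DATA]
insulation = [d[2] for d in DATA]

b0, b1, b2 = fit_ols(oil, [temp, insulation])
print(f"Oil-hat = {b0:.3f} + ({b1:.3f}) Temp + ({b2:.3f}) Insulation")
```

The printed coefficients should agree with the Excel output above up to rounding. For larger problems a numerical library would normally be preferred over hand-rolled elimination.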
Multiple Regression in PHStat
• PHStat | Regression | Multiple Regression …
• Excel spreadsheet for the heating oil example
Venn Diagrams and Explanatory Power of Regression
[Figure: Venn diagram with two overlapping circles, Oil and Temp.]
• Temp outside the overlap: variations in Temp not used in explaining variation in Oil
• Oil outside the overlap: variations in Oil explained by the error term (SSE)
• Overlap: variations in Oil explained by Temp, or variations in Temp used in explaining variation in Oil (SSR)
Venn Diagrams and Explanatory Power of Regression (continued)
[Figure: the same Oil and Temp diagram.]
$$r^2 = \frac{SSR}{SSR + SSE}$$
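Venn Diagrams and Explanatory Power of Regression
[Figure: Venn diagram with three overlapping circles, Oil, Temp and Insulation.]
• Region where Oil, Temp and Insulation all overlap: overlapping variation in both Temp and Insulation are used in explaining the variation in Oil but NOT in the estimation of $\beta_1$ nor $\beta_2$
• Oil outside both other circles: variation NOT explained by Temp nor Insulation (SSE)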
Coefficient of Multiple Determination
• Proportion of Total Variation in Y Explained by All X Variables Taken Together
– $$r^2_{Y \cdot 12 \cdots k} = \frac{SSR}{SST} = \frac{\text{Explained Variation}}{\text{Total Variation}}$$
• Never Decreases When a New X Variable is Added to Model
– Disadvantage when comparing among models
Venn Diagrams and Explanatory Power of Regression
[Figure: Venn diagram of Oil, Temp and Insulation.]
$$r^2_{Y \cdot 12} = \frac{SSR}{SSR + SSE}$$
Adjusted Coefficient of Multiple Determination
• Proportion of Variation in Y Explained by All the X Variables Adjusted for the Sample Size and the Number of X Variables Used
– $$r^2_{adj} = 1 - \left[ \left(1 - r^2_{Y \cdot 12 \cdots k}\right) \frac{n - 1}{n - k - 1} \right]$$
– Penalizes excessive use of independent variables
– Smaller than $r^2_{Y \cdot 12 \cdots k}$
– Useful in comparing among models
– Can decrease if an insignificant new X variable is added to the model
Coefficient of Multiple Determination
Excel Output
Regression Statistics
Multiple R           0.982654757
R Square             0.965610371
Adjusted R Square    0.959878766
Standard Error       26.01378323
Observations         15
R Square is $r^2_{Y \cdot 12} = \frac{SSR}{SST}$.
Adjusted $r^2$:
• reflects the number of explanatory variables and sample size
• is smaller than $r^2$
Interpretation of Coefficient of Multiple Determination
• $r^2_{Y \cdot 12} = \frac{SSR}{SST} = .9656$
– 96.56% of the total variation in heating oil can be explained by temperature and amount of insulation
• $r^2_{adj} = .9599$
– 95.99% of the total fluctuation in heating oil can be explained by temperature and amount of insulation after adjusting for the number of explanatory variables and sample size
Simple and Multiple Regression Compared
• The slope coefficient in a simple regression picks up the impact of the
independent variable plus the impacts of other variables that are
excluded from the model, but are correlated with the included
independent variable and the dependent variable
• Coefficients in a multiple regression net out the impacts of other
variables in the equation
– Hence, they are called the net regression coefficients
– They still pick up the effects of other variables that are excluded
from the model, but are correlated with the included independent
variables and the dependent variable
Simple and Multiple Regression Compared: Example
• Two Simple Regressions:
– $Oil = \beta_0 + \beta_1 Temp + \varepsilon$
– $Oil = \beta_0 + \beta_2 Insulation + \varepsilon$
• Multiple Regression:
– $Oil = \beta_0 + \beta_1 Temp + \beta_2 Insulation + \varepsilon$
Simple and Multiple Regression Compared: Slope Coefficients
$Oil = b_0 + b_1 Temp + b_2 Insulation + e$
              Coefficients
Intercept     562.1510092
Temp          -5.436580588
Insulation    -20.01232067
$Oil = b_0 + b_1 Temp + e$
              Coefficients
Intercept     436.4382299
Temp          -5.462207697
$Oil = b_0 + b_2 Insulation + e$
              Coefficients
Intercept     345.3783784
Insulation    -20.35027027
$-5.4366 \neq -5.4622$ and $-20.0123 \neq -20.3503$
Simple and Multiple Regression Compared: $r^2$
Model A: $Oil = \beta_0 + \beta_1 Temp + \beta_2 Insulation + \varepsilon$
Model B: $Oil = \beta_0 + \beta_1 Temp + \varepsilon$
Model C: $Oil = \beta_0 + \beta_1 Insulation + \varepsilon$
Regression Statistics   Model A        Model B        Model C
Multiple R              0.982654757    0.86974117     0.465082527
R Square                0.965610371    0.756449704    0.216301757
Adjusted R Square       0.959878766    0.737715065    0.156017277
Standard Error          26.01378323    66.51246564    119.3117327
Observations            15             15             15
$$0.96561 \neq 0.75645 + 0.21630 = 0.97275$$
Example: Adjusted $r^2$ Can Decrease
Model A: $Oil = \beta_0 + \beta_1 Temp + \beta_2 Insulation + \varepsilon$
Model B: $Oil = \beta_0 + \beta_1 Temp + \beta_2 Insulation + \beta_3 Color + \varepsilon$
Regression Statistics   Model A        Model B
Multiple R              0.982654757    0.983482856
R Square                0.965610371    0.967238528
Adjusted R Square       0.959878766    0.958303581
Standard Error          26.01378323    25.72417272
Observations            15             15
Adjusted $r^2$ decreases when k increases from 2 to 3.
Color is not useful in explaining the variation in oil consumption.
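The drop in adjusted $r^2$ can be checked directly from the adjustment formula. This is a small sketch; the function name `adjusted_r2` is an illustrative choice, and the R Square values are copied from the two Excel outputs in this example.

```python
def adjusted_r2(r2, n, k):
    """Adjusted r^2 = 1 - (1 - r^2) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


n = 15  # observations in the heating oil example

two_var = adjusted_r2(0.965610371, n, k=2)    # Temp, Insulation
three_var = adjusted_r2(0.967238528, n, k=3)  # Temp, Insulation, Color

print(f"k = 2: {two_var:.6f}")
print(f"k = 3: {three_var:.6f}")
print("adjusted r^2 decreased:", three_var < two_var)
```

Although R Square rises slightly when Color is added, the penalty factor (n - 1) / (n - k - 1) grows from 14/12 to 14/11, which is enough to pull the adjusted value down.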
Using the Regression Equation to Make Predictions
Predict the amount of heating oil used for a home if the average temperature is 30°F and the insulation is 6 inches.
$$\hat{Y}_i = 562.151 - 5.437 X_{1i} - 20.012 X_{2i} = 562.151 - 5.437(30) - 20.012(6) = 278.969$$
The predicted heating oil used is 278.97 gallons.
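The same prediction as a short sketch in code. The function name `predict_oil` is an assumption made for illustration; the default coefficients are the rounded values from the fitted equation.

```python
def predict_oil(temp_f, insulation_in, b0=562.151, b1=-5.437, b2=-20.012):
    """Predicted heating oil (gallons) from temperature (°F) and insulation (inches)."""
    return b0 + b1 * temp_f + b2 * insulation_in


y_hat = predict_oil(temp_f=30, insulation_in=6)
print(f"Predicted heating oil: {y_hat:.3f} gallons")
```

Predictions are only meaningful for temperature and insulation values within the range covered by the sample data.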
Test for Overall Significance
Excel Output: Example
ANOVA
             df    SS          MS          F           Significance F
Regression   2     228014.6    114007.3    168.4712    1.65411E-09
Residual     12    8120.603    676.7169
Total        14    236135.2
• Regression df = k = 2, the number of explanatory variables
• Total df = n - 1
• F Test Statistic = MSR / MSE
• Significance F is the p-value
Test for Overall Significance: Example Solution
$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$
$H_1$: At least one $\beta_j \neq 0$
$\alpha = .05$
df = 2 and 12
Critical Value: 3.89
[Figure: F distribution with the rejection region of area $\alpha = 0.05$ to the right of 3.89.]
Test Statistic: F = 168.47 (Excel Output)
Decision: Reject at $\alpha = 0.05$.
Conclusion: There is evidence that at least one independent variable affects Y.
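The F statistic in the ANOVA table follows from the sums of squares. The sketch below recomputes it; `overall_f` is an illustrative name, the sums of squares are the values from the ANOVA output for the two-variable model, and the critical value 3.89 is taken from the slide rather than computed.

```python
def overall_f(ssr, sse, n, k):
    """Overall F statistic: MSR / MSE with df = k and n - k - 1."""
    msr = ssr / k
    mse = sse / (n - k - 1)
    return msr / mse


f_stat = overall_f(ssr=228014.6263, sse=8120.603016, n=15, k=2)

F_CRITICAL = 3.89  # from the slide: alpha = .05, df = 2 and 12
reject_h0 = f_stat > F_CRITICAL

print(f"F = {f_stat:.4f}, reject H0: {reject_h0}")
```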
Test for Significance:
Individual Variables
• Show If Y Depends Linearly on a Single Xj Individually While
Holding the Effects of Other X’s Fixed
• Use t Test Statistic
• Hypotheses:
– $H_0: \beta_j = 0$ (No linear relationship)
– $H_1: \beta_j \neq 0$ (Linear relationship between Xj and Y)
t Test Statistic
Excel Output: Example
             Coefficients    Standard Error   t Stat       P-value
Intercept    562.1510092     21.09310433      26.65094     4.77868E-12
Temp         -5.436580588    0.336216167      -16.1699     1.64178E-09
Insulation   -20.01232067    2.342505227      -8.543127    1.90731E-06
$$t = \frac{b_i}{S_{b_i}}$$
The t Stat in the Temp row is the t test statistic for X1 (Temperature); the t Stat in the Insulation row is the t test statistic for X2 (Insulation).
t Test: Example Solution
Does temperature have a significant effect on monthly consumption of heating oil? Test at $\alpha = 0.05$.
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
df = 12
Critical Values: -2.1788 and 2.1788
[Figure: t distribution with rejection regions of area .025 in each tail, below -2.1788 and above 2.1788.]
Test Statistic: t = -16.1699
Decision: Reject H0 at $\alpha = 0.05$.
Conclusion: There is evidence of a significant effect of temperature on oil consumption holding constant the effect of insulation.
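Each t statistic is simply the coefficient divided by its standard error. A brief sketch, using the coefficients and standard errors from the Excel output and the critical value 2.1788 quoted on the slide (not computed here); `t_statistic` is an illustrative name.

```python
def t_statistic(b, se):
    """t statistic for H0: beta = 0, i.e. the coefficient over its standard error."""
    return b / se


T_CRITICAL = 2.1788  # from the slide: two-tailed, alpha = .05, df = 12

t_temp = t_statistic(-5.436580588, 0.336216167)
t_insulation = t_statistic(-20.01232067, 2.342505227)

for name, t in (("Temp", t_temp), ("Insulation", t_insulation)):
    print(f"{name}: t = {t:.4f}, reject H0: {abs(t) > T_CRITICAL}")
```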
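Venn Diagrams and Estimation of Regression Model
[Figure: Venn diagram of Oil, Temp and Insulation.]
• Overlap of Oil with Temp only: only this information is used in the estimation of $\beta_1$
• Overlap of Oil with Insulation only: only this information is used in the estimation of $\beta_2$
• Overlap of Oil with both Temp and Insulation: this information is NOT used in the estimation of $\beta_1$ nor $\beta_2$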
Confidence Interval Estimate for the Slope
Provide the 95% confidence interval for the population slope $\beta_1$ (the effect of temperature on oil consumption).
$$b_1 \pm t_{n-k-1} S_{b_1}$$
             Coefficients   Lower 95%        Upper 95%
Intercept    562.151009     516.1930837      608.108935
Temp         -5.4365806     -6.169132673     -4.7040285
Insulation   -20.012321     -25.11620102     -14.90844
$$-6.169 \leq \beta_1 \leq -4.704$$
We are 95% confident that the estimated average consumption of oil is reduced by between 4.70 and 6.17 gallons for each 1°F increase in temperature, holding insulation constant.
We can also use this confidence interval to test the significance of an individual variable, $H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$: the interval does not contain 0, so $H_0$ is rejected at the 5% level.
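The interval for the temperature slope can be rebuilt from the coefficient, its standard error and the critical t value. This sketch reuses the critical value 2.1788 (df = 12) from the t test slide as an assumption instead of computing it; `slope_confidence_interval` is an illustrative name.

```python
def slope_confidence_interval(b, se, t_crit):
    """Return (lower, upper) for b ± t_crit * se."""
    margin = t_crit * se
    return b - margin, b + margin


lower, upper = slope_confidence_interval(b=-5.436580588, se=0.336216167, t_crit=2.1788)
print(f"{lower:.3f} <= beta_1 <= {upper:.3f}")

# The interval excludes 0, which matches rejecting H0: beta_1 = 0 at the 5% level.
excludes_zero = not (lower <= 0.0 <= upper)
```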
Contribution of a Single Independent Variable $X_j$
• Let Xj Be the Independent Variable of Interest
• $$SSR(X_j \mid \text{all others except } X_j) = SSR(\text{all}) - SSR(\text{all others except } X_j)$$
– Measures the additional contribution of Xj in explaining the total variation in Y with the inclusion of all the remaining independent variables
Contribution of a Single Independent Variable $X_k$
$$SSR(X_1 \mid X_2 \text{ and } X_3) = SSR(X_1, X_2 \text{ and } X_3) - SSR(X_2 \text{ and } X_3)$$
• $SSR(X_1, X_2 \text{ and } X_3)$: from the ANOVA section of the regression for $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{3i}$
• $SSR(X_2 \text{ and } X_3)$: from the ANOVA section of the regression for $\hat{Y}_i = b_0 + b_2 X_{2i} + b_3 X_{3i}$
Measures the additional contribution of X1 in explaining Y with the inclusion of X2 and X3.
Coefficient of Partial Determination of $X_j$
• $$r^2_{Yj \cdot (\text{all others})} = \frac{SSR(X_j \mid \text{all others})}{SST - SSR(\text{all}) + SSR(X_j \mid \text{all others})}$$
• Measures the proportion of variation in the dependent variable that is explained by Xj while controlling for (holding constant) the other independent variables
Coefficient of Partial Determination for $X_j$ (continued)
Example: Model with two independent variables
$$r^2_{Y1 \cdot 2} = \frac{SSR(X_1 \mid X_2)}{SST - SSR(X_1, X_2) + SSR(X_1 \mid X_2)}$$
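As a worked illustration, the formula can be applied to the heating oil example. This is a sketch under stated assumptions: the function name `partial_r2` is illustrative, and the sums of squares are taken from the ANOVA tables of the two-variable model (SSR = 228014.6263, SST = 236135.2293) and of the insulation-only model (SSR = 51076.47) shown in the partial F test example of this deck.

```python
def partial_r2(ssr_full, ssr_reduced, sst):
    """Coefficient of partial determination for the variable dropped in the reduced model."""
    contribution = ssr_full - ssr_reduced  # SSR(Xj | all others)
    return contribution / (sst - ssr_full + contribution)


# X1 = Temp, X2 = Insulation
r2_y1_2 = partial_r2(ssr_full=228014.6263, ssr_reduced=51076.47, sst=236135.2293)
print(f"r^2(Y1.2) = {r2_y1_2:.4f}")
```

The result is the share of the variation in Oil left unexplained by Insulation that Temp accounts for.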
Venn Diagrams and Coefficient of Partial Determination for $X_j$
[Figure: Venn diagram of Oil, Temp and Insulation, marking the region $SSR(X_1 \mid X_2)$.]
$$r^2_{Y1 \cdot 2} = \frac{SSR(X_1 \mid X_2)}{SST - SSR(X_1, X_2) + SSR(X_1 \mid X_2)}$$
Coefficient of Partial Determination in PHStat
• PHStat | Regression | Multiple Regression …
– Check the “Coefficient of Partial Determination” box
• Excel spreadsheet for the heating oil example
Contribution of a Subset of Independent Variables
• Let Xs Be the Subset of Independent Variables of Interest
– $$SSR(X_s \mid \text{all others except } X_s) = SSR(\text{all}) - SSR(\text{all others except } X_s)$$
– Measures the contribution of the subset Xs in explaining SST with the inclusion of the remaining independent variables
Contribution of a Subset of Independent Variables: Example
Let Xs be X1 and X3
$$SSR(X_1 \text{ and } X_3 \mid X_2) = SSR(X_1, X_2 \text{ and } X_3) - SSR(X_2)$$
• $SSR(X_1, X_2 \text{ and } X_3)$: from the ANOVA section of the regression for $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{3i}$
• $SSR(X_2)$: from the ANOVA section of the regression for $\hat{Y}_i = b_0 + b_2 X_{2i}$
Testing Portions of Model
• Examines the Contribution of a Subset Xs of Explanatory Variables to
the Relationship with Y
• Null Hypothesis:
– Variables in the subset do not improve the model significantly when
all other variables are included
• Alternative Hypothesis:
– At least one variable in the subset is significant when all other
variables are included
Testing Portions of Model
(continued)
• One-Tailed Rejection Region
• Requires Comparison of Two Regressions
– One regression includes everything
– Another regression includes everything except the portion to be
tested
Partial F Test for the Contribution of a Subset of X Variables
• Hypotheses:
– H0 : Variables Xs do not significantly improve the model given all
other variables included
– H1 : Variables Xs significantly improve the model given all others
included
• Test Statistic:
– $$F = \frac{SSR(X_s \mid \text{all others}) / m}{MSE(\text{all})}$$
– with df = m and (n-k-1)
– m = # of variables in the subset Xs
Partial F Test for the Contribution of a Single X j
• Hypotheses:
– H0 : Variable Xj does not significantly improve the model given all
others included
– H1 : Variable Xj significantly improves the model given all others
included
• Test Statistic:
– $$F = \frac{SSR(X_j \mid \text{all others})}{MSE(\text{all})}$$
– with df = 1 and (n-k-1)
– m = 1 here
Testing Portions of Model: Example
Test at the $\alpha = .05$ level to determine if the variable of average temperature significantly improves the model, given that insulation is included.
Testing Portions of Model: Example
$H_0$: X1 (temperature) does not improve model with X2 (insulation) included
$H_1$: X1 does improve model
$\alpha = .05$, df = 1 and 12
Critical Value = 4.75
ANOVA (For X1 and X2)
             SS             MS
Regression   228014.6263    114007.313
Residual     8120.603016    676.716918
Total        236135.2293
ANOVA (For X2)
             SS
Regression   51076.47
Residual     185058.8
Total        236135.2
$$F = \frac{SSR(X_1 \mid X_2)}{MSE(X_1, X_2)} = \frac{228{,}015 - 51{,}076}{676.717} = 261.47$$
Conclusion: Reject H0; X1 does improve model.
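The partial F computation generalizes to any subset of m variables. A minimal sketch follows; `partial_f` is an illustrative name, the sums of squares come from the two ANOVA tables above, and the critical value 4.75 is the one quoted on the slide.

```python
def partial_f(ssr_full, ssr_reduced, mse_full, m):
    """Partial F statistic for m variables added to the reduced model."""
    return ((ssr_full - ssr_reduced) / m) / mse_full


f_temp = partial_f(
    ssr_full=228014.6263,   # regression SS with Temp and Insulation
    ssr_reduced=51076.47,   # regression SS with Insulation only
    mse_full=676.716918,    # MSE of the full model
    m=1,                    # one variable (Temp) is being tested
)

F_CRITICAL = 4.75  # from the slide: alpha = .05, df = 1 and 12
print(f"F = {f_temp:.2f}, reject H0: {f_temp > F_CRITICAL}")
```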
Testing Portions of Model
in PHStat
• PHStat | Regression | Multiple Regression …
– Check the “Coefficient of Partial Determination” box
• Excel spreadsheet for the heating oil example
Do We Need to Do This
for One Variable?
• The F Test for the Contribution of a Single Variable After All Other
Variables are Included in the Model is IDENTICAL to the t Test of the
Slope for that Variable
• The Only Reason to Perform an F Test is to Test Several Variables
Together
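The equivalence can be seen numerically in the heating oil example: the square of the t statistic for Temp matches the partial F statistic for Temp given Insulation. The sketch below uses only figures printed in the Excel outputs; small differences come from rounding in those figures.

```python
# t statistic for Temp: coefficient divided by its standard error
t_temp = -5.436580588 / 0.336216167

# Partial F for Temp given Insulation: [SSR(X1, X2) - SSR(X2)] / MSE(X1, X2)
f_partial = (228014.6263 - 51076.47) / 676.716918

print(f"t^2 = {t_temp ** 2:.3f}")
print(f"F   = {f_partial:.3f}")
```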
Dummy-Variable Models
• Categorical Explanatory Variable with 2 or More Levels
• Yes or No, On or Off, Male or Female
• Use Dummy-Variables (Coded as 0 or 1)
• Only Intercepts are Different
• Assumes Equal Slopes Across Categories
• The Number of Dummy-Variables Needed is (# of Levels - 1)
• Regression Model Has Same Form:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$$
Dummy-Variable Models (with 2 Levels)
Given: $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$
Y = Assessed Value of House
X1 = Square Footage of House
X2 = Desirability of Neighborhood = 0 if undesirable, 1 if desirable
Desirable (X2 = 1): $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 (1) = (b_0 + b_2) + b_1 X_{1i}$
Undesirable (X2 = 0): $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 (0) = b_0 + b_1 X_{1i}$
Same slopes
Dummy-Variable Models (with 2 Levels) (continued)
[Figure: two parallel lines of Y (Assessed Value) against X1 (Square footage). Both lines have the same slope b1; the intercepts are different, b0 + b2 for one line and b0 for the other.]
Interpretation of the Dummy-Variable Coefficient (with 2
Levels)
Example:
$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} = 20 + 5 X_{1i} + 6 X_{2i}$$
Y: Annual salary of college graduate in thousands of dollars
X1: GPA
X2: 0 if non-business degree, 1 if business degree
With the same GPA, college graduates with a business degree are making an estimated 6 thousand dollars more than graduates with a non-business degree, on average.
Dummy-Variable Models (with 3 Levels)
Given:
Y = Assessed Value of the House (thousands of dollars)
X1 = Square Footage of the House
Style of the House = Split-level, Ranch, Condo (3 Levels; Need 2 Dummy Variables)
$$X_2 = \begin{cases} 1 & \text{if Split-level} \\ 0 & \text{if not} \end{cases} \qquad X_3 = \begin{cases} 1 & \text{if Ranch} \\ 0 & \text{if not} \end{cases}$$
$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{3i}$$
Interpretation of the Dummy-Variable Coefficients (with 3 Levels)
Given the Estimated Model:
$$\hat{Y}_i = 20.43 + 0.045 X_{1i} + 18.84 X_{2i} + 23.53 X_{3i}$$
For Split-level ($X_2 = 1$): $\hat{Y}_i = 20.43 + 0.045 X_{1i} + 18.84$
For Ranch ($X_3 = 1$): $\hat{Y}_i = 20.43 + 0.045 X_{1i} + 23.53$
For Condo: $\hat{Y}_i = 20.43 + 0.045 X_{1i}$
With the same footage, a Split-level will have an estimated average assessed value of 18.84 thousand dollars more than a Condo.
With the same footage, a Ranch will have an estimated average assessed value of 23.53 thousand dollars more than a Condo.
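The coding of a 3-level category into two dummy variables can be sketched as follows. The function names and the 2000 square foot house are hypothetical choices for illustration; the coefficients are those of the estimated model, with Condo as the baseline level.

```python
def style_dummies(style):
    """Return (X2, X3): X2 = 1 for Split-level, X3 = 1 for Ranch, both 0 for Condo."""
    return (1 if style == "Split-level" else 0, 1 if style == "Ranch" else 0)


def assessed_value(square_feet, style):
    """Estimated assessed value in thousands of dollars."""
    x2, x3 = style_dummies(style)
    return 20.43 + 0.045 * square_feet + 18.84 * x2 + 23.53 * x3


square_feet = 2000  # hypothetical house size
for style in ("Condo", "Split-level", "Ranch"):
    print(f"{style}: {assessed_value(square_feet, style):.2f}")
```

Whatever footage is chosen, the gap between a style and Condo equals that style's dummy coefficient, because the slope on square footage is shared by all three styles.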
Regression Model Containing
an Interaction Term
• Hypothesizes Interaction between a Pair of X Variables
– Response to one X variable varies at different levels of another X
variable
• Contains a Cross-Product Term
– $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i} + \varepsilon_i$
• Can Be Combined with Other Models
– E.g., Dummy-Variable Model
Effect of Interaction
• Given:
– $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i} + \varepsilon_i$
• Without Interaction Term, Effect of X1 on Y is Measured by $\beta_1$
• With Interaction Term, Effect of X1 on Y is Measured by $\beta_1 + \beta_3 X_2$
• Effect Changes as X2 Changes
Interaction Example
$$Y = 1 + 2X_1 + 3X_2 + 4X_1X_2$$
• With $X_2 = 1$: $Y = 1 + 2X_1 + 3(1) + 4X_1(1) = 4 + 6X_1$
• With $X_2 = 0$: $Y = 1 + 2X_1 + 3(0) + 4X_1(0) = 1 + 2X_1$
[Figure: the two lines plotted as Y (0 to 12) against X1 (0 to 1.5).]
Effect (slope) of X1 on Y depends on X2 value
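The dependence of the slope on X2 can be made concrete with the example equation $Y = 1 + 2X_1 + 3X_2 + 4X_1X_2$. In this sketch the function names `y` and `slope_x1` are illustrative; the slope function is just $\beta_1 + \beta_3 X_2$ with $\beta_1 = 2$ and $\beta_3 = 4$.

```python
def y(x1, x2):
    """The interaction example: Y = 1 + 2*X1 + 3*X2 + 4*X1*X2."""
    return 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2


def slope_x1(x2):
    """Effect of a one-unit increase in X1 on Y at a given X2: 2 + 4*X2."""
    return 2 + 4 * x2


for x2 in (0, 1):
    rise = y(1.5, x2) - y(0.5, x2)  # change in Y over a one-unit step in X1
    print(f"X2 = {x2}: slope = {slope_x1(x2)}, observed rise = {rise}")
```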
Interaction Regression Model Worksheet
Case, i   Yi   X1i   X2i   X1i X2i
1         1    1     3     3
2         4    8     5     40
3         1    3     2     6
4         3    5     6     30
:         :    :     :     :
Multiply X1 by X2 to get X1X2
Run regression with Y, X1, X2, X1X2
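Building the cross-product column is a one-line transformation. The sketch below applies it to the four cases listed on the worksheet; the variable names are illustrative, and the regression step itself is left to whatever tool is used (for example Excel or PHStat, as in the slides).

```python
# (Y, X1, X2) for the four cases shown on the worksheet
cases = [(1, 1, 3), (4, 8, 5), (1, 3, 2), (3, 5, 6)]

# Append the interaction column X1 * X2 to every row.
worksheet = [(y, x1, x2, x1 * x2) for y, x1, x2 in cases]

print("Case  Y  X1  X2  X1X2")
for i, (y, x1, x2, x1x2) in enumerate(worksheet, start=1):
    print(f"{i:<5} {y:<2} {x1:<3} {x2:<3} {x1x2}")
```

The regression is then run with Y as the response and X1, X2 and the new X1X2 column as three explanatory variables.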
Interpretation When There Are 3+ Levels
$$Y = \beta_0 + \beta_1 \text{MALE} + \beta_2 \text{MARRIED} + \beta_3 \text{DIVORCED} + \beta_4 \text{MALE} \cdot \text{MARRIED} + \beta_5 \text{MALE} \cdot \text{DIVORCED}$$
MALE = 0 if female and 1 if male
MARRIED = 1 if married; 0 if not
DIVORCED = 1 if divorced; 0 if not
MALE•MARRIED = 1 if male married; 0 otherwise = (MALE times MARRIED)
MALE•DIVORCED = 1 if male divorced; 0 otherwise = (MALE times DIVORCED)
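Interpretation When There Are 3+ Levels (continued)
$$Y = \beta_0 + \beta_1 \text{MALE} + \beta_2 \text{MARRIED} + \beta_3 \text{DIVORCED} + \beta_4 \text{MALE} \cdot \text{MARRIED} + \beta_5 \text{MALE} \cdot \text{DIVORCED}$$
          SINGLE                MARRIED                                    DIVORCED
FEMALE    $\beta_0$             $\beta_0 + \beta_2$                        $\beta_0 + \beta_3$
MALE      $\beta_0 + \beta_1$   $\beta_0 + \beta_1 + \beta_2 + \beta_4$    $\beta_0 + \beta_1 + \beta_3 + \beta_5$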
Interpreting Results
            FEMALE                MALE                                       Difference
Single:     $\beta_0$             $\beta_0 + \beta_1$                        $\beta_1$
Married:    $\beta_0 + \beta_2$   $\beta_0 + \beta_1 + \beta_2 + \beta_4$    $\beta_1 + \beta_4$
Divorced:   $\beta_0 + \beta_3$   $\beta_0 + \beta_1 + \beta_3 + \beta_5$    $\beta_1 + \beta_5$
Main Effects: MALE, MARRIED and DIVORCED
Interaction Effects: MALE•MARRIED and MALE•DIVORCED
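The table of differences can be verified by plugging dummy values into the model. The coefficient values in this sketch are entirely hypothetical and serve only to show that the male minus female difference is $\beta_1$ for singles, $\beta_1 + \beta_4$ for married and $\beta_1 + \beta_5$ for divorced people.

```python
def expected_y(male, married, divorced, beta):
    """Model value for given dummy settings; beta = (b0, b1, b2, b3, b4, b5)."""
    b0, b1, b2, b3, b4, b5 = beta
    return (b0 + b1 * male + b2 * married + b3 * divorced
            + b4 * male * married + b5 * male * divorced)


beta = (10.0, 2.0, 3.0, 1.0, 0.5, -1.5)  # hypothetical coefficients, for illustration only

statuses = {"Single": (0, 0), "Married": (1, 0), "Divorced": (0, 1)}
differences = {}
for status, (married, divorced) in statuses.items():
    female = expected_y(0, married, divorced, beta)
    male = expected_y(1, married, divorced, beta)
    differences[status] = male - female
    print(f"{status}: female = {female}, male = {male}, difference = {male - female}")
```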
Evaluating the Presence of Interaction with Dummy-Variable
• Suppose X1 and X2 are Numerical Variables and X3 is a Dummy-Variable
• To Test if the Slope of Y with X1 and/or X2 are the Same for the Two Levels of X3
• Model:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{1i} X_{3i} + \beta_5 X_{2i} X_{3i} + \varepsilon_i$$
• Hypotheses:
– $H_0: \beta_4 = \beta_5 = 0$ (No Interaction between X1 and X3 or X2 and X3)
– $H_1: \beta_4$ and/or $\beta_5 \neq 0$ (X1 and/or X2 Interacts with X3)
• Perform a Partial F Test
$$F = \frac{\left[ SSR(X_1, X_2, X_3, X_4, X_5) - SSR(X_1, X_2, X_3) \right] / 2}{MSE(X_1, X_2, X_3, X_4, X_5)}$$
Here $X_4$ and $X_5$ stand for the two interaction terms, $X_1 X_3$ and $X_2 X_3$.
Evaluating the Presence of Interaction with Numerical Variables
• Suppose X1, X2 and X3 are Numerical Variables
• To Test If the Independent Variables Interact with Each Other
• Model:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{1i} X_{2i} + \beta_5 X_{1i} X_{3i} + \beta_6 X_{2i} X_{3i} + \varepsilon_i$$
• Hypotheses:
– $H_0: \beta_4 = \beta_5 = \beta_6 = 0$ (no interaction among X1, X2 and X3)
– $H_1$: at least one of $\beta_4, \beta_5, \beta_6 \neq 0$ (at least one pair of X1, X2, X3 interact with each other)
• Perform a Partial F Test
$$F = \frac{\left[ SSR(X_1, X_2, X_3, X_4, X_5, X_6) - SSR(X_1, X_2, X_3) \right] / 3}{MSE(X_1, X_2, X_3, X_4, X_5, X_6)}$$
Here $X_4$, $X_5$ and $X_6$ stand for the three interaction terms, $X_1 X_2$, $X_1 X_3$ and $X_2 X_3$.