Residual Examination and Influential Data, Session 17

Course: I0174 – Regression Analysis (Analisis Regresi)
Year: Odd Semester 2007/2008
Residual Examination and Influential Data
Session 17
• Lack of Fit
• Uses of Residual Examination
Bina Nusantara
Chapter Topics
• The Multiple Regression Model
• Residual Analysis
• Testing for the Significance of the Regression Model
• Inferences on the Population Regression Coefficients
• Testing Portions of the Multiple Regression Model
The Multiple Regression Model
The relationship between 1 dependent and 2 or more independent variables is a linear function:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$
• $Y_i$: dependent (response) variable
• $X_{1i}, \ldots, X_{ki}$: independent (explanatory) variables
• $\beta_0$: population Y-intercept
• $\beta_1, \ldots, \beta_k$: population slopes
• $\varepsilon_i$: random error
Multiple Regression Model (bivariate model)
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$
$\mu_{Y|X} = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}$
[Figure: response plane over the $X_1$ and $X_2$ axes, showing an observed $Y$ at the point $(X_{1i}, X_{2i})$, the intercept $\beta_0$, and the error $\varepsilon_i$ between the observed $Y$ and the plane.]
Multiple Regression Equation (bivariate model)
$Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + e_i$
$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$
[Figure: fitted response plane over the $X_1$ and $X_2$ axes, showing an observed $Y$ at the point $(X_{1i}, X_{2i})$, the intercept $b_0$, and the residual $e_i$ between the observed $Y$ and the plane.]
Multiple Regression Equation
Too complicated by hand! Ouch!
Interpretation of Estimated Coefficients
• Slope (bj )
– The average value of Y is estimated to change by bj for each 1-unit increase in Xj, holding all other variables constant (ceteris paribus)
– Example: If b1 = -2, then fuel oil usage (Y) is expected to
decrease by an estimated 2 gallons for each 1 degree increase in
temperature (X1), given the inches of insulation (X2)
• Y-Intercept (b0)
– The estimated average value of Y when all Xj = 0
Multiple Regression Model: Example
Develop a model for estimating
heating oil used for a single
family home in the month of
January, based on average
temperature and amount of
insulation in inches.
Oil (Gal)   Temp (°F)   Insulation (in)
275.30      40          3
363.80      27          3
164.30      40          10
40.80       73          6
94.30       64          6
230.90      34          6
366.70      9           6
300.60      8           10
237.80      23          10
121.40      63          3
31.40       65          10
203.50      41          6
441.10      21          3
323.00      38          3
52.50       58          10
Multiple Regression Equation: Example
$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki}$
Excel Output
               Coefficients
Intercept      562.1510092
X Variable 1   -5.436580588
X Variable 2   -20.01232067
$\hat{Y}_i = 562.151 - 5.437 X_{1i} - 20.012 X_{2i}$
• For each degree increase in temperature, the estimated average amount of heating oil used is decreased by 5.437 gallons, holding insulation constant.
• For each increase in one inch of insulation, the estimated average use of heating oil is decreased by 20.012 gallons, holding temperature constant.
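The coefficients in the Excel output can be reproduced without a spreadsheet. The following is a minimal sketch, assuming the 15 observations are transcribed correctly from the data table of this example; the helper names `solve` and `ols` are illustrative and not part of the course material. It uses only the Python standard library and solves the normal equations directly, which is adequate for a small, well-conditioned problem like this one.

```python
# Heating oil data transcribed from the example table (15 homes).
oil = [275.3, 363.8, 164.3, 40.8, 94.3, 230.9, 366.7, 300.6,
       237.8, 121.4, 31.4, 203.5, 441.1, 323.0, 52.5]
temp = [40, 27, 40, 73, 64, 34, 9, 8, 23, 63, 65, 41, 21, 38, 58]
insulation = [3, 3, 10, 6, 6, 6, 6, 10, 10, 3, 10, 6, 3, 3, 10]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(columns, y):
    """Least-squares coefficients [b0, b1, ...] from the normal equations."""
    rows = [[1.0] + list(r) for r in zip(*columns)]  # prepend the intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


b0, b1, b2 = ols([temp, insulation], oil)
print(f"Oil-hat = {b0:.3f} + ({b1:.3f}) Temp + ({b2:.3f}) Insulation")
```

The printed coefficients should agree with the Excel output to the displayed precision.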
Venn Diagrams and Explanatory Power of Regression
[Figure: Venn diagram of two overlapping circles, Oil and Temp. The overlap is the variation in Oil explained by Temp, or equivalently the variation in Temp used in explaining variation in Oil (SSR). The rest of the Oil circle is the variation in Oil attributed to the error term (SSE). The rest of the Temp circle is variation in Temp not used in explaining variation in Oil.]
Venn Diagrams and Explanatory Power of Regression (continued)
$r^2 = \dfrac{SSR}{SSR + SSE}$
[Figure: Venn diagram of Oil and Temp; the overlap is SSR.]
Venn Diagrams and Explanatory Power of Regression
[Figure: Venn diagram of three overlapping circles, Oil, Temp and Insulation. The overlapping variation in both Temp and Insulation is used in explaining the variation in Oil but NOT in the estimation of $\beta_1$ nor $\beta_2$. The variation in Oil NOT explained by Temp nor Insulation is SSE.]
Coefficient of
Multiple Determination
• Proportion of Total Variation in Y Explained by All X Variables
Taken Together
– $r^2_{Y \cdot 12 \cdots k} = \dfrac{SSR}{SST} = \dfrac{\text{Explained Variation}}{\text{Total Variation}}$
• Never Decreases When a New X Variable is Added to Model
– Disadvantage when comparing among models
Venn Diagrams and Explanatory Power of Regression
$r^2_{Y \cdot 12} = \dfrac{SSR}{SSR + SSE}$
[Figure: Venn diagram of Oil, Temp and Insulation; the part of Oil overlapped by Temp or Insulation is SSR.]
Adjusted Coefficient of Multiple Determination
• Proportion of Variation in Y Explained by All the X Variables Adjusted for the Sample Size and the Number of X Variables Used
– $r^2_{adj} = 1 - \left[\left(1 - r^2_{Y \cdot 12 \cdots k}\right)\dfrac{n-1}{n-k-1}\right]$
– Penalizes excessive use of independent variables
– Smaller than $r^2_{Y \cdot 12 \cdots k}$
– Useful in comparing among models
– Can decrease if an insignificant new X variable is added to the model
Coefficient of Multiple Determination
Excel Output
Regression Statistics
Multiple R          0.982654757
R Square            0.965610371
Adjusted R Square   0.959878766
Standard Error      26.01378323
Observations        15
R Square is $r^2_{Y \cdot 12} = \dfrac{SSR}{SST}$
Adjusted $r^2$:
• reflects the number of explanatory variables and sample size
• is smaller than $r^2$
Interpretation of Coefficient of Multiple
Determination
• $r^2_{Y \cdot 12} = \dfrac{SSR}{SST} = .9656$
– 96.56% of the total variation in heating oil can be explained by temperature and amount of insulation
• $r^2_{adj} = .9599$
– 95.99% of the total fluctuation in heating oil can be explained by temperature and amount of insulation after adjusting for the number of explanatory variables and sample size
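As a worked check, both figures can be recovered by hand from the sums of squares in the ANOVA output of this example (SSR = 228014.6, SST = 236135.2) with n = 15 observations and k = 2 explanatory variables. The intermediate values are rounded to four decimals, so this is a sketch of the arithmetic and not a replacement for the Excel output.

```latex
\begin{align*}
r^2_{Y \cdot 12} &= \frac{SSR}{SST} = \frac{228014.6}{236135.2} \approx 0.9656 \\
r^2_{adj} &= 1 - \left(1 - r^2_{Y \cdot 12}\right)\frac{n-1}{n-k-1}
           = 1 - (1 - 0.9656)\,\frac{15-1}{15-2-1} \approx 0.9599
\end{align*}
```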
Simple and Multiple Regression Compared
• The slope coefficient in a simple regression picks up the impact of the
independent variable plus the impacts of other variables that are
excluded from the model, but are correlated with the included
independent variable and the dependent variable
• Coefficients in a multiple regression net out the impacts of other
variables in the equation
– Hence, they are called the net regression coefficients
– They still pick up the effects of other variables that are excluded
from the model, but are correlated with the included independent
variables and the dependent variable
Simple and Multiple Regression Compared: Example
• Two Simple Regressions:
– $Oil = \beta_0 + \beta_1\,Temp + \varepsilon$
– $Oil = \beta_0 + \beta_2\,Insulation + \varepsilon$
• Multiple Regression:
– $Oil = \beta_0 + \beta_1\,Temp + \beta_2\,Insulation + \varepsilon$
Simple and Multiple Regression Compared: Slope Coefficients
$Oil = b_0 + b_1\,Temp + b_2\,Insulation + e$
             Coefficients
Intercept    562.1510092
Temp         -5.436580588
Insulation   -20.01232067
$Oil = b_0 + b_1\,Temp + e$
             Coefficients
Intercept    436.4382299
Temp         -5.462207697
$Oil = b_0 + b_2\,Insulation + e$
             Coefficients
Intercept    345.3783784
Insulation   -20.35027027
The slopes differ between the multiple and the simple regressions: $-5.4366 \ne -5.4622$ for Temp and $-20.0123 \ne -20.3503$ for Insulation.
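The two simple regressions can be checked with the closed-form slope $S_{xy}/S_{xx}$. This is a small sketch using only the standard library; it assumes the data are transcribed correctly from the example table, and `simple_fit` is an illustrative helper name. The multiple-regression slopes are copied from the Excel output for comparison, not recomputed here.

```python
oil = [275.3, 363.8, 164.3, 40.8, 94.3, 230.9, 366.7, 300.6,
       237.8, 121.4, 31.4, 203.5, 441.1, 323.0, 52.5]
temp = [40, 27, 40, 73, 64, 34, 9, 8, 23, 63, 65, 41, 21, 38, 58]
insulation = [3, 3, 10, 6, 6, 6, 6, 10, 10, 3, 10, 6, 3, 3, 10]


def simple_fit(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


a_temp, b_temp = simple_fit(temp, oil)
a_ins, b_ins = simple_fit(insulation, oil)

# Net regression coefficients from the multiple regression (Excel output).
b1_multiple, b2_multiple = -5.436580588, -20.01232067

print(f"Temp slope:       simple {b_temp:.4f} vs multiple {b1_multiple:.4f}")
print(f"Insulation slope: simple {b_ins:.4f} vs multiple {b2_multiple:.4f}")
```

The gap between each pair of slopes is small in this data set, which suggests that temperature and insulation are only weakly correlated with each other here.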
Simple and Multiple Regression Compared: $r^2$
$Oil = \beta_0 + \beta_1\,Temp + \beta_2\,Insulation + \varepsilon$
Regression Statistics
Multiple R          0.982654757
R Square            0.965610371
Adjusted R Square   0.959878766
Standard Error      26.01378323
Observations        15
$Oil = \beta_0 + \beta_1\,Temp + \varepsilon$
Regression Statistics
Multiple R          0.86974117
R Square            0.756449704
Adjusted R Square   0.737715065
Standard Error      66.51246564
Observations        15
$Oil = \beta_0 + \beta_2\,Insulation + \varepsilon$
Regression Statistics
Multiple R          0.465082527
R Square            0.216301757
Adjusted R Square   0.156017277
Standard Error      119.3117327
Observations        15
The $r^2$ of the multiple regression is not the sum of the two simple $r^2$ values: $0.96561 \ne 0.75645 + 0.21630 = 0.97275$
Example: Adjusted $r^2$ Can Decrease
$Oil = \beta_0 + \beta_1\,Temp + \beta_2\,Insulation + \varepsilon$
Regression Statistics
Multiple R          0.982654757
R Square            0.965610371
Adjusted R Square   0.959878766
Standard Error      26.01378323
Observations        15
$Oil = \beta_0 + \beta_1\,Temp + \beta_2\,Insulation + \beta_3\,Color + \varepsilon$
Regression Statistics
Multiple R          0.983482856
R Square            0.967238528
Adjusted R Square   0.958303581
Standard Error      25.72417272
Observations        15
Adjusted $r^2$ decreases when k increases from 2 to 3.
Color is not useful in explaining the variation in oil consumption.
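The drop in adjusted $r^2$ follows directly from the adjustment formula. A short sketch, using the two R Square values from the Excel outputs of this example with n = 15; the function name `adjusted_r2` is illustrative:

```python
def adjusted_r2(r2, n, k):
    """Adjusted r^2 for n observations and k explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


n = 15
adj_two = adjusted_r2(0.965610371, n, k=2)    # Temp, Insulation
adj_three = adjusted_r2(0.967238528, n, k=3)  # Temp, Insulation, Color

print(f"k=2: adjusted r2 = {adj_two:.6f}")
print(f"k=3: adjusted r2 = {adj_three:.6f}")
print("Adjusted r2 decreased:", adj_three < adj_two)
```

Plain $r^2$ rises slightly when Color is added, but the penalty factor grows from 14/12 to 14/11, which outweighs the gain.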
Using the Regression Equation to Make Predictions
Predict the amount of heating oil used for a home if the average temperature is 30 °F and the insulation is 6 inches.
$\hat{Y}_i = 562.151 - 5.437 X_{1i} - 20.012 X_{2i}$
$= 562.151 - 5.437(30) - 20.012(6)$
$= 278.969$
The predicted heating oil used is 278.97 gallons.
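The same prediction as a small function, using the rounded coefficients from the fitted equation. The name `predict_oil` is illustrative.

```python
def predict_oil(temp_f, insulation_in):
    """Predicted January heating oil (gallons) from the fitted equation."""
    return 562.151 - 5.437 * temp_f - 20.012 * insulation_in


prediction = predict_oil(temp_f=30, insulation_in=6)
print(f"Predicted heating oil: {prediction:.2f} gallons")
```

Predictions of this kind are only reasonable for temperature and insulation values within the range of the observed data.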
Predictions in PHStat
• PHStat | Regression | Multiple Regression …
– Check the “Confidence and Prediction Interval Estimate” box
• Excel spreadsheet for the heating oil example
Residual Plots
• Residuals vs. $\hat{Y}$
– May need to transform Y variable
• Residuals vs. $X_1$
– May need to transform $X_1$ variable
• Residuals vs. $X_2$
– May need to transform $X_2$ variable
• Residuals vs. Time
– May have autocorrelation
Residual Plots: Example
[Figure: Temperature Residual Plot, residuals plotted against temperature. Annotation: maybe some nonlinear relationship.]
[Figure: Insulation Residual Plot, residuals plotted against insulation. Annotation: no discernible pattern.]
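The residuals behind these plots can be tabulated directly. This is a minimal sketch, assuming the data are transcribed correctly from the example table and using the full-precision coefficients from the Excel output; no plotting library is used, so the values are printed in order of temperature for inspection.

```python
oil = [275.3, 363.8, 164.3, 40.8, 94.3, 230.9, 366.7, 300.6,
       237.8, 121.4, 31.4, 203.5, 441.1, 323.0, 52.5]
temp = [40, 27, 40, 73, 64, 34, 9, 8, 23, 63, 65, 41, 21, 38, 58]
insulation = [3, 3, 10, 6, 6, 6, 6, 10, 10, 3, 10, 6, 3, 3, 10]

# Coefficients copied from the Excel output of the multiple regression.
b0, b1, b2 = 562.1510092, -5.436580588, -20.01232067

# Residual e_i = observed Y_i minus fitted Y-hat_i.
residuals = [y - (b0 + b1 * t + b2 * ins)
             for y, t, ins in zip(oil, temp, insulation)]

print("Temp  Insulation  Residual")
for t, ins, e in sorted(zip(temp, insulation, residuals)):
    print(f"{t:4d}  {ins:10d}  {e:8.2f}")

# Least-squares residuals sum to zero; their squared sum is SSE.
residual_sum = sum(residuals)
sse = sum(e ** 2 for e in residuals)
print(f"Sum of residuals = {residual_sum:.6f}, SSE = {sse:.3f}")
```

Scanning the residual column against temperature is a rough substitute for the temperature residual plot: a run of same-signed residuals at the low or high end would hint at a nonlinear relationship.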
Testing for Overall Significance
• Shows if Y Depends Linearly on All of the X Variables Together as a
Group
• Use F Test Statistic
• Hypotheses:
– $H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$ (No linear relationship)
– $H_1$: At least one $\beta_i \ne 0$ (At least one independent variable affects Y)
• The Null Hypothesis is a Very Strong Statement
• The Null Hypothesis is Almost Always Rejected
Testing for Overall Significance (continued)
• Test Statistic:
– $F = \dfrac{MSR}{MSE} = \dfrac{SSR(\text{all})/k}{MSE(\text{all})}$
• Where F has k numerator and (n-k-1) denominator degrees of freedom
Test for Overall Significance
Excel Output: Example
ANOVA
             df   SS         MS         F          Significance F
Regression   2    228014.6   114007.3   168.4712   1.65411E-09
Residual     12   8120.603   676.7169
Total        14   236135.2
• Regression df = k = 2, the number of explanatory variables
• Total df = n-1
• F = MSR / MSE is the F Test Statistic
• Significance F is the p-value
Test for Overall Significance: Example Solution
$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$
$H_1$: At least one $\beta_j \ne 0$
$\alpha = .05$
df = 2 and 12
Critical Value: 3.89
Test Statistic: F = 168.47 (Excel Output)
Decision: Reject $H_0$ at $\alpha = 0.05$.
Conclusion: There is evidence that at least one independent variable affects Y.
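The F statistic can be rebuilt from the sums of squares in the ANOVA table. A short sketch; the critical value 3.89 is the one quoted in this example and is not recomputed here.

```python
n, k = 15, 2                         # observations, explanatory variables
ssr, sse = 228014.6263, 8120.603016  # from the ANOVA table

msr = ssr / k                        # mean square regression
mse = sse / (n - k - 1)              # mean square error
f_stat = msr / mse

f_critical = 3.89                    # F(2, 12) at alpha = 0.05, as quoted
reject_h0 = f_stat > f_critical

print(f"MSR = {msr:.3f}, MSE = {mse:.4f}, F = {f_stat:.4f}")
print("Reject H0:", reject_h0)
```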
Test for Significance:
Individual Variables
• Show If Y Depends Linearly on a Single Xj Individually While Holding
the Effects of Other X’s Fixed
• Use t Test Statistic
• Hypotheses:
– $H_0: \beta_j = 0$ (No linear relationship)
– $H_1: \beta_j \ne 0$ (Linear relationship between Xj and Y)
t Test Statistic
Excel Output: Example
             Coefficients    Standard Error   t Stat      P-value
Intercept    562.1510092     21.09310433      26.65094    4.77868E-12
Temp         -5.436580588    0.336216167      -16.1699    1.64178E-09
Insulation   -20.01232067    2.342505227      -8.543127   1.90731E-06
$t = \dfrac{b_i}{S_{b_i}}$
• The t Stat in the Temp row is the t Test Statistic for $X_1$ (Temperature)
• The t Stat in the Insulation row is the t Test Statistic for $X_2$ (Insulation)
t Test: Example Solution
Does temperature have a significant effect on monthly consumption of heating oil? Test at $\alpha = 0.05$.
$H_0: \beta_1 = 0$
$H_1: \beta_1 \ne 0$
df = 12
Critical Values: $\pm 2.1788$ (.025 in each tail; reject $H_0$ if t < -2.1788 or t > 2.1788)
Test Statistic: t = -16.1699
Decision: Reject $H_0$ at $\alpha = 0.05$.
Conclusion: There is evidence of a significant effect of temperature on oil consumption holding constant the effect of insulation.
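Each t statistic in the Excel output is the coefficient divided by its standard error. The sketch below recomputes both slope tests from the reported values and compares them against the two-tailed critical value 2.1788 quoted in this example; the dictionary layout is only illustrative.

```python
# (coefficient, standard error) pairs copied from the Excel output.
estimates = {
    "Temp": (-5.436580588, 0.336216167),
    "Insulation": (-20.01232067, 2.342505227),
}
t_critical = 2.1788  # two-tailed, alpha = 0.05, df = 12, as quoted

t_stats = {name: b / se for name, (b, se) in estimates.items()}

for name, t in t_stats.items():
    decision = "reject H0" if abs(t) > t_critical else "do not reject H0"
    print(f"{name}: t = {t:.4f} -> {decision}")
```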
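Venn Diagrams and Estimation of Regression Model
[Figure: Venn diagram of three overlapping circles, Oil, Temp and Insulation. One region of the Oil and Temp overlap is labeled "Only this information is used in the estimation of $\beta_1$". One region of the Oil and Insulation overlap is labeled "Only this information is used in the estimation of $\beta_2$". A third region is labeled "This information is NOT used in the estimation of $\beta_1$ nor $\beta_2$".]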
Confidence Interval Estimate for the Slope
Provide the 95% confidence interval for the population slope $\beta_1$ (the effect of temperature on oil consumption).
$b_1 \pm t_{n-k-1} S_{b_1}$
             Coefficients   Lower 95%       Upper 95%
Intercept    562.151009     516.1930837     608.108935
Temp         -5.4365806     -6.169132673    -4.7040285
Insulation   -20.012321     -25.11620102    -14.90844
$-6.169 \le \beta_1 \le -4.704$
We are 95% confident that the average consumption of oil is reduced by between 4.70 and 6.17 gallons for each increase of 1 °F, holding insulation constant.
We can also perform the test for the significance of individual variables, $H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \ne 0$, using this confidence interval: the interval does not contain 0, so $H_0$ is rejected at $\alpha = 0.05$.
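The interval for the temperature slope can be reproduced from the coefficient, its standard error, and the critical t value. A sketch under the assumption that the rounded critical value 2.1788 (df = 12) from the t test example is precise enough; Excel uses an unrounded value, so the last digits differ slightly.

```python
b1 = -5.436580588     # slope for Temp (Excel output)
s_b1 = 0.336216167    # standard error of b1 (Excel output)
t_crit = 2.1788       # t with n-k-1 = 12 df, 0.025 in each tail

half_width = t_crit * s_b1
lower, upper = b1 - half_width, b1 + half_width

print(f"95% CI for beta_1: [{lower:.3f}, {upper:.3f}]")
print("Interval excludes 0:", not (lower <= 0 <= upper))
```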
Contribution of a Single Independent Variable $X_j$
• Let Xj Be the Independent Variable of Interest
• $SSR(X_j \mid \text{all others except } X_j) = SSR(\text{all}) - SSR(\text{all others except } X_j)$
– Measures the additional contribution of Xj in explaining the total variation in Y with the inclusion of all the remaining independent variables
Contribution of a Single Independent Variable $X_k$
$SSR(X_1 \mid X_2 \text{ and } X_3) = SSR(X_1, X_2 \text{ and } X_3) - SSR(X_2 \text{ and } X_3)$
• $SSR(X_1, X_2 \text{ and } X_3)$: from ANOVA section of regression for $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{3i}$
• $SSR(X_2 \text{ and } X_3)$: from ANOVA section of regression for $\hat{Y}_i = b_0 + b_2 X_{2i} + b_3 X_{3i}$
Measures the additional contribution of X1 in explaining Y with the inclusion of X2 and X3.
Coefficient of Partial Determination of $X_j$
• $r^2_{Yj \cdot \text{all others}} = \dfrac{SSR(X_j \mid \text{all others})}{SST - SSR(\text{all}) + SSR(X_j \mid \text{all others})}$
• Measures the proportion of variation in the dependent variable that is explained by Xj while controlling for (holding constant) the other independent variables
Coefficient of Partial Determination for $X_j$ (continued)
Example: Model with two independent variables
$r^2_{Y1 \cdot 2} = \dfrac{SSR(X_1 \mid X_2)}{SST - SSR(X_1, X_2) + SSR(X_1 \mid X_2)}$
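For the heating oil example, this coefficient can be evaluated from sums of squares that appear in the ANOVA outputs of this deck: SST and SSR for the model with both variables, and SSR for the model with insulation only. The following is a sketch of the arithmetic; the variable names are illustrative, and the result is not quoted anywhere in the slides.

```python
sst = 236135.2293       # total sum of squares
ssr_full = 228014.6263  # SSR(X1, X2): Temp and Insulation
ssr_x2 = 51076.47       # SSR(X2): Insulation only

# Additional contribution of Temp once Insulation is in the model.
ssr_x1_given_x2 = ssr_full - ssr_x2

r2_partial = ssr_x1_given_x2 / (sst - ssr_full + ssr_x1_given_x2)
print(f"SSR(X1 | X2) = {ssr_x1_given_x2:.2f}")
print(f"r2 of Y with X1, holding X2 constant = {r2_partial:.4f}")
```

The denominator simplifies to SST - SSR(X2), the variation left unexplained by insulation alone, so the coefficient reads as the share of that leftover variation that temperature accounts for.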
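Venn Diagrams and Coefficient of Partial Determination for $X_j$
$r^2_{Y1 \cdot 2} = \dfrac{SSR(X_1 \mid X_2)}{SST - SSR(X_1, X_2) + SSR(X_1 \mid X_2)}$
[Figure: Venn diagram of Oil, Temp and Insulation, with the region corresponding to $SSR(X_1 \mid X_2)$ marked and the ratio shown as one region divided by another.]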
Coefficient of Partial Determination in PHStat
• PHStat | Regression | Multiple Regression …
– Check the “Coefficient of Partial Determination” box
• Excel spreadsheet for the heating oil example
Contribution of a Subset of Independent Variables
• Let Xs Be the Subset of Independent Variables of Interest
– $SSR(X_s \mid \text{all others except } X_s) = SSR(\text{all}) - SSR(\text{all others except } X_s)$
– Measures the contribution of the subset Xs in explaining SST with the inclusion of the remaining independent variables
Contribution of a Subset of Independent Variables: Example
Let Xs be X1 and X3
$SSR(X_1 \text{ and } X_3 \mid X_2) = SSR(X_1, X_2 \text{ and } X_3) - SSR(X_2)$
• $SSR(X_1, X_2 \text{ and } X_3)$: from ANOVA section of regression for $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{3i}$
• $SSR(X_2)$: from ANOVA section of regression for $\hat{Y}_i = b_0 + b_2 X_{2i}$
Testing Portions of Model
• Examines the Contribution of a Subset Xs of Explanatory Variables to
the Relationship with Y
• Null Hypothesis:
– Variables in the subset do not improve the model significantly
when all other variables are included
• Alternative Hypothesis:
– At least one variable in the subset is significant when all other
variables are included
Testing Portions of Model
(continued)
• One-Tailed Rejection Region
• Requires Comparison of Two Regressions
– One regression includes everything
– Another regression includes everything except the
portion to be tested
Partial F Test for the Contribution of a Subset of X Variables
• Hypotheses:
– H0 : Variables Xs do not significantly improve the model given all other
variables included
– H1 : Variables Xs significantly improve the model given all others included
• Test Statistic:
– $F = \dfrac{SSR(X_s \mid \text{all others})/m}{MSE(\text{all})}$
– with df = m and (n-k-1)
– m = # of variables in the subset Xs
Partial F Test for the Contribution of a Single X j
• Hypotheses:
– H0 : Variable Xj does not significantly improve the model given
all others included
– H1 : Variable Xj significantly improves the model given all
others included
• Test Statistic:
– $F = \dfrac{SSR(X_j \mid \text{all others})}{MSE(\text{all})}$
– with df = 1 and (n-k-1)
– m = 1 here
Testing Portions of Model: Example
Test at the $\alpha = .05$ level to determine if the variable of average temperature significantly improves the model, given that insulation is included.
Testing Portions of Model: Example
H0: X1 (temperature) does not improve model with X2 (insulation) included
H1: X1 does improve model
$\alpha = .05$, df = 1 and 12
Critical Value = 4.75
ANOVA (For X1 and X2)
             SS            MS
Regression   228014.6263   114007.313
Residual     8120.603016   676.716918
Total        236135.2293
ANOVA (For X2)
             SS
Regression   51076.47
Residual     185058.8
Total        236135.2
$F = \dfrac{SSR(X_1 \mid X_2)}{MSE(X_1, X_2)} = \dfrac{228{,}015 - 51{,}076}{676.717} = 261.47$
Conclusion: Reject H0; X1 does improve model.
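The partial F statistic uses only three numbers from the two ANOVA tables. A brief sketch of that arithmetic, with the critical value 4.75 taken as quoted in this example:

```python
ssr_full = 228014.6263  # SSR(X1, X2), from the ANOVA for X1 and X2
ssr_x2 = 51076.47       # SSR(X2), from the ANOVA for X2 only
mse_full = 676.716918   # MSE(X1, X2)
m = 1                   # number of variables being tested (Temp)

f_partial = (ssr_full - ssr_x2) / m / mse_full
f_critical = 4.75       # F(1, 12) at alpha = 0.05, as quoted

print(f"Partial F = {f_partial:.2f}")
print("Reject H0:", f_partial > f_critical)
```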
Do We Need to Do This
for One Variable?
• The F Test for the Contribution of a Single Variable After All Other
Variables are Included in the Model is IDENTICAL to the t Test of the
Slope for that Variable
• The Only Reason to Perform an F Test is to Test Several Variables
Together
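The equivalence can be checked numerically on the heating oil data: the partial F for temperature should equal the square of its t statistic. The sketch below assumes the data are transcribed correctly from the example table and uses the closed-form two-regressor least-squares formulas in deviation form; `centered_cross` is an illustrative helper name.

```python
import math

oil = [275.3, 363.8, 164.3, 40.8, 94.3, 230.9, 366.7, 300.6,
       237.8, 121.4, 31.4, 203.5, 441.1, 323.0, 52.5]
temp = [40, 27, 40, 73, 64, 34, 9, 8, 23, 63, 65, 41, 21, 38, 58]
insulation = [3, 3, 10, 6, 6, 6, 6, 10, 10, 3, 10, 6, 3, 3, 10]


def centered_cross(u, v):
    """Sum of cross-products of deviations from the means."""
    n = len(u)
    ubar, vbar = sum(u) / n, sum(v) / n
    return sum((a - ubar) * (b - vbar) for a, b in zip(u, v))


n = len(oil)
s11 = centered_cross(temp, temp)
s22 = centered_cross(insulation, insulation)
s12 = centered_cross(temp, insulation)
s1y = centered_cross(temp, oil)
s2y = centered_cross(insulation, oil)
syy = centered_cross(oil, oil)               # SST

# Slopes of the model with both Temp and Insulation.
det = s11 * s22 - s12 ** 2
b1 = (s22 * s1y - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det

sse_full = syy - b1 * s1y - b2 * s2y         # SSE(X1, X2)
mse_full = sse_full / (n - 2 - 1)
sse_reduced = syy - s2y ** 2 / s22           # SSE of the Insulation-only model

# Partial F for Temp: drop in SSE equals SSR(X1 | X2).
f_partial = (sse_reduced - sse_full) / mse_full

# t statistic for Temp: b1 over its standard error.
s_b1 = math.sqrt(mse_full * s22 / det)
t_stat = b1 / s_b1

print(f"partial F = {f_partial:.4f}, t = {t_stat:.4f}, t^2 = {t_stat ** 2:.4f}")
```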