Regression and Analysis of Variance (Session 21)

Course: I0174 – Analisis Regresi (Regression Analysis)
Year: Odd semester, 2007/2008
Regression and Analysis of Variance
• One-way analysis of variance model
• Regression approach to one-way classification
Bina Nusantara
The Multiple Regression Model
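The relationship between 1 dependent and 2 or more independent variables is a linear function:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$
• $Y_i$: dependent (response) variable
• $X_{1i}, \ldots, X_{ki}$: independent (explanatory) variables
• $\beta_0$: population Y-intercept
• $\beta_1, \ldots, \beta_k$: population slopes
• $\varepsilon_i$: random error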
Multiple Regression Model (bivariate model)
[Figure: response plane over the X1 and X2 axes, with an observed Y at (X1i, X2i), the intercept $\beta_0$ and the error $\varepsilon_i$ marked.]
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$
$\mu_{Y|X} = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}$
Multiple Regression Equation (bivariate model)
[Figure: fitted response plane over the X1 and X2 axes, with an observed Y at (X1i, X2i), the intercept $b_0$ and the residual $e_i$ marked.]
$Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + e_i$
$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$
Multiple Regression Equation
Too complicated to compute by hand! Ouch!
Interpretation of Estimated Coefficients
• Slope ($b_j$)
– The average value of Y is estimated to change by $b_j$ for each 1-unit increase in $X_j$, holding all other variables constant (ceteris paribus)
– Example: If $b_1 = -2$, then fuel oil usage (Y) is expected to decrease by an estimated 2 gallons for each 1-degree increase in temperature ($X_1$), given the inches of insulation ($X_2$)
• Y-Intercept ($b_0$)
– The estimated average value of Y when all $X_j = 0$
Multiple Regression Model: Example
Develop a model for estimating heating oil used for a single family home in the month of January, based on average temperature and amount of insulation in inches.

Oil (gal)   Temp (°F)   Insulation (in)
275.30      40          3
363.80      27          3
164.30      40          10
40.80       73          6
94.30       64          6
230.90      34          6
366.70      9           6
300.60      8           10
237.80      23          10
121.40      63          3
31.40       65          10
203.50      41          6
441.10      21          3
323.00      38          3
52.50       58          10
Multiple Regression Equation: Example
$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki}$

Excel Output
                Coefficients
Intercept       562.1510092
X Variable 1    -5.436580588
X Variable 2    -20.01232067

$\hat{Y}_i = 562.151 - 5.437 X_{1i} - 20.012 X_{2i}$
• For each degree increase in temperature, the estimated average amount of heating oil used is decreased by 5.437 gallons, holding insulation constant.
• For each increase of one inch of insulation, the estimated average use of heating oil is decreased by 20.012 gallons, holding temperature constant.
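The slides obtain the coefficients from Excel. As a rough cross-check, the sketch below fits the same two-variable model by solving the normal equations in plain Python. The helper names `solve` and `fit_ols` are illustrative and not part of the course material, and Excel may use a different numerical method internally.

```python
# Heating-oil data from the example table:
# (oil in gallons, average temperature in °F, insulation in inches).
DATA = [
    (275.30, 40, 3), (363.80, 27, 3), (164.30, 40, 10), (40.80, 73, 6),
    (94.30, 64, 6), (230.90, 34, 6), (366.70, 9, 6), (300.60, 8, 10),
    (237.80, 23, 10), (121.40, 63, 3), (31.40, 65, 10), (203.50, 41, 6),
    (441.10, 21, 3), (323.00, 38, 3), (52.50, 58, 10),
]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(rows):
    """Return [b0, b1, ..., bk] from the normal equations (X'X) b = X'y."""
    y = [r[0] for r in rows]
    design = [[1.0] + [float(v) for v in r[1:]] for r in rows]  # leading 1.0 is the intercept column
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


b0, b1, b2 = fit_ols(DATA)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```

The printed values should agree with the Excel coefficients up to rounding. For larger or ill-conditioned problems a library least-squares routine would be a safer choice than hand-rolled normal equations.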
Multiple Regression in PHStat
• PHStat | Regression | Multiple Regression …
• Excel spreadsheet for the heating oil example
Venn Diagrams and Explanatory Power of Regression
[Figure: Venn diagram of two overlapping circles, Oil and Temp.]
• Oil outside the overlap: variations in Oil explained by the error term (SSE)
• Overlap: variations in Oil explained by Temp, or variations in Temp used in explaining variation in Oil (SSR)
• Temp outside the overlap: variations in Temp not used in explaining variation in Oil
Venn Diagrams and Explanatory Power of Regression (continued)
[Figure: Venn diagram of Oil and Temp.]
$r^2 = \frac{SSR}{SSR + SSE}$
Venn Diagrams and Explanatory Power of Regression
[Figure: Venn diagram of three overlapping circles, Oil, Temp and Insulation.]
• Overlapping variation in both Temp and Insulation is used in explaining the variation in Oil but NOT in the estimation of $\beta_1$ nor $\beta_2$
• Variation in Oil NOT explained by Temp nor Insulation (SSE)
Coefficient of Multiple Determination
• Proportion of Total Variation in Y Explained by All X Variables Taken Together
– $r^2_{Y \cdot 12 \cdots k} = \frac{SSR}{SST} = \frac{\text{Explained Variation}}{\text{Total Variation}}$
• Never Decreases When a New X Variable is Added to Model
– Disadvantage when comparing among models
Venn Diagrams and Explanatory Power of Regression
[Figure: Venn diagram of Oil, Temp and Insulation.]
$r^2_{Y \cdot 12} = \frac{SSR}{SSR + SSE}$
Adjusted Coefficient of Multiple Determination
• Proportion of Variation in Y Explained by All the X Variables Adjusted for the Sample Size and the Number of X Variables Used
– $r^2_{adj} = 1 - \left(1 - r^2_{Y \cdot 12 \cdots k}\right) \frac{n - 1}{n - k - 1}$
– Penalizes excessive use of independent variables
– Smaller than $r^2_{Y \cdot 12 \cdots k}$
– Useful in comparing among models
– Can decrease if an insignificant new X variable is added to the model
Example: Adjusted $r^2$ Can Decrease

$\text{Oil} = \beta_0 + \beta_1 \text{Temp} + \beta_2 \text{Insulation} + \varepsilon$
Regression Statistics
Multiple R           0.982654757
R Square             0.965610371
Adjusted R Square    0.959878766
Standard Error       26.01378323
Observations         15

$\text{Oil} = \beta_0 + \beta_1 \text{Temp} + \beta_2 \text{Insulation} + \beta_3 \text{Color} + \varepsilon$
Regression Statistics
Multiple R           0.983482856
R Square             0.967238528
Adjusted R Square    0.958303581
Standard Error       25.72417272
Observations         15

Adjusted $r^2$ decreases when k increases from 2 to 3.
Color is not useful in explaining the variation in oil consumption.
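The drop in adjusted $r^2$ can be reproduced directly from the formula. The sketch below plugs the two R Square values and n = 15 from the Excel outputs into a small helper. The function name `adjusted_r2` is illustrative only.

```python
def adjusted_r2(r2, n, k):
    """Adjusted coefficient of multiple determination for n observations and k X variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# R Square values and n = 15 are copied from the two Excel outputs in this example.
adj_two = adjusted_r2(0.965610371, n=15, k=2)    # Temp + Insulation
adj_three = adjusted_r2(0.967238528, n=15, k=3)  # Temp + Insulation + Color
print(round(adj_two, 6), round(adj_three, 6))
```

Although R Square rises slightly when Color is added, the penalty factor (n - 1) / (n - k - 1) grows from 14/12 to 14/11, which is enough to pull the adjusted value down.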
Using the Regression Equation to Make Predictions
Predict the amount of heating oil used for a home if the average temperature is 30°F and the insulation is 6 inches.
$\hat{Y}_i = 562.151 - 5.437 X_{1i} - 20.012 X_{2i}$
$= 562.151 - 5.437(30) - 20.012(6)$
$= 278.969$
The predicted heating oil used is 278.97 gallons.
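The same prediction can be written as a small function. This is a minimal sketch using the rounded coefficients from the fitted equation. The name `predict_oil` and its parameter names are assumptions made for illustration.

```python
def predict_oil(temp_f, insulation_in, b0=562.151, b1=-5.437, b2=-20.012):
    """Predicted January heating oil (gallons) from the rounded fitted equation."""
    return b0 + b1 * temp_f + b2 * insulation_in


y_hat = predict_oil(30, 6)  # 30 °F average temperature, 6 inches of insulation
print(f"{y_hat:.2f} gallons")
```

Predictions of this kind are only as reliable as the range of the data: the sample covers temperatures from 8 to 73 °F and insulation of 3 to 10 inches.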
Testing for Overall Significance
(continued)
• Test Statistic:
– $F = \frac{MSR}{MSE} = \frac{SSR(\text{all}) / k}{MSE(\text{all})}$
• Where F has k numerator and (n-k-1) denominator degrees of freedom
Test for Overall Significance
Excel Output: Example
ANOVA
              df    SS          MS          F           Significance F
Regression    2     228014.6    114007.3    168.4712    1.65411E-09
Residual      12    8120.603    676.7169
Total         14    236135.2

• Regression df: k = 2, the number of explanatory variables
• Total df: n-1
• F: $MSR / MSE$ = F Test Statistic
• Significance F: p-value
Test for Overall Significance: Example Solution
H0: $\beta_1 = \beta_2 = \cdots = \beta_k = 0$
H1: At least one $\beta_j \ne 0$
$\alpha$ = .05
df = 2 and 12
Critical Value: 3.89
[Figure: F distribution with the rejection region of area $\alpha$ = 0.05 to the right of 3.89.]
Test Statistic: F = 168.47 (Excel Output)
Decision: Reject at $\alpha$ = 0.05.
Conclusion: There is evidence that at least one independent variable affects Y.
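The F statistic in the Excel output follows from the sums of squares by simple arithmetic. The sketch below recomputes it, using the more precise SS values that appear in the partial F test example later in the deck. The critical value 3.89 is taken from the slide as given rather than computed.

```python
# Sums of squares from the ANOVA table for the model with Temp and Insulation.
ssr, sse = 228014.6263, 8120.603016
n, k = 15, 2

msr = ssr / k            # mean square regression, k numerator df
mse = sse / (n - k - 1)  # mean square error, n-k-1 denominator df
f_stat = msr / mse

F_CRITICAL = 3.89  # upper 5% point of F(2, 12), copied from the slide
reject_h0 = f_stat > F_CRITICAL
print(f"F = {f_stat:.2f}, reject H0: {reject_h0}")
```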
Test for Significance: Individual Variables
• Show If Y Depends Linearly on a Single Xj Individually While Holding the Effects of Other X's Fixed
• Use t Test Statistic
• Hypotheses:
– H0: $\beta_j = 0$ (No linear relationship)
– H1: $\beta_j \ne 0$ (Linear relationship between Xj and Y)
t Test Statistic
Excel Output: Example
              Coefficients     Standard Error    t Stat       P-value
Intercept     562.1510092      21.09310433       26.65094     4.77868E-12
Temp          -5.436580588     0.336216167       -16.1699     1.64178E-09
Insulation    -20.01232067     2.342505227       -8.543127    1.90731E-06

$t = \frac{b_i}{S_{b_i}}$
• t Test Statistic for X1 (Temperature): -16.1699
• t Test Statistic for X2 (Insulation): -8.543127
t Test : Example Solution
Does temperature have a significant effect on monthly consumption of heating oil? Test at $\alpha$ = 0.05.
H0: $\beta_1 = 0$
H1: $\beta_1 \ne 0$
df = 12
Critical Values: -2.1788 and 2.1788 (area .025 in each tail)
[Figure: t distribution with rejection regions below -2.1788 and above 2.1788.]
Test Statistic: t Test Statistic = -16.1699
Decision: Reject H0 at $\alpha$ = 0.05.
Conclusion: There is evidence of a significant effect of temperature on oil consumption holding constant the effect of insulation.
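Each t statistic in the Excel output is the coefficient divided by its standard error. A minimal sketch of that arithmetic for both slopes follows. The critical value 2.1788 is copied from the slide, not computed, and the dictionary layout is just one convenient way to hold the numbers.

```python
# (coefficient, standard error) pairs copied from the Excel output.
coefficients = {
    "Temp": (-5.436580588, 0.336216167),
    "Insulation": (-20.01232067, 2.342505227),
}
T_CRITICAL = 2.1788  # two-tailed 5% critical value for 12 df, as given on the slide

t_stats = {name: b / s_b for name, (b, s_b) in coefficients.items()}
for name, t_stat in t_stats.items():
    significant = abs(t_stat) > T_CRITICAL  # two-tailed rejection rule
    print(f"{name}: t = {t_stat:.4f}, reject H0: {significant}")
```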
Venn Diagrams and
Estimation of Regression Model
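[Figure: Venn diagram of three overlapping circles, Oil, Temp and Insulation.]
• Region shared by Oil and Temp only: only this information is used in the estimation of $\beta_1$
• Region shared by Oil and Insulation only: only this information is used in the estimation of $\beta_2$
• Region shared by Oil, Temp and Insulation: this information is NOT used in the estimation of $\beta_1$ nor $\beta_2$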
Confidence Interval Estimate for the Slope
Provide the 95% confidence interval for the population slope $\beta_1$ (the effect of temperature on oil consumption).
$b_1 \pm t_{n-k-1} S_{b_1}$

              Coefficients    Lower 95%        Upper 95%
Intercept     562.151009      516.1930837      608.108935
Temp          -5.4365806      -6.169132673     -4.7040285
Insulation    -20.012321      -25.11620102     -14.90844

$-6.169 \le \beta_1 \le -4.704$
We are 95% confident that the estimated average consumption of oil is reduced by between 4.7 gallons and 6.17 gallons for each 1°F increase in temperature, holding insulation constant.
We can also perform the test for the significance of individual variables, H0: $\beta_1 = 0$ vs. H1: $\beta_1 \ne 0$, using this confidence interval: the interval does not contain 0, so H0 is rejected at the 5% level.
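The interval for the temperature slope can be rebuilt from the coefficient, its standard error and the t critical value. The sketch below does so with the rounded critical value 2.1788 from the t test slide, so the bounds may differ from Excel's in the last few decimals.

```python
b1, s_b1 = -5.436580588, 0.336216167  # Temp row of the Excel output
T_CRITICAL = 2.1788                    # t value for 95% confidence with 12 df, from the t test slide

margin = T_CRITICAL * s_b1
lower, upper = b1 - margin, b1 + margin
contains_zero = lower <= 0.0 <= upper  # if False, H0: beta_1 = 0 is rejected at the 5% level
print(f"{lower:.3f} <= beta_1 <= {upper:.3f}; contains 0: {contains_zero}")
```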
Contribution of a Single Independent Variable $X_j$
• Let Xj Be the Independent Variable of Interest
• $SSR(X_j \mid \text{all others except } X_j) = SSR(\text{all}) - SSR(\text{all others except } X_j)$
– Measures the additional contribution of Xj in explaining the total variation in Y with the inclusion of all the remaining independent variables
Contribution of a Single Independent Variable $X_k$
$SSR(X_1 \mid X_2 \text{ and } X_3) = SSR(X_1, X_2 \text{ and } X_3) - SSR(X_2 \text{ and } X_3)$
• $SSR(X_1, X_2 \text{ and } X_3)$: from ANOVA section of regression for $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{3i}$
• $SSR(X_2 \text{ and } X_3)$: from ANOVA section of regression for $\hat{Y}_i = b_0 + b_2 X_{2i} + b_3 X_{3i}$
Measures the additional contribution of X1 in explaining Y with the inclusion of X2 and X3.
Coefficient of Partial Determination of $X_j$
• $r^2_{Yj \cdot \text{all others}} = \frac{SSR(X_j \mid \text{all others})}{SST - SSR(\text{all}) + SSR(X_j \mid \text{all others})}$
• Measures the proportion of variation in the dependent variable that is explained by Xj while controlling for (holding constant) the other independent variables
Coefficient of Partial Determination for $X_j$ (continued)
Example: Model with two independent variables
$r^2_{Y1 \cdot 2} = \frac{SSR(X_1 \mid X_2)}{SST - SSR(X_1, X_2) + SSR(X_1 \mid X_2)}$
Venn Diagrams and Coefficient of Partial Determination for $X_j$
[Figure: Venn diagram of Oil, Temp and Insulation, with the region for $SSR(X_1 \mid X_2)$ marked.]
$r^2_{Y1 \cdot 2} = \frac{SSR(X_1 \mid X_2)}{SST - SSR(X_1, X_2) + SSR(X_1 \mid X_2)}$
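The slides do not give a numeric value for the coefficient of partial determination in the heating oil example. As an illustration only, the sketch below evaluates the two-variable formula using the sums of squares from the two ANOVA tables in the partial F test example (Temp as X1, Insulation as X2). The resulting number is derived here, not quoted from the deck.

```python
# Sums of squares from the two ANOVA tables in the partial F test example.
sst = 236135.2293
ssr_x1_x2 = 228014.6263  # regression on Temp (X1) and Insulation (X2)
ssr_x2 = 51076.47        # regression on Insulation (X2) only

ssr_x1_given_x2 = ssr_x1_x2 - ssr_x2  # additional contribution of X1 given X2
r2_y1_2 = ssr_x1_given_x2 / (sst - ssr_x1_x2 + ssr_x1_given_x2)
print(f"SSR(X1 | X2) = {ssr_x1_given_x2:.1f}, r^2_Y1.2 = {r2_y1_2:.4f}")
```

The denominator equals the residual sum of squares of the Insulation-only model, so the ratio reads as the share of the variation left unexplained by insulation that temperature accounts for.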
Contribution of a Subset of Independent Variables
• Let Xs Be the Subset of Independent Variables of Interest
– $SSR(X_s \mid \text{all others except } X_s) = SSR(\text{all}) - SSR(\text{all others except } X_s)$
– Measures the contribution of the subset Xs in explaining SST with the inclusion of the remaining independent variables
Contribution of a Subset of Independent Variables: Example
Let Xs be X1 and X3
$SSR(X_1 \text{ and } X_3 \mid X_2) = SSR(X_1, X_2 \text{ and } X_3) - SSR(X_2)$
• $SSR(X_1, X_2 \text{ and } X_3)$: from ANOVA section of regression for $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{3i}$
• $SSR(X_2)$: from ANOVA section of regression for $\hat{Y}_i = b_0 + b_2 X_{2i}$
Testing Portions of Model
• Examines the Contribution of a Subset Xs of Explanatory Variables to the
Relationship with Y
• Null Hypothesis:
– Variables in the subset do not improve the model significantly when
all other variables are included
• Alternative Hypothesis:
– At least one variable in the subset is significant when all other
variables are included
Testing Portions of Model
(continued)
• One-Tailed Rejection Region
• Requires Comparison of Two Regressions
– One regression includes everything
– Another regression includes everything except the portion
to be tested
Partial F Test for the Contribution of a Subset of X Variables
• Hypotheses:
– H0 : Variables Xs do not significantly improve the model given all
other variables included
– H1 : Variables Xs significantly improve the model given all others
included
• Test Statistic:
– $F = \frac{SSR(X_s \mid \text{all others}) / m}{MSE(\text{all})}$
– with df = m and (n-k-1)
– m = # of variables in the subset Xs
Partial F Test for the Contribution of a Single X j
• Hypotheses:
– H0 : Variable Xj does not significantly improve the model given all
others included
– H1 : Variable Xj significantly improves the model given all others
included
• Test Statistic:
– $F = \frac{SSR(X_j \mid \text{all others})}{MSE(\text{all})}$
– with df = 1 and (n-k-1)
– m = 1 here
Testing Portions of Model: Example
Test at the $\alpha$ = .05 level to determine if the variable of average temperature significantly improves the model, given that insulation is included.
Testing Portions of Model: Example
H0: X1 (temperature) does not improve model with X2 (insulation) included
H1: X1 does improve model
$\alpha$ = .05, df = 1 and 12
Critical Value = 4.75

ANOVA (For X1 and X2)
              SS             MS
Regression    228014.6263    114007.313
Residual      8120.603016    676.716918
Total         236135.2293

ANOVA (For X2)
              SS
Regression    51076.47
Residual      185058.8
Total         236135.2

$F = \frac{SSR(X_1 \mid X_2)}{MSE(X_1, X_2)} = \frac{228{,}015 - 51{,}076}{676.717} = 261.47$
Conclusion: Reject H0; X1 does improve model.
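The partial F statistic compares the regression sums of squares of the full and reduced models. The sketch below repeats the slide's arithmetic with the unrounded ANOVA values. The critical value 4.75 is copied from the slide, and the variable names are illustrative.

```python
ssr_full = 228014.6263   # SSR(X1, X2) from the ANOVA for Temp and Insulation
ssr_reduced = 51076.47   # SSR(X2) from the ANOVA for Insulation only
mse_full = 676.716918    # MSE(X1, X2)
m = 1                    # number of variables being tested (X1 only)

f_partial = ((ssr_full - ssr_reduced) / m) / mse_full
F_CRITICAL = 4.75  # upper 5% point of F(1, 12), as given on the slide
reject_h0 = f_partial > F_CRITICAL
print(f"F = {f_partial:.2f}, reject H0: {reject_h0}")
```

When a single variable is tested (m = 1), the partial F statistic equals the square of that variable's t statistic, which is why squaring the t value of -16.1699 for Temp gives approximately the same number.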