Multiple Linear Regression with Dummy Variables, Session 07

Course: I0174 – Regression Analysis
Year: Odd semester 2007/2008
Multiple Linear Regression with Dummy Variables
Dummy Variables with Two Categories
Dummy Variables with More Than Two Categories
Bina Nusantara
Chapter Topics
• Dummy-Variables and Interaction Terms
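The Multiple Regression Model
The relationship between 1 dependent variable and 2 or more independent variables is a linear function:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$$
where $\beta_0$ is the population Y-intercept, $\beta_1, \ldots, \beta_k$ are the population slopes, $\varepsilon_i$ is the random error, Y is the dependent (response) variable, and $X_1, \ldots, X_k$ are the independent (explanatory) variables.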
Multiple Regression Model
Bivariate model (population): $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$, with mean response $\mu_{Y|X} = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}$
[Figure: response plane over the $(X_1, X_2)$ axes with intercept $\beta_0$. An observed Y at $(X_{1i}, X_{2i})$ lies a random error $\varepsilon_i$ away from the plane.]
Multiple Regression Equation
Sample model: $Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + e_i$, with fitted values $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$
[Figure: bivariate model showing the fitted response plane over the $(X_1, X_2)$ axes with intercept $b_0$. An observed Y at $(X_{1i}, X_{2i})$ lies a residual $e_i$ away from the plane.]
Multiple Regression Equation
Too complicated to compute by hand!
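Software computes the coefficients by ordinary least squares, which chooses $b_0, \ldots, b_k$ to minimize the sum of squared residuals $\sum_i e_i^2$. Assume the $n \times (k+1)$ matrix $\mathbf{X}$ holds a leading column of ones followed by the k explanatory variables, and that $\mathbf{X}^\top \mathbf{X}$ is invertible. The computation is then the solution of the normal equations:

$$\mathbf{b} = \begin{pmatrix} b_0 \\ b_1 \\ \vdots \\ b_k \end{pmatrix} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{Y}$$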
Interpretation of Estimated Coefficients
• Slope (bj)
– The average value of Y is estimated to change by bj for each 1-unit increase in Xj, holding all other variables constant (ceteris paribus)
– Example: If b1 = -2, then fuel oil usage (Y) is expected to decrease by an
estimated 2 gallons for each 1 degree increase in temperature (X1), given the
inches of insulation (X2)
• Y-Intercept (b0)
– The estimated average value of Y when all Xj = 0
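The slope interpretation follows from comparing two fitted values that differ only in $X_1$. Here is a short derivation for the two-variable case; the same cancellation works for any $X_j$:

$$\hat{Y}(X_1 + 1, X_2) - \hat{Y}(X_1, X_2) = \left[b_0 + b_1 (X_1 + 1) + b_2 X_2\right] - \left[b_0 + b_1 X_1 + b_2 X_2\right] = b_1$$

With $b_1 = -2$, the difference is $-2$: two gallons less per degree at a fixed insulation level.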
Multiple Regression Model: Example
Develop a model for estimating
heating oil used for a single
family home in the month of
January, based on average
temperature and amount of
insulation in inches.
Oil (gal)   Temp (°F)   Insulation (in.)
275.30      40          3
363.80      27          3
164.30      40          10
40.80       73          6
94.30       64          6
230.90      34          6
366.70      9           6
300.60      8           10
237.80      23          10
121.40      63          3
31.40       65          10
203.50      41          6
441.10      21          3
323.00      38          3
52.50       58          10
Multiple Regression Equation: Example
$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki}$$
Excel output:
               Coefficients
Intercept      562.1510092
X Variable 1   -5.436580588
X Variable 2   -20.01232067
$$\hat{Y}_i = 562.151 - 5.437 X_{1i} - 20.012 X_{2i}$$
For each 1-degree increase in temperature, the estimated average amount of heating oil used decreases by 5.437 gallons, holding insulation constant.
For each additional inch of insulation, the estimated average use of heating oil decreases by 20.012 gallons, holding temperature constant.
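The Excel coefficients can be reproduced with a short script. This is a minimal sketch that uses only the Python standard library. It forms the normal equations $(\mathbf{X}^\top\mathbf{X})\mathbf{b} = \mathbf{X}^\top\mathbf{Y}$ from the heating-oil table and solves them by Gaussian elimination. The helper name `ols` is hypothetical. In practice a numerical library (or Excel's regression tool, as on the slide) is the better choice because it uses more numerically stable decompositions.

```python
# Heating-oil data from the table: (oil in gallons, temperature in °F, insulation in inches)
DATA = [
    (275.30, 40, 3), (363.80, 27, 3), (164.30, 40, 10), (40.80, 73, 6),
    (94.30, 64, 6), (230.90, 34, 6), (366.70, 9, 6), (300.60, 8, 10),
    (237.80, 23, 10), (121.40, 63, 3), (31.40, 65, 10), (203.50, 41, 6),
    (441.10, 21, 3), (323.00, 38, 3), (52.50, 58, 10),
]


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y.

    Solved by Gaussian elimination with partial pivoting (standard library only).
    """
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(p)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(p)
    ]
    # Forward elimination
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (A[i][p] - sum(A[i][j] * b[j] for j in range(i + 1, p))) / A[i][i]
    return b


X = [[1.0, temp, insulation] for _, temp, insulation in DATA]  # column of 1s = intercept
y = [oil for oil, _, _ in DATA]
b0, b1, b2 = ols(X, y)
print(f"Y-hat = {b0:.3f} + ({b1:.3f}) X1 + ({b2:.3f}) X2")
# Y-hat = 562.151 + (-5.437) X1 + (-20.012) X2
```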
Dummy-Variable Models
• Categorical explanatory variable with 2 or more levels
• Yes or No, On or Off, Male or Female
• Use dummy variables (coded as 0 or 1)
• Only intercepts are different
• Assumes equal slopes across categories
• The number of dummy variables needed is (# of levels - 1)
• Regression model has the same form:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$$
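The (# of levels - 1) rule can be made concrete with a small encoder. This sketch assumes a categorical column stored as strings and a chosen baseline level that gets no column of its own. The function name `dummy_code` and the five sample houses are hypothetical. The style levels are taken from the three-level house example in this section.

```python
def dummy_code(values, levels, baseline):
    """Return one 0/1 column per non-baseline level, i.e. (number of levels - 1) columns."""
    unknown = set(values) - set(levels)
    if unknown:
        raise ValueError(f"values not in levels: {sorted(unknown)}")
    if baseline not in levels:
        raise ValueError("baseline must be one of the levels")
    return {
        level: [1 if v == level else 0 for v in values]
        for level in levels
        if level != baseline
    }


# Hypothetical sample of five houses; "Condo" is the baseline (all dummies = 0)
styles = ["Split-level", "Ranch", "Condo", "Ranch", "Condo"]
columns = dummy_code(styles, ["Split-level", "Ranch", "Condo"], baseline="Condo")
print(columns)
# {'Split-level': [1, 0, 0, 0, 0], 'Ranch': [0, 1, 0, 1, 0]}
```

Giving the baseline its own column as well would make the dummies sum to the intercept's column of ones (perfect collinearity). That is why one level is left out.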
Dummy-Variable Models (with 2 Levels)
Given: $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$
Y = Assessed value of house
X1 = Square footage of house
X2 = Desirability of neighborhood = 0 if undesirable, 1 if desirable
Desirable (X2 = 1): $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2(1) = (b_0 + b_2) + b_1 X_{1i}$
Undesirable (X2 = 0): $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2(0) = b_0 + b_1 X_{1i}$
Same slopes
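Dummy-Variable Models (with 2 Levels) (continued)
[Figure: Y (assessed value) plotted against X1 (square footage) as two parallel lines with the same slope $b_1$ and different intercepts: $b_0 + b_2$ for desirable and $b_0$ for undesirable neighborhoods.]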
Interpretation of the Dummy-Variable Coefficient (with 2 Levels)
Example: $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} = 20 + 5 X_{1i} + 6 X_{2i}$
Y: Annual salary of college graduate in thousand $
X1: GPA
X2: 0 = non-business degree, 1 = business degree
With the same GPA, college graduates with a business
degree are making an estimated 6 thousand dollars more
than graduates with a non-business degree, on average.
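Because the model has no interaction term, the 6-thousand-dollar gap is the same at every GPA. A small sketch evaluating the fitted equation makes this visible. The function name and the GPA values tried are hypothetical.

```python
def predicted_salary(gpa, business_degree):
    """Estimated annual salary in thousand $: Y-hat = 20 + 5*GPA + 6*X2.

    business_degree is the dummy X2: 1 for a business degree, 0 otherwise.
    """
    if business_degree not in (0, 1):
        raise ValueError("the dummy variable must be 0 or 1")
    return 20 + 5 * gpa + 6 * business_degree


for gpa in (2.5, 3.0, 3.5):
    gap = predicted_salary(gpa, 1) - predicted_salary(gpa, 0)
    print(gpa, predicted_salary(gpa, 0), predicted_salary(gpa, 1), gap)
# 2.5 32.5 38.5 6.0
# 3.0 35.0 41.0 6.0
# 3.5 37.5 43.5 6.0
```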
Dummy-Variable Models (with 3 Levels)
Given:
Y = Assessed value of the house (1000 $)
X1 = Square footage of the house
Style of the house = Split-level, Ranch, Condo (3 levels; need 2 dummy variables)
X2 = 1 if split-level, 0 if not
X3 = 1 if ranch, 0 if not
$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{3i}$$
Interpretation of the Dummy-Variable Coefficients (with 3 Levels)
Given the estimated model:
$$\hat{Y}_i = 20.43 + 0.045 X_{1i} + 18.84 X_{2i} + 23.53 X_{3i}$$
For split-level ($X_2 = 1$): $\hat{Y}_i = 20.43 + 0.045 X_{1i} + 18.84$
For ranch ($X_3 = 1$): $\hat{Y}_i = 20.43 + 0.045 X_{1i} + 23.53$
For condo: $\hat{Y}_i = 20.43 + 0.045 X_{1i}$
With the same footage, a split-level will have an estimated average assessed value 18.84 thousand dollars higher than a condo.
With the same footage, a ranch will have an estimated average assessed value 23.53 thousand dollars higher than a condo.
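The three fitted lines differ only by a constant shift relative to the condo baseline. The sketch below evaluates the estimated model for each style at one house size. The 2,000-square-foot value is hypothetical, chosen only for illustration.

```python
# Estimated coefficients from the slide (Y in thousand $, X1 in square feet)
B0, B1, B_SPLIT, B_RANCH = 20.43, 0.045, 18.84, 23.53
STYLES = ("Split-level", "Ranch", "Condo")  # Condo is the baseline (X2 = X3 = 0)


def assessed_value(sqft, style):
    """Fitted value of Y-hat = 20.43 + 0.045*X1 + 18.84*X2 + 23.53*X3."""
    if style not in STYLES:
        raise ValueError(f"unknown style: {style}")
    x2 = 1 if style == "Split-level" else 0
    x3 = 1 if style == "Ranch" else 0
    return B0 + B1 * sqft + B_SPLIT * x2 + B_RANCH * x3


sqft = 2000  # hypothetical house size, for illustration only
condo = assessed_value(sqft, "Condo")
for style in STYLES:
    value = assessed_value(sqft, style)
    print(f"{style}: {value:.2f} (vs. condo: {value - condo:+.2f})")
# Split-level: 129.27 (vs. condo: +18.84)
# Ranch: 133.96 (vs. condo: +23.53)
# Condo: 110.43 (vs. condo: +0.00)
```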
Regression Model Containing an Interaction Term
• Hypothesizes interaction between a pair of X variables
– Response to one X variable varies at different levels of another X variable
• Contains a cross-product term
– $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i} + \varepsilon_i$
• Can be combined with other models
– E.g., dummy-variable model
Effect of Interaction
• Given:
– $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i} + \varepsilon_i$
• Without the interaction term, the effect of X1 on Y is measured by $\beta_1$
• With the interaction term, the effect of X1 on Y is measured by $\beta_1 + \beta_3 X_2$
• The effect changes as X2 changes
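The $\beta_1 + \beta_3 X_2$ result comes from raising $X_1$ by one unit while holding $X_2$ fixed. The $\beta_0$ and $\beta_2 X_2$ terms cancel in the expected values:

$$E(Y \mid X_1 + 1, X_2) - E(Y \mid X_1, X_2) = \left[\beta_1 (X_1 + 1) + \beta_3 (X_1 + 1) X_2\right] - \left[\beta_1 X_1 + \beta_3 X_1 X_2\right] = \beta_1 + \beta_3 X_2$$

Setting $\beta_3 = 0$ recovers the constant effect $\beta_1$ of the model without the interaction term.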
Interaction Example
$Y = 1 + 2X_1 + 3X_2 + 4X_1X_2$
For $X_2 = 1$: $Y = 1 + 2X_1 + 3(1) + 4X_1(1) = 4 + 6X_1$
For $X_2 = 0$: $Y = 1 + 2X_1 + 3(0) + 4X_1(0) = 1 + 2X_1$
[Figure: the two lines plotted as Y (0 to 12) against X1 (0 to 1.5).]
Effect (slope) of X1 on Y depends on X2 value
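The two lines can be recovered numerically from the equation in the example. For each fixed X2, Y is linear in X1, so the intercept is Y at X1 = 0 and the slope is the change in Y from X1 = 0 to X1 = 1. The function name `mean_y` is hypothetical.

```python
def mean_y(x1, x2):
    """Interaction example: Y = 1 + 2*X1 + 3*X2 + 4*X1*X2."""
    return 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2


for x2 in (0, 1):
    intercept = mean_y(0, x2)
    slope = mean_y(1, x2) - mean_y(0, x2)  # exact: Y is linear in X1 for fixed X2
    print(f"X2 = {x2}: Y = {intercept} + {slope} X1")
# X2 = 0: Y = 1 + 2 X1
# X2 = 1: Y = 4 + 6 X1
```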
Interaction Regression Model Worksheet
Case, i   Yi   X1i   X2i   X1iX2i
1         1    1     3     3
2         4    8     5     40
3         1    3     2     6
4         3    5     6     30
:         :    :     :     :
Multiply X1 by X2 to get X1X2.
Run regression with Y, X1, X2, X1X2.
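The first worksheet step is a column-wise product. A minimal sketch, using only the four cases listed in the worksheet:

```python
# The four cases listed in the worksheet: (Y, X1, X2)
rows = [(1, 1, 3), (4, 8, 5), (1, 3, 2), (3, 5, 6)]

# Step 1: multiply X1 by X2 to get the interaction column X1X2
worksheet = [(y, x1, x2, x1 * x2) for y, x1, x2 in rows]

print("Case  Y  X1  X2  X1X2")
for case, (y, x1, x2, x1x2) in enumerate(worksheet, start=1):
    print(f"{case:>4} {y:>2} {x1:>3} {x2:>3} {x1x2:>5}")

# Step 2 (not shown): regress Y on X1, X2 and X1X2 plus an intercept
```

With only these four cases, the model's four coefficients would fit the data exactly. The regression step therefore needs the full sample that the worksheet's ":" rows stand for.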
Interpretation When There Are 3+ Levels
$$Y = \beta_0 + \beta_1 \text{MALE} + \beta_2 \text{MARRIED} + \beta_3 \text{DIVORCED} + \beta_4 \text{MALE} \cdot \text{MARRIED} + \beta_5 \text{MALE} \cdot \text{DIVORCED}$$
MALE = 0 if female and 1 if male
MARRIED = 1 if married; 0 if not
DIVORCED = 1 if divorced; 0 if not
MALE•MARRIED = 1 if male married; 0 otherwise (= MALE times MARRIED)
MALE•DIVORCED = 1 if male divorced; 0 otherwise (= MALE times DIVORCED)
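The interaction dummies are plain products of the main-effect dummies. This sketch assumes sex and marital status are recorded as lowercase strings, with "female" and "single" as the baselines. The helper name `encode` is hypothetical.

```python
SEXES = ("female", "male")                    # female is the baseline (MALE = 0)
STATUSES = ("single", "married", "divorced")  # single is the baseline


def encode(sex, status):
    """Return (MALE, MARRIED, DIVORCED, MALE*MARRIED, MALE*DIVORCED) for one person."""
    if sex not in SEXES or status not in STATUSES:
        raise ValueError(f"unexpected category: {sex!r}, {status!r}")
    male = int(sex == "male")
    married = int(status == "married")
    divorced = int(status == "divorced")
    # Interaction dummies are the products of the main-effect dummies
    return male, married, divorced, male * married, male * divorced


for sex in SEXES:
    for status in STATUSES:
        print(sex, status, encode(sex, status))
```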
Interpretation When There Are 3+ Levels (continued)
$$Y = \beta_0 + \beta_1 \text{MALE} + \beta_2 \text{MARRIED} + \beta_3 \text{DIVORCED} + \beta_4 \text{MALE} \cdot \text{MARRIED} + \beta_5 \text{MALE} \cdot \text{DIVORCED}$$
          SINGLE     MARRIED              DIVORCED
FEMALE    β0         β0 + β2              β0 + β3
MALE      β0 + β1    β0 + β1 + β2 + β4    β0 + β1 + β3 + β5
Interpreting Results
             FEMALE     MALE                 Difference
Single:      β0         β0 + β1              β1
Married:     β0 + β2    β0 + β1 + β2 + β4    β1 + β4
Divorced:    β0 + β3    β0 + β1 + β3 + β5    β1 + β5
Main effects: MALE, MARRIED and DIVORCED
Interaction effects: MALE•MARRIED and MALE•DIVORCED
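The male-minus-female differences in the table can be checked numerically. The coefficient values below are hypothetical, chosen only to make the arithmetic easy to follow. They are not estimates from any data in this section.

```python
def cell_mean(beta, male, status):
    """Expected Y for one cell of the MALE x marital-status table."""
    b0, b1, b2, b3, b4, b5 = beta
    married = int(status == "married")
    divorced = int(status == "divorced")
    return (b0 + b1 * male + b2 * married + b3 * divorced
            + b4 * male * married + b5 * male * divorced)


beta = (10, 2, 3, -1, 4, 5)  # hypothetical beta0..beta5, illustration only
for status in ("single", "married", "divorced"):
    female = cell_mean(beta, 0, status)
    male = cell_mean(beta, 1, status)
    print(f"{status:>8}: female={female}, male={male}, difference={male - female}")
# differences: single 2 (= beta1), married 6 (= beta1 + beta4), divorced 7 (= beta1 + beta5)
```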
Evaluating the Presence of Interaction with a Dummy Variable
• Suppose X1 and X2 are numerical variables and X3 is a dummy variable
• To test whether the slopes of Y with X1 and/or X2 are the same for the two levels of X3
• Model:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{1i} X_{3i} + \beta_5 X_{2i} X_{3i} + \varepsilon_i$$
• Hypotheses:
– $H_0: \beta_4 = \beta_5 = 0$ (no interaction between X1 and X3 or X2 and X3)
– $H_1: \beta_4$ and/or $\beta_5 \neq 0$ (X1 and/or X2 interacts with X3)
• Perform a partial F test
$$F = \frac{\left[ SSR(X_1, X_2, X_3, X_4, X_5) - SSR(X_1, X_2, X_3) \right] / 2}{MSE(X_1, X_2, X_3, X_4, X_5)}$$
where $X_4 = X_1 X_3$ and $X_5 = X_2 X_3$ denote the two interaction terms.
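The partial F statistic is simple arithmetic once both models have been fitted. Here is a sketch with a hypothetical function name and hypothetical sums of squares. The result is compared with an F critical value with q numerator and n - k - 1 denominator degrees of freedom, where k is the number of X terms in the full model (here k = 5). If SciPy is available, `scipy.stats.f.sf(F, q, n - k - 1)` gives the p-value.

```python
def partial_f(ssr_full, ssr_reduced, mse_full, q):
    """Partial F statistic for H0: the q added coefficients are all zero.

    ssr_full    -- SSR of the model that includes the interaction terms
    ssr_reduced -- SSR of the model without them
    mse_full    -- MSE of the model that includes the interaction terms
    q           -- number of coefficients tested (2 here: beta4 and beta5)
    """
    if q < 1 or mse_full <= 0:
        raise ValueError("q must be >= 1 and mse_full must be positive")
    if ssr_full < ssr_reduced:
        raise ValueError("adding terms cannot lower SSR; check the inputs")
    return ((ssr_full - ssr_reduced) / q) / mse_full


# Hypothetical sums of squares, for illustration only
print(partial_f(ssr_full=500.0, ssr_reduced=460.0, mse_full=8.0, q=2))  # 2.5
```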
Evaluating the Presence of Interaction with Numerical Variables
• Suppose X1, X2 and X3 are numerical variables
• To test whether the independent variables interact with each other
• Model:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{1i} X_{2i} + \beta_5 X_{1i} X_{3i} + \beta_6 X_{2i} X_{3i} + \varepsilon_i$$
• Hypotheses:
– $H_0: \beta_4 = \beta_5 = \beta_6 = 0$ (no interaction among X1, X2 and X3)
– $H_1$: at least one of $\beta_4, \beta_5, \beta_6 \neq 0$ (at least one pair of X1, X2, X3 interacts)
• Perform a partial F test
$$F = \frac{\left[ SSR(X_1, X_2, X_3, X_4, X_5, X_6) - SSR(X_1, X_2, X_3) \right] / 3}{MSE(X_1, X_2, X_3, X_4, X_5, X_6)}$$
where $X_4 = X_1 X_2$, $X_5 = X_1 X_3$ and $X_6 = X_2 X_3$ denote the three interaction terms.
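For three numerical variables the full model adds three cross-product columns, which is why the F statistic divides by 3. This sketch assumes each variable is stored as a named list of numbers. The function name and the sample values are hypothetical. Its output columns X1*X2, X1*X3 and X2*X3 play the roles of X4, X5 and X6 in the SSR notation.

```python
from itertools import combinations


def add_pairwise_interactions(columns):
    """Return a copy of columns with one product column per pair of variables."""
    out = dict(columns)
    for a, b in combinations(columns, 2):
        out[f"{a}*{b}"] = [u * v for u, v in zip(columns[a], columns[b])]
    return out


cols = {"X1": [1, 2], "X2": [3, 4], "X3": [5, 6]}  # hypothetical values
full = add_pairwise_interactions(cols)
print(list(full))     # ['X1', 'X2', 'X3', 'X1*X2', 'X1*X3', 'X2*X3']
print(full["X2*X3"])  # [15, 24]
```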