Course: I0174 – Regression Analysis
Year: Odd semester 2007/2008

Multiple Linear Regression with Dummy Variables
Session 07

Bina Nusantara

Multiple Linear Regression with Dummy Variables
• Dummy variables with two categories
• Dummy variables with more than two categories

Chapter Topics
• Dummy-Variables and Interaction Terms

The Multiple Regression Model
The relationship between 1 dependent and 2 or more independent variables is a linear function:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$$

• $Y_i$: dependent (response) variable
• $X_{1i}, \ldots, X_{ki}$: independent (explanatory) variables
• $\beta_0$: population Y-intercept
• $\beta_1, \ldots, \beta_k$: population slopes
• $\varepsilon_i$: random error

Multiple Regression Model (bivariate model)

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$$
$$\mu_{Y|X} = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}$$

[Figure: response plane over the $X_1$ and $X_2$ axes with intercept $\beta_0$; the observed $Y_i$ at $(X_{1i}, X_{2i})$ lies a vertical distance $\varepsilon_i$ from the plane.]

Multiple Regression Equation (bivariate model)

$$Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + e_i$$
$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$$

[Figure: estimated response plane over the $X_1$ and $X_2$ axes with intercept $b_0$; the observed $Y_i$ at $(X_{1i}, X_{2i})$ lies a vertical distance $e_i$ from the plane.]

Multiple Regression Equation
Too complicated by hand! Ouch!

Interpretation of Estimated Coefficients
• Slope ($b_j$)
  – The average value of Y is estimated to change by $b_j$ for each 1-unit increase in $X_j$, holding all other variables constant (ceteris paribus)
  – Example: If $b_1 = -2$, then fuel oil usage (Y) is expected to decrease by an estimated 2 gallons for each 1 degree increase in temperature ($X_1$), given the inches of insulation ($X_2$)
• Y-Intercept ($b_0$)
  – The estimated average value of Y when all $X_j = 0$

Multiple Regression Model: Example
Develop a model for estimating heating oil used for a single family home in the month of January, based on average temperature and amount of insulation in inches.
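As a rough sketch of how this example could be fitted with software rather than by hand, the Python code below runs ordinary least squares with NumPy on the 15 observations of the heating oil example (the values are copied from the example's data table). The helper name `fit_ols` is hypothetical and introduced only for illustration; the slides themselves use Excel.

```python
import numpy as np

# Heating oil example: 15 homes, values copied from the example's data table.
oil = np.array([275.30, 363.80, 164.30, 40.80, 94.30, 230.90, 366.70, 300.60,
                237.80, 121.40, 31.40, 203.50, 441.10, 323.00, 52.50])
temp = np.array([40, 27, 40, 73, 64, 34, 9, 8, 23, 63, 65, 41, 21, 38, 58],
                dtype=float)
insulation = np.array([3, 3, 10, 6, 6, 6, 6, 10, 10, 3, 10, 6, 3, 3, 10],
                      dtype=float)


def fit_ols(y, *predictors):
    """Return least-squares estimates [b0, b1, ..., bk] (hypothetical helper)."""
    # The leading column of ones estimates the intercept b0.
    X = np.column_stack([np.ones_like(y), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


b0, b1, b2 = fit_ols(oil, temp, insulation)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```

The printed estimates can be compared with the Excel coefficients reported for this example.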
Oil (Gal)   Temp (°F)   Insulation (in)
275.30      40          3
363.80      27          3
164.30      40          10
 40.80      73          6
 94.30      64          6
230.90      34          6
366.70       9          6
300.60       8          10
237.80      23          10
121.40      63          3
 31.40      65          10
203.50      41          6
441.10      21          3
323.00      38          3
 52.50      58          10

Multiple Regression Equation: Example

$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki}$$

Excel Output
               Coefficients
Intercept      562.1510092
X Variable 1   -5.436580588
X Variable 2   -20.01232067

$$\hat{Y}_i = 562.151 - 5.437 X_{1i} - 20.012 X_{2i}$$

• For each degree increase in temperature, the estimated average amount of heating oil used is decreased by 5.437 gallons, holding insulation constant.
• For each increase in one inch of insulation, the estimated average use of heating oil is decreased by 20.012 gallons, holding temperature constant.

Dummy-Variable Models
• Categorical Explanatory Variable with 2 or More Levels
• Yes or No, On or Off, Male or Female
• Use Dummy-Variables (Coded as 0 or 1)
• Only Intercepts are Different
• Assumes Equal Slopes Across Categories
• The Number of Dummy-Variables Needed is (# of Levels - 1)
• Regression Model Has Same Form: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$

Dummy-Variable Models (with 2 Levels)
Given: $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$
Y = Assessed Value of House
$X_1$ = Square Footage of House
$X_2$ = Desirability of Neighborhood = 0 if undesirable, 1 if desirable

Desirable ($X_2 = 1$): $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 (1) = (b_0 + b_2) + b_1 X_{1i}$
Undesirable ($X_2 = 0$): $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 (0) = b_0 + b_1 X_{1i}$
Same slopes

Dummy-Variable Models (with 2 Levels) (continued)
[Figure: Y (Assessed Value) against $X_1$ (Square footage); two parallel lines with the same slope $b_1$ and different intercepts, $b_0 + b_2$ and $b_0$.]

Interpretation of the Dummy-Variable Coefficient (with 2 Levels)
Example: $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} = 20 + 5 X_{1i} + 6 X_{2i}$
Y: Annual salary of college graduate in thousand $
$X_1$: GPA
$X_2$: 0 = non-business degree, 1 = business degree

With the same GPA, college graduates with a business degree are making an estimated 6 thousand dollars more than graduates with a non-business degree, on average.
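The 0/1 coding and the intercept shift described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names `predict_salary` and `encode_dummies` are hypothetical, the coefficients 20, 5 and 6 are those of the salary example, and the label strings are made up for the demonstration.

```python
import numpy as np


def predict_salary(gpa, business_degree):
    """Estimated salary (thousand $) from the example equation 20 + 5*X1 + 6*X2."""
    b0, b1, b2 = 20.0, 5.0, 6.0
    x2 = 1.0 if business_degree else 0.0  # dummy variable: 1 = business degree
    return b0 + b1 * gpa + b2 * x2


def encode_dummies(labels, baseline):
    """Build 0/1 columns for every level except the baseline (levels - 1 columns)."""
    levels = [lv for lv in sorted(set(labels)) if lv != baseline]
    columns = np.array([[1.0 if lab == lv else 0.0 for lv in levels]
                        for lab in labels])
    return levels, columns


# Same GPA, different degree type: the gap equals the dummy coefficient b2.
gap = predict_salary(3.0, True) - predict_salary(3.0, False)
print(gap)

# Illustrative labels (not from the slides) coded against a baseline level.
levels, X2 = encode_dummies(["business", "non-business", "business"],
                            baseline="non-business")
print(levels, X2.ravel())
```

Because the dummy enters only additively, the gap is the same at every GPA, which is the "same slopes, different intercepts" picture.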
Dummy-Variable Models (with 3 Levels)
Given:
Y = Assessed Value of the House (1000 $)
$X_1$ = Square Footage of the House
Style of the House = Split-level, Ranch, Condo (3 Levels; Need 2 Dummy Variables)
$X_2$ = 1 if Split-level, 0 if not
$X_3$ = 1 if Ranch, 0 if not

$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{3i}$$

Interpretation of the Dummy-Variable Coefficients (with 3 Levels)
Given the Estimated Model: $\hat{Y}_i = 20.43 + 0.045 X_{1i} + 18.84 X_{2i} + 23.53 X_{3i}$
For Split-level ($X_2 = 1$): $\hat{Y}_i = 20.43 + 0.045 X_{1i} + 18.84$
For Ranch ($X_3 = 1$): $\hat{Y}_i = 20.43 + 0.045 X_{1i} + 23.53$
For Condo: $\hat{Y}_i = 20.43 + 0.045 X_{1i}$

• With the same footage, a Split-level will have an estimated average assessed value of 18.84 thousand dollars more than a Condo.
• With the same footage, a Ranch will have an estimated average assessed value of 23.53 thousand dollars more than a Condo.

Regression Model Containing an Interaction Term
• Hypothesizes Interaction between a Pair of X Variables
  – Response to one X variable varies at different levels of another X variable
• Contains a Cross-Product Term
  – $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i} + \varepsilon_i$
• Can Be Combined with Other Models
  – E.g., Dummy-Variable Model

Effect of Interaction
• Given:
  – $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i} + \varepsilon_i$
• Without Interaction Term, Effect of $X_1$ on Y is Measured by $\beta_1$
• With Interaction Term, Effect of $X_1$ on Y is Measured by $\beta_1 + \beta_3 X_2$
• Effect Changes as $X_2$ Changes

Interaction Example
$Y = 1 + 2X_1 + 3X_2 + 4X_1X_2$
For $X_2 = 1$: $Y = 1 + 2X_1 + 3(1) + 4X_1(1) = 4 + 6X_1$
For $X_2 = 0$: $Y = 1 + 2X_1 + 3(0) + 4X_1(0) = 1 + 2X_1$
[Figure: Y against $X_1$ (0 to 1.5) showing the two lines above.]
Effect (slope) of $X_1$ on Y depends on $X_2$ value

Interaction Regression Model Worksheet

Case, i   Yi   X1i   X2i   X1i X2i
1         1    1     3     3
2         4    8     5     40
3         1    3     2     6
4         3    5     6     30
:         :    :     :     :

• Multiply $X_1$ by $X_2$ to get $X_1X_2$
• Run regression with Y, $X_1$, $X_2$, $X_1X_2$

Interpretation When There Are 3+ Levels

$$Y = \beta_0 + \beta_1 \text{MALE} + \beta_2 \text{MARRIED} + \beta_3 \text{DIVORCED} + \beta_4 \text{MALE} \cdot \text{MARRIED} + \beta_5 \text{MALE} \cdot \text{DIVORCED}$$

MALE = 0 if female and 1 if male
MARRIED = 1 if married; 0 if not
DIVORCED = 1 if divorced; 0 if not
MALE•MARRIED = 1 if male married; 0 otherwise = (MALE times MARRIED)
MALE•DIVORCED = 1 if male divorced; 0 otherwise = (MALE times DIVORCED)

Interpretation When There Are 3+ Levels (continued)

$$Y = \beta_0 + \beta_1 \text{MALE} + \beta_2 \text{MARRIED} + \beta_3 \text{DIVORCED} + \beta_4 \text{MALE} \cdot \text{MARRIED} + \beta_5 \text{MALE} \cdot \text{DIVORCED}$$

Entries are differences from the single female mean $\beta_0$:

          SINGLE      MARRIED                         DIVORCED
FEMALE    (none)      $\beta_2$                       $\beta_3$
MALE      $\beta_1$   $\beta_1 + \beta_2 + \beta_4$   $\beta_1 + \beta_3 + \beta_5$

Interpreting Results

            FEMALE                MALE                                      Difference
Single:     $\beta_0$             $\beta_0 + \beta_1$                       $\beta_1$
Married:    $\beta_0 + \beta_2$   $\beta_0 + \beta_1 + \beta_2 + \beta_4$   $\beta_1 + \beta_4$
Divorced:   $\beta_0 + \beta_3$   $\beta_0 + \beta_1 + \beta_3 + \beta_5$   $\beta_1 + \beta_5$

Main Effects: MALE, MARRIED and DIVORCED
Interaction Effects: MALE•MARRIED and MALE•DIVORCED

Evaluating the Presence of Interaction with Dummy-Variable
• Suppose $X_1$ and $X_2$ are Numerical Variables and $X_3$ is a Dummy-Variable
• To Test Whether the Slopes of Y with $X_1$ and/or $X_2$ Are the Same for the Two Levels of $X_3$
• Model: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{1i} X_{3i} + \beta_5 X_{2i} X_{3i} + \varepsilon_i$
• Hypotheses:
  – $H_0$: $\beta_4 = \beta_5 = 0$ (No Interaction between $X_1$ and $X_3$ or $X_2$ and $X_3$)
  – $H_1$: $\beta_4$ and/or $\beta_5 \neq 0$ ($X_1$ and/or $X_2$ Interacts with $X_3$)
• Perform a Partial F Test, where $X_4 = X_1X_3$ and $X_5 = X_2X_3$ denote the cross-product terms:

$$F = \frac{\left[ SSR(X_1, X_2, X_3, X_4, X_5) - SSR(X_1, X_2, X_3) \right] / 2}{MSE(X_1, X_2, X_3, X_4, X_5)}$$

Evaluating the Presence of Interaction with Numerical Variables
• Suppose $X_1$, $X_2$ and $X_3$ are Numerical Variables
• To Test If the Independent Variables Interact with Each Other
• Model: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{1i} X_{2i} + \beta_5 X_{1i} X_{3i} + \beta_6 X_{2i} X_{3i} + \varepsilon_i$
• Hypotheses:
  – $H_0$: $\beta_4 = \beta_5 = \beta_6 = 0$ (no interaction among $X_1$, $X_2$ and $X_3$)
  – $H_1$: at least one of $\beta_4$, $\beta_5$, $\beta_6 \neq 0$ (at least one pair of $X_1$, $X_2$, $X_3$ interact with each other)
• Perform a Partial F Test, where $X_4 = X_1X_2$, $X_5 = X_1X_3$ and $X_6 = X_2X_3$ denote the cross-product terms:

$$F = \frac{\left[ SSR(X_1, X_2, X_3, X_4, X_5, X_6) - SSR(X_1, X_2, X_3) \right] / 3}{MSE(X_1, X_2, X_3, X_4, X_5, X_6)}$$
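The partial F test above can be sketched in Python with NumPy as follows. This is an illustration under stated assumptions, not the course's own procedure: the helper names `ols_sse` and `partial_f` are hypothetical, and the data are synthetic, generated with a deliberately strong $X_1X_3$ interaction. The code uses the identity that, for the same Y, the increase in SSR from adding terms equals the decrease in SSE.

```python
import numpy as np


def ols_sse(X, y):
    """Error sum of squares (SSE) of the least-squares fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


def partial_f(y, X_reduced, X_full):
    """Partial F statistic for the extra columns of X_full (hypothetical helper).

    Both matrices must contain the intercept column. Since SST is the same
    for both models, SSR(full) - SSR(reduced) = SSE(reduced) - SSE(full).
    """
    n, p_full = X_full.shape
    q = p_full - X_reduced.shape[1]      # number of added cross-product terms
    sse_full = ols_sse(X_full, y)
    mse_full = sse_full / (n - p_full)   # MSE of the full model
    f_stat = ((ols_sse(X_reduced, y) - sse_full) / q) / mse_full
    return f_stat, q, n - p_full


# Synthetic data (an assumption for illustration): X3 is a 0/1 dummy and the
# slope of X1 differs between its two levels.
rng = np.random.default_rng(0)
n = 40
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 10, n)
x3 = (np.arange(n) % 2).astype(float)
y = 1 + 2 * x1 + 0.5 * x2 + 3 * x3 + 4 * x1 * x3 + rng.normal(0, 1, n)

X_reduced = np.column_stack([np.ones(n), x1, x2, x3])
X_full = np.column_stack([X_reduced, x1 * x3, x2 * x3])

f_stat, q, df_error = partial_f(y, X_reduced, X_full)
print(f"F = {f_stat:.2f} with ({q}, {df_error}) degrees of freedom")
```

The statistic is compared with the critical value of the F distribution with (q, n - p) degrees of freedom, where p counts all coefficients of the full model including the intercept; a large value leads to rejecting $H_0$ of no interaction.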