Course: I0174 – Regression Analysis
Year: Odd Semester 2007/2008

Regression and Analysis of Variance
Session 21

Regression and Analysis of Variance
• One-Way Analysis of Variance Model
• Regression Approach to One-Way Classification

Bina Nusantara

The Multiple Regression Model
The relationship between one dependent variable and two or more independent variables is a linear function:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$$

• $Y_i$: dependent (response) variable
• $\beta_0$: population Y-intercept
• $\beta_1, \ldots, \beta_k$: population slopes
• $X_{1i}, \ldots, X_{ki}$: independent (explanatory) variables
• $\varepsilon_i$: random error

Multiple Regression Model (bivariate model)

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$$

$$\mu_{Y|X} = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}$$

[Figure: response plane over the $(X_1, X_2)$ plane with intercept $\beta_0$; the observed $Y$ at $(X_{1i}, X_{2i})$ lies a distance $\varepsilon_i$ from the plane.]

Multiple Regression Equation (bivariate model)

$$Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + e_i$$

$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$$

[Figure: estimated response plane over the $(X_1, X_2)$ plane with intercept $b_0$; the observed $Y$ at $(X_{1i}, X_{2i})$ lies a distance $e_i$ (the residual) from the plane.]

Multiple Regression Equation
Too complicated by hand! Ouch!

Interpretation of Estimated Coefficients
• Slope ($b_j$)
– The average value of Y is estimated to change by $b_j$ for each 1-unit increase in $X_j$, holding all other variables constant (ceteris paribus)
– Example: If $b_1 = -2$, then fuel oil usage (Y) is expected to decrease by an estimated 2 gallons for each 1-degree increase in temperature ($X_1$), given the inches of insulation ($X_2$)
• Y-Intercept ($b_0$)
– The estimated average value of Y when all $X_j = 0$

Multiple Regression Model: Example
Develop a model for estimating the heating oil used by a single-family home in the month of January, based on average temperature and amount of insulation in inches.
Oil (Gal)   Temp (°F)   Insulation (in)
275.30      40          3
363.80      27          3
164.30      40          10
40.80       73          6
94.30       64          6
230.90      34          6
366.70      9           6
300.60      8           10
237.80      23          10
121.40      63          3
31.40       65          10
203.50      41          6
441.10      21          3
323.00      38          3
52.50       58          10

Multiple Regression Equation: Example

$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki}$$

Excel Output
                Coefficients
Intercept       562.1510092
X Variable 1    -5.436580588
X Variable 2    -20.01232067

$$\hat{Y}_i = 562.151 - 5.437 X_{1i} - 20.012 X_{2i}$$

• For each 1-degree increase in temperature, the estimated average amount of heating oil used decreases by 5.437 gallons, holding insulation constant.
• For each 1-inch increase in insulation, the estimated average use of heating oil decreases by 20.012 gallons, holding temperature constant.

Multiple Regression in PHStat
• PHStat | Regression | Multiple Regression …
• Excel spreadsheet for the heating oil example

Venn Diagrams and Explanatory Power of Regression
[Figure: Venn diagram of two overlapping circles, Oil and Temp.]
• Non-overlapping part of Temp: variations in Temp not used in explaining variation in Oil
• Non-overlapping part of Oil: variations in Oil left to the error term (SSE)
• Overlap: variations in Oil explained by Temp, or variations in Temp used in explaining variation in Oil (SSR)

Venn Diagrams and Explanatory Power of Regression (continued)
[Figure: Venn diagram of Oil and Temp.]

$$r^2 = \frac{SSR}{SSR + SSE}$$

Venn Diagrams and Explanatory Power of Regression
[Figure: Venn diagram of three overlapping circles, Oil, Temp and Insulation.]
• Overlapping variation in both Temp and Insulation is used in explaining the variation in Oil, but NOT in the estimation of $\beta_1$ nor $\beta_2$
• Variation in Oil NOT explained by Temp nor Insulation: SSE

Coefficient of Multiple Determination
• Proportion of Total Variation in Y Explained by All X Variables Taken Together

$$r^2_{Y \cdot 12 \cdots k} = \frac{SSR}{SST} = \frac{\text{Explained Variation}}{\text{Total Variation}}$$

• Never Decreases When a New X Variable is Added to Model
– Disadvantage when comparing among models

Venn Diagrams and Explanatory Power of Regression
[Figure: Venn diagram of Oil, Temp and Insulation.]

$$r^2_{Y \cdot 12} = \frac{SSR}{SSR + SSE}$$

Adjusted Coefficient of Multiple Determination
• Proportion of Variation in Y Explained by All the X Variables, Adjusted for the Sample Size and the Number of X Variables Used

$$r^2_{adj} = 1 - \left[ \left(1 - r^2_{Y \cdot 12 \cdots k}\right) \frac{n - 1}{n - k - 1} \right]$$

– Penalizes excessive use of independent variables
– Smaller than $r^2_{Y \cdot 12 \cdots k}$
– Useful in comparing among models
– Can decrease if an insignificant new X variable is added to the model

Example: Adjusted r² Can Decrease

Model with two variables: $\text{Oil} = \beta_0 + \beta_1 \text{Temp} + \beta_2 \text{Insulation} + \varepsilon$

Regression Statistics
Multiple R           0.982654757
R Square             0.965610371
Adjusted R Square    0.959878766
Standard Error       26.01378323
Observations         15

Model with three variables: $\text{Oil} = \beta_0 + \beta_1 \text{Temp} + \beta_2 \text{Insulation} + \beta_3 \text{Color} + \varepsilon$

Regression Statistics
Multiple R           0.983482856
R Square             0.967238528
Adjusted R Square    0.958303581
Standard Error       25.72417272
Observations         15

Adjusted r² decreases (from 0.9599 to 0.9583) when k increases from 2 to 3, even though r² itself rises. Color is not useful in explaining the variation in oil consumption.

Using the Regression Equation to Make Predictions
Predict the amount of heating oil used for a home if the average temperature is 30 °F and the insulation is 6 inches.

$$\hat{Y}_i = 562.151 - 5.437 X_{1i} - 20.012 X_{2i} = 562.151 - 5.437(30) - 20.012(6) = 278.969$$

The predicted heating oil used is 278.97 gallons.

Testing for Overall Significance (continued)
• Test Statistic:

$$F = \frac{MSR}{MSE} = \frac{SSR(\text{all}) / k}{MSE(\text{all})}$$

• Where F has k numerator and (n − k − 1) denominator degrees of freedom

Test for Overall Significance
Excel Output: Example

ANOVA
              df    SS          MS          F          Significance F
Regression    2     228014.6    114007.3    168.4712   1.65411E-09
Residual      12    8120.603    676.7169
Total         14    236135.2

• Regression df: k = 2, the number of explanatory variables
• Total df: n − 1 = 14
• F = MSR / MSE is the test statistic
• Significance F is the p-value

Test for Overall Significance: Example Solution
• $H_0$: $\beta_1 = \beta_2 = \cdots = \beta_k = 0$
• $H_1$: At least one $\beta_j \neq 0$
• $\alpha = .05$, df = 2 and 12
• Critical Value: 3.89
• Test Statistic: F = 168.47 (Excel Output)
• Decision: Reject $H_0$ at $\alpha = 0.05$.
• Conclusion: There is evidence that at least one independent variable affects Y.
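The fit, the two determination coefficients, the overall F statistic and the prediction can all be reproduced from the 15 observations. The following is a minimal sketch in plain Python, assuming the model is fitted by ordinary least squares through the normal equations; the helper names `solve` and `fit_ols` are hypothetical and are not part of Excel or PHStat.

```python
# Heating oil data from the slides: (oil in gallons, temperature in °F, insulation in inches).
DATA = [
    (275.30, 40, 3), (363.80, 27, 3), (164.30, 40, 10), (40.80, 73, 6),
    (94.30, 64, 6), (230.90, 34, 6), (366.70, 9, 6), (300.60, 8, 10),
    (237.80, 23, 10), (121.40, 63, 3), (31.40, 65, 10), (203.50, 41, 6),
    (441.10, 21, 3), (323.00, 38, 3), (52.50, 58, 10),
]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting (hypothetical helper)."""
    n = len(b)
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows):
    """Return (coefficients, SST, SSE) for y regressed on an intercept plus the X columns."""
    y = [r[0] for r in rows]
    X = [[1.0] + [float(v) for v in r[1:]] for r in rows]
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(p)]
    b = solve(xtx, xty)  # normal equations: (X'X) b = X'y
    fitted = [sum(bj * xj for bj, xj in zip(b, x)) for x in X]
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return b, sst, sse


n, k = len(DATA), 2
b, sst, sse = fit_ols(DATA)
ssr = sst - sse
r2 = ssr / sst                                   # coefficient of multiple determination
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)    # adjusted for n and k
f_stat = (ssr / k) / (sse / (n - k - 1))         # MSR / MSE
prediction = b[0] + b[1] * 30 + b[2] * 6         # 30 °F and 6 inches of insulation

print("b0, b1, b2 =", [round(v, 3) for v in b])
print("r2 =", round(r2, 4), " adjusted r2 =", round(r2_adj, 4))
print("F =", round(f_stat, 2), " prediction =", round(prediction, 2))
```

The printed values should agree with the Excel output to the digits shown there. The prediction comes out marginally above the 278.969 on the slide because the slide plugs in coefficients rounded to three decimals.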
Test for Significance: Individual Variables
• Show If Y Depends Linearly on a Single $X_j$ Individually While Holding the Effects of Other X's Fixed
• Use t Test Statistic
• Hypotheses:
– $H_0$: $\beta_j = 0$ (No linear relationship)
– $H_1$: $\beta_j \neq 0$ (Linear relationship between $X_j$ and Y)

t Test Statistic
Excel Output: Example

              Coefficients    Standard Error   t Stat      P-value
Intercept     562.1510092     21.09310433      26.65094    4.77868E-12
Temp          -5.436580588    0.336216167      -16.1699    1.64178E-09
Insulation    -20.01232067    2.342505227      -8.543127   1.90731E-06

$$t = \frac{b_j}{S_{b_j}}$$

• t Test Statistic for $X_1$ (Temperature): −16.1699
• t Test Statistic for $X_2$ (Insulation): −8.543127

t Test: Example Solution
Does temperature have a significant effect on monthly consumption of heating oil? Test at $\alpha = 0.05$.
• $H_0$: $\beta_1 = 0$
• $H_1$: $\beta_1 \neq 0$
• df = 12
• Critical Values: −2.1788 and 2.1788 (rejection regions of .025 in each tail)
• Test Statistic: t = −16.1699
• Decision: Reject $H_0$ at $\alpha = 0.05$.
• Conclusion: There is evidence of a significant effect of temperature on oil consumption, holding constant the effect of insulation.

Venn Diagrams and Estimation of Regression Model
[Figure: Venn diagram of three overlapping circles, Oil, Temp and Insulation.]
• Overlap of Oil with Temp alone: only this information is used in the estimation of $\beta_1$
• Overlap of Oil with Insulation alone: only this information is used in the estimation of $\beta_2$
• Overlap of Oil with both Temp and Insulation: this information is NOT used in the estimation of $\beta_1$ nor $\beta_2$

Confidence Interval Estimate for the Slope
Provide the 95% confidence interval for the population slope $\beta_1$ (the effect of temperature on oil consumption).

$$b_1 \pm t_{n-k-1} S_{b_1}$$

              Coefficients    Lower 95%        Upper 95%
Intercept     562.151009      516.1930837      608.108935
Temp          -5.4365806      -6.169132673     -4.7040285
Insulation    -20.012321      -25.11620102     -14.90844

$$-6.169 \leq \beta_1 \leq -4.704$$

We are 95% confident that the average consumption of oil is reduced by between 4.70 and 6.17 gallons for each 1 °F increase in temperature, holding insulation constant.
We can also perform the test for the significance of individual variables, $H_0$: $\beta_1 = 0$ vs. $H_1$: $\beta_1 \neq 0$, using this confidence interval: because the interval does not contain 0, $H_0$ is rejected at $\alpha = 0.05$.
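The standard errors, t statistics and confidence limits in the Excel output follow from $S_{b_j} = \sqrt{MSE \cdot [(X'X)^{-1}]_{jj}}$. The sketch below recomputes them in plain Python under that standard least-squares assumption. The `solve` helper is hypothetical, and the critical value 2.1788 is taken from the slide rather than computed.

```python
import math

# Heating oil data from the slides: (oil in gallons, temperature in °F, insulation in inches).
DATA = [
    (275.30, 40, 3), (363.80, 27, 3), (164.30, 40, 10), (40.80, 73, 6),
    (94.30, 64, 6), (230.90, 34, 6), (366.70, 9, 6), (300.60, 8, 10),
    (237.80, 23, 10), (121.40, 63, 3), (31.40, 65, 10), (203.50, 41, 6),
    (441.10, 21, 3), (323.00, 38, 3), (52.50, 58, 10),
]
T_CRIT = 2.1788  # two-tailed 5% critical value for 12 df, as quoted on the slide


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting (hypothetical helper)."""
    n = len(b)
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


y = [row[0] for row in DATA]
X = [[1.0, float(row[1]), float(row[2])] for row in DATA]
n, p = len(X), len(X[0])  # p = k + 1 estimated coefficients

xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(p)]
b = solve(xtx, xty)

# Column j of (X'X)^-1 is the solution of (X'X) v = e_j; only the diagonal is needed.
inv_diag = [solve(xtx, [1.0 if i == j else 0.0 for i in range(p)])[j] for j in range(p)]

sse = sum((yi - sum(bj * xj for bj, xj in zip(b, x))) ** 2 for x, yi in zip(X, y))
mse = sse / (n - p)  # n - k - 1 = 12 degrees of freedom

se = [math.sqrt(mse * d) for d in inv_diag]
t_stats = [bj / sj for bj, sj in zip(b, se)]
ci = [(bj - T_CRIT * sj, bj + T_CRIT * sj) for bj, sj in zip(b, se)]

for name, bj, sj, tj, (lo, hi) in zip(["Intercept", "Temp", "Insulation"], b, se, t_stats, ci):
    print(f"{name:<10} b={bj:10.4f} se={sj:8.4f} t={tj:9.4f} 95% CI=({lo:.3f}, {hi:.3f})")
```

A slope is significant at the 5% level when the absolute value of its t statistic exceeds `T_CRIT`, which is equivalent to its confidence interval excluding zero.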
Contribution of a Single Independent Variable $X_j$
• Let $X_j$ Be the Independent Variable of Interest

$$SSR(X_j \mid \text{all others except } X_j) = SSR(\text{all}) - SSR(\text{all others except } X_j)$$

– Measures the additional contribution of $X_j$ in explaining the total variation in Y with the inclusion of all the remaining independent variables

Contribution of a Single Independent Variable $X_j$ (example with $X_1$)

$$SSR(X_1 \mid X_2 \text{ and } X_3) = SSR(X_1, X_2 \text{ and } X_3) - SSR(X_2 \text{ and } X_3)$$

• $SSR(X_1, X_2 \text{ and } X_3)$: from the ANOVA section of the regression for $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{3i}$
• $SSR(X_2 \text{ and } X_3)$: from the ANOVA section of the regression for $\hat{Y}_i = b_0 + b_2 X_{2i} + b_3 X_{3i}$

Measures the additional contribution of $X_1$ in explaining Y with the inclusion of $X_2$ and $X_3$.

Coefficient of Partial Determination of $X_j$

$$r^2_{Yj \cdot (\text{all others})} = \frac{SSR(X_j \mid \text{all others})}{SST - SSR(\text{all}) + SSR(X_j \mid \text{all others})}$$

• Measures the proportion of variation in the dependent variable that is explained by $X_j$ while controlling for (holding constant) the other independent variables

Coefficient of Partial Determination for $X_j$ (continued)
Example: Model with two independent variables

$$r^2_{Y1 \cdot 2} = \frac{SSR(X_1 \mid X_2)}{SST - SSR(X_1, X_2) + SSR(X_1 \mid X_2)}$$

Venn Diagrams and Coefficient of Partial Determination for $X_j$
[Figure: Venn diagram of Oil, Temp and Insulation; the overlap of Oil with Temp alone is $SSR(X_1 \mid X_2)$.]

$$r^2_{Y1 \cdot 2} = \frac{SSR(X_1 \mid X_2)}{SST - SSR(X_1, X_2) + SSR(X_1 \mid X_2)}$$

Contribution of a Subset of Independent Variables
• Let $X_s$ Be the Subset of Independent Variables of Interest

$$SSR(X_s \mid \text{all others except } X_s) = SSR(\text{all}) - SSR(\text{all others except } X_s)$$

– Measures the contribution of the subset $X_s$ in explaining SST with the inclusion of the remaining independent variables

Contribution of a Subset of Independent Variables: Example
Let $X_s$ be $X_1$ and $X_3$

$$SSR(X_1 \text{ and } X_3 \mid X_2) = SSR(X_1, X_2 \text{ and } X_3) - SSR(X_2)$$

• $SSR(X_1, X_2 \text{ and } X_3)$: from the ANOVA section of the regression for $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{3i}$
• $SSR(X_2)$: from the ANOVA section of the regression for $\hat{Y}_i = b_0 + b_2 X_{2i}$

Testing Portions of Model
• Examines the Contribution of a Subset $X_s$ of Explanatory Variables to the Relationship with Y
• Null Hypothesis:
– Variables in the subset do not improve the model significantly when all other variables are included
• Alternative Hypothesis:
– At least one variable in the subset is significant when all other variables are included

Testing Portions of Model (continued)
• One-Tailed Rejection Region
• Requires Comparison of Two Regressions
– One regression includes everything
– Another regression includes everything except the portion to be tested

Partial F Test for the Contribution of a Subset of X Variables
• Hypotheses:
– $H_0$: Variables $X_s$ do not significantly improve the model given all other variables included
– $H_1$: Variables $X_s$ significantly improve the model given all others included
• Test Statistic:

$$F = \frac{SSR(X_s \mid \text{all others}) / m}{MSE(\text{all})}$$

– with df = m and (n − k − 1)
– m = number of variables in the subset $X_s$

Partial F Test for the Contribution of a Single $X_j$
• Hypotheses:
– $H_0$: Variable $X_j$ does not significantly improve the model given all others included
– $H_1$: Variable $X_j$ significantly improves the model given all others included
• Test Statistic:

$$F = \frac{SSR(X_j \mid \text{all others})}{MSE(\text{all})}$$

– with df = 1 and (n − k − 1)
– m = 1 here

Testing Portions of Model: Example
Test at the $\alpha = .05$ level to determine if the variable of average temperature significantly improves the model, given that insulation is included.

Testing Portions of Model: Example
• $H_0$: $X_1$ (temperature) does not improve the model with $X_2$ (insulation) included
• $H_1$: $X_1$ does improve the model
• $\alpha = .05$, df = 1 and 12
• Critical Value = 4.75

ANOVA (for $X_1$ and $X_2$)
              SS             MS
Regression    228014.6263    114007.313
Residual      8120.603016    676.716918
Total         236135.2293

ANOVA (for $X_2$)
              SS
Regression    51076.47
Residual      185058.8
Total         236135.2

$$F = \frac{SSR(X_1 \mid X_2)}{MSE(X_1, X_2)} = \frac{228{,}015 - 51{,}076}{676.717} = 261.47$$

Conclusion: Reject $H_0$, since 261.47 > 4.75; $X_1$ does improve the model.
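The partial F test above is plain arithmetic on the two ANOVA tables, so it can be checked directly. The sketch below uses only the sums of squares printed on the slides; the function names `partial_f` and `partial_r2` are hypothetical. The coefficient of partial determination it reports is derived from those same numbers and does not appear on a slide.

```python
def partial_f(ssr_full, ssr_reduced, mse_full, m=1):
    """Return (SSR of the tested subset given the rest, partial F statistic).

    ssr_full    -- SSR of the regression with every variable
    ssr_reduced -- SSR of the regression without the m tested variables
    mse_full    -- MSE of the regression with every variable
    """
    extra_ssr = ssr_full - ssr_reduced
    return extra_ssr, (extra_ssr / m) / mse_full


def partial_r2(extra_ssr, sst, ssr_full):
    """Coefficient of partial determination: SSR(Xj | others) / (SST - SSR(all) + SSR(Xj | others))."""
    return extra_ssr / (sst - ssr_full + extra_ssr)


# Values copied from the ANOVA tables on the slides.
SST = 236135.2293
SSR_FULL = 228014.6263      # regression on X1 (temperature) and X2 (insulation)
MSE_FULL = 676.716918
SSR_X2_ONLY = 51076.47      # regression on X2 (insulation) alone
F_CRIT = 4.75               # 5% critical value for df = 1 and 12, as quoted on the slide

extra, f_stat = partial_f(SSR_FULL, SSR_X2_ONLY, MSE_FULL, m=1)
r2_y1_2 = partial_r2(extra, SST, SSR_FULL)
reject_h0 = f_stat > F_CRIT

print("SSR(X1 | X2) =", round(extra, 2))
print("partial F    =", round(f_stat, 2), "-> reject H0" if reject_h0 else "-> do not reject H0")
print("partial r^2  =", round(r2_y1_2, 4))
```

For a single variable the partial F statistic equals the square of that variable's t statistic: (−16.1699)² is approximately 261.47, so the two tests always lead to the same decision.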