Basic Business Statistics (9th Edition)
Chapter 14: Introduction to Multiple Regression
© 2004 Prentice-Hall, Inc.

Chapter Topics
  The Multiple Regression Model
  Residual Analysis
  Testing for the Significance of the Regression Model
  Inferences on the Population Regression Coefficients
  Testing Portions of the Multiple Regression Model
  Dummy-Variables and Interaction Terms
  Logistic Regression Model

The Multiple Regression Model
  The relationship between 1 dependent and 2 or more independent variables is a linear function:

  $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$

  $Y_i$: dependent (response) variable
  $X_{1i}, \ldots, X_{ki}$: independent (explanatory) variables
  $\beta_0$: population Y-intercept
  $\beta_1, \ldots, \beta_k$: population slopes
  $\varepsilon_i$: random error

Multiple Regression Model
  [Figure: response plane over the $X_1$ and $X_2$ axes, with intercept $\beta_0$ and an observed Y at $(X_{1i}, X_{2i})$ lying a distance $\varepsilon_i$ from the plane.]
  Observed Y: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$
  Response plane: $\mu_{Y|X} = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}$

Multiple Regression Equation
  [Figure: fitted response plane over the $X_1$ and $X_2$ axes, with intercept $b_0$ and an observed Y at $(X_{1i}, X_{2i})$ lying a distance $e_i$ from the plane.]
  Observed Y: $Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + e_i$
  Multiple regression equation: $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$

Multiple Regression Equation
  Too complicated by hand! Ouch!

Interpretation of Estimated Coefficients
  Slope ($b_j$)
    The average value of Y is estimated to change by $b_j$ for each 1-unit increase in $X_j$, holding all other variables constant (ceteris paribus).
    Example: If $b_1 = -2$, then fuel oil usage (Y) is expected to decrease by an estimated 2 gallons for each 1-degree increase in temperature ($X_1$), given the inches of insulation ($X_2$).
  Y-Intercept ($b_0$)
    The estimated average value of Y when all $X_j = 0$.
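For reference, the estimates $b_0, \ldots, b_k$ that software reports for the regression equation above are the ordinary least-squares values. A compact way to state them is in matrix form; the matrix notation ($\mathbf{X}$, $\mathbf{y}$, $\mathbf{b}$) is introduced here for illustration and does not appear on the slides. $\mathbf{X}$ is assumed to be the $n \times (k+1)$ matrix whose first column is all ones and whose remaining columns hold $X_1, \ldots, X_k$, and $\mathbf{X}^{\top}\mathbf{X}$ is assumed invertible.

```latex
\mathbf{b} =
\begin{pmatrix} b_0 \\ b_1 \\ \vdots \\ b_k \end{pmatrix}
= (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{y},
\qquad
\hat{\mathbf{y}} = \mathbf{X}\mathbf{b},
\qquad
\mathbf{e} = \mathbf{y} - \hat{\mathbf{y}}
```

These are the values that minimize the sum of squared residuals $\sum_i e_i^2$, which is why the computation is normally left to software.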
Multiple Regression Model: Example
  Develop a model for estimating heating oil used for a single-family home in the month of January, based on average temperature and amount of insulation in inches.

  Oil (Gal)   Temp (°F)   Insulation (in)
  275.30      40          3
  363.80      27          3
  164.30      40          10
  40.80       73          6
  94.30       64          6
  230.90      34          6
  366.70      9           6
  300.60      8           10
  237.80      23          10
  121.40      63          3
  31.40       65          10
  203.50      41          6
  441.10      21          3
  323.00      38          3
  52.50       58          10

Multiple Regression Equation: Example
  $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki}$

  Excel output:
                  Coefficients
  Intercept       562.1510092
  X Variable 1    -5.436580588
  X Variable 2    -20.01232067

  $\hat{Y}_i = 562.151 - 5.437 X_{1i} - 20.012 X_{2i}$

  For each 1-degree increase in temperature, the estimated average amount of heating oil used decreases by 5.437 gallons, holding insulation constant.
  For each 1-inch increase in insulation, the estimated average use of heating oil decreases by 20.012 gallons, holding temperature constant.

Multiple Regression in PHStat
  PHStat | Regression | Multiple Regression …
  Excel spreadsheet for the heating oil example

Venn Diagrams and Explanatory Power of Regression
  [Figure: two overlapping circles, Oil and Temp.]
  Temp outside the overlap: variation in Temp not used in explaining variation in Oil.
  Oil outside the overlap: variation in Oil attributed to the error term (SSE).
  Overlap: variation in Oil explained by Temp, or variation in Temp used in explaining variation in Oil (SSR).

Venn Diagrams and Explanatory Power of Regression (continued)
  [Figure: overlapping circles, Oil and Temp.]
  $r^2 = \frac{SSR}{SSR + SSE}$

Venn Diagrams and Explanatory Power of Regression
  [Figure: three overlapping circles, Oil, Temp and Insulation.]
  Variation NOT explained by Temp nor Insulation: SSE.
  Overlapping variation in both Temp and Insulation is used in explaining the variation in Oil, but NOT in the estimation of $\beta_1$ nor $\beta_2$.

Coefficient of Multiple Determination
  Proportion of total variation in Y explained by all X variables taken together:

  $r^2_{Y.12 \cdots k} = \frac{SSR}{SST} = \frac{\text{Explained Variation}}{\text{Total Variation}}$

  Never decreases when a new X variable is added to the model.
    This is a disadvantage when comparing among models.

Venn Diagrams and Explanatory Power of Regression
  [Figure: three overlapping circles, Oil, Temp and Insulation.]
  $r^2_{Y.12} = \frac{SSR}{SSR + SSE}$

Adjusted Coefficient of Multiple Determination
  Proportion of variation in Y explained by all the X variables, adjusted for the sample size and the number of X variables used:

  $r^2_{adj} = 1 - \left[ \left(1 - r^2_{Y.12 \cdots k}\right) \frac{n - 1}{n - k - 1} \right]$

  Penalizes excessive use of independent variables
  Smaller than $r^2_{Y.12 \cdots k}$
  Useful in comparing among models
  Can decrease if an insignificant new X variable is added to the model

Coefficient of Multiple Determination
  Excel output:
  Regression Statistics
  Multiple R           0.982654757
  R Square             0.965610371
  Adjusted R Square    0.959878766
  Standard Error       26.01378323
  Observations         15

  R Square is $r^2_{Y.12} = \frac{SSR}{SST}$.
  Adjusted $r^2$ reflects the number of explanatory variables and the sample size, and is smaller than $r^2$.

Interpretation of Coefficient of Multiple Determination
  $r^2_{Y.12} = \frac{SSR}{SST} = .9656$
    96.56% of the total variation in heating oil can be explained by temperature and amount of insulation.
  $r^2_{adj} = .9599$
    95.99% of the total fluctuation in heating oil can be explained by temperature and amount of insulation after adjusting for the number of explanatory variables and sample size.
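The two coefficients above can be reproduced from the heating oil data. The following is a minimal sketch, assuming NumPy is available; the variable names are illustrative, and the fit uses a generic least-squares solver rather than the Excel/PHStat procedure the slides rely on.

```python
import numpy as np

# Heating oil example data: (oil in gallons, temperature in °F, insulation in inches)
data = np.array([
    [275.30, 40, 3], [363.80, 27, 3], [164.30, 40, 10], [40.80, 73, 6],
    [94.30, 64, 6], [230.90, 34, 6], [366.70, 9, 6], [300.60, 8, 10],
    [237.80, 23, 10], [121.40, 63, 3], [31.40, 65, 10], [203.50, 41, 6],
    [441.10, 21, 3], [323.00, 38, 3], [52.50, 58, 10],
])
y = data[:, 0]
# Design matrix: a column of ones for the intercept, then Temp and Insulation
X = np.column_stack([np.ones(len(y)), data[:, 1], data[:, 2]])

# Least-squares estimates b0, b1, b2
b, *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ b
sse = float(np.sum((y - y_hat) ** 2))      # unexplained variation
sst = float(np.sum((y - y.mean()) ** 2))   # total variation
ssr = sst - sse                            # explained variation

n, k = len(y), X.shape[1] - 1              # 15 observations, 2 explanatory variables
r2 = ssr / sst
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)

print("coefficients:", np.round(b, 3))
print("r2 =", round(r2, 4), " adjusted r2 =", round(r2_adj, 4))
```

The printed values should agree with the Excel output quoted in the slides (R Square 0.9656, Adjusted R Square 0.9599) up to rounding.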
Simple and Multiple Regression Compared
  The slope coefficient in a simple regression picks up the impact of the independent variable plus the impacts of other variables that are excluded from the model but are correlated with the included independent variable and the dependent variable.
  Coefficients in a multiple regression net out the impacts of other variables in the equation.
    Hence, they are called the net regression coefficients.
    They still pick up the effects of other variables that are excluded from the model but are correlated with the included independent variables and the dependent variable.

Simple and Multiple Regression Compared: Example
  Two simple regressions:
    $\text{Oil} = \beta_0 + \beta_1 \text{Temp} + \varepsilon$
    $\text{Oil} = \beta_0 + \beta_2 \text{Insulation} + \varepsilon$
  Multiple regression:
    $\text{Oil} = \beta_0 + \beta_1 \text{Temp} + \beta_2 \text{Insulation} + \varepsilon$
  The three $\beta_0$'s do not have the same value.
  The two $\beta_1$'s do not have the same value.
  The two $\beta_2$'s do not have the same value.
  The three $\varepsilon$'s are different.

Simple and Multiple Regression Compared: Slope Coefficients
  $\text{Oil} = b_0 + b_1 \text{Temp} + b_2 \text{Insulation} + e$
                Coefficients
  Intercept     562.1510092
  Temp          -5.436580588
  Insulation    -20.01232067

  $\text{Oil} = b_0 + b_1 \text{Temp} + e$
                Coefficients
  Intercept     436.4382299
  Temp          -5.462207697

  $\text{Oil} = b_0 + b_2 \text{Insulation} + e$
                Coefficients
  Intercept     345.3783784
  Insulation    -20.35027027

  Temp slope: -5.4366 (multiple) vs. -5.4622 (simple)
  Insulation slope: -20.0123 (multiple) vs. -20.3503 (simple)
  The three e's are different.

Simple and Multiple Regression Compared: r2
  $\text{Oil} = b_0 + b_1 \text{Temp} + b_2 \text{Insulation} + e$
  $\text{Oil} = b_0 + b_1 \text{Temp} + e$

  Regression Statistics (Oil on Temp only)
  Multiple R           0.86974117
  R Square             0.756449704
  Adjusted R Square    0.737715065
  Standard Error       66.51246564
  Observations         15
  Regression Statistics (Oil on Temp and Insulation)
  Multiple R           0.982654757
  R Square             0.965610371
  Adjusted R Square    0.959878766
  Standard Error       26.01378323
  Observations         15

  $\text{Oil} = b_0 + b_2 \text{Insulation} + e$
  Regression Statistics (Oil on Insulation only)
  Multiple R           0.465082527
  R Square             0.216301757
  Adjusted R Square    0.156017277
  Standard Error       119.3117327
  Observations         15

  The two simple-regression values of $r^2$ sum to 0.75645 + 0.21630 = 0.97275, which differs from the multiple-regression $r^2$ of 0.96561.

Example: Adjusted r2 Can Decrease
  $\text{Oil} = \beta_0 + \beta_1 \text{Temp} + \beta_2 \text{Insulation} + \varepsilon$
  Regression Statistics
  Multiple R           0.982654757
  R Square             0.965610371
  Adjusted R Square    0.959878766
  Standard Error       26.01378323
  Observations         15

  Try a 3rd explanatory variable:
  $\text{Oil} = \beta_0 + \beta_1 \text{Temp} + \beta_2 \text{Insulation} + \beta_3 \text{Rainfall} + \varepsilon$
  Regression Statistics
  Multiple R           0.983482856
  R Square             0.967238528
  Adjusted R Square    0.958303581
  Standard Error       25.72417272
  Observations         15

  Adjusted $r^2$ decreases (from 0.95988 to 0.95830) when k increases from 2 to 3.
  Rainfall is not useful in explaining the variation in oil consumption.

Using the Regression Equation to Make Predictions
  Predict the amount of heating oil used for a home if the average temperature is 30°F and the insulation is 6 inches.

  $\hat{Y}_i = 562.151 - 5.437 X_{1i} - 20.012 X_{2i} = 562.151 - 5.437(30) - 20.012(6) = 278.969$

  The predicted heating oil used is 278.97 gallons.

Predictions in PHStat
  PHStat | Regression | Multiple Regression …
  Check the “Confidence and Prediction Interval Estimate” box
  Excel spreadsheet for the heating oil example

Residual Plots
  Residuals vs. $\hat{Y}$: may need to transform the Y variable
  Residuals vs. $X_1$: may need to transform the $X_1$ variable
  Residuals vs. $X_2$: may need to transform the $X_2$ variable
  Residuals vs. Time: may have autocorrelation

Residual Plots: Example
  [Figure: Temperature Residual Plot, residuals against temperature.] Maybe some nonlinear relationship.
  [Figure: Insulation Residual Plot, residuals against insulation.] No discernible pattern.
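The prediction and the residuals behind the plots above can be computed directly from the heating oil data. This is a sketch under the assumption that NumPy is available; the names `x_new` and `prediction` are illustrative. Because it uses unrounded coefficients, the prediction comes out near 278.98 rather than the 278.969 obtained on the slide from coefficients rounded to three decimals.

```python
import numpy as np

# Heating oil example data: (oil in gallons, temperature in °F, insulation in inches)
data = np.array([
    [275.30, 40, 3], [363.80, 27, 3], [164.30, 40, 10], [40.80, 73, 6],
    [94.30, 64, 6], [230.90, 34, 6], [366.70, 9, 6], [300.60, 8, 10],
    [237.80, 23, 10], [121.40, 63, 3], [31.40, 65, 10], [203.50, 41, 6],
    [441.10, 21, 3], [323.00, 38, 3], [52.50, 58, 10],
])
y = data[:, 0]
X = np.column_stack([np.ones(len(y)), data[:, 1], data[:, 2]])

# Fit Oil = b0 + b1*Temp + b2*Insulation by least squares
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# Point prediction for 30 °F and 6 inches of insulation (leading 1 is the intercept term)
x_new = np.array([1.0, 30.0, 6.0])
prediction = float(x_new @ b)

# Residuals e_i = Y_i - Yhat_i, the quantities plotted against Temp and Insulation
residuals = y - X @ b

print("predicted oil use (gallons):", round(prediction, 2))
for temp, e in sorted(zip(data[:, 1], residuals)):
    print(f"temp={temp:5.1f}  residual={e:8.2f}")
```

Listing the residuals in temperature order is a rough substitute for the residual plot: a systematic run of signs along the temperature axis would suggest the nonlinear relationship mentioned on the slide.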
Testing for Overall Significance
  Shows if Y depends linearly on all of the X variables together as a group
  Use the F test statistic
  Hypotheses:
    H0: $\beta_1 = \beta_2 = \cdots = \beta_k = 0$ (no linear relationship)
    H1: At least one $\beta_j \neq 0$ (at least one independent variable affects Y)
  The null hypothesis is a very strong statement
  The null hypothesis is almost always rejected

Testing for Overall Significance (continued)
  Test statistic:

  $F = \frac{MSR}{MSE} = \frac{SSR / k}{SSE / (n - k - 1)}$

  where F has k numerator and (n - k - 1) denominator degrees of freedom.

Test for Overall Significance
  Excel output: example
  ANOVA
                df    SS          MS          F           Significance F
  Regression    2     228014.6    114007.3    168.4712    1.65411E-09
  Residual      12    8120.603    676.7169
  Total         14    236135.2

  Regression df: k = 2, the number of explanatory variables
  Total df: n - 1 = 14
  F: the test statistic, MSR / MSE
  Significance F: the p-value

Test for Overall Significance: Example Solution
  H0: $\beta_1 = \beta_2 = \cdots = \beta_k = 0$
  H1: At least one $\beta_j \neq 0$
  $\alpha = .05$
  df = 2 and 12
  Critical value: 3.89 (upper-tail rejection region with $\alpha = 0.05$)
  Test statistic: F = 168.47 (Excel output)
  Decision: Reject H0 at $\alpha = 0.05$.
  Conclusion: There is evidence that at least one independent variable affects Y.

Test for Significance: Individual Variables
  Shows if Y depends linearly on a single $X_j$ individually while holding the effects of the other X's fixed
  Use the t test statistic
  Hypotheses:
    H0: $\beta_j = 0$ (no linear relationship)
    H1: $\beta_j \neq 0$ (linear relationship between $X_j$ and Y)

t Test Statistic
  Excel output: example
                Coefficients     Standard Error    t Stat       P-value
  Intercept     562.1510092      21.09310433       26.65094     4.77868E-12
  Temp          -5.436580588     0.336216167       -16.1699     1.64178E-09
  Insulation    -20.01232067     2.342505227       -8.543127    1.90731E-06

  $t = \frac{b_j}{S_{b_j}}$
  The Temp row gives the t test statistic for $X_1$ (Temperature); the Insulation row gives the t test statistic for $X_2$ (Insulation).

t Test: Example Solution
  Does temperature have a significant effect on monthly consumption of heating oil?
  Test at $\alpha = 0.05$.
  H0: $\beta_1 = 0$
  H1: $\beta_1 \neq 0$
  df = 12
  Critical values: -2.1788 and 2.1788 (rejection regions of .025 in each tail)
  Test statistic: t = -16.1699
  Decision: Reject H0 at $\alpha = 0.05$.
  Conclusion: There is evidence of a significant effect of temperature on oil consumption, holding constant the effect of insulation.

Venn Diagrams and Estimation of Regression Model
  [Figure: three overlapping circles, Oil, Temp and Insulation.]
  Overlap of Oil with Temp only: only this information is used in the estimation of $\beta_1$.
  Overlap of Oil with Insulation only: only this information is used in the estimation of $\beta_2$.
  Overlap of all three: this information is NOT used in the estimation of $\beta_1$ nor $\beta_2$.

Confidence Interval Estimate for the Slope
  Provide the 95% confidence interval for the population slope $\beta_1$ (the effect of temperature on oil consumption).

  $b_1 \pm t_{n-k-1} S_{b_1}$

                Coefficients    Lower 95%        Upper 95%
  Intercept     562.151009      516.1930837      608.108935
  Temp          -5.4365806      -6.169132673     -4.7040285
  Insulation    -20.012321      -25.11620102     -14.90844

  $-6.169 \leq \beta_1 \leq -4.704$

  We are 95% confident that the estimated average consumption of oil is reduced by between 4.7 gallons and 6.17 gallons for each increase of 1°F, holding insulation constant.
  We can also perform the test for the significance of individual variables, H0: $\beta_1 = 0$ vs. H1: $\beta_1 \neq 0$, using this confidence interval.

Contribution of a Single Independent Variable $X_j$
  Let $X_j$ be the independent variable of interest.

  $SSR(X_j \mid \text{all others except } X_j) = SSR(\text{all}) - SSR(\text{all others except } X_j)$

  Measures the additional contribution of $X_j$ in explaining the total variation in Y with the inclusion of all the remaining independent variables.

Contribution of a Single Independent Variable $X_k$
  Measures the additional contribution of $X_1$ in explaining Y with the inclusion of $X_2$ and $X_3$.
  $SSR(X_1 \mid X_2 \text{ and } X_3) = SSR(X_1, X_2 \text{ and } X_3) - SSR(X_2 \text{ and } X_3)$

  $SSR(X_1, X_2 \text{ and } X_3)$: from the ANOVA section of the regression for $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{3i}$
  $SSR(X_2 \text{ and } X_3)$: from the ANOVA section of the regression for $\hat{Y}_i = b_0 + b_2 X_{2i} + b_3 X_{3i}$
  Note: the values of the shared coefficients $b_0$, $b_2$ and $b_3$ change between the two regression equations.

Coefficient of Partial Determination of $X_j$

  $r^2_{Yj.(\text{all others})} = \frac{SSR(X_j \mid \text{all others})}{SST - SSR(\text{all}) + SSR(X_j \mid \text{all others})}$

  Measures the proportion of variation in the dependent variable that is explained by $X_j$ while controlling for (holding constant) the other independent variables.

Coefficient of Partial Determination for $X_j$ (continued)
  Example: model with two independent variables

  $r^2_{Y1.2} = \frac{SSR(X_1 \mid X_2)}{SST - SSR(X_1, X_2) + SSR(X_1 \mid X_2)}$

Venn Diagrams and Coefficient of Partial Determination for $X_j$
  [Figure: three overlapping circles, Oil, Temp and Insulation, with the region $SSR(X_1 \mid X_2)$ marked.]

  $r^2_{Y1.2} = \frac{SSR(X_1 \mid X_2)}{SST - SSR(X_1, X_2) + SSR(X_1 \mid X_2)}$

Coefficient of Partial Determination in PHStat
  PHStat | Regression | Multiple Regression …
  Check the “Coefficient of Partial Determination” box
  Excel spreadsheet for the heating oil example

Contribution of a Subset of Independent Variables
  Let $X_s$ be the subset of independent variables of interest.

  $SSR(X_s \mid \text{all others except } X_s) = SSR(\text{all}) - SSR(\text{all others except } X_s)$

  Measures the contribution of the subset $X_s$ in explaining SST with the inclusion of the remaining independent variables.

Contribution of a Subset of Independent Variables: Example
  Let $X_s$ be $X_1$ and $X_3$.

  $SSR(X_1 \text{ and } X_3 \mid X_2) = SSR(X_1, X_2 \text{ and } X_3) - SSR(X_2)$

  $SSR(X_1, X_2 \text{ and } X_3)$: from the ANOVA section of the regression for $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{3i}$
  $SSR(X_2)$: from the ANOVA section of the regression for $\hat{Y}_i = b_0 + b_2 X_{2i}$

Testing Portions of Model
  Examines the contribution of a subset $X_s$ of explanatory variables to the relationship with Y
  Null hypothesis: variables in the subset do not improve the model significantly when all other variables are included
  Alternative hypothesis: at least one variable in the subset is significant when all other variables are included

Testing Portions of Model (continued)
  One-tailed rejection region
  Requires comparison of two regressions
    One regression includes everything
    Another regression includes everything except the portion to be tested

Partial F Test for the Contribution of a Subset of X Variables
  Hypotheses:
    H0: Variables $X_s$ do not significantly improve the model given all other variables included
    H1: Variables $X_s$ significantly improve the model given all others included
  Test statistic:

  $F = \frac{SSR(X_s \mid \text{all others}) / m}{MSE(\text{all})}$

  with df = m and (n - k - 1), where m = number of variables in the subset $X_s$.

Partial F Test for the Contribution of a Single $X_j$
  Hypotheses:
    H0: Variable $X_j$ does not significantly improve the model given all others included
    H1: Variable $X_j$ significantly improves the model given all others included
  Test statistic:

  $F = \frac{SSR(X_j \mid \text{all others})}{MSE(\text{all})}$

  with df = 1 and (n - k - 1); m = 1 here.

Testing Portions of Model: Example
  Test at the $\alpha = .05$ level to determine if the variable of average temperature significantly improves the model, given that insulation is included.
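The partial F statistic for this question can be sketched from the two regressions it compares. The code below is an illustration under stated assumptions: NumPy is available, the helper `regression_ss` is a hypothetical name introduced here, and the critical value 4.75 (df = 1 and 12) is taken from the slides rather than computed.

```python
import numpy as np

# Heating oil example data: (oil in gallons, temperature in °F, insulation in inches)
data = np.array([
    [275.30, 40, 3], [363.80, 27, 3], [164.30, 40, 10], [40.80, 73, 6],
    [94.30, 64, 6], [230.90, 34, 6], [366.70, 9, 6], [300.60, 8, 10],
    [237.80, 23, 10], [121.40, 63, 3], [31.40, 65, 10], [203.50, 41, 6],
    [441.10, 21, 3], [323.00, 38, 3], [52.50, 58, 10],
])
y, temp, insulation = data[:, 0], data[:, 1], data[:, 2]


def regression_ss(columns, y):
    """Return (SSR, SSE) for a least-squares fit of y on the given columns plus an intercept.

    Hypothetical helper for this sketch; not part of any library.
    """
    X = np.column_stack([np.ones(len(y))] + list(columns))
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = float(np.sum((y - X @ b) ** 2))
    sst = float(np.sum((y - y.mean()) ** 2))
    return sst - sse, sse


n, k, m = len(y), 2, 1                     # 15 homes, 2 variables in full model, 1 tested

ssr_full, sse_full = regression_ss([temp, insulation], y)   # model with X1 and X2
ssr_reduced, _ = regression_ss([insulation], y)             # model with X2 only

ssr_x1_given_x2 = ssr_full - ssr_reduced   # SSR(X1 | X2)
mse_full = sse_full / (n - k - 1)          # MSE(X1, X2)
F = (ssr_x1_given_x2 / m) / mse_full

critical_value = 4.75                      # from the slides: alpha = .05, df = 1 and 12
print("SSR(X1 | X2) =", round(ssr_x1_given_x2, 2))
print("partial F =", round(F, 2), "reject H0:", F > critical_value)
```

Fitting both models from the raw data, rather than copying the sums of squares from two ANOVA tables, keeps the comparison in one place and makes it easy to test a different subset by changing the column lists.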
Testing Portions of Model: Example
  H0: $X_1$ (temperature) does not improve the model with $X_2$ (insulation) included
  H1: $X_1$ does improve the model
  $\alpha = .05$, df = 1 and 12
  Critical value = 4.75

  ANOVA (for $X_1$ and $X_2$)
                SS             MS
  Regression    228014.6263    114007.313
  Residual      8120.603016    676.716918
  Total         236135.2293

  ANOVA (for $X_2$)
                SS
  Regression    51076.47
  Residual      185058.8
  Total         236135.2

  $F = \frac{SSR(X_1 \mid X_2)}{MSE(X_1, X_2)} = \frac{228{,}015 - 51{,}076}{676.717} = 261.47$

  Conclusion: Reject H0; $X_1$ does improve the model.

Testing Portions of Model in PHStat
  PHStat | Regression | Multiple Regression …
  Check the “Coefficient of Partial Determination” box
  Excel spreadsheet for the heating oil example

Do We Need to Do This for One Variable?
  The F test for the contribution of a single variable after all other variables are included in the model is IDENTICAL to the t test of the slope for that variable.
  The only reason to perform an F test is to test several variables together.

Dummy-Variable Models
  Categorical explanatory variable with 2 or more levels
    Two-level examples: Yes or No, On or Off
    Use a dummy variable (coded as 0 or 1)
  Only intercepts are different
  Assumes equal slopes across categories
  The number of dummy variables needed is (number of levels - 1)
  Regression model has the same form:

  $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$

Dummy-Variable Models (with 2 Levels)
  Given: $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$
    Y = assessed value of house
    $X_1$ = square footage of house
    $X_2$ = desirability of neighborhood: 0 if undesirable, 1 if desirable
  Desirable ($X_2 = 1$): $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2(1) = (b_0 + b_2) + b_1 X_{1i}$
  Undesirable ($X_2 = 0$): $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2(0) = b_0 + b_1 X_{1i}$
  Same slopes

Dummy-Variable Models (with 2 Levels) (continued)
  [Figure: Y (assessed value) against $X_1$ (square footage); two parallel lines with the same slope $b_1$ and different intercepts, $b_0 + b_2$ and $b_0$.]
Interpretation of the Dummy-Variable Coefficient (with 2 Levels)
  Example: $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} = 20 + 5 X_{1i} + 6 X_{2i}$
    Y: annual salary of college graduate, in thousand $
    $X_1$: GPA
    $X_2$: 0 if non-business degree, 1 if business degree
  With the same GPA, college graduates with a business degree are making an estimated 6 thousand dollars more than graduates with a non-business degree, on average.

Dummy-Variable Models (with 3 Levels)
  Given:
    Y = assessed value of the house (1000 $)
    $X_1$ = square footage of the house
    Style of the house = Split-level, Ranch, Tudor (3 levels; need 2 dummy variables)
    $X_2$ = 1 if Split-level, 0 if not
    $X_3$ = 1 if Ranch, 0 if not

  $\hat{Y}_i = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3$

Interpretation of the Dummy-Variable Coefficients (with 3 Levels)
  Given the estimated model: $\hat{Y}_i = 20.43 + 0.045 X_{1i} + 18.84 X_{2i} + 23.53 X_{3i}$
  For Split-level ($X_2 = 1$): $\hat{Y}_i = 20.43 + 0.045 X_{1i} + 18.84$
  For Ranch ($X_3 = 1$): $\hat{Y}_i = 20.43 + 0.045 X_{1i} + 23.53$
  For Tudor: $\hat{Y}_i = 20.43 + 0.045 X_{1i}$
  With the same footage, a Split-level will have an estimated average assessed value of 18.84 thousand dollars more than a Tudor.
  With the same footage, a Ranch will have an estimated average assessed value of 23.53 thousand dollars more than a Tudor.

Regression Model Containing an Interaction Term
  Hypothesizes interaction between a pair of X variables
    Response to one X variable varies at different levels of another X variable
  Contains a cross-product term:

  $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i} + \varepsilon_i$

  Can be combined with other models, e.g., the dummy-variable model

Effect of Interaction
  Given: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i} + \varepsilon_i$
  Without the interaction term, the effect of $X_1$ on Y is measured by $\beta_1$
  With the interaction term, the effect of $X_1$ on Y is measured by $\beta_1 + \beta_3 X_2$
  The effect changes as $X_2$ changes
Interaction Example
  $Y = 1 + 2X_1 + 3X_2 + 4X_1 X_2$
  For $X_2 = 1$: $Y = 1 + 2X_1 + 3(1) + 4X_1(1) = 4 + 6X_1$
  For $X_2 = 0$: $Y = 1 + 2X_1 + 3(0) + 4X_1(0) = 1 + 2X_1$
  [Figure: Y against $X_1$ showing the two lines, one for each value of $X_2$.]
  The effect (slope) of $X_1$ on Y depends on the $X_2$ value.

Interaction Regression Model Worksheet
  Case, i    Yi    X1i    X2i    X1i X2i
  1          1     1      3      3
  2          4     8      5      40
  3          1     3      2      6
  4          3     5      6      30
  :          :     :      :      :

  Multiply $X_1$ by $X_2$ to get $X_1 X_2$
  Run the regression with Y, $X_1$, $X_2$, $X_1 X_2$

Interpretation When There Are 3+ Levels
  Consider the effects of gender (male or female) and working status (working part-time, working full-time or not working) on income (Y).

  $Y = \beta_0 + \beta_1 \text{Male} + \beta_2 \text{Part-time} + \beta_3 \text{Full-time} + \beta_4 \text{Male} \cdot \text{Part-time} + \beta_5 \text{Male} \cdot \text{Full-time}$

  Male = 0 if female; 1 if male
  Part-time = 1 if working part-time; 0 if working full-time or not working
  Full-time = 1 if working full-time; 0 if working part-time or not working
  Male•Part-time = 1 if male and working part-time; 0 otherwise (= Male times Part-time)
  Male•Full-time = 1 if male and working full-time; 0 otherwise (= Male times Full-time)

Interpretation When There Are 3+ Levels (continued)
  $Y = \beta_0 + \beta_1 \text{Male} + \beta_2 \text{Part-time} + \beta_3 \text{Full-time} + \beta_4 \text{Male} \cdot \text{Part-time} + \beta_5 \text{Male} \cdot \text{Full-time}$

            Not-working            Part-time                                  Full-time
  Female    $\beta_0$              $\beta_0 + \beta_2$                        $\beta_0 + \beta_3$
  Male      $\beta_0 + \beta_1$    $\beta_0 + \beta_1 + \beta_2 + \beta_4$    $\beta_0 + \beta_1 + \beta_3 + \beta_5$

Interpreting Results
                 Female                 Male                                       Difference
  Not-working    $\beta_0$              $\beta_0 + \beta_1$                        $\beta_1$
  Part-time      $\beta_0 + \beta_2$    $\beta_0 + \beta_1 + \beta_2 + \beta_4$    $\beta_1 + \beta_4$
  Full-time      $\beta_0 + \beta_3$    $\beta_0 + \beta_1 + \beta_3 + \beta_5$    $\beta_1 + \beta_5$

  Main effects: Male, Part-time and Full-time
  Interaction effects: Male•Part-time and Male•Full-time
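The interaction example and the worksheet step above can be sketched in a few lines. The function names `y_model` and `slope_x1` are illustrative choices made here; the coefficients 1, 2, 3 and 4 and the four worksheet cases come from the slides.

```python
# Interaction example from the slides: Y = 1 + 2*X1 + 3*X2 + 4*X1*X2
def y_model(x1, x2):
    """Value of Y for given X1 and X2, including the cross-product term."""
    return 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2


def slope_x1(x2):
    """Effect of a one-unit increase in X1 on Y at a fixed X2: beta1 + beta3 * X2."""
    return 2 + 4 * x2


# The slope of X1 changes with X2: 2 when X2 = 0, 6 when X2 = 1
for x2 in (0, 1):
    rise = y_model(1, x2) - y_model(0, x2)   # change in Y as X1 goes from 0 to 1
    print(f"X2={x2}: slope of X1 = {slope_x1(x2)} (check via difference: {rise})")

# Worksheet step: build the X1*X2 column before running the regression
x1_col = [1, 8, 3, 5]
x2_col = [3, 5, 2, 6]
cross_product = [a * b for a, b in zip(x1_col, x2_col)]
print("X1*X2 column:", cross_product)
```

The cross-product column is then supplied to the regression as an ordinary explanatory variable alongside $X_1$ and $X_2$.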
Evaluating the Presence of Interaction with Dummy-Variable
  Suppose $X_1$ and $X_2$ are numerical variables and $X_3$ is a dummy variable
  To test if the slopes of Y with $X_1$ and/or $X_2$ are the same for the two levels of $X_3$
  Model:

  $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{1i} X_{3i} + \beta_5 X_{2i} X_{3i} + \varepsilon_i$

  Hypotheses:
    H0: $\beta_4 = \beta_5 = 0$ (no interaction between $X_1$ and $X_3$ or $X_2$ and $X_3$)
    H1: $\beta_4$ and/or $\beta_5 \neq 0$ ($X_1$ and/or $X_2$ interacts with $X_3$)
  Perform a partial F test, where $X_4$ and $X_5$ denote the cross-product terms $X_1 X_3$ and $X_2 X_3$:

  $F = \frac{\left[ SSR(X_1, X_2, X_3, X_4, X_5) - SSR(X_1, X_2, X_3) \right] / 2}{MSE(X_1, X_2, X_3, X_4, X_5)}$

Evaluating the Presence of Interaction with Numerical Variables
  Suppose $X_1$, $X_2$ and $X_3$ are numerical variables
  To test if the independent variables interact with each other
  Model:

  $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{1i} X_{2i} + \beta_5 X_{1i} X_{3i} + \beta_6 X_{2i} X_{3i} + \varepsilon_i$

  Hypotheses:
    H0: $\beta_4 = \beta_5 = \beta_6 = 0$ (no interaction among $X_1$, $X_2$ and $X_3$)
    H1: at least one of $\beta_4, \beta_5, \beta_6 \neq 0$ (at least one pair of $X_1$, $X_2$, $X_3$ interact with each other)
  Perform a partial F test, where $X_4$, $X_5$ and $X_6$ denote the cross-product terms $X_1 X_2$, $X_1 X_3$ and $X_2 X_3$:

  $F = \frac{\left[ SSR(X_1, X_2, X_3, X_4, X_5, X_6) - SSR(X_1, X_2, X_3) \right] / 3}{MSE(X_1, X_2, X_3, X_4, X_5, X_6)}$

Logistic Regression Model
  Enables the use of a regression model to predict the probability of a particular categorical response for a given set of explanatory variables
  Based on the odds ratio
    Represents the probability of a success compared with the probability of failure

  $\text{Odds ratio} = \frac{\text{probability of success}}{1 - \text{probability of success}}$

Logistic Regression Model (continued)
  Logistic regression model:
    $\ln(\text{odds ratio}) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$
  Logistic regression equation:
    $\ln(\text{estimated odds ratio}) = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki}$
  Estimated odds ratio:
    $\text{estimated odds ratio} = e^{\ln(\text{estimated odds ratio})}$
  Estimated probability of success:
    $\text{estimated probability of success} = \frac{\text{estimated odds ratio}}{1 + \text{estimated odds ratio}}$

Interpretation of Estimated Slope Coefficients
  The logistic regression equation has to be estimated using computer statistical software, e.g.
  Minitab®
  The estimated slope coefficient $b_j$ measures the estimated change in the natural logarithm of the odds ratio as a result of a one-unit change in the independent variable $X_j$, holding constant the effects of all the other independent variables

The Deviance Statistic
  Used to test whether the logistic regression is a good-fitting model
  Hypotheses:
    H0: The model is a good-fitting model
    H1: The model is not a good-fitting model
  Test statistic:
    The deviance statistic has a $\chi^2$ distribution with (n - k - 1) degrees of freedom
    The rejection region is always in the upper tail

Testing Significance of an Independent Variable
  Hypotheses:
    H0: $\beta_j = 0$ ($X_j$ is not significant)
    H1: $\beta_j \neq 0$ ($X_j$ is significant)
  Test statistic:
    The Wald statistic is normally distributed
    A two-tail test with left-tail and right-tail rejection regions

Chapter Summary
  Developed the Multiple Regression Model
  Discussed Residual Plots
  Addressed Testing the Significance of the Multiple Regression Model
  Discussed Inferences on Population Regression Coefficients
  Addressed Testing Portions of the Multiple Regression Model
  Discussed Dummy-Variables and Interaction Terms
  Addressed Logistic Regression Model
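Returning to the logistic regression equation covered in this chapter, the chain from coefficients to estimated probability can be sketched as follows. The function names and the coefficient values in the usage lines are hypothetical, chosen only for illustration; the slides give no fitted logistic model, and estimating the coefficients themselves is left to statistical software as the slides note.

```python
import math


def odds_to_probability(odds):
    """Estimated probability of success = odds / (1 + odds)."""
    return odds / (1 + odds)


def estimated_probability(b, x):
    """Estimated probability of success from a fitted logistic regression equation.

    b: [b0, b1, ..., bk] estimated coefficients (hypothetical values in this sketch)
    x: [X1, ..., Xk] values of the explanatory variables
    """
    # ln(estimated odds ratio) = b0 + b1*X1 + ... + bk*Xk
    log_odds = b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))
    # estimated odds ratio = e^(ln estimated odds ratio)
    odds = math.exp(log_odds)
    return odds_to_probability(odds)


# Hypothetical coefficients, for illustration only
b_example = [-1.0, 0.5]
for x1 in (0.0, 2.0, 4.0):
    p = estimated_probability(b_example, [x1])
    print(f"X1={x1}: estimated probability of success = {p:.3f}")
```

With these made-up coefficients, each one-unit increase in $X_1$ adds 0.5 to the natural logarithm of the odds, which matches the interpretation of $b_j$ given in the slides.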