Polynomial Regression Polynomial regression model may contain one, two or more than two predictor variables. Further, each predictor variable may be present in various powers. Fitting of polynomial regression models presents no new problems since they are special cases of the general linear regression model. Power Cells Example: A study of the effects of the charge rate and temperature on the life of a new type of power cell. The charge rate (X1) was controlled at three levels. The ambient temperature (X2) was controlled at three levels. Life of power cell (Y) measured by the number of charge-discharge cycles that a power cell underwent before it failed. Start with a second-order polynomial regression model with two predictor variables Yi 0 1 X i1 2 X i 2 11 X i21 22 X i22 12 X i1 X i 2 i Fitted Second-Order Polynomial Model Case Summaries 1 2 3 4 5 6 7 8 9 10 11 x1 Number of Cycles Yi Charge Rate X1 Temperature X2 xi2 x22 x1 x2 150.00 86.00 49.00 288.00 157.00 131.00 184.00 109.00 279.00 235.00 224.00 .60 1.00 1.40 .60 1.00 1.00 1.00 1.40 .60 1.00 1.40 10.00 10.00 10.00 20.00 20.00 20.00 20.00 20.00 30.00 30.00 30.00 1 0 1 1 0 0 0 1 1 0 1 1 1 1 0 0 0 0 0 1 1 1 1 0 -1 0 0 0 0 0 -1 0 1 X1 1.0 0.4 x2 X 2 20 10 Use of the new centered and scaled variables rather than original variables can reduce the correlation between the first power and second power terms. Correlation between X1 and X12: 0.991 2 X2 and X2 : 0.986 x1 and x12: 0 x2 and x22: 0 Note there are three repeated combinations of X1 and X2 1 Residuals Plots correlation coefficient=0.974 2 Test of Fit of the Second-Order Polynomial Model SSPE 1,404.67 ANOVA Table Number of Cycles * X1X2 Between Groups Within Groups (Combined) Total Sum of Squares 59201.3 8 Mean Square 7400.17 1404.67 2 702.333 60606.0 10 df F 10.537 Sig. .090 SSLF SSE SSPE 5,240.44 1,404.67 3,835.77 F* SSLF SSPE 3,835.77 1,404.67 182 . c p nc 3 2 .05 F .95;3,2 19.2 F * 182 . 19.2 second - order polynomial is a good fit 3 Partial Test of the Second-Order Terms (Is first-order model sufficient?). Partial F-Test H 0 : 11 22 12 0 H a : not all betas 0 F* SSR X 12 , X 22 , X 1 X 2 | X 1 , X 2 MSE 3 SSR X 12 , X 22 , X 1 X 2 | X 1 , X 2 SSR X 12 | X 1 , X 2 SSR X 22 | X 1 , X 2 , X 12 SSR X 1 X 2 | X 1 , X 2 , X 12 , X 22 1,646.0 284.9 529.0 2,459.9 2,459.9 1,0481 . .78 3 F .95;3,5 5.41 F* F * .78 5.41 H 0 : 11 22 12 0 No curvature and interaction effects are needed. Fit of First-Order Model 4 First Order Model Yi 0 1 X i1 i 2 X i 2 i Y 160.58 139.58 X 7.55 X 1 2 Coefficients a Model 1 (Constant) Charge Rate Temperature Unstandardized Coefficients Std. B Error 160.583 41.615 -139.583 31.665 7.550 1.267 Standar dized Coeffici ents Beta -.556 .751 t 3.859 -4.408 5.961 Sig. .005 .002 .000 a. Dependent Variable: Number of Cycles B t1 .10 2 2 t .975;8 2.306 sb1 31665 . sb2 1267 . Simultaneous 90% CI: -139.582.306*31.665, 7.552.306*1.267 212.6 1 66.5 4.6 2 10.5 5 Interaction Regression Models Additive Model Additive Model EY 10 2 X1 5 X 2 700 600 500 400 y 300 200 100 0 12 10 8 6 4 2 X1 Synergistic or Reinforcement Model 0 40 20 30 0 10 X2 70 50 60 Reinforcement Interaction Model EY 10 2 X1 5 X 2 .5 X1 X 2 1000 800 600 y 400 200 0 12 10 8 6 4 2 0 X1 Antagonistic or Interference Model EY 10 2 X1 5 X 2 .5 X1 X 2 40 20 30 0 10 X2 70 50 60 Interference Interaction Model 700 600 500 400 y 300 200 100 0 12 10 8 6 4 2 X1 0 40 20 30 0 10 X2 70 50 60 6 Curvilinear Model with Interaction Effects EY 165 2.6 X1 53 . loge X 2 .6 X1 loge X 2 Curvilinear Model with Interaction Effects 200 190 y 180 12 10 8 6 4 2 X1 0 60 70 50 30 40 20 10 X2 7 Body Fat Example The GLM Procedure Dependent Variable: y Source DF Sum of Squares Mean Square F Value Pr > F Model 6 407.6995001 67.9499167 10.07 0.0003 Error 13 87.6899999 6.7453846 Corrected Total 19 495.3895000 R-Square Coeff Var Root MSE y Mean 0.822988 12.86055 2.597188 20.19500 Source DF Type I SS Mean Square F Value Pr > F x1c x2c x3c x1c*x2c x1c*x3c x2c*x3c 1 1 1 1 1 1 352.2697968 33.1689128 11.5459022 1.4957180 2.7043343 6.5148360 352.2697968 33.1689128 11.5459022 1.4957180 2.7043343 6.5148360 52.22 4.92 1.71 0.22 0.40 0.97 <.0001 0.0450 0.2134 0.6455 0.5376 0.3437 Parameter Estimate Standard Error t Value Pr > |t| Intercept x1c x2c x3c x1c*x2c x1c*x3c 20.52689353 3.43780807 -2.09471734 -1.61633724 0.00887556 -0.08479084 1.07362646 3.57866572 3.03676957 1.90721068 0.03085046 0.07341774 19.12 0.96 -0.69 -0.85 0.29 -1.15 <.0001 0.3543 0.5025 0.4121 0.7781 0.2689 0.09041539 0.09200130 0.98 0.3437 x2c*x3c H 0 :4 5 6 0 H a : not all betas = 0 F* SSR X1 X 2 , X1 X 3 , X 2 X 3 | X1 , X 2 , X 3 1496 . 2.704 6.515 10.715 SSR X 1 X 2 , X 1 X 3 , X 2 X 3 | X 1 , X 2 , X 3 10.715 6.745 .53 3 3 MSE .05 F .95;313 , 3.41 F * .53 3.41 H 0 8 Qualitative Predictors One Qualitative Predictor An economists wishes to relate speed with which a particular insurance innovation is adopted (Y) to the size of the insurance firm (X1) and type of firm. The second predictor, type of firm, is qualitative and is composed of two classes - stock companies and mutual companies. Indicator Variables There are many ways to express qualitative variables. We shall use indicator variables that take on the values 0 and 1. Two Indicators for Two Classes leads to a Singular Matrix 1 X2 0 1 X3 0 if stock company otherwise if mutual company otherwise First Order Model Yi 0 1 Xi1 2 Xi 2 3 Xi 3 i Design Matrix With n = 4, First Two Observations Stock Companies, Next Two Observations Mutual Companies 1 1 X 1 1 The X’X matrix has linear dependency between columns: COL(1) = COL(3) + COL(4) 4 4 X i1 X X i 1 2 2 Rule: A qualitative variable with c classes will be represented by c-1 indicator variables, each taking on the values 0 and 1. Indicator variables are frequently also called dummy variables or binary variables. X11 1 0 X12 1 0 X13 0 1 X14 0 1 4 X i 1 4 2 i1 2 X i21 i 1 2 X i1 i 1 X i1 2 X i1 0 i 1 4 i 3 2 4 X i1 i 3 0 2 Interpretation of Regression Coefficients 9 Model Yi 0 1 X i1 2 X i 2 i X i1 size of firm 1 if stock company X i2 0 otherwise Response Function E Y 0 1 X 1 2 X 2 E Y 0 1 X 1 2 0 0 1 X 1 Mutual firms E Y 0 1 X 1 2 1 0 2 1 X 1 Stock Firms 2 indicates how much higher (lower) the response function for stock firms is than the one for mutual firms, for any given size of firm. 10 Example Ca se S um mariesa 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Total N Numbr of Months Elapsed 17 26 21 30 22 0 12 19 4 16 28 15 11 38 31 21 20 13 30 14 20 Indicat or Code 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 20 Siz e of Firm 151 92 175 31 104 277 210 120 290 238 164 272 295 68 85 224 166 305 124 246 20 X1X2 0 0 0 0 0 0 0 0 0 0 164 272 295 68 85 224 166 305 124 246 20 a. Limited to first 100 cases. ANOVAa Model 1 Regres sion Residual Total Sum of Squares 1504.413 176.387 1680.800 df 2 17 19 Mean Square 752.207 10.376 F 72.497 Sig. .000b a. Dependent Variable: Numbr of Mont hs Elapsed b. Independent Variables: (Constant), Indic ator Code, Size of Firm 11 Coefficientsa Model 1 (Constant) Size of Firm Indicator Code Unstandardized Coefficients B Std. Error 33.874 1.814 Standar dized Coefficie nts Beta t 18.675 Sig. .000 -.102 .009 -.911 -11.443 .000 8.055 1.459 .439 5.521 .000 a. Dependent Variable: Numbr of Months Elapsed Fitted Regreesion Function Insurance Innovation Example 40 Numbr of Months Elapsed 30 20 10 Indicator Code 0 1 -10 0 0 100 200 300 400 Size of Firm 12 Model Containing Interactions Effects Model Yi 0 1 X i1 2 X i 2 3 X i1 X i 2 i X i1 size of firm 1 if stock company Xi2 0 otherwise Response Function E Y 0 1 X 1 2 X 2 3 X 1 X 2 E Y 0 1 X 1 2 0 3 0 0 1 X 1 Mutual firms E Y 0 1 X 1 2 1 3 X 1 0 2 1 3 X 1 Stock Firms Example: Insurance Innovation ANOVAa Model 1 Regres sion Residual Total Sum of Squares 1504.419 176.381 1680.800 df 3 16 19 Mean Square 501.473 11.024 F 45.490 Sig. .000b a. Dependent Variable: Numbr of Mont hs Elapsed b. Independent Variables: (Constant), X1X2, Size of Firm, Indicator Code Coeffi cientsa Model 1 (Const ant) Size of Firm Indicat or Code X1X2 Unstandardized Coeffic ient s B St d. Error 33.838 2.441 St andar diz ed Coeffic ie nts Beta t 13.864 Sig. .000 -.102 .013 -.909 -7. 779 .000 8.131 3.654 .443 2.225 .041 -4. 2E-04 .018 -.005 -.023 .982 a. Dependent Variable: Numbr of Months Elapsed 13 More Complex Models More Complex Models 1 X2 0 1 X3 0 Consider regression of tool wear (Y) on tool speed (X1) and tool model, the latter variables has four classes (M1, M2, M3, M4). if tool model M1 otherwise if tool model M2 otherwise 1 if tool model M3 X4 0 otherwise Yi 0 1 Xi1 2 Xi 2 3 Xi 3 4 Xi 4 i First Order Qualitative Variable Coding Tool Model X1 X2 X3 X4 M1 Xi1 1 0 0 M2 Xi1 0 1 0 M3 Xi1 0 0 1 M4 Xi1 0 0 0 Response Function E Y 0 1 X 1 2 X 2 3 X 3 4 X 4 E Y 0 1 X 1 E Y 0 2 1 X 1 E Y 0 3 1 X 1 E Y 0 4 1 X 1 M4 M1 M2 M3 Models With Qualitative Predictors Only Analysis of Variance Models Analysis of Covariance Models 14