Stat 301– Lecture 26

Model Selection
Response: Highway MPG
Explanatory: 13 explanatory variables
Indicator variables for types of car: Sports Car, SUV, Wagon, Minivan
There is also an indicator for Pickup, but there are no pickups in the data.

Indicator Variables
The indicator variable takes on the value 1 if it is that kind of vehicle and 0 otherwise.
If all four indicator variables are 0, then the vehicle is a Sedan.

Explanatory Variables
Indicator variables for All Wheel and Rear Wheel drive.
If both indicator variables are 0, then the vehicle has Front Wheel drive.

Explanatory Variables
Engine size (liters)
Cylinders (number)
Horsepower
Weight (pounds)
Wheel Base (inches)
Length (inches)
Width (inches)

Forward Selection
Fit Model, Personality: Stepwise
Y (Response): Highway MPG
Put all 13 variables into the Construct Model Effects box.
Click on Run Model.

Stepwise Fit
Stopping Rule: P-value Threshold
Prob to Enter = 0.050
Prob to Leave = 0.050
Direction: Forward
Click on Go.

Stepwise Fit for Highway MPG
Stepwise Regression Control
Stopping Rule: P-value Threshold, Prob to Enter = 0.05, Prob to Leave = 0.05, Direction: Forward
SSE = 1268.269, DFE = 96, RMSE = 3.6347126
RSquare = 0.6543, RSquare Adj = 0.6435
Cp = 12.122214, p = 4, AICc = 548.4498, BIC = 560.8374

Current Estimates
Parameter | Entered | Estimate | nDF | SS | F Ratio | Prob>F
Intercept | Yes | 33.0251054 | 1 | 0 | 0.000 | 1
Sports Car | No | 0 | 1 | 6.928014 | 0.522 | 0.47185
SUV | No | 0 | 1 | 48.78157 | 3.800 | 0.0542
Wagon | No | 0 | 1 | 1.952058 | 0.146 | 0.70281
Minivan | No | 0 | 1 | 9.296884 | 0.702 | 0.40437
All Wheel | No | 0 | 1 | 45.27064 | 3.517 | 0.06383
Rear Wheel | No | 0 | 1 | 2.570807 | 0.193 | 0.66146
Engine | No | 0 | 1 | 38.31665 | 2.960 | 0.08863
Cylinders | No | 0 | 1 | 31.96371 | 2.456 | 0.12039
Horsepower | Yes | -0.0257556 | 1 | 181.8159 | 13.762 | 0.00035
Weight | Yes | -0.0062597 | 1 | 703.0322 | 53.215 | 8.5e-11
Wheel Base | Yes | 0.20569376 | 1 | 106.6696 | 8.074 | 0.00548
Length | No | 0 | 1 | 3.045635 | 0.229 | 0.6336
Width | No | 0 | 1 | 0.141521 | 0.011 | 0.91821

For the entered variables, SS, F Ratio and Prob>F measure each variable's contribution given the others in the model. For the variables not entered, they describe what would happen if that variable were entered next. No remaining variable has Prob>F below 0.05 (the smallest is SUV, 0.0542), so forward selection stops.

Step History
Step | Parameter | Action | Sig Prob | Seq SS | RSquare | Cp | p | AICc | BIC
1 | Weight | Entered | 0.0000 | 2065.965 | 0.5631 | 35.606 | 2 | 567.486 | 575.052
2 | Horsepower | Entered | 0.0001 | 228.0966 | 0.6253 | 18.88 | 3 | 554.308 | 564.308
3 | Wheel Base | Entered | 0.0055 | 106.6696 | 0.6543 | 12.122 | 4 | 548.45 | 560.837

Forward Selection
Three variables are added: Weight, Horsepower, Wheel Base.
After the last step, all three variables are still statistically significant.

Forward Selection
Model with Weight, Horsepower and Wheel Base
R² = 0.6543, adj R² = 0.6435
RMSE = 3.635
AICc = 548.45, BIC = 560.84
Cp = 12.1222

Stepwise Fit
Stopping Rule: P-value Threshold
Prob to Enter = 0.050
Prob to Leave = 0.050
Direction: Backward
Click on Enter All, then click on Go.

Backward Selection
Eight variables are removed: Length, Rear Wheel, Wagon, Width, Engine, Wheel Base, Weight, Sports Car.
All variables left are statistically significant.

Backward Selection
Model with SUV, Minivan, All Wheel, Cylinders and Horsepower
R² = 0.6874, adj R² = 0.6708
RMSE = 3.493
AICc = 542.96, BIC = 559.98
Cp = 6.1511

Backward Selection
The final model from backward selection is better than the final model from forward selection on every criterion. It has a higher R² and a higher adj R², and lower RMSE, AICc, BIC and Cp values.

Mixed Selection (Forward)
Stopping Rule: P-value Threshold
Prob to Enter = 0.050
Prob to Leave = 0.050
Direction: Mixed
Click on Go.

Mixed Selection (Forward)
Three variables are added: Weight, Horsepower, Wheel Base.
No variables are removed.
This is the same result as forward selection.

Mixed Selection (Backward)
Stopping Rule: P-value Threshold
Prob to Enter = 0.050
Prob to Leave = 0.050
Direction: Mixed
Click on Enter All, then click on Go.

Mixed Selection (Backward)
Eight variables are removed: Length, Rear Wheel, Wagon, Width, Engine, Wheel Base, Weight, Sports Car.
No variables are added.
This is the same result as backward selection.

All Possible Models
2^13 - 1 = 8191 models are possible (every nonempty subset of the 13 variables).
1-variable models, listed in order of R² value.
2-variable models, listed in order of R² value.
etc.
13-variable (full) model.

All Possible Models
You can specify the maximum number of variables in a model.
You can specify the maximum number of models displayed for each number of variables.

All Possible Models
The model with all 13 variables has the highest R² value.
R² = 0.7145, adj R² = 0.6713
RMSE = 3.4900689
Cp = 14 (for the full model, Cp always equals the number of parameters, 13 + 1)
AICc = 554.404, BIC = 587.7673

Full Model
The model with 13 variables has several variables that do not add significantly to the other 12.

Is there a better model?
Is there a model with:
Lower RMSE?
Lower Cp?
Lower AICc?
Lower BIC?

All Possible Models
A 7-variable model has the lowest RMSE value.
Sports Car, SUV, Minivan, All Wheel, Cylinders, Horsepower, Weight
RMSE = 3.4282

Model with lowest RMSE
Several variables are not statistically significant, but they are very close to the 0.05 threshold.
Sports Car: F = 3.847, P-value = 0.0529
Horsepower: F = 3.761, P-value = 0.0555
Weight: F = 3.653, P-value = 0.0591

All Possible Models
A 7-variable model has the lowest Cp value.
Sports Car, SUV, Minivan, All Wheel, Cylinders, Horsepower, Weight
Cp = 4.7649
This is the same model as the one with the lowest RMSE.

All Possible Models
A 7-variable model has the lowest AICc value.
Sports Car, SUV, Minivan, All Wheel, Cylinders, Horsepower, Weight
AICc = 541.854
This is the same model as the one with the lowest RMSE and Cp.

All Possible Models
A 4-variable model has the lowest BIC value.
Sports Car, All Wheel, Cylinders and Weight
BIC = 559.9569
BIC penalizes each additional parameter more heavily than AICc, so it tends to select smaller models.

Strategy
Pick a criterion: RMSE, Cp, AICc or BIC.
Identify several "good" models, i.e., models with low values of the criterion.
Look at R², the significance of individual variables, and the behavior of the residuals.
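All four criteria can be computed from a model's error sum of squares (SSE). The sketch below illustrates this and is not JMP's code. It rests on several assumptions. First, there are n = 100 observations, inferred from DFE = 96 with four parameters in the forward-selection output. Second, AICc and BIC take their Gaussian-likelihood forms, where k counts the regression parameters plus the error variance. Third, Mallows' Cp uses the full-model MSE. The function name `selection_criteria` is hypothetical. Under these assumptions, the sketch reproduces the forward-selection values reported earlier (RMSE 3.6347, AICc 548.45, BIC 560.84, Cp 12.12).

```python
import math


def selection_criteria(sse, n, p, mse_full):
    """Hypothetical helper: RMSE, AICc, BIC and Cp for a least-squares fit.

    sse      -- error sum of squares of the candidate model
    n        -- number of observations
    p        -- number of regression parameters, including the intercept
    mse_full -- MSE of the full (all-variable) model, used by Mallows' Cp
    """
    rmse = math.sqrt(sse / (n - p))
    # Gaussian -2 log-likelihood evaluated at the MLE sigma^2 = SSE / n
    neg2loglik = n * math.log(2 * math.pi) + n * math.log(sse / n) + n
    # Assumption: k counts the p regression parameters plus the error variance
    k = p + 1
    aicc = neg2loglik + 2 * k + 2 * k * (k + 1) / (n - k - 1)
    bic = neg2loglik + k * math.log(n)
    cp = sse / mse_full - (n - 2 * p)
    return {"RMSE": rmse, "AICc": aicc, "BIC": bic, "Cp": cp}


# Forward-selection model (Weight, Horsepower, Wheel Base): p = 4, assumed n = 100
full_rmse = 3.4900689  # RMSE of the 13-variable model
crit = selection_criteria(sse=1268.269, n=100, p=4, mse_full=full_rmse ** 2)
for name, value in crit.items():
    print(f"{name}: {value:.4f}")
```

For AICc, the extra term 2k(k+1)/(n-k-1) is the small-sample correction. For BIC, the penalty is k·ln(n) in place of 2k.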
RMSE
Five models with the lowest RMSE
Model (variables) | Number of variables | RSquare | RMSE | AICc | BIC | Cp
Sports Car, SUV, Minivan, All Wheel, Cylinders, Horsepower, Weight | 7 | 0.7053 | 3.4282 | 541.8541 | 563.3006 | 4.7649
SUV, Minivan, All Wheel, Rear Wheel, Engine, Horsepower, Weight, Wheel Base | 8 | 0.7083 | 3.4294 | 543.3029 | 566.8826 | 5.8613
SUV, Minivan, All Wheel, Engine, Horsepower, Weight, Wheel Base | 7 | 0.7049 | 3.4308 | 542.0075 | 563.4540 | 4.9011
Sports Car, SUV, Minivan, All Wheel, Cylinders, Horsepower, Weight, Wheel Base | 8 | 0.7081 | 3.4309 | 543.3907 | 566.9705 | 5.9386
SUV, Minivan, All Wheel, Engine, Cylinders, Horsepower, Weight, Wheel Base | 8 | 0.7076 | 3.4334 | 543.5393 | 567.1190 | 6.0693

AICc
Five models with the lowest AICc
Model (variables) | Number of variables | RSquare | RMSE | AICc | BIC | Cp
Sports Car, SUV, Minivan, All Wheel, Cylinders, Horsepower, Weight | 7 | 0.7053 | 3.4282 | 541.8541 | 563.3006 | 4.7649
SUV, Minivan, All Wheel, Engine, Horsepower, Weight, Wheel Base | 7 | 0.7049 | 3.4308 | 542.0075 | 563.4540 | 4.9011
SUV, Minivan, All Wheel, Cylinders, Horsepower, Weight, Wheel Base | 7 | 0.7043 | 3.4339 | 542.1917 | 563.6382 | 5.0650
SUV, Minivan, All Wheel, Cylinders, Horsepower | 5 | 0.6874 | 3.4929 | 542.9625 | 559.9813 | 6.1511
SUV, Minivan, All Wheel, Rear Wheel, Engine, Horsepower, Weight, Wheel Base | 8 | 0.7083 | 3.4294 | 543.3029 | 566.8826 | 5.8613

Cp
Five models with the lowest Cp
Model (variables) | Number of variables | RSquare | RMSE | AICc | BIC | Cp
Sports Car, SUV, Minivan, All Wheel, Cylinders, Horsepower, Weight | 7 | 0.7053 | 3.4282 | 541.8541 | 563.3006 | 4.7649
SUV, Minivan, All Wheel, Engine, Horsepower, Weight, Wheel Base | 7 | 0.7049 | 3.4308 | 542.0075 | 563.4540 | 4.9011
SUV, Minivan, All Wheel, Cylinders, Horsepower, Weight, Wheel Base | 7 | 0.7043 | 3.4339 | 542.1917 | 563.6382 | 5.0650
SUV, Minivan, All Wheel, Rear Wheel, Engine, Horsepower, Weight, Wheel Base | 8 | 0.7083 | 3.4294 | 543.3029 | 566.8826 | 5.8613
Sports Car, SUV, Minivan, All Wheel, Cylinders, Horsepower, Weight, Wheel Base | 8 | 0.7081 | 3.4309 | 543.3907 | 566.9705 | 5.9386

Final Model
The 7-variable model with SUV, Minivan, All Wheel, Engine, Horsepower, Weight and Wheel Base appears to be a good model. It ranks second on AICc and Cp and third on RMSE.
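One way to follow the strategy is to collect the candidate models and sort them on the chosen criterion. This minimal sketch copies the seven distinct models from the RMSE, AICc and Cp tables. It is not an all-possible-models search, so the ranks it reports hold only among these seven models. The names `CANDIDATES`, `COLUMNS` and `rank_models` are hypothetical.

```python
# Distinct models copied from the RMSE, AICc and Cp tables.
# Columns: variables, number of variables, RSquare, RMSE, AICc, BIC, Cp
CANDIDATES = [
    ("Sports Car, SUV, Minivan, All Wheel, Cylinders, Horsepower, Weight",
     7, 0.7053, 3.4282, 541.8541, 563.3006, 4.7649),
    ("SUV, Minivan, All Wheel, Rear Wheel, Engine, Horsepower, Weight, Wheel Base",
     8, 0.7083, 3.4294, 543.3029, 566.8826, 5.8613),
    ("SUV, Minivan, All Wheel, Engine, Horsepower, Weight, Wheel Base",
     7, 0.7049, 3.4308, 542.0075, 563.4540, 4.9011),
    ("Sports Car, SUV, Minivan, All Wheel, Cylinders, Horsepower, Weight, Wheel Base",
     8, 0.7081, 3.4309, 543.3907, 566.9705, 5.9386),
    ("SUV, Minivan, All Wheel, Engine, Cylinders, Horsepower, Weight, Wheel Base",
     8, 0.7076, 3.4334, 543.5393, 567.1190, 6.0693),
    ("SUV, Minivan, All Wheel, Cylinders, Horsepower, Weight, Wheel Base",
     7, 0.7043, 3.4339, 542.1917, 563.6382, 5.0650),
    ("SUV, Minivan, All Wheel, Cylinders, Horsepower",
     5, 0.6874, 3.4929, 542.9625, 559.9813, 6.1511),
]

# Tuple index of each criterion; lower is better for all four
COLUMNS = {"RMSE": 3, "AICc": 4, "BIC": 5, "Cp": 6}


def rank_models(criterion, models=CANDIDATES):
    """Return the models sorted from best (lowest) to worst on one criterion."""
    col = COLUMNS[criterion]
    return sorted(models, key=lambda row: row[col])


FINAL = "SUV, Minivan, All Wheel, Engine, Horsepower, Weight, Wheel Base"
ranks = {}
for criterion in COLUMNS:
    order = [row[0] for row in rank_models(criterion)]
    ranks[criterion] = order.index(FINAL) + 1
    print(f"{criterion}: final model ranks {ranks[criterion]} of {len(order)}")
```

A single ascending sort works because lower values are better for all four criteria. Ranking on R² would need a descending sort instead.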
Prediction Equation
Predicted Highway MPG = 30.74 - 3.15*SUV - 3.28*Minivan - 2.08*All Wheel - 1.65*Engine - 0.0226*Horsepower - 0.0029*Weight + 0.163*Wheel Base

Summary
All variables add significantly.
R² = 0.705, adj R² = 0.682
RMSE = 3.431
AICc = 542.01, BIC = 563.45
Cp = 4.9011

[Figure: Best Model residual plots. Residual vs. Predicted Highway MPG; Normal Quantile Plot of the residuals; histogram (Count) of the residuals]
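The prediction equation can be evaluated directly. This sketch hard-codes the rounded coefficients of the final model. The function name and the example car (2.0-liter engine, 150 horsepower, 3000 pounds, 105-inch wheel base) are hypothetical; they were chosen for illustration and are not taken from the data. Because Sports Car, Wagon and Rear Wheel are not in this model, setting all three indicators to 0 describes any vehicle that is neither an SUV nor a minivan and is not all-wheel drive.

```python
def predict_highway_mpg(suv, minivan, all_wheel, engine, horsepower,
                        weight, wheel_base):
    """Hypothetical helper: predicted Highway MPG from the final 7-variable model.

    suv, minivan, all_wheel -- 0/1 indicator variables
    engine -- engine size (liters); weight -- pounds; wheel_base -- inches
    """
    return (30.74 - 3.15 * suv - 3.28 * minivan - 2.08 * all_wheel
            - 1.65 * engine - 0.0226 * horsepower - 0.0029 * weight
            + 0.163 * wheel_base)


# Hypothetical car: not an SUV or minivan, not all-wheel drive
base_car = predict_highway_mpg(0, 0, 0, engine=2.0, horsepower=150,
                               weight=3000, wheel_base=105)
# Same hypothetical specifications, but an all-wheel-drive SUV
suv_awd = predict_highway_mpg(1, 0, 1, engine=2.0, horsepower=150,
                              weight=3000, wheel_base=105)
print(f"Base car:    {base_car:.2f} MPG")
print(f"SUV (AWD):   {suv_awd:.2f} MPG")
print(f"Difference:  {base_car - suv_awd:.2f} MPG")
```

The model is additive, so the gap between the two cars is always 3.15 + 2.08 = 5.23 MPG, whatever values are used for engine, horsepower, weight and wheel base. These are point predictions. An RMSE of 3.431 indicates that individual cars typically differ from them by a few MPG.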