Stat 301 – Lecture 26
Model Selection

Response: Highway MPG
Explanatory: 13 explanatory variables
Indicator variables for types of car
– Sports Car, SUV, Wagon, Minivan
There is an indicator for Pickup, but there are no pickups in the data.

Indicator Variables
Each indicator variable takes the value 1 if the vehicle is that type and 0 otherwise.
If all four indicator variables are 0, then the vehicle is a Sedan.
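
The coding rule above can be sketched in a few lines of Python. This is only an illustration, not part of the JMP workflow; the function name `type_indicators` and the sample vehicle types are assumptions made for the example.

```python
# Indicator (dummy) coding for vehicle type, with Sedan as the baseline.
# The function name and the sample vehicles are hypothetical.
TYPE_INDICATORS = ["Sports Car", "SUV", "Wagon", "Minivan"]


def type_indicators(vehicle_type):
    """Return 1 for the indicator that matches the vehicle type, 0 for the rest."""
    return {name: int(vehicle_type == name) for name in TYPE_INDICATORS}


for vehicle_type in ["SUV", "Sedan"]:
    print(vehicle_type, type_indicators(vehicle_type))
```

A Sedan gets 0 on all four indicators, so it acts as the reference category against which the other types are compared.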

Explanatory Variables
Indicator variables for All Wheel and Rear Wheel drive.
If both indicator variables are 0, then the vehicle has Front Wheel drive.

Explanatory Variables
Engine size (liters)
Cylinders (number)
Horsepower
Weight (pounds)
Wheel Base (inches)
Length (inches)
Width (inches)

Forward Selection
Fit Model – Personality: Stepwise
Y, Response – Highway MPG
Put all 13 variables into the Construct Model Effects box.
Click on Run Model

Stepwise Fit
Stopping Rule: P-value Threshold
– Prob to Enter = 0.050
– Prob to Leave = 0.050
Direction: Forward
Click on Go

Stepwise Fit for Highway MPG

Stepwise Regression Control
Stopping Rule: P-value Threshold
Prob to Enter: 0.05
Prob to Leave: 0.05
Direction: Forward

SSE       DFE  RMSE       RSquare  RSquare Adj  Cp         p  AICc      BIC
1268.269  96   3.6347126  0.6543   0.6435       12.122214  4  548.4498  560.8374

Current Estimates
Parameter   Estimate    nDF  SS        "F Ratio"  "Prob>F"
Intercept   33.0251054  1    0         0.000      1
Sports Car  0           1    6.928014  0.522      0.47185
SUV         0           1    48.78157  3.800      0.0542
Wagon       0           1    1.952058  0.146      0.70281
Minivan     0           1    9.296884  0.702      0.40437
All Wheel   0           1    45.27064  3.517      0.06383
Rear Wheel  0           1    2.570807  0.193      0.66146
Engine      0           1    38.31665  2.960      0.08863
Cylinders   0           1    31.96371  2.456      0.12039
Horsepower  -0.0257556  1    181.8159  13.762     0.00035
Weight      -0.0062597  1    703.0322  53.215     8.5e-11
Wheel Base  0.20569376  1    106.6696  8.074      0.00548
Length      0           1    3.045635  0.229      0.6336
Width       0           1    0.141521  0.011      0.91821

Step History
Step  Parameter   Action   "Sig Prob"  Seq SS    RSquare  Cp      p  AICc     BIC
1     Weight      Entered  0.0000      2065.965  0.5631   35.606  2  567.486  575.052
2     Horsepower  Entered  0.0001      228.0966  0.6253   18.88   3  554.308  564.308
3     Wheel Base  Entered  0.0055      106.6696  0.6543   12.122  4  548.45   560.837

Forward Selection
Three variables are added: Weight, Horsepower, Wheel Base.
All variables added are still statistically significant.
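
The forward procedure can be sketched outside JMP as follows. This is a minimal illustration under stated assumptions, not JMP's implementation: the data (`x1`, `x2`, `y`) are synthetic toy values, the helper names `sse` and `forward_select` are made up for the example, and a fixed F-to-enter cutoff of 4.0 is swapped in for the "Prob to Enter = 0.05" rule because the Python standard library has no F distribution.

```python
# Forward selection sketch using only the standard library.
# ASSUMPTIONS: the toy data are synthetic, and a fixed F-to-enter cutoff (4.0)
# stands in for JMP's p-value threshold.

def sse(columns, y):
    """Residual sum of squares for least squares of y on an intercept + columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Normal equations (X'X) b = X'y.
    A = [[sum(X[i][r] * X[i][s] for i in range(n)) for s in range(k)] for r in range(k)]
    c = [sum(X[i][r] * y[i] for i in range(n)) for r in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for s in range(col, k):
                A[r][s] -= f * A[col][s]
            c[r] -= f * c[col]
    # Back substitution.
    b = [0.0] * k
    for r in reversed(range(k)):
        b[r] = (c[r] - sum(A[r][s] * b[s] for s in range(r + 1, k))) / A[r][r]
    return sum((y[i] - sum(X[i][j] * b[j] for j in range(k))) ** 2 for i in range(n))


def forward_select(data, y, f_to_enter=4.0):
    """Add, one at a time, the candidate with the largest partial F ratio."""
    selected, remaining = [], list(data)
    current_sse = sse([], y)
    while remaining:
        dfe = len(y) - len(selected) - 2  # n minus (intercept + selected + candidate)
        trials = []
        for name in remaining:
            trial_sse = sse([data[v] for v in selected + [name]], y)
            f_ratio = (current_sse - trial_sse) / (trial_sse / dfe)
            trials.append((f_ratio, name, trial_sse))
        f_ratio, name, trial_sse = max(trials)
        if f_ratio < f_to_enter:
            break  # no remaining candidate is strong enough to enter
        selected.append(name)
        remaining.remove(name)
        current_sse = trial_sse
    return selected


# Toy data: y depends on x1 only; x2 adds nothing once x1 is in the model.
x1 = [float(i) for i in range(12)]
x2 = [1.0, 1.0, -1.0, -1.0] * 3
noise = [1.0, -1.0, -1.0, 1.0] * 3
y = [3.0 + 2.0 * a + e for a, e in zip(x1, noise)]
print(forward_select({"x1": x1, "x2": x2}, y))
```

In the toy data only `x1` enters. Once a variable is in, this sketch never reconsiders it, which is the difference between the Forward and Mixed directions.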

Forward Selection
Model with Weight, Horsepower and Wheel Base.
R² = 0.6543, adj R² = 0.6435
RMSE = 3.635
AICc = 548.45, BIC = 560.84
Cp = 12.1222
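
The fit statistics for the forward-selection model can be recomputed from the SSE, DFE and Seq SS values in the stepwise output. The sketch below is a check under assumptions: the AICc and BIC formulas are the usual normal-likelihood forms with the error variance counted as a parameter, which the slides do not state but which reproduce the reported values.

```python
import math

# Values read from the forward-selection output.
sse = 1268.269                            # error sum of squares
dfe = 96                                  # error degrees of freedom
seq_ss = [2065.965, 228.0966, 106.6696]   # Seq SS for Weight, Horsepower, Wheel Base
p = 4                                     # coefficients, including the intercept
n = dfe + p                               # 100 vehicles

sst = sse + sum(seq_ss)                   # total sum of squares
mse = sse / dfe
rmse = math.sqrt(mse)
r2 = 1 - sse / sst
adj_r2 = 1 - mse / (sst / (n - 1))

# ASSUMED formulas: normal likelihood, k = p + 1 (error variance counted).
k = p + 1
neg2_loglik = n * math.log(2 * math.pi) + n * math.log(sse / n) + n
aicc = neg2_loglik + 2 * k + 2 * k * (k + 1) / (n - k - 1)
bic = neg2_loglik + k * math.log(n)

print(round(rmse, 3), round(r2, 4), round(adj_r2, 4), round(aicc, 2), round(bic, 2))
```

The same arithmetic applies to any model in the lecture once its SSE and number of coefficients are known.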

Stepwise Fit
Stopping Rule: P-value Threshold
– Prob to Enter = 0.050
– Prob to Leave = 0.050
Direction: Backward
Enter All
Click on Go

Backward Selection
Eight variables are removed: Length, Rear Wheel, Wagon, Width, Engine, Wheel Base, Weight, Sports Car.
All variables left are statistically significant.

Backward Selection
Model with SUV, Minivan, All Wheel, Cylinders and Horsepower.
R² = 0.6874, adj R² = 0.6708
RMSE = 3.493
AICc = 542.96, BIC = 559.98
Cp = 6.1511

Backward Selection
The final model from Backward selection is better than the final model from Forward selection. It has higher R² and adj R² values, and lower RMSE, AICc, BIC and Cp values.
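
The comparison can be checked criterion by criterion. The short sketch below uses the statistics reported for the two final models; the dictionary layout and the helper name `backward_wins` are assumptions made for the example.

```python
# Fit statistics reported for the two final models.
forward = {"R2": 0.6543, "adj R2": 0.6435, "RMSE": 3.635,
           "AICc": 548.45, "BIC": 560.84, "Cp": 12.1222}
backward = {"R2": 0.6874, "adj R2": 0.6708, "RMSE": 3.493,
            "AICc": 542.96, "BIC": 559.98, "Cp": 6.1511}

higher_is_better = {"R2", "adj R2"}  # for every other criterion, lower is better


def backward_wins(criterion):
    """True if the backward-selection model is better on this criterion."""
    if criterion in higher_is_better:
        return backward[criterion] > forward[criterion]
    return backward[criterion] < forward[criterion]


results = {criterion: backward_wins(criterion) for criterion in forward}
print(results)
```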

Mixed Selection (Forward)
Stopping Rule: P-value Threshold
– Prob to Enter = 0.050
– Prob to Leave = 0.050
Direction: Mixed
Click on Go

Mixed Selection (Forward)
Three variables are added: Weight, Horsepower, Wheel Base.
No variables are removed.
This is the same as with Forward Selection.

Mixed Selection (Backward)
Stopping Rule: P-value Threshold
– Prob to Enter = 0.050
– Prob to Leave = 0.050
Direction: Mixed
Enter All
Click on Go

Mixed Selection (Backward)
Eight variables are removed: Length, Rear Wheel, Wagon, Width, Engine, Wheel Base, Weight, Sports Car.
No variables are added.
This is the same as with Backward Selection.

All Possible Models
2^13 - 1 = 8191 models possible.
1-variable models – listed in order of the R² value.
2-variable models – listed in order of the R² value.
etc.
13-variable (full) model.
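
The count of 8191 comes from choosing any non-empty subset of the 13 explanatory variables. A small sketch confirms this by summing the number of k-variable models; the variable names follow the lecture and the code itself is only an illustration.

```python
from itertools import combinations
from math import comb

variables = ["Sports Car", "SUV", "Wagon", "Minivan", "All Wheel", "Rear Wheel",
             "Engine", "Cylinders", "Horsepower", "Weight", "Wheel Base",
             "Length", "Width"]

# Number of models with exactly k variables, for k = 1, ..., 13.
counts = {k: comb(len(variables), k) for k in range(1, len(variables) + 1)}
total = sum(counts.values())
print(total)  # 8191

# The 2-variable models can be listed explicitly.
two_variable_models = list(combinations(variables, 2))
print(len(two_variable_models))  # 78
```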

All Possible Models
Can specify the maximum number of variables in a model.
Can specify the maximum number of models displayed for each number of variables.

All Possible Models
Model with all 13 variables has the highest R² value, R² = 0.7145
Adj R² = 0.6713
RMSE = 3.4900689
Cp = 14
AICc = 554.404
BIC = 587.7673
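
The value Cp = 14 for the full model is no accident. Assuming the usual definition of Mallows' Cp (the slides do not spell it out), Cp = SSE_p / MSE_full - (n - 2p), where p counts the coefficients including the intercept, so the full model always has Cp equal to its own p. The sketch below checks this and also recovers the Cp reported for the forward-selection model; the function name `mallows_cp` is an assumption.

```python
# Mallows' Cp, assuming the usual definition
#   Cp = SSE_p / MSE_full - (n - 2p),
# where p counts the coefficients in the candidate model (incl. intercept).
n = 100                  # vehicles: DFE 96 + 4 coefficients in the forward model
rmse_full = 3.4900689    # RMSE of the 13-variable model
mse_full = rmse_full ** 2


def mallows_cp(sse_p, p):
    return sse_p / mse_full - (n - 2 * p)


# Full model: SSE = MSE_full * (n - 14), so Cp reduces to p = 14.
cp_full = mallows_cp(mse_full * (n - 14), 14)

# Forward-selection model: SSE = 1268.269 with p = 4 coefficients.
cp_forward = mallows_cp(1268.269, 4)

print(round(cp_full, 4), round(cp_forward, 3))
```

A Cp well above p, as for the forward-selection model (about 12.1 against p = 4), suggests the model is missing useful variables.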

Full Model
The model with 13 variables has several variables that do not add significantly to the other 12.

Is there a better model?
Is there a model with:
– Lower RMSE?
– Lower Cp?
– Lower AICc?
– Lower BIC?

All Possible Models
Model with 7 variables has the lowest RMSE value.
– Sports Car, SUV, Minivan, All Wheel, Cylinders, Horsepower, Weight
– RMSE = 3.4282

Model with lowest RMSE
Several variables are not statistically significant but very close to the threshold of 0.05.
– Sports Car: F = 3.847, P-value = 0.0529
– Horsepower: F = 3.761, P-value = 0.0555
– Weight: F = 3.653, P-value = 0.0591

All Possible Models
Model with 7 variables has the lowest Cp value.
– Sports Car, SUV, Minivan, All Wheel, Cylinders, Horsepower, Weight
– Cp = 4.7649
This is the same model as the one with the lowest RMSE.

All Possible Models
Model with 7 variables has the lowest AICc value.
– Sports Car, SUV, Minivan, All Wheel, Cylinders, Horsepower, Weight
– AICc = 541.854
This is the same model as the one with the lowest RMSE and Cp.

All Possible Models
Model with 4 variables has the lowest BIC value.
– Sports Car, All Wheel, Cylinders and Weight
– BIC = 559.9569

Strategy
Pick a criterion: RMSE, Cp, AICc or BIC.
Identify several "good" models, i.e. models with low values of the criterion.
Look at R², significance of individual variables, and behavior of the residuals.
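
The first two steps of this strategy amount to sorting candidate models by the chosen criterion. The sketch below does that for four of the models reported in the all-possible-models output; the list layout and the helper name `best_models` are assumptions made for the example.

```python
# A few candidate models with the statistics reported in the lecture.
models = [
    {"vars": "Sports Car, SUV, Minivan, All Wheel, Cylinders, Horsepower, Weight",
     "number": 7, "R2": 0.7053, "RMSE": 3.4282,
     "AICc": 541.8541, "BIC": 563.3006, "Cp": 4.7649},
    {"vars": "SUV, Minivan, All Wheel, Engine, Horsepower, Weight, Wheel Base",
     "number": 7, "R2": 0.7049, "RMSE": 3.4308,
     "AICc": 542.0075, "BIC": 563.4540, "Cp": 4.9011},
    {"vars": "SUV, Minivan, All Wheel, Rear Wheel, Engine, Horsepower, Weight, Wheel Base",
     "number": 8, "R2": 0.7083, "RMSE": 3.4294,
     "AICc": 543.3029, "BIC": 566.8826, "Cp": 5.8613},
    {"vars": "SUV, Minivan, All Wheel, Cylinders, Horsepower",
     "number": 5, "R2": 0.6874, "RMSE": 3.4929,
     "AICc": 542.9625, "BIC": 559.9813, "Cp": 6.1511},
]


def best_models(criterion, top=3):
    """Return the `top` models with the lowest value of the criterion."""
    return sorted(models, key=lambda m: m[criterion])[:top]


for criterion in ["RMSE", "Cp", "AICc", "BIC"]:
    best = best_models(criterion, top=1)[0]
    print(criterion, best[criterion], "-", best["vars"])
```

Among these four, RMSE, Cp and AICc pick the same 7-variable model, while BIC, which penalizes extra variables more heavily, prefers the 5-variable model.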

RMSE
Number  RSquare  RMSE    AICc      BIC       Cp      Model
7       0.7053   3.4282  541.8541  563.3006  4.7649  Sports Car, SUV, Minivan, All Wheel, Cylinders, Horsepower, Weight
8       0.7083   3.4294  543.3029  566.8826  5.8613  SUV, Minivan, All Wheel, Rear Wheel, Engine, Horsepower, Weight, Wheel Base
7       0.7049   3.4308  542.0075  563.4540  4.9011  SUV, Minivan, All Wheel, Engine, Horsepower, Weight, Wheel Base
8       0.7081   3.4309  543.3907  566.9705  5.9386  Sports Car, SUV, Minivan, All Wheel, Cylinders, Horsepower, Weight, Wheel Base
8       0.7076   3.4334  543.5393  567.1190  6.0693  SUV, Minivan, All Wheel, Engine, Cylinders, Horsepower, Weight, Wheel Base

AICc
Number  RSquare  RMSE    AICc      BIC       Cp      Model
7       0.7053   3.4282  541.8541  563.3006  4.7649  Sports Car, SUV, Minivan, All Wheel, Cylinders, Horsepower, Weight
7       0.7049   3.4308  542.0075  563.4540  4.9011  SUV, Minivan, All Wheel, Engine, Horsepower, Weight, Wheel Base
7       0.7043   3.4339  542.1917  563.6382  5.0650  SUV, Minivan, All Wheel, Cylinders, Horsepower, Weight, Wheel Base
5       0.6874   3.4929  542.9625  559.9813  6.1511  SUV, Minivan, All Wheel, Cylinders, Horsepower
8       0.7083   3.4294  543.3029  566.8826  5.8613  SUV, Minivan, All Wheel, Rear Wheel, Engine, Horsepower, Weight, Wheel Base

Cp
Number  RSquare  RMSE    AICc      BIC       Cp      Model
7       0.7053   3.4282  541.8541  563.3006  4.7649  Sports Car, SUV, Minivan, All Wheel, Cylinders, Horsepower, Weight
7       0.7049   3.4308  542.0075  563.4540  4.9011  SUV, Minivan, All Wheel, Engine, Horsepower, Weight, Wheel Base
7       0.7043   3.4339  542.1917  563.6382  5.0650  SUV, Minivan, All Wheel, Cylinders, Horsepower, Weight, Wheel Base
8       0.7083   3.4294  543.3029  566.8826  5.8613  SUV, Minivan, All Wheel, Rear Wheel, Engine, Horsepower, Weight, Wheel Base
8       0.7081   3.4309  543.3907  566.9705  5.9386  Sports Car, SUV, Minivan, All Wheel, Cylinders, Horsepower, Weight, Wheel Base

Final Model
The 7-variable model with SUV, Minivan, All Wheel, Engine, Horsepower, Weight and Wheel Base appears to be a pretty good model.

Prediction Equation
Predicted Highway MPG = 30.74 - 3.15*SUV - 3.28*Minivan - 2.08*All Wheel - 1.65*Engine - 0.0226*Horsepower - 0.0029*Weight + 0.163*Wheel Base
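
The prediction equation translates directly into a small function. The coefficients below are the rounded values from the equation; the function name and the two example vehicles are hypothetical and are not taken from the data set.

```python
def predict_highway_mpg(suv, minivan, all_wheel, engine, horsepower, weight, wheel_base):
    """Predicted Highway MPG from the final 7-variable model (rounded coefficients)."""
    return (30.74
            - 3.15 * suv
            - 3.28 * minivan
            - 2.08 * all_wheel
            - 1.65 * engine
            - 0.0226 * horsepower
            - 0.0029 * weight
            + 0.163 * wheel_base)


# Hypothetical vehicles: same engine, horsepower, weight and wheel base,
# once as a front-wheel-drive sedan and once as an all-wheel-drive SUV.
sedan = predict_highway_mpg(0, 0, 0, engine=3.0, horsepower=200, weight=3500, wheel_base=110)
awd_suv = predict_highway_mpg(1, 0, 1, engine=3.0, horsepower=200, weight=3500, wheel_base=110)
print(round(sedan, 2), round(awd_suv, 2))  # 29.05 23.82
```

With the other variables held fixed, the indicator coefficients act as simple shifts: an all-wheel-drive SUV is predicted to get 3.15 + 2.08 = 5.23 fewer highway MPG than the comparable front-wheel-drive sedan.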

Summary
All variables add significantly.
R² = 0.705, adj R² = 0.682
RMSE = 3.431
AICc = 542.01, BIC = 563.45
Cp = 4.9011

Best Model
[Figure: Residual versus Predicted Highway MPG]
[Figure: distribution of the residuals (histogram of counts) with Normal Quantile Plot]