Stepwise Regression

SAS
Download the Data
• http://core.ecu.edu/psyc/wuenschk/StatData/StatData.htm
GPA  GRE_Q  GRE_V  MAT  AR   (column order matches the INPUT statement below)
3.2  625    540    65   2.7
4.1  575    680    75   4.5
3.0  520    480    65   2.5
2.6  545    520    55   3.1
3.7  520    490    75   3.6
4.0  655    535    65   4.3
4.3  630    720    75   4.6
2.7  500    500    75   3.0  ... and so on
Download the SAS Code
• http://core.ecu.edu/psyc/wuenschk/SAS/SASPrograms.htm
data grades;
   infile 'C:\Users\Vati\Documents\StatData\MultReg.dat';
   input GPA GRE_Q GRE_V MAT AR;
run;
PROC REG;
   a: MODEL GPA = GRE_Q GRE_V MAT AR / STB SCORR2
      selection=forward slentry=.05 details;
run;
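For readers new to PROC REG, here is the same MODEL statement with each option annotated (a sketch only; the option names are the documented PROC REG options used above):

a: MODEL GPA = GRE_Q GRE_V MAT AR /
      STB                 /* standardized (beta) weights */
      SCORR2              /* squared semipartial correlations, based on Type II SS */
      SELECTION=FORWARD   /* add predictors to the model one at a time */
      SLENTRY=.05         /* p value a predictor must beat to enter */
      DETAILS;            /* print the entry statistics at every step */
run;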
Forward Selection, Step 1
Statistics for Entry, DF = 1,28

Variable   Tolerance   Model R-Square   F Value   Pr > F
GRE_Q      1.000000    0.3735           16.69     0.0003
GRE_V      1.000000    0.3381           14.30     0.0008
MAT        1.000000    0.3651           16.10     0.0004
AR         1.000000    0.3853           17.55     0.0003

All predictors have p < the slentry value of .05.
AR has the lowest p.
AR enters first.
Step 2
Statistics for Entry, DF = 1,27

Variable   Tolerance   Model R-Square   F Value   Pr > F
GRE_Q      0.742099    0.5033           6.41      0.0174
GRE_V      0.835714    0.5155           7.26      0.0120
MAT        0.724599    0.4923           5.69      0.0243

All predictors have p < the slentry value of .05.
GRE_V has the lowest p.
GRE_V enters second.
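For reference, each entry F is the partial F for adding that predictor to the model from the previous step. For GRE_V here, with AR already in the model:

$$F(1,27)=\frac{R^2_{AR,GRE\_V}-R^2_{AR}}{(1-R^2_{AR,GRE\_V})/27}=\frac{.5155-.3853}{.4845/27}\approx 7.26$$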
Step 3
Statistics for Entry, DF = 1,26

Variable   Tolerance   Model R-Square   F Value   Pr > F
GRE_Q      0.659821    0.5716           3.41      0.0764
MAT        0.670304    0.5719           3.42      0.0756

No predictor has p < .05, so forward selection terminates.
The Final Model
Parameter Estimates

Variable   DF  Parameter  Standard  t Value  Pr > |t|  Standardized  Squared Semipartial
               Estimate   Error                         Estimate      Corr, Type II
Intercept   1   0.49718   0.57652    0.86    0.3961    0             .
GRE_V       1   0.00285   0.00106    2.69    0.0120    0.39470       0.13020
AR          1   0.32963   0.10483    3.14    0.0040    0.46074       0.17740
R2 = .516, F(2, 27) = 14.36, p < .001
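As a check, this F can be recovered from R2 (n = 30, as the 27 error df imply; using the unrounded R2 of .5155 from Step 2):

$$F(2,27)=\frac{R^2/2}{(1-R^2)/27}=\frac{.5155/2}{.4845/27}\approx 14.36$$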
Backward Selection
b: MODEL GPA = GRE_Q GRE_V MAT AR / STB SCORR2
   selection=backward slstay=.05 details;
run;
• We start out with a simultaneous multiple regression, including all predictors.
• Then we trim that model.
Step 1

Variable    Parameter   Standard   Type II SS   F Value   Pr > F
            Estimate    Error
Intercept   -1.73811    0.95074    0.50153      3.34      0.0795
GRE_Q        0.00400    0.00183    0.71582      4.77      0.0385
GRE_V        0.00152    0.00105    0.31588      2.11      0.1593
MAT          0.02090    0.00955    0.71861      4.79      0.0382
AR           0.14423    0.11300    0.24448      1.63      0.2135

GRE_V and AR have p values that exceed the slstay value of .05.
AR has the larger p, so it is dropped from the model.
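These F values are simply each variable's Type II SS divided by the full-model mean square error (about .150, the MSE the "Add AR?" slide below reports for the four-predictor model); for example, for GRE_Q:

$$F(1,25)=\frac{\text{Type II SS}}{MSE_{\text{full}}}=\frac{.71582}{.150}\approx 4.77$$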
Step 2
Statistics for Removal, DF = 1,26

Variable   Partial     Model       F Value   Pr > F
           R-Square    R-Square
GRE_Q      0.1236      0.4935      8.39      0.0076
GRE_V      0.0340      0.5830      2.31      0.1405
MAT        0.1318      0.4852      8.95      0.0060

Only GRE_V has p > .05, so it is dropped from the model.
Step 3
Statistics for Removal, DF = 1,27

Variable   Partial     Model       F Value   Pr > F
           R-Square    R-Square
GRE_Q      0.2179      0.3651      14.11     0.0008
MAT        0.2095      0.3735      13.56     0.0010

No remaining predictor has p > .05 (the slstay value), so backward elimination halts.
The Final Model
Parameter Estimates

Variable   DF  Parameter  Standard  t Value  Pr > |t|  Standardized  Squared Semipartial
               Estimate   Error                         Estimate      Corr, Type II
Intercept   1  -2.12938   0.92704   -2.30    0.0296    0             .
GRE_Q       1   0.00598   0.00159    3.76    0.0008    0.48438       0.21791
MAT         1   0.03081   0.00836    3.68    0.0010    0.47494       0.20950
R2 = .583, F(2, 27) = 18.87, p < .001
What the F Test?
• Forward selection led to a model with AR and GRE_V.
• Backward selection led to a model with MAT and GRE_Q.
• I am getting suspicious about the utility of procedures like this.
Fully Stepwise Selection
c: MODEL GPA = GRE_Q GRE_V MAT AR / STB SCORR2
   selection=stepwise slentry=.08 slstay=.08 details;
run;
• Like forward selection, but once a predictor has been added to the model, it is considered for elimination in subsequent steps.
Step 3
• Steps 1 and 2 are identical to those of forward selection, but with slentry set to .08, MAT enters the model.

Statistics for Entry, DF = 1,26

Variable   Tolerance   Model R-Square   F Value   Pr > F
GRE_Q      0.659821    0.5716           3.41      0.0764
MAT        0.670304    0.5719           3.42      0.0756
Step 4
• GRE_Q enters. Now we have every predictor in the model.

Statistics for Entry, DF = 1,25

Variable   Tolerance   Model R-Square   F Value   Pr > F
GRE_Q      0.653236    0.6405           4.77      0.0385
Step 5
• Once GRE_Q is in the model, AR and GRE_V become eligible for removal.

Statistics for Removal, DF = 1,25

Variable   Partial     Model       F Value   Pr > F
           R-Square    R-Square
GRE_Q      0.0686      0.5719      4.77      0.0385
GRE_V      0.0303      0.6102      2.11      0.1593
MAT        0.0689      0.5716      4.79      0.0382
AR         0.0234      0.6170      1.63      0.2135
Step 6
• AR is out; GRE_V is still eligible for removal.

Statistics for Removal, DF = 1,26

Variable   Partial     Model       F Value   Pr > F
           R-Square    R-Square
GRE_Q      0.1236      0.4935      8.39      0.0076
GRE_V      0.0340      0.5830      2.31      0.1405
MAT        0.1318      0.4852      8.95      0.0060
Step 7
• At this point, no variable in the model is eligible for removal, and no variable not in the model is eligible for entry.
• The final model includes MAT and GRE_Q, the same as the final model from backward selection.
R-Square Selection
• d: MODEL GPA = GRE_Q GRE_V MAT AR / selection=rsquare cp mse; run;
• Test all one-predictor models, all two-predictor models, and so on (the MODEL statement is annotated below).
• The goal is to get the highest R2 with fewer than all of the predictors.
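An annotated version of that MODEL statement (a sketch; CP and MSE are the PROC REG options that request those two columns of output):

d: MODEL GPA = GRE_Q GRE_V MAT AR /
      SELECTION=RSQUARE   /* evaluate every subset of the predictors, ordered by R-square */
      CP                  /* print Mallows' C(p) for each subset */
      MSE;                /* print the mean squared error for each subset */
run;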
One Predictor Models

Number in   R-Square   C(p)      MSE       Variables
Model                                      in Model
1           0.3853     16.7442   0.22908   AR
1           0.3735     17.5642   0.23348   GRE_Q
1           0.3651     18.1490   0.23661   MAT
1           0.3381     20.0268   0.24667   GRE_V
One Predictor Models
• AR yields the highest R2: C(p) = 16.74, MSE = .229.
• Mallows says the best model will be the one with a small C(p) and a C(p) value near p, the number of parameters in the model (see the formula below).
• p here is 2: one predictor plus the intercept.
• Howell suggests one keep adding predictors until MSE starts increasing.
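One common form of Mallows' statistic, with MSE_full the mean square error of the full model (about .150 here) and n = 30:

$$C_p=\frac{SSE_p}{MSE_{\text{full}}}-(n-2p)$$

For the AR-only model, SSE = 28 × .22908 ≈ 6.41, so C(p) ≈ 6.41/.150 − (30 − 4) ≈ 16.7, matching the 16.74 in the table.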
Two Predictor Models

Number in   R-Square   C(p)      MSE       Variables
Model                                      in Model
2           0.5830     4.9963    0.16116   GRE_Q MAT
2           0.5155     9.6908    0.18725   GRE_V AR
2           0.5033     10.5388   0.19196   GRE_Q AR
2           0.4935     11.2215   0.19575   GRE_V MAT
2           0.4923     11.3019   0.19620   MAT AR
2           0.4852     11.7943   0.19894   GRE_Q GRE_V
Two Predictor Models
• Compared to the best one-predictor model, the model with MAT and GRE_Q has:
– a considerably higher R2
– a considerably lower C(p)
– a C(p) value (5) close to the value of p (3)
– a considerably lower MSE
Three Predictor Models

Number in   R-Square   C(p)      MSE       Variables
Model                                      in Model
3           0.6170     4.6292    0.15369   GRE_Q GRE_V MAT
3           0.6102     5.1050    0.15644   GRE_Q MAT AR
3           0.5719     7.7702    0.17182   GRE_V MAT AR
3           0.5716     7.7888    0.17193   GRE_Q GRE_V AR
Three Predictor Models
• Adding GRE_V to the best two-predictor model (GRE_Q and MAT):
– slightly increases R2 (from .58 to .62)
– reduces [C(p) − p] from 2 to .6 (arithmetic shown below)
– reduces MSE from .16 to .15
• None of these stats impress me much; I am inclined to take the GRE_Q, MAT model as being best.
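The [C(p) − p] figures come straight from the tables above:

$$4.9963-3\approx 2.0\ \text{(GRE\_Q, MAT)}\qquad 4.6292-4\approx 0.6\ \text{(GRE\_Q, GRE\_V, MAT)}$$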
Closer Look at MAT, GRE_Q, GRE_V
• e: MODEL GPA = GRE_Q GRE_V MAT / STB SCORR2; run;

Parameter Estimates
Variable   DF  Parameter  Standard  t Value  Pr > |t|  Standardized  Squared Semipartial
               Estimate   Error                         Estimate      Corr, Type II
Intercept   1  -2.14877   0.90541   -2.37    0.0253    0             .
GRE_Q       1   0.00493   0.00170    2.90    0.0076    0.39922       0.12357
GRE_V       1   0.00161   0.00106    1.52    0.1405    0.22317       0.03404
MAT         1   0.02612   0.00873    2.99    0.0060    0.40267       0.13180
Keep GRE_V or Not?
• It does not have a significant partial effect in the model, so why keep it?
• Because it is free information: you get GRE_V and GRE_Q for the same price as GRE_Q alone.
• Equi donati dentes non inspiciuntur ("the teeth of a gift horse are not inspected," i.e., don't look a gift horse in the mouth).
– As (gift) horses age, their gums recede, making them look long in the tooth.
Add AR?
• R2 increases from .617 to .640.
• C(p) = p (always true in the full model; see below).
• MSE drops from .154 to .150.
• Getting AR data is expensive.
• Stop gathering the AR data, unless it has some other value.
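Why C(p) = p in the full model: the full model's SSE equals (n − p) times its own MSE, so

$$C_p=\frac{SSE_{\text{full}}}{MSE_{\text{full}}}-(n-2p)=(n-p)-(n-2p)=p$$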
Conclusions
• Read http://core.ecu.edu/psyc/wuenschk/StatHelp/Stepwise-Voodoo.htm
• Treat all claims based on stepwise algorithms as if they were made by Saddam Hussein on a bad day with a headache, having a friendly chat with George Bush.