Stat 301– Lecture 27 Forward Selection

advertisement
Stat 301– Lecture 27
Forward Selection
The Forward selection
procedure looks to add
variables to the model.
 Once added, those variables
stay in the model even if they
become insignificant at a
later step.

1
Backward Selection
The Backward selection
procedure looks to remove
variables from the model.
 Once removed, those variables
cannot reenter the model even
if they would add significantly
at a later step.

2
Mixed Selection
A combination of the Forward and
Backward selection procedures.
 Starts out like Forward selection
but looks to see if an added
variable can be removed at a
later step.

3
Stat 301– Lecture 27
Mixed – Set up
Stepwise Fit for MDBH
Stepwise Regression Control
Stopping Rule: P-value Threshold
Prob to Enter
Prob to Leave
Direction:
SSE
10.3855
0.25
0.25
Mixed
DFE
RMSE RSquare RSquare Adj
19 0.7393276
0.0000
0.0000
Cp
102.4885
p
AICc
BIC
1 48.35699 49.64257
Current Estimates
Lock Entered Parameter
Intercept
X1
X2
X3
Estimate nDF
SS "F Ratio" "Prob>F"
6.265
1
0
0.000
1
0
1 6.207045
26.739
6.42e-5
0
1 0.616027
1.135
0.30079
0
1 7.335255
43.287
3.52e-6
Step History
Step
Parameter Action "Sig Prob"
Seq SS RSquare
Cp
p
AICc
BIC
4
Stepwise Regression Control
Direction – Mixed
 Prob to Enter – controls what
variables are added.
 Prob to Leave – controls what
variables are removed.
 Prob to Enter = Prob to Leave

5
Current Estimates
The current estimates are
exactly the same as with the
Forward selection procedure.
 Clicking on Step will initiate the
Mixed procedure that starts like
the Forward procedure.

6
Stat 301– Lecture 27
Stepwise Fit for MDBH
Stepwise Regression Control
Stopping Rule: P-value Threshold
Prob to Enter
Prob to Leave
Direction:
SSE
3.0502449
0.25
0.25
Mixed
DFE
RMSE RSquare RSquare Adj
Cp
18 0.4116528
0.7063
0.6900 19.387747
p
AICc
BIC
2 26.64733 28.13453
Current Estimates
Lock Entered Parameter
Intercept
X1
X2
X3
Estimate nDF
SS "F Ratio" "Prob>F"
3.8956688
1
0
0.000
1
0
1 1.000159
8.294
0.0104
0
1 0.403296
2.590
0.12594
32.9371533
1 7.335255
43.287
3.52e-6
Step History
Step
1
Parameter Action "Sig Prob"
Seq SS RSquare
Cp
X3
Entered
0.0000 7.335255
0.7063 19.388
p
AICc
BIC
2 26.6473 28.1345
7
Current Estimates – Step 1


X3 is added to the model
Predicted MDBH = 3.896 +
32.937*X3
R2=0.7063
MSE  3.0502499 / 18  0.4117
 RMSE =

8
Current Estimates – Step 1
By clicking on Step you will
invoke the Backward part of the
Mixed procedure.
 Because X3 is statistically
significant and is the only
variable in the model, clicking
on Step will not do anything.

9
Stat 301– Lecture 27
Current Estimates – Step 1

Of the remaining variables not in
the model X1 will add the largest
sum of squares if added to the
model.
SS = 1.000
“F Ratio” = 8.294
 “Prob>F” =0.0104


10
JMP Mixed – Step 2

Because X1 will add the largest
sum of squares and that
addition is statistically
significant, by clicking on Step,
JMP will add X1 to the model
with X3.
11
Stepwise Fit for MDBH
Stepwise Regression Control
Stopping Rule: P-value Threshold
Prob to Enter
Prob to Leave
Direction:
SSE
2.0500859
0.25
0.25
Mixed
DFE
RMSE RSquare RSquare Adj
Cp
17 0.3472654
0.8026
0.7794 9.7842935
p
3
AICc
BIC
21.8672 23.18346
Current Estimates
Lock Entered Parameter
Intercept
X1
X2
X3
Estimate nDF
SS "F Ratio" "Prob>F"
3.14321066
1
0
0.000
1
0.03136639
1 1.000159
8.294
0.0104
0
1 0.670967
7.784
0.01311
22.953752
1 2.128369
17.649
0.0006
Step History
Step
1
2
Parameter Action "Sig Prob" Seq SS RSquare
Cp
Entered
0.0000 7.335255
0.7063 19.388
X3
Entered
0.0104 1.000159
0.8026 9.7843
X1
p
AICc
BIC
2 26.6473 28.1345
3 21.8672 23.1835
12
Stat 301– Lecture 27
Current Estimates – Step 2


X1 is added to the model
Predicted MDBH = 3.143 +
0.0314*X1 + 22.954*X3
R2=0.8026
MSE  2.0500859 / 17  0.3473
 RMSE =

13
Current Estimates – Step 2
By clicking on Step you will
invoke the Backward part of the
Mixed procedure.
 Because X3 and X1 are
statistically significant, clicking
on Step will not do anything.

14
Current Estimates – Step 2

Of the remaining variables not in
the model X2 will add the largest
sum of squares if added to the
model.
SS = 0.671
 “F Ratio” = 7.784
 “Prob>F” =0.0131

15
Stat 301– Lecture 27
JMP Mixed – Step 3

Because X2 will add the largest
sum of squares and that
addition is statistically
significant, by clicking on Step,
JMP will add X2 to the model
with X3 and X1.
16
Stepwise Fit for MDBH
Stepwise Regression Control
Stopping Rule: P-value Threshold
Prob to Enter
Prob to Leave
Direction:
SSE
1.3791191
0.25
0.25
Mixed
DFE
RMSE RSquare RSquare Adj
16 0.2935898
0.8672
0.8423
Cp
4
p
AICc
BIC
4 17.55751 18.25046
Current Estimates
Lock Entered Parameter
Intercept
X1
X2
X3
Estimate nDF
SS "F Ratio" "Prob>F"
3.23573225
1
0
0.000
1
0.09740562
1 1.26783
14.709
0.00146
-0.0001689
1 0.670967
7.784
0.01311
3.46681347
1 0.014774
0.171
0.68437
Step History
Step
1
2
3
Parameter
X3
X1
X2
Action "Sig Prob" Seq SS RSquare
Cp
Entered
0.0000 7.335255
0.7063 19.388
Entered
0.0104 1.000159
0.8026 9.7843
Entered
0.0131 0.670967
0.8672
4
p
AICc
BIC
2 26.6473 28.1345
3 21.8672 23.1835
4 17.5575 18.2505
17
Current Estimates – Step 3


X2 is added to the model
Predicted MDBH = 3.236 +
0.0974*X1 – 0.000169*X2 +
3.467*X3
R2=0.8672
 RMSE =
MSE  1.3791191 / 16  0.2936

18
Stat 301– Lecture 27
Current Estimates – Step 3


By clicking on Step you will invoke
the Backward part of the Mixed
procedure.
Note that variable X3 is no longer
statistically significant and so it will
be removed from the model when
you click on Step.
19
Stepwise Fit for MDBH
Stepwise Regression Control
Stopping Rule: P-value Threshold
Prob to Enter
Prob to Leave
Direction:
SSE
1.3938931
0.25
0.25
Mixed
DFE
RMSE RSquare RSquare Adj
Cp
17 0.2863454
0.8658
0.8500 2.1714023
p
AICc
BIC
3 14.15158 15.46784
Current Estimates
Lock Entered Parameter
Intercept
X1
X2
X3
Estimate nDF
SS "F Ratio" "Prob>F"
3.26051366
1
0
0.000
1
0.10691347
1 8.37558 102.149
1.32e-8
-0.0001898
1 2.784561
33.961
0.00002
0
1 0.014774
0.171
0.68437
Step History
Step
1
2
3
4
Parameter
X3
X1
X2
X3
Action
"Sig Prob"
Entered
0.0000
Entered
0.0104
Entered
0.0131
Removed
0.6844
Seq SS RSquare
Cp
7.335255
0.7063 19.388
1.000159
0.8026 9.7843
0.670967
0.8672
4
0.014774
0.8658 2.1714
p
2
3
4
3
AICc
26.6473
21.8672
17.5575
14.1516
BIC
28.1345
23.1835
18.2505
15.4678
20
Current Estimates – Step 4
X3 is removed from the model
 Predicted MDBH = 3.2605 +
0.1069*X1 – 0.0001898*X2

R2=0.8658
 RMSE = MSE  1.3938931 / 17  0.2863

21
Stat 301– Lecture 27
Current Estimates – Step 4



Because X1 and X2 add significantly
to the model they cannot be
removed.
Because X3 will not add significantly
to the model it cannot be added.
The Mixed procedure stops.
22
Response MDBH
Summary of Fit
RSquare
0.865785
RSquare Adj
0.849995
Root Mean Square Error
0.286345
Mean of Response
6.265
Observations (or Sum Wgts)
20
Analysis of Variance
Source
Model
Error
C. Total
DF
2
17
19
Sum of
Squares Mean Square
F Ratio
8.991607
4.49580 54.8311
0.08199 Prob > F
1.393893
10.385500
<.0001*
Parameter Estimates
Term
Intercept
X1
X2
Estimate Std Error t Ratio Prob>|t|
3.2605137 0.333024
9.79 <.0001*
0.1069135 0.010578 10.11 <.0001*
-0.00019 3.256e-5
-5.83 <.0001*
Effect Tests
Source
X1
X2
Nparm
1
1
DF
1
1
Sum of
Squares
F Ratio Prob > F
8.3755800 102.1490 <.0001*
2.7845615 33.9607 <.0001*
23
Finding the “best” model
For this example, the Forward
selection procedure did not find
the “best” model.
 The Backward and Mixed
selection procedures came up
with the “best” model.

24
Stat 301– Lecture 27
Finding the “best” model
None of the automatic selection
procedures are guaranteed to
find the “best” model.
 The only way to be sure, is to
look at all possible models.

25
All Possible Models




For k explanatory variables there
k
are 2  1 possible models.
There are k 1-variable models.
k 
There are 
 2-variable models.
 2
There are 
k
 3


 3-variable

models.
26
All Possible Models




When confronted with all possible
models, we often rely on summary
statistics to describe features of
the models.
R2: Larger is better
adjR2: Larger is better
RMSE: Smaller is better
27
Stat 301– Lecture 27
All Possible Models

Another summary statistic used
to assess the fit of a model is
Mallows Cp.
 SSE p
C p  
 MSEFull

  (n  2 p); p  k  1

28
MDBH Example
Model with X1 and X2 (p=3)
 SSEp=1.3938931
 MSEFull=0.08619

 1.3938931 
Cp  
  (20  6)  16.1714  14  2.1714
 0.0861949 
29
All Possible Models

The smaller Cp is the “better”
the fit of the model.
 The full model will have
Cp=p.
30
Stat 301– Lecture 27
All Possible Models

There are several criteria that
incorporate the maximized
value of the Likelihood function,
L. Which gives the probability of
getting the sample data given
the best estimates of the
parameters in the model.
31
All Possible Models




Another summary statistic is the
Akaike Information Criterion or AIC.
AIC = 2p – 2ln(L)
AICc = AIC +2p(p+1)/(n –p–1)
The smaller the AICc the “better”
the fit of the model.
32
All Possible Models
Another summary statistic is the
Bayesian Information Criterion
of BIC.
 BIC = –2ln(L) + pln(n)
 The smaller the BIC the better
the fit of the model.

33
Stat 301– Lecture 27
JMP – Fit Model
Personality – Stepwise – Run
 Red triangle pull down next to
Stepwise Fit

All Possible Models
 Maximum number of terms: 3
 Number of best models: 3

34
All Possible Models

Right click on table – Columns

Check Cp
35
Stepwise Fit for MDBH
SSE
10.3855
DFE
RMSE RSquare RSquare Adj
19 0.7393276
0.0000
0.0000
Cp
102.4885
p
AICc
BIC
1 48.35699 49.64257
All Possible Models
Ordered up to best 3 models up to 3 terms per model.
Model
Number RSquare
X3
1
0.7063
X1
1
0.5977
X2
1
0.0593
X1,X2
2
0.8658
X1,X3
2
0.8026
X2,X3
2
0.7451
X1,X2,X3
3
0.8672
RMSE
0.4117
0.4818
0.7367
0.2863
0.3473
0.3946
0.2936
AICc
26.6473
32.9417
49.9281
14.1516
21.8672
26.9777
17.5575
BIC
28.1345
34.4289
51.4153
15.4678
23.1835
28.2940
18.2505
Cp
19.3877
32.4768
97.3416
2.1714
9.7843
16.7089
4.0000
0.9
0.8
0.7
0.6
0.5
0.4
X3
0.3
X1,X2
X1,X2,X3
0.2
0.1
1
2
3
4
5
6
p = Number of Terms
36
Stat 301– Lecture 27
All Possible Models
Lists all 7 models.
 1-variable models – listed in
order of the R2 value.
 2-variable models – listed in
order of the R2 value.
 3-variable (full) model.

37
All Possible Models
Model with X1, X2, X3 –
Highest R2 value.
 Model with X1, X2 – Lowest
RMSE, Lowest AICc, Lowest
BIC, and lowest Cp.

38
Best Model
Which is “best”?
 According to our definition of
“best” we can’t tell until we look
at the significance of the
individual variables in the
model.

39
Download