Model Selection Techniques

Lab 10
Model Selection
— Goal: find the best group of X variables that can predict Y
— Don’t use model selection when you are testing a
specific hypothesis!
— Four ways to select variables for a prediction model:
— Best subsets
— Backward elimination
— Forward selection
— Stepwise selection
Best Subsets
— Considers all possible models with 1 covariate, then 2
covariates, then 3 covariates, etc., until the maximum model
with all possible covariates
— Total number of models = 2^k - 1, where k is the number of
potential covariates
— Only available within proc reg
— Syntax:
— Use Mallows’ Cp or an F test to assess which model is best!
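A minimal sketch of a best-subsets request in proc reg, offered as one plausible form of the syntax rather than the lab's own code. The data set name lab10 and the variables y and x1 through x4 are hypothetical placeholders. The selection=rsquare method lists models by number of covariates, and the cp and adjrsq options add Mallows' Cp and adjusted R-square to each row.

```sas
/* Hypothetical data set and variable names, for illustration only */
proc reg data=lab10;
  /* All-subsets search: with 4 candidates there are 2^4 - 1 = 15 models */
  model y = x1 x2 x3 x4
        / selection=rsquare  /* rank models within each subset size */
          cp adjrsq          /* also print Mallows' Cp and adjusted R-square */
          best=3;            /* show only the 3 best models of each size */
run;
quit;
```

A common rule of thumb is to favor models whose Cp is small and close to the number of parameters in the model (covariates plus the intercept).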
Backward Elimination
— Start with the maximum model (model with all variables), and
eliminate the non-significant predictors
— Non-significant predictors are those with a p-value above the
specified SLS (significance level to stay)
— Available within proc reg or proc glmselect
— proc glmselect allows for easy inclusion of class variables and/or
interaction terms
— Syntax notes (proc glmselect):
— Any class variables go here, as in proc glm
— Keeps all parts of the class variable together (all parts
will be removed together)
— Type of selection: backward
— Use significance level to remove variables from the model
— More liberal cutoff…
— Stops when all variables in the model have a p<0.15
— Shows statistics for removed variables
— Shows p-values for all models
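A hedged sketch of backward elimination at SLS = 0.15, first in proc glmselect and then in proc reg. It is an illustration under assumptions, not a copy of the lab's code: the data set lab10 and the variables y, age, height, weight, and race are hypothetical.

```sas
/* Hypothetical data set and variable names, for illustration only */
proc glmselect data=lab10;
  class race;                          /* class variables, as in proc glm */
  model y = age height weight race
        / selection=backward(select=sl /* use p-values to choose removals */
                             sls=0.15  /* significance level to stay */
                             stop=sl)  /* stop once no effect fails the SLS cutoff */
          details=all                  /* statistics at each step */
          showpvalues;                 /* print p-values in the output tables */
run;

/* proc reg version: no class statement, so only numeric covariates here */
proc reg data=lab10;
  model y = age height weight / selection=backward slstay=0.15;
run;
quit;
```

In proc glmselect a class effect such as race is removed as a whole by default, which is why the dummy-variable problem arises only in proc reg.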
Forward Selection
— Start with a model with no covariates, and add in X variables one
at a time if they are significant at the SLE (significance level for
entry)
— Again, can use proc reg or proc glmselect
— Syntax notes (proc glmselect):
— Any class variables go here, as in proc glm
— Keeps all parts of the class variable together (all parts
will be entered together)
— Type of selection: forward
— Use significance level to enter variables into the model
— More liberal cutoff…
— Stops when no remaining variable would enter with a p<0.15
— Shows statistics for entered variables
— Shows p-values for all models
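A hedged sketch of forward selection at SLE = 0.15, first in proc glmselect and then in proc reg. The data set lab10 and the variables y, age, height, weight, and race are hypothetical names chosen for illustration.

```sas
/* Hypothetical data set and variable names, for illustration only */
proc glmselect data=lab10;
  class race;                          /* class variables, as in proc glm */
  model y = age height weight race
        / selection=forward(select=sl  /* use p-values to choose entries */
                            sle=0.15   /* significance level for entry */
                            stop=sl)   /* stop once no candidate meets the SLE cutoff */
          details=all                  /* statistics at each step */
          showpvalues;                 /* print p-values in the output tables */
run;

/* proc reg version: no class statement, so only numeric covariates here */
proc reg data=lab10;
  model y = age height weight / selection=forward slentry=0.15;
run;
quit;
```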
Stepwise Selection
— Combination of forward and backward selection
— Specify an entrance significance level, and a significance level for
variables to remain in the model
— At each step, SAS first checks whether any variable already in the
model should be removed before it adds a new one
— Syntax notes (proc glmselect):
— Any class variables go here, as in proc glm
— Keeps all parts of the class variable together (all parts
will be entered or removed together)
— Type of selection: stepwise
— Use significance level to enter or remove variables from the model
— More liberal cutoff…
— Stops when all variables in the model have a p<0.15
— Shows statistics for entered or removed variables
— Shows p-values for all models
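A hedged sketch of stepwise selection with both cutoffs set to 0.15, first in proc glmselect and then in proc reg. The data set lab10 and the variables y, age, height, weight, and race are hypothetical, and the 0.15 values simply follow the cutoff mentioned on the slide.

```sas
/* Hypothetical data set and variable names, for illustration only */
proc glmselect data=lab10;
  class race;                           /* class variables, as in proc glm */
  model y = age height weight race
        / selection=stepwise(select=sl  /* use p-values to enter and remove */
                             sle=0.15   /* significance level for entry */
                             sls=0.15   /* significance level to stay */
                             stop=sl)   /* stop when no entry or removal qualifies */
          details=all                   /* statistics at each step */
          showpvalues;                  /* print p-values in the output tables */
run;

/* proc reg version: no class statement, so only numeric covariates here */
proc reg data=lab10;
  model y = age height weight
        / selection=stepwise slentry=0.15 slstay=0.15;
run;
quit;
```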
Additional points
— Remember that these are automated processes, and must be
checked by hand!
— Be aware of entry of some dummy variables and not others
(ex: white race, but not Asian race), which is only a
problem in proc reg model selection
— Be aware of entry of higher-order variables without lower-order
ones (ex: height^2 without height)
— Be sure to evaluate assumptions of linear regression
for the final model
— Consider possible interactions, and assess model for any
collinearity issues
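Two of these checks can be sketched in SAS, under stated assumptions. The data set lab10 and all variable names are hypothetical, and the final model refit in proc reg (age and height) is assumed purely for illustration. The hierarchy=single option asks proc glmselect to respect model hierarchy, and the vif option in proc reg prints variance inflation factors.

```sas
/* Hypothetical data set and variable names, for illustration only */
proc glmselect data=lab10;
  class race;
  model y = age height height*height race
        / selection=stepwise(select=sl sle=0.15 sls=0.15)
          hierarchy=single   /* height*height cannot be in the model without height */
          showpvalues;
run;

/* Refit the chosen model (assumed here to be age and height) and check it */
proc reg data=lab10;
  model y = age height / vif;   /* variance inflation factors flag collinearity */
run;
quit;
```

With ODS Graphics enabled, proc reg also produces residual diagnostic plots for the refit model, which help in evaluating the linear regression assumptions.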
Additional points
— Remember that you should NOT use
model selection procedures when you are
testing a specific hypothesis!