Handout #14: Model Selection Procedures
Example 14.1: Consider the MN Marriage Amendment data from our course website. This dataset
contains several potential variables. The primary goal of this investigation is to identify which
(predictor) variables impacted the percentage of people that voted for Amendment #1.
A snippet of the dataset is provided here.
Linear Regression Setup
• Response Variable: Percentage of People that Voted for Amendment #1
• Predictor Variables:
o Percent of Population who voted for Obama
o Unemployment Rate
o Percent of Population Living in Poverty
o Median Household Income
o Percent of Population: Age 0-17
o Percent of Population: Age 18-24
o Percent of Population: Age 25-44
o Percent of Population: Age 45-65
o Percent of Population: Age Over 65
o Percent of Population: White
o Percent of Population: African American
o Percent of Population: American Indian
o Percent of Population: Asian
o Percent of Population: Other
o Percent of Population: Of Hispanic Origin
• Implied structure for mean and variance functions
o $E(\text{Percent Voted Yes for Amendment \#1} \mid \dots) = \beta_0 + \cdots$
o $\mathrm{Var}(\text{Percent Voted Yes for Amendment \#1} \mid \dots) = \sigma^2$
Initial Regression Model – All Predictors
Question
1. Consider the variation in Percent Vote Yes in the following plot.
Average: 59.9, Std Dev = 8.77, Variance = 76.87, Count=87
What proportion of the variation in Percent Vote Yes can be explained by all of these
predictors? Discuss.
Comment Regarding R2:
• R2 measures the proportion of variation in the response that can be explained by the predictor
variables in the model.
$$R^2 = 1 - \frac{\text{Sum of Squared Error}}{\text{Sum of Squared Total}}$$
Visual Representation of R2
• Concern: Each time we add a predictor variable to the model, the sum of squared error can only
decrease (or, at worst, stay the same), while the sum of squared total remains constant. Thus, R2
never decreases as predictor variable(s) are added to the model. This is true regardless of the
worthiness of the added predictor variable(s).
• Fix: Adjusted R2, which is given by the quantity
$$\text{Adjusted } R^2 = R^2 - \left[(1 - R^2) \cdot \frac{\#\text{Predictors}}{n - (\#\text{Predictors} + 1)}\right]$$
A second formula, which is easier to use when computing Adjusted R2, is given by
$$\text{Adjusted } R^2 = 1 - \frac{\text{Sum of Squared Error} / df_{\text{error}}}{\text{Sum of Squared Total} / df_{\text{total}}} = 1 - \frac{\text{Variance in Conditional Distribution}}{\text{Variance in Unconditional Distribution}}$$
• Adjusted R2 can *not* be interpreted directly as the "proportion of variation in the response
being explained by the predictors." However, this quantity can be used to better quantify our
desire to reduce the unexplained variation in the response using a minimum number of
predictor variables. This concept is commonly referred to as having a parsimonious model.
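The concern and the fix can be seen on simulated data. The following is a minimal Python sketch, not JMP output and not the MN Marriage Amendment data: the helper names (`least_squares_sse`, `fit_summary`), the simulated variables, and the seed are all assumptions made for illustration. It fits a one-predictor model, then adds a predictor that is pure noise.

```python
import random

def least_squares_sse(columns, y):
    """Sum of squared error from regressing y on the given columns plus an intercept."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y]
    m = [[sum(r[a] * r[b] for r in rows) for b in range(k)]
         + [sum(r[a] * yi for r, yi in zip(rows, y))] for a in range(k)]
    # Gaussian elimination with partial pivoting
    for c in range(k):
        pivot = max(range(c, k), key=lambda i: abs(m[i][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for i in range(c + 1, k):
            factor = m[i][c] / m[c][c]
            for j in range(c, k + 1):
                m[i][j] -= factor * m[c][j]
    # Back substitution for the coefficient estimates
    beta = [0.0] * k
    for i in reversed(range(k)):
        beta[i] = (m[i][k] - sum(m[i][j] * beta[j] for j in range(i + 1, k))) / m[i][i]
    return sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2 for r, yi in zip(rows, y))

def fit_summary(columns, y):
    """Return (R2, Adjusted R2) for the model using the given predictor columns."""
    n, p = len(y), len(columns)
    mean_y = sum(y) / n
    sst = sum((yi - mean_y) ** 2 for yi in y)
    sse = least_squares_sse(columns, y)
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (sse / (n - (p + 1))) / (sst / (n - 1))
    return r2, adj_r2

# Simulated data (an assumption): y depends on x only; noise_predictor is unrelated to y.
random.seed(14)
n = 87
x = [random.uniform(20, 70) for _ in range(n)]
noise_predictor = [random.gauss(0, 1) for _ in range(n)]
y = [90.0 - 0.6 * xi + random.gauss(0, 3) for xi in x]

r2_one, adj_one = fit_summary([x], y)
r2_two, adj_two = fit_summary([x, noise_predictor], y)
print(f"x only:       R2 = {r2_one:.4f}, Adjusted R2 = {adj_one:.4f}")
print(f"x plus noise: R2 = {r2_two:.4f}, Adjusted R2 = {adj_two:.4f}")
```

R2 for the larger model is never smaller than for the one-predictor model, even though the extra predictor is worthless. Adjusted R2 carries no such guarantee: it rises only when the drop in the sum of squared error outweighs the degree of freedom spent on the new predictor.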
Verify the calculation of the adjusted R2 quantity provided by JMP
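One way to carry out this check by hand is sketched below in Python. The function names are hypothetical, and the first set of sums of squares is made up so the arithmetic is easy to follow. The last two lines apply the variance-ratio form to rounded values quoted in this handout: 76.87 is the variance of Percent Vote Yes from Question 1, and 8.279 is the error variance of the all-predictor model that appears in the Mallows' Cp calculation. Because those inputs are rounded, the value JMP reports may differ slightly.

```python
def adjusted_r2_penalty_form(r2, n, num_predictors):
    """First formula: R2 minus a penalty that grows with the number of predictors."""
    return r2 - (1.0 - r2) * num_predictors / (n - (num_predictors + 1))

def adjusted_r2_variance_form(sse, sst, n, num_predictors):
    """Second formula: 1 - (SSE / df_error) / (SST / df_total)."""
    df_error = n - (num_predictors + 1)
    df_total = n - 1
    return 1.0 - (sse / df_error) / (sst / df_total)

# Made-up sums of squares (an assumption), chosen only for easy arithmetic.
sse, sst, n, num_predictors = 20.0, 100.0, 11, 2
r2 = 1.0 - sse / sst
form_1 = adjusted_r2_penalty_form(r2, n, num_predictors)
form_2 = adjusted_r2_variance_form(sse, sst, n, num_predictors)
print(round(form_1, 6), round(form_2, 6))  # 0.75 0.75

# Variance-ratio form with the rounded values quoted in this handout.
handout_adjusted_r2 = 1.0 - 8.279 / 76.87
print(round(handout_adjusted_r2, 4))  # 0.8923
```

Both formulas give the same answer because the second is an algebraic rearrangement of the first.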
Wiki Entry for Adjusted R2
http://en.wikipedia.org/wiki/Coefficient_of_determination
Consider once again the entire list of predictor variables used in our initial model above. Some of these
predictors appear to be important in helping to explain the variation in Percent Vote Yes, while others
appear less important.
Entire list of parameter estimates
A list, sorted by importance, is provided by JMP
Questions:
2. Which predictor variables appear to be important in explaining Percent Vote Yes?
3. Why cannot the absolute size of the estimated effect be used to rank the importance of a
predictor variable?
4. What is a reasonable way to rank the importance of these predictors? Discuss.
Model Selection – Finding the Best Model
• Implied structure for mean and variance functions
o $E(\text{Percent Voted Yes for Amendment \#1} \mid\ ????) = \beta_0 + \cdots$
o $\mathrm{Var}(\text{Percent Voted Yes for Amendment \#1} \mid\ ????) = \sigma^2$
The model building feature in JMP can be invoked by selecting Stepwise from the Personality: box in the
Fit Model window.
• Y box (Response): Percentage of People that Voted for Amendment #1
• Model Effect box: The entire list of candidate predictor variables.
o Any type of predictor variable can be considered in a model selection process, e.g.
categorical predictors, transformed predictor variables, interaction terms, etc.
The following Fit Stepwise window is provided.
Consider only the bottom portion of the output provided.
The effect of adding predictor variables to the model can be easily seen by selecting any number of
predictor variables in the Entered Column.
Model #1: Try Percent Obama as a predictor first.
$$E(\text{Percent Vote Yes} \mid \text{Percent Obama}) = \beta_0 + \beta_1 \cdot \text{Percent Obama}$$
Model #2: Let's add another predictor variable -- skipping over Unemployment as this predictor does not
appear important (p-value = 0.27).
$$E(\text{Percent Vote Yes} \mid \text{Percent Obama}, \text{Percent Poverty}) = \beta_0 + \beta_1 \cdot \text{Percent Obama} + \beta_2 \cdot \text{Percent Poverty}$$
Notice that after Percent Poverty was added to the model, the effect of Unemployment now appears to
be important. Such anomalies are not uncommon. The R2 and Adjusted R2 values suggest
Unemployment does indeed produce a reduction in the unexplained variation.
A quick check of the added variable plots suggests that Unemployment is indeed adding something to our
model.
The Model Selection – Step by Step
JMP has automated the process of adding predictor variables to a model. JMP has incorporated a
variety of commonly used methodologies into its procedures. JMP is one of the best software packages
that I have used for model selection.
1st Predictor to be put into the model is Percent Obama...
Criteria for the Selection Process
• Adjusted R2 – larger is better
• Mallows' Cp -- closest to the number of parameters in the model (predictors plus the intercept) is best
Calculating this quantity for the one-predictor model above (2 parameters: intercept and slope).
$$\text{Mallows' } C_p = \left(\frac{\text{Sum of Squares Error}}{\text{Variance in Conditional Distribution with all predictors}}\right) - (n - 2 \cdot \#\text{Parameters})$$
$$= \left(\frac{3862.77}{8.279}\right) - (87 - 2 \cdot 2) \approx 383.55$$
(The inputs shown are rounded; carrying the arithmetic through with them gives about 383.57.)
• Akaike's Information Criterion (AIC) – smaller is better
• Bayesian Information Criterion (BIC) – smaller is better
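The Mallows' Cp arithmetic above can be reproduced with a short Python sketch. The function name `mallows_cp` is hypothetical; the first call uses the rounded values from this handout, and the second uses a made-up parameter count purely as a sanity check.

```python
def mallows_cp(sse_candidate, full_model_variance, n, num_parameters):
    """Mallows' Cp for a candidate model; num_parameters counts the intercept."""
    return sse_candidate / full_model_variance - (n - 2 * num_parameters)

# One-predictor model from this handout: SSE = 3862.77, all-predictor variance = 8.279, n = 87.
cp_one_predictor = mallows_cp(3862.77, 8.279, 87, 2)
print(round(cp_one_predictor, 2))  # 383.57 with these rounded inputs; the handout shows 383.55

# Sanity check with a hypothetical parameter count: for the all-predictor model itself,
# SSE = variance * (n - parameters), so Cp works out to the parameter count.
hypothetical_parameters = 10
sse_full = 8.279 * (87 - hypothetical_parameters)
cp_full = mallows_cp(sse_full, 8.279, 87, hypothetical_parameters)
print(round(cp_full, 6))  # 10.0
```

A Cp of roughly 383 against a target of 2 indicates that the one-predictor model leaves a great deal of explainable variation on the table, which is why more predictors are added in the steps that follow.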
Automating the Entire Process
Clicking the Go button instead of the Step button will proceed by adding one variable at a time until all
predictor variables have been considered. This approach is called Forward Selection. Once the process
finishes, JMP identifies the predictor variables it deems important.
From the red drop down arrow, you can select Plot Criterion History.
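The idea behind Go plus a criterion history can be sketched in Python on simulated data. This is an illustration under stated assumptions, not JMP's implementation: at each step the sketch enters the predictor that lowers the sum of squared error the most, and it scores each step with a common least-squares form of BIC, n ln(SSE/n) + k ln(n), where k counts the coefficients including the intercept. All names (`least_squares_sse`, `bic`, `forward_selection`, `x0`, `x1`, `x2`) and the data are made up.

```python
import math
import random

def least_squares_sse(columns, y):
    """Sum of squared error from regressing y on the given columns plus an intercept."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    m = [[sum(r[a] * r[b] for r in rows) for b in range(k)]
         + [sum(r[a] * yi for r, yi in zip(rows, y))] for a in range(k)]
    for c in range(k):  # Gaussian elimination with partial pivoting
        pivot = max(range(c, k), key=lambda i: abs(m[i][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for i in range(c + 1, k):
            factor = m[i][c] / m[c][c]
            for j in range(c, k + 1):
                m[i][j] -= factor * m[c][j]
    beta = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        beta[i] = (m[i][k] - sum(m[i][j] * beta[j] for j in range(i + 1, k))) / m[i][i]
    return sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2 for r, yi in zip(rows, y))

def bic(sse, n, num_coefficients):
    """Assumed least-squares form of BIC; smaller is better."""
    return n * math.log(sse / n) + num_coefficients * math.log(n)

def forward_selection(predictors, y):
    """Enter predictors one at a time; return a history of (name, SSE, BIC) per step."""
    n = len(y)
    entered, history = [], []
    remaining = list(predictors)
    while remaining:
        # Try each remaining predictor and keep the one that lowers SSE the most.
        trial_sse = {name: least_squares_sse([predictors[v] for v in entered + [name]], y)
                     for name in remaining}
        best = min(trial_sse, key=trial_sse.get)
        entered.append(best)
        remaining.remove(best)
        history.append((best, trial_sse[best], bic(trial_sse[best], n, len(entered) + 1)))
    return history

# Simulated data (an assumption): y depends on x0 and x1; x2 is unrelated to y.
random.seed(1)
n = 40
data = {name: [random.uniform(0, 1) for _ in range(n)] for name in ("x0", "x1", "x2")}
y = [5 + 4 * a + 2 * b + random.gauss(0, 0.1) for a, b in zip(data["x0"], data["x1"])]

history = forward_selection(data, y)
best_step = min(range(len(history)), key=lambda i: history[i][2])
selected = [name for name, _, _ in history[: best_step + 1]]
for name, sse, criterion in history:
    print(f"entered {name}: SSE = {sse:.3f}, BIC = {criterion:.2f}")
print("minimum-BIC model:", selected)
```

The printed history plays the role of the criterion plot: the sum of squared error falls at every step, while BIC bottoms out at the step where additional predictors stop paying for themselves.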
Clicking the Run Model button gives the output from the fitted model with the identified predictor
variables.
Specifying the Stopping Criteria
Specifying the Direction
Procedure for Mixed Direction
1. Choose α_entry ≤ α_stay (be somewhat generous setting these, say 0.05 up to 0.25).
2. Fit a simple linear regression model for each of the potential predictor variables. Then, obtain
the p-value to test whether or not the regression is useful. Let X1 denote the variable with the
smallest p-value.
• If p-value < α_entry then X1 is retained.
• If p-value > α_entry then the procedure ends.
3. Fit all 2-variable models, where X1 is one of the pair. Obtain the p-value to test whether the
coefficient of the second variable is zero, and let X2 be the variable with the smallest p-value.
• If p-value > α_entry then the procedure ends.
• If p-value < α_entry then X2 is retained. Then, we check to see if X1 is still needed in the model.
If p-value > α_stay, then X1 is removed and we go to the beginning of Step 3. If the p-value is
less than α_stay, then X1 and X2 are both retained.
4. Examine which predictor is the next candidate for addition, and then examine whether any
other variables already in the model should be dropped. This procedure continues until no
variables can be either added or deleted.
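The control flow of the steps above can be sketched in Python. This is a sketch under assumptions, not JMP's algorithm: the p-values come from a caller-supplied function (here a lookup table of made-up numbers rather than fitted regressions), the names `mixed_stepwise`, `lookup_p_value`, and `MADE_UP_P_VALUES` are hypothetical, and after each entry the sketch re-checks every variable in the model against the stay threshold.

```python
def mixed_stepwise(candidates, p_value, alpha_entry=0.10, alpha_stay=0.10, max_steps=100):
    """Mixed-direction selection.

    p_value(variable, others) must return the p-value for testing the variable's
    coefficient in the model that contains `others` plus that variable.
    """
    model = []
    for _ in range(max_steps):  # guard against cycling
        remaining = [v for v in candidates if v not in model]
        if not remaining:
            break
        # Entry step: the remaining candidate with the smallest p-value.
        best = min(remaining, key=lambda v: p_value(v, frozenset(model)))
        if p_value(best, frozenset(model)) >= alpha_entry:
            break
        model.append(best)
        # Removal step: drop variables that are no longer needed.
        while len(model) > 1:
            worst = max(model, key=lambda v: p_value(v, frozenset(model) - {v}))
            if p_value(worst, frozenset(model) - {worst}) <= alpha_stay:
                break
            model.remove(worst)
    return model

# Made-up p-values keyed by (variable, other variables in the model).
MADE_UP_P_VALUES = {
    ("X1", frozenset()): 0.001,
    ("X2", frozenset()): 0.010,
    ("X3", frozenset()): 0.020,
    ("X2", frozenset({"X1"})): 0.040,
    ("X3", frozenset({"X1"})): 0.060,
    ("X1", frozenset({"X2"})): 0.020,
    ("X1", frozenset({"X3"})): 0.030,
    ("X2", frozenset({"X3"})): 0.001,
    ("X3", frozenset({"X2"})): 0.001,
    ("X3", frozenset({"X1", "X2"})): 0.010,
    ("X2", frozenset({"X1", "X3"})): 0.005,
    ("X1", frozenset({"X2", "X3"})): 0.450,
}

def lookup_p_value(variable, others):
    return MADE_UP_P_VALUES[(variable, frozenset(others))]

selected = mixed_stepwise(["X1", "X2", "X3"], lookup_p_value)
print(selected)  # ['X2', 'X3']
```

In this made-up table X1 enters first because it has the smallest p-value on its own, but it is dropped once X2 and X3 are both in the model. Forward selection would have kept it, which is the practical difference between the two directions.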
Procedure for Forward:
The procedure is very similar to stepwise regression; however, once a variable enters a model, it is never
removed.
Procedure for Backward:
The full model (with all predictors) is fit first, and the coefficient with the largest p-value is identified. If
this p-value is greater than α_stay, then the variable is dropped from the model. The new model (minus
the one predictor) is then fit, and this process continues until no more predictors can be dropped.
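Backward elimination has an even simpler loop, sketched below in Python. As before this is only an illustration: the p-values are made-up numbers in a lookup table standing in for refitting the model at each step, and the names `backward_elimination`, `lookup_p_value`, and `MADE_UP_P_VALUES` are hypothetical.

```python
def backward_elimination(candidates, p_value, alpha_stay=0.10):
    """Start from the full model and drop the weakest predictor until all pass alpha_stay.

    p_value(variable, others) must return the p-value for testing the variable's
    coefficient in the model that contains `others` plus that variable.
    """
    model = list(candidates)
    while model:
        worst = max(model, key=lambda v: p_value(v, frozenset(model) - {v}))
        if p_value(worst, frozenset(model) - {worst}) <= alpha_stay:
            break  # every remaining predictor is still needed
        model.remove(worst)
    return model

# Made-up p-values keyed by (variable, other variables in the model).
MADE_UP_P_VALUES = {
    ("A", frozenset({"B", "C"})): 0.002,
    ("B", frozenset({"A", "C"})): 0.450,
    ("C", frozenset({"A", "B"})): 0.030,
    ("A", frozenset({"C"})): 0.001,
    ("C", frozenset({"A"})): 0.200,
    ("A", frozenset()): 0.001,
}

def lookup_p_value(variable, others):
    return MADE_UP_P_VALUES[(variable, frozenset(others))]

remaining_model = backward_elimination(["A", "B", "C"], lookup_p_value)
print(remaining_model)  # ['A']
```

Here B is dropped first, and C then fails the stay threshold once B is gone, so p-values must be recomputed after every removal rather than read once from the full model.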
Returning to our example – using p-value threshold approach
Upon careful examination, we see that we obtain the same model as with the Minimum BIC criterion.
Final Thoughts
1. The model deemed "best" may be different depending upon the approach.
2. There is no clear rule as to how to use these procedures to choose a single "best" model. However,
the outcomes from these search procedures can be used to identify a number of possible regression
models that warrant further consideration.
3. Use experience, judgment, and the background science when model building! If you believe a
predictor variable is fundamental, then include it regardless of the results of the search procedure.
Example 14.2: For this example, consider the Percent Vote Yes for Amendment #2 (the MN Voter ID
Amendment) as the response. Use the entire set of predictor variables at your disposal in this dataset.
Tasks:
1. Obtain a model which contains only the necessary predictor variables as determined by the
Minimum BIC criterion.
2. Use the P-Value threshold approach to obtain the necessary list of predictor variables. Do the
two approaches agree?
3. Use the "Big Test" presented in Handout #10 to confirm your findings in Task #1. That is,
conduct a formal statistical test that shows that only a subset of the predictor variables is
necessary to model the response variable.