Handout #14: Model Selection Procedures

Example 14.1: Consider the MN Marriage Amendment data from our course website. This dataset contains several potential variables. The primary goal of this investigation is to identify which (predictor) variables impacted the percentage of people that voted for Amendment #1. A snippet of the dataset is provided here.

Linear Regression Setup

• Response Variable: Percentage of People that Voted for Amendment #1
• Predictor Variables:
  o Percent of Population who voted for Obama
  o Unemployment Rate
  o Percent of Population Living in Poverty
  o Median Household Income
  o Percent of Population: Age 0-17
  o Percent of Population: Age 18-24
  o Percent of Population: Age 25-44
  o Percent of Population: Age 45-65
  o Percent of Population: Age Over 65
  o Percent of Population: White
  o Percent of Population: African American
  o Percent of Population: American Indian
  o Percent of Population: Asian
  o Percent of Population: Other
  o Percent of Population: Of Hispanic Origin
• Implied structure for the mean and variance functions:
  o $E(\text{Percent Vote Yes for Amendment \#1} \mid \ldots) = \beta_0 + \cdots$
  o $Var(\text{Percent Vote Yes for Amendment \#1} \mid \ldots) = \sigma^2$

Initial Regression Model – All Predictors

Question 1. Consider the variation in Percent Vote Yes in the following plot (Average = 59.9, Std Dev = 8.77, Variance = 76.87, Count = 87). What proportion of the variation in Percent Vote Yes can be explained by all of these predictors? Discuss.

Comment Regarding R²:

• R² measures the proportion of variation in the response that can be explained by the predictor variables in the model:

  $R^2 = 1 - \dfrac{\text{Sum of Squared Error}}{\text{Sum of Squares Total}}$

  [Visual representation of R²]

• Concern: Each time we add a predictor variable to the model, the sum of squared error can only decrease, while the sum of squares total remains constant. Thus, R² will always increase as predictor variables are added to the model. This is true regardless of the worthiness of the added predictor variable(s).

• Fix: The Adjusted R², given by

  $\text{Adjusted } R^2 = R^2 - \dfrac{(1 - R^2) \cdot \#\text{predictors}}{n - (\#\text{predictors} + 1)}$

  A second formula, which is easier to use when computing Adjusted R², is

  $\text{Adjusted } R^2 = 1 - \dfrac{\text{Sum of Squared Error}/df_{error}}{\text{Sum of Squares Total}/df_{total}} = 1 - \dfrac{\text{Variance of Conditional Distribution}}{\text{Variance of Unconditional Distribution}}$

• Adjusted R² can *not* be interpreted directly as the "proportion of variation in the response being explained by the predictors." However, this quantity can be used to better quantify our desire to reduce the unexplained variation in the response using a minimum number of predictor variables. This goal is commonly referred to as having a parsimonious model.

Verify the calculation of the Adjusted R² quantity provided by JMP (a Python sketch of this check appears after the questions below).

Wiki Entry for Adjusted R²: http://en.wikipedia.org/wiki/Coefficient_of_determination

Consider once again the entire list of predictor variables used in our initial model above. Some of these predictors appear to be important in helping to explain the variation in Percent Vote Yes, and others appear not so important.

[JMP output: the entire list of parameter estimates, and a second list sorted by importance]

Questions:

2. Which predictor variables appear to be important in explaining Percent Vote Yes?
3. Why can't the absolute size of an estimated effect be used to rank the importance of a predictor variable?
4. What is a reasonable way to rank the importance of these predictors? Discuss.
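The Adjusted R² check suggested above can also be done outside JMP with a few lines of Python. This is a minimal sketch using only the summary numbers quoted in this handout; it assumes that the conditional variance of 8.279 (quoted later in the Mallows' Cp calculation) comes from the full model with all 15 predictors.

```python
# Verify JMP's Adjusted R-squared from the handout's summary numbers.
n = 87              # number of counties
p = 15              # number of predictors in the full model (assumed)
var_uncond = 76.87  # variance of Percent Vote Yes = SST / (n - 1)
var_cond = 8.279    # full-model conditional variance = SSE / (n - p - 1)

sst = var_uncond * (n - 1)
sse = var_cond * (n - p - 1)

r2 = 1 - sse / sst                            # plain R-squared
adj_r2_a = r2 - (1 - r2) * p / (n - (p + 1))  # first formula
adj_r2_b = 1 - var_cond / var_uncond          # second formula

print(round(r2, 4), round(adj_r2_a, 4), round(adj_r2_b, 4))
# The two Adjusted R-squared formulas agree (about 0.892 here).
```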
Model Selection – Finding the Best Model

• Implied structure for the mean and variance functions:
  o $E(\text{Percent Vote Yes for Amendment \#1} \mid \;????\;) = \beta_0 + \cdots$
  o $Var(\text{Percent Vote Yes for Amendment \#1} \mid \;????\;) = \sigma^2$

The model-building feature in JMP can be invoked by selecting Stepwise from the Personality box in the Fit Model window.

• Y box (Response): Percentage of People that Voted for Amendment #1
• Model Effects box: the entire list of candidate predictor variables.
  o Any type of predictor variable can be considered in a model selection process, e.g. categorical predictors, transformed predictor variables, interaction terms, etc.

The following Fit Stepwise window is provided. Consider only the bottom portion of the output. The effect of adding predictor variables to the model can be easily seen by checking any number of predictor variables in the Entered column.

Model #1: Try Percent Obama as the first predictor.

$E(\text{Percent Vote Yes} \mid \text{Percent Obama}) = \beta_0 + \beta_1 \cdot \text{Percent Obama}$

Model #2: Add another predictor variable, skipping over Unemployment since this predictor does not appear important (p-value = 0.27).

$E(\text{Percent Vote Yes} \mid \text{Percent Obama}, \text{Percent Poverty}) = \beta_0 + \beta_1 \cdot \text{Percent Obama} + \beta_2 \cdot \text{Percent Poverty}$

Notice that after Percent Poverty was added to the model, the effect of Unemployment now appears to be important. Such anomalies are not uncommon. The R² and Adjusted R² values suggest Unemployment does indeed produce a reduction in the unexplained variation, and a quick check of the added-variable plots suggests that Unemployment is indeed adding something to our model.

Model Selection – Step by Step

JMP has automated the process of adding predictor variables to a model and has incorporated a variety of commonly used methodologies into its procedures. JMP is one of the best software packages that I have used for model selection.

The 1st predictor to be put into the model is Percent Obama.

Criteria to Guide the Selection Process

• Adjusted R² – larger is better.
• Mallows' Cp – a value closest to the number of parameters in the model (#predictors + 1) is best. Calculating this quantity for the example above (this arithmetic is verified in the sketch after this list):

  $\text{Mallows' } C_p = \dfrac{\text{Sum of Squared Error}}{\text{Variance of Conditional Distribution with all predictors}} - (n - 2 \cdot \#\text{parameters}) = \dfrac{3862.77}{8.279} - (87 - 2 \cdot 2) = 383.55$

• Akaike's Information Criterion (AIC) – smaller is better.
• Bayesian Information Criterion (BIC) – smaller is better.
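The Mallows' Cp arithmetic above is easy to check outside JMP. This is a minimal sketch using the quantities quoted in the calculation; the small difference from the handout's 383.55 reflects rounding of the inputs.

```python
# Verify the Mallows' Cp calculation above.
sse = 3862.77     # SSE for the one-predictor model (Percent Obama)
mse_full = 8.279  # variance of conditional distribution, all predictors
n = 87            # number of counties
n_params = 2      # intercept + Percent Obama

cp = sse / mse_full - (n - 2 * n_params)
print(round(cp, 2))  # about 383.6; the handout reports 383.55
```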
Automating the Entire Process

Clicking the Go button instead of the Step button will proceed by adding one variable at a time until all predictor variables have been considered. This approach is called Forward Selection. After finishing this process, JMP will identify the predictor variables deemed important via this process. From the red drop-down arrow, you can select Plot Criterion History.

Clicking the Run Model button gives the output from the fitted model with the identified predictor variables.

Specifying the Stopping Criteria

Specifying the Direction

Procedure for Mixed Direction (a Python sketch of this procedure appears at the end of this handout):

1. Choose α_entry ≤ α_stay (be somewhat generous setting these, say 0.05 up to 0.25).
2. Fit a simple linear regression model for each of the potential predictor variables. Then obtain the p-value to test whether or not the regression is useful. Let X1 denote the variable with the smallest p-value.
   • If p-value < α_entry, then X1 is retained.
   • If p-value > α_entry, then the procedure ends.
3. Fit all 2-variable models where X1 is one of the pair. Obtain the p-value to test whether the coefficient of the second variable is zero, and let X2 be the variable with the smallest p-value.
   • If p-value > α_entry, then the procedure ends.
   • If p-value < α_entry, then X2 is retained. We then check whether X1 is still needed in the model: if its p-value > α_stay, then X1 is removed and we return to the beginning of Step 3; if its p-value < α_stay, then X1 and X2 are both retained.
4. Examine which predictor is the next candidate for addition, and then examine whether any other variables already in the model should be dropped. This procedure continues until no variables can be either added or deleted.

Procedure for Forward: Very similar to the mixed (stepwise) procedure above; however, once a variable enters the model, it is never removed.

Procedure for Backward: The full model (with all predictors) is fit first, and the coefficient with the largest p-value is identified. If this p-value is greater than α_stay, the variable is dropped from the model. The new model (minus that one predictor) is then fit, and this process continues until no more predictors can be dropped.

Returning to Our Example – Using the P-Value Threshold Approach

Upon careful examination, we see that we obtain the same model as with the Minimum BIC criterion.

Final Thoughts

1. The model deemed "best" may differ depending upon the approach.
2. There is no clear rule for how to use these procedures to choose a single "best" model. However, the outcomes from these search procedures can be used to identify a number of possible regression models that warrant further consideration.
3. Use experience, judgment, and the background science when model building! If you believe a predictor variable is fundamental, then include it regardless of the results of the search procedure.

Example 14.2: For this example, consider the Percent Vote Yes for Amendment #2 – the MN Voter ID Amendment – as the response. Use the entire set of predictor variables at your disposal in this dataset.

Tasks:

1. Obtain a model which contains only the necessary predictor variables as determined by the Minimum BIC criterion.
2. Use the p-value threshold approach to obtain the necessary list of predictor variables. Do the two approaches agree?
3. Use the "Big Test" presented in Handout #10 to confirm your findings in Task #1. That is, conduct a formal statistical test showing that only a subset of the predictor variables is necessary to model the response variable.
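For students who want to experiment outside JMP, here is a minimal Python sketch of the mixed-direction procedure described above, using pandas and statsmodels. The response name "PctVoteYes" and the CSV file name are hypothetical placeholders for the course dataset; the handout's advice α_entry ≤ α_stay prevents a variable from being added and immediately dropped.

```python
# A sketch of the mixed-direction (stepwise) selection procedure.
import pandas as pd
import statsmodels.api as sm

def stepwise(df: pd.DataFrame, response: str, alpha_entry=0.15, alpha_stay=0.15):
    """Mixed-direction selection using t-test p-values, as in the handout."""
    candidates = [c for c in df.columns if c != response]
    selected = []
    while True:
        changed = False
        # Forward step: try adding the candidate with the smallest p-value.
        pvals = {}
        for x in set(candidates) - set(selected):
            X = sm.add_constant(df[selected + [x]])
            pvals[x] = sm.OLS(df[response], X).fit().pvalues[x]
        if pvals:
            best = min(pvals, key=pvals.get)
            if pvals[best] < alpha_entry:
                selected.append(best)
                changed = True
        # Backward step: drop the worst variable if it no longer earns its keep.
        if selected:
            X = sm.add_constant(df[selected])
            p = sm.OLS(df[response], X).fit().pvalues.drop("const")
            worst = p.idxmax()
            if p[worst] > alpha_stay:
                selected.remove(worst)
                changed = True
        if not changed:
            return selected

# Hypothetical usage, assuming the course data in a CSV:
# df = pd.read_csv("mn_amendment.csv")
# print(stepwise(df, "PctVoteYes"))
```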
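For Task #3, assuming the "Big Test" from Handout #10 is the usual full-versus-reduced F-test, here is a hedged sketch of that comparison, again with hypothetical variable names.

```python
# Full-vs-reduced F-test ("Big Test", assumed) sketch for Task #3.
# H0: the coefficients of all predictors dropped from the full model are zero.
import pandas as pd
import statsmodels.api as sm
from scipy import stats

def big_test(df, response, full_vars, reduced_vars):
    fit = lambda cols: sm.OLS(df[response], sm.add_constant(df[cols])).fit()
    full, reduced = fit(full_vars), fit(reduced_vars)
    num_df = len(full_vars) - len(reduced_vars)       # numerator df
    f = ((reduced.ssr - full.ssr) / num_df) / full.mse_resid
    p = stats.f.sf(f, num_df, full.df_resid)          # upper-tail p-value
    return f, p
    # statsmodels' full.compare_f_test(reduced) gives the same F and p.

# Hypothetical usage: all_vars is every predictor, kept_vars is the
# Minimum-BIC subset from Task #1. A large p-value supports dropping
# the excluded predictors.
# f, p = big_test(df, "PctVoteYes2", all_vars, kept_vars)
```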