Handout #14: Model Selection Procedures

Example 14.1: Consider the MN Marriage Amendment data from our course website. This dataset contains several potential variables. The primary goal of this investigation is to identify which (predictor) variables impacted the percentage of people that voted for Amendment #1. A snippet of the dataset is provided here.

Linear Regression Setup

• Response Variable: Percentage of People that Voted for Amendment #1
• Predictor Variables:
  o Percent of Population who voted for Obama
  o Unemployment Rate
  o Percent of Population Living in Poverty
  o Median Household Income
  o Percent of Population: Age 0-17
  o Percent of Population: Age 18-24
  o Percent of Population: Age 25-44
  o Percent of Population: Age 45-65
  o Percent of Population: Age Over 65
  o Percent of Population: White
  o Percent of Population: African American
  o Percent of Population: American Indian
  o Percent of Population: Asian
  o Percent of Population: Other
  o Percent of Population: Of Hispanic Origin
• Implied structure for the mean and variance functions:
  o $E(\text{Percent Vote Yes for Amendment \#1} \mid \ldots) = \beta_0 + \cdots$
  o $Var(\text{Percent Vote Yes for Amendment \#1} \mid \ldots) = \sigma^2$

Initial Regression Model – All Predictors

Question 1. Consider the variation in Percent Vote Yes in the following plot (Average = 59.9, Std Dev = 8.77, Variance = 76.87, Count = 87). What proportion of the variation in Percent Vote Yes can be explained by all of these predictors? Discuss.

Comment Regarding R²:

• R² measures the proportion of variation in the response that can be explained by the predictor variables in the model:

  $R^2 = 1 - \dfrac{\text{Sum of Squared Error}}{\text{Sum of Squares Total}}$

  [Visual representation of R²]

• Concern: Each time we add a predictor variable to the model, the sum of squared error can only decrease, while the sum of squares total remains constant. Thus, R² will always increase as predictor variables are added to the model. This is true regardless of the worthiness of the added predictor variable(s).

• Fix: The Adjusted R², given by

  $\text{Adjusted } R^2 = R^2 - \dfrac{(1 - R^2) \cdot \#\text{predictors}}{n - (\#\text{predictors} + 1)}$

  A second formula, which is easier to use when computing Adjusted R², is

  $\text{Adjusted } R^2 = 1 - \dfrac{\text{Sum of Squared Error}/df_{error}}{\text{Sum of Squares Total}/df_{total}} = 1 - \dfrac{\text{Variance of Conditional Distribution}}{\text{Variance of Unconditional Distribution}}$

• Adjusted R² can *not* be interpreted directly as the "proportion of variation in the response being explained by the predictors." However, this quantity can be used to better quantify our desire to reduce the unexplained variation in the response using a minimum number of predictor variables. This goal is commonly referred to as having a parsimonious model.

Verify the calculation of the Adjusted R² quantity provided by JMP (a Python sketch of this check appears after the questions below).

Wiki Entry for Adjusted R²: http://en.wikipedia.org/wiki/Coefficient_of_determination

Consider once again the entire list of predictor variables used in our initial model above. Some of these predictors appear to be important in helping to explain the variation in Percent Vote Yes, and others appear not so important.

[JMP output: the entire list of parameter estimates, and a second list sorted by importance]

Questions:

2. Which predictor variables appear to be important in explaining Percent Vote Yes?
3. Why can't the absolute size of an estimated effect be used to rank the importance of a predictor variable?
4. What is a reasonable way to rank the importance of these predictors? Discuss.
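The Adjusted R² check suggested above can also be done outside JMP with a few lines of Python. This is a minimal sketch using only the summary numbers quoted in this handout; it assumes that the conditional variance of 8.279 (quoted later in the Mallows' Cp calculation) comes from the full model with all 15 predictors.

```python
# Verify JMP's Adjusted R-squared from the handout's summary numbers.
n = 87              # number of counties
p = 15              # number of predictors in the full model (assumed)
var_uncond = 76.87  # variance of Percent Vote Yes = SST / (n - 1)
var_cond = 8.279    # full-model conditional variance = SSE / (n - p - 1)

sst = var_uncond * (n - 1)
sse = var_cond * (n - p - 1)

r2 = 1 - sse / sst                            # plain R-squared
adj_r2_a = r2 - (1 - r2) * p / (n - (p + 1))  # first formula
adj_r2_b = 1 - var_cond / var_uncond          # second formula

print(round(r2, 4), round(adj_r2_a, 4), round(adj_r2_b, 4))
# The two Adjusted R-squared formulas agree (about 0.892 here).
```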
Model Selection – Finding the Best Model

• Implied structure for the mean and variance functions:
  o $E(\text{Percent Vote Yes for Amendment \#1} \mid \;????\;) = \beta_0 + \cdots$
  o $Var(\text{Percent Vote Yes for Amendment \#1} \mid \;????\;) = \sigma^2$

The model-building feature in JMP can be invoked by selecting Stepwise from the Personality box in the Fit Model window.

• Y box (Response): Percentage of People that Voted for Amendment #1
• Model Effects box: the entire list of candidate predictor variables.
  o Any type of predictor variable can be considered in a model selection process, e.g. categorical predictors, transformed predictor variables, interaction terms, etc.

The following Fit Stepwise window is provided. Consider only the bottom portion of the output. The effect of adding predictor variables to the model can be easily seen by checking any number of predictor variables in the Entered column.

Model #1: Try Percent Obama as the first predictor.

$E(\text{Percent Vote Yes} \mid \text{Percent Obama}) = \beta_0 + \beta_1 \cdot \text{Percent Obama}$

Model #2: Add another predictor variable, skipping over Unemployment since this predictor does not appear important (p-value = 0.27).

$E(\text{Percent Vote Yes} \mid \text{Percent Obama}, \text{Percent Poverty}) = \beta_0 + \beta_1 \cdot \text{Percent Obama} + \beta_2 \cdot \text{Percent Poverty}$

Notice that after Percent Poverty was added to the model, the effect of Unemployment now appears to be important. Such anomalies are not uncommon. The R² and Adjusted R² values suggest Unemployment does indeed produce a reduction in the unexplained variation, and a quick check of the added-variable plots suggests that Unemployment is indeed adding something to our model.

Model Selection – Step by Step

JMP has automated the process of adding predictor variables to a model and has incorporated a variety of commonly used methodologies into its procedures. JMP is one of the best software packages that I have used for model selection.

The 1st predictor to be put into the model is Percent Obama.

Criteria to Guide the Selection Process

• Adjusted R² – larger is better.
• Mallows' Cp – a value closest to the number of parameters in the model (#predictors + 1) is best. Calculating this quantity for the example above (this arithmetic is verified in the sketch after this list):

  $\text{Mallows' } C_p = \dfrac{\text{Sum of Squared Error}}{\text{Variance of Conditional Distribution with all predictors}} - (n - 2 \cdot \#\text{parameters}) = \dfrac{3862.77}{8.279} - (87 - 2 \cdot 2) = 383.55$

• Akaike's Information Criterion (AIC) – smaller is better.
• Bayesian Information Criterion (BIC) – smaller is better.
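The Mallows' Cp arithmetic above is easy to check outside JMP. This is a minimal sketch using the quantities quoted in the calculation; the small difference from the handout's 383.55 reflects rounding of the inputs.

```python
# Verify the Mallows' Cp calculation above.
sse = 3862.77     # SSE for the one-predictor model (Percent Obama)
mse_full = 8.279  # variance of conditional distribution, all predictors
n = 87            # number of counties
n_params = 2      # intercept + Percent Obama

cp = sse / mse_full - (n - 2 * n_params)
print(round(cp, 2))  # about 383.6; the handout reports 383.55
```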
Automating the Entire Process

Clicking the Go button instead of the Step button will proceed by adding one variable at a time until all predictor variables have been considered. This approach is called Forward Selection. After finishing this process, JMP will identify the predictor variables deemed important via this process. From the red drop-down arrow, you can select Plot Criterion History.

Clicking the Run Model button gives the output from the fitted model with the identified predictor variables.

Specifying the Stopping Criteria

Specifying the Direction

Procedure for Mixed Direction (a Python sketch of this procedure appears at the end of this handout):

1. Choose α_entry ≤ α_stay (be somewhat generous setting these, say 0.05 up to 0.25).
2. Fit a simple linear regression model for each of the potential predictor variables. Then obtain the p-value to test whether or not the regression is useful. Let X1 denote the variable with the smallest p-value.
   • If p-value < α_entry, then X1 is retained.
   • If p-value > α_entry, then the procedure ends.
3. Fit all 2-variable models where X1 is one of the pair. Obtain the p-value to test whether the coefficient of the second variable is zero, and let X2 be the variable with the smallest p-value.
   • If p-value > α_entry, then the procedure ends.
   • If p-value < α_entry, then X2 is retained. We then check whether X1 is still needed in the model: if its p-value > α_stay, then X1 is removed and we return to the beginning of Step 3; if its p-value < α_stay, then X1 and X2 are both retained.
4. Examine which predictor is the next candidate for addition, and then examine whether any other variables already in the model should be dropped. This procedure continues until no variables can be either added or deleted.

Procedure for Forward: Very similar to the mixed (stepwise) procedure above; however, once a variable enters the model, it is never removed.

Procedure for Backward: The full model (with all predictors) is fit first, and the coefficient with the largest p-value is identified. If this p-value is greater than α_stay, the variable is dropped from the model. The new model (minus that one predictor) is then fit, and this process continues until no more predictors can be dropped.

Returning to Our Example – Using the P-Value Threshold Approach

Upon careful examination, we see that we obtain the same model as with the Minimum BIC criterion.

Final Thoughts

1. The model deemed "best" may differ depending upon the approach.
2. There is no clear rule for how to use these procedures to choose a single "best" model. However, the outcomes from these search procedures can be used to identify a number of possible regression models that warrant further consideration.
3. Use experience, judgment, and the background science when model building! If you believe a predictor variable is fundamental, then include it regardless of the results of the search procedure.

Example 14.2: For this example, consider the Percent Vote Yes for Amendment #2 – the MN Voter ID Amendment – as the response. Use the entire set of predictor variables at your disposal in this dataset.

Tasks:

1. Obtain a model which contains only the necessary predictor variables as determined by the Minimum BIC criterion.
2. Use the p-value threshold approach to obtain the necessary list of predictor variables. Do the two approaches agree?
3. Use the "Big Test" presented in Handout #10 to confirm your findings in Task #1. That is, conduct a formal statistical test showing that only a subset of the predictor variables is necessary to model the response variable.
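For students who want to experiment outside JMP, here is a minimal Python sketch of the mixed-direction procedure described above, using pandas and statsmodels. The response name "PctVoteYes" and the CSV file name are hypothetical placeholders for the course dataset; the handout's advice α_entry ≤ α_stay prevents a variable from being added and immediately dropped.

```python
# A sketch of the mixed-direction (stepwise) selection procedure.
import pandas as pd
import statsmodels.api as sm

def stepwise(df: pd.DataFrame, response: str, alpha_entry=0.15, alpha_stay=0.15):
    """Mixed-direction selection using t-test p-values, as in the handout."""
    candidates = [c for c in df.columns if c != response]
    selected = []
    while True:
        changed = False
        # Forward step: try adding the candidate with the smallest p-value.
        pvals = {}
        for x in set(candidates) - set(selected):
            X = sm.add_constant(df[selected + [x]])
            pvals[x] = sm.OLS(df[response], X).fit().pvalues[x]
        if pvals:
            best = min(pvals, key=pvals.get)
            if pvals[best] < alpha_entry:
                selected.append(best)
                changed = True
        # Backward step: drop the worst variable if it no longer earns its keep.
        if selected:
            X = sm.add_constant(df[selected])
            p = sm.OLS(df[response], X).fit().pvalues.drop("const")
            worst = p.idxmax()
            if p[worst] > alpha_stay:
                selected.remove(worst)
                changed = True
        if not changed:
            return selected

# Hypothetical usage, assuming the course data in a CSV:
# df = pd.read_csv("mn_amendment.csv")
# print(stepwise(df, "PctVoteYes"))
```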
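For Task #3, assuming the "Big Test" from Handout #10 is the usual full-versus-reduced F-test, here is a hedged sketch of that comparison, again with hypothetical variable names.

```python
# Full-vs-reduced F-test ("Big Test", assumed) sketch for Task #3.
# H0: the coefficients of all predictors dropped from the full model are zero.
import pandas as pd
import statsmodels.api as sm
from scipy import stats

def big_test(df, response, full_vars, reduced_vars):
    fit = lambda cols: sm.OLS(df[response], sm.add_constant(df[cols])).fit()
    full, reduced = fit(full_vars), fit(reduced_vars)
    num_df = len(full_vars) - len(reduced_vars)       # numerator df
    f = ((reduced.ssr - full.ssr) / num_df) / full.mse_resid
    p = stats.f.sf(f, num_df, full.df_resid)          # upper-tail p-value
    return f, p
    # statsmodels' full.compare_f_test(reduced) gives the same F and p.

# Hypothetical usage: all_vars is every predictor, kept_vars is the
# Minimum-BIC subset from Task #1. A large p-value supports dropping
# the excluded predictors.
# f, p = big_test(df, "PctVoteYes2", all_vars, kept_vars)
```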