Software Section for Chapter 16

Variable Selection Procedures with MINITAB
    Using MINITAB’s Stepwise Procedure
    Using MINITAB’s Forward Selection Procedure
    Using MINITAB’s Backward Elimination Procedure
    Using MINITAB’s Best-Subsets Procedure
Variable Selection Procedures with SPSS
Variable Selection Procedures with R
    Stepwise regression
    Forward selection
    Backward selection
    Best subsets

For use with Anderson, Sweeney, Williams, Camm, Cochran, Freeman, Shoesmith. Statistics for Business and Economics 5e, © 2020, Cengage EMEA

Variable Selection Procedures with MINITAB

[Data file: Cravens]

In Chapter 16, we discussed the use of variable selection procedures in solving multiple regression problems. In Figure 16.16 we showed the MINITAB stepwise regression output for the Cravens data, and in Figure 16.17 we showed the MINITAB best-subsets output. In this section we describe the steps required to generate the output in both of these figures, as well as the steps required to use the forward selection and backward elimination procedures.

Using MINITAB’s Stepwise Procedure

Step 1  Stat > Regression > Regression > Fit Regression Model          [Main menu bar]
Step 2  Enter Sales in the Response box.                                [Regression panel]
        Enter Time, Poten, AdvExp, Share, Change, Accounts and Work in the Continuous predictors box, and Rating in the Categorical predictors box.
        Click on Stepwise.
Step 3  When the Stepwise dialog box appears:                           [Stepwise Methods panel]
        Select the Methods button and click on Stepwise.
Step 4  Enter 0.05 in the Alpha to enter box.                           [Stepwise Methods panel]
        Enter 0.05 in the Alpha to remove box.
        Click Display the table of model selection details.
        Opt for Include details for each step.
        Click OK in each dialog box.

The output appears as follows:

Regression Analysis: Sales versus Time, Poten, AdvExp, Share, Change, Accounts, Work, Rating

Method

Categorical predictor coding    (1, 0)

Stepwise Selection of Terms

Candidate terms: Time, Poten, AdvExp, Share, Change, Accounts, Work, Rating

               ----Step 1----    ----Step 2----    ----Step 3----    ----Step 4----
                  Coef      P       Coef      P       Coef      P       Coef      P
Constant           709                50              -327             -1442
Accounts         21.72  0.000      19.05  0.000      15.55  0.000       9.21  0.004
AdvExp                            0.2265  0.000     0.2161  0.000     0.1750  0.000
Poten                                              0.02192  0.019    0.03822  0.000
Share                                                                  190.1  0.001

S              881.093           650.392           582.636           453.836
R-sq            56.85%            77.51%            82.77%            90.04%
R-sq(adj)       54.97%            75.47%            80.31%            88.05%
R-sq(pred)      43.32%            70.04%            76.41%            85.97%
Mallows’ Cp      17.37              1.00             -1.68              6.15

α to enter = 0.05, α to remove = 0.05

Analysis of Variance

Source        DF     Adj SS    Adj MS   F-Value   P-Value
Regression     4   37260200   9315050     45.23     0.000
  Poten        1    4727687   4727687     22.95     0.000
  AdvExp       1    4630364   4630364     22.48     0.000
  Share        1    3009401   3009401     14.61     0.001
  Accounts     1    2129972   2129972     10.34     0.004
Error         20    4119349    205967
Total         24   41379549

Model Summary

      S     R-sq   R-sq(adj)   R-sq(pred)
453.836   90.04%      88.05%       85.97%

Coefficients

Term          Coef   SE Coef   T-Value   P-Value    VIF
Constant     -1442       424     -3.40     0.003
Poten      0.03822   0.00798      4.79     0.000   1.83
AdvExp      0.1750    0.0369      4.74     0.000   1.15
Share        190.1      49.7      3.82     0.001   1.74
Accounts      9.21      2.87      3.22     0.004   1.99

Regression Equation

Sales = -1442 + 0.03822 Poten + 0.1750 AdvExp + 190.1 Share + 9.21 Accounts

Fits and Diagnostics for Unusual Observations

Obs   Sales    Fit   Resid   Std Resid
 10    4876   3942     934        2.14   R

R  Large residual

Using MINITAB’s Forward Selection Procedure

To use MINITAB’s forward selection procedure, we simply modify steps 3 and 4 in MINITAB’s stepwise regression procedure as shown here:

Step 3  When the Stepwise-Methods dialog box appears:
        Select Forward selection.
Step 4  Enter 0.05 in the Alpha to enter box.
        Click Display the table of model selection details.
        Opt for Include details for each step.                          [Stepwise Methods panel]

Using MINITAB’s Backward Elimination Procedure

To use MINITAB’s backward elimination procedure, we simply modify steps 3 and 4 in MINITAB’s stepwise regression procedure as shown here:

Step 3  When the Stepwise-Methods dialog box appears:                   [Stepwise Methods panel]
        Select Backward elimination.
Step 4  Enter 0.05 in the Alpha to remove box.                          [Stepwise Methods panel]
        Click Display the table of model selection details.
        Opt for Include details for each step.

Using MINITAB’s Best-Subsets Procedure

The following steps can be used to produce the MINITAB best-subsets regression output for the Cravens data.

Step 1  Stat > Regression > Regression > Best Subsets                   [Main menu bar]
Step 2  When the Best Subsets Regression dialog box appears:            [Best Subsets Regression panel]
        Enter Sales in the Response box.
        Enter Time, Poten, AdvExp, Share, Change, Accounts, Work and Rating in the Free predictors box.
        Click OK.

The output appears as follows:

Best Subsets Regression: Sales versus Time, Poten, ...

Response is Sales

Vars   R-Sq   R-Sq (adj)   R-Sq (pred)   Mallows Cp         S
   1   56.8         55.0          43.3         67.6    881.09
   1   38.8         36.1          25.0        104.6    1049.3
   2   77.5         75.5          70.0         27.2    650.39
   2   74.6         72.3          65.7         33.1    691.11
   3   84.9         82.7          79.2         14.0    545.52
   3   82.8         80.3          76.4         18.4    582.64
   4   90.0         88.1          86.0          5.4    453.84
   4   89.6         87.5          84.6          6.4    463.93
   5   91.5         89.3          86.9          4.4    430.21
   5   91.2         88.9          87.1          5.0    436.75
   6   92.0         89.4          85.4          5.4    427.99
   6   91.6         88.9          86.8          6.1    438.20
   7   92.2         89.0          84.1          7.0    435.66
   7   92.0         88.8          83.5          7.3    440.29
   8   92.2         88.3          81.8          9.0    449.02

[Predictor columns (Time, Poten, AdvExp, Share, Change, Accounts, Work, Rating): the X marks showing which variables enter each model are not reproduced here.]

Variable Selection Procedures with SPSS

[Data file: Cravens]

Parallel facilities exist in SPSS for conducting stepwise regression, forward selection and backward elimination, but SPSS has no procedure that corresponds to MINITAB’s best-subsets capability. To run a stepwise regression in SPSS, the following steps can be followed:

Step 1  Analyze > Regression > Linear                                   [Main menu bar]
Step 2  Enter Sales in the Dependent box.                               [Linear panel]
        Enter Time, Poten, AdvExp, Share, Change, Accounts, Work and Rating in the Independent(s) box.
        Select the Methods button.
Step 3  For the Method box select Stepwise.
Step 4  Click OK.

For forward selection or backward elimination, select Forward or Backward respectively at Step 3 instead of Stepwise.

Variable Selection Procedures with R

In this section, we describe how R can be used to perform variable selection for multiple regression models, including stepwise regression, forward selection, backward selection and a best-subsets procedure, using the Cravens data.

Stepwise Regression

[Data file: Cravens]

In this section we show how to do variable selection using stepwise regression.

Step 1  Install and load the olsrr (ordinary least squares regression) package in RStudio. Set the working directory in R to the folder containing ‘Cravens.CSV’, using the menu choices Session > Set Working Directory > Choose Directory.
Step 2  Read the contents of the CSV file into an R data frame (cravens_df).
Step 3  Construct a linear regression model (cravens_mod) using all available independent variables.
Step 4  Perform a stepwise regression with α to enter (pent) = .05 and α to leave (prem) = .05.

A portion of the resulting output is shown in Figure R 16.1.
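Steps 1 to 4 can be sketched in R as follows. This is a minimal sketch under stated assumptions, not the textbook's own listing: it assumes the columns of Cravens.CSV carry the variable names used above (Sales, Time, Poten, AdvExp, Share, Change, Accounts, Work, Rating), and that the installed olsrr version accepts the pent and prem arguments named in Step 4. Argument names may differ between olsrr releases, so check ?ols_step_both_p before relying on them. The guard around the commands and the names csv_file and cravens_step are additions for illustration only.

```r
# Sketch only: assumes olsrr is installed and Cravens.CSV is in the working
# directory with columns Sales, Time, Poten, AdvExp, Share, Change,
# Accounts, Work and Rating.
csv_file <- "Cravens.CSV"

if (requireNamespace("olsrr", quietly = TRUE) && file.exists(csv_file)) {
  # Step 1: load the package (install once with install.packages("olsrr")).
  library(olsrr)

  # Step 2: read the CSV file into a data frame.
  cravens_df <- read.csv(csv_file)

  # Step 3: full model with all eight candidate independent variables.
  cravens_mod <- lm(Sales ~ Time + Poten + AdvExp + Share + Change +
                      Accounts + Work + Rating, data = cravens_df)

  # Step 4: stepwise selection on p-values, 0.05 to enter and 0.05 to leave.
  # pent and prem are the argument names given in the text; verify them
  # against the help page of your olsrr version.
  cravens_step <- ols_step_both_p(cravens_mod, pent = 0.05, prem = 0.05)
  print(cravens_step)
} else {
  message("olsrr or ", csv_file, " not found; nothing was run.")
}
```

Keeping cravens_df and cravens_mod in the workspace means the forward, backward and best-subsets procedures can reuse them without repeating Steps 1 to 3.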
The Parameter Estimates section of the output gives the final model.

Figure R 16.1 R Stepwise Regression Output for the Cravens Data

This model has an adjusted R² = 0.881. This output matches (after rounding) the output provided in Figure 16.16.

Forward Selection

[Data file: Cravens]

In this section we show how to do variable selection using forward selection. Note that Steps 1–3 can be skipped if you have already loaded the olsrr package and created the cravens_df data frame and the linear regression model cravens_mod.

Step 1  Install and load the olsrr (ordinary least squares regression) package in RStudio. Set the working directory in R to the folder containing ‘Cravens.CSV’, using the menu choices Session > Set Working Directory > Choose Directory.
Step 2  Read the contents of the CSV file into an R data frame (cravens_df).
Step 3  Construct a linear regression model (cravens_mod) using all available independent variables.
Step 4  Perform a forward selection with α to enter (pent) = .05.

A portion of the resulting output is shown in Figure R 16.2. The Parameter Estimates section of the output gives the final model.

Figure R 16.2 R Forward Selection Output for the Cravens Data

This model has an adjusted R² = 0.881. This output matches (after rounding) the output provided in Section 16.4.

Backward Selection

[Data file: Cravens]

In this section we show how to perform variable selection using backward selection. Note that Steps 1–3 can be skipped if you have already loaded the olsrr package and created the cravens_df data frame and the linear regression model cravens_mod.

Step 1  Install and load the olsrr (ordinary least squares regression) package in RStudio. Set the working directory in R to the folder containing ‘Cravens.CSV’, using the menu choices Session > Set Working Directory > Choose Directory.
Step 2  Read the contents of the CSV file into an R data frame (cravens_df).
Step 3  Construct a linear regression model (cravens_mod) using all available independent variables.
Step 4  Perform a backward selection with α to leave (prem) = .05.

A portion of the resulting output is shown in Figure R 16.3. The Parameter Estimates section of the output gives the final model.

Figure R 16.3 R Backward Selection Output for the Cravens Data

This model has an adjusted R² = 0.875. This output matches (after rounding) the output provided in Section 16.4.

Best Subsets

[Data file: Cravens]

In this section we show how to perform variable selection using best subsets. Note that Steps 1–3 can be skipped if you have already loaded the olsrr package and created the cravens_df data frame and the linear regression model cravens_mod.

Step 1  Install and load the olsrr (ordinary least squares regression) package in RStudio. Set the working directory in R to the folder containing ‘Cravens.CSV’, using the menu choices Session > Set Working Directory > Choose Directory.
Step 2  Read the contents of the CSV file into an R data frame (cravens_df).
Step 3  Construct a linear regression model (cravens_mod) using all available independent variables.
Step 4  Perform a best-subsets variable selection on cravens_mod.

A portion of the resulting output is shown in Figure R 16.4. The Model Index and Predictors section gives the chosen model for a given number of independent variables (in this case, 1 through 8). For example, the chosen four-variable model has Poten, AdvExp, Share and Accounts as independent variables. The Subset Regression Summary gives the R², adjusted R², predicted R² and Mallows’ Cp for each model. This output matches (after rounding) the output provided in Figure 16.17.

Figure R 16.4 R Best Subsets Output for the Cravens Data

Note that the increase in adjusted R² as more variables are added tapers off after four variables are included. So, we might choose the model with independent variables Poten, AdvExp, Share and Accounts. We can estimate the four-variable model and test for significance by using the lm function as described in the Software Section for Chapter 15.
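Step 4 of the forward, backward and best-subsets procedures, together with the four-variable fit suggested above, can be sketched as follows. This is an illustrative sketch, not the textbook's listing. It assumes Cravens.CSV has the column names used in this section and that the installed olsrr version accepts the pent and prem arguments named in the text (names may differ between releases, so consult the help pages). The object names csv_file, have_olsrr and cravens_mod4 are hypothetical.

```r
# Sketch only: assumes Cravens.CSV is in the working directory with columns
# Sales, Time, Poten, AdvExp, Share, Change, Accounts, Work and Rating.
csv_file <- "Cravens.CSV"
have_olsrr <- requireNamespace("olsrr", quietly = TRUE)

if (file.exists(csv_file)) {
  cravens_df <- read.csv(csv_file)

  # Full model with all eight candidate independent variables.
  cravens_mod <- lm(Sales ~ Time + Poten + AdvExp + Share + Change +
                      Accounts + Work + Rating, data = cravens_df)

  if (have_olsrr) {
    library(olsrr)
    # Forward selection with 0.05 to enter (argument name as in the text).
    print(ols_step_forward_p(cravens_mod, pent = 0.05))
    # Backward selection with 0.05 to leave.
    print(ols_step_backward_p(cravens_mod, prem = 0.05))
    # Best model for each subset size, 1 through 8 variables.
    print(ols_step_best_subset(cravens_mod))
  }

  # Four-variable model chosen in the discussion above; summary() reports
  # the t tests, the overall F test and the adjusted R-squared.
  cravens_mod4 <- lm(Sales ~ Poten + AdvExp + Share + Accounts,
                     data = cravens_df)
  print(summary(cravens_mod4))
} else {
  message(csv_file, " not found; nothing was run.")
}
```

The final lm call uses base R only, so the four-variable model can be estimated even when olsrr is not installed.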