Intro SPSS - R

Software Section for Chapter 16
Variable Selection Procedures with MINITAB
Using MINITAB’s Stepwise Procedure
Using MINITAB’s Forward Selection Procedure
Using MINITAB’s Backward Elimination Procedure
Using MINITAB’s Best-Subsets Procedure
Variable Selection Procedures with SPSS
Variable Selection Procedures with R
Stepwise regression
Forward selection
Backward selection
Best subsets
For use with Anderson, Sweeney, Williams, Camm, Cochran, Freeman, Shoesmith.
Statistics for Business and Economics 5e, © 2020, Cengage EMEA
Variable Selection Procedures with MINITAB
Cravens
In Chapter 16, we discussed the use of variable selection procedures in solving multiple
regression problems. In Figure 16.16 we showed the MINITAB stepwise regression output
for the Cravens data, and in Figure 16.17 we showed the MINITAB best-subsets output. In
this section we describe the steps required to generate the output in both of these figures, as
well as the steps required to use the forward selection and backward elimination procedures.
Using MINITAB’s Stepwise Procedure
Step 1  [Main menu bar]
        Stat > Regression > Regression > Fit regression model
Step 2  [Regression panel]
        Enter Sales in the Response box.
        Enter Time, Poten, AdvExp, Share, Change, Accounts and Work in the Continuous
        predictors box, and Rating in the Categorical predictors box.
        Click on Stepwise.
Step 3  [Stepwise Methods panel]
        When the Stepwise dialog box appears:
        Select the Methods button and click on Stepwise.
Step 4  [Stepwise Methods panel]
        Enter 0.05 in the Alpha to enter box.
        Enter 0.05 in the Alpha to remove box.
        Click Display the table of model selection details.
        Opt for Include details for each step.
        Click OK, then click OK again.
The output appears as follows:
Regression Analysis: Sales versus Time, Poten, AdvExp, Share, Change,
Accounts, Work, Rating
Method
Categorical predictor coding  (1, 0)
Stepwise Selection of Terms
Candidate terms: Time, Poten, AdvExp, Share, Change, Accounts, Work, Rating
              ----Step 1----    ----Step 2----    ----Step 3----    ----Step 4----
                 Coef      P       Coef      P       Coef      P       Coef      P
Constant          709                50              -327             -1442
Accounts        21.72  0.000      19.05  0.000      15.55  0.000       9.21  0.004
AdvExp                           0.2265  0.000     0.2161  0.000     0.1750  0.000
Poten                                             0.02192  0.019    0.03822  0.000
Share                                                                 190.1  0.001

S                    881.093           650.392           582.636           453.836
R-sq                  56.85%            77.51%            82.77%            90.04%
R-sq(adj)             54.97%            75.47%            80.31%            88.05%
R-sq(pred)            43.32%            70.04%            76.41%            85.97%
Mallows’ Cp            17.37              1.00             -1.68             -6.15
α to enter = 0.05, α to remove = 0.05
Analysis of Variance
Source        DF     Adj SS    Adj MS  F-Value  P-Value
Regression     4   37260200   9315050    45.23    0.000
  Poten        1    4727687   4727687    22.95    0.000
  AdvExp       1    4630364   4630364    22.48    0.000
  Share        1    3009401   3009401    14.61    0.001
  Accounts     1    2129972   2129972    10.34    0.004
Error         20    4119349    205967
Total         24   41379549
Model Summary
      S    R-sq  R-sq(adj)  R-sq(pred)
453.836  90.04%     88.05%      85.97%
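The Analysis of Variance and Model Summary figures above are linked by standard identities: F = MS Regression / MS Error, S = sqrt(MS Error), R-sq = SS Regression / SS Total, and R-sq(adj) = 1 - MS Error / (SS Total / Total DF). As a rough cross-check (a Python sketch that is not part of the MINITAB output; the variable names are illustrative), the printed values can be recomputed from the sums of squares:

```python
import math

# Sums of squares and degrees of freedom copied from the Analysis of Variance table.
ss_regression, df_regression = 37260200, 4
ss_error, df_error = 4119349, 20
ss_total, df_total = 41379549, 24

ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error

f_value = ms_regression / ms_error                # overall F statistic
s = math.sqrt(ms_error)                           # standard error of the estimate
r_sq = ss_regression / ss_total                   # coefficient of determination
r_sq_adj = 1 - ms_error / (ss_total / df_total)   # adjusted for degrees of freedom

print(f"F-Value   = {f_value:.2f}")
print(f"S         = {s:.3f}")
print(f"R-sq      = {100 * r_sq:.2f}%")
print(f"R-sq(adj) = {100 * r_sq_adj:.2f}%")
```

The four printed values should agree with the F-Value, S, R-sq and R-sq(adj) entries reported by MINITAB.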
Coefficients
Term          Coef  SE Coef  T-Value  P-Value   VIF
Constant     -1442      424    -3.40    0.003
Poten      0.03822  0.00798     4.79    0.000  1.83
AdvExp      0.1750   0.0369     4.74    0.000  1.15
Share        190.1     49.7     3.82    0.001  1.74
Accounts      9.21     2.87     3.22    0.004  1.99
Regression Equation
Sales = -1442 + 0.03822 Poten + 0.1750 AdvExp + 190.1 Share + 9.21 Accounts
Fits and Diagnostics for Unusual Observations
Obs  Sales   Fit  Resid  Std Resid
 10   4876  3942    934       2.14  R

R  Large residual
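To make the regression equation and the unusual-observation line concrete, the following Python sketch evaluates the fitted equation and recomputes the residual for observation 10 as observed Sales minus Fit. The function name and the example inputs are hypothetical; the inputs are not taken from the Cravens data.

```python
def predicted_sales(poten, adv_exp, share, accounts):
    """Evaluate the four-variable regression equation reported by MINITAB."""
    return -1442 + 0.03822 * poten + 0.1750 * adv_exp + 190.1 * share + 9.21 * accounts

# Observation 10, as listed under "Fits and Diagnostics for Unusual Observations".
sales_obs10 = 4876
fit_obs10 = 3942
residual_obs10 = sales_obs10 - fit_obs10  # residual = observed value minus fitted value
print("Residual for observation 10:", residual_obs10)

# Hypothetical territory, for illustration only (NOT a row of the Cravens data).
example = predicted_sales(poten=50000, adv_exp=5000, share=8.0, accounts=100)
print("Predicted Sales for the hypothetical territory:", round(example, 1))
```

The recomputed residual should equal the Resid value of 934 shown in the output.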
Using Minitab’s Forward Selection Procedure
To use Minitab’s forward selection procedure, we simply modify steps 3 and 4 in Minitab’s
stepwise regression procedure as shown here:
Step 3  [Stepwise-Methods panel]
        When the Stepwise-Methods dialog box appears:
        Select Forward selection.
Step 4  [Stepwise-Methods panel]
        Enter 0.05 in the Alpha to enter box.
        Click Display the table of model selection details.
        Opt for Include details for each step.
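The logic behind forward selection can be sketched in a few lines of code. The Python sketch below is not Minitab's implementation and makes two loudly stated assumptions: it uses a fixed F-to-enter threshold of 4.0 in place of the Alpha to enter p-value rule (the Python standard library has no F distribution), and it runs on synthetic data rather than the Cravens data. The helper names solve, sse and forward_selection are hypothetical.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def sse(columns, y):
    """Error sum of squares from regressing y on an intercept plus the given columns."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))

def forward_selection(predictors, y, f_to_enter=4.0):
    """Repeatedly add the candidate with the largest partial F while F >= f_to_enter."""
    chosen, remaining = [], list(predictors)
    current_sse = sse([], y)
    while remaining:
        df_error = len(y) - len(chosen) - 2  # n minus (predictors after entry + intercept)
        best_name, best_f, best_sse = None, 0.0, None
        for name in remaining:
            trial_sse = sse([predictors[v] for v in chosen + [name]], y)
            partial_f = (current_sse - trial_sse) / (trial_sse / df_error)
            if best_name is None or partial_f > best_f:
                best_name, best_f, best_sse = name, partial_f, trial_sse
        if best_f < f_to_enter:
            break  # no remaining candidate is significant enough to enter
        chosen.append(best_name)
        remaining.remove(best_name)
        current_sse = best_sse
    return chosen

# Synthetic data (assumption, NOT the Cravens data): y depends on x1 and x2; x3 is noise.
random.seed(1)
n = 40
data = {name: [random.gauss(0, 1) for _ in range(n)] for name in ("x1", "x2", "x3")}
y = [3 + 2 * a - 1.5 * b + random.gauss(0, 0.5) for a, b in zip(data["x1"], data["x2"])]

selected = forward_selection(data, y)
print(selected)
```

Unlike the stepwise procedure, this loop never removes a variable once it has entered, which is the defining difference between forward selection and stepwise regression.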
Using MINITAB’s Backward Elimination Procedure
To use MINITAB’s backward elimination procedure, we simply modify steps 3 and 4 in
MINITAB’s stepwise regression procedure as shown here:
Step 3  [Stepwise-Methods panel]
        When the Stepwise-Methods dialog box appears:
        Select Backward elimination.
Step 4  [Stepwise-Methods panel]
        Enter 0.05 in the Alpha to remove box.
        Click Display the table of model selection details.
        Opt for Include details for each step.
Using Minitab’s Best-Subsets Procedure
The following steps can be used to produce the MINITAB best-subsets regression output for
the Cravens data.
Step 1  [Main menu bar]
        Stat > Regression > Regression > Best Subsets
Step 2  [Best Subsets Regression panel]
        When the Best Subsets Regression dialog box appears:
        Enter Sales in the Response box.
        Enter Time, Poten, AdvExp, Share, Change, Accounts, Work, and Rating in the
        Free predictors box.
        Click OK.
The output appears as follows:
Best Subsets Regression: Sales versus Time, Poten, ...
Response is Sales
                R-Sq    R-Sq   Mallows
Vars   R-Sq    (adj)  (pred)        Cp         S
   1   56.8     55.0    43.3      67.6    881.09
   1   38.8     36.1    25.0     104.6    1049.3
   2   77.5     75.5    70.0      27.2    650.39
   2   74.6     72.3    65.7      33.1    691.11
   3   84.9     82.7    79.2      14.0    545.52
   3   82.8     80.3    76.4      18.4    582.64
   4   90.0     88.1    86.0       5.4    453.84
   4   89.6     87.5    84.6       6.4    463.93
   5   91.5     89.3    86.9       4.4    430.21
   5   91.2     88.9    87.1       5.0    436.75
   6   92.0     89.4    85.4       5.4    427.99
   6   91.6     88.9    86.8       6.1    438.20
   7   92.2     89.0    84.1       7.0    435.66
   7   92.0     88.8    83.5       7.3    440.29
   8   92.2     88.3    81.8       9.0    449.02

[Predictor-inclusion columns: an X under Time, Poten, AdvExp, Share, Change, Accounts,
Work or Rating marks each variable included in a model; the positions of the X marks are
not recoverable here.]
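The Mallows Cp column can be related to the S column through the usual definition Cp = SSE_p / MSE_full - (n - 2(p + 1)), where SSE_p = (n - p - 1) S_p² for a model with p predictors and MSE_full is S² for the eight-variable model. The Python sketch below is a hedged cross-check, not MINITAB's code; it assumes n = 25, inferred from the Total DF of 24 in the stepwise Analysis of Variance table, and the function name mallows_cp is illustrative.

```python
# (number of predictors, S, reported Mallows Cp) for each row of the best-subsets output.
rows = [
    (1, 881.09, 67.6), (1, 1049.3, 104.6),
    (2, 650.39, 27.2), (2, 691.11, 33.1),
    (3, 545.52, 14.0), (3, 582.64, 18.4),
    (4, 453.84, 5.4), (4, 463.93, 6.4),
    (5, 430.21, 4.4), (5, 436.75, 5.0),
    (6, 427.99, 5.4), (6, 438.20, 6.1),
    (7, 435.66, 7.0), (7, 440.29, 7.3),
    (8, 449.02, 9.0),
]

n = 25                      # assumption: Total DF of 24 plus 1
mse_full = rows[-1][1] ** 2  # MSE of the full eight-variable model

def mallows_cp(num_vars, s):
    """Mallows Cp for a model with num_vars predictors and standard error s."""
    df_error = n - num_vars - 1
    sse = df_error * s ** 2
    return sse / mse_full - (n - 2 * (num_vars + 1))

for num_vars, s, cp_reported in rows:
    print(f"{num_vars} vars: computed Cp = {mallows_cp(num_vars, s):6.2f}, reported = {cp_reported}")
```

Because S is printed to only a few digits, the recomputed values agree with the reported Cp column only up to rounding.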
Variable Selection Procedures with SPSS
SPSS offers parallel facilities for stepwise regression, forward selection and backward
elimination, but it has no procedure that corresponds to MINITAB’s best-subsets
capability.
To run a stepwise regression in SPSS, follow these steps:
Cravens
Step 1  [Main menu bar]
        Analyze > Regression > Linear
Step 2  [Linear panel]
        Enter Sales in the Dependent box.
        Enter Time, Poten, AdvExp, Share, Change, Accounts, Work, and Rating in the
        Independent(s) box.
        Select the Methods button.
Step 3  For the Method box select Stepwise.
Step 4  Click OK.
For the forward selection and backward elimination alternatives, select Forward or
Backward, respectively, in the Method box at Step 3.
Variable Selection Procedures with R
In this section, we describe how R can be used to perform variable selection for multiple
regression models, including stepwise regression, forward selection, backward selection,
and a best-subsets procedure, using the Cravens data.
Stepwise Regression
Cravens
In this section we show how to do variable selection using stepwise regression.
Step 1  Install and load the olsrr (ordinary least squares regression) package in RStudio.
        Set the working directory in R to the folder containing ‘Cravens.CSV’, using the
        menu choices
        Session > Set Working Directory > Choose Directory
Step 2  Read the contents of the CSV file into an R data frame.
Step 3  Construct a linear regression model using all available independent variables by
        entering the following:
Step 4  To perform a stepwise regression with α to enter (pent) = .05 and α to leave
        (prem) = .05, enter the following:
A portion of the resulting output is shown in Figure R 16.1. In the Parameter Estimates
section, we see that the final model is:
Figure R 16.1 R Stepwise Regression Output for the Cravens Data
This model has an adjusted R² = 0.881. This output matches (after rounding) the output
provided in Figure 16.16.
Forward Selection
Cravens
In this section we show how to do variable selection using forward selection.
Note that Steps 1–3 can be skipped if you have already loaded the olsrr package and
created the cravens_df data frame and the linear regression model cravens_mod.
Step 1  Install and load the olsrr (ordinary least squares regression) package in RStudio.
        Set the working directory in R to the folder containing ‘Cravens.CSV’, using the
        menu choices
        Session > Set Working Directory > Choose Directory
Step 2  Read the contents of the CSV file into an R data frame.
Step 3  Construct a linear regression model using all available independent variables by
        entering the following:
Step 4  To perform a forward selection with α to enter (pent) = .05, enter the following:
A portion of the resulting output is shown in Figure R 16.2. In the Parameter Estimates
section, we see that the final model is:
Figure R 16.2 R Forward Selection Output for the Cravens Data
This model has an adjusted R² = 0.881. This output matches (after rounding) the output
provided in Section 16.4.
Backward Selection
Cravens
In this section we show how to perform variable selection using backward selection. Note
that Steps 1–3 can be skipped if you have already loaded the olsrr package and created the
cravens_df data frame and the linear regression model cravens_mod.
Step 1  Install and load the olsrr (ordinary least squares regression) package in RStudio.
        Set the working directory in R to the folder containing ‘Cravens.CSV’, using the
        menu choices
        Session > Set Working Directory > Choose Directory
Step 2  Read the contents of the CSV file into an R data frame.
Step 3  Construct a linear regression model using all available independent variables by
        entering the following:
Step 4  To perform a backward selection with α to leave (prem) = .05, enter the following:
A portion of the resulting output is shown in Figure R 16.3. In the Parameter Estimates
section, we see that the final model is
Figure R 16.3 R Backward Selection Output for the Cravens Data
This model has an adjusted R² = 0.875. This output matches (after rounding) the output
provided in Section 16.4.
Best Subsets
Cravens
In this section we show how to perform variable selection using best subsets. Note that Steps
1–3 can be skipped if you have already loaded the olsrr package and created the cravens_df
data frame and the linear regression model cravens_mod.
Step 1  Install and load the olsrr (ordinary least squares regression) package in RStudio.
        Set the working directory in R to the folder containing ‘Cravens.CSV’, using the
        menu choices
        Session > Set Working Directory > Choose Directory
Step 2  Read the contents of the CSV file into an R data frame.
Step 3  Construct a linear regression model using all available independent variables by
        entering the following:
Step 4  To perform a best subsets variable selection, enter the following:
A portion of the resulting output is shown in Figure R 16.4. The Model Index and Predictors
section gives the chosen model for each number of independent variables (in this case, 1
through 8). For example, the chosen four-variable model has Poten, AdvExp, Share, and
Accounts as its independent variables. The Subset Regression Summary gives the R²,
adjusted R², predicted R², and Mallows’ Cp for each model. This output matches (after
rounding) the output provided in Figure 16.17.
Figure R 16.4 R Best Subsets Output for the Cravens Data
Note that the increase in adjusted R² as more variables are added tapers off after four
variables are included, so we might choose the model with independent variables Poten,
AdvExp, Share, and Accounts. We can estimate the four-variable model and test for
significance by using the lm function as described in the Software Section for Chapter 15.
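The tapering can be made explicit by differencing the best adjusted R² at each model size. The Python sketch below uses the R-Sq (adj) values from the MINITAB best-subsets output earlier in this section, which the R output is stated to match after rounding; it is an illustration, not part of either package's output, and the variable names are hypothetical.

```python
# Best adjusted R-sq (%) for each number of variables, read from the best-subsets output.
best_adj_r_sq = {1: 55.0, 2: 75.5, 3: 82.7, 4: 88.1, 5: 89.3, 6: 89.4, 7: 89.0, 8: 88.3}

# Gain in adjusted R-sq from adding one more variable to the best smaller model.
gains = {k: round(best_adj_r_sq[k] - best_adj_r_sq[k - 1], 1) for k in range(2, 9)}

for k, gain in gains.items():
    print(f"{k - 1} -> {k} variables: {gain:+.1f} percentage points")
```

Each of the first three additions raises adjusted R² by more than five percentage points, while every addition beyond the fourth variable changes it by little more than one point or lowers it.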