Computer lab 6: Multiple linear regression – model selection and validation.

In practice, we often have many possible explanatory variables in a multiple linear regression model
and want to find a “good” model. By “good” we mean a model that has few variables and still performs
acceptably compared with the other candidate models.
Learning objectives
After reading the recommended text and completing the computer lab, the student shall be able to:
Understand and make use of different criteria for model selection
Use different automatic search procedures for model selection
Validate selected models
Recommended reading
Chapter 9 in Kutner et al.
Assignment 1: Selection of regression models
Study again the SENIC data in Appendix C.1. Carry out exercise 9.25 a, b. Note that only the cases
57-113 should be used in this analysis! Which model should be selected by using the
criterion?
Using the Cp criterion? Which model should be selected with backward elimination?
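The comparison of candidate models by a selection criterion can be sketched as follows. This is a minimal illustration, not the lab solution: it uses synthetic data rather than the SENIC data, the names `sse`, `all_subsets`, `x1` to `x4` are hypothetical, and adjusted R² is assumed here only as one common criterion alongside Cp. The formulas follow the usual definitions, adjusted R² = 1 - ((n - 1)/(n - p)) SSE_p/SSTO and Cp = SSE_p/MSE(full) - (n - 2p), where p counts the parameters including the intercept.

```python
from itertools import combinations

import numpy as np


def sse(X, y):
    """Error sum of squares for an OLS fit (X already holds the intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def all_subsets(X, y, names):
    """Return (adjusted R^2, Cp, variable names) for every non-empty subset."""
    n, k = X.shape
    ones = np.ones((n, 1))
    ssto = float(np.sum((y - y.mean()) ** 2))
    # MSE of the full model with all k predictors (k + 1 parameters)
    mse_full = sse(np.hstack([ones, X]), y) / (n - (k + 1))
    results = []
    for size in range(1, k + 1):
        for cols in combinations(range(k), size):
            p = size + 1  # number of parameters, intercept included
            sse_p = sse(np.hstack([ones, X[:, list(cols)]]), y)
            adj_r2 = 1 - (n - 1) / (n - p) * sse_p / ssto
            cp = sse_p / mse_full - (n - 2 * p)
            results.append((adj_r2, cp, tuple(names[c] for c in cols)))
    return results


# Synthetic stand-in data (NOT the SENIC data): y depends on x1 and x2 only.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=60)
names = ["x1", "x2", "x3", "x4"]

results = all_subsets(X, y, names)
best_adj_r2 = max(results, key=lambda r: r[0])
best_cp = min(results, key=lambda r: r[1])
print("Highest adjusted R^2:", best_adj_r2)
print("Lowest Cp:", best_cp)
```

Picking the single lowest Cp is a simplification. In practice one also checks that Cp is close to p, and the full model always has Cp = p by construction, so it cannot be judged by that rule alone.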
Assignment 2: Validation
Now we want to validate the models above.
First we carry out internal validations of the three models. Run each model and study the criterion
PRESS. Which model is the “best”? Is the “best” model a good one?
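For reference, PRESS can be computed without refitting the model n times, because the deleted residual for case i equals e_i / (1 - h_ii), where h_ii is the i-th diagonal element of the hat matrix. The sketch below assumes synthetic data and hypothetical function names (`press`, `sse`); statistical packages normally report PRESS directly.

```python
import numpy as np


def press(X, y):
    """PRESS = sum of squared deleted residuals e_i / (1 - h_ii)."""
    X1 = np.column_stack([np.ones(len(y)), X])  # add intercept column
    H = X1 @ np.linalg.pinv(X1)                 # hat matrix
    resid = y - H @ y                           # ordinary residuals e_i
    leverage = np.diag(H)                       # h_ii
    return float(np.sum((resid / (1 - leverage)) ** 2))


def sse(X, y):
    """Ordinary error sum of squares, for comparison with PRESS."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return float(resid @ resid)


# Synthetic stand-in data (NOT the SENIC data).
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
y = 0.5 + X @ np.array([1.0, -2.0, 0.0]) + rng.normal(size=40)

press_value = press(X, y)
sse_value = sse(X, y)
print("PRESS:", press_value)
print("SSE:  ", sse_value)
```

A PRESS value that is only moderately larger than SSE suggests that MSE is a reasonable indicator of the model's predictive ability, while a much larger PRESS points towards overfitting.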
Carry out an external validation of the “best” model by utilizing cases 1-56:
First, fit a regression model to cases 1-56 (the validation set) with the same explanatory variables as
in the “best” model found in the model-building set.
Compare the model in the validation set with the model in the model-building set by investigating:
The estimated regression coefficients and their standard errors
MSE
R²
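The quantities listed above can be collected for both data sets with one helper, sketched here under stated assumptions: the data are synthetic, the split into two halves only mimics the model-building/validation split, and `ols_summary` is a hypothetical name. Standard errors are taken as the square roots of the diagonal of MSE · (X'X)⁻¹.

```python
import numpy as np


def ols_summary(X, y):
    """Coefficients, standard errors, MSE and R^2 of an OLS fit with intercept."""
    n = len(y)
    X1 = np.column_stack([np.ones(n), X])
    p = X1.shape[1]
    xtx_inv = np.linalg.inv(X1.T @ X1)
    beta = xtx_inv @ X1.T @ y
    resid = y - X1 @ beta
    sse = float(resid @ resid)
    ssto = float(np.sum((y - y.mean()) ** 2))
    mse = sse / (n - p)
    se = np.sqrt(mse * np.diag(xtx_inv))
    return {"beta": beta, "se": se, "mse": mse, "r2": 1 - sse / ssto}


# Synthetic stand-in data (NOT the SENIC data), split into two halves.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=100)

validation = ols_summary(X[:50], y[:50])   # plays the role of the validation set
building = ols_summary(X[50:], y[50:])     # plays the role of the model-building set

for label, fit in [("model-building", building), ("validation", validation)]:
    print(label)
    print("  coefficients:", np.round(fit["beta"], 3))
    print("  std. errors: ", np.round(fit["se"], 3))
    print("  MSE:", round(fit["mse"], 3), " R^2:", round(fit["r2"], 3))
```

If the two fits give similar coefficients (relative to their standard errors), similar MSE and similar R², that supports the stability of the selected model.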
Next, calculate the mean squared prediction error, MSPR. This is done by predicting each
observation in the validation set by utilizing the regression function estimated in the “best” model
from the model-building set. These predicted values $\hat{Y}_i$, together with the observed $Y_i$ in
the validation set, will give

$$\mathrm{MSPR} = \frac{\sum_{i=1}^{n^*} (Y_i - \hat{Y}_i)^2}{n^*}$$

where $n^*$ is the number of cases in the validation set (see book, page 370). Compare MSPR with MSE
in the model-building set. Conclusion?
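The MSPR calculation can be sketched as below. This is an illustration on synthetic data, not the SENIC data, and the names `fit_ols` and `mspr` are hypothetical. The key point is that the coefficients come from the model-building set only, and the validation cases are used purely for prediction.

```python
import numpy as np


def fit_ols(X, y):
    """Fit OLS with intercept; return the coefficients and the MSE of the fit."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    mse = float(resid @ resid) / (len(y) - X1.shape[1])
    return beta, mse


def mspr(beta, X_new, y_new):
    """Mean squared prediction error on cases not used in the fit."""
    X1 = np.column_stack([np.ones(len(y_new)), X_new])
    pred = X1 @ beta
    return float(np.mean((y_new - pred) ** 2))


# Synthetic stand-in data (NOT the SENIC data), split into two halves.
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 2))
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=100)

X_valid, y_valid = X[:50], y[:50]   # plays the role of the validation set
X_build, y_build = X[50:], y[50:]   # plays the role of the model-building set

beta_build, mse_build = fit_ols(X_build, y_build)
mspr_value = mspr(beta_build, X_valid, y_valid)
print("MSE (model-building set):", mse_build)
print("MSPR (validation set):   ", mspr_value)
```

An MSPR close to the model-building MSE indicates that MSE is not seriously biased as a measure of predictive ability. An MSPR much larger than MSE suggests relying on MSPR instead.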
To hand in
Answers to assignments 1 and 2.
The lab report should be handed in no later than 5 days after the scheduled computer lab. Use Lisam
(lisam.liu.se) for handing in the assignments.