Computer lab 7: Multiple linear regression – more about diagnostics

We have used residual plots to check the adequacy of a regression model, and we have studied
VIF values to check for multicollinearity. Now we will refine the diagnostics for
detecting outliers in the dependent variable, outliers in the explanatory variables and influential
cases.
Learning objectives
After reading the recommended text and completing the computer lab, the student should be able to:
- See the usefulness of the hat matrix, e.g. for detecting outliers in Y and X and identifying
  influential cases
- Use the options in SAS and Minitab for outlier detection and influential case identification
Further, the student will gain an overview of remedial measures, nonlinear regression
and neural networks.
Recommended reading
Chapter 10 in Kutner et al.
Chapters 11 and 13 are helpful background reading, but they are not part of the course
examination or of this computer lab.
Assignment 1: Detecting outliers
Consider again the data in exercise 6.18 about the commercial real estate company. Fit a multiple
regression with x1, x2 and x4 as explanatory variables and study the different residuals.
a) Plot residuals, studentized residuals and studentized deleted residuals against the fits. To
identify outliers in the plot of studentized deleted residuals, use significance level α = 0.1. Can
you detect any Y outliers? In which plot are they most evident?
b) Identify any outlying X observations using leverage values. If you find any, go back to the
original data and try to explain why the case is outlying in X.
c) Determine whether a new observation with x1 = 12, x2 = 9.23, x4 = 354884 is a substantial
extrapolation beyond the range of the data by computing its leverage. (Hint: use the “X’X inverse”
option under “Storage” when you run the regression in Minitab, and then perform the necessary
matrix operations.)
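The quantities in Assignment 1 can also be computed by hand from the hat matrix. The following is a minimal sketch in Python (not the SAS or Minitab workflow the lab asks for), using the standard textbook formulas for leverages, studentized residuals and studentized deleted residuals. The data are synthetic and stand in for the real estate data, the function name `outlier_diagnostics` is hypothetical, and the cutoffs (Bonferroni critical value for Y outliers, 2p/n for leverage) are common rules of thumb assumed here.

```python
import numpy as np
from scipy import stats


def outlier_diagnostics(X, y):
    """Residual and leverage diagnostics for an OLS fit with intercept.

    X is an (n, k) array of predictors without intercept column; y has length n.
    """
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])           # design matrix incl. intercept
    p = Xd.shape[1]                                 # number of regression parameters
    XtX_inv = np.linalg.inv(Xd.T @ Xd)              # (X'X)^-1
    h = np.einsum("ij,jk,ik->i", Xd, XtX_inv, Xd)   # leverages h_ii (diagonal of H)
    e = y - Xd @ (XtX_inv @ (Xd.T @ y))             # ordinary residuals
    sse = e @ e
    mse = sse / (n - p)
    r = e / np.sqrt(mse * (1 - h))                  # studentized residuals
    # Studentized deleted residuals, computed without refitting n times
    t = e * np.sqrt((n - p - 1) / (sse * (1 - h) - e**2))
    return {"n": n, "p": p, "XtX_inv": XtX_inv, "h": h, "e": e, "r": r, "t": t}


# Synthetic data (an assumption for illustration, NOT the exercise 6.18 data)
rng = np.random.default_rng(0)
n = 30
X = rng.normal(size=(n, 3))
y = 10 + X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)
y[5] += 15.0                                        # planted Y outlier

diag = outlier_diagnostics(X, y)
p = diag["p"]

# a) Y outliers: Bonferroni test on the studentized deleted residuals
alpha = 0.10
t_crit = stats.t.ppf(1 - alpha / (2 * n), n - p - 1)
y_outliers = np.flatnonzero(np.abs(diag["t"]) > t_crit)

# b) X outliers: leverage larger than twice the mean leverage p/n
x_outliers = np.flatnonzero(diag["h"] > 2 * p / n)

# c) Leverage of a new point: h_new = x_new' (X'X)^-1 x_new
x_new = np.array([1.0, 6.0, 6.0, 6.0])              # leading 1 is the intercept term
h_new = x_new @ diag["XtX_inv"] @ x_new

print("Y outliers (0-based case index):", y_outliers)
print("X outliers (0-based case index):", x_outliers)
print("h_new =", h_new, " largest h_ii in data =", diag["h"].max())
```

If `h_new` lies well above the leverages observed in the data, predicting at the new point is an extrapolation. With the real data, `X` would hold the columns x1, x2 and x4, and `x_new` would be the values from part c) preceded by a 1.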
Assignment 2: Identifying influential cases
Consider the same data, but search for influential cases with DFFITS and Cook’s distances.
a) Find the observations which are influential on their own fitted value.
b) Investigate if there are any observations that are influential on all fitted values.
c) Compare the observations found in a) and b) with the observations found in Assignment 1.
What do you conclude?
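As a cross-check on the SAS or Minitab output, DFFITS and Cook's distance can be computed from the residuals and leverages alone. The sketch below is in Python and uses synthetic data with one planted influential case; the function name `influence_measures` is hypothetical. The guidelines applied (|DFFITS| > 1 for small to medium data sets, and the percentile of Cook's distance in the F(p, n - p) distribution) are assumed rules of thumb, not fixed tests.

```python
import numpy as np
from scipy import stats


def influence_measures(X, y):
    """DFFITS and Cook's distance for an OLS fit with intercept."""
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])           # design matrix incl. intercept
    p = Xd.shape[1]                                 # number of regression parameters
    H = Xd @ np.linalg.inv(Xd.T @ Xd) @ Xd.T        # hat matrix
    h = np.diag(H)                                  # leverages h_ii
    e = y - H @ y                                   # ordinary residuals
    sse = e @ e
    mse = sse / (n - p)
    # Studentized deleted residuals
    t = e * np.sqrt((n - p - 1) / (sse * (1 - h) - e**2))
    dffits = t * np.sqrt(h / (1 - h))               # influence on own fitted value
    cooks_d = e**2 / (p * mse) * h / (1 - h) ** 2   # influence on all fitted values
    return dffits, cooks_d, n, p


# Synthetic data (an assumption for illustration, NOT the exercise 6.18 data)
rng = np.random.default_rng(1)
n = 30
X = rng.normal(size=(n, 3))
X[0] = [4.0, 4.0, 4.0]                              # planted high-leverage case
y = 10 + X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)
y[0] += 10.0                                        # ... that also breaks the pattern

dffits, cooks_d, n, p = influence_measures(X, y)

# a) Influence on own fitted value: |DFFITS| > 1 (small to medium data sets)
infl_single = np.flatnonzero(np.abs(dffits) > 1)

# b) Influence on all fitted values: percentile of D_i in F(p, n - p).
#    Roughly: below 10-20 % little influence, near 50 % or more major influence.
cooks_pct = stats.f.cdf(cooks_d, p, n - p)
infl_all = np.flatnonzero(cooks_pct > 0.5)

print("Cases flagged by DFFITS (0-based):", infl_single)
print("Cases flagged by Cook's distance (0-based):", infl_all)
```

Both measures combine the size of the residual with the leverage, so a case can be flagged here without being an outlier in Y or in X alone, which is the comparison part c) asks for.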
To hand in
Answers to Assignments 1 and 2, no later than 5 days after the scheduled computer lab. Mail them to
tommy.schyman@liu.se.