Computer lab 7: Multiple linear regression – more about diagnostics

We have used residual plots to check the adequacy of a regression model. Further, we have studied VIF values to see whether there are problems with multicollinearity. Now we will refine the diagnostics for detecting outliers in the dependent variable, outliers in the explanatory variables, and influential cases.

Learning objectives

After reading the recommended text and completing the computer lab, the student shall be able to:
- See the usefulness of the hat matrix, e.g. for detecting outliers in Y and X and for identifying influential cases
- Use the options in SAS and Minitab for outlier detection and influential case identification

Further, the student will have an orientation about remedial measures, nonlinear regression and neural networks.

Recommended reading

Chapter 10 in Kutner et al. Chapters 11 and 13 can be helpful to read for orientation, but are not included in the examination of the course or in this computer lab.

Assignment 1: Detecting outliers

Consider again the data in exercise 6.18 about the commercial real estate company. Fit a multiple regression with x1, x2 and x4 as explanatory variables and study the different residuals.

a) Plot residuals, studentized residuals and studentized deleted residuals against the fitted values. To identify outliers in the plot of studentized deleted residuals, use significance level α = 0.1. Can you detect any Y outliers? In which plot are they most evident?
b) Identify any outlying X observations using leverage values. If you find any outliers, go back to the original data and try to interpret why the case is an outlier in the X data.
c) Determine whether a new observation with x1 = 12, x2 = 9.23, x4 = 354884 is a substantial extrapolation beyond the range of the data by computing its leverage. (Hint: use the “X’X inverse” option under “Storage” when you run the regression in Minitab, and then perform the necessary matrix operations.)

Assignment 2: Identifying influential cases

Consider the same data, but search for influential cases with DFFITS and Cook’s distances.

a) Find the observations that are influential on their own fitted value.
b) Investigate whether there are any observations that are influential on all fitted values.
c) Compare the observations found in a) and b) with the observations found in Assignment 1. Conclusions?

To hand in

Answers to assignments 1-2, no later than 5 days after the scheduled computer lab. Mail them to tommy.schyman@liu.se.
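
Note on the matrix operations in Assignment 1 c)

The matrix step mentioned in the hint to Assignment 1 c) can also be sketched outside Minitab. The following is a minimal illustration in plain Python, not a solution to the assignment. The data are synthetic and are not the exercise 6.18 data, the helper names (transpose, matmul, inverse, leverage) are hypothetical, and the 2p/n guideline for large leverage values is assumed to be the one used in the course. The sketch computes the leverage h = x'(X'X)^(-1)x for every case and for one new observation, and then compares the new leverage with the range of the observed ones.

```python
# Sketch: leverage values from (X'X)^(-1) in plain Python.
# All data are synthetic and purely illustrative (NOT the exercise 6.18 data).
# Helper names are hypothetical; no external libraries are needed.


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augment the matrix with the identity matrix.
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(a)
    ]
    for col in range(n):
        # Pick the row with the largest absolute entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        # Eliminate the column from all other rows.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def leverage(x_row, xtx_inv):
    """Compute h = x' (X'X)^(-1) x for one row x (with the leading 1)."""
    tmp = [sum(xi * m for xi, m in zip(x_row, col)) for col in transpose(xtx_inv)]
    return sum(t * xi for t, xi in zip(tmp, x_row))


# Synthetic predictor values for two explanatory variables (assumption).
predictors = [
    (1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0),
    (5.0, 6.0), (6.0, 5.0), (7.0, 8.0), (8.0, 7.0),
]
X = [[1.0, a, b] for a, b in predictors]  # design matrix with intercept column
n, p = len(X), len(X[0])

xtx_inv = inverse(matmul(transpose(X), X))
h = [leverage(row, xtx_inv) for row in X]  # diagonal of the hat matrix

# Assumed guideline: a leverage above 2p/n marks an outlying X observation.
cutoff = 2 * p / n
flagged = [i + 1 for i, hi in enumerate(h) if hi > cutoff]

# Leverage of a hypothetical new observation far from the synthetic data.
x_new = [1.0, 20.0, 1.0]
h_new = leverage(x_new, xtx_inv)

print("leverages:", [round(hi, 3) for hi in h])
print("cases above 2p/n:", flagged)
print("new leverage:", round(h_new, 3), "largest observed:", round(max(h), 3))
```

If the X'X inverse has already been stored from Minitab, only the leverage step is needed: the new observation, with a leading 1 for the intercept, is multiplied on both sides of the stored matrix. A new leverage well inside the range of the observed leverage values suggests no extrapolation, while a value much larger than all of them indicates one.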