Influence Statistics and Outliers

Influence Statistics, Outliers, and Collinearity Diagnostics  Standardized Residuals – Residuals divided by their estimated standard errors (like t-statistics). Observations with values larger than t  2n ri   ei s 1  vii ,n  p ' in absolute value are considered outliers. s  MS Re sidual vii  i thh diagonal element of P Studentized Residuals – Similar to standardized residuals, except s ( i )  MS Re sidual (i ) is computed on the regression fit on the remaining n-1 cases. Observations with values larger than t  2n , n  p ' 1 in absolute value are considered outliers ri*  ei s(i ) 1  vii s  MS Re sidual (i ) vii  i thh diagonal element of P These are labelled as RSTUDENT by SAS.  Leverage Values (Hat Diag) – Measure of how far an observation is from the others in terms of the levels of the independent variables (not the dependent variable). Observations with values larger than 2p’/n are considered to be potentially highly influential. vii  i th diagonal element of P  DFFITS – Measure of how much an observation has effected its fitted value from the regression model. p' Values larger than 2 in absolute value are considered highly influential. n ^ ^   ei e2   (n  p'1) s(2i )  (n  p' ) s 2  i  s 1 v  1  vii s (i ) vii ii   (i ) Note that DFFITSi measures the number of standard errors that the fitted value for the ith case has shifted when it was not used in the regression fit. DFFITS i   Y i  Y i (i )  vii 1  vii DFBETAS – Measure of how much an observation has effected the estimate of a regression coefficient 2 (there is one DFBETA for each regression coefficient, including the intercept). Values larger than in n absolute value are considered highly influential. ^ DFBETAS j (i )  ^  j   j (i ) s(i ) c jj c jj  ( j  1) st diagonal element of ( X ' X ) 1 Note that DFBETASj(i) measures the number of standard errors that the regression coefficient for the jth predictor variable has shifted when the ith case was not used in the regression fit.  Cook’s D – Measure of aggregate impact of each observation on the group of regression coefficients, as well as the group of fitted values. Values larger than F.50, p ',n  p , are considered highly influential. ' ^ Di  ^ ^ ^ (  (i )   )' ( X ' X )(  (i )   ) p' s 2  ^ ^ ^ ^  Y  Y Y  Y ( i ) ( i )     2 ri  vii         2 p'  1  vii  p' s Note that Di is like an F-statistic used for testing K '   m where K’=I. Can also be thought of as the shift in a 100(1-)100% confidence ellipse when Di  Fa , p ',n  p ' when the ith case is not used to fit the regression model.  COVRATIO – Measure of the impact of each observation on the variances (and standard errors) of the 3 p' regression coefficients and their covariances. Values outside interval 1  considered highly influential. n det( s(2i ) ( X (' i ) X (i ) ) 1 ) COVRATIO i  det( s 2 ( X ' X ) 1 )  Variance Inflation Factor (VIF) – Measure of how highly correlated each independent variable is with the other predictors in the model. Values larger than 10 for a predictor imply large inflation of standard errors of regression coefficients due to this variable being in model. 1 where Rk2 is the coefficient of multiple determination when Xk is regressed on the p-1 remaining 2 1  Rk independent variables. VIFk  Obtaining Influence Statistics and Studentized Residuals in SPSS A. Choose ANALYZE, REGRESSION, LINEAR, and input the Dependent variable and set of Independent variables from your model of interest (possibly having been chosen via an automated model selection method). B. Under STATISTICS, select Collinearity Diagnostics, Casewise Diagnostics and All Cases and CONTINUE C. Under PLOTS, select Y:*SRESID and X:*ZPRED. Also choose HISTOGRAM. These give a plot of studentized residuals versus standardized predicted values, and a histogram of standardized residuals (residual/sqrt(MSE)). Select CONTINUE. D. Under SAVE, select Studentized Residuals, Cook’s, Leverage Values, Covariance Ratio, Standardized DFBETAS, Standardized DFFITS. Select CONTINUE. The results will be added to your original data worksheet. Obtaining Influence Statistics and Studentized Residuals in SAS PROC REG; MODEL Y = X1 X2 … XP / R INFLUENCE VIF; RUN; Obtaining Influence Statistics and Studentized Residuals in R reg1.reg <- lm(y ~ x1 + … + xp) reg1.rstudent <- rstudent(reg1.reg) reg1.inf <- influence.measures(reg1.reg)

Influence Statistics and Outliers

Related documents

Products

Support

Influence Statistics and Outliers

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib