Chapter 10

Regression Model Building Diagnostics
KNNL – Chapter 10
Model Adequacy for Predictors – Added Variable Plot
• Graphical way to determine partial relation between
response and a given predictor, after controlling for other
predictors – shows form of relation between new X and Y
• May not be helpful when other predictor(s) enter model
with polynomial or interaction terms that are not
controlled for
• Algorithm (assume plot for X3, given X1, X2):
– Fit regression of Y on X1,X2, obtain residuals = ei(Y|X1,X2)
– Fit regression of X3 on X1,X2, obtain residuals = ei(X3|X1,X2)
– Plot ei(Y|X1,X2) (vertical axis) versus ei(X3|X1,X2) (horizontal axis)
• Slope of the regression through the origin of ei(Y|X1,X2)
on ei(X3|X1,X2) is the partial regression coefficient for X3 (see the sketch below)
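A minimal NumPy sketch of the algorithm above; the names X12, x3, y and the simulated data are illustrative assumptions, not from KNNL:

```python
# Added-variable plot residuals: regress Y and the candidate X3 on the
# controlled predictors (X1, X2), then pair the two residual series.
import numpy as np

def added_variable_residuals(X12, x3, y):
    """Return (e_y, e_x3), the residuals of y and x3 each regressed on X12."""
    b_y, *_ = np.linalg.lstsq(X12, y, rcond=None)
    b_x, *_ = np.linalg.lstsq(X12, x3, rcond=None)
    return y - X12 @ b_y, x3 - X12 @ b_x   # e_i(Y|X1,X2), e_i(X3|X1,X2)

# Illustrative simulated data (assumed, not from the text)
rng = np.random.default_rng(0)
n = 50
X1, X2, X3 = rng.normal(size=(3, n))
y = 1 + 2*X1 - X2 + 0.5*X3 + rng.normal(size=n)
X12 = np.column_stack([np.ones(n), X1, X2])  # intercept, X1, X2
e_y, e_x3 = added_variable_residuals(X12, X3, y)
# Through-the-origin slope of e_y on e_x3 = partial coefficient for X3
print((e_x3 @ e_y) / (e_x3 @ e_x3))  # approximately 0.5 here
```

Plotting e_y against e_x3 then shows the form of the partial relation; the printed slope matches the coefficient of X3 from the full fit of Y on X1, X2, X3.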
Outlying Y Observations – Studentized Residuals
Model Errors (unobserved):

$\varepsilon_i = Y_i - \left(\beta_0 + \beta_1 X_{i1} + \cdots + \beta_{p-1} X_{i,p-1}\right)$

$E\{\varepsilon_i\} = 0 \qquad \sigma^2\{\varepsilon_i\} = \sigma^2 \qquad \sigma\{\varepsilon_i, \varepsilon_j\} = 0 \quad (i \neq j)$
Residuals (observed), where $h_{ij}$ is the $(i,j)$th element of $H = X(X'X)^{-1}X'$:

$e_i = Y_i - \hat{Y}_i = Y_i - \left(b_0 + b_1 X_{i1} + \cdots + b_{p-1} X_{i,p-1}\right)$

$E\{e_i\} = 0$

$\sigma^2\{e_i\} = \sigma^2(1 - h_{ii}) \qquad s^2\{e_i\} = MSE(1 - h_{ii})$

$\sigma\{e_i, e_j\} = -h_{ij}\,\sigma^2 \quad (i \neq j) \qquad s\{e_i, e_j\} = -h_{ij}\,MSE \quad (i \neq j)$
Semi-Studentized Residual (residual divided by an estimate of $\sigma$; trivial to compute):

$e_i^* = \dfrac{e_i}{\sqrt{MSE}}$
Studentized Residual (residual divided by its estimated standard error; messier to compute):

$r_i = \dfrac{e_i}{\sqrt{MSE(1 - h_{ii})}}$
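A short sketch of both scalings above, assuming X is an n×p design matrix whose first column is the intercept and y is the response (names illustrative):

```python
# Semi-studentized and studentized residuals from the hat matrix.
import numpy as np

def residual_diagnostics(X, y):
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T     # hat matrix H = X(X'X)^{-1}X'
    e = y - H @ y                            # residuals e = (I - H)y
    mse = (e @ e) / (n - p)                  # MSE = SSE / (n - p)
    h = np.diag(H)                           # leverages h_ii
    e_star = e / np.sqrt(mse)                # semi-studentized residuals
    r = e / np.sqrt(mse * (1 - h))           # studentized residuals
    return e_star, r
```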
Outlying Y Observations – Studentized Deleted Residuals
Deleted Residual (observed value minus the fitted value when the regression is fit to the other $n-1$ cases):

$d_i = Y_i - \hat{Y}_{i(i)} \qquad \hat{Y}_{i(i)} = b_{0(i)} + b_{1(i)} X_{i1} + \cdots + b_{p-1(i)} X_{i,p-1}$

$b_{k(i)}$ = regression coefficient of $X_k$ when case $i$ is deleted
Studentized Deleted Residual (makes use of having predicted the $i$th response from the regression based on the other $n-1$ cases):

$t_i = \dfrac{d_i}{s\{d_i\}} \sim t(n-p-1)$

$s^2\{d_i\} = s^2\{\mathrm{pred}_i\} = MSE_{(i)}\left[1 + X_i'\left(X_{(i)}'X_{(i)}\right)^{-1}X_i\right] \qquad X_i' = \begin{bmatrix} 1 & X_{i1} & \cdots & X_{i,p-1} \end{bmatrix}$

Note: $SSE = (n-p)MSE = (n-p-1)MSE_{(i)} + \dfrac{e_i^2}{1 - h_{ii}}$

$\Rightarrow \; t_i = e_i\left[\dfrac{n-p-1}{SSE(1 - h_{ii}) - e_i^2}\right]^{1/2}$ (computed without re-fitting the $n$ deleted regressions)

Test for outliers (Bonferroni adjustment): case $i$ is an outlier if $|t_i| \geq t\left(1 - \dfrac{\alpha}{2n};\; n-p-1\right)$
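A sketch of the refit-free computational formula and the Bonferroni cutoff above; SciPy is assumed for the t quantile, and the argument names are illustrative:

```python
# Studentized deleted residuals and the Bonferroni outlier test.
import numpy as np
from scipy import stats

def outlier_test(X, y, alpha=0.05):
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    e = y - H @ y
    h = np.diag(H)
    sse = e @ e
    # t_i = e_i * sqrt((n-p-1) / (SSE(1-h_ii) - e_i^2)) -- no refitting
    t = e * np.sqrt((n - p - 1) / (sse * (1 - h) - e**2))
    cutoff = stats.t.ppf(1 - alpha / (2 * n), n - p - 1)  # Bonferroni
    return t, cutoff, np.abs(t) >= cutoff
```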
Outlying X-Cases – Hat Matrix Leverage Values
$H = X(X'X)^{-1}X' = \begin{bmatrix} h_{11} & h_{12} & \cdots & h_{1n} \\ h_{21} & h_{22} & \cdots & h_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ h_{n1} & h_{n2} & \cdots & h_{nn} \end{bmatrix} \qquad x_i = \begin{bmatrix} 1 \\ x_{i1} \\ \vdots \\ x_{i,p-1} \end{bmatrix} \qquad h_{ij} = x_i'(X'X)^{-1}x_j$

Notes: $0 \leq h_{ii} \leq 1$ and

$\sum_{i=1}^{n} h_{ii} = \mathrm{trace}(H) = \mathrm{trace}\left(X(X'X)^{-1}X'\right) = \mathrm{trace}\left((X'X)^{-1}X'X\right) = \mathrm{trace}(I_p) = p$
Cases with X-levels close to the “center” of the sampled X-levels have small leverages.
Cases with “extreme” levels have large leverages, and have the potential to “pull” the
regression equation toward their observed Y-values. Leverage values above $2p/n$
(twice the mean leverage, since the $h_{ii}$ average to $p/n$) are considered large.
$\hat{Y} = HY \;\Rightarrow\; \hat{Y}_i = \sum_{j=1}^{n} h_{ij}Y_j = h_{ii}Y_i + \sum_{j \neq i} h_{ij}Y_j$
Leverage values for new observations: $h_{\mathrm{new,new}} = X_{\mathrm{new}}'(X'X)^{-1}X_{\mathrm{new}}$

New cases with leverage values larger than those in the original data set are extrapolations.
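A brief sketch of the leverage rules above, including the 2p/n flag and the extrapolation check for a new case (function names are illustrative):

```python
# Leverages, the 2p/n rule of thumb, and the extrapolation check.
import numpy as np

def leverages(X):
    return np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)  # h_ii values

def high_leverage(X):
    n, p = X.shape
    return leverages(X) > 2 * p / n  # flag h_ii > 2p/n (twice the mean)

def is_extrapolation(X, x_new):
    # h_new,new = x_new'(X'X)^{-1} x_new; extrapolation if larger than
    # every leverage observed in the original data set
    h_new = x_new @ np.linalg.inv(X.T @ X) @ x_new
    return h_new > leverages(X).max()
```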
Identifying Influential Cases I – Fitted Values
Influential Cases in Terms of Their Own Fitted Values - DFFITS:
$DFFITS_i = \dfrac{\hat{Y}_i - \hat{Y}_{i(i)}}{\sqrt{MSE_{(i)}\,h_{ii}}}$ = number of standard errors the case's own fitted value shifts when the case is included vs. excluded
Computational Formula (avoids fitting all $n$ deleted models):

$DFFITS_i = e_i\left[\dfrac{n-p-1}{SSE(1 - h_{ii}) - e_i^2}\right]^{1/2}\left[\dfrac{h_{ii}}{1 - h_{ii}}\right]^{1/2} = t_i\left[\dfrac{h_{ii}}{1 - h_{ii}}\right]^{1/2}$
Problem cases have $|DFFITS_i| > 1$ for small to medium sized data sets, or $> 2\sqrt{p/n}$ for larger ones.
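A sketch of DFFITS via the $t_i$ form above, with the rule-of-thumb flags (names illustrative):

```python
# DFFITS from the studentized deleted residuals; no deleted refits needed.
import numpy as np

def dffits(X, y):
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    e = y - H @ y
    h = np.diag(H)
    sse = e @ e
    t = e * np.sqrt((n - p - 1) / (sse * (1 - h) - e**2))
    return t * np.sqrt(h / (1 - h))  # DFFITS_i = t_i [h_ii/(1-h_ii)]^{1/2}

# Flags: |DFFITS_i| > 1 (small/medium n) or > 2*sqrt(p/n) (large n)
```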
Influential Cases in Terms of All Fitted Values - Cook's Distance:
$D_i = \dfrac{\left(\hat{Y} - \hat{Y}_{(i)}\right)'\left(\hat{Y} - \hat{Y}_{(i)}\right)}{p\,MSE} = \dfrac{\sum_{j=1}^{n}\left(\hat{Y}_j - \hat{Y}_{j(i)}\right)^2}{p\,MSE} = \dfrac{e_i^2\,h_{ii}}{p\,MSE\,(1 - h_{ii})^2}$
Problem cases have $D_i > F(0.50;\; p,\; n-p)$.
Influential Cases II – Regression Coefficients
Influential Cases in Terms of Regression Coefficients (One for each case for each coefficient):
b b
 DFBETAS k (i )  k k (i )  # of standard errors coefficient is shifted when case included vs excluded
MSE(i ) ckk
where $(X'X)^{-1} = \begin{bmatrix} c_{00} & c_{01} & \cdots & c_{0,p-1} \\ c_{10} & c_{11} & \cdots & c_{1,p-1} \\ \vdots & \vdots & \ddots & \vdots \\ c_{p-1,0} & c_{p-1,1} & \cdots & c_{p-1,p-1} \end{bmatrix}$
Problem cases have $|DFBETAS_{k(i)}| > 1$ for small to medium sized data sets, or $> 2/\sqrt{n}$ for larger ones.
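A sketch of DFBETAS that avoids the n refits by using the standard deletion identity $b - b_{(i)} = (X'X)^{-1}x_i\,e_i/(1-h_{ii})$, a known regression result not shown in the notes above (names illustrative):

```python
# DFBETAS for every case i (rows) and coefficient k (columns).
import numpy as np

def dfbetas(X, y):
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    H = X @ XtX_inv @ X.T
    e = y - H @ y
    h = np.diag(H)
    sse = e @ e
    mse_del = (sse - e**2 / (1 - h)) / (n - p - 1)  # MSE_(i) for each case
    c_kk = np.diag(XtX_inv)                         # diagonal of (X'X)^{-1}
    # b - b_(i) = (X'X)^{-1} x_i e_i / (1 - h_ii)   (deletion identity)
    delta_b = (XtX_inv @ X.T).T * (e / (1 - h))[:, None]
    return delta_b / np.sqrt(mse_del[:, None] * c_kk[None, :])

# Flags: |DFBETAS_k(i)| > 1 (small/medium n) or > 2/sqrt(n) (large n)
```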
Influential Cases in Terms of Vector of Regression Coefficients - Cook's Distance:
$D_i = \dfrac{\left(\hat{Y} - \hat{Y}_{(i)}\right)'\left(\hat{Y} - \hat{Y}_{(i)}\right)}{p\,MSE} = \dfrac{\left(b - b_{(i)}\right)'(X'X)\left(b - b_{(i)}\right)}{p\,MSE} = \dfrac{e_i^2\,h_{ii}}{p\,MSE\,(1 - h_{ii})^2}$
Problem cases have $D_i > F(0.50;\; p,\; n-p)$.
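A sketch of Cook's distance from the closed form shared by both expressions above, with the F(0.50; p, n−p) rule of thumb (SciPy assumed for the F quantile):

```python
# Cook's distance D_i and its rough F-based cutoff.
import numpy as np
from scipy import stats

def cooks_distance(X, y):
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    e = y - H @ y
    h = np.diag(H)
    mse = (e @ e) / (n - p)
    D = e**2 * h / (p * mse * (1 - h)**2)  # D_i closed form
    cutoff = stats.f.ppf(0.50, p, n - p)   # median of F(p, n-p)
    return D, D > cutoff
```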
When some cases are highly influential, check whether they materially affect inferences drawn from the model.
Multicollinearity - Variance Inflation Factors
• Problems when predictor variables are correlated among
themselves
– Regression Coefficients of predictors change, depending on
what other predictors are included
– Extra Sums of Squares of predictors change, depending on what
other predictors are included
– Standard Errors of Regression Coefficients increase when
predictors are highly correlated
– Individual Regression Coefficients are not significant, although
the overall model is
– Width of Confidence Intervals for Regression Coefficients
increases when predictors are highly correlated
– Point Estimates of Regression Coefficients have the wrong sign (+/−)
Variance Inflation Factor
Original units for $X_1, \ldots, X_{p-1}, Y$: $\sigma^2\{b\} = \sigma^2(X'X)^{-1}$

Correlation-transformed values:

$X_{ik}^* = \dfrac{1}{\sqrt{n-1}}\left(\dfrac{X_{ik} - \bar{X}_k}{s_k}\right) \qquad Y_i^* = \dfrac{1}{\sqrt{n-1}}\left(\dfrac{Y_i - \bar{Y}}{s_Y}\right)$

$\sigma^2\{b^*\} = \left(\sigma^*\right)^2 r_{XX}^{-1} \qquad \sigma^2\{b_k^*\} = \left(\sigma^*\right)^2 VIF_k \qquad \text{where } VIF_k = \dfrac{1}{1 - R_k^2}$

with $R_k^2$ = coefficient of determination for the regression of $X_k$ on the other $p-2$ predictors.

$R_k^2 = 0 \Rightarrow VIF_k = 1 \qquad 0 < R_k^2 < 1 \Rightarrow VIF_k > 1 \qquad R_k^2 = 1 \Rightarrow VIF_k = \infty$

Multicollinearity is considered problematic with respect to the least squares estimates if

$\max\left(VIF_1, \ldots, VIF_{p-1}\right) > 10 \quad$ or if $\quad \overline{VIF} = \dfrac{\sum_{k=1}^{p-1} VIF_k}{p-1}$ is much larger than 1.
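A sketch of VIF_k computed exactly as defined above, by regressing each predictor on the others; here X is assumed to hold only the p−1 predictor columns, without the intercept (names illustrative):

```python
# Variance inflation factors: VIF_k = 1 / (1 - R_k^2).
import numpy as np

def vifs(X):
    n, q = X.shape                    # q = p - 1 predictors
    out = np.empty(q)
    for k in range(q):
        xk = X[:, k]
        # Regress X_k on the other predictors (with an intercept)
        Z = np.column_stack([np.ones(n), np.delete(X, k, axis=1)])
        b, *_ = np.linalg.lstsq(Z, xk, rcond=None)
        resid = xk - Z @ b
        r2 = 1 - (resid @ resid) / np.sum((xk - xk.mean())**2)
        out[k] = 1 / (1 - r2)
    return out

# Flags: max(VIF) > 10, or mean(VIF) much larger than 1
```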