Residuals and Influence Measures (PPT)

advertisement
Regression Diagnostics
Outlying Y Observations – Standardized Residuals
Model Errors (unobserved):
 i  Yi    0  1 X i1  ...   p X i , p 
E  i   0
 2  i    2
  i ,  j   0 i  j
Residuals (observed) where vij  (i, j )th element of P = X(X'X)-1 X':
nn
^
^
^

ei  Yi  Y i  Yi    0   1 X i1  ...   p X i , p 


^
E ei   0
 2 ei    2 1  vii 
s 2 ei   MSE 1  vii 
 ei , e j   vij 2 i  j
s ei , e j   vij MSE i  j
Standardized Residual (Residual divided by its estimated standard error):
ei
ri 
s  MSE
s 1  vii 
Studentized Residual (Estimate of  does not involve current observation):
ei
ri* 
s(i )  MSE(i ) where MSE(i ) is from regression on n -1 remaining cases:
s(i ) 1  vii 
Under normality of errors, ri* ~ tn  p '1
Outlying Y Observations – Studentized Deleted Residuals
Deleted Residual (Observed value minus fitted value when regression is fit on the other n -1 cases):
^
di  Yi  Y i (i )
^
^
^
^
Y i (i )   0(i )   1(i ) X i1  ...   p ( i ) X i , p
^
 k (i )  regression coefficient of X k when case i is deleted
Studentized Deleted Residual (makes use of having predicted i th response from regression based on other n -1 cases):
di
ri* 
~ t  n  p ' 1
s d i 

s 2 di   s 2 pred i   MSEi  1  Xi' X(i)'X i 


Xi 

ei2
Note: SSE   n  p ' MSE   n  p ' 1 MSEi  
1  vii
-1
X i'  1 X i1
X i , p 
1/2


n  p ' 1
 ri  ei 
2
 SSE 1  vii   ei 
*
Computed without re-fitting n regressions
  

Test for outliers (Bonferroni adjustment): Outlier if ri*  t 1    , n  p ' 1
  2n 

Outlying X-Cases – Hat Matrix Leverage Values
 v11 v12
v
v
-1
P = X  X'X  X'   21 22


 vn1 vn 2
Notes: 0  vii  1
v1n 
v2 n 


vnn 
 1 
x 
i1
xi   
 
 
 xi , p 
vij  x i'  X'X  x j
-1

n


 vii  trace  P   trace X  X'X  X'  trace X'X  X'X 
i 1
-1
-1
  trace  I   p '
p'
Cases with X-levels close to the “center” of the sampled X-levels will have small leverages.
Cases with “extreme” levels have large leverages, and have the potential to “pull” the
regression equation toward their observed Y-values. Large leverage values are > 2p’/n (2
times larger than the mean)
^
^
n
i 1
Y = PY  Y i   vijY j   vijY j  viiYii 
j 1
j 1
n
vY
j i 1
ij
j
Leverage values for new observations: vnew,new  X'new  X'X  X new
-1
New cases with leverage values larger than those in original dataset are extrapolations
Identifying Influential Cases I – Fitted Values
Influential Cases in Terms of Their Own Fitted Values - DFFITS:
^
DFFITSi 
^
Y i  Y i (i )
 # of standard errors own fitted value is shifted when case included vs excluded
MSE(i ) vii
Computational Formula (avoids fitting all deleted models):
1/2

  vii 
n  p ' 1
DFFITSi  ei 


2
 SSE 1  vii   ei   1  vii 
1/2
1/2
 v 
 ri*  ii 
 1  vii 
Problem cases are >1 for small to medium sized datasets,  2
p'
for larger ones
n
Influential Cases in Terms of All Fitted Values - Cook's Distance:
2
^
^

^ ^  ^ ^ 
Y

Y
j
j
(
i
)



 Y- Y (i)  '  Y- Y (i)  r 2  v


j 1 
 
 i
ii
Di 



p ' MSE
p ' MSE
p '  1  vii  
Problem cases are > F  0.50; p ', n  p ' 
n
Influential Cases II – Regression Coefficients
Influential Cases in Terms of Regression Coefficients (One for each case for each coefficient, k  0,1,..., p):
^
 DFBETAS k (i ) 
where  X'X 
-1
^
 k   k (i )
MSE(i ) ckk
 c00
c
10



c p ,0
c01
c11
c p ,1
 # of standard errors coefficient is shifted when case included vs excluded
c0, p 
c1, p 


c p , p 
Problem cases are >1 for small to medium sized datasets, 
2
for larger ones
n
Influential Cases in Terms of Vector of Regression Coefficients - Cook's Distance:
2
^
^

^ ^  ^ ^ 
^ ^ 
^ ^ 
Y

Y
j
j
(
i
)
YY
'
YY
ββ
'
X'X
(i)  
(i) 
  β- β (i) 



2


(i)  


r
v


 
 i




ii
Di  j 1



p ' MSE
p ' MSE
p '  1  vii  
p ' MSE
Problem cases are > F  0.50; p ', n  p ' 
n
When some cases are highly influential, should check and see if they affect inferences regarding model.
Multicollinearity - Variance Inflation Factors
• Problems when predictor variables are correlated among
themselves
– Regression Coefficients of predictors change, depending on
what other predictors are included
– Extra Sums of Squares of predictors change, depending on what
other predictors are included
– Standard Errors of Regression Coefficients increase when
predictors are highly correlated
– Individual Regression Coefficients are not significant, although
the overall model is
– Width of Confidence Intervals for Regression Coefficients
increases when predictors are highly correlated
– Point Estimates of Regression Coefficients arewrong sign (+/-)
Variance Inflation Factor

^
Original Units for X 1 ,..., X p , Y : σ β   2  X'X 
2
-1
1  X ik  X k 
Correlation Transformed Values: X ik 


sk
n 1 

^*
^*
2
2


2
*
-1
2
*
σ β    rXX
   k    VIF k
 
 
1
where: VIF k 
1  Rk2
*
 
1  Yi  Y 
Yi 


n  1  sY 
*
 
with Rk2  Coefficient of Determination for regression of X k on the other p  1 predictors
Rk2  0  VIF k  1
0  Rk2  1  VIF k  1
Rk2  1  VIF k  
Multicollinearity is considered problematic wrt least squares estimates if:
p


 
max VIF 1 ,..., VIF  p  10 or if VIF 
 VIF 
k 1
p
k
is much larger than 1
Download