3.5 Regression diagnostics

There are two kinds of “unusual” observations:
• outliers
• influential observations.
(I) Outliers:
Observations with large values of either of the following two types of residuals might be considered outliers:
1. internally studentized residuals
2. externally studentized residuals.
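1. Internally Studentized Residuals:
Let
$$P = X (X^t X)^{-1} X^t = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1n} \\ p_{21} & p_{22} & \cdots & p_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ p_{n1} & p_{n2} & \cdots & p_{nn} \end{pmatrix}.$$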
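As a numerical illustration, the matrix $P$ can be formed directly from a design matrix. The sketch below uses Python with NumPy; the function name `hat_matrix` and the toy design matrix are assumptions made for illustration and do not come from these notes.

```python
import numpy as np

def hat_matrix(X):
    """Return P = X (X^t X)^{-1} X^t for a full-column-rank design matrix X."""
    X = np.asarray(X, dtype=float)
    # Solve (X^t X) A = X^t instead of forming the inverse explicitly.
    return X @ np.linalg.solve(X.T @ X, X.T)

# Hypothetical toy design: intercept plus one covariate, n = 5, p = 2.
X = np.column_stack([np.ones(5), [1.0, 2.0, 3.0, 4.0, 10.0]])
P = hat_matrix(X)
leverages = np.diag(P)  # the diagonal entries p_ii
```

For this toy design the diagonal entries $p_{ii}$ sum to $p = 2$, and the point at $x = 10$ has the largest $p_{ii}$ because it lies farthest from the mean of the covariate.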
Then, the internally studentized residuals are
$$s_i = \frac{e_i}{\mathrm{s.e.}(e_i)} = \frac{e_i}{s\,(1 - p_{ii})^{1/2}}, \quad i = 1, 2, \ldots, n.$$
An observation with large $|s_i|$ might be an outlier. A benchmark can be obtained from the distributional result
$$\frac{s_i^2}{n-p} \sim \mathrm{beta}\left(\frac{1}{2}, \frac{n-p-1}{2}\right).$$
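The internally studentized residuals can be computed in a few lines. The following is a minimal sketch in Python with NumPy, assuming a full-rank design matrix; the function name `internally_studentized` and the toy data (a straight line with a small deterministic disturbance and one shifted response) are hypothetical.

```python
import numpy as np

def internally_studentized(X, y):
    """Return s_i = e_i / (s * sqrt(1 - p_ii)) for each observation."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    P = X @ np.linalg.solve(X.T @ X, X.T)  # P = X (X^t X)^{-1} X^t
    e = y - P @ y                          # ordinary residuals
    pii = np.diag(P)
    s2 = e @ e / (n - p)                   # mean residual sum of squares
    return e / np.sqrt(s2 * (1.0 - pii))

# Hypothetical toy data: n = 10, p = 2, observation 4 shifted upward by 5.
x = np.arange(10, dtype=float)
X = np.column_stack([np.ones_like(x), x])
y = 2.0 + 3.0 * x + 0.5 * np.cos(x)
y[4] += 5.0

s_int = internally_studentized(X, y)
suspect = int(np.argmax(np.abs(s_int)))  # index of the largest |s_i|
```

In this toy example the shifted observation has the largest $|s_i|$, and every $s_i^2$ stays below $n - p = 8$, in line with the beta result.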
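[Derivation of $\mathrm{s.e.}(e_i) = s(1 - p_{ii})^{1/2}$]
Since $e = (I - P)Y$ and $I - P$ is symmetric and idempotent,
$$V(e) = V[(I - P)Y] = (I - P)\,V(Y)\,(I - P)^t = (I - P)\,\sigma^2 I\,(I - P)^t = (I - P)\,\sigma^2.$$
Thus,
$$\mathrm{Var}(e_i) = (1 - p_{ii})\,\sigma^2 \;\Rightarrow\; \mathrm{s.e.}(e_i) = \sqrt{(1 - p_{ii})\,s^2} = s\,(1 - p_{ii})^{1/2},$$
where
$$s^2 = \frac{\sum_{i=1}^{n} e_i^2}{n - p}$$
is the mean residual sum of squares.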
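The matrix identity behind this derivation can be checked numerically. The sketch below, in Python with NumPy, builds $(I - P)\,\sigma^2 I\,(I - P)^t$ for a hypothetical design matrix and an assumed value of $\sigma^2$, then compares it with $(I - P)\,\sigma^2$; both the design and the value 4.0 are illustrative assumptions.

```python
import numpy as np

sigma2 = 4.0  # assumed error variance, chosen only for illustration

# Hypothetical design: intercept plus one covariate, n = 6, p = 2.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 9.0])
X = np.column_stack([np.ones_like(x), x])
n = X.shape[0]

P = X @ np.linalg.solve(X.T @ X, X.T)
M = np.eye(n) - P  # I - P

# V(e) = (I - P) V(Y) (I - P)^t with V(Y) = sigma^2 I.
cov_e = M @ (sigma2 * np.eye(n)) @ M.T

identity_holds = np.allclose(cov_e, sigma2 * M)
var_e = np.diag(cov_e)  # should equal (1 - p_ii) * sigma^2
```

The diagonal of `cov_e` gives $\mathrm{Var}(e_i) = (1 - p_{ii})\,\sigma^2$, so residuals at points with large $p_{ii}$ have smaller variance.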
2. Externally Studentized Residuals:
Intuitively, if observation $i$ is an outlier, the estimate $s^2$ of $\sigma^2$ might not be accurate because of the effect of that observation. It might be better to estimate $\sigma^2$ by $s_{(i)}^2$, the mean residual sum of squares with observation $i$ deleted. The resulting residuals are the externally studentized residuals
$$t_i = \frac{e_i}{s_{(i)}\,(1 - p_{ii})^{1/2}},$$
where
$$s_{(i)}^2 = \frac{(n - p)\,s^2 - \dfrac{e_i^2}{1 - p_{ii}}}{n - p - 1}.$$
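The closed form for $s_{(i)}^2$ avoids refitting the model $n$ times. The sketch below, in Python with NumPy, computes $t_i$ from that closed form and also recomputes the deleted variance by actually dropping observation $i$, so the two can be compared. The function names and the toy data are hypothetical.

```python
import numpy as np

def externally_studentized(X, y):
    """Return (t, s2_del): t_i and the deleted variances s_(i)^2 (closed form)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    P = X @ np.linalg.solve(X.T @ X, X.T)
    e = y - P @ y
    pii = np.diag(P)
    s2 = e @ e / (n - p)
    # s_(i)^2 = [(n - p) s^2 - e_i^2 / (1 - p_ii)] / (n - p - 1)
    s2_del = ((n - p) * s2 - e**2 / (1.0 - pii)) / (n - p - 1)
    t = e / np.sqrt(s2_del * (1.0 - pii))
    return t, s2_del

def deleted_variance_by_refit(X, y, i):
    """Refit without observation i and return its mean residual sum of squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = np.arange(len(y)) != i
    b_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    r = y[keep] - X[keep] @ b_i
    return r @ r / (keep.sum() - X.shape[1])

# Hypothetical toy data: n = 10, p = 2, observation 4 shifted upward by 5.
x = np.arange(10, dtype=float)
X = np.column_stack([np.ones_like(x), x])
y = 2.0 + 3.0 * x + 0.5 * np.cos(x)
y[4] += 5.0

t, s2_del = externally_studentized(X, y)
s2_refit = np.array([deleted_variance_by_refit(X, y, i) for i in range(len(y))])
```

Here `s2_del` and `s2_refit` agree up to rounding, which is the point of the closed form: one fit is enough to obtain all $n$ deleted variances.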
An observation with large $|t_i|$ might be an outlier. A benchmark can be obtained from the distributional result
$$t_i \sim t_{n-p-1}.$$
Note: $t_i$ is not bounded, but $|s_i| \le (n - p)^{1/2}$. The externally studentized residuals might therefore be more sensitive to an extreme outlier.
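The two kinds of residuals are linked by a short calculation, which may help explain the note. Dividing the numerator of the expression for $s_{(i)}^2$ by $s^2$ gives
$$s_{(i)}^2 = s^2\,\frac{(n - p) - s_i^2}{n - p - 1},$$
and therefore
$$t_i = s_i\,\frac{s}{s_{(i)}} = s_i \left(\frac{n - p - 1}{n - p - s_i^2}\right)^{1/2}.$$
So $t_i$ is a monotone function of $s_i$, and $|t_i|$ grows without bound as $s_i^2$ approaches its upper limit $n - p$.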
(II) Influential Observations:
Let $b_{(i)}$ be the least squares estimate with observation $i$ deleted; for example, $b_{(3)}$ is the least squares estimate based on all observations except observation 3. Also, let
$$\hat{Y}_{(i)} = X b_{(i)}.$$
Cook’s distance,
$$D_i = \frac{(\hat{Y}_{(i)} - \hat{Y})^t (\hat{Y}_{(i)} - \hat{Y})}{p\,s^2} = \frac{(b_{(i)} - b)^t X^t X (b_{(i)} - b)}{p\,s^2} = \left[\frac{e_i}{s\,(1 - p_{ii})^{1/2}}\right]^2 \left(\frac{p_{ii}}{1 - p_{ii}}\right) \frac{1}{p} = s_i^2 \left(\frac{p_{ii}}{1 - p_{ii}}\right) \frac{1}{p}, \quad i = 1, 2, \ldots, n,$$
can be used to detect influential observations. An observation with large $D_i$ might be considered an influential observation.
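The equality between the deletion form and the closed form of Cook’s distance can be checked numerically. The sketch below, in Python with NumPy, computes $D_i$ both ways; the function names and the toy data are hypothetical, and the design matrix is assumed to have full column rank.

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance from the closed form s_i^2 * p_ii / (1 - p_ii) / p."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    P = X @ np.linalg.solve(X.T @ X, X.T)
    e = y - P @ y
    pii = np.diag(P)
    s2 = e @ e / (n - p)
    s_int_sq = e**2 / (s2 * (1.0 - pii))  # squared internally studentized residuals
    return s_int_sq * pii / (1.0 - pii) / p

def cooks_distance_by_deletion(X, y, i):
    """Cook's distance for observation i by refitting without it."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    s2 = resid @ resid / (n - p)
    keep = np.arange(n) != i
    b_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    shift = X @ (b_i - b)                 # Y-hat_(i) minus Y-hat
    return shift @ shift / (p * s2)

# Hypothetical toy data: n = 10, p = 2, observation 4 shifted upward by 5.
x = np.arange(10, dtype=float)
X = np.column_stack([np.ones_like(x), x])
y = 2.0 + 3.0 * x + 0.5 * np.cos(x)
y[4] += 5.0

D = cooks_distance(X, y)
D_refit = np.array([cooks_distance_by_deletion(X, y, i) for i in range(len(y))])
```

The closed form shows that $D_i$ combines two ingredients: the size of the studentized residual $s_i$ and the ratio $p_{ii}/(1 - p_{ii})$, so an observation needs both a noticeable residual and a non-negligible $p_{ii}$ to be influential.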