Chapter 3 Diagnostics and Remedial Measures
3.2, 3.3 Diagnostics for Residuals
The residuals are defined as
$$e_i = y_i - \hat{y}_i = y_i - b_0 - b_1 x_{i1} - \cdots - b_{p-1} x_{i,p-1}, \quad i = 1,2,\ldots,n.$$
If the least squares estimate is sensible, $e_i$ can be regarded as an "estimate" of $\varepsilon_i$:
$$e_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1} x_{i,p-1} + \varepsilon_i - b_0 - b_1 x_{i1} - \cdots - b_{p-1} x_{i,p-1}$$
$$= (\beta_0 - b_0) + (\beta_1 - b_1) x_{i1} + \cdots + (\beta_{p-1} - b_{p-1}) x_{i,p-1} + \varepsilon_i \approx \varepsilon_i.$$
Thus, the residuals should exhibit tendencies that confirm the assumptions about $\varepsilon_i$: independence, zero mean, constant variance $\sigma^2$, and normality. Conversely, if the residuals do exhibit systematic tendencies, the assumptions might be violated. Usually, we plot the residuals and then examine them for such tendencies. The principal ways of plotting the residuals $e_i$ are
I. To check for non-normality.
II. To check for time effects if the time order of the data is known.
III. To check for nonconstant variance.
IV. To check for curvature of higher order than fitted in the X's.
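All four checks start from the residuals themselves. A minimal sketch of computing them in Python (simulated data; numpy assumed; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # intercept + 2 covariates
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)       # simulated response

b = np.linalg.solve(X.T @ X, X.T @ y)   # least squares estimate b = (X^t X)^{-1} X^t y
y_hat = X @ b                           # fitted values
e = y - y_hat                           # residuals e_i = y_i - y_hat_i
print(e.sum())                          # ~0: residuals sum to zero when X has an intercept
```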
I. Non-Normality Checks on Residuals:
A simple histogram or a stem-and-leaf plot of the residuals can be used to check normality. If the residuals appear roughly bell-shaped, this suggests the normality assumption is not violated.
Histogram:
[Figure: histogram of the residuals (horizontal axis: residuals, roughly -2 to 3; vertical axis: frequency, 0 to 15)]
Stem-and-leaf plot:
-2 : 2
-1 : 9755
-1 : 33322111110
-0 : 9988887776665555
-0 : 44444333333332210
0 : 011111222222234444
0 : 5666667777888999
1 : 011223
1 : 5556788
2 : 01
2 : 89
A better method to check normality is the normal probability plot: the residuals are plotted against their corresponding normal scores. If the plot looks straight, the normality assumption is plausibly not violated.
[Figure: normal probability plot of the residuals against the quantiles of the standard normal]
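The two normality checks above (histogram and normal probability plot) can be produced with a few lines of Python; a minimal sketch with simulated stand-in residuals (matplotlib and scipy assumed):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
e = rng.normal(size=80)                    # stand-in residuals; in practice e = y - y_hat

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.hist(e, bins=15)                       # histogram: look for a bell shape
ax1.set_xlabel("Residuals")
stats.probplot(e, dist="norm", plot=ax2)   # normal probability plot: look for a straight line
plt.show()
```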
II, III, IV:
These three properties can be checked by plotting the residuals versus
(a) the fitted values $\hat{y}_i$
(b) the covariates $x_{ij}$
A typical satisfactory plot is as follows:
[Figure: a typical satisfactory residual plot: the residuals form a horizontal, patternless band around zero across the range of fitted or covariate values]
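Such a plot can be drawn directly from a fitted model; a minimal sketch (Python; simulated data from a correctly specified model, so the band is patternless):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
X = np.column_stack([np.ones(100), x])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=100)   # correctly specified model

b = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ b
e = y - y_hat

plt.scatter(y_hat, e)            # satisfactory: a horizontal band around zero
plt.axhline(0.0, linestyle="--")
plt.xlabel("fitted value")
plt.ylabel("residuals")
plt.show()
```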
Three typical unsatisfactory plots are as follows:
[Figure: three typical unsatisfactory residual plots (i), (ii), (iii), each plotted against the fitted value or covariate value]
(a) versus the fitted values $\hat{y}_i$:
Why $\hat{y}_i$, and not $y_i$?
$$r_{ey} = \text{correlation coefficient of } e \text{ and } y = \left( 1 - R^2 \right)^{1/2}$$
$$r_{e\hat{y}} = 0$$
Therefore, the $e_i$'s and the $y_i$'s are correlated. If we plot the $e_i$'s versus the $y_i$'s, a linear trend might be due to this positive correlation rather than to a violation of the assumptions about the random error. On the other hand, since the $e_i$'s and the $\hat{y}_i$'s are uncorrelated, an unsatisfactory residual plot does point to a violation of the assumptions.
Derivation of $r_{e\hat{y}} = 0$:

Let
$$\bar{e} = \frac{\sum_{i=1}^{n} e_i}{n} \quad \text{and} \quad \bar{\hat{y}} = \frac{\sum_{i=1}^{n} \hat{y}_i}{n}.$$

Since
$$X^t e = X^t (y - \hat{y}) = X^t (y - Xb) = X^t y - X^t X b = X^t y - X^t X (X^t X)^{-1} X^t y = X^t y - X^t y = 0,$$

we have
$$\sum_{i=1}^{n} e_i \hat{y}_i = \hat{y}^t e = (Xb)^t e = b^t X^t e = 0 \quad \text{and} \quad \sum_{i=1}^{n} e_i = 0$$

(the latter is the first row of $X^t e = 0$, since the first column of $X$ consists of ones). Then
$$s_{e\hat{y}} = \sum_{i=1}^{n} (e_i - \bar{e})(\hat{y}_i - \bar{\hat{y}}) = \sum_{i=1}^{n} e_i \hat{y}_i - \bar{\hat{y}} \sum_{i=1}^{n} e_i = 0.$$

Thus,
$$r_{e\hat{y}} = \frac{s_{e\hat{y}}}{\left( s_{ee} \, s_{\hat{y}\hat{y}} \right)^{1/2}} = \frac{\sum_{i=1}^{n} (e_i - \bar{e})(\hat{y}_i - \bar{\hat{y}})}{\left[ \sum_{i=1}^{n} (e_i - \bar{e})^2 \sum_{i=1}^{n} (\hat{y}_i - \bar{\hat{y}})^2 \right]^{1/2}} = 0.$$

Derivation of $r_{ey} = (1 - R^2)^{1/2}$:

Since
$$S_{ey} = \sum_{i=1}^{n} (e_i - \bar{e})(y_i - \bar{y}) = \sum_{i=1}^{n} e_i y_i \quad \left( \text{since } \sum_{i=1}^{n} e_i = 0 \right)$$
$$= \sum_{i=1}^{n} e_i (y_i - \hat{y}_i) \quad \left( \text{since } \sum_{i=1}^{n} e_i \hat{y}_i = 0 \right) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2,$$

thus,
$$r_{ey} = \frac{S_{ey}}{\left( S_{ee} \, S_{yy} \right)^{1/2}} = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\left[ \sum_{i=1}^{n} (e_i - \bar{e})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2 \right]^{1/2}} = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\left[ \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \sum_{i=1}^{n} (y_i - \bar{y})^2 \right]^{1/2}}$$
$$= \left[ \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \right]^{1/2} = \left( 1 - R^2 \right)^{1/2},$$

since $\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$ and $R^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 \big/ \sum_{i=1}^{n} (y_i - \bar{y})^2$.
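Both identities are easy to verify numerically; a minimal sketch (Python; simulated data, numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)   # least squares estimate
y_hat = X @ b
e = y - y_hat

R2 = 1 - np.sum(e**2) / np.sum((y - y.mean())**2)
print(np.corrcoef(e, y_hat)[0, 1])                # ~0: r_{e,y_hat} = 0
print(np.corrcoef(e, y)[0, 1], np.sqrt(1 - R2))   # equal: r_{e,y} = (1 - R^2)^{1/2}
```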
• The residual plot (i): nonconstant variance.
Remedy: use weighted least squares or a transformation of $y_i$.
Note: a transformation of the response $y_i$ can stabilize the variance (make the variance of the transformed $y_i$ constant).
• The residual plot (ii): errors in the analysis, omission of extra terms highly correlated with the other terms, or wrongful omission of $\beta_0$ (see the sketch after this list).
[Justification:]
Suppose the postulated model is
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_{p-1} x_{i,p-1} + \varepsilon_i$$
and the true model is
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_{p-1} x_{i,p-1} + \beta_p x_{ip} + \varepsilon_i, \quad i = 1,2,\ldots,n.$$
Then
$$\hat{y}_i = b_0 + b_1 x_{i1} + \cdots + b_{p-1} x_{i,p-1}$$
and, heuristically,
$$e_i \approx \beta_p x_{ip} + \varepsilon_i.$$
If $\beta_p \neq 0$ and $x_{ip}$ is correlated with $\hat{y}_i$ (since $x_{ip}$ is correlated with some covariates in the postulated model), the residual plot will look like (ii).
Remedy: recheck the analysis process, add the extra terms to the model, or add $\beta_0$ back to the model.
• The residual plot (iii): some extra terms (possibly second-order terms of the existing variables, or variables not in the model) are missing, or the variance is not constant.
Remedy: add extra terms to the model or transform $y_i$.
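The sketch referenced in plot (ii) above: a minimal simulation (Python; toy data, all names illustrative) of the wrongful omission of $\beta_0$. Fitting a line through the origin when the true model has an intercept leaves a clear linear trend in the residuals.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = rng.uniform(1, 10, size=100)
y = 5.0 + 2.0 * x + rng.normal(size=100)   # true model has an intercept

b1 = (x @ y) / (x @ x)     # least squares slope with beta_0 wrongly omitted
y_hat = b1 * x
e = y - y_hat              # residuals absorb the omitted intercept

plt.scatter(y_hat, e)      # a linear trend, as in unsatisfactory plot (ii)
plt.xlabel("fitted value")
plt.ylabel("residuals")
plt.show()
```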
(b) versus the covariate $x_{ij}$:
• The residual plot (i): nonconstant variance.
Remedy: use weighted least squares or a transformation of $y_i$.
• The residual plot (ii): errors in the analysis, or some extra terms highly correlated with $X_j$ are missing.
Suppose the true model is
$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1} x_{i,p-1} + \beta_p z_i + \varepsilon_i,$$
and the covariate $X_j$ is correlated with $Z$. Thus, heuristically,
$$e_i \approx \beta_p z_i + \varepsilon_i.$$
If $\beta_p \neq 0$ and $X_j$ is positively correlated with $Z$, the residual plot will look like (ii).
Remedy: recheck the analysis process or add the extra terms to the model.
• The residual plot (iii): some extra terms are missing or the variances of the $y_i$ are not equal.
Remedy: add extra terms (second-order terms of the existing variables or terms not in the original model) to the model, or transform $y_i$.
3.9 Transformation
Box-Cox Transformation:
Suppose the tentatively used model is $y = X\beta + \varepsilon$. Two transformations can be used.
The first transformation is
$$W_i(\lambda) = \begin{cases} \dfrac{y_i^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\ \log(y_i), & \lambda = 0 \end{cases} \qquad i = 1,2,\ldots,n.$$
The above transformation is also called the Box-Cox transformation.
Note: $\displaystyle \lim_{\lambda \to 0} W_i(\lambda) = \lim_{\lambda \to 0} \frac{y_i^{\lambda} - 1}{\lambda} = \log(y_i)$.
The other transformation is to scale $W_i(\lambda)$:
$$V_i(\lambda) = \begin{cases} \dfrac{y_i^{\lambda} - 1}{\lambda \, \dot{y}^{\lambda - 1}}, & \lambda \neq 0 \\ \dot{y} \log(y_i), & \lambda = 0 \end{cases}$$
where
$$\dot{y} = \left( y_1 y_2 \cdots y_n \right)^{1/n} = \left( \prod_{i=1}^{n} y_i \right)^{1/n}$$
is the geometric mean of $y_1, y_2, \ldots, y_n$.
The model based on the transformed response is
$$W(\lambda) = \begin{bmatrix} W_1(\lambda) \\ W_2(\lambda) \\ \vdots \\ W_n(\lambda) \end{bmatrix} = X\beta + \varepsilon
\quad \text{or} \quad
V(\lambda) = \begin{bmatrix} V_1(\lambda) \\ V_2(\lambda) \\ \vdots \\ V_n(\lambda) \end{bmatrix} = X\beta + \varepsilon.$$
Note: the $V$ transformation is usually preferred!!
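A minimal sketch of the two transformations (Python; numpy assumed; the function names are illustrative, and the $y_i$ must be positive):

```python
import numpy as np

def box_cox_w(y, lam):
    """W_i(lambda): the Box-Cox transformation."""
    if lam == 0:
        return np.log(y)
    return (y**lam - 1) / lam

def box_cox_v(y, lam):
    """V_i(lambda): W_i(lambda) scaled by the geometric mean of y."""
    y_dot = np.exp(np.log(y).mean())   # geometric mean (y_1 * ... * y_n)^(1/n)
    if lam == 0:
        return y_dot * np.log(y)
    return (y**lam - 1) / (lam * y_dot**(lam - 1))

y = np.array([1.2, 0.7, 3.4, 2.2, 5.9])
print(box_cox_w(y, 0.5))
print(box_cox_v(y, 0.5))
```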
11.1 Weighted Least Squares
Suppose the linear regression model is
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_{p-1} x_{i,p-1} + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma_i^2), \quad i = 1,\ldots,n.$$
Let
$$w_i = \frac{1}{\sigma_i^2}$$
and
$$W = \begin{bmatrix} w_1 & 0 & \cdots & 0 \\ 0 & w_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & w_n \end{bmatrix}.$$
To estimate $\beta$, we need to minimize the weighted least squares criterion
$$S_w(\beta) = S_w(\beta_0, \beta_1, \ldots, \beta_{p-1}) = \sum_{i=1}^{n} w_i \left( y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_{p-1} x_{i,p-1} \right)^2 = (y - X\beta)^t W (y - X\beta).$$
The weighted least squares estimate is
$$b_w = \left( X^t W X \right)^{-1} X^t W y.$$
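A minimal sketch of this computation (Python; simulated heteroscedastic data, with the $\sigma_i$ treated as known for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=n)])
sigma = 0.5 + 0.3 * X[:, 1]                  # error s.d. grows with the covariate
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=sigma)

W = np.diag(1.0 / sigma**2)                  # weights w_i = 1 / sigma_i^2
b_w = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)   # b_w = (X^t W X)^{-1} X^t W y
print(b_w)                                   # close to the true (1.0, 2.0)
```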