Chapter 7 Models for Survival Data

Chapter 8 Model Checking
8.1 Introduction
The process of statistical analysis might take the form

    Select model class → Data → Some models → Summarize conclusions → Stop
In the above process, however, even after a careful selection of the model
class, the data themselves may indicate that the particular model is
unsuitable. It therefore seems reasonable to add model checking to the
original process. The new process of statistical analysis is

    Select model class → Data → Some models → Model checking → Summarize conclusions → Stop

The inadequacy indicated by model checking could take two forms.
It may be that the data as a whole show some systematic departure
from the fitted values, or it may be that a few data values are
discrepant from the rest. The detection of both systematic and
isolated discrepancies is part of the technique of model checking.
8.2 Basic Quantities in Model Checking
In the linear model, model checking mainly uses the following statistics
from the fit:
The fitted values: $\hat{Y}_{n\times 1} = X_{n\times p}\,\hat{\beta}_{p\times 1}$.
The mean residual sum of squares: $s^2 = \dfrac{(Y - X\hat{\beta})^t (Y - X\hat{\beta})}{n - p}$.
The residuals: $e = Y - \hat{Y}$.
The hat matrix: $P = X (X^t X)^{-1} X^t$.
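These four quantities can be computed directly. The following is a minimal NumPy sketch, assuming a full-rank design matrix $X$. The function name `linear_model_quantities` and the small data set are made up for illustration.

```python
import numpy as np

def linear_model_quantities(X, y):
    """Least-squares fit plus the basic model-checking statistics of Section 8.2."""
    n, p = X.shape
    # lstsq solves the normal equations without forming (X^t X)^{-1} explicitly
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ beta_hat                      # fitted values  X beta_hat
    e = y - y_hat                             # residuals      Y - Y_hat
    s2 = (e @ e) / (n - p)                    # mean residual sum of squares
    P = X @ np.linalg.solve(X.T @ X, X.T)     # hat matrix     X (X^t X)^{-1} X^t
    return beta_hat, y_hat, e, s2, P

# Illustrative (made-up) data: intercept plus one covariate
X = np.column_stack([np.ones(5), np.array([1.0, 2.0, 3.0, 4.0, 5.0])])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])
beta_hat, y_hat, e, s2, P = linear_model_quantities(X, y)
```

As a sanity check, $P Y$ reproduces the fitted values, the trace of $P$ equals $p$, and the residuals are orthogonal to the columns of $X$.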
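In the generalized linear model, the statistics used in model checking are:
The fitted values:
$$\hat{\mu}_{n\times 1} = \begin{pmatrix} \hat\mu_1 \\ \vdots \\ \hat\mu_n \end{pmatrix} = \begin{pmatrix} g^{-1}(\hat\eta_1) \\ \vdots \\ g^{-1}(\hat\eta_n) \end{pmatrix} = \begin{pmatrix} g^{-1}(x_1^t \hat\beta) \\ \vdots \\ g^{-1}(x_n^t \hat\beta) \end{pmatrix}.$$
The variance estimates: $V_i = V(\hat\mu_i) = b''(\hat\theta_i)$, where $\hat\mu_i = b'(\hat\theta_i)$, $i = 1, \dots, n$.
The hat matrix: $H = W^{1/2} X (X^t W X)^{-1} X^t W^{1/2}$, where
$$W = \begin{pmatrix} w_1 & 0 & \cdots & 0 \\ 0 & w_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & w_n \end{pmatrix}, \qquad w_i = \frac{1}{V_i}\left(\frac{\partial \mu_i}{\partial \eta_i}\right)^2\bigg|_{\mu_i = \hat\mu_i}.$$
The residuals: the standardized Pearson residuals
$$r'_{P,i} = \frac{y_i - \hat\mu_i}{\sqrt{\hat\phi\, V_i\, (1 - h_{ii})}} = \frac{r_{P,i}}{\sqrt{\hat\phi\, (1 - h_{ii})}},$$
and the standardized deviance residuals
$$r'_{D,i} = \frac{r_{D,i}}{\sqrt{\hat\phi\, (1 - h_{ii})}},$$
where
$$r_{D,i} = \mathrm{sign}(y_i - \hat\mu_i)\sqrt{2\left[\, y_i(\tilde\theta_i - \hat\theta_i) - b(\tilde\theta_i) + b(\hat\theta_i) \right]},$$
$\tilde\theta_i = (b')^{-1}(y_i)$, $h_{ii}$ is the i'th diagonal element of $H$, and $\hat\phi$ is the estimated dispersion parameter.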
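The same quantities can be worked through for one concrete case. This minimal NumPy sketch assumes a Poisson response with the canonical log link, so that $b(\theta) = e^\theta$, $V(\mu) = \mu$, $\partial\mu/\partial\eta = \mu$, $w_i = \hat\mu_i$ and $\phi = 1$. The IRLS fitter, the function names and the data are illustrative assumptions, not part of the notes.

```python
import numpy as np

def fit_poisson_irls(X, y, n_iter=50, tol=1e-10):
    """Fit a Poisson GLM with log link by iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu              # adjusted dependent variable (log link)
        XtW = X.T * mu                       # X^t W with w_i = mu_i
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

def glm_checking_quantities(X, y, beta, phi=1.0):
    mu = np.exp(X @ beta)                    # fitted values g^{-1}(x_i^t beta)
    V = mu                                   # Poisson variance function V(mu) = mu
    w = mu                                   # w_i = (d mu / d eta)^2 / V_i = mu_i
    XW = X * np.sqrt(w)[:, None]             # W^{1/2} X
    H = XW @ np.linalg.solve(XW.T @ XW, XW.T)  # W^{1/2} X (X^t W X)^{-1} X^t W^{1/2}
    h = np.diag(H)
    r_P = (y - mu) / np.sqrt(V)
    # Poisson deviance term 2[y log(y/mu) - (y - mu)], with y log(y/mu) = 0 when y = 0
    ylogy = np.where(y > 0, y * np.log(np.where(y > 0, y, 1.0) / mu), 0.0)
    r_D = np.sign(y - mu) * np.sqrt(np.maximum(2.0 * (ylogy - (y - mu)), 0.0))
    r_P_std = r_P / np.sqrt(phi * (1.0 - h))
    r_D_std = r_D / np.sqrt(phi * (1.0 - h))
    return mu, H, r_P_std, r_D_std

# Illustrative (made-up) counts
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 2.0, 2.0, 4.0, 7.0, 9.0])
X = np.column_stack([np.ones_like(x), x])
beta_hat = fit_poisson_irls(X, y)
mu_hat, H, r_P_std, r_D_std = glm_checking_quantities(X, y, beta_hat)
```

At convergence, the canonical-link score equations $X^t(y - \hat\mu) = 0$ hold. As in the linear case, the trace of $H$ equals $p$.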
8.3 Checks for Systematic Departure from Model
(a) Informal check using residuals
Standardized deviance residuals $r'_{D,i}$ are recommended, plotted
either against $\hat\eta_i$ or against the fitted value $\hat\mu_i$ transformed to the
constant-information scale of the error distribution. Thus, we use
$\hat\mu$ : Normal errors
$2\sqrt{\hat\mu}$ : Poisson errors
$2\sin^{-1}\sqrt{\hat\mu}$ : Binomial errors
$2\log\hat\mu$ : Gamma errors
$-2\hat\mu^{-1/2}$ : Inverse Gaussian errors
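The table above can be written as a small lookup. This is a sketch only. The function name and the family labels are hypothetical, and for binomial errors $\hat\mu$ is assumed to be the fitted proportion in $[0, 1]$.

```python
import numpy as np

def constant_information_scale(mu_hat, family):
    """Transform fitted values to the constant-information scale of the error distribution."""
    mu_hat = np.asarray(mu_hat, dtype=float)
    if family == "normal":
        return mu_hat
    if family == "poisson":
        return 2.0 * np.sqrt(mu_hat)
    if family == "binomial":
        # mu_hat is assumed to be the fitted proportion, in [0, 1]
        return 2.0 * np.arcsin(np.sqrt(mu_hat))
    if family == "gamma":
        return 2.0 * np.log(mu_hat)
    if family == "inverse_gaussian":
        return -2.0 * mu_hat ** -0.5
    raise ValueError(f"unknown family: {family}")
```

The standardized deviance residuals would then be plotted against `constant_information_scale(mu_hat, family)`.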
The null pattern of this plot is a distribution of residuals for varying
$\hat\mu$ with mean 0 and constant range. The plot is given in (i). Typical
deviations are
- the appearance of curvature in the mean;
- a systematic change of range with fitted value.
The plots corresponding to the above deviations are given in (ii) and
(iii).
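[Figure: residuals (vertical axis) plotted against fitted values. Panel (i): null pattern; panel (ii): curvature in the mean; panel (iii): systematic change of range.]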
Curvature may arise from the following causes:
- wrong choice of link function;
- wrong choice of scale of one or more covariates;
- omission of a quadratic term in a covariate.
Note:
The standardized residuals can be plotted against an explanatory
variable.
(b) Checking the variance function
A plot of the absolute residuals against fitted values gives an informal
check on the adequacy of the assumed variance function. The null
pattern shows no trend, but an ill-chosen variance function will
result in a trend in the mean. A positive trend indicates that the
current variance function is increasing too slowly with the mean. For
example, an original choice of $V(\mu) = \mu$ may need to be replaced
by $V(\mu) = \mu^2$.
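The notes describe a visual check. One simple way to summarize the trend numerically is the least-squares slope of $|r|$ on the fitted values. That summary is this sketch's own stand-in, not a method from the notes, and the function name and data are hypothetical. A slope near 0 matches the null pattern. A clearly positive slope suggests, as above, that the variance function grows too slowly with the mean.

```python
import numpy as np

def abs_residual_trend(fitted, residuals):
    """Least-squares slope of |residual| on fitted value (near 0 under the null pattern)."""
    fitted = np.asarray(fitted, dtype=float)
    abs_r = np.abs(np.asarray(residuals, dtype=float))
    slope, _intercept = np.polyfit(fitted, abs_r, 1)  # degree-1 fit: [slope, intercept]
    return slope

# Made-up residuals whose spread grows with the fitted value
fitted = np.array([1.0, 2.0, 3.0, 4.0])
residuals = np.array([0.1, -0.2, 0.3, -0.4])
slope = abs_residual_trend(fitted, residuals)
```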
(c) Checking the link function
A plot of the adjusted dependent variable
$$z_i = \hat\eta_i + (y_i - \hat\mu_i)\left(\frac{\partial \eta_i}{\partial \mu_i}\right)_{\hat\mu_i}$$
against $\hat\eta_i$ gives an informal check on the link function. The null
pattern is a straight line. For link functions of the power family, an
upward curvature in the plot points to a link with a higher power
than the one used, and a downward curvature to a lower power.
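The adjusted dependent variable can be computed as follows for the power family of links. This sketch assumes $\eta = \mu^\lambda$ for $\lambda \neq 0$ and $\eta = \log\mu$ for $\lambda = 0$, so that $\partial\eta/\partial\mu = \lambda\mu^{\lambda-1}$ or $1/\mu$ respectively. The function name and data are illustrative.

```python
import numpy as np

def adjusted_dependent_variable(y, mu_hat, lam):
    """Return (eta_hat, z) with z_i = eta_hat_i + (y_i - mu_hat_i) * (d eta / d mu) at mu_hat.

    Assumed power link: eta = mu**lam for lam != 0, eta = log(mu) for lam == 0.
    """
    y = np.asarray(y, dtype=float)
    mu_hat = np.asarray(mu_hat, dtype=float)
    if lam == 0:
        eta_hat = np.log(mu_hat)
        deta_dmu = 1.0 / mu_hat
    else:
        eta_hat = mu_hat ** lam
        deta_dmu = lam * mu_hat ** (lam - 1.0)
    return eta_hat, eta_hat + (y - mu_hat) * deta_dmu

# Log link (lam = 0) with made-up responses and fitted values
eta_hat, z = adjusted_dependent_variable(y=[2.0, 5.0], mu_hat=[1.0, 4.0], lam=0)
```

Plotting `z` against `eta_hat` gives the check described above. With the identity link ($\lambda = 1$), $z_i$ reduces to $y_i$.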
(d) Checking the scale of covariates
The plot of the partial residuals $e_i = z_i - \hat\eta_i + \hat\beta_j x_{ij}$ against $x_{ij}$, for a given
$j \in \{1, \dots, p\}$, provides an informal check on the scale of the covariates. The null
pattern is a straight line.
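The partial residuals can be illustrated in the simplest setting, assumed here to be normal errors with the identity link, so that $z_i = y_i$. This is a sketch with a hypothetical function name and made-up data. With an intercept in the model, a straight-line fit of $e_i$ against $x_{ij}$ recovers $\hat\beta_j$, which is the straight-line null pattern.

```python
import numpy as np

def partial_residuals(z, eta_hat, beta_hat, X, j):
    """Partial residuals e_i = z_i - eta_hat_i + beta_hat_j * x_ij for covariate column j."""
    return np.asarray(z, dtype=float) - np.asarray(eta_hat, dtype=float) + beta_hat[j] * X[:, j]

# Normal errors, identity link (assumed): z_i = y_i
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([0.5, 0.1, 0.9, 0.3, 0.7, 0.2])
X = np.column_stack([np.ones(6), x1, x2])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
eta_hat = X @ beta_hat
e = partial_residuals(y, eta_hat, beta_hat, X, j=1)
slope = np.polyfit(x1, e, 1)[0]   # straight-line fit of e against x1
```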
8.4 Checks for Isolated Departures from the Model
(a) Measure of leverage
In standard regression, $P = X (X^t X)^{-1} X^t = (p_{ij})_{n\times n}$, and $p_{ii}$ can be
used as a measure of leverage (i.e., of the effect of a change in the
covariate values on the fitted value). Since $\sum_i p_{ii} = \mathrm{tr}(P) = p$, the
average leverage is $p/n$; if $p_{ii} > 2p/n$, the i'th observation can be
regarded as a high-leverage point. Similarly, in the generalized linear
model, $h_{ii}$, the i'th diagonal element of $H$, can be used as a measure of leverage.
Note:
Usually, in the standard linear model, a point at the extreme of the
x-range will have high leverage. However, in a generalized linear
model, a point at the extreme of the x-range will not necessarily have
high leverage if its weight $w_i$ is very small.
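Both the $2p/n$ rule and the point in the note can be checked numerically. This minimal sketch computes the diagonal of $W^{1/2} X (X^t W X)^{-1} X^t W^{1/2}$, which reduces to $p_{ii}$ when all weights are 1. The function name, the weights and the data are illustrative assumptions.

```python
import numpy as np

def high_leverage(X, w=None):
    """Return the hat-matrix diagonal and a flag for h_ii > 2p/n.

    w=None gives the linear-model P; GLM weights w give H = W^{1/2} X (X^t W X)^{-1} X^t W^{1/2}.
    """
    n, p = X.shape
    w = np.ones(n) if w is None else np.asarray(w, dtype=float)
    XW = X * np.sqrt(w)[:, None]                        # W^{1/2} X
    M = np.linalg.solve(XW.T @ XW, XW.T)                # (X^t W X)^{-1} X^t W^{1/2}
    h = np.einsum("ij,ji->i", XW, M)                    # diagonal of XW @ M
    return h, h > 2.0 * p / n

# Made-up design with one point far out in the x-range
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0])
X = np.column_stack([np.ones_like(x), x])
h, flags = high_leverage(X)
# Same extreme point, but with a very small (hypothetical) GLM weight
h_w, flags_w = high_leverage(X, w=np.r_[np.ones(9), 1e-6])
```

In the unweighted fit, only the point at $x = 30$ exceeds $2p/n = 0.4$. Once its weight is tiny, its leverage nearly vanishes, as the note describes.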
(b) Measure of influence
In the standard linear model, Cook's distance
$$C_i = \frac{(\hat\beta - \hat\beta_{(i)})^t X^t X (\hat\beta - \hat\beta_{(i)})}{p s^2} = \frac{(\hat\beta - \hat\beta_{(i)})^t \left[\widehat{\mathrm{Var}}(\hat\beta)\right]^{-1} (\hat\beta - \hat\beta_{(i)})}{p} = \frac{(\hat Y - \hat Y_{(i)})^t (\hat Y - \hat Y_{(i)})}{p s^2}, \quad i = 1, \dots, n,$$
can be used to assess the influence of the i'th observation, where
$\hat\beta_{(i)}$ is the parameter estimate without the contribution from the i'th
observation, $\hat Y_{(i)} = X\hat\beta_{(i)}$, and $\widehat{\mathrm{Var}}(\hat\beta) = s^2 (X^t X)^{-1}$. Similarly, in the generalized linear model,
the modified Cook's distance
$$D_i = \frac{(\hat\beta - \hat\beta_{(i)})^t X^t W X (\hat\beta - \hat\beta_{(i)})}{p s^2}$$
can be used to assess the influence of the i'th observation.
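Both distances can be computed by brute force, refitting the model once per deleted observation. This sketch makes several assumptions. It uses a Poisson log-link GLM for $D_i$, with $s^2 = 1$ (no overdispersion). $W$ is evaluated at the full-data fit. The function names and data are hypothetical.

```python
import numpy as np

def cooks_distance_linear(X, y):
    """C_i = (beta_hat - beta_hat_(i))^t X^t X (beta_hat - beta_hat_(i)) / (p s^2), by refitting."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    s2 = (e @ e) / (n - p)
    XtX = X.T @ X
    C = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                        # drop observation i
        beta_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        d = beta - beta_i
        C[i] = d @ XtX @ d / (p * s2)
    return C

def fit_poisson_irls(X, y, n_iter=100, tol=1e-10):
    """Poisson GLM with log link fitted by IRLS (w_i = mu_i)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu                         # adjusted dependent variable
        XtW = X.T * mu
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

def modified_cooks_distance_poisson(X, y, s2=1.0):
    """D_i = (beta_hat - beta_hat_(i))^t X^t W X (beta_hat - beta_hat_(i)) / (p s^2)."""
    n, p = X.shape
    beta = fit_poisson_irls(X, y)
    w = np.exp(X @ beta)                                # W at the full-data fit
    XtWX = (X.T * w) @ X
    D = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        d = beta - fit_poisson_irls(X[keep], y[keep])
        D[i] = d @ XtWX @ d / (p * s2)
    return D

# Made-up data: the last observation is both extreme in x and far from the trend
X = np.column_stack([np.ones(8), np.arange(8.0)])
y_lin = np.array([0.9, 2.1, 2.9, 4.2, 5.1, 5.8, 7.2, 12.0])
C = cooks_distance_linear(X, y_lin)
y_cnt = np.array([1.0, 1.0, 2.0, 3.0, 4.0, 6.0, 9.0, 25.0])
D = modified_cooks_distance_poisson(X, y_cnt)
```

In the linear case, refitting agrees with the standard identity $C_i = e_i^2 p_{ii} / \{p s^2 (1 - p_{ii})^2\}$, so in practice the $n$ refits are unnecessary. For the GLM, exact refitting is the most direct reading of the formula, but it is costly for large $n$.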