# 2.4 Goodness of fit

(I) Goodness of fit:
Let the log-likelihood function be

$$
\begin{aligned}
l(\pi; y) &= \sum_{i=1}^{n} \left[ \log\binom{m_i}{y_i} + y_i \log \pi_i + (m_i - y_i) \log(1 - \pi_i) \right] \\
&= \sum_{i=1}^{n} \left[ \log\binom{m_i}{y_i} + y_i \log\frac{\mu_i}{m_i} + (m_i - y_i) \log\frac{m_i - \mu_i}{m_i} \right],
\end{aligned}
$$

where $\mu_i = m_i \pi_i$.
Then, the deviance function is

$$
D(y_1, y_2, \dots, y_n, \tilde\mu_1, \tilde\mu_2, \dots, \tilde\mu_n) = 2 \sum_{i=1}^{n} \left[ y_i \log\frac{y_i}{\tilde\mu_i} + (m_i - y_i) \log\frac{m_i - y_i}{m_i - \tilde\mu_i} \right],
$$

where $\tilde\mu_i = m_i \pi_i(\hat\beta)$.
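The deviance formula can be checked numerically. The following is a minimal Python sketch, not part of the original notes: the data `y`, `m` and the common-probability fit `mu_fit` are hypothetical, and the convention $0 \log 0 = 0$ is assumed for observations with $y_i = 0$ or $y_i = m_i$. It also compares the deviance with twice the log-likelihood gap between the saturated fit $\mu_i = y_i$ and the fitted model, which is the standard way the deviance is obtained from the log-likelihood.

```python
import math

def _term(a, b):
    """Return a * log(a / b), with the convention 0 * log(0) = 0."""
    return 0.0 if a == 0 else a * math.log(a / b)

def binomial_deviance(y, m, mu):
    """Deviance 2 * sum[y log(y/mu) + (m - y) log((m - y)/(m - mu))]."""
    return 2.0 * sum(
        _term(yi, mui) + _term(mi - yi, mi - mui)
        for yi, mi, mui in zip(y, m, mu)
    )

def binomial_loglik(y, m, mu):
    """Binomial log-likelihood written in terms of mu_i = m_i * pi_i."""
    total = 0.0
    for yi, mi, mui in zip(y, m, mu):
        total += math.log(math.comb(mi, yi))
        if yi > 0:
            total += yi * math.log(mui / mi)
        if mi - yi > 0:
            total += (mi - yi) * math.log((mi - mui) / mi)
    return total

# Hypothetical data: y_i successes out of m_i trials.
y = [3, 5, 8]
m = [10, 10, 10]

# Illustrative fitted values from a one-parameter (common pi) model.
pooled = sum(y) / sum(m)
mu_fit = [mi * pooled for mi in m]

deviance = binomial_deviance(y, m, mu_fit)
# Twice the log-likelihood gap to the saturated fit mu_i = y_i.
gap = 2.0 * (binomial_loglik(y, m, y) - binomial_loglik(y, m, mu_fit))
print(deviance, gap)
```

The two printed numbers should agree up to rounding, and the deviance of the data against itself is zero.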
Result:
Let the null model involve $p$ parameters, $g(\pi_i) = x_i \beta$, $i = 1, 2, \dots, n$, and $\tilde\mu_i / m_i = g^{-1}(x_i \hat\beta)$. Then,

$$
D(y_1, y_2, \dots, y_n, \tilde\mu_1, \tilde\mu_2, \dots, \tilde\mu_n) \to \chi^2_{n-p}
$$

under the following assumptions:
(a) The observations are distributed independently, according to
the binomial distribution. The case of over-dispersion is not
considered.
(b) The sample size $n$ is fixed, $m_i \to \infty$, $i = 1, \dots, n$, and in fact $m_i \pi_i (1 - \pi_i) \to \infty$.
Note:
In the limit given by assumption (b), $D(y_1, y_2, \dots, y_n, \tilde\mu_1, \tilde\mu_2, \dots, \tilde\mu_n)$ is approximately independent of the estimated parameter $\hat\beta$ and hence approximately independent of the fitted probability $\pi_i(\hat\beta)$.
Note:
If $n$ is large and $m_i \pi_i (1 - \pi_i)$ is bounded, then
(a) The limiting $\chi^2$ approximation no longer holds.
(b) $D(y_1, y_2, \dots, y_n, \tilde\mu_1, \tilde\mu_2, \dots, \tilde\mu_n)$ is not independent of $\pi_i(\hat\beta)$, even approximately.
(b) implies that a large value of $D(y_1, y_2, \dots, y_n, \tilde\mu_1, \tilde\mu_2, \dots, \tilde\mu_n)$ cannot necessarily be considered evidence of a poor fit. The large value of $D(y_1, y_2, \dots, y_n, \tilde\mu_1, \tilde\mu_2, \dots, \tilde\mu_n)$ might be due to the value of $\pi_i$.
(II) Comparing two nested models:
The deviance function is most directly useful for comparing two
nested models. Let the null hypothesis $H_0$ and the alternative
hypothesis $H_a$ be
$H_0$: the null model
$H_a$: the extended model containing an additional covariate
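Denote
$\hat\mu_0 = (\hat\mu_{01}, \hat\mu_{02}, \dots, \hat\mu_{0n})$: the fitted values under $H_0$
$\hat\mu_a = (\hat\mu_{a1}, \hat\mu_{a2}, \dots, \hat\mu_{an})$: the fitted values under $H_a$
Then,

$$
\begin{aligned}
& D(y_1, y_2, \dots, y_n, \hat\mu_{01}, \hat\mu_{02}, \dots, \hat\mu_{0n}) - D(y_1, y_2, \dots, y_n, \hat\mu_{a1}, \hat\mu_{a2}, \dots, \hat\mu_{an}) \\
&\quad = 2\left[ l(\hat\mu_{a1}, \hat\mu_{a2}, \dots, \hat\mu_{an}) - l(\hat\mu_{01}, \hat\mu_{02}, \dots, \hat\mu_{0n}) \right] \\
&\quad = \text{likelihood-ratio statistic for testing } H_0 \text{ vs. } H_a
\end{aligned}
$$

tends to $\chi^2_1$ as
- assumption (a) holds and $n$ is large
or
- assumptions (a) and (b) hold.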
Note:
$D(y_1, y_2, \dots, y_n, \hat\mu_{01}, \hat\mu_{02}, \dots, \hat\mu_{0n})$ need not have an approximate $\chi^2$ distribution, nor need it be distributed independently of $\hat\mu_0$.
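As an illustration of the nested-model comparison, here is a minimal Python sketch under stated assumptions. The grouped data and the 0/1 covariate `x` are hypothetical. Because the extended model adds a single binary indicator to an intercept, its maximum-likelihood fitted probabilities are simply the group proportions, so no iterative fitting is needed. The helper `chi2_1_sf` is an illustrative name; it uses the identity $P(\chi^2_1 > t) = \operatorname{erfc}(\sqrt{t/2})$.

```python
import math

def _term(a, b):
    """Return a * log(a / b), with the convention 0 * log(0) = 0."""
    return 0.0 if a == 0 else a * math.log(a / b)

def binomial_deviance(y, m, mu):
    """Deviance 2 * sum[y log(y/mu) + (m - y) log((m - y)/(m - mu))]."""
    return 2.0 * sum(
        _term(yi, mui) + _term(mi - yi, mi - mui)
        for yi, mi, mui in zip(y, m, mu)
    )

def chi2_1_sf(t):
    """Upper tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(t / 2.0))

# Hypothetical grouped data with a binary covariate x.
y = [2, 4, 3, 7, 8, 6]
m = [10, 10, 10, 10, 10, 10]
x = [0, 0, 0, 1, 1, 1]

# H0: intercept only, so every pi_i equals the pooled proportion.
p_null = sum(y) / sum(m)
mu_null = [mi * p_null for mi in m]

# Ha: intercept plus the indicator x, so pi_i is the proportion in its group.
def group_proportion(g):
    successes = sum(yi for yi, xi in zip(y, x) if xi == g)
    trials = sum(mi for mi, xi in zip(m, x) if xi == g)
    return successes / trials

mu_alt = [mi * group_proportion(xi) for mi, xi in zip(m, x)]

# Deviance difference = likelihood-ratio statistic, referred to chi-squared(1).
lr_stat = binomial_deviance(y, m, mu_null) - binomial_deviance(y, m, mu_alt)
p_value = chi2_1_sf(lr_stat)
print(lr_stat, p_value)
```

Only the difference of the two deviances is referred to the $\chi^2_1$ distribution here; neither deviance is interpreted on its own.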
(III) Sparseness:
The data are sparse if many components of the binomial index
vector are small, that is, many of the $m_i$ are small. The effect of sparseness is
mainly on the deviance function and Pearson’s statistic.
(a) Effect on the deviance function:
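Assume $Y_i \sim b(1, \pi_i)$ and a linear logistic model is fitted. That is, $m_i = 1$ for all $i$. Then,

$$
\tilde\pi_i = \frac{\exp(x_i \hat\beta)}{1 + \exp(x_i \hat\beta)}, \qquad
\log\frac{\tilde\pi_i}{1 - \tilde\pi_i} = x_i \hat\beta, \qquad
1 - \tilde\pi_i = \frac{1}{1 + \exp(x_i \hat\beta)},
$$

and the score function

$$
U(\hat\beta) = X^t (y - \tilde\pi) = 0 \iff X^t y = X^t \tilde\pi,
$$

where

$$
y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \qquad
\tilde\pi = \begin{pmatrix} \tilde\pi_1 \\ \tilde\pi_2 \\ \vdots \\ \tilde\pi_n \end{pmatrix}.
$$
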
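The deviance function is (with $m_i = 1$, so that $\tilde\mu_i = \tilde\pi_i$)

$$
\begin{aligned}
D(y_1, y_2, \dots, y_n, \tilde\mu_1, \tilde\mu_2, \dots, \tilde\mu_n)
&= 2 \sum_{i=1}^{n} \left[ y_i \log\frac{y_i}{\tilde\pi_i} + (1 - y_i) \log\frac{1 - y_i}{1 - \tilde\pi_i} \right] \\
&= 2 \sum_{i=1}^{n} \left[ y_i \log y_i + (1 - y_i) \log(1 - y_i) - y_i \log\frac{\tilde\pi_i}{1 - \tilde\pi_i} - \log(1 - \tilde\pi_i) \right].
\end{aligned}
$$
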
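Since $y_i \log y_i = (1 - y_i) \log(1 - y_i) = 0$ for $y_i \in \{0, 1\}$, thus

$$
\begin{aligned}
D(y_1, y_2, \dots, y_n, \tilde\mu_1, \tilde\mu_2, \dots, \tilde\mu_n)
&= -2 \sum_{i=1}^{n} y_i x_i \hat\beta + 2 \sum_{i=1}^{n} \log\left(1 + \exp(x_i \hat\beta)\right) \\
&= -2 \hat\beta^t X^t y + 2 \sum_{i=1}^{n} \log\left(1 + \exp(x_i \hat\beta)\right) \\
&= -2 \hat\beta^t X^t \tilde\pi + 2 \sum_{i=1}^{n} \log\left(1 + \exp(x_i \hat\beta)\right) \\
&= \text{a function of } \hat\beta.
\end{aligned}
$$
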
Given $\hat\beta$, the deviance function has a conditional degenerate
distribution. That is, given a fixed value of $\hat\beta$, the deviance function
is independent of the value of $y_1, y_2, \dots, y_n$. This implies the
deviance function cannot be used to test goodness of fit.
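The degeneracy can be seen in a small numerical sketch (Python, with hypothetical binary data). The model assumed here is a logistic regression with an intercept and a single 0/1 covariate, chosen only because its maximum-likelihood estimate has a closed form (the fitted probabilities are the group proportions). Two different response vectors with the same group totals give the same $\hat\beta$, and the deviance computed directly from the data coincides with the value computed from $\hat\beta$ and $X$ alone.

```python
import math

def fit_two_group_logistic(y, x):
    """Closed-form MLE for logit(pi_i) = b0 + b1 * x_i with x_i in {0, 1}.

    Assumes both group proportions lie strictly between 0 and 1.
    """
    n1 = sum(x)
    n0 = len(x) - n1
    p1 = sum(yi for yi, xi in zip(y, x) if xi == 1) / n1
    p0 = sum(yi for yi, xi in zip(y, x) if xi == 0) / n0
    b0 = math.log(p0 / (1.0 - p0))
    b1 = math.log(p1 / (1.0 - p1)) - b0
    return b0, b1

def deviance_direct(y, x, beta):
    """2 * sum[y log(y/pi) + (1 - y) log((1 - y)/(1 - pi))] for binary y."""
    b0, b1 = beta
    total = 0.0
    for yi, xi in zip(y, x):
        pi = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        total += math.log(pi) if yi == 1 else math.log(1.0 - pi)
    return -2.0 * total

def deviance_from_beta(x, beta):
    """-2 beta^t X^t pi + 2 sum log(1 + exp(x_i beta)); uses no y at all."""
    b0, b1 = beta
    total = 0.0
    for xi in x:
        eta = b0 + b1 * xi
        pi = 1.0 / (1.0 + math.exp(-eta))
        total += -2.0 * pi * eta + 2.0 * math.log1p(math.exp(eta))
    return total

x = [0, 0, 0, 0, 1, 1, 1, 1]
y_first = [1, 0, 0, 0, 1, 1, 1, 0]    # hypothetical responses
y_second = [0, 0, 1, 0, 0, 1, 1, 1]   # different responses, same group totals

beta_first = fit_two_group_logistic(y_first, x)
beta_second = fit_two_group_logistic(y_second, x)

d_first = deviance_direct(y_first, x, beta_first)
d_second = deviance_direct(y_second, x, beta_second)
d_beta = deviance_from_beta(x, beta_first)
print(d_first, d_second, d_beta)
```

All three printed values should coincide up to rounding, which is the sense in which the deviance carries no information about fit beyond $\hat\beta$ when $m_i = 1$.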
(b) Effect on Pearson’s statistic:
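Pearson’s statistic is

$$
X^2 = \sum_{i=1}^{n} r_{P,i}^2 = \sum_{i=1}^{n} \frac{(y_i - \hat\mu_i)^2}{\widehat{\operatorname{Var}}(Y_i)}.
$$

Assume $Y_i \sim b(1, \pi)$. Then $\hat\pi = \bar{y}$ and, since $y_i^2 = y_i$ for binary data,

$$
\begin{aligned}
X^2 &= \sum_{i=1}^{n} \frac{(y_i - \hat\mu_i)^2}{\widehat{\operatorname{Var}}(Y_i)}
= \sum_{i=1}^{n} \frac{(y_i - \hat\pi)^2}{\hat\pi (1 - \hat\pi)}
= \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{\bar{y} (1 - \bar{y})} \\
&= \frac{\sum_{i=1}^{n} y_i^2 - n \bar{y}^2}{\bar{y} (1 - \bar{y})}
= \frac{n \bar{y} - n \bar{y}^2}{\bar{y} (1 - \bar{y})}
= \frac{n \bar{y} (1 - \bar{y})}{\bar{y} (1 - \bar{y})}
= n.
\end{aligned}
$$
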
Thus, $X^2$ takes the same value for every data set, so it is not useful as a test for goodness of fit.
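A quick numerical check of this result, as a Python sketch with hypothetical binary samples: for a common-probability model fitted to 0/1 data, Pearson's statistic equals the sample size whatever the responses are, provided the sample mean lies strictly between 0 and 1.

```python
def pearson_common_pi(y):
    """Pearson's statistic for binary data under a common-probability model."""
    n = len(y)
    y_bar = sum(y) / n  # fitted probability pi-hat
    return sum((yi - y_bar) ** 2 for yi in y) / (y_bar * (1.0 - y_bar))

# Three hypothetical samples of size 8 with very different success rates.
samples = [
    [1, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 0],
]
stats = [pearson_common_pi(y) for y in samples]
print(stats)
```

Each entry of `stats` should equal 8 up to rounding, regardless of how many successes the sample contains.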
Note:
For intermediate cases in which the $m_i$ are small but mostly greater
than 1, the deviance function or Pearson’s statistic can be used as a
test for goodness of fit.
Note:
If n is large, it is best to use a normal approximation for Pearson’s
statistic.
