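2.4 Measuring the goodness of fit

(I) Goodness of fit: Let the log-likelihood function be

$$
\begin{aligned}
l(\boldsymbol{\pi}; \mathbf{y}) &= \sum_{i=1}^{n}\left[\log\binom{m_i}{y_i} + y_i\log\pi_i + (m_i - y_i)\log(1 - \pi_i)\right] \\
&= \sum_{i=1}^{n}\left[\log\binom{m_i}{y_i} + y_i\log\frac{\mu_i}{m_i} + (m_i - y_i)\log\frac{m_i - \mu_i}{m_i}\right],
\end{aligned}
$$

where $\mu_i = m_i\pi_i$. Then the deviance function is

$$
D(y_1, y_2, \ldots, y_n; \tilde\mu_1, \tilde\mu_2, \ldots, \tilde\mu_n) = 2\sum_{i=1}^{n}\left[y_i\log\frac{y_i}{\tilde\mu_i} + (m_i - y_i)\log\frac{m_i - y_i}{m_i - \tilde\mu_i}\right],
$$

where $\tilde\mu_i = m_i\,\pi_i(\hat{\boldsymbol\beta})$ and the convention $0\log 0 = 0$ is used.

Result: Let the null model involve $p$ parameters, $g(\pi_i) = \mathbf{x}_i^{T}\boldsymbol\beta$, $i = 1, 2, \ldots, n$, with $\boldsymbol\beta = (\beta_1, \ldots, \beta_p)^{T}$. Let $\tilde\mu_i = m_i\, g^{-1}(\mathbf{x}_i^{T}\hat{\boldsymbol\beta})$. Then

$$
D(y_1, y_2, \ldots, y_n; \tilde\mu_1, \tilde\mu_2, \ldots, \tilde\mu_n) \;\overset{\text{approx.}}{\sim}\; \chi^2_{n-p}
$$

under the following assumptions:
(a) The observations are distributed independently according to the binomial distribution. The case of over-dispersion is not considered.
(b) $m_i \to \infty$ and $m_i\pi_i(1 - \pi_i) \to \infty$ for $i = 1, \ldots, n$, while the sample size $n$ stays fixed.

Note: In the limit given by assumption (b), $D(y_1, \ldots, y_n; \tilde\mu_1, \ldots, \tilde\mu_n)$ is approximately independent of the estimated parameter $\hat{\boldsymbol\beta}$. It is therefore also approximately independent of the fitted probabilities $\pi_i(\hat{\boldsymbol\beta})$.

Note: Suppose $n$ is large and $m_i\pi_i(1 - \pi_i)$ remains bounded. Then:
(a) The limiting $\chi^2$ approximation no longer holds.
(b) $D(y_1, \ldots, y_n; \tilde\mu_1, \ldots, \tilde\mu_n)$ is not independent of $\pi_i(\hat{\boldsymbol\beta})$, even approximately.
Point (b) implies that a large value of $D(y_1, \ldots, y_n; \tilde\mu_1, \ldots, \tilde\mu_n)$ cannot necessarily be taken as evidence of a poor fit. The large value may instead be due to the values of the $\pi_i$.

(II) Comparing two nested models: The deviance function is most directly useful for comparing two nested models. Let the null hypothesis $H_0$ and the alternative hypothesis $H_a$ be
$H_0$: the null model;
$H_a$: the extended model containing an additional covariate.
Denote
$\hat{\boldsymbol\mu}_0 = (\hat\mu_{01}, \hat\mu_{02}, \ldots, \hat\mu_{0n})$: the fitted values under $H_0$;
$\hat{\boldsymbol\mu}_a = (\hat\mu_{a1}, \hat\mu_{a2}, \ldots, \hat\mu_{an})$: the fitted values under $H_a$.
Then

$$
D(y_1, \ldots, y_n; \hat\mu_{01}, \ldots, \hat\mu_{0n}) - D(y_1, \ldots, y_n; \hat\mu_{a1}, \ldots, \hat\mu_{an}) = 2\left[l(\hat\mu_{a1}, \ldots, \hat\mu_{an}) - l(\hat\mu_{01}, \ldots, \hat\mu_{0n})\right].
$$

This is the likelihood-ratio statistic for testing $H_0$ vs. $H_a$. Its distribution tends to $\chi^2_1$ in two cases: when assumption (a) holds and $n$ is large, or when assumptions (a) and (b) both hold.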
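The nested-model comparison in (II) can be sketched numerically. The following is a minimal Python sketch under stated assumptions:
- The grouped binomial data are hypothetical.
- The "additional covariate" is assumed to be a binary group indicator.
- The function names are illustrative, not from the notes.

With a logit link and a single binary covariate, the MLE fitted probabilities under $H_a$ are the observed proportions within each group. Under the intercept-only $H_0$ they are $\sum y_i / \sum m_i$, so no iterative fitting is needed. The $\chi^2_1$ tail probability uses $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$.

```python
import math

def binomial_deviance(y, m, mu):
    """D(y; mu) = 2 * sum[y log(y/mu) + (m - y) log((m - y)/(m - mu))], with 0 log 0 = 0."""
    total = 0.0
    for yi, mi, mui in zip(y, m, mu):
        if yi > 0:
            total += yi * math.log(yi / mui)
        if mi - yi > 0:
            total += (mi - yi) * math.log((mi - yi) / (mi - mui))
    return 2.0 * total

def loglik(y, m, mu):
    """Binomial log-likelihood without the log C(m_i, y_i) terms (they cancel in differences)."""
    total = 0.0
    for yi, mi, mui in zip(y, m, mu):
        if yi > 0:
            total += yi * math.log(mui / mi)
        if mi - yi > 0:
            total += (mi - yi) * math.log((mi - mui) / mi)
    return total

def fit_null(y, m):
    """Intercept-only model: common pi_hat = sum(y) / sum(m); returns mu_i = m_i * pi_hat."""
    pi_hat = sum(y) / sum(m)
    return [mi * pi_hat for mi in m]

def fit_two_group(y, m, group):
    """Hypothetical extended model logit(pi_i) = b0 + b1 * group_i, group_i in {0, 1}.

    Its MLE fitted probabilities are the observed proportions within each group.
    """
    prop = {}
    for g in (0, 1):
        yg = sum(yi for yi, gi in zip(y, group) if gi == g)
        mg = sum(mi for mi, gi in zip(m, group) if gi == g)
        prop[g] = yg / mg
    return [mi * prop[gi] for mi, gi in zip(m, group)]

def chi2_1_sf(x):
    """Upper tail probability P(chi^2_1 > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))

# Hypothetical grouped data: y_i successes out of m_i trials, with a binary covariate.
y = [3, 5, 4, 9, 11, 10]
m = [10, 10, 10, 12, 14, 12]
group = [0, 0, 0, 1, 1, 1]

mu_0 = fit_null(y, m)              # fitted values under H0
mu_a = fit_two_group(y, m, group)  # fitted values under Ha
D0 = binomial_deviance(y, m, mu_0)
Da = binomial_deviance(y, m, mu_a)
lr = D0 - Da                       # likelihood-ratio statistic for H0 vs Ha
p_value = chi2_1_sf(lr)            # approximate p-value from chi^2_1
print(f"D0 = {D0:.4f}, Da = {Da:.4f}, LR = {lr:.4f}, p = {p_value:.4g}")
```

The identity $D(\mathbf{y}; \hat{\boldsymbol\mu}_0) - D(\mathbf{y}; \hat{\boldsymbol\mu}_a) = 2[l(\hat{\boldsymbol\mu}_a) - l(\hat{\boldsymbol\mu}_0)]$ holds exactly here. The omitted $\log\binom{m_i}{y_i}$ terms and the saturated log-likelihood both cancel in the difference.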
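Note: $D(y_1, \ldots, y_n; \hat\mu_{01}, \ldots, \hat\mu_{0n})$ need not have an approximate $\chi^2$ distribution. Nor need it be distributed independently of $\hat{\boldsymbol\mu}_0$.

(III) Sparseness: The data are sparse if many components of the binomial index vector $\mathbf{m} = (m_1, \ldots, m_n)$ are small, that is, if many $m_i$ are small. Sparseness mainly affects the deviance function and Pearson's statistic.

(a) Effect on the deviance function: Assume $Y_i \sim b(1, \pi_i)$, that is, $m_i = 1$ for all $i$, and that a linear logistic model is fitted. Then

$$
\tilde\pi_i = \frac{\exp(\mathbf{x}_i^{T}\hat{\boldsymbol\beta})}{1 + \exp(\mathbf{x}_i^{T}\hat{\boldsymbol\beta})}, \qquad
\log\frac{\tilde\pi_i}{1 - \tilde\pi_i} = \mathbf{x}_i^{T}\hat{\boldsymbol\beta}, \qquad
1 - \tilde\pi_i = \frac{1}{1 + \exp(\mathbf{x}_i^{T}\hat{\boldsymbol\beta})}.
$$

The score function satisfies

$$
U(\hat{\boldsymbol\beta}) = X^{T}(\mathbf{y} - \tilde{\boldsymbol\pi}) = \mathbf{0} \quad\Longrightarrow\quad X^{T}\mathbf{y} = X^{T}\tilde{\boldsymbol\pi},
$$

where $\mathbf{y} = (y_1, y_2, \ldots, y_n)^{T}$ and $\tilde{\boldsymbol\pi} = (\tilde\pi_1, \tilde\pi_2, \ldots, \tilde\pi_n)^{T}$. The deviance function is

$$
\begin{aligned}
D(y_1, \ldots, y_n; \tilde\pi_1, \ldots, \tilde\pi_n)
&= 2\sum_{i=1}^{n}\left[y_i\log\frac{y_i}{\tilde\pi_i} + (1 - y_i)\log\frac{1 - y_i}{1 - \tilde\pi_i}\right] \\
&= 2\sum_{i=1}^{n}\left[y_i\log y_i + (1 - y_i)\log(1 - y_i) - y_i\log\frac{\tilde\pi_i}{1 - \tilde\pi_i} - \log(1 - \tilde\pi_i)\right].
\end{aligned}
$$

Since $y_i \in \{0, 1\}$, we have $y_i\log y_i + (1 - y_i)\log(1 - y_i) = 0$ (with $0\log 0 = 0$). Thus

$$
\begin{aligned}
D(y_1, \ldots, y_n; \tilde\pi_1, \ldots, \tilde\pi_n)
&= -2\sum_{i=1}^{n} y_i\,\mathbf{x}_i^{T}\hat{\boldsymbol\beta} + 2\sum_{i=1}^{n}\log\left(1 + \exp(\mathbf{x}_i^{T}\hat{\boldsymbol\beta})\right) \\
&= -2\,\hat{\boldsymbol\beta}^{T}X^{T}\mathbf{y} + 2\sum_{i=1}^{n}\log\left(1 + \exp(\mathbf{x}_i^{T}\hat{\boldsymbol\beta})\right) \\
&= -2\,\hat{\boldsymbol\beta}^{T}X^{T}\tilde{\boldsymbol\pi} + 2\sum_{i=1}^{n}\log\left(1 + \exp(\mathbf{x}_i^{T}\hat{\boldsymbol\beta})\right),
\end{aligned}
$$

which is a function of $\hat{\boldsymbol\beta}$ only, because $\tilde{\boldsymbol\pi}$ is itself determined by $\hat{\boldsymbol\beta}$.

Given $\hat{\boldsymbol\beta}$, the deviance function therefore has a degenerate conditional distribution. For a fixed value of $\hat{\boldsymbol\beta}$, the deviance is completely determined and does not depend further on $y_1, y_2, \ldots, y_n$. Hence the deviance function cannot be used to test goodness of fit.

(b) Effect on Pearson's statistic: Pearson's statistic is

$$
X^2 = \sum_{i=1}^{n} r_{P,i}^2 = \sum_{i=1}^{n}\frac{(y_i - \hat\mu_i)^2}{\operatorname{Var}(Y_i)},
$$

with $\operatorname{Var}(Y_i)$ evaluated at the fitted values. Assume $Y_i \sim b(1, \pi)$ with a common $\pi$. Then $\hat\mu_i = \hat\pi = \bar y$ and $\operatorname{Var}(Y_i)$ is estimated by $\bar y(1 - \bar y)$. Using $y_i^2 = y_i$,

$$
X^2 = \sum_{i=1}^{n}\frac{(y_i - \bar y)^2}{\bar y(1 - \bar y)}
= \frac{\sum_{i=1}^{n} y_i^2 - n\bar y^2}{\bar y(1 - \bar y)}
= \frac{n\bar y - n\bar y^2}{\bar y(1 - \bar y)}
= \frac{n\bar y(1 - \bar y)}{\bar y(1 - \bar y)} = n.
$$

Thus $X^2 = n$ whatever the observed data, so it is useless as a test for goodness of fit!

Note: For intermediate cases, in which the $m_i$ are small but mostly greater than 1, the deviance function or Pearson's statistic can be used as a test for goodness of fit.

Note: If $n$ is large, it is best to use a normal approximation for Pearson's statistic.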
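Both degeneracies in (III) can be checked numerically with the minimal Python sketch below. It assumes:
- hypothetical 0/1 data;
- a logistic model with an intercept and one covariate;
- illustrative function names that are not from the notes.

The sketch fits $\hat{\boldsymbol\beta}$ by Newton-Raphson and computes the Bernoulli deviance twice. The first time uses $\mathbf{y}$ and $\tilde{\boldsymbol\pi}$ by definition. The second uses the final formula $-2\hat{\boldsymbol\beta}^{T}X^{T}\tilde{\boldsymbol\pi} + 2\sum\log(1 + \exp(\mathbf{x}_i^{T}\hat{\boldsymbol\beta}))$, in which $\mathbf{y}$ does not appear. The two agree only because the score equation $X^{T}\mathbf{y} = X^{T}\tilde{\boldsymbol\pi}$ holds at the MLE. The sketch also evaluates Pearson's $X^2$ for the common-$\pi$ model, which equals $n$ for any 0/1 data with $0 < \bar y < 1$.

```python
import math

def fit_logistic_1d(x, y, iters=50):
    """Newton-Raphson MLE for logit(pi_i) = b0 + b1 * x_i with 0/1 responses y_i."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Score U(beta) = X^T (y - pi), with design rows (1, x_i)
        u0 = sum(yi - pi for yi, pi in zip(y, p))
        u1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        # Fisher information X^T W X with W = diag(pi_i (1 - pi_i))
        w = [pi * (1.0 - pi) for pi in p]
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 * i01
        # Newton step beta <- beta + I^{-1} U via the explicit 2x2 inverse
        b0 += (i11 * u0 - i01 * u1) / det
        b1 += (i00 * u1 - i01 * u0) / det
    return b0, b1

def fitted_probs(x, b0, b1):
    """pi_tilde_i = exp(eta_i) / (1 + exp(eta_i)) with eta_i = b0 + b1 * x_i."""
    return [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]

def deviance_from_data(y, p):
    """Bernoulli deviance 2 * sum[y log(y/p) + (1 - y) log((1 - y)/(1 - p))], 0 log 0 = 0."""
    return 2.0 * sum(-math.log(pi) if yi == 1 else -math.log(1.0 - pi)
                     for yi, pi in zip(y, p))

def deviance_from_beta(x, b0, b1):
    """-2 beta^T X^T pi_tilde + 2 sum log(1 + exp(x_i^T beta)); y does not appear."""
    eta = [b0 + b1 * xi for xi in x]
    p = fitted_probs(x, b0, b1)
    beta_xt_pi = sum(e * pi for e, pi in zip(eta, p))
    return -2.0 * beta_xt_pi + 2.0 * sum(math.log1p(math.exp(e)) for e in eta)

def pearson_intercept_only(y):
    """Pearson X^2 for Y_i ~ b(1, pi) with common pi, so pi_hat = ybar."""
    ybar = sum(y) / len(y)
    return sum((yi - ybar) ** 2 for yi in y) / (ybar * (1.0 - ybar))

# Hypothetical 0/1 data with one covariate (not separable, so the MLE exists)
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic_1d(x, y)
p_tilde = fitted_probs(x, b0, b1)
print(deviance_from_data(y, p_tilde), deviance_from_beta(x, b0, b1))
print(pearson_intercept_only(y))
```

Any other 0/1 response vector yielding the same $\hat{\boldsymbol\beta}$ would give exactly the same deviance, which is the degeneracy described in (a).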