Over or under dispersion problem

Example 1: dogs and owners data set
- In the dogs and owners example, we had some concerns about the dependence among the measurements from each individual.
- Let Y_ij = 1 if the j-th quiz question was answered correctly by the i-th person. In the data set we collected, i = 1, ..., 27 and j = 1, ..., 12.
- It is reasonable to assume that the Y_ij's (j = 1, ..., 12) are dependent on each other.

Example 1 continued
- To model the dependence among the Y_ij's (j = 1, ..., 12), we could assume that Y_ij | ν_i ∼ Bernoulli(ν_i), where ν_i ∼ Beta(α_1, α_2) is a random variable.
- Using the properties of the Beta distribution,
      E(ν_i) = α_1 / (α_1 + α_2)   and   Var(ν_i) = α_1 α_2 / {(α_1 + α_2)² (α_1 + α_2 + 1)}.
- For convenience, define p_i = α_1 / (α_1 + α_2) and the additional parameter φ = 1 / (α_1 + α_2 + 1). Then E(ν_i) = p_i and Var(ν_i) = φ p_i (1 − p_i).

Example 1 continued
- By using the above model (the Y_ij's are independent given ν_i), for j ≠ j',
      Cov(Y_ij, Y_ij') = E(Y_ij Y_ij') − E(Y_ij) E(Y_ij') = E(ν_i²) − E²(ν_i) = Var(ν_i) = φ p_i (1 − p_i).
- If φ = 0, then Y_ij and Y_ij' are uncorrelated. This also implies that ν_i is a constant, degenerate at p_i.
- If φ > 0, then Y_ij and Y_ij' are dependent. This corresponds to the overdispersion case.

Example 1 continued
Let S_i = Y_i1 + ... + Y_in_i. Then we have
      E(S_i) = E{E(S_i | ν_i)} = E(n_i ν_i) = n_i p_i,
and
      Var(S_i) = E{Var(S_i | ν_i)} + Var{E(S_i | ν_i)}
               = E{n_i ν_i (1 − ν_i)} + Var{n_i ν_i}
               = n_i (p_i − φ p_i (1 − p_i) − p_i²) + n_i² φ p_i (1 − p_i)
               = n_i p_i (1 − p_i) [1 + (n_i − 1) φ].
- If φ = 0, no dispersion.
- If φ > 0, overdispersion; if φ < 0, underdispersion.

Over and under dispersion problems
In a common logistic regression model, S_i ∼ Binomial(n_i, p_i) and
      p_i = exp(X_i^T β) / {1 + exp(X_i^T β)}.
If one assumes that the model for p_i is correctly specified but the observed variance of S_i is larger or smaller than the expected variance n_i p_i (1 − p_i), then we have the so-called over- or under-dispersion problem.

Detection of over or under dispersion problems
If the usual logistic regression model is correct, then the deviance D approximately follows a chi-square distribution with m − p degrees of freedom.
- If D > m − p = E(χ²_{m−p}), it could be an indicator of the overdispersion problem.
- If D < m − p = E(χ²_{m−p}), it could be an indicator of the underdispersion problem.
- But D being far from m − p could also be the result of (1) under- or over-fitting; (2) a wrong link function; (3) the existence of outliers; (4) binary data or small n_i, for which the chi-square approximation to D is unreliable.

Possible reasons for dispersion
- Variation among success probabilities.
- Correlation among binary responses.

Over or under dispersion logistic regression model
Let S_i be the number of successes among n_i trials. An over- or under-dispersion logistic regression model assumes that
      E(S_i) = n_i p_i   and   Var(S_i) = φ n_i p_i (1 − p_i),
where
      p_i = exp(X_i^T β) / {1 + exp(X_i^T β)}.
Here φ is called the dispersion parameter.

Quasi-likelihood
- Recall that, in a usual logistic regression model, S_i ∼ Binomial(n_i, p_i) and p_i = exp(X_i^T β) / {1 + exp(X_i^T β)}.
- The log-likelihood for β is
      ℓ(β) = Σ_{i=1}^m S_i X_i^T β − Σ_{i=1}^m n_i log{1 + exp(X_i^T β)} + constant.
- The score function for β is
      ∂ℓ(β)/∂β = Σ_{i=1}^m X_i (S_i − n_i p_i)
               = Σ_{i=1}^m {(S_i − n_i p_i) / (n_i p_i (1 − p_i))} ∂(n_i p_i)/∂β
               = (∂/∂β) Σ_{i=1}^m ∫_{S_i}^{μ_i} (S_i − μ) / V(μ) dμ,
  where V(μ) = μ(n_i − μ)/n_i and μ_i = n_i p_i.

Quasi-likelihood for over or under dispersion models
- Define the log quasi-likelihood for β as
      Q(β) = Σ_{i=1}^m Q_i = Σ_{i=1}^m ∫_{S_i}^{μ_i} (S_i − μ) / {φ V(μ)} dμ.
- The maximum quasi-likelihood estimator of β is β̂ = arg max_β Q(β).
- This estimator β̂ is the same as the MLE of β in the usual logistic regression model, because φ enters the quasi-score function only as a multiplicative constant and therefore does not change the maximizer (illustrated numerically in the sketch below).
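To make the last point concrete, here is a minimal numerical sketch in Python (numpy and statsmodels). It is illustrative only: the simulated data, the choice φ = 0.2, the m = 40 groups with n_i = 12 trials each, and the single covariate x are assumptions, not part of the slides. In statsmodels, fit(scale="X2") rescales the estimated covariance by the Pearson-based φ̂, so the point estimate β̂ is unchanged while the standard errors are rescaled by √φ̂.

# Minimal sketch (not from the slides): beta-binomial data with dispersion
# parameter phi, fitted by ordinary and quasi-likelihood logistic regression.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
m, n_i, phi = 40, 12, 0.2                      # groups, trials per group, dispersion

x = rng.normal(size=m)
p_true = 1.0 / (1.0 + np.exp(-(-0.5 + 1.0 * x)))   # logit(p_i) = -0.5 + x_i

# nu_i ~ Beta(alpha_1, alpha_2) with mean p_i and Var = phi * p_i * (1 - p_i)
# requires alpha_1 + alpha_2 = 1/phi - 1.
a_total = 1.0 / phi - 1.0
nu = rng.beta(p_true * a_total, (1.0 - p_true) * a_total)
S = rng.binomial(n_i, nu)                      # S_i | nu_i ~ Binomial(n_i, nu_i)

endog = np.column_stack([S, n_i - S])          # (successes, failures)
exog = sm.add_constant(x)

fit_ml = sm.GLM(endog, exog, family=sm.families.Binomial()).fit()             # scale fixed at 1
fit_ql = sm.GLM(endog, exog, family=sm.families.Binomial()).fit(scale="X2")   # Pearson-based scale

print("beta-hat (ordinary):", fit_ml.params)   # identical point estimates:
print("beta-hat (quasi):   ", fit_ql.params)   # phi cancels from the quasi-score
print("phi-hat:", fit_ql.scale)                # Pearson chi-square / (m - number of parameters)
print("SE ratio:", fit_ql.bse / fit_ml.bse)    # equals sqrt(phi-hat)

The two coefficient vectors agree exactly, while the quasi-likelihood standard errors differ by the factor √φ̂, which is the adjustment used for the Wald-type inference on a later slide.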
Estimation of the dispersion parameter
- Define the Pearson χ² statistic as
      χ² = Σ_{i=1}^m (S_i − n_i p̂_i)² / {n_i p̂_i (1 − p̂_i)}.
- It can be shown that E(χ²) ≈ (m − p) φ.
- Then we can estimate φ by φ̂ = χ² / (m − p).

Deviance
- The deviance for the over- or under-dispersion logistic regression model is defined as
      D = −2 φ Q = −2 Σ_{i=1}^m ∫_{S_i}^{μ_i} (S_i − μ) / V(μ) dμ.
- This deviance is the same as in the usual logistic regression model without a dispersion parameter.

Wald type inference for β
For the over- or under-dispersion logistic regression model, approximately
      β̂ − β ∼ N(0, φ (X^T V X)^{-1}),
where V = diag(n_1 p_1 (1 − p_1), ..., n_m p_m (1 − p_m)) and φ is the dispersion parameter. In performing the inference, we need to take the dispersion parameter into consideration.

Likelihood ratio type inference for β
Suppose we consider comparing the following two models, with p > l:

      Model   Deviance   Covariates
      1       D_1        x_1, ..., x_l
      2       D_2        x_1, ..., x_l, x_{l+1}, ..., x_p

This corresponds to testing H_0: β_{l+1} = ... = β_p = 0 vs H_1: not H_0. The test statistic for the above hypothesis is
      F_n = {(D_1 − D_2) / (p − l)} / φ̂,
which approximately follows an F_{p−l, m−p} distribution under H_0 (a code sketch follows below).
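As a companion to the table above, the following Python sketch carries out the F-type test on simulated overdispersed data. It is illustrative only: the data-generating choices (φ = 0.2, m = 40, n_i = 12), the extra covariate x2 whose true coefficient is zero, and the variable names are assumptions, not from the slides. statsmodels is used only to fit the two ordinary binomial GLMs; D_1, D_2, φ̂, and F_n are then computed exactly as defined on the slides.

# Minimal sketch (not from the slides): F-type comparison of two nested
# logistic models when the data are overdispersed (beta-binomial, phi = 0.2).
import numpy as np
from scipy import stats
import statsmodels.api as sm

rng = np.random.default_rng(1)
m, n_i, phi = 40, 12, 0.2

x1 = rng.normal(size=m)
x2 = rng.normal(size=m)                              # extra covariate; true coefficient is 0
p_true = 1.0 / (1.0 + np.exp(-(-0.5 + 1.0 * x1)))
a_total = 1.0 / phi - 1.0
nu = rng.beta(p_true * a_total, (1.0 - p_true) * a_total)
S = rng.binomial(n_i, nu)
endog = np.column_stack([S, n_i - S])

exog1 = sm.add_constant(x1)                          # model 1: x1 only
exog2 = np.column_stack([exog1, x2])                 # model 2: x1 and x2

fit1 = sm.GLM(endog, exog1, family=sm.families.Binomial()).fit()
fit2 = sm.GLM(endog, exog2, family=sm.families.Binomial()).fit()

D1, D2 = fit1.deviance, fit2.deviance                # deviances without a dispersion parameter
df_num = exog2.shape[1] - exog1.shape[1]             # p - l extra coefficients
df_den = fit2.df_resid                               # m - p
phi_hat = fit2.pearson_chi2 / df_den                 # phi-hat = Pearson chi-square / (m - p)

F = ((D1 - D2) / df_num) / phi_hat                   # F_n from the slide
p_value = stats.f.sf(F, df_num, df_den)              # reference distribution F_{p-l, m-p} under H0
print("phi-hat:", phi_hat, " F:", F, " p-value:", p_value)

Because x2 is unrelated to the response here, the p-value should usually be large; dividing by φ̂ is what keeps the test calibrated when the data are over- or under-dispersed.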