Generalized linear models

A generalized linear model contains the following three parts:

1. The random component: the response $Y_i$ belongs to an exponential dispersion family with mean $E(Y_i) = \mu_i = b'(\theta_i)$ and $\mathrm{Var}(Y_i) = a(\phi)V(\mu_i)$.
2. The systematic component: $\eta_i = \sum_{j=1}^{p} X_{ij}\beta_j$.
3. The link function (between the random and systematic components): $g(\mu_i) = \eta_i$.

Choice of models

- Null model: $\eta_i = \beta_1$ (a constant, one parameter) for $i = 1, \ldots, n$. This model consigns all the variation in the $Y_i$ to the random component.
- Saturated model: $\eta_i = g(Y_i)$ (namely $\mu_i = Y_i$, $n$ parameters). This model consigns all the variation in the $Y_i$ to the systematic component, leaving none for the random component.
- The null model is usually too simple, and the saturated model is uninformative. A statistical model describes how we partition the total variation between systematic structure and the random component.

Deviance in linear models

Consider the linear model
$$Y_i = \mu_i + \varepsilon_i, \qquad \mu_i = X_i^T\beta, \qquad \varepsilon_i \sim N(0, \sigma^2).$$
The log-likelihood function for this model is
$$\ell(\mu) = -\frac{n}{2}\log(2\pi\sigma^2) - \sum_{i=1}^{n} \frac{(Y_i - \mu_i)^2}{2\sigma^2}.$$

- The log-likelihood for the full model (FM) with $\mu_i = X_i^T\beta$ is
$$\ell(\mathrm{FM}) = -\frac{n}{2}\log(2\pi\sigma^2) - \sum_{i=1}^{n} \frac{(Y_i - X_i^T\hat\beta)^2}{2\sigma^2},$$
where $\hat\beta$ is the MLE of $\beta$.
- The log-likelihood for the saturated model (SM) with $\mu_i = Y_i$ is
$$\ell(\mathrm{SM}) = -\frac{n}{2}\log(2\pi\sigma^2).$$

The deviance for linear models is
$$D = \phi \cdot 2\{\ell(\mathrm{SM}) - \ell(\mathrm{FM})\} = \sigma^2 \cdot 2\{\ell(\mathrm{SM}) - \ell(\mathrm{FM})\} = \sigma^2 \cdot 2\sum_{i=1}^{n} \frac{(Y_i - X_i^T\hat\beta)^2}{2\sigma^2} = \sum_{i=1}^{n} (Y_i - X_i^T\hat\beta)^2.$$
Therefore, the deviance in linear models is the same as the SSE of the full model.

Log-likelihood for generalized linear models

- Let $\ell(\mu, \phi)$ be the log-likelihood of the generalized linear model in terms of the mean parameters $\mu$ rather than the canonical parameters $\theta$.
- For the saturated model, we estimate $\mu$ by $Y$; therefore $\ell(Y, \phi)$ is the log-likelihood for the saturated model.
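The identity "deviance = SSE" for linear models can be checked numerically. A minimal numpy sketch with simulated data (the data, dimensions, and variable names are all illustrative assumptions, not from the source):

```python
import numpy as np

# Simulate a linear model Y = X beta + eps with known sigma^2
rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, 2.0, -0.5])
sigma2 = 4.0
y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]  # MLE of beta
mu_hat = X @ beta_hat

def loglik(mu):
    # Gaussian log-likelihood with sigma^2 treated as known
    return -n * np.log(2 * np.pi * sigma2) / 2 - np.sum((y - mu) ** 2) / (2 * sigma2)

# D = phi * 2 * {l(SM) - l(FM)} with phi = sigma^2; l(SM) plugs in mu = y
deviance = sigma2 * 2 * (loglik(y) - loglik(mu_hat))
sse = np.sum((y - mu_hat) ** 2)
print(np.isclose(deviance, sse))  # True
```

The residual terms of $\ell(\mathrm{SM})$ vanish at $\mu = y$, so the factor $2\sigma^2$ exactly cancels the $1/(2\sigma^2)$ in $\ell(\mathrm{FM})$, leaving the SSE.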
- For a full model with $p$ parameters, $\eta_i = X_i^T\beta$, the log-likelihood maximized over $\beta$ for a fixed $\phi$ is $\ell(\hat\mu, \phi)$, where $\hat\mu_i = g^{-1}(X_i^T\hat\beta)$ and $\hat\beta$ is the MLE.

Log-likelihood ratio

- Recall that the log-likelihood function for a generalized linear model is
$$\ell(\mu, \phi) = \sum_{i=1}^{n} \Big\{ \frac{y_i\theta_i(\mu_i) - b\{\theta_i(\mu_i)\}}{a(\phi)} - h(y_i, \phi) \Big\}.$$
- For the saturated model, the log-likelihood $\ell(Y, \phi)$ is
$$\ell(Y, \phi) = \sum_{i=1}^{n} \Big\{ \frac{y_i\tilde\theta_i - b(\tilde\theta_i)}{a(\phi)} - h(y_i, \phi) \Big\},$$
where $\tilde\theta_i = \theta_i(Y_i)$.
- For a full model with $p$ parameters, $\eta_i = X_i^T\beta$, the log-likelihood is
$$\ell(\hat\theta, \phi) = \sum_{i=1}^{n} \Big\{ \frac{y_i\hat\theta_i - b(\hat\theta_i)}{a(\phi)} - h(y_i, \phi) \Big\},$$
where $\hat\theta_i = \theta_i(\hat\mu_i)$ and $\hat\mu_i = g^{-1}(X_i^T\hat\beta)$.
- Then $-2$ times the log of the likelihood ratio is
$$2\{\ell(Y, \phi) - \ell(\hat\theta, \phi)\} = 2\sum_{i=1}^{n} \Big\{ \frac{y_i(\tilde\theta_i - \hat\theta_i)}{a(\phi)} - \frac{b(\tilde\theta_i) - b(\hat\theta_i)}{a(\phi)} \Big\}.$$

Deviance for generalized linear models

Assume the dispersion function has the form $a(\phi) = \phi/w$ for some weight $w$. The deviance for the generalized linear model is
$$D(\beta) = \phi \cdot 2\{\ell(Y, \phi) - \ell(\hat\theta, \phi)\} = 2w\sum_{i=1}^{n} \big\{ y_i(\tilde\theta_i - \hat\theta_i) - b(\tilde\theta_i) + b(\hat\theta_i) \big\}.$$
The scaled deviance is $D(\beta)/\phi$.

Example: Gauss-Markov linear models

For linear models, we know that $\tilde\theta_i = y_i$, $\hat\theta_i = \hat\mu_i$, $b(\theta_i) = \theta_i^2/2$, $a(\phi) = \sigma^2$, $\phi = \sigma^2$, and $w = 1$. So the deviance is
$$D(\beta) = 2\sum_{i=1}^{n} \big\{ y_i(y_i - \hat\mu_i) - y_i^2/2 + \hat\mu_i^2/2 \big\} = \sum_{i=1}^{n} (y_i^2 - 2\hat\mu_i y_i + \hat\mu_i^2) = \sum_{i=1}^{n} (y_i - \hat\mu_i)^2.$$

Example: Logistic regression models

For logistic regression models, we know that $\tilde\theta_i = \log\big(\frac{y_i}{1-y_i}\big)$, $\hat\theta_i = \log\big(\frac{\hat\mu_i}{1-\hat\mu_i}\big)$, $b(\theta_i) = \log(1 + e^{\theta_i})$, $a(\phi) = 1$, $\phi = 1$, and $w = 1$. So the deviance is
$$D(\beta) = 2\sum_{i=1}^{n} \Big\{ y_i\log\Big(\frac{y_i}{\hat\mu_i}\Big) - y_i\log\Big(\frac{1-y_i}{1-\hat\mu_i}\Big) + \log\Big(\frac{1-y_i}{1-\hat\mu_i}\Big) \Big\} = 2\sum_{i=1}^{n} \Big\{ y_i\log\Big(\frac{y_i}{\hat\mu_i}\Big) + (1-y_i)\log\Big(\frac{1-y_i}{1-\hat\mu_i}\Big) \Big\},$$
where $\hat\mu_i = \hat p_i$.

Goodness-of-fit test

- The goodness-of-fit test examines whether the current full model fits the data well. It is a model diagnostic tool.
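The logistic-regression deviance can be evaluated directly from the fitted probabilities. A minimal numpy sketch (the data and function names are illustrative assumptions), using the convention $0 \log 0 = 0$ so the formula is well defined for binary $y_i$:

```python
import numpy as np

def logistic_deviance(y, mu):
    # D = 2 * sum{ y*log(y/mu) + (1-y)*log((1-y)/(1-mu)) },
    # with the convention 0*log(0) = 0 for binary y
    term1 = np.where(y > 0, y * np.log(np.where(y > 0, y, 1.0) / mu), 0.0)
    term0 = np.where(y < 1, (1 - y) * np.log(np.where(y < 1, 1 - y, 1.0) / (1 - mu)), 0.0)
    return 2 * np.sum(term1 + term0)

# For binary y, the saturated log-likelihood is 0, so the deviance
# reduces to -2 times the Bernoulli log-likelihood of the fitted model
y = np.array([1.0, 0.0, 1.0, 1.0, 0.0])
mu = np.array([0.8, 0.3, 0.6, 0.9, 0.2])
neg2ll = -2 * np.sum(y * np.log(mu) + (1 - y) * np.log(1 - mu))
print(np.isclose(logistic_deviance(y, mu), neg2ll))  # True
```

The inner `np.where` calls keep the arguments of `np.log` strictly positive, so no spurious warnings arise at $y_i = 0$ or $y_i = 1$.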
- The scaled deviance has an asymptotic $\chi^2_{n-p}$ distribution if the current model is good enough.
- The test cannot be applied if the dispersion parameter is unknown. For example, in the linear regression model, $\phi = \sigma^2$ is unknown. An F-test may be used if the dispersion parameter is estimated by the Pearson chi-square statistic.

Inference for part of the parameters

Suppose we want to test $H_0: \beta_{l+1} = \cdots = \beta_k = 0$ for $k > l$. This can be done by comparing the following two nested models:

Model 1: $x_1, \ldots, x_l$ with deviance $D_1$;
Model 2: $x_1, \ldots, x_l, x_{l+1}, \ldots, x_k$ with deviance $D_2$.

Under $H_0$, we know $(D_1 - D_2)/\phi \sim \chi^2_{k-l}$ if $\phi$ is known. If $\phi$ is unknown, then under $H_0$
$$\frac{(D_1 - D_2)/(k - l)}{\hat\phi} \sim F_{k-l,\, n-p},$$
where
$$\hat\phi = \frac{1}{n-p}\sum_{i=1}^{n} \frac{(Y_i - \hat\mu_i)^2}{V(\hat\mu_i)}.$$

Residuals

- The deviance residual is
$$d_i = \mathrm{sign}(e_i)\Big[ 2\big\{ y_i(\tilde\theta_i - \hat\theta_i) - b(\tilde\theta_i) + b(\hat\theta_i) \big\}/a(\phi) \Big]^{1/2},$$
where $e_i = Y_i - \hat\mu_i$.
- The Pearson residual is
$$r_i = \frac{Y_i - \hat\mu_i}{\sqrt{V(\hat\mu_i)}}.$$
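Both kinds of residuals are easy to compute once the fitted means are available. A sketch for the Bernoulli/logistic case (data and names are illustrative assumptions); it also checks that the squared deviance residuals sum to the deviance, which holds by construction:

```python
import numpy as np

def pearson_residuals(y, mu):
    # r_i = (Y_i - mu_i) / sqrt(V(mu_i)); for the Bernoulli, V(mu) = mu(1 - mu)
    return (y - mu) / np.sqrt(mu * (1 - mu))

def deviance_residuals(y, mu):
    # d_i = sign(e_i) * sqrt(unit deviance); for binary y the unit deviance
    # 2{y(theta~ - theta^) - b(theta~) + b(theta^)} reduces to
    # -2*log(mu) when y = 1 and -2*log(1 - mu) when y = 0
    unit_dev = np.where(y == 1, -2 * np.log(mu), -2 * np.log(1 - mu))
    return np.sign(y - mu) * np.sqrt(unit_dev)

y = np.array([1.0, 0.0, 1.0, 0.0])
mu = np.array([0.7, 0.2, 0.4, 0.6])

d = deviance_residuals(y, mu)
r = pearson_residuals(y, mu)
D = -2 * np.sum(y * np.log(mu) + (1 - y) * np.log(1 - mu))  # binary-data deviance
print(np.isclose(np.sum(d ** 2), D))  # True: sum of d_i^2 equals D(beta)
```

Both residuals share the sign of $e_i = Y_i - \hat\mu_i$; they differ only in how the raw residual is scaled.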