Generalized linear models

Linear regression model

Consider a Gauss-Markov model $Y_i \sim N(\mu_i, \sigma^2)$ where $\mu_i = X_i^T\beta$. The pdf of the response $Y_i$ is
\[
f(y_i; \mu_i, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\Big\{ -\frac{1}{2\sigma^2}(y_i - \mu_i)^2 \Big\}
= \exp\Big\{ \frac{y_i\mu_i - \mu_i^2/2}{\sigma^2} - \frac{y_i^2}{2\sigma^2} - \log(\sqrt{2\pi}\,\sigma) \Big\}.
\]

Define $\theta_i = \mu_i$, $b(\theta_i) = \theta_i^2/2$, $a(\phi) = \phi$, $\phi = \sigma^2$ and $h(y_i, \phi) = y_i^2/(2\sigma^2) + \log(\sqrt{2\pi}\,\sigma)$. Then the pdf of a normally distributed response can be written as
\[
f(y_i; \theta_i, \phi) = \exp\Big\{ \frac{y_i\theta_i - b(\theta_i)}{a(\phi)} - h(y_i, \phi) \Big\}.
\]

Logistic regression model

Consider a logistic regression model $Y_i \sim \mathrm{Bernoulli}(p_i)$ with
\[
p_i = \frac{\exp(X_i^T\beta)}{1 + \exp(X_i^T\beta)}.
\]
The pmf of the response is
\[
f(y_i; p_i) = \exp\Big\{ y_i \log\Big(\frac{p_i}{1 - p_i}\Big) + \log(1 - p_i) \Big\}
= \exp\big\{ y_i\theta_i - \log(1 + e^{\theta_i}) \big\}
= \exp\Big\{ \frac{y_i\theta_i - b(\theta_i)}{a(\phi)} - h(y_i, \phi) \Big\},
\]
where $\theta_i = \log\{p_i/(1 - p_i)\}$, $b(\theta_i) = \log(1 + e^{\theta_i})$, $a(\phi) = 1$ and $h(y_i, \phi) = 0$.

Poisson regression model

Consider a Poisson regression model $Y_i \sim \mathrm{Poisson}(\mu_i)$ with $\mu_i = \exp(X_i^T\beta)$. The pmf of the response is
\[
f(y_i; \mu_i) = \exp(-\mu_i)\,\mu_i^{y_i}/y_i!
= \exp\big\{ -\mu_i + y_i\log(\mu_i) - \log(y_i!) \big\}
= \exp\Big\{ \frac{y_i\theta_i - b(\theta_i)}{a(\phi)} - h(y_i, \phi) \Big\},
\]
where $\theta_i = \log(\mu_i)$, $b(\theta_i) = e^{\theta_i}$, $a(\phi) = 1$ and $h(y_i, \phi) = \log(y_i!)$.

Exponential dispersion family

- Assume the response $Y$ has pdf or pmf
\[
f(y; \theta, \phi) = \exp\Big\{ \frac{y\theta - b(\theta)}{a(\phi)} - h(y, \phi) \Big\},
\]
for some functions $a(\cdot)$, $b(\cdot)$, $h(\cdot,\cdot)$. The parameter $\theta$ is called the "canonical parameter" and $\phi$ is a dispersion parameter.
- The above family is called the exponential dispersion family.
- This family includes many distributions as special cases: normal, Poisson, binomial, gamma, ...

Statistical properties

Likelihood theory suggests that
\[
E\Big\{ \frac{\partial}{\partial\theta} \log f(Y; \theta, \phi)\Big|_{\theta_0,\phi_0} \Big\} = 0
\]
and
\[
\mathrm{Var}\Big\{ \frac{\partial}{\partial\theta} \log f(Y; \theta, \phi)\Big|_{\theta_0,\phi_0} \Big\}
= -E\Big\{ \frac{\partial^2}{\partial\theta^2} \log f(Y; \theta, \phi)\Big|_{\theta_0,\phi_0} \Big\},
\]
where $\theta_0$, $\phi_0$ are the true values of $\theta$ and $\phi$.
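The algebra above can be checked numerically. A minimal sketch in plain Python (no external libraries) compares the ordinary $N(\mu, \sigma^2)$ density and Poisson pmf against their exponential-dispersion-family rewritings:

```python
import math

# Exponential dispersion family: f(y; theta, phi)
#   = exp{ (y*theta - b(theta)) / a(phi) - h(y, phi) }

def normal_pdf(y, mu, sigma2):
    """Standard N(mu, sigma^2) density."""
    return math.exp(-(y - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def edf_normal_pdf(y, mu, sigma2):
    """Same density via theta = mu, b = theta^2/2, a(phi) = phi = sigma^2,
    h(y, phi) = y^2/(2 sigma^2) + log(sqrt(2 pi) sigma)."""
    theta, phi = mu, sigma2
    b = theta ** 2 / 2
    h = y ** 2 / (2 * sigma2) + math.log(math.sqrt(2 * math.pi * sigma2))
    return math.exp((y * theta - b) / phi - h)

def poisson_pmf(y, mu):
    """Standard Poisson(mu) pmf."""
    return math.exp(-mu) * mu ** y / math.factorial(y)

def edf_poisson_pmf(y, mu):
    """Same pmf via theta = log(mu), b = e^theta, a(phi) = 1, h = log(y!)."""
    theta = math.log(mu)
    return math.exp(y * theta - math.exp(theta) - math.log(math.factorial(y)))

for y, mu, s2 in [(0.3, 1.0, 2.0), (-1.2, 0.5, 0.7)]:
    assert abs(normal_pdf(y, mu, s2) - edf_normal_pdf(y, mu, s2)) < 1e-12
for y, mu in [(0, 0.5), (3, 2.0), (7, 4.5)]:
    assert abs(poisson_pmf(y, mu) - edf_poisson_pmf(y, mu)) < 1e-12
```

The assertions pass because the rewritten forms are algebraically identical to the familiar densities.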
Mean and variance

The statistical properties of the exponential dispersion family imply that
\[
\mu = E(Y) = \frac{\partial b(\theta)}{\partial\theta} = b'(\theta)
\]
and
\[
\mathrm{Var}\Big\{ \frac{Y - b'(\theta)}{a(\phi)} \Big\} = b''(\theta)/a(\phi).
\]
Combining the two results, we obtain $\mathrm{Var}(Y) = b''(\theta)\,a(\phi)$.

Mean and variance relationship

Because $\mu = b'(\theta)$, we obtain $\theta = b'^{-1}(\mu)$. As a result, the variance of $Y$ depends on $\mu$ in the following way:
\[
\mathrm{Var}(Y) = a(\phi)V(\mu), \quad \text{where } V(\mu) = b''\{b'^{-1}(\mu)\}.
\]

Link function

- The heart of a generalized linear model is to model some function of $\mu$ as a linear function of the predictors $X$. That is, for some link function $g(\cdot)$ and some $p$-dimensional coefficient vector $\beta$, we assume that $g(\mu) = X^T\beta$.
- The canonical link sets $\theta = b'^{-1}(\mu) = X^T\beta$.
- Therefore, the canonical link function is $g(\cdot) = b'^{-1}(\cdot)$.

Canonical link functions

- In the Gauss-Markov linear model, $\mu = \theta = E(Y)$. Then $b'(\theta) = \theta$ and $b'^{-1}(\mu) = \mu$. The canonical link function is $g(\mu) = \mu$. Therefore, $\mu = X^T\beta$.
- In the Poisson regression model, $\mu = b'(\theta) = \exp(\theta)$. Then $b'^{-1}(\mu) = \log(\mu)$. The canonical link function is $g(\mu) = \log(\mu)$. Therefore, $\log(\mu) = X^T\beta$.
- In the Bernoulli response case, $\mu = b'(\theta) = \exp(\theta)/\{1 + \exp(\theta)\}$. Then $b'^{-1}(\mu) = \log\{\mu/(1 - \mu)\}$. The canonical link function is $g(\mu) = \log\{\mu/(1 - \mu)\}$. Therefore, $\log\{\mu/(1 - \mu)\} = X^T\beta$.

Maximum likelihood

- Assume that $Y_1, \cdots, Y_n$ are an independent sample from $f(y; \theta, \phi)$ and $X_1, \cdots, X_n$ are the corresponding covariates. The log-likelihood for $\beta$ and $\phi$ is
\[
\ell(\beta, \phi) = \sum_{i=1}^n \Big\{ \frac{Y_i\theta_i - b(\theta_i)}{a(\phi)} - h(Y_i, \phi) \Big\},
\]
where $\theta_i = b'^{-1}(\mu_i) = b'^{-1}\{g^{-1}(X_i^T\beta)\}$.
- The maximum likelihood estimator of $\beta$ is $\hat\beta = \arg\max_\beta \sum_{i=1}^n \{Y_i\theta_i - b(\theta_i)\}$.

Score function

- The score function of $\beta$ is
\[
\frac{\partial\ell(\beta, \phi)}{\partial\beta}
= \sum_{i=1}^n \{Y_i - b'(\theta_i)\}\, \frac{1}{a(\phi)\,V\{g^{-1}(X_i^T\beta)\}}\, \frac{1}{g'\{g^{-1}(X_i^T\beta)\}}\, X_i.
\]
- The MLE of $\beta$ is a solution of $\partial\ell(\beta, \phi)/\partial\beta = 0$.
- Note that, once the score is set to zero, the factor $\{a(\phi)\}^{-1}$ cancels, so the estimating equation for $\beta$ does not depend on the dispersion parameter $\phi$. The estimation of $\phi$ can be carried out separately.
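The mean-variance relationship $V(\mu) = b''\{b'^{-1}(\mu)\}$ can be verified numerically. A small sketch using central finite differences (plain Python; tolerances are illustrative) checks it for the canonical $b(\cdot)$ of the Bernoulli and Poisson families:

```python
import math

def second_deriv(f, x, eps=1e-5):
    """Central finite-difference approximation of f''(x)."""
    return (f(x + eps) - 2 * f(x) + f(x - eps)) / eps ** 2

# Bernoulli: b(theta) = log(1 + e^theta), b'^{-1}(mu) = logit(mu),
# so V(mu) should equal mu(1 - mu).
b_bern = lambda t: math.log(1 + math.exp(t))
mu = 0.3
theta = math.log(mu / (1 - mu))
assert abs(second_deriv(b_bern, theta) - mu * (1 - mu)) < 1e-4  # ~0.21

# Poisson: b(theta) = e^theta, b'^{-1}(mu) = log(mu),
# so V(mu) should equal mu.
b_pois = math.exp
mu = 4.0
assert abs(second_deriv(b_pois, math.log(mu)) - mu) < 1e-4  # ~4.0
```

This reproduces the familiar facts that a Bernoulli response has variance $\mu(1-\mu)$ and a Poisson response has variance equal to its mean, both as special cases of $\mathrm{Var}(Y) = a(\phi)V(\mu)$ with $a(\phi) = 1$.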
Score function in matrix form

Let $X = (X_1, \cdots, X_n)^T$ be an $n \times p$ matrix, $Y = (Y_1, \cdots, Y_n)^T$ and $\mu = (\mu_1, \cdots, \mu_n)^T$. Define the following $n \times n$ matrices:
\[
W^{-1} = \mathrm{diag}\big[ V(\mu_1)\{g'(\mu_1)\}^2, \cdots, V(\mu_n)\{g'(\mu_n)\}^2 \big]
\]
and $\Delta = \mathrm{diag}\{g'(\mu_1), \cdots, g'(\mu_n)\}$. The score function can be written as
\[
\frac{\partial\ell(\beta, \phi)}{\partial\beta} = \{a(\phi)\}^{-1} X^T W \Delta (Y - \mu).
\]

Fisher's information matrix

- The Hessian of the log-likelihood is
\[
\frac{\partial^2\ell(\beta, \phi)}{\partial\beta\,\partial\beta^T}
= -\{a(\phi)\}^{-1} X^T W \Delta \frac{\partial\mu}{\partial\beta^T}
+ \{a(\phi)\}^{-1} X^T \frac{\partial W\Delta}{\partial\beta^T}(Y - \mu).
\]
- Since $\partial\mu/\partial\beta^T = \Delta^{-1}X$ and $E(Y - \mu) = 0$, taking the negative expectation of the Hessian gives Fisher's information matrix
\[
I_n = \{a(\phi)\}^{-1} X^T W X.
\]

Asymptotic normality of the MLE $\hat\beta$

Using the large-sample theory of likelihood methods, we have, approximately,
\[
\hat\beta - \beta \sim N\big(0,\ a(\phi)(X^T W X)^{-1}\big).
\]
Wald-type inference can be performed based on the above asymptotic normality.
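The score and information matrices above lead directly to Fisher scoring (iteratively reweighted least squares). A minimal sketch for logistic regression with an intercept and one covariate, on made-up illustrative data: with the canonical logit link, $W\Delta = I$, so the score reduces to $X^T(Y - \mu)$ and the information to $X^T W X$, and Wald standard errors come from $a(\phi)(X^T W X)^{-1}$ with $a(\phi) = 1$.

```python
import math

def expit(t):
    """Inverse logit: g^{-1}(eta) = e^eta / (1 + e^eta)."""
    return 1 / (1 + math.exp(-t))

# Made-up data for illustration only.
x = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
y = [0, 0, 0, 1, 0, 1, 1, 1]

beta = [0.0, 0.0]  # (intercept, slope)
for _ in range(25):
    mu = [expit(beta[0] + beta[1] * xi) for xi in x]
    w = [m * (1 - m) for m in mu]  # W = diag{mu_i(1 - mu_i)}
    # Canonical link => W*Delta = I, so the score is X^T (y - mu).
    s0 = sum(yi - mi for yi, mi in zip(y, mu))
    s1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
    # Information I_n = X^T W X (a 2x2 matrix here).
    a = sum(w)
    b = sum(wi * xi for wi, xi in zip(w, x))
    c = sum(wi * xi * xi for wi, xi in zip(w, x))
    det = a * c - b * b
    # Fisher scoring update: beta <- beta + I_n^{-1} * score.
    beta = [beta[0] + (c * s0 - b * s1) / det,
            beta[1] + (a * s1 - b * s0) / det]

# Wald standard errors: sqrt of the diagonal of (X^T W X)^{-1}.
mu = [expit(beta[0] + beta[1] * xi) for xi in x]
w = [m * (1 - m) for m in mu]
a = sum(w)
b = sum(wi * xi for wi, xi in zip(w, x))
c = sum(wi * xi * xi for wi, xi in zip(w, x))
det = a * c - b * b
se = [math.sqrt(c / det), math.sqrt(a / det)]
print("beta-hat:", beta, "Wald SE:", se)
```

At convergence the score is zero, so the fitted means reproduce the sufficient statistics ($\sum_i(Y_i - \hat\mu_i) = 0$ for the intercept); Wald intervals are then $\hat\beta_j \pm z_{\alpha/2}\,\mathrm{SE}_j$.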