Generalized Linear Models

A subclass of nonlinear models with a "linear" component.

Data: $(Y_1, X_1^T),\ (Y_2, X_2^T),\ \ldots,\ (Y_n, X_n^T)$, where $Y_j$ is the response variable and $X_j^T = (X_{1j}, X_{2j}, \ldots, X_{pj})$ are explanatory (or regression) variables that describe the conditions under which the response $Y_j$ was obtained.

Basic Features:

1. $Y_1, Y_2, \ldots, Y_n$ are independent.

2. $Y_j$ has a distribution in the exponential family of distributions, $j = 1, 2, \ldots, n$. The joint likelihood has the form
\[ L(\theta, \phi; \mathbf{Y}) = \prod_{j=1}^{n} f(Y_j; \theta_j, \phi), \]
where each density has the canonical form
\[ f(Y_j; \theta_j, \phi) = \exp\left\{ \frac{Y_j \theta_j - b(\theta_j)}{a(\phi)} + C(Y_j, \phi) \right\} \]
for some functions $a(\phi)$, $b(\theta)$, and $C(Y, \phi)$. Here, $\theta_j$ is called a canonical parameter and $\phi$ a dispersion parameter.

3. (Systematic part of the model) There is a link function $h(\cdot)$ that links the conditional mean of $Y_j$ given $X_j$ to a linear function of unknown parameters:
\[ h(E(Y_j \mid X_j)) = X_j^T \beta = \beta_1 X_{1j} + \cdots + \beta_p X_{pj}. \]

Members of the exponential family of distributions are available in S-PLUS (the glm() function) and in SAS (PROC GENMOD).

Normal distribution: $Y \sim N(\mu, \sigma^2)$, with $E(Y) = \mu$ and $Var(Y) = \sigma^2$:
\[ f(Y; \theta, \phi) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(Y-\mu)^2}{2\sigma^2} \right) = \exp\left\{ \frac{Y\mu - \mu^2/2}{\sigma^2} - \frac{1}{2}\left[ \frac{Y^2}{\sigma^2} + \log(2\pi\sigma^2) \right] \right\}, \]
where the last bracketed term is $C(Y, \phi)$. Here, $\theta = \mu$, $b(\theta) = \theta^2/2$, $\phi = \sigma^2$, and $a(\phi) = \phi$.

Binomial distribution: $Y \sim Bin(n, \pi)$:
\[ f(Y; \theta, \phi) = \binom{n}{Y}\pi^Y(1-\pi)^{n-Y} = \exp\left\{ Y\log\left(\frac{\pi}{1-\pi}\right) + n\log(1-\pi) + \log\binom{n}{Y} \right\}, \]
where $\log\binom{n}{Y} = C(Y, \phi)$. Here,
\[ \theta = \log\left(\frac{\pi}{1-\pi}\right) \;\Rightarrow\; \pi = \frac{e^\theta}{1+e^\theta}, \qquad b(\theta) = -n\log(1-\pi) = n\log(1+e^\theta). \]
There is no dispersion parameter, so we will take $\phi \equiv 1$ and $a(\phi) = 1$.

Poisson distribution: $Y \sim Poisson(\lambda)$:
\[ f(Y; \theta, \phi) = \frac{\lambda^Y e^{-\lambda}}{Y!} = \exp\{ Y\log(\lambda) - \lambda - \log(Y!) \}, \]
where $-\log(Y!) = C(Y, \phi)$. Here, $\theta = \log(\lambda) \Rightarrow \lambda = e^\theta$ and $b(\theta) = \lambda = e^\theta$, and there is no additional dispersion parameter; we will take $\phi \equiv 1$ and $a(\phi) = 1$. Then,
\[ f(Y; \theta, \phi) = \exp\{ Y\theta - e^\theta - \log(\Gamma(Y+1)) \}. \]

Two-parameter gamma distribution: $Y \sim gamma(\alpha, \beta)$, with $E(Y) = \alpha\beta$ and $Var(Y) = \alpha\beta^2$:
\[ f(Y; \alpha, \beta) = \frac{Y^{\alpha-1} e^{-Y/\beta}}{\beta^\alpha\,\Gamma(\alpha)} = \exp\left\{ -\frac{Y}{\beta} + (\alpha-1)\log(Y) - \alpha\log(\beta) - \log(\Gamma(\alpha)) \right\}. \]
Here, $\theta = -1/(\alpha\beta)$, $b(\theta) = -\log(-\theta)$, $\phi = 1/\alpha$, $a(\phi) = \phi$, and $C(Y, \phi)$ collects the remaining terms that do not involve $\theta$.

Inverse Gaussian distribution: $Y \sim IG(\mu, \sigma^2)$:
\[ f(Y; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2 Y^3}} \exp\left( -\frac{(Y-\mu)^2}{2\sigma^2\mu^2 Y} \right) \quad \text{for } Y > 0,\ \mu > 0,\ \sigma^2 > 0, \]
with $E(Y) = \mu$ and $Var(Y) = \sigma^2\mu^3$. Also,
\[ f(Y; \mu, \sigma^2) = \exp\left\{ \frac{Y\left(-\frac{1}{2\mu^2}\right) + \frac{1}{\mu}}{\sigma^2} - \frac{1}{2}\left[ \frac{1}{\sigma^2 Y} + \log(2\pi\sigma^2 Y^3) \right] \right\}, \]
where the last bracketed term is $C(Y, \phi)$. Here, $\theta = -\frac{1}{2\mu^2}$, $b(\theta) = -\sqrt{-2\theta} = -\frac{1}{\mu}$, $\phi = \sigma^2$ (the "dispersion parameter"), and $a(\phi) = \phi = \sigma^2$.

Canonical form of the log-likelihood:
\[ \ell(\theta, \phi; Y) = \log(f(Y; \theta, \phi)) = \frac{Y\theta - b(\theta)}{a(\phi)} + C(Y, \phi). \]
Partial derivatives:
(i) $\displaystyle \frac{\partial\ell(\theta,\phi;Y)}{\partial\theta} = \frac{Y - b'(\theta)}{a(\phi)}$
(ii) $\displaystyle \frac{\partial^2\ell(\theta,\phi;Y)}{\partial\theta^2} = -\frac{b''(\theta)}{a(\phi)}$

Apply the following results,
\[ E\left[\frac{\partial\ell(\theta,\phi;Y)}{\partial\theta}\right] = 0 \qquad \text{and} \qquad E\left[\frac{\partial^2\ell(\theta,\phi;Y)}{\partial\theta^2}\right] + E\left[\left(\frac{\partial\ell(\theta,\phi;Y)}{\partial\theta}\right)^2\right] = 0, \]
to obtain
\[ 0 = E\left[\frac{\partial\ell(\theta,\phi;Y)}{\partial\theta}\right] = \frac{E(Y) - b'(\theta)}{a(\phi)}, \]
which implies $E(Y) = b'(\theta)$. Furthermore, from (i) and (ii) we have
\[ 0 = E\left[\frac{\partial^2\ell(\theta,\phi;Y)}{\partial\theta^2}\right] + E\left[\left(\frac{\partial\ell(\theta,\phi;Y)}{\partial\theta}\right)^2\right] = -\frac{b''(\theta)}{a(\phi)} + Var\left(\frac{Y - b'(\theta)}{a(\phi)}\right) = -\frac{b''(\theta)}{a(\phi)} + \frac{Var(Y)}{[a(\phi)]^2}, \]
which implies
\[ Var(Y) = b''(\theta)\,a(\phi). \]
Here $b''(\theta)$ is called the variance function, and $a(\phi)$ is a function of the dispersion parameter.

Normal distribution: $Y \sim N(\mu, \sigma^2)$: $b(\theta) = \theta^2/2$, so $b'(\theta) = \theta = \mu$ and $b''(\theta) = 1$, with $a(\phi) = \phi = \sigma^2$. Then
\[ Var(Y) = b''(\theta)a(\phi) = (1)(\sigma^2) = \sigma^2: \]
the variance does not depend on the parameter that defines the mean.

Binomial distribution: $Y \sim Bin(n, \pi)$: $\theta = \log\left(\frac{\pi}{1-\pi}\right)$ and $b(\theta) = n\log(1+e^\theta)$, so
\[ b'(\theta) = \frac{ne^\theta}{1+e^\theta} = n\pi = E(Y), \qquad b''(\theta) = \frac{ne^\theta}{[1+e^\theta]^2}, \qquad a(\phi) = 1, \]
and
\[ Var(Y) = \frac{ne^\theta}{[1+e^\theta]^2}(1) = n\,\frac{e^\theta}{1+e^\theta}\,\frac{1}{1+e^\theta} = n\pi(1-\pi): \]
here $Var(Y)$ is completely determined by the parameter that defines the mean.
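As a quick numerical check of the two identities just derived, the sketch below approximates $b'(\theta)$ and $b''(\theta)$ by finite differences for the binomial case and compares them with $n\pi$ and $n\pi(1-\pi)$. It is written in R syntax, which is essentially the same as the S-PLUS environment these notes refer to; the values $n = 60$ and $\pi = 0.3$ are arbitrary choices, not from the notes.

```r
# Finite-difference check of E(Y) = b'(theta) and Var(Y) = a(phi) b''(theta)
# for the binomial family, where b(theta) = n log(1 + e^theta) and a(phi) = 1.
n <- 60
p <- 0.3                                  # pi, the success probability
theta <- log(p / (1 - p))                 # canonical parameter
b <- function(t) n * log(1 + exp(t))      # b(theta) for Bin(n, pi)
h <- 1e-4                                 # step size for finite differences
b1 <- (b(theta + h) - b(theta - h)) / (2 * h)             # approx. b'(theta)
b2 <- (b(theta + h) - 2 * b(theta) + b(theta - h)) / h^2  # approx. b''(theta)
c(b1, n * p)              # both approx. 18.0 :  E(Y)  = n pi
c(b2, n * p * (1 - p))    # both approx. 12.6 : Var(Y) = n pi (1 - pi)
```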
Poisson distribution: $Y \sim Poisson(\lambda)$: $b(\theta) = e^\theta$, so
\[ b'(\theta) = e^\theta = \lambda = E(Y), \qquad b''(\theta) = e^\theta, \qquad a(\phi) = 1 \;\Rightarrow\; Var(Y) = e^\theta = \lambda = E(Y). \]

Gamma distribution: $b(\theta) = -\log(-\theta)$, so
\[ b'(\theta) = -\frac{1}{\theta} \;\Rightarrow\; E(Y) = -\frac{1}{\theta} = \alpha\beta, \qquad b''(\theta) = \frac{1}{\theta^2}, \qquad a(\phi) = \phi = \frac{1}{\alpha}, \]
and
\[ Var(Y) = a(\phi)\,b''(\theta) = \frac{\phi}{\theta^2} = \frac{1}{\alpha}[E(Y)]^2 = \alpha\beta^2. \]

Canonical link function: the link function $h(\mu) = X^T\beta$ that corresponds to setting $\theta = X^T\beta$, where
\[ \mu = E(Y) = b'(\theta) = \frac{\partial b(\theta)}{\partial\theta}. \]

Binomial distribution: with $\theta = X^T\beta$,
\[ \mu = E(Y) = b'(\theta) = \frac{ne^\theta}{1+e^\theta} = n\pi \;\Rightarrow\; \theta = \log\left(\frac{\pi}{1-\pi}\right) = X^T\beta. \]
Hence, the canonical (default) link is the logit link function
\[ h(\pi) = \log\left(\frac{\pi}{1-\pi}\right) = X^T\beta. \]
Other available link functions:
Probit: $h(\pi) = \Phi^{-1}(\pi) = X^T\beta$, where
\[ \pi = \Phi(X^T\beta) = \int_{-\infty}^{X^T\beta} \frac{1}{\sqrt{2\pi}}\,e^{-u^2/2}\,du. \]
Complementary log-log (cloglog): $h(\pi) = \log[-\log(1-\pi)] = X^T\beta$. The inverse of this link function is
\[ \pi = 1 - \exp[-\exp(X^T\beta)]. \]

The glm() function: glm(model, family, data, weights, control), with
family = binomial
family = binomial(link=logit)
family = binomial(link=probit)
family = binomial(link=cloglog)

Normal (Gaussian) distribution: with $\theta = X^T\beta$, $\mu = E(Y) = b'(\theta) = \theta = X^T\beta$, so the canonical link function is the identity function $h(\mu) = \mu = X^T\beta$.

Poisson regression (log-linear models): $Y \sim Poisson(\lambda)$: with $\theta = X^T\beta$, $\lambda = E(Y) = b'(\theta) = \exp(\theta)$, and the canonical link function is
\[ \theta = \log(\lambda) = X^T\beta. \]
This is called the log link function. Other available link functions:
identity link: $h(\lambda) = \lambda = X^T\beta$
square root link: $h(\lambda) = \sqrt{\lambda} = X^T\beta$
glm(model, family, data, weights, control), with
family = poisson
family = poisson(link=log)
family = poisson(link=identity)
family = poisson(link=sqrt)

Inverse Gaussian distribution: with $\theta = X^T\beta$ and $b(\theta) = -\sqrt{-2\theta}$,
\[ \mu = E(Y) = b'(\theta) = \frac{1}{\sqrt{-2\theta}}. \]
The canonical link is
\[ h(\mu) = \frac{1}{\mu^2} = X^T\beta. \]
This is the only built-in link function for the inverse Gaussian distribution:
glm(model, family, data, weights, control), with family = inverse.gaussian(link=1/mu^2).

Gamma distribution: with $\theta = X^T\beta$, $E(Y) = \mu = b'(\theta) = -1/\theta$, so the canonical link function is
\[ h(\mu) = \frac{1}{\mu} = X^T\beta. \]
This is called the "inverse" link.
glm(model, family, data, weights, control), with
family = Gamma
family = Gamma(link=inverse)
family = Gamma(link=identity)
family = Gamma(link=log)

Normal-theory Gauss-Markov model: $\mathbf{Y} \sim N(X\beta, \sigma^2 I)$, or $Y_1, Y_2, \ldots, Y_n$ are independent with $Y_j \sim N(X_j^T\beta, \sigma^2)$. Then
\[ f(Y_j; \theta_j, \phi) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(Y_j - X_j^T\beta)^2}{2\sigma^2} \right\} = \exp\left\{ \frac{Y_j(X_j^T\beta) - (X_j^T\beta)^2/2}{\sigma^2} - \frac{1}{2}\left[ \frac{Y_j^2}{\sigma^2} + \log(2\pi\sigma^2) \right] \right\}, \]
where the last bracketed term is $C(Y_j, \phi)$. Here,
\[ \theta_j = X_j^T\beta, \qquad b(\theta_j) = \frac{(X_j^T\beta)^2}{2}, \qquad \phi = \sigma^2, \qquad a(\phi) = \phi, \qquad E(Y_j) = \mu_j = X_j^T\beta, \]
and the link function is the identity function $h(\mu_j) = \mu_j = X_j^T\beta$.

Hence, the class of generalized linear models includes the class of linear models:
(i) (Systematic part of the model, identity link function) $\mu_j = E(Y_j) = X_j^T\beta = \beta_1X_{1j} + \cdots + \beta_pX_{pj}$
(ii) (Stochastic part of the model) $Y_j \sim N(\mu_j, \sigma^2)$ and $Y_1, Y_2, \ldots, Y_n$ are independent.

Logistic Regression: $Y_1, Y_2, \ldots, Y_8$ are independent random counts with $Y_j \sim Bin(n_j, \pi_j)$ and
\[ \log\left(\frac{\pi_j}{1-\pi_j}\right) = \beta_0 + \beta_1X_j. \]
Comments:
* Here the logit link function is the canonical link function.
* Use of the binomial distribution follows from:
(i) each beetle exposed to log-dose $X_j$ responds independently;
(ii) each beetle exposed to log-dose $X_j$ has probability $\pi_j$ of dying.
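Before turning to the beetle example, the sketch below collects the family/link combinations listed above as concrete calls, in R syntax; the S-PLUS calls in these notes are essentially identical, except that the gamma family is spelled Gamma, with a capital G, to distinguish it from the gamma function. The simulated data are purely illustrative, chosen only so that every call runs; none of these fits corresponds to an example in the notes.

```r
# Toy data, simulated purely so that each glm() call below is runnable.
set.seed(1)
x    <- seq(0.1, 2, length = 30)
ybin <- rbinom(30, size = 1, prob = 0.5)                   # 0/1 response
ycnt <- rpois(30, lambda = exp(1 + 0.5 * x))               # counts
ypos <- rgamma(30, shape = 5, rate = 5 * (0.5 + 0.4 * x))  # positive response

glm(ybin ~ x, family = binomial)                   # logit link (canonical)
glm(ybin ~ x, family = binomial(link = "probit"))
glm(ybin ~ x, family = binomial(link = "cloglog"))
glm(ycnt ~ x, family = poisson)                    # log link (canonical)
glm(ycnt ~ x, family = poisson(link = "sqrt"))
glm(ypos ~ x, family = Gamma(link = "inverse"))    # canonical "inverse" link
glm(ypos ~ x, family = Gamma(link = "log"))
glm(ypos ~ x, family = gaussian)                   # identity link
inverse.gaussian(link = "1/mu^2")  # the only built-in inverse Gaussian link
```

In SAS, the same combinations are selected with the DIST= and LINK= options on the MODEL statement of PROC GENMOD.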
Example 13.1. Mortality of a certain species of beetle after 5 hours of exposure to gaseous carbon disulfide (Bliss, 1935).

Dose     Log(Dose)   Number          Number of                Number
(mg/l)   $(X_j)$     killed $(Y_j)$  survivors $(n_j - Y_j)$  exposed $(n_j)$
49.057   3.893        6              53                       59
52.991   3.970       13              47                       60
56.911   4.041       18              44                       62
60.842   4.108       28              28                       56
64.759   4.171       52              11                       63
68.691   4.230       53               6                       59
72.611   4.258       61               1                       62
76.542   4.338       60               0                       60

Maximum Likelihood Estimation: the joint likelihood for $Y_1, Y_2, \ldots, Y_k$ is
\[ L(\beta; \mathbf{Y}, X) = \prod_{j=1}^k f(Y_j; \theta_j, \phi) = \prod_{j=1}^k \exp\left\{ \frac{Y_j\theta_j - b(\theta_j)}{a(\phi)} + C(Y_j, \phi) \right\} = \prod_{j=1}^k \exp\left\{ Y_j(\beta_0+\beta_1X_j) - n_j\log(1+e^{\beta_0+\beta_1X_j}) + \log\binom{n_j}{Y_j} \right\}, \]
since
\[ \phi \equiv 1,\ a(\phi) = 1, \qquad \theta_j = \log\left(\frac{\pi_j}{1-\pi_j}\right) = \beta_0+\beta_1X_j, \qquad b(\theta_j) = n_j\log(1+e^{\beta_0+\beta_1X_j}), \qquad C(Y_j, \phi) = \log\binom{n_j}{Y_j}. \]

Log-likelihood function:
\[ \ell(\beta) = \sum_{j=1}^k \left[ \frac{Y_j\theta_j - b(\theta_j)}{a(\phi)} + C(Y_j, \phi) \right] = \sum_{j=1}^k \left[ Y_j(\beta_0+\beta_1X_j) - n_j\log(1+e^{\beta_0+\beta_1X_j}) + \log\binom{n_j}{Y_j} \right] \]
\[ = \beta_0\sum_{j=1}^k Y_j + \beta_1\sum_{j=1}^k Y_jX_j - \sum_{j=1}^k n_j\log(1+e^{\beta_0+\beta_1X_j}) + \sum_{j=1}^k \log\binom{n_j}{Y_j}. \]
To maximize the log-likelihood, solve the system of equations obtained by setting the first partial derivatives equal to zero. The likelihood equations (or score function) can be expressed as
\[ \begin{bmatrix} 0 \\ 0 \end{bmatrix} = X^T(\mathbf{Y} - \mathbf{m}) = Q \quad (\text{the score function}), \]
where
\[ X = \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_k \end{bmatrix}, \qquad \mathbf{Y} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_k \end{bmatrix}, \qquad \mathbf{m} = \begin{bmatrix} n_1\pi_1 \\ n_2\pi_2 \\ \vdots \\ n_k\pi_k \end{bmatrix} \quad (k = 8 \text{ here}), \]
and from the inverse link function,
\[ \pi_j = \frac{\exp(\beta_0+\beta_1X_j)}{1+\exp(\beta_0+\beta_1X_j)}. \]
We will use $\hat{Q}$, $\hat{\mathbf{m}}$, and $\hat\pi_j$ to indicate when these quantities are evaluated at $\beta = \hat\beta$.

Estimating equations (likelihood equations):
\[ 0 = \frac{\partial\ell(\hat\beta)}{\partial\beta_0} = \sum_{j=1}^8 Y_j - \sum_{j=1}^8 n_j\frac{\exp(\hat\beta_0+\hat\beta_1X_j)}{1+\exp(\hat\beta_0+\hat\beta_1X_j)} = \sum_{j=1}^8 (Y_j - n_j\hat\pi_j), \]
\[ 0 = \frac{\partial\ell(\hat\beta)}{\partial\beta_1} = \sum_{j=1}^8 X_jY_j - \sum_{j=1}^8 n_jX_j\frac{\exp(\hat\beta_0+\hat\beta_1X_j)}{1+\exp(\hat\beta_0+\hat\beta_1X_j)} = \sum_{j=1}^8 X_j(Y_j - n_j\hat\pi_j). \]
- Generally there is no closed-form expression for the solution.
- If a solution exists in the interior of the parameter space ($0 < \pi_j < 1$ for all $j = 1, \ldots, k$), then it is unique.

Second partial derivatives of the log-likelihood function (the Hessian matrix):
\[ H = \begin{bmatrix} \frac{\partial^2\ell(\beta)}{\partial\beta_0^2} & \frac{\partial^2\ell(\beta)}{\partial\beta_0\partial\beta_1} \\ \frac{\partial^2\ell(\beta)}{\partial\beta_0\partial\beta_1} & \frac{\partial^2\ell(\beta)}{\partial\beta_1^2} \end{bmatrix} = -X^TVX, \]
where
\[ V = \begin{bmatrix} n_1\pi_1(1-\pi_1) & & \\ & \ddots & \\ & & n_k\pi_k(1-\pi_k) \end{bmatrix} = Var(\mathbf{Y}). \]
We will use $\hat{H} = -X^T\hat{V}X$ and $\hat{V}$ to denote $H$ and $V$ evaluated at $\beta = \hat\beta$.

Fisher information matrix:
\[ \mathcal{I} = \begin{bmatrix} -E\,\frac{\partial^2\ell(\beta)}{\partial\beta_0^2} & -E\,\frac{\partial^2\ell(\beta)}{\partial\beta_0\partial\beta_1} \\ -E\,\frac{\partial^2\ell(\beta)}{\partial\beta_0\partial\beta_1} & -E\,\frac{\partial^2\ell(\beta)}{\partial\beta_1^2} \end{bmatrix} = -E(H) = X^TVX = -H, \]
since the second partial derivatives do not depend on $\mathbf{Y} = (Y_1, \ldots, Y_k)^T$.

Iterative procedures for solving the likelihood equations: start with some initial values
\[ \hat\beta^{(0)} = \begin{bmatrix} \hat\beta_0^{(0)} \\ \hat\beta_1^{(0)} \end{bmatrix}, \quad \text{where} \quad \hat\beta_0^{(0)} = \log\left(\frac{\bar{P}}{1-\bar{P}}\right), \quad \hat\beta_1^{(0)} = 0, \quad \bar{P} = \frac{\sum_{j=1}^k Y_j}{\sum_{j=1}^k n_j}. \]

Newton-Raphson algorithm:
\[ \hat\beta^{(s+1)} = \hat\beta^{(s)} - \hat{H}^{-1}\hat{Q}, \]
where $\hat{H}$ and $\hat{Q}$ are evaluated at $\hat\beta^{(s)}$.

Fisher scoring algorithm:
\[ \hat\beta^{(s+1)} = \hat\beta^{(s)} + \hat{\mathcal{I}}^{-1}\hat{Q}. \]
For the logistic regression model with binomial counts, $\hat{\mathcal{I}} = -\hat{H}$, so the two algorithms coincide.
Modification:
- Check if the log-likelihood is larger at $\hat\beta^{(s+1)}$ than at $\hat\beta^{(s)}$.
- If it is, go to the next iteration.
- If not, invoke a halving step:
\[ \hat\beta^{(s+1)}_{new} = \frac{1}{2}\left( \hat\beta^{(s)} + \hat\beta^{(s+1)}_{old} \right). \]

Large sample inference (see results 9.1-9.3): from result 9.2,
\[ \hat\beta \;\dot\sim\; N\left(\beta,\ (X^TVX)^{-1}\right) \]
when $n_1, n_2, \ldots, n_k$ are all large; here "large" means $n_j\pi_j > 5$ and $n_j(1-\pi_j) > 5$. Estimate the covariance matrix as
\[ \widehat{Var}(\hat\beta) = (X^T\hat{V}X)^{-1}, \]
where $\hat{V}$ is $V$ evaluated at $\hat\beta$.

For the Bliss beetle data:
\[ \hat\beta = \begin{bmatrix} \hat\beta_0 \\ \hat\beta_1 \end{bmatrix} = \begin{bmatrix} -60.717 \\ 14.883 \end{bmatrix} \quad \text{with} \quad \widehat{Var}(\hat\beta) = \begin{bmatrix} S^2_{\hat\beta_0} & S_{\hat\beta_0,\hat\beta_1} \\ S_{\hat\beta_0,\hat\beta_1} & S^2_{\hat\beta_1} \end{bmatrix} = \begin{bmatrix} 26.84 & -6.55 \\ -6.55 & 1.60 \end{bmatrix}. \]
Approximate $(1-\alpha)100\%$ confidence intervals are
\[ \hat\beta_0 \pm Z_{\alpha/2}S_{\hat\beta_0} \qquad \text{and} \qquad \hat\beta_1 \pm Z_{\alpha/2}S_{\hat\beta_1}, \]
where $Z_{\alpha/2}$ is taken from the standard normal distribution, e.g., $Z_{.025} = 1.96$.
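The sketch below (R syntax, essentially the same as S-PLUS) carries out the Fisher scoring iteration just described on the Bliss beetle data, starting from $\hat\beta^{(0)} = (\log(\bar{P}/(1-\bar{P})),\ 0)$; the halving-step modification is omitted because this example converges without it. The built-in glm() fit is included as a check.

```r
# Fisher scoring for the Bliss beetle logistic regression.
logdose <- c(3.893, 3.970, 4.041, 4.108, 4.171, 4.230, 4.258, 4.338)
killed  <- c(6, 13, 18, 28, 52, 53, 61, 60)
exposed <- c(59, 60, 62, 56, 63, 59, 62, 60)

X    <- cbind(1, logdose)
Pbar <- sum(killed) / sum(exposed)
beta <- c(log(Pbar / (1 - Pbar)), 0)                  # starting values
for (s in 1:25) {
  eta   <- drop(X %*% beta)
  pihat <- exp(eta) / (1 + exp(eta))                  # inverse logit link
  Q     <- drop(t(X) %*% (killed - exposed * pihat))  # score  X'(Y - m)
  Info  <- t(X) %*% (exposed * pihat * (1 - pihat) * X)  # information X'VX
  beta  <- beta + drop(solve(Info, Q))                # Fisher scoring update
}
beta                     # approximately (-60.72, 14.88)
sqrt(diag(solve(Info)))  # approximately (  5.18,  1.27)

# Check against the built-in fit:
glm(cbind(killed, exposed - killed) ~ logdose, family = binomial)
```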
Wald tests:
Test $H_0: \beta_0 = 0$ vs. $H_A: \beta_0 \neq 0$:
\[ X^2 = \left[ \frac{\hat\beta_0 - 0}{S_{\hat\beta_0}} \right]^2 = \left[ \frac{-60.717}{5.18} \right]^2 = 137.36, \qquad \text{p-value} = Pr\{\chi^2_{(1)} > 137.36\} < .0001. \]
Test $H_0: \beta_1 = 0$ vs. $H_A: \beta_1 \neq 0$:
\[ X^2 = \left[ \frac{\hat\beta_1 - 0}{S_{\hat\beta_1}} \right]^2 = \left[ \frac{14.883}{1.265} \right]^2 = 138.49, \qquad \text{p-value} = Pr\{\chi^2_{(1)} > 138.49\} < .0001. \]

Likelihood ratio (deviance) tests: consider
Model A: $\log\left(\frac{\pi_j}{1-\pi_j}\right) = \beta_0$ and $Y_j \sim Bin(n_j, \pi_j)$
Model B: $\log\left(\frac{\pi_j}{1-\pi_j}\right) = \beta_0 + \beta_1X_j$ and $Y_j \sim Bin(n_j, \pi_j)$
Model C: $\log\left(\frac{\pi_j}{1-\pi_j}\right) = \beta_0 + \delta_j$ and $Y_j \sim Bin(n_j, \pi_j)$
Model C simply says that $0 < \pi_j < 1$, $j = 1, \ldots, k$.

Null deviance:
\[ Deviance_{Null} = -2\left[ \text{log-likelihood}_{Model\ A} - \text{log-likelihood}_{Model\ C} \right] = 2\left[ \sum_{j=1}^k Y_j\log\left(\frac{P_j}{\bar{P}}\right) + \sum_{j=1}^k (n_j - Y_j)\log\left(\frac{1-P_j}{1-\bar{P}}\right) \right] \]
with $(k-1)$ d.f., where
\[ P_j = \frac{Y_j}{n_j},\ j = 1, \ldots, k, \qquad \bar{P} = \frac{\sum_{j=1}^k n_jP_j}{\sum_{j=1}^k n_j} = \frac{\sum_{j=1}^k Y_j}{\sum_{j=1}^k n_j} \]
are the m.l.e.'s of $\pi_j$ under Models C and A, respectively.

LOOSE deviance (the likelihood ratio test of Model A against Model B, i.e., of $H_0: \beta_1 = 0$):
\[ Deviance_{LOOSE} = -2\left[ \text{log-likelihood}_{Model\ A} - \text{log-likelihood}_{Model\ B} \right] = 2\left[ \sum_{j=1}^k Y_j\log\left(\frac{\hat\pi_j}{\bar{P}}\right) + \sum_{j=1}^k (n_j-Y_j)\log\left(\frac{1-\hat\pi_j}{1-\bar{P}}\right) \right] = 272.7803 \]
with $(2-1) = 1$ d.f., where
\[ \hat\pi_j = \frac{\exp(\hat\beta_0+\hat\beta_1X_j)}{1+\exp(\hat\beta_0+\hat\beta_1X_j)} \qquad \text{and} \qquad \bar{P} = \frac{\sum_{j=1}^k Y_j}{\sum_{j=1}^k n_j}. \]

Lack-of-fit test:
$H_0$: $\log\left(\frac{\pi_j}{1-\pi_j}\right) = \beta_0 + \beta_1X_j$ (Model B) vs. $H_A$: $\log\left(\frac{\pi_j}{1-\pi_j}\right) = \beta_0 + \delta_j$ (Model C):
\[ G^2 = deviance_{lack\text{-}of\text{-}fit} = Deviance_{Null} - Deviance_{LOOSE} = 2\left[ \sum_{j=1}^k Y_j\log\left(\frac{P_j}{\hat\pi_j}\right) + \sum_{j=1}^k (n_j-Y_j)\log\left(\frac{1-P_j}{1-\hat\pi_j}\right) \right] = 11.42 \]
on $[(k-1)-1] = k-2$ d.f. ($= 8-2 = 6$ d.f.), with p-value $= 0.0762$. (Equivalently, $Deviance_{Null} = 272.7803 + 11.42 = 284.20$ on $k-1 = 7$ d.f.)

Pearson chi-square test:
\[ X^2 = \sum_{j=1}^k \frac{(Y_j - n_j\hat\pi_j)^2}{n_j\hat\pi_j} + \sum_{j=1}^k \frac{(n_j - Y_j - n_j(1-\hat\pi_j))^2}{n_j(1-\hat\pi_j)} = \sum_{j=1}^k \frac{n_j(P_j - \hat\pi_j)^2}{\hat\pi_j} + \sum_{j=1}^k \frac{n_j([1-P_j] - [1-\hat\pi_j])^2}{1-\hat\pi_j} = 10.223 \]
with d.f. $= k-2 = 6$ and p-value $= .1155$.

Estimates of expected counts under Model B:
$j$   $n_j\hat\pi_j$   $n_j(1-\hat\pi_j)$
1      3.43      55.57
2      9.78      50.22
3     22.66      39.34
4     33.83      22.17
5     50.05      12.95
6     53.27       5.73
7     59.22       2.78
8     58.74       1.26
Cochran's rule: all expected counts should be larger than one, and at least 80% should be larger than 5.

Estimate mortality rates:
\[ \hat\pi_j = \frac{\exp(\hat\beta_0+\hat\beta_1X_j)}{1+\exp(\hat\beta_0+\hat\beta_1X_j)}. \]
Use the delta method to compute an approximate standard error:
\[ \frac{\partial\hat\pi_j}{\partial\hat\beta_0} = \frac{\exp(\hat\beta_0+\hat\beta_1X_j)}{[1+\exp(\hat\beta_0+\hat\beta_1X_j)]^2} = \hat\pi_j(1-\hat\pi_j), \qquad \frac{\partial\hat\pi_j}{\partial\hat\beta_1} = \frac{X_j\exp(\hat\beta_0+\hat\beta_1X_j)}{[1+\exp(\hat\beta_0+\hat\beta_1X_j)]^2} = X_j\hat\pi_j(1-\hat\pi_j). \]
Define
\[ G_j = \left[ \frac{\partial\hat\pi_j}{\partial\hat\beta_0} \;\; \frac{\partial\hat\pi_j}{\partial\hat\beta_1} \right] = \left[ \hat\pi_j(1-\hat\pi_j) \;\;\; \hat\pi_j(1-\hat\pi_j)X_j \right]. \]
Then $S^2_{\hat\pi_j} = G_j[\widehat{Var}(\hat\beta)]G_j^T$, an approximate standard error is
\[ S_{\hat\pi_j} = \sqrt{G_j[\widehat{Var}(\hat\beta)]G_j^T}, \]
and an approximate $(1-\alpha)100\%$ confidence interval for $\pi_j$ is $\hat\pi_j \pm Z_{\alpha/2}S_{\hat\pi_j}$.

Estimate the log-dose that provides a certain mortality rate, say $\pi_0$:
\[ \log\left(\frac{\pi_0}{1-\pi_0}\right) = \beta_0 + \beta_1X_0 \;\Rightarrow\; X_0 = \frac{\log\left(\frac{\pi_0}{1-\pi_0}\right) - \beta_0}{\beta_1}. \]
The m.l.e. for $X_0$ is
\[ \hat{X}_0 = \frac{\log\left(\frac{\pi_0}{1-\pi_0}\right) - \hat\beta_0}{\hat\beta_1}. \]
Use the delta method to obtain an approximate standard error:
\[ \frac{\partial\hat{X}_0}{\partial\hat\beta_0} = -\frac{1}{\hat\beta_1}, \qquad \frac{\partial\hat{X}_0}{\partial\hat\beta_1} = -\frac{\log\left(\frac{\pi_0}{1-\pi_0}\right) - \hat\beta_0}{\hat\beta_1^2}. \]
Define $\hat{G} = \left[ \frac{\partial\hat{X}_0}{\partial\hat\beta_0} \;\; \frac{\partial\hat{X}_0}{\partial\hat\beta_1} \right]$ and compute
\[ S^2_{\hat{X}_0} = \hat{G}\left[\widehat{Var}(\hat\beta)\right]\hat{G}^T. \]
The standard error is $S_{\hat{X}_0} = \sqrt{\hat{G}[\widehat{Var}(\hat\beta)]\hat{G}^T}$, and an approximate $(1-\alpha)100\%$ confidence interval for $X_0$ is $\hat{X}_0 \pm Z_{\alpha/2}S_{\hat{X}_0}$.

Suppose you want a 95% confidence interval for the dose that provides a mortality rate of $\pi_0$: $dose_0 = \exp(X_0)$. The m.l.e. for the dose is $\widehat{dose}_0 = \exp(\hat{X}_0)$. Apply the delta method:
\[ \frac{\partial\exp(\hat{X}_0)}{\partial\hat{X}_0} = \exp(\hat{X}_0), \]
so
\[ S^2_{dose} = [\exp(\hat{X}_0)]^2\,\widehat{Var}(\hat{X}_0), \qquad S_{dose} = [\exp(\hat{X}_0)]\,S_{\hat{X}_0} = (\widehat{dose}_0)\,S_{\hat{X}_0}, \]
and an approximate $(1-\alpha)100\%$ confidence interval for $dose_0$ is
\[ \exp(\hat{X}_0) \pm Z_{\alpha/2}[\exp(\hat{X}_0)]S_{\hat{X}_0}. \]

Fit a probit model to the Bliss beetle data:
\[ \Phi^{-1}(\pi_j) = \beta_0 + \beta_1X_j, \]
where $X_j = \log(dose)$ and $Y_j \sim Bin(n_j, \pi_j)$, $j = 1, 2, \ldots, k$. Note that
\[ \pi_j = \int_{-\infty}^{\beta_0+\beta_1X_j} \frac{1}{\sqrt{2\pi}}\,e^{-t^2/2}\,dt. \]

Fit the complementary log-log model to the Bliss beetle data:
\[ \log[-\log(1-\pi_j)] = \beta_0 + \beta_1X_j, \]
where $X_j = \log(dose)$ and $Y_j \sim Bin(n_j, \pi_j)$, $j = 1, 2, \ldots, k$. Note that
\[ \pi_j = 1 - \exp[-e^{\beta_0+\beta_1X_j}]. \]
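The sketch below (R syntax) carries out the delta-method interval for $X_0$ and $dose_0$ just derived. Here $\pi_0 = 0.5$ (the LD50) is an arbitrary illustrative choice, and vcov() supplies $\widehat{Var}(\hat\beta)$.

```r
# Delta-method CI for the log-dose X0 giving mortality rate pi0 = 0.5.
logdose <- c(3.893, 3.970, 4.041, 4.108, 4.171, 4.230, 4.258, 4.338)
killed  <- c(6, 13, 18, 28, 52, 53, 61, 60)
exposed <- c(59, 60, 62, 56, 63, 59, 62, 60)

fit  <- glm(cbind(killed, exposed - killed) ~ logdose, family = binomial)
b    <- unname(coef(fit))     # (beta0_hat, beta1_hat)
V    <- vcov(fit)             # estimated covariance matrix of beta_hat
pi0  <- 0.5
X0   <- (log(pi0 / (1 - pi0)) - b[1]) / b[2]                # m.l.e. of X0
G    <- c(-1 / b[2],                                        # dX0/dbeta0
          -(log(pi0 / (1 - pi0)) - b[1]) / b[2]^2)          # dX0/dbeta1
seX0 <- drop(sqrt(G %*% V %*% G))                           # delta method
X0 + c(-1, 1) * 1.96 * seX0                  # 95% CI for X0 = log(dose0)
exp(X0) + c(-1, 1) * 1.96 * exp(X0) * seX0   # 95% CI for dose0 (mg/l)
```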
Poisson Regression (Log-linear models)

Example 13.2. Independent Poisson counts: $Y_{jk} \sim Poisson(m_j)$, $j = 1, \ldots, 5$, $k = 1, \ldots, 3$, where
\[ Pr\{Y_{jk} = y\} = \frac{m_j^y e^{-m_j}}{y!} \quad \text{for } y = 0, 1, 2, \ldots, \]
so that $E(Y_{jk}) = m_j = Var(Y_{jk})$. Here, $Y_{jk}$ is the number of TA98 salmonella colonies on the $k$-th plate with the $j$-th dosage level of quinoline, and $X_j = \log_{10}(\text{dose of quinoline } (\mu g))$:
\[ X_1 = 0, \quad X_2 = 1.0, \quad X_3 = 1.5, \quad X_4 = 2.0, \quad X_5 = 2.5, \]
with $k = 1, 2, 3$ plates at each dose level.

Log-link function:
\[ \log(m_j) = \beta_0 + \beta_1X_j. \]

Joint likelihood function:
\[ L(\beta) = \prod_{j=1}^5\prod_{k=1}^3 \frac{e^{-m_j}m_j^{Y_{jk}}}{Y_{jk}!}. \]

Log-likelihood function:
\[ \ell(\beta) = \sum_{j=1}^5\sum_{k=1}^3 \left[ Y_{jk}\log(m_j) - m_j - \log(Y_{jk}!) \right] = \sum_{j=1}^5\sum_{k=1}^3 \left[ Y_{jk}(\beta_0+\beta_1X_j) - e^{\beta_0+\beta_1X_j} - \log(Y_{jk}!) \right] \]
\[ = \beta_0Y_{\cdot\cdot} + \beta_1\sum_{j=1}^5 X_jY_{j\cdot} - 3\sum_{j=1}^5 e^{\beta_0+\beta_1X_j} - \sum_{j=1}^5\sum_{k=1}^3 \log(Y_{jk}!). \]

Likelihood equations:
\[ 0 = \frac{\partial\ell(\beta)}{\partial\beta_0} = Y_{\cdot\cdot} - 3\sum_{j=1}^5 e^{\beta_0+\beta_1X_j}, \qquad 0 = \frac{\partial\ell(\beta)}{\partial\beta_1} = \sum_{j=1}^5 X_jY_{j\cdot} - 3\sum_{j=1}^5 X_je^{\beta_0+\beta_1X_j}. \]

Use the Fisher scoring algorithm. Define
\[ \mathbf{Y} = \begin{bmatrix} Y_{11} \\ Y_{12} \\ Y_{13} \\ Y_{21} \\ \vdots \\ Y_{53} \end{bmatrix}, \qquad X = \begin{bmatrix} 1 & X_1 \\ 1 & X_1 \\ 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_5 \end{bmatrix}, \qquad Q = \begin{bmatrix} \frac{\partial\ell(\beta)}{\partial\beta_0} \\ \frac{\partial\ell(\beta)}{\partial\beta_1} \end{bmatrix} = X^T(\mathbf{Y} - \mathbf{m}), \]
where
\[ \mathbf{m} = (m_1, m_1, m_1, m_2, \ldots, m_5)^T = \exp(X\beta). \]

Second partial derivatives of $\ell(\beta)$:
\[ \frac{\partial^2\ell(\beta)}{\partial\beta_0^2} = -3\sum_{j=1}^5 e^{\beta_0+\beta_1X_j} = -\sum_j\sum_k m_j, \qquad \frac{\partial^2\ell(\beta)}{\partial\beta_0\partial\beta_1} = -3\sum_{j=1}^5 X_je^{\beta_0+\beta_1X_j} = -\sum_j\sum_k X_jm_j, \]
\[ \frac{\partial^2\ell(\beta)}{\partial\beta_1^2} = -3\sum_{j=1}^5 X_j^2e^{\beta_0+\beta_1X_j} = -\sum_j\sum_k X_j^2m_j. \]

Fisher information matrix:
\[ \mathcal{I} = \begin{bmatrix} -E\,\frac{\partial^2\ell(\beta)}{\partial\beta_0^2} & -E\,\frac{\partial^2\ell(\beta)}{\partial\beta_0\partial\beta_1} \\ -E\,\frac{\partial^2\ell(\beta)}{\partial\beta_0\partial\beta_1} & -E\,\frac{\partial^2\ell(\beta)}{\partial\beta_1^2} \end{bmatrix} = \begin{bmatrix} \sum_j\sum_k m_j & \sum_j\sum_k X_jm_j \\ \sum_j\sum_k X_jm_j & \sum_j\sum_k X_j^2m_j \end{bmatrix} = X^TVX = -H, \]
where
\[ V = Var(\mathbf{Y}) = \begin{bmatrix} m_1 & & & \\ & m_1 & & \\ & & \ddots & \\ & & & m_5 \end{bmatrix}, \]
since the second partial derivatives do not depend on $\mathbf{Y}$.

Iterations:
\[ \beta^{(s+1)} = \beta^{(s)} + \mathcal{I}^{-1}Q, \]
using halving steps as needed.

Starting values: use the m.l.e.s for the model with $\beta_1 = 0$, i.e., $Y_{jk} \sim Poisson(m)$ are i.i.d. with $m = e^{\beta_0}$. Then
\[ \hat{m}^{(0)} = \bar{Y}_{\cdot\cdot}, \qquad \hat\beta_0^{(0)} = \log(\bar{Y}_{\cdot\cdot}), \qquad \hat\beta_1^{(0)} = 0. \]

Large sample normal approximation for the distribution of $\hat\beta$:
\[ \hat\beta \;\dot\sim\; N(\beta, \mathcal{I}^{-1}); \]
estimate $\mathcal{I}$ by substituting $\hat\beta$ for $\beta$ to obtain $\hat{\mathcal{I}} = X^T\hat{V}X$ with $\hat{V} = diag(\hat{m}_1, \hat{m}_1, \hat{m}_1, \hat{m}_2, \ldots, \hat{m}_5)$, so that $\widehat{Var}(\hat\beta) = (X^T\hat{V}X)^{-1}$.

Estimate the mean number of colonies for a specific $\log_{10}(\text{dose})$ of quinoline, say $X$: the m.l.e. is
\[ \hat{m} = e^{\hat\beta_0+\hat\beta_1X}. \]
Apply the delta method:
\[ \frac{\partial\hat{m}}{\partial\hat\beta_0} = e^{\hat\beta_0+\hat\beta_1X} = \hat{m}, \qquad \frac{\partial\hat{m}}{\partial\hat\beta_1} = Xe^{\hat\beta_0+\hat\beta_1X} = X\hat{m}. \]
Define
\[ \hat{G} = \left[ \frac{\partial\hat{m}}{\partial\hat\beta_0} \;\; \frac{\partial\hat{m}}{\partial\hat\beta_1} \right] = [\hat{m} \;\;\; X\hat{m}]. \]
Then the estimated variance of $\hat{m}$ is
\[ S^2_{\hat{m}} = \hat{G}\hat{\mathcal{I}}^{-1}\hat{G}^T = \hat{G}[X^T\hat{V}X]^{-1}\hat{G}^T. \]
The standard error is $S_{\hat{m}} = \sqrt{S^2_{\hat{m}}}$, and an approximate $(1-\alpha)100\%$ confidence interval is $\hat{m} \pm Z_{\alpha/2}S_{\hat{m}}$.

Four kinds of residuals:

1. Response residuals: $Y_i - \hat\mu_i$, i.e., $Y_i - n_i\hat\pi_i$ (binomial) or $Y_i - \hat\lambda_i$ (Poisson).

2. Pearson residuals:
\[ \frac{Y_i - \hat\mu_i}{\sqrt{\widehat{Var}(Y_i)}} = \frac{Y_i - n_i\hat\pi_i}{\sqrt{n_i\hat\pi_i(1-\hat\pi_i)}} \ (\text{binomial}), \qquad = \frac{Y_i - \hat\lambda_i}{\sqrt{\hat\lambda_i}} \ (\text{Poisson}). \]

3. Working residuals:
\[ (Y_i - \hat\mu_i)\frac{\partial\hat\eta_i}{\partial\hat\mu_i} = \frac{Y_i - n_i\hat\pi_i}{n_i\hat\pi_i(1-\hat\pi_i)} \ (\text{binomial, logit link}), \qquad = \frac{Y_i - \hat\lambda_i}{\hat\lambda_i} \ (\text{Poisson, log link}). \]
For the binomial family with logit link,
\[ \eta_i = \log\left(\frac{\pi_i}{1-\pi_i}\right) = \log\left(\frac{n_i\pi_i}{n_i(1-\pi_i)}\right) = X_i^T\beta, \qquad \mu_i = n_i\pi_i = n_i\frac{\exp(\eta_i)}{1+\exp(\eta_i)} = n_i\frac{\exp(X_i^T\beta)}{1+\exp(X_i^T\beta)}, \]
so that
\[ \frac{\partial\mu_i}{\partial\eta_i} = n_i\frac{\exp(\eta_i)}{[1+\exp(\eta_i)]^2} = n_i\pi_i(1-\pi_i). \]

4. Deviance residuals: $e_i = \mathrm{sign}(Y_i - \hat\mu_i)\sqrt{d_i}$, where $d_i$ is the contribution of the $i$-th case to the deviance $G^2$.

Binomial family:
\[ G^2 = (\text{deviance}) = 2\sum_{i=1}^k \left[ Y_i\log\left(\frac{Y_i}{n_i\hat\pi_i}\right) + (n_i-Y_i)\log\left(\frac{n_i-Y_i}{n_i(1-\hat\pi_i)}\right) \right], \]
so
\[ e_i = \mathrm{sign}(Y_i - n_i\hat\pi_i)\sqrt{2\left[ Y_i\log\left(\frac{Y_i}{n_i\hat\pi_i}\right) + (n_i-Y_i)\log\left(\frac{n_i-Y_i}{n_i(1-\hat\pi_i)}\right) \right]}. \]

Poisson family: with $\log(\lambda_i) = \beta_0 + \beta_1X_{1i} + \cdots$ and $\hat\lambda_i = \exp(\hat\beta_0 + \hat\beta_1X_{1i} + \cdots)$,
\[ e_i = \mathrm{sign}(Y_i - \hat\lambda_i)\sqrt{2\left[ Y_i\log\left(\frac{Y_i}{\hat\lambda_i}\right) - (Y_i - \hat\lambda_i) \right]}. \]
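The sketch below computes the four kinds of residuals from the formulas above for a Poisson fit and checks them against the residuals() function (R syntax). Because the colony counts of Example 13.2 are not listed in these notes, the counts here are simulated; only the design (five $\log_{10}(\text{dose})$ levels, three plates each) mimics that example.

```r
# Four kinds of residuals for a Poisson log-linear fit, by hand and via
# residuals(); the counts are simulated, purely for illustration.
set.seed(2)
x   <- rep(c(0, 1.0, 1.5, 2.0, 2.5), each = 3)   # 5 dose levels, 3 plates
y   <- rpois(15, lambda = exp(2 + 0.5 * x))      # simulated counts
fit <- glm(y ~ x, family = poisson)
mu  <- fitted(fit)                               # lambda_hat

resp <- y - mu                  # response residuals
pear <- (y - mu) / sqrt(mu)     # Pearson residuals
work <- (y - mu) / mu           # working residuals (log link)
dev  <- sign(y - mu) *          # deviance residuals; the ifelse() guards
        sqrt(2 * (ifelse(y > 0, y * log(y / mu), 0) - (y - mu)))  # y = 0

all.equal(resp, residuals(fit, type = "response"))  # TRUE
all.equal(pear, residuals(fit, type = "pearson"))   # TRUE
all.equal(work, residuals(fit, type = "working"))   # TRUE
all.equal(dev,  residuals(fit, type = "deviance"))  # TRUE
```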