Generalized Linear Models

Generalized Linear Models: a subclass of nonlinear models with a "linear" component.

Data:
$$(Y_1, X_1^T),\; (Y_2, X_2^T),\; \ldots,\; (Y_n, X_n^T)$$
where $Y_j$ is the response variable and $X_j^T = (X_{1j}, X_{2j}, \ldots, X_{pj})$ are explanatory (or regression) variables that describe the conditions under which the response $Y_j$ was obtained.

Basic features:

1. $Y_1, Y_2, \ldots, Y_n$ are independent.

2. $Y_j$ has a distribution in the exponential family of distributions, $j = 1, 2, \ldots, n$.

3. (Systematic part of the model) There is a link function $h(\cdot)$ that links the conditional mean of $Y_j$ given $X_j = (X_{1j}, \ldots, X_{pj})^T$ to a linear function of unknown parameters:
$$h(E(Y_j \mid X_j)) = X_j^T\beta = \beta_1 X_{1j} + \cdots + \beta_p X_{pj}.$$

Members of the exponential family of distributions are available in

S-PLUS: glm() function
SAS: PROC GENMOD
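As a preview, a minimal S-PLUS/R-style sketch of what such a fit looks like (the data frame mydata and the variables y, x1, x2 are hypothetical placeholders):

    fit <- glm(y ~ x1 + x2, family = binomial(link = logit), data = mydata)
    summary(fit)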
The joint likelihood has the canonical form
$$L(\theta, \phi; \mathbf{Y}) = \prod_{j=1}^{n} f(Y_j; \theta_j, \phi),$$
where
$$f(Y_j; \theta_j, \phi) = \exp\left\{ \frac{Y_j\theta_j - b(\theta_j)}{a(\phi)} + c(Y_j, \phi) \right\}$$
for some functions $a(\cdot)$, $b(\cdot)$, and $c(\cdot)$. Here $\theta_j$ is called a canonical parameter.

Normal distribution: $Y \sim N(\mu, \sigma^2)$
$$f(Y; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(Y-\mu)^2}{2\sigma^2} \right\}
= \exp\left\{ \frac{Y\mu - \mu^2/2}{\sigma^2} - \underbrace{\frac{1}{2}\left[ \frac{Y^2}{\sigma^2} + \log(2\pi\sigma^2) \right]}_{c(Y,\phi)} \right\}$$
Here $\theta = \mu$, $b(\theta) = \theta^2/2$, and $a(\phi) = \phi = \sigma^2$, so that
$$E(Y) = \mu = \theta \qquad \text{and} \qquad \mathrm{Var}(Y) = \sigma^2 = \phi.$$
Binomial distribution: $Y \sim \mathrm{Bin}(n, \pi)$
$$f(Y; \pi) = \binom{n}{Y}\pi^Y(1-\pi)^{n-Y}
= \exp\left\{ Y\log\frac{\pi}{1-\pi} + n\log(1-\pi) + \underbrace{\log\binom{n}{Y}}_{c(Y,\phi)} \right\}$$
Here
$$\theta = \log\frac{\pi}{1-\pi}, \qquad \pi = \frac{e^\theta}{1+e^\theta}, \qquad b(\theta) = -n\log(1-\pi) = n\log(1+e^\theta).$$
There is no dispersion parameter, so we will take $\phi \equiv 1$ and $a(\phi) = 1$.

Poisson distribution: $Y \sim \mathrm{Poisson}(\mu)$
$$f(Y; \mu) = \frac{\mu^Y e^{-\mu}}{Y!} = \exp\left\{ Y\log(\mu) - \mu - \underbrace{\log(Y!)}_{c(Y,\phi)} \right\}$$
Here
$$\theta = \log(\mu) \;\Rightarrow\; \mu = \exp(\theta), \qquad b(\theta) = \mu = \exp(\theta),$$
and there is no additional dispersion parameter; we will take $\phi \equiv 1$ and $a(\phi) = 1$. Then
$$f(Y; \theta, \phi) = \exp\{ Y\theta - \exp(\theta) - \log(\Gamma(Y+1)) \}.$$

Two-parameter gamma distribution: $Y \sim \mathrm{gamma}(\alpha, \beta)$
$$f(Y; \alpha, \beta) = \frac{Y^{\alpha-1}\exp(-Y/\beta)}{\Gamma(\alpha)\beta^\alpha}
= \exp\left\{ -\frac{Y}{\beta} - \alpha\log(\beta) + (\alpha-1)\log(Y) - \log(\Gamma(\alpha)) \right\}$$
Writing $\mu = E(Y) = \alpha\beta$ and $\phi = 1/\alpha$, this takes the canonical form with
$$\theta = -\frac{1}{\mu}, \qquad b(\theta) = -\log(-\theta), \qquad a(\phi) = \phi,$$
and
$$c(Y,\phi) = \left(\frac{1}{\phi}-1\right)\log(Y) + \frac{1}{\phi}\log\frac{1}{\phi} - \log\Gamma\!\left(\frac{1}{\phi}\right).$$

Inverse Gaussian distribution: $Y \sim IG(\mu, \sigma^2)$
$$f(Y; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2 Y^3}}\exp\left\{ -\frac{(Y-\mu)^2}{2\mu^2\sigma^2 Y} \right\}
\qquad \text{for } Y > 0,\; \mu > 0,\; \sigma^2 > 0;$$
also $E(Y) = \mu$ and $\mathrm{Var}(Y) = \sigma^2\mu^3$. In canonical form,
$$f(Y; \theta, \phi) = \exp\left\{ \frac{Y\left(-\frac{1}{2\mu^2}\right) + \frac{1}{\mu}}{\sigma^2} - \underbrace{\frac{1}{2}\left[ \frac{1}{\sigma^2 Y} + \log(2\pi\sigma^2 Y^3) \right]}_{c(Y,\phi)} \right\}$$
Here
$$\theta = -\frac{1}{2\mu^2}, \qquad \mu = \frac{1}{\sqrt{-2\theta}}, \qquad b(\theta) = -\sqrt{-2\theta} = -\frac{1}{\mu},$$
and $\phi = \sigma^2$ is the "dispersion parameter," with $a(\phi) = \phi = \sigma^2$.
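These canonical forms can be checked numerically. A small S-PLUS/R-style sketch for the Poisson case (the value of mu is an arbitrary test point):

    mu <- 3.7
    y <- 0:10
    theta <- log(mu)   # canonical parameter
    # density minus canonical form: should be numerically zero
    max(abs(dpois(y, mu) - exp(y*theta - exp(theta) - lgamma(y + 1))))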
Canonical form of the log-likelihood:
$$\ell(\theta, \phi; Y) = \log(f(Y; \theta, \phi)) = \frac{Y\theta - b(\theta)}{a(\phi)} + c(Y, \phi)$$

Partial derivatives:
$$\text{(i)} \quad \frac{\partial\ell(\theta,\phi;Y)}{\partial\theta} = \frac{Y - b'(\theta)}{a(\phi)}
\qquad\qquad
\text{(ii)} \quad \frac{\partial^2\ell(\theta,\phi;Y)}{\partial\theta^2} = -\frac{b''(\theta)}{a(\phi)}$$

Apply the following results
$$E\!\left( \frac{\partial\ell(\theta,\phi;Y)}{\partial\theta} \right) = 0,
\qquad
E\!\left( \frac{\partial^2\ell(\theta,\phi;Y)}{\partial\theta^2} \right) + E\!\left( \frac{\partial\ell(\theta,\phi;Y)}{\partial\theta} \right)^{\!2} = 0$$
to obtain
$$0 = E\!\left( \frac{\partial\ell(\theta,\phi;Y)}{\partial\theta} \right) = \frac{E(Y) - b'(\theta)}{a(\phi)},$$
which implies
$$E(Y) = b'(\theta).$$
Furthermore, from (i) and (ii) we have
$$0 = E\!\left( \frac{\partial^2\ell(\theta,\phi;Y)}{\partial\theta^2} \right) + E\!\left( \frac{\partial\ell(\theta,\phi;Y)}{\partial\theta} \right)^{\!2}
= -\frac{b''(\theta)}{a(\phi)} + \mathrm{Var}\!\left( \frac{Y - b'(\theta)}{a(\phi)} \right)
= -\frac{b''(\theta)}{a(\phi)} + \frac{\mathrm{Var}(Y)}{[a(\phi)]^2},$$
which implies
$$\mathrm{Var}(Y) = b''(\theta)\,a(\phi),$$
where $b''(\theta)$ is called the variance function and $a(\phi)$ is a function of the dispersion parameter.

Normal distribution: $Y \sim N(\mu, \sigma^2)$. Here $b(\theta) = \theta^2/2$, so $b'(\theta) = \theta$ and $b''(\theta) = 1$, with $a(\phi) = \phi = \sigma^2$. Then
$$E(Y) = b'(\theta) = \theta = \mu
\qquad\Rightarrow\qquad
\mathrm{Var}(Y) = b''(\theta)a(\phi) = (1)(\phi) = \sigma^2,$$
so the variance does not depend on the parameter that defines the mean.
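A quick numeric check of $E(Y) = b'(\theta)$ and $\mathrm{Var}(Y) = b''(\theta)a(\phi)$ for the binomial (where $a(\phi) = 1$), differentiating $b(\theta) = n\log(1+e^\theta)$ numerically; n and p are arbitrary test values:

    n <- 20; p <- 0.3
    theta <- log(p/(1 - p)); h <- 1e-4
    b <- function(t) n*log(1 + exp(t))
    (b(theta + h) - b(theta - h))/(2*h)              # ~ n*p       = E(Y)
    (b(theta + h) - 2*b(theta) + b(theta - h))/h^2   # ~ n*p*(1-p) = Var(Y)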
Binomial distribution: $Y \sim \mathrm{Bin}(n, \pi)$
$$\theta = \log\frac{\pi}{1-\pi}, \qquad b(\theta) = n\log(1+e^\theta)$$
$$b'(\theta) = \frac{ne^\theta}{1+e^\theta} \;\Rightarrow\; E(Y) = \frac{ne^\theta}{1+e^\theta} = n\pi$$
$$b''(\theta) = \frac{ne^\theta}{[1+e^\theta]^2}, \qquad a(\phi) \equiv 1$$
$$\Rightarrow\quad \mathrm{Var}(Y) = \frac{ne^\theta}{[1+e^\theta]^2}(1) = n\,\frac{e^\theta}{1+e^\theta}\cdot\frac{1}{1+e^\theta} = n\pi(1-\pi)$$
Here $\mathrm{Var}(Y)$ is completely determined by the parameter that defines the mean.

Poisson distribution: $Y \sim \mathrm{Poisson}(\mu)$
$$b(\theta) = \exp(\theta), \qquad b'(\theta) = \exp(\theta) \;\Rightarrow\; \mu = E(Y) = \exp(\theta)$$
$$b''(\theta) = \exp(\theta), \qquad a(\phi) = 1
\qquad\Rightarrow\qquad \mathrm{Var}(Y) = \exp(\theta) = E(Y) = \mu$$

Gamma distribution:
$$b(\theta) = -\log(-\theta), \qquad b'(\theta) = -\frac{1}{\theta} \;\Rightarrow\; E(Y) = -\frac{1}{\theta} = \mu$$
$$b''(\theta) = \frac{1}{\theta^2}, \qquad a(\phi) = \phi
\qquad\Rightarrow\qquad \mathrm{Var}(Y) = a(\phi)b''(\theta) = \frac{\phi}{\theta^2} = \phi\,[E(Y)]^2$$
Canonical link function: the canonical link is the link function $h(\mu) = X^T\beta$ that corresponds to setting
$$\theta = X^T\beta, \qquad \text{where} \qquad \frac{\partial b(\theta)}{\partial\theta} = b'(\theta) = E(Y) = \mu.$$

Binomial distribution:
$$\theta = X^T\beta \;\Rightarrow\; n\pi = E(Y) = b'(\theta) = \frac{ne^{X^T\beta}}{1+e^{X^T\beta}}
\;\Rightarrow\; \theta = \log\frac{\pi}{1-\pi} = X^T\beta.$$
Hence the canonical (default) link is the logit link function
$$h(\pi) = \log\frac{\pi}{1-\pi} = X^T\beta.$$
Other available link functions:

Probit: $h(\pi) = \Phi^{-1}(\pi) = X^T\beta$, where
$$\pi = \Phi(X^T\beta) = \int_{-\infty}^{X^T\beta} \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\, du.$$

Complementary log-log (cloglog): $h(\pi) = \log[-\log(1-\pi)] = X^T\beta$. The inverse of this link function is
$$\pi = 1 - \exp[-\exp(X^T\beta)].$$

The glm() function:

glm(model, family, data, weights, control)
family = binomial
family = binomial(link=logit)
family = binomial(link=probit)
family = binomial(link=cloglog)

Normal (Gaussian) distribution:
$$\theta = X^T\beta, \qquad \mu = b'(\theta) = \theta, \qquad \text{so} \qquad E(Y) = \mu = \theta = X^T\beta,$$
and the canonical link function is the identity function
$$h(\mu) = \mu = X^T\beta.$$

Poisson regression (log-linear models): $Y \sim \mathrm{Poisson}(\mu)$
$$\theta = X^T\beta \;\Rightarrow\; \mu = E(Y) = b'(\theta) = \exp(\theta),$$
and the canonical link function is
$$\theta = \log(\mu) = X^T\beta.$$
This is called the log link function. Other available link functions:

identity link: $h(\mu) = \mu = X^T\beta$
square root link: $h(\mu) = \sqrt{\mu} = X^T\beta$

glm(model, family, data, weights, control)
family = poisson
family = poisson(link=log)
family = poisson(link=identity)
family = poisson(link=sqrt)

Inverse Gaussian distribution:
$$\theta = X^T\beta, \qquad b(\theta) = -\sqrt{-2\theta}, \qquad \mu = E(Y) = b'(\theta) = \frac{1}{\sqrt{-2\theta}}.$$
The canonical link is
$$\theta = h(\mu) = -\frac{1}{2\mu^2} = X^T\beta.$$
This is the only built-in link function for the inverse Gaussian distribution.

glm(model, family, data, weights, control)
family = inverse.gaussian
family = inverse.gaussian(link=1/mu^2)
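A link and its inverse can be checked numerically; a short sketch for the cloglog pair above, evaluated on an arbitrary grid of probabilities:

    p <- seq(0.05, 0.95, by = 0.05)
    eta <- log(-log(1 - p))              # h(pi) = cloglog link
    max(abs((1 - exp(-exp(eta))) - p))   # inverse link recovers pi, ~ 0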
Gamma distribution:
$$\theta = X^T\beta \;\Rightarrow\; E(Y) = \mu = -\frac{1}{\theta},$$
so the canonical link function is
$$h(\mu) = -\frac{1}{\mu} = X^T\beta.$$
This is called the "inverse" link.

glm(model, family, data, weights, control)
family = Gamma
family = Gamma(link=inverse)
family = Gamma(link=identity)
family = Gamma(link=log)

Normal-theory Gauss-Markov model:
$$\mathbf{Y} \sim N(X\beta, \sigma^2 I),$$
or $Y_1, Y_2, \ldots, Y_n$ are independent with $Y_j \sim N(X_j^T\beta, \sigma^2)$. Then
$$f(Y_j; \beta, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left\{ -\frac{(Y_j - X_j^T\beta)^2}{2\sigma^2} \right\}
= \exp\left\{ \frac{Y_j(X_j^T\beta) - (X_j^T\beta)^2/2}{\sigma^2} - \underbrace{\frac{1}{2}\left[ \frac{Y_j^2}{\sigma^2} + \log(2\pi\sigma^2) \right]}_{c(Y_j,\sigma^2)} \right\}$$
Here
$$\theta_j = X_j^T\beta, \qquad b(\theta_j) = \frac{(X_j^T\beta)^2}{2}, \qquad \phi = \sigma^2 \text{ and } a(\phi) = \phi,$$
so $E(Y_j) = \mu_j = X_j^T\beta$.
The link function is the identity function
$$h(\mu_j) = \mu_j = X_j^T\beta.$$
The class of generalized linear models includes the class of linear models:

(i) Systematic part of the model (identity link function):
$$\mu_j = E(Y_j) = X_j^T\beta = \beta_1 X_{1j} + \cdots + \beta_p X_{pj}$$

(ii) Stochastic part of the model:
$$Y_j \sim N(\mu_j, \sigma^2), \quad \text{with } Y_1, Y_2, \ldots, Y_n \text{ independent.}$$

Logistic Regression:

Example 13.1. Mortality of a certain species of beetle after 5 hours of exposure to gaseous carbon disulfide (Bliss, 1935). $Y_1, Y_2, \ldots, Y_8$ are independent binomial counts with
$$Y_j \sim \mathrm{Bin}(n_j, \pi_j), \qquad \log\frac{\pi_j}{1-\pi_j} = \beta_0 + \beta_1 X_j.$$

Comments:
* Here the logit link function is the canonical link function.
* Use of the binomial distribution follows from:
(i) Each beetle exposed to log-dose $X_j$ responds independently.
(ii) Each beetle exposed to log-dose $X_j$ has probability $\pi_j$ of dying.

Dose (mg/l)   Log(Dose) $X_j$   Number killed $Y_j$   Number of survivors $n_j - Y_j$   Number exposed $n_j$
49.057        3.893             6                     53                                59
52.991        3.970             13                    47                                60
56.911        4.041             18                    44                                62
60.842        4.108             28                    28                                56
64.759        4.171             52                    11                                63
68.691        4.230             53                    6                                 59
72.611        4.258             61                    1                                 62
76.542        4.338             60                    0                                 60
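A minimal S-PLUS/R-style sketch of fitting this logistic regression to the table above (variable names are our own):

    ldose   <- c(3.893, 3.970, 4.041, 4.108, 4.171, 4.230, 4.258, 4.338)
    killed  <- c(6, 13, 18, 28, 52, 53, 61, 60)
    exposed <- c(59, 60, 62, 56, 63, 59, 62, 60)
    beetle.fit <- glm(cbind(killed, exposed - killed) ~ ldose,
                      family = binomial(link = logit))
    summary(beetle.fit)   # estimates near (-60.72, 14.88); see below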
Maximum Likelihood Estimation: the joint likelihood for $Y_1, Y_2, \ldots, Y_k$ is
$$L(\beta; \mathbf{Y}, X) = \prod_{j=1}^{k} f(Y_j; \theta_j, \phi)
= \prod_{j=1}^{k} \exp\left\{ \frac{Y_j\theta_j - b(\theta_j)}{a(\phi)} + c(Y_j, \phi) \right\}$$
$$= \prod_{j=1}^{k} \exp\left\{ Y_j(\beta_0 + \beta_1 X_j) - n_j\log(1 + e^{\beta_0+\beta_1 X_j}) + \log\binom{n_j}{Y_j} \right\}$$
since
$$\phi = 1, \quad a(\phi) = \phi = 1, \quad \theta_j = \log\frac{\pi_j}{1-\pi_j} = \beta_0 + \beta_1 X_j,$$
$$b(\theta_j) = n_j\log(1 + e^{\beta_0+\beta_1 X_j}), \qquad c(Y_j, \phi) = \log\binom{n_j}{Y_j}.$$

Log-likelihood function:
$$\ell(\beta) = \sum_{j=1}^{k} \left[ \frac{Y_j\theta_j - b(\theta_j)}{a(\phi)} + c(Y_j,\phi) \right]
= \sum_{j=1}^{k} \left[ Y_j(\beta_0+\beta_1 X_j) - n_j\log(1 + e^{\beta_0+\beta_1 X_j}) + \log\binom{n_j}{Y_j} \right]$$
$$= \beta_0\sum_{j=1}^{k} Y_j + \beta_1\sum_{j=1}^{k} Y_j X_j - \sum_{j=1}^{k} n_j\log(1 + e^{\beta_0+\beta_1 X_j}) + \sum_{j=1}^{k}\log\binom{n_j}{Y_j}$$
To maximize the log-likelihood, solve the system of equations obtained by setting the first partial derivatives equal to zero.
The likelihood equations (or score function) can be expressed as
$$\begin{bmatrix} 0 \\ 0 \end{bmatrix} = \begin{bmatrix} \partial\ell(\beta)/\partial\beta_0 \\ \partial\ell(\beta)/\partial\beta_1 \end{bmatrix} = X^T(\mathbf{Y} - \mathbf{m}) = Q \qquad \leftarrow \text{score function}$$
where
$$X = \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_k \end{bmatrix}, \qquad
\mathbf{Y} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_k \end{bmatrix}, \qquad
\mathbf{m} = \begin{bmatrix} n_1\pi_1 \\ n_2\pi_2 \\ \vdots \\ n_k\pi_k \end{bmatrix},$$
and from the inverse link function
$$\pi_j = \frac{\exp(\beta_0 + \beta_1 X_j)}{1 + \exp(\beta_0 + \beta_1 X_j)}.$$
We will use $\hat{Q}$, $\hat{\mathbf{m}}$, and $\hat\pi_j$ to indicate when these quantities are evaluated at $\beta = \hat\beta$.
Estimating equations (likelihood equations):
$$0 = \frac{\partial\ell(\hat\beta)}{\partial\beta_0} = \sum_{j=1}^{8} Y_j - \sum_{j=1}^{8} n_j\,\frac{\exp(\hat\beta_0 + \hat\beta_1 X_j)}{1 + \exp(\hat\beta_0 + \hat\beta_1 X_j)} = \sum_{j=1}^{8} (Y_j - n_j\hat\pi_j)$$
$$0 = \frac{\partial\ell(\hat\beta)}{\partial\beta_1} = \sum_{j=1}^{8} X_j Y_j - \sum_{j=1}^{8} n_j X_j\,\frac{\exp(\hat\beta_0 + \hat\beta_1 X_j)}{1 + \exp(\hat\beta_0 + \hat\beta_1 X_j)} = \sum_{j=1}^{8} X_j(Y_j - n_j\hat\pi_j)$$
There is generally no closed form expression for the solution. If a solution exists in the interior of the parameter space ($0 < \pi_j < 1$ for all $j = 1, \ldots, k$), then it is unique.

Second partial derivatives of the log-likelihood function:
$$H = \begin{bmatrix} \dfrac{\partial^2\ell(\beta)}{\partial\beta_0^2} & \dfrac{\partial^2\ell(\beta)}{\partial\beta_0\partial\beta_1} \\[1.5ex] \dfrac{\partial^2\ell(\beta)}{\partial\beta_0\partial\beta_1} & \dfrac{\partial^2\ell(\beta)}{\partial\beta_1^2} \end{bmatrix} = -X^T V X \qquad \leftarrow \text{called the Hessian matrix}$$
where
$$V = \begin{bmatrix} n_1\pi_1(1-\pi_1) & & \\ & \ddots & \\ & & n_k\pi_k(1-\pi_k) \end{bmatrix} = \mathrm{Var}(\mathbf{Y}).$$
We will use $\hat{H} = -X^T\hat{V}X$ and $\hat{V}$ to denote $H$ and $V$ evaluated at $\beta = \hat\beta$.
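A sketch (S-PLUS/R style) of the score and Hessian for this model, assuming the vectors ldose, killed, exposed from the earlier sketch; the function name is our own:

    score.and.hessian <- function(beta, x, y, n) {
      X <- cbind(1, x)
      p <- c(exp(X %*% beta)/(1 + exp(X %*% beta)))     # pihat_j
      list(Q = t(X) %*% (y - n*p),                      # score X'(Y - m)
           H = -t(X) %*% ((n*p*(1 - p)) * X))           # Hessian -X'VX
    }
    score.and.hessian(c(-60.717, 14.883), ldose, killed, exposed)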
Fisher information matrix:
$$\mathcal{I} = -E\begin{bmatrix} \dfrac{\partial^2\ell(\beta)}{\partial\beta_0^2} & \dfrac{\partial^2\ell(\beta)}{\partial\beta_0\partial\beta_1} \\[1.5ex] \dfrac{\partial^2\ell(\beta)}{\partial\beta_0\partial\beta_1} & \dfrac{\partial^2\ell(\beta)}{\partial\beta_1^2} \end{bmatrix} = E(X^T V X) = X^T V X = -H,$$
since the second partial derivatives do not depend on $\mathbf{Y} = (Y_1, \ldots, Y_k)^T$. For the logistic regression model with binomial counts, $\hat{\mathcal{I}} = -\hat{H}$.

Fisher scoring algorithm:
$$\hat\beta^{(s+1)} = \hat\beta^{(s)} + \hat{\mathcal{I}}^{-1}\hat{Q},$$
where $\hat{\mathcal{I}}$ and $\hat{Q}$ are evaluated at $\hat\beta^{(s)}$.

Modification:
- Check if the log-likelihood is larger at $\hat\beta^{(s+1)}$ than at $\hat\beta^{(s)}$.
- If it is, go to the next iteration.
- If not, invoke a halving step:
$$\hat\beta^{(s+1)}_{\mathrm{new}} = \frac{1}{2}\left( \hat\beta^{(s)} + \hat\beta^{(s+1)}_{\mathrm{old}} \right).$$

Newton-Raphson algorithm:
$$\hat\beta^{(s+1)} = \hat\beta^{(s)} - \hat{H}^{-1}\hat{Q},$$
with $\hat{H}$ and $\hat{Q}$ evaluated at $\hat\beta^{(s)}$. Since $\hat{\mathcal{I}} = -\hat{H}$ here, Newton-Raphson and Fisher scoring coincide for this model.

Iterative procedures for solving the likelihood equations start with some initial values
$$\hat\beta^{(0)} = \begin{bmatrix} \hat\beta_0^{(0)} \\ \hat\beta_1^{(0)} \end{bmatrix}, \qquad \hat\beta_0^{(0)} = \log\frac{\bar{P}}{1-\bar{P}}, \qquad \hat\beta_1^{(0)} = 0,$$
where
$$\bar{P} = \frac{\sum_{j=1}^{k} Y_j}{\sum_{j=1}^{k} n_j}.$$

Large sample inference (see Results 9.1-9.3): from Result 9.2,
$$\hat\beta \;\dot\sim\; N\!\left( \beta,\; (X^T V X)^{-1} \right)$$
when $n_1, n_2, \ldots, n_k$ are all large ("large" is $n_j\pi_j > 5$ and $n_j(1-\pi_j) > 5$). Estimate the covariance matrix as
$$\widehat{\mathrm{Var}}(\hat\beta) = (X^T\hat{V}X)^{-1},$$
where $\hat{V}$ is $V$ evaluated at $\hat\beta$.
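A compact sketch of this scoring iteration for the beetle data, reusing score.and.hessian from the earlier sketch (the halving-step modification is omitted for brevity):

    Pbar <- sum(killed)/sum(exposed)
    beta <- c(log(Pbar/(1 - Pbar)), 0)      # starting values
    for (s in 1:25) {
      sh   <- score.and.hessian(beta, ldose, killed, exposed)
      beta <- beta + solve(-sh$H, sh$Q)     # I = -H for this model
    }
    beta   # converges to approximately (-60.72, 14.88)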
For the Bliss beetle data:
$$\hat\beta = \begin{bmatrix} \hat\beta_0 \\ \hat\beta_1 \end{bmatrix} = \begin{bmatrix} -60.717 \\ 14.883 \end{bmatrix}$$
with
$$\widehat{\mathrm{Var}}(\hat\beta) = \begin{bmatrix} S^2_{\hat\beta_0} & S_{\hat\beta_0,\hat\beta_1} \\ S_{\hat\beta_0,\hat\beta_1} & S^2_{\hat\beta_1} \end{bmatrix} = \begin{bmatrix} 26.84 & -6.55 \\ -6.55 & 1.60 \end{bmatrix}$$
Approximate $(1-\alpha)\times100\%$ confidence intervals:
$$\hat\beta_0 \pm Z_{\alpha/2} S_{\hat\beta_0}, \qquad \hat\beta_1 \pm Z_{\alpha/2} S_{\hat\beta_1},$$
where $Z_{\alpha/2}$ is from the standard normal distribution ($Z_{.025} = 1.96$).
Wald tests:

Test $H_0: \beta_0 = 0$ vs. $H_A: \beta_0 \neq 0$:
$$X^2 = \left( \frac{\hat\beta_0 - 0}{S_{\hat\beta_0}} \right)^{\!2} = \left( \frac{-60.717}{5.18} \right)^{\!2} = 137.36,
\qquad \text{p-value} = \Pr\{\chi^2_{(1)} > 137.36\} < .0001.$$

Test $H_0: \beta_1 = 0$ vs. $H_A: \beta_1 \neq 0$:
$$X^2 = \left( \frac{\hat\beta_1 - 0}{S_{\hat\beta_1}} \right)^{\!2} = \left( \frac{14.883}{1.265} \right)^{\!2} = 138.49,
\qquad \text{p-value} = \Pr\{\chi^2_{(1)} > 138.49\} < .0001.$$
Likelihood ratio (deviance) tests: consider

Model A: $\log\dfrac{\pi_j}{1-\pi_j} = \beta_0$ and $Y_j \sim \mathrm{Bin}(n_j, \pi_j)$

Model B: $\log\dfrac{\pi_j}{1-\pi_j} = \beta_0 + \beta_1 X_j$ and $Y_j \sim \mathrm{Bin}(n_j, \pi_j)$

Model C: $\log\dfrac{\pi_j}{1-\pi_j} = \beta_0 + \delta_j$ and $Y_j \sim \mathrm{Bin}(n_j, \pi_j)$

Model C simply says that $0 < \pi_j < 1$, $j = 1, \ldots, k$.

Null deviance:
$$\mathrm{Deviance}_{\mathrm{Null}} = -2\left[ \text{log-likelihood}_{\text{Model A}} - \text{log-likelihood}_{\text{Model C}} \right]
= 2\left[ \sum_{j=1}^{k} Y_j\log(P_j/\bar{P}) + \sum_{j=1}^{k} (n_j - Y_j)\log\frac{1-P_j}{1-\bar{P}} \right]$$
with $(k-1)$ d.f., where
$$P_j = \frac{Y_j}{n_j}, \quad j = 1, \ldots, k, \qquad
\bar{P} = \frac{\sum_{j=1}^{k} n_j P_j}{\sum_{j=1}^{k} n_j} = \frac{\sum_{j=1}^{k} Y_j}{\sum_{j=1}^{k} n_j}$$
are the m.l.e.'s of $\pi_j$ under Models C and A, respectively.
LOOSE deviance:
$$\mathrm{Deviance}_{\mathrm{LOOSE}} = -2\left[ \text{log-likelihood}_{\text{Model A}} - \text{log-likelihood}_{\text{Model B}} \right]
= 2\left[ \sum_{j=1}^{k} Y_j\log\frac{\hat\pi_j}{\bar{P}} + \sum_{j=1}^{k} (n_j - Y_j)\log\frac{1-\hat\pi_j}{1-\bar{P}} \right]$$
$$= 272.78 \quad \text{with } (2-1) = 1 \text{ d.f.},$$
where
$$\hat\pi_j = \frac{\exp(\hat\beta_0 + \hat\beta_1 X_j)}{1 + \exp(\hat\beta_0 + \hat\beta_1 X_j)}
\qquad \text{and} \qquad
\bar{P} = \frac{\sum_{j=1}^{k} Y_j}{\sum_{j=1}^{k} n_j}.$$

Lack-of-fit (deviance) test:
$$H_0: \log\frac{\pi_j}{1-\pi_j} = \beta_0 + \beta_1 X_j \;\text{(Model B)}
\qquad \text{vs.} \qquad
H_A: \log\frac{\pi_j}{1-\pi_j} = \beta_0 + \delta_j \;\text{(Model C)}$$
$$G^2 = \mathrm{deviance}_{\text{lack-of-fit}} = \mathrm{deviance}_{\mathrm{Null}} - \mathrm{deviance}_{\mathrm{LOOSE}}
= 2\left[ \sum_{j=1}^{k} Y_j\log\frac{P_j}{\hat\pi_j} + \sum_{j=1}^{k} (n_j - Y_j)\log\frac{1-P_j}{1-\hat\pi_j} \right]$$
$$= 11.42 \quad \text{on } [(k-1) - 1] = k-2 \text{ d.f. } (= 8 - 2 = 6 \text{ d.f.}), \qquad \text{p-value} = 0.076.$$
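These deviances can be read off a fitted glm object; a sketch assuming beetle.fit from the earlier code (null.deviance and deviance() are R-style accessors):

    beetle.fit$null.deviance                          # Deviance_Null, k-1 d.f.
    deviance(beetle.fit)                              # G^2 lack-of-fit, k-2 d.f.
    beetle.fit$null.deviance - deviance(beetle.fit)   # Deviance_LOOSE, 1 d.f.
    1 - pchisq(deviance(beetle.fit), 6)               # lack-of-fit p-value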
Pearson chi-square test:
$$X^2 = \sum_{j=1}^{k} \frac{(Y_j - n_j\hat\pi_j)^2}{n_j\hat\pi_j} + \sum_{j=1}^{k} \frac{\left[(n_j - Y_j) - n_j(1-\hat\pi_j)\right]^2}{n_j(1-\hat\pi_j)}
= \sum_{j=1}^{k} n_j\frac{(P_j - \hat\pi_j)^2}{\hat\pi_j} + \sum_{j=1}^{k} n_j\frac{\left[(1-P_j) - (1-\hat\pi_j)\right]^2}{1-\hat\pi_j}$$
$$= 10.223 \qquad \text{with d.f.} = k - 2 = 6, \qquad \text{p-value} = .1155.$$
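A sketch of the same statistic computed from Pearson residuals of the fitted object:

    X2 <- sum(residuals(beetle.fit, type = "pearson")^2)
    X2                  # about 10.22
    1 - pchisq(X2, 6)   # about 0.1155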
Estimates of expected counts under Model B:

j   $n_j\hat\pi_j$   $n_j(1-\hat\pi_j)$
1   3.43     55.57
2   9.78     50.22
3   22.66    39.34
4   33.83    22.17
5   50.05    12.95
6   53.27    5.73
7   59.22    2.78
8   58.74    1.26

Cochran's rule: all expected counts should be larger than one, and at least 80% should be larger than 5.

Estimate mortality rates:
$$\hat\pi_j = \frac{\exp(\hat\beta_0 + \hat\beta_1 X_j)}{1 + \exp(\hat\beta_0 + \hat\beta_1 X_j)}$$
Use the delta method to compute an approximate standard error:
$$\frac{\partial\hat\pi_j}{\partial\hat\beta_0} = \frac{\exp(\hat\beta_0 + \hat\beta_1 X_j)}{[1 + \exp(\hat\beta_0 + \hat\beta_1 X_j)]^2} = \hat\pi_j(1-\hat\pi_j)$$
$$\frac{\partial\hat\pi_j}{\partial\hat\beta_1} = \frac{X_j\exp(\hat\beta_0 + \hat\beta_1 X_j)}{[1 + \exp(\hat\beta_0 + \hat\beta_1 X_j)]^2} = X_j\hat\pi_j(1-\hat\pi_j)$$
Define
$$G_j = \left[ \frac{\partial\hat\pi_j}{\partial\hat\beta_0}, \; \frac{\partial\hat\pi_j}{\partial\hat\beta_1} \right] = \left[ \hat\pi_j(1-\hat\pi_j), \; X_j\hat\pi_j(1-\hat\pi_j) \right].$$
Then $S^2_{\hat\pi_j} = G_j\left[\widehat{\mathrm{Var}}(\hat\beta)\right]G_j^T$, and an approximate standard error is
$$S_{\hat\pi_j} = \sqrt{G_j\left[\widehat{\mathrm{Var}}(\hat\beta)\right]G_j^T}.$$
An approximate $(1-\alpha)\times100\%$ confidence interval for $\pi_j$ is
$$\hat\pi_j \pm Z_{\alpha/2} S_{\hat\pi_j}.$$
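A sketch of these delta-method intervals for the beetle fit; vcov() is the R-style accessor (in S-PLUS the estimated covariance matrix can be pulled from summary()):

    pi.hat <- fitted(beetle.fit)
    G <- cbind(pi.hat*(1 - pi.hat), ldose*pi.hat*(1 - pi.hat))
    se.pi <- sqrt(diag(G %*% vcov(beetle.fit) %*% t(G)))
    cbind(estimate = pi.hat,
          lower = pi.hat - 1.96*se.pi,
          upper = pi.hat + 1.96*se.pi)   # approximate 95% intervals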
Estimate the log-dose that provides a certain mortality rate, say $\pi_0$:
$$\log\frac{\pi_0}{1-\pi_0} = \beta_0 + \beta_1 X_0
\quad\Rightarrow\quad
X_0 = \frac{\log\frac{\pi_0}{1-\pi_0} - \beta_0}{\beta_1}.$$
The m.l.e. for $X_0$ is
$$\hat{X}_0 = \frac{\log\frac{\pi_0}{1-\pi_0} - \hat\beta_0}{\hat\beta_1}.$$
Use the delta method to obtain an approximate standard error:
$$\frac{\partial\hat{X}_0}{\partial\hat\beta_0} = -\frac{1}{\hat\beta_1},
\qquad
\frac{\partial\hat{X}_0}{\partial\hat\beta_1} = -\frac{\log\frac{\pi_0}{1-\pi_0} - \hat\beta_0}{\hat\beta_1^2}.$$
Define $\hat{G} = \left[ \dfrac{\partial\hat{X}_0}{\partial\hat\beta_0}, \; \dfrac{\partial\hat{X}_0}{\partial\hat\beta_1} \right]$ and compute
$$S^2_{\hat{X}_0} = \hat{G}\left[\widehat{\mathrm{Var}}(\hat\beta)\right]\hat{G}^T.$$
The standard error is
$$S_{\hat{X}_0} = \sqrt{\hat{G}\left[\widehat{\mathrm{Var}}(\hat\beta)\right]\hat{G}^T},$$
and an approximate $(1-\alpha)\times100\%$ confidence interval for $X_0$ is
$$\hat{X}_0 \pm Z_{\alpha/2} S_{\hat{X}_0}.$$

Suppose you want a 95% confidence interval for the dose that provides a mortality rate of $\pi_0$:
$$\mathrm{dose}_0 = \exp(X_0).$$
The m.l.e. for the dose is
$$\widehat{\mathrm{dose}}_0 = \exp(\hat{X}_0).$$
Apply the delta method:
$$\frac{\partial\exp(\hat{X}_0)}{\partial\hat{X}_0} = \exp(\hat{X}_0).$$
Then
$$S^2_{\widehat{\mathrm{dose}}_0} = [\exp(\hat{X}_0)]^2\,\widehat{\mathrm{Var}}(\hat{X}_0),
\qquad
S_{\widehat{\mathrm{dose}}_0} = [\exp(\hat{X}_0)]\,S_{\hat{X}_0} = (\widehat{\mathrm{dose}}_0)\,S_{\hat{X}_0},$$
and an approximate $(1-\alpha)\times100\%$ confidence interval for $\mathrm{dose}_0$ is
$$\exp(\hat{X}_0) \pm Z_{\alpha/2}[\exp(\hat{X}_0)]S_{\hat{X}_0}.$$
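A sketch of this calculation for the beetle fit; the choice pi0 = 0.5 (an LD50-type summary) is our own illustration:

    b  <- coef(beetle.fit)
    V  <- vcov(beetle.fit)
    pi0 <- 0.5
    X0 <- (log(pi0/(1 - pi0)) - b[1])/b[2]        # mle of the log-dose
    G  <- c(-1/b[2], -(log(pi0/(1 - pi0)) - b[1])/b[2]^2)
    se.X0 <- sqrt(c(t(G) %*% V %*% G))
    dose0 <- exp(X0)
    dose0 + c(-1, 1)*1.96*dose0*se.X0             # approximate 95% interval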
Fit a probit model to the Bliss beetle data:
$$\Phi^{-1}(\pi_j) = \beta_0 + \beta_1 X_j,$$
where $X_j = \log(\mathrm{dose})$, $Y_j \sim \mathrm{Bin}(n_j, \pi_j)$, $j = 1, 2, \ldots, k$. Note that
$$\pi_j = \int_{-\infty}^{\beta_0+\beta_1 X_j} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt.$$

Fit the complementary log-log model to the Bliss beetle data:
$$\log[-\log(1-\pi_j)] = \beta_0 + \beta_1 X_j,$$
where $X_j = \log(\mathrm{dose})$, $Y_j \sim \mathrm{Bin}(n_j, \pi_j)$. Note that
$$\pi_j = 1 - \exp\left[-e^{\beta_0+\beta_1 X_j}\right], \qquad j = 1, 2, \ldots, k.$$
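A sketch of both alternative fits, reusing the beetle data vectors from before:

    probit.fit  <- glm(cbind(killed, exposed - killed) ~ ldose,
                       family = binomial(link = probit))
    cloglog.fit <- glm(cbind(killed, exposed - killed) ~ ldose,
                       family = binomial(link = cloglog))
    summary(probit.fit)
    summary(cloglog.fit)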
Poisson Regression (Log-linear models)

Example 13.2. $Y_{jk}$ = number of TA98 salmonella colonies on the $k$-th plate with the $j$-th dosage level of quinoline, and $X_j = \log_{10}(\text{dose of quinoline } (\mu g))$:
$$X_1 = 0, \quad X_2 = 1.0, \quad X_3 = 1.5, \quad X_4 = 2.0, \quad X_5 = 2.5,$$
with $k = 1, 2, 3$ plates at each dose level.

Independent Poisson counts:
$$Y_{jk} \sim \mathrm{Poisson}(m_j), \qquad j = 1, \ldots, 5, \quad k = 1, \ldots, 3,$$
where
$$\Pr\{Y_{jk} = y\} = \frac{m_j^y e^{-m_j}}{y!} \qquad \text{for } y = 0, 1, 2, \ldots,$$
$$E(Y_{jk}) = m_j, \qquad \mathrm{Var}(Y_{jk}) = m_j.$$
Log-link function:
$$\log(m_j) = \beta_0 + \beta_1 X_j.$$
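A sketch of fitting this model with glm(); the colony counts below are hypothetical placeholders, since the observed counts are not reproduced here:

    ldose10  <- rep(c(0, 1.0, 1.5, 2.0, 2.5), each = 3)   # 3 plates per dose
    colonies <- c(15, 21, 29, 16, 18, 21, 16, 26, 33,
                  27, 41, 60, 33, 38, 41)                 # hypothetical counts
    salm.fit <- glm(colonies ~ ldose10, family = poisson(link = log))
    summary(salm.fit)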
Joint likelihood function:
$$L(\beta) = \prod_{j=1}^{5}\prod_{k=1}^{3} \frac{m_j^{Y_{jk}} e^{-m_j}}{Y_{jk}!}$$
Log-likelihood function:
$$\ell(\beta) = \sum_{j=1}^{5}\sum_{k=1}^{3} \left[ Y_{jk}\log(m_j) - m_j - \log(Y_{jk}!) \right]
= \sum_{j=1}^{5}\sum_{k=1}^{3} \left[ Y_{jk}(\beta_0+\beta_1 X_j) - e^{\beta_0+\beta_1 X_j} - \log(Y_{jk}!) \right]$$
$$= \beta_0 Y_{\cdot\cdot} + \beta_1\sum_{j=1}^{5} X_j Y_{j\cdot} - 3\sum_{j=1}^{5} e^{\beta_0+\beta_1 X_j} - \sum_{j=1}^{5}\sum_{k=1}^{3}\log(Y_{jk}!)$$
Likelihood equations:
$$0 = \frac{\partial\ell(\beta)}{\partial\beta_0} = Y_{\cdot\cdot} - 3\sum_{j=1}^{5} e^{\beta_0+\beta_1 X_j}$$
$$0 = \frac{\partial\ell(\beta)}{\partial\beta_1} = \sum_{j=1}^{5} X_j Y_{j\cdot} - 3\sum_{j=1}^{5} X_j e^{\beta_0+\beta_1 X_j}$$

Use the Fisher scoring algorithm. Write
$$\mathbf{Y} = \begin{bmatrix} Y_{11} \\ Y_{12} \\ \vdots \\ Y_{53} \end{bmatrix}, \qquad
X = \begin{bmatrix} 1 & X_1 \\ 1 & X_1 \\ \vdots & \vdots \\ 1 & X_5 \end{bmatrix}$$
(each dose level contributes three identical rows of $X$, one per plate), and
$$Q = \begin{bmatrix} \partial\ell(\beta)/\partial\beta_0 \\ \partial\ell(\beta)/\partial\beta_1 \end{bmatrix} = X^T(\mathbf{Y} - \mathbf{m}),
\qquad \text{where} \qquad
\mathbf{m} = \begin{bmatrix} m_1 \\ m_1 \\ m_1 \\ m_2 \\ \vdots \\ m_5 \end{bmatrix} = \exp(X\beta).$$

Second partial derivatives of $\ell(\beta)$:
$$\frac{\partial^2\ell(\beta)}{\partial\beta_0^2} = -3\sum_{j=1}^{5} e^{\beta_0+\beta_1 X_j} = -\sum_j\sum_k m_j$$
$$\frac{\partial^2\ell(\beta)}{\partial\beta_0\partial\beta_1} = -3\sum_{j=1}^{5} X_j e^{\beta_0+\beta_1 X_j} = -\sum_j\sum_k X_j m_j$$
$$\frac{\partial^2\ell(\beta)}{\partial\beta_1^2} = -3\sum_{j=1}^{5} X_j^2 e^{\beta_0+\beta_1 X_j} = -\sum_j\sum_k X_j^2 m_j$$

Fisher information matrix:
$$\mathcal{I} = -E\begin{bmatrix} \dfrac{\partial^2\ell(\beta)}{\partial\beta_0^2} & \dfrac{\partial^2\ell(\beta)}{\partial\beta_0\partial\beta_1} \\[1.5ex] \dfrac{\partial^2\ell(\beta)}{\partial\beta_0\partial\beta_1} & \dfrac{\partial^2\ell(\beta)}{\partial\beta_1^2} \end{bmatrix}
= \begin{bmatrix} \sum_j\sum_k m_j & \sum_j\sum_k X_j m_j \\ \sum_j\sum_k X_j m_j & \sum_j\sum_k X_j^2 m_j \end{bmatrix}
= X^T D_{\mathbf{m}} X,$$
where
$$D_{\mathbf{m}} = \mathrm{Var}(\mathbf{Y}) = \begin{bmatrix} m_1 & & & \\ & m_1 & & \\ & & \ddots & \\ & & & m_5 \end{bmatrix}.$$

Iterations:
$$\beta^{(s+1)} = \beta^{(s)} + \mathcal{I}^{-1} Q,$$
using halving steps as needed.

Starting values: use the m.l.e.s for the model with $\beta_1 = 0$, i.e., the $Y_{jk} \sim \mathrm{Poisson}(m)$ are i.i.d. with $m = e^{\beta_0}$. Then $\hat{m}^{(0)} = \bar{Y}_{\cdot\cdot}$ and
$$\hat\beta_0^{(0)} = \log(\bar{Y}_{\cdot\cdot}), \qquad \hat\beta_1^{(0)} = 0.$$
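A sketch of this scoring iteration, parallel to the logistic one earlier, reusing the hypothetical salmonella vectors (halving steps omitted):

    X <- cbind(1, ldose10)
    beta <- c(log(mean(colonies)), 0)     # starting values
    for (s in 1:25) {
      m <- c(exp(X %*% beta))
      Q <- t(X) %*% (colonies - m)        # score X'(Y - m)
      I <- t(X) %*% (m * X)               # information X' D_m X
      beta <- beta + solve(I, Q)          # scoring step
    }
    beta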
Large sample normal approximation for the distribution of $\hat\beta$:
$$\hat\beta \;\dot\sim\; N(\beta, \mathcal{I}^{-1}),$$
where $\mathcal{I}$ is estimated by substituting $\hat\beta$ for $\beta$ to obtain $\hat{\mathcal{I}}$.

Estimate the mean number of colonies for a specific $\log_{10}(\text{dose})$ of quinoline, say $X$. The m.l.e. is
$$\hat{m} = e^{\hat\beta_0+\hat\beta_1 X}.$$
Apply the delta method:
$$\frac{\partial\hat{m}}{\partial\hat\beta_0} = e^{\hat\beta_0+\hat\beta_1 X} = \hat{m},
\qquad
\frac{\partial\hat{m}}{\partial\hat\beta_1} = X e^{\hat\beta_0+\hat\beta_1 X} = X\hat{m}.$$
Define
$$\hat{G} = \left[ \frac{\partial\hat{m}}{\partial\hat\beta_0}, \; \frac{\partial\hat{m}}{\partial\hat\beta_1} \right] = [\hat{m}, \; X\hat{m}].$$
Then the estimated variance of $\hat{m}$ is
$$S^2_{\hat{m}} = \hat{G}\hat{\mathcal{I}}^{-1}\hat{G}^T = \hat{G}\left[ X^T D_{\hat{\mathbf{m}}} X \right]^{-1}\hat{G}^T.$$
The standard error is $S_{\hat{m}} = \sqrt{S^2_{\hat{m}}}$, and an approximate $(1-\alpha)\times100\%$ confidence interval is
$$\hat{m} \pm Z_{\alpha/2} S_{\hat{m}}.$$

Four kinds of residuals:

Response residuals:
$$Y_i - \hat\mu_i = Y_i - n_i\hat\pi_i \quad \text{(binomial)}; \qquad Y_i - \hat\mu_i \quad \text{(Poisson)}.$$

Pearson residuals:
$$\frac{Y_i - \hat\mu_i}{\sqrt{\widehat{\mathrm{Var}}(Y_i)}}
= \frac{Y_i - n_i\hat\pi_i}{\sqrt{n_i\hat\pi_i(1-\hat\pi_i)}} \quad \text{(binomial)};
\qquad
= \frac{Y_i - \hat\mu_i}{\sqrt{\hat\mu_i}} \quad \text{(Poisson)}.$$
Working residuals:
$$(Y_i - \hat\mu_i)\frac{\partial\eta_i}{\partial\mu_i}
= \frac{Y_i - n_i\hat\pi_i}{n_i\hat\pi_i(1-\hat\pi_i)} \quad \text{(binomial)};
\qquad
= \frac{Y_i - \hat\mu_i}{\hat\mu_i} \quad \text{(Poisson)}.$$
For the binomial family with logit link,
$$\eta_i = \log\frac{\pi_i}{1-\pi_i} = \log\frac{n_i\pi_i}{n_i(1-\pi_i)} = X_i^T\beta,$$
$$\mu_i = n_i\pi_i = n_i\frac{\exp(\eta_i)}{1+\exp(\eta_i)} = n_i\frac{\exp(X_i^T\beta)}{1+\exp(X_i^T\beta)},$$
$$\frac{\partial\mu_i}{\partial\eta_i} = n_i\frac{\exp(\eta_i)}{[1+\exp(\eta_i)]^2} = n_i\pi_i(1-\pi_i).$$

Deviance residuals:
$$e_i = \mathrm{sign}(Y_i - \hat\mu_i)\sqrt{|d_i|},$$
where $d_i$ is the contribution of the $i$-th case to the deviance.

Binomial family:
$$G^2 = \text{(deviance)} = 2\sum_{i=1}^{k}\left[ Y_i\log\frac{Y_i}{n_i\hat\pi_i} + (n_i - Y_i)\log\frac{n_i - Y_i}{n_i(1-\hat\pi_i)} \right],$$
then
$$e_i = \mathrm{sign}(Y_i - n_i\hat\pi_i)\sqrt{ 2\left[ Y_i\log\frac{Y_i}{n_i\hat\pi_i} + (n_i - Y_i)\log\frac{n_i - Y_i}{n_i(1-\hat\pi_i)} \right] }.$$

Poisson family: with
$$\log(\mu_i) = \beta_0 + \beta_1 X_{1i} + \cdots, \qquad \hat\mu_i = \exp(\hat\beta_0 + \hat\beta_1 X_{1i} + \cdots),$$
$$G^2 = \text{(deviance)} = 2\sum_{i=1}^{k} Y_i\log\frac{Y_i}{\hat\mu_i}
\qquad \text{and} \qquad
e_i = \mathrm{sign}(Y_i - \hat\mu_i)\sqrt{ 2\, Y_i\log\frac{Y_i}{\hat\mu_i} }.$$
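In glm() these four kinds of residuals are available directly from the fitted object; a sketch using beetle.fit from the earlier code:

    residuals(beetle.fit, type = "response")
    residuals(beetle.fit, type = "pearson")
    residuals(beetle.fit, type = "working")
    residuals(beetle.fit, type = "deviance")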