Generalized Linear Models stat 557 Heike Hofmann

Outline
• GLM for Binomial Response
• Model-fitting: Newton-Raphson / Fisher Scoring
• Deviance
• Residuals
Binomial Distribution

P(X = x) = C(n,x) π^x (1 − π)^(n−x)

With θ = log( π/(1 − π) ), so that log(1 − π) = −log(1 + e^θ):

P(X = x) = exp( xθ − n log(1 + e^θ) + log C(n,x) )

Normal Distribution

f(x; µ, σ) = 1/√(2πσ²) · exp( −(x − µ)²/(2σ²) )
           = exp( (−x² + 2µx − µ²)/(2σ²) − 1/2 log(2πσ²) )

With θ = µ and φ = σ²:

f(x; θ, φ) = exp( (θx − θ²/2)/φ − x²/(2φ) − 1/2 log(2πφ) )

Both are of exponential dispersion family form:

f(x; θ, φ) = exp( (θx − b(θ))/a(φ) + c(x, φ) )
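The exponential-family rewrite of the binomial pmf can be checked numerically; a minimal sketch (function names are my own, not from the slides):

```python
# Check that the binomial pmf equals its exponential-family form with
# theta = log(pi / (1 - pi)), i.e.
#   P(X = x) = exp( x*theta - n*log(1 + e^theta) + log C(n, x) )
import math

def binom_pmf(x, n, p):
    # direct form: C(n, x) pi^x (1 - pi)^(n - x)
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def binom_pmf_expfam(x, n, p):
    theta = math.log(p / (1 - p))  # canonical (logit) parameter
    return math.exp(x * theta - n * math.log(1 + math.exp(theta))
                    + math.log(math.comb(n, x)))

for x in range(11):
    assert abs(binom_pmf(x, 10, 0.3) - binom_pmf_expfam(x, 10, 0.3)) < 1e-12
print("match")
```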
� �
Binomial GLM

ni Yi ∼ B(ni, πi)

Then E[Yi] = πi , Var[Yi] = πi(1 − πi)/ni and ηi = Σj xij βj

For logit link:

logit(πi) = ηi = Σj xij βj .
Some probability link functions:

  identity link:            g(πi) = πi
  logit link (canonical):   g(πi) = log( πi / (1 − πi) )
  probit link:              g(πi) = Φ⁻¹(πi)
  log-log link:             g(πi) = log(−log(1 − πi))

For the logit link,

  ∂ηi/∂µi = ∂ηi/∂πi = 1/πi + 1/(1 − πi) = 1/( πi(1 − πi) )

And the likelihood equations are

  ∂L/∂βj = Σi (yi − b′(θi))/Var(yi) · ∂µi/∂ηi · xij
[Figure: comparison of the logit, probit, cloglog, and cauchit transformations; x-axis logit(p) from −4 to 4, y-axis p from 0.0 to 1.0]
With µi = E[Yi], the likelihood equations become

  ∂L/∂β = Σi (yi − E[Yi])/Var(Yi) · [g′(πi)]⁻¹ · xij = 0

where g′ depends on the choice of link:

  identity link:          g(πi) = πi,                   g′(πi) = 1
  logit link:             g(πi) = log( πi / (1 − πi) ), g′(πi) = 1/( πi(1 − πi) )
  complementary log-log:  g(πi) = log(−log(1 − πi)),    g′(πi) = −1/( (1 − πi) log(1 − πi) )
  probit:                 g(πi) = Φ⁻¹(πi)
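The link functions and their derivatives can be checked against finite differences; a small stdlib-only sketch (Φ⁻¹ via `statistics.NormalDist`, names my own):

```python
# Link functions g and their derivatives g', checked by central differences.
import math
from statistics import NormalDist

nd = NormalDist()  # standard normal, for the probit link

links = {
    "identity": (lambda p: p,
                 lambda p: 1.0),
    "logit":    (lambda p: math.log(p / (1 - p)),
                 lambda p: 1.0 / (p * (1 - p))),
    "cloglog":  (lambda p: math.log(-math.log(1 - p)),
                 lambda p: -1.0 / ((1 - p) * math.log(1 - p))),
    "probit":   (lambda p: nd.inv_cdf(p),
                 lambda p: 1.0 / nd.pdf(nd.inv_cdf(p))),
}

p, h = 0.3, 1e-6
for name, (g, gprime) in links.items():
    approx = (g(p + h) - g(p - h)) / (2 * h)   # central difference
    assert abs(approx - gprime(p)) < 1e-4, name
print("all derivatives match")
```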
Likelihood Equations

For logit link:

  logit(πi) = ηi = Σj xij βj .

Then

  ∂ηi/∂µi = ∂ηi/∂πi = 1/πi + 1/(1 − πi) = 1/( πi(1 − πi) )

And the likelihood equations for the logit link are

  ∂L/∂βj = Σi (yi − b′(θi))/Var(yi) · ∂µi/∂ηi · xij
         = Σi ni (yi − πi) xij = 0

Stat 557 (Fall 2008)                            Intro to GLMs
Choice of Link Function?

• range of E[Y]?
• canonical link has nicer mathematical properties (cancels Var(Yi) from the denominator)
Binomial GLM
X=0 X=1
• For 2 x 2 table:
Y=0
πx = P(Y=1 | X=x)
π00
• GLM: g(π ) = α + β x
• β is effect in X, β = g(π ) - g(π )
• link g is
• identity => β is difference of proportions
• log link => β is log relative risk
• logit link => β is log odds ratio
Y=1
x
1
0
π10
π01
π11
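These three interpretations can be illustrated numerically; a small sketch with made-up probabilities (not the moth data):

```python
# beta = g(pi_1) - g(pi_0) under the three links discussed above.
import math

pi0, pi1 = 0.2, 0.5   # hypothetical P(Y=1 | X=0), P(Y=1 | X=1)

beta_identity = pi1 - pi0                        # difference of proportions
beta_log      = math.log(pi1) - math.log(pi0)    # log relative risk
beta_logit    = (math.log(pi1 / (1 - pi1))       # log odds ratio
                 - math.log(pi0 / (1 - pi0)))

print(round(beta_identity, 4))   # 0.3
print(round(beta_log, 4))        # log(2.5)
print(round(beta_logit, 4))      # log(4)
```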
Moth Data
Frozen dead moths of two colors are placed on
trees at locations of increasing distance from
Liverpool, England. This species of moth rests
during the day on tree trunks and is active at
night. Trees near Liverpool are darkened by
smoke to a greater extent than those farther
away in the Welsh countryside.
At each location, the number of moths of each
color that are placed and removed 24 hours later
are recorded. One might expect that lighter
moths are more likely to be removed near
Liverpool and that darker moths are more likely
to be removed farther away, since a moth is better
camouflaged when its color is closer to that of the
trees it rests on.
Moth Data

> head(moth)
  Xmorph distance placed removed      location
1  light      0.0     56      17   Sefton Park
2   dark      0.0     56      14   Sefton Park
3  light      7.2     80      28 Eastham Ferry
4   dark      7.2     80      20 Eastham Ferry
5  light     24.1     52      18      Hawarden
6   dark     24.1     52      22      Hawarden
qplot(distance, removed/placed,
      data=moth, size=I(3),
      colour=Xmorph) + geom_smooth(method="lm")

[Figure: removed/placed (0.2 to 0.5) vs. distance (0 to 50), colored by Xmorph (dark, light), with linear smooths]
Moth Data

glm(formula = cbind(removed, placed) ~ Xmorph * distance,
    family = binomial(link = "identity"), data = moth)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-1.76275  -0.28737   0.02419   0.50867   0.90052

Coefficients:
                       Estimate Std. Error z value Pr(>|z|)
(Intercept)           0.1954860  0.0318781   6.132 8.66e-10 ***
Xmorphlight           0.0502263  0.0455106   1.104   0.2698
distance              0.0023054  0.0009614   2.398   0.0165 *
Xmorphlight:distance -0.0034066  0.0013472  -2.529   0.0114 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 18.3994  on 13  degrees of freedom
Residual deviance:  7.1539  on 10  degrees of freedom
AIC: 79.414

Number of Fisher Scoring iterations: 4
Moth Data

glm(formula = cbind(removed, placed) ~ Xmorph * distance,
    family = binomial(link = "logit"), data = moth)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.7494  -0.3058   0.0076   0.4970   0.9412

Coefficients:
                      Estimate Std. Error z value Pr(>|z|)
(Intercept)          -1.396032   0.189529  -7.366 1.76e-13 ***
Xmorphlight           0.282320   0.261476   1.080   0.2803
distance              0.012127   0.005296   2.290   0.0220 *
Xmorphlight:distance -0.018771   0.007650  -2.454   0.0141 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 18.3994  on 13  degrees of freedom
Residual deviance:  7.1739  on 10  degrees of freedom
AIC: 79.434

Number of Fisher Scoring iterations: 4
Model Fitting: Newton-Raphson

Objective: find the value of β for which the log-likelihood is maximized:

  argmaxβ L(β) = β̂

General Idea:
1. guess a value for a solution,
2. approximate the function of interest in the current value by a (2nd degree) polynomial,
3. find the maximum of the approximation,
4. replace the current value by the maximum,
5. repeat steps 2-4 until the new and the old value are sufficiently close.

This uses the score vector u and the Hessian matrix H of L:

  u(β) = ( ∂L/∂β1, ..., ∂L/∂βp )′

  H(β) = ( ∂²L/∂βi∂βj )   1 ≤ i,j ≤ p
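Fisher scoring for a binomial logit GLM can be sketched in plain Python. This is my own implementation, fit only to the six rows shown in head(moth) with a simple intercept + distance model, so its estimates will not match the slides' full-data fits:

```python
# Fisher scoring for a binomial GLM with logit link.
# Score:  u_j = sum_i n_i (y_i - pi_i) x_ij   (from the likelihood equations)
# Info:   I = X'WX with weights w_i = n_i pi_i (1 - pi_i)
# Update: beta <- beta + I^{-1} u
import math

distance = [0.0, 0.0, 7.2, 7.2, 24.1, 24.1]   # six rows from head(moth)
placed   = [56, 56, 80, 80, 52, 52]
removed  = [17, 14, 28, 20, 18, 22]

b0, b1 = 0.0, 0.0                  # intercept, distance slope
for _ in range(25):
    u0 = u1 = i00 = i01 = i11 = 0.0
    for d, n, r in zip(distance, placed, removed):
        pi = 1 / (1 + math.exp(-(b0 + b1 * d)))   # inverse logit
        resid = r - n * pi                        # n_i (y_i - pi_i) on count scale
        w = n * pi * (1 - pi)                     # Fisher information weight
        u0 += resid;  u1 += resid * d
        i00 += w;     i01 += w * d;  i11 += w * d * d
    det = i00 * i11 - i01 * i01
    s0 = ( i11 * u0 - i01 * u1) / det             # solve I * step = u (2x2)
    s1 = (-i01 * u0 + i00 * u1) / det
    b0, b1 = b0 + s0, b1 + s1
    if max(abs(s0), abs(s1)) < 1e-10:             # converged
        break

print(b0, b1)
```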
Newton Raphson - properties

• Generally well-behaved, but starting values are crucial
• problematic in case of
  • multi-modality, or
  • very flat functions (convergence problematic)
• 2nd order convergence

Alternative: Fisher Scoring: substitute the Hessian matrix by the expected Fisher information (same for the canonical link; better: ‘closer’ to the data, simpler to compute in many situations)
Deviance
• Let L(µ; y) denote the log-likelihood expressed in the means µ = (µ1, µ2, ..., µn)
• L(µ̂; y) is the maximized log-likelihood
• L(y; y) is the maximal achievable log-likelihood; the corresponding model is called the saturated model
• Deviance of model M:

  D := −2 (L(µ̂; y) − L(y; y))
Deviance

• for Poisson and Binomial responses, the deviance is asymptotically χ² distributed with n − p degrees of freedom, if the model has p parameters.
• Model comparisons: for model M1 nested within model M2 (M1 a simpler version of M2, so L(µ̂1; y) ≤ L(µ̂2; y)), the deviance difference is given as

  D(y; µ̂1) − D(y; µ̂2) = −2 (L(µ̂1; y) − L(µ̂2; y))

  which has an asymptotic χ² distribution, with degrees of freedom equal to the difference in the number of parameters.
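The deviance definition can be computed directly for grouped binomial data; a sketch with toy counts and hypothetical fitted probabilities (not a real fit):

```python
# D = -2 (L(mu_hat; y) - L(y; y)); the binomial coefficient log C(n, y)
# cancels in the difference, so it is omitted from the log-likelihood.
import math

def loglik(counts, totals, probs):
    return sum(y * math.log(p) + (n - y) * math.log(1 - p)
               for y, n, p in zip(counts, totals, probs))

counts = [17, 14, 28, 20]           # toy successes
totals = [56, 56, 80, 80]           # toy trials
fitted = [0.30, 0.27, 0.33, 0.28]   # hypothetical fitted probabilities

saturated = [y / n for y, n in zip(counts, totals)]   # L(y; y) uses y_i/n_i
D = -2 * (loglik(counts, totals, fitted) - loglik(counts, totals, saturated))
print(round(D, 4))
assert D >= 0   # the saturated model attains the maximal log-likelihood
```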
Null Deviance

• L(ȳ; y) is the likelihood of the ‘null’ model, i.e. the model consisting of the intercept only. It is nested within all more complex models:

  null deviance D0 = −2 (L(ȳ; y) − L(µ̂; y))

• D0 has a χ² distribution with p − 1 degrees of freedom.
Residuals

• Deviance Residuals

  √di · sign(yi − µ̂i)

• Pearson Residuals

  ei = (yi − µ̂i) / √Var(yi)

Both sets of residuals under-estimate the variance.
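Both residual types can be sketched for grouped binomial data on the count scale (toy values, names my own); the squared deviance residuals sum to the deviance:

```python
# Pearson residuals e_i = (y_i - mu_i)/sqrt(Var(y_i)) and deviance
# residuals sqrt(d_i) * sign(y_i - mu_i), with mu_i = n_i * pi_i and
# Var(y_i) = n_i * pi_i * (1 - pi_i) on the count scale.
import math

counts = [17, 14, 28, 20]           # toy successes
totals = [56, 56, 80, 80]           # toy trials
pihat  = [0.30, 0.27, 0.33, 0.28]   # hypothetical fitted probabilities

pearson, dev = [], []
for y, n, p in zip(counts, totals, pihat):
    mu = n * p
    pearson.append((y - mu) / math.sqrt(n * p * (1 - p)))
    # d_i: this observation's contribution to the deviance
    d_i = 2 * (y * math.log(y / mu) + (n - y) * math.log((n - y) / (n - mu)))
    dev.append(math.copysign(math.sqrt(d_i), y - mu))

print([round(e, 3) for e in pearson])
print([round(e, 3) for e in dev])
```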
Next time:
Poisson Model

P(X = x; λ) = e^(−λ) λ^x / x!

P(X = x; θ = log λ, φ = 1) = exp( xθ − e^θ − log(x!) )

P(X = x) = C(n,x) π^x (1 − π)^(n−x)

log(1 − π) = −log(1 + e^θ)

P(X = x; θ = log( π/(1 − π) ), φ = 1) = exp( xθ − n log(1 + e^θ) + log C(n,x) )