Generalized linear models
A generalized linear model contains the following three parts:
1. The random component: the response $Y_i$ belongs to an exponential dispersion family with mean
   $E(Y_i) = \mu_i = b'(\theta_i)$ and $\mathrm{Var}(Y_i) = a(\phi)V(\mu_i)$.
2. The systematic component: $\eta_i = \sum_{j=1}^{p} X_{ij}\beta_j$.
3. Link function (between random and systematic components): $g(\mu_i) = \eta_i$.
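The three parts can be made concrete numerically. The sketch below (a minimal illustration, assuming a Poisson model with the canonical log link and made-up data; it is not tied to any particular software implementation) constructs each component with NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design matrix (n = 5 observations, p = 2 covariates) and coefficients.
X = np.column_stack([np.ones(5), rng.normal(size=5)])
beta = np.array([0.5, 1.0])

# 2. Systematic component: eta_i = sum_j X_ij * beta_j.
eta = X @ beta

# 3. Link function: for the Poisson family the canonical (log) link gives
#    g(mu) = log(mu), so mu = g^{-1}(eta) = exp(eta).
mu = np.exp(eta)

# 1. Random component: Y_i ~ Poisson(mu_i), an exponential dispersion family
#    with a(phi) = 1 and variance function V(mu) = mu.
Y = rng.poisson(mu)
```

Note how the link is what ties the linear predictor to the mean: the log link guarantees every $\mu_i$ is positive, as a Poisson mean must be.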
Choice of models
- Null model: $\eta_i = \beta_1$ (a constant, one parameter) for $i = 1, \dots, n$. This model consigns all the variation in the $Y_i$'s to the random component.
- Saturated model: $\eta_i = g(Y_i)$ (namely $\mu_i = Y_i$, $n$ parameters). This model consigns all the variation in the $Y_i$'s to the systematic component, leaving none for the random component.
- The null model is usually too simple, and the saturated model is uninformative. A statistical model describes how we partition the total variation into a systematic structure and a random component.
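For the Gaussian case the contrast between the two extremes is easy to see numerically. A minimal sketch with made-up responses:

```python
import numpy as np

y = np.array([1.2, 0.8, 2.5, 1.9, 0.4])  # hypothetical responses

# Null model: a single constant mean; all variation is left to the random component.
mu_null = np.full_like(y, y.mean())

# Saturated model: one parameter per observation, mu_i = y_i; a perfect fit
# that leaves nothing for the random component.
mu_sat = y.copy()

sse_null = np.sum((y - mu_null) ** 2)  # total variation around the mean
sse_sat = np.sum((y - mu_sat) ** 2)    # exactly zero
```

A useful model sits between these extremes: it should explain much of `sse_null` with far fewer than $n$ parameters.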
Deviance in linear models
Consider the following linear models:
$$Y_i = \mu_i + \varepsilon_i, \qquad \mu_i = X_i^T\beta, \qquad \varepsilon_i \sim N(0, \sigma^2).$$
The log-likelihood function for the above model is
$$\ell(\mu) = -\frac{n}{2}\log(2\pi\sigma^2) - \sum_{i=1}^{n}\frac{(Y_i - \mu_i)^2}{2\sigma^2}.$$
- The log-likelihood for the full model (FM) with $\mu_i = X_i^T\beta$ is
$$\ell(\mathrm{FM}) = -\frac{n}{2}\log(2\pi\sigma^2) - \sum_{i=1}^{n}\frac{(Y_i - X_i^T\hat\beta)^2}{2\sigma^2},$$
where $\hat\beta$ is the MLE of $\beta$.
- The log-likelihood for the saturated model (SM) with $\mu_i = Y_i$ is
$$\ell(\mathrm{SM}) = -\frac{n}{2}\log(2\pi\sigma^2).$$
The deviance for linear models is
$$D = \phi \cdot 2\{\ell(\mathrm{SM}) - \ell(\mathrm{FM})\} = \sigma^2 \cdot 2\{\ell(\mathrm{SM}) - \ell(\mathrm{FM})\} = \sigma^2 \cdot \frac{2}{2\sigma^2}\sum_{i=1}^{n}(Y_i - X_i^T\hat\beta)^2 = \sum_{i=1}^{n}(Y_i - X_i^T\hat\beta)^2.$$
Therefore, the deviance in linear models is the same as the SSE of the full model.
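This identity can be checked numerically. The sketch below (assuming simulated Gaussian data with a known $\sigma^2$) computes the two log-likelihoods directly and verifies that $\sigma^2 \cdot 2\{\ell(\mathrm{SM}) - \ell(\mathrm{FM})\}$ reproduces the least-squares SSE.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma2 = 30, 0.5

# Hypothetical linear model data: Y = X beta + noise.
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 2.0])
y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)

# MLE of beta (= least squares) and the SSE of the full model.
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
sse = np.sum((y - X @ beta_hat) ** 2)

# Log-likelihoods of the full and saturated models.
ll_fm = -n / 2 * np.log(2 * np.pi * sigma2) - sse / (2 * sigma2)
ll_sm = -n / 2 * np.log(2 * np.pi * sigma2)  # mu_i = y_i, residual term vanishes

deviance = sigma2 * 2 * (ll_sm - ll_fm)  # equals the SSE
```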
Log-likelihood for generalized linear models
- Let $\ell(\mu, \phi)$ be the log-likelihood of the generalized linear model in terms of the mean parameters $\mu$ rather than the canonical parameters $\theta$.
- For the saturated model, we estimate $\mu$ by $Y$; therefore, $\ell(Y, \phi)$ is the log-likelihood for the saturated model.
- For the full model with $p$ parameters, $\eta_i = X_i^T\beta$, the log-likelihood maximized over $\beta$ for a fixed $\phi$ is $\ell(\hat\mu, \phi)$, where $\hat\mu_i = g^{-1}(X_i^T\hat\beta)$ and $\hat\beta$ is the MLE.
Log-likelihood ratio
- Recall that the log-likelihood function for the generalized linear model is
$$\ell(\mu, \phi) = \sum_{i=1}^{n}\Big\{\frac{y_i\theta_i(\mu_i) - b\{\theta_i(\mu_i)\}}{a(\phi)} - h(y_i, \phi)\Big\}.$$
- For the saturated model, the log-likelihood $\ell(Y, \phi)$ is
$$\ell(Y, \phi) = \sum_{i=1}^{n}\Big\{\frac{y_i\tilde\theta_i - b(\tilde\theta_i)}{a(\phi)} - h(y_i, \phi)\Big\},$$
where $\tilde\theta_i = \theta_i(Y_i)$.
- For the full model with $p$ parameters, $\eta_i = X_i^T\beta$, the log-likelihood is
$$\ell(\hat\theta, \phi) = \sum_{i=1}^{n}\Big\{\frac{y_i\hat\theta_i - b(\hat\theta_i)}{a(\phi)} - h(y_i, \phi)\Big\},$$
where $\hat\theta_i = \theta_i(\hat\mu_i)$ and $\hat\mu_i = g^{-1}(X_i^T\hat\beta)$.
- Minus two times the log-likelihood ratio is
$$2\{\ell(Y, \phi) - \ell(\hat\theta, \phi)\} = 2\sum_{i=1}^{n}\Big\{\frac{y_i(\tilde\theta_i - \hat\theta_i)}{a(\phi)} - \frac{b(\tilde\theta_i) - b(\hat\theta_i)}{a(\phi)}\Big\}.$$
Deviance for generalized linear models
Assume that the dispersion parameter takes the form $a(\phi) = \phi/w$ for some weight $w$. The deviance for the generalized linear model is
$$D(\beta) = \phi \cdot 2\{\ell(Y, \phi) - \ell(\hat\theta, \phi)\} = 2w\sum_{i=1}^{n}\big\{y_i(\tilde\theta_i - \hat\theta_i) - b(\tilde\theta_i) + b(\hat\theta_i)\big\}.$$
The scaled deviance is $D(\beta)/\phi$.
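The general formula can be evaluated directly for any family once $\theta(\mu)$ and $b(\theta)$ are specified. As a sketch (assuming a Poisson fit with made-up counts and fitted means, where $\theta(\mu) = \log\mu$, $b(\theta) = e^\theta$, $a(\phi) = 1$ so $w = 1$):

```python
import numpy as np

# Hypothetical Poisson fit: observed counts and fitted means.
y = np.array([3.0, 1.0, 4.0, 2.0, 6.0])
mu_hat = np.array([2.5, 1.5, 3.5, 2.5, 5.0])

# Canonical parameters for the saturated and full models.
theta_tilde = np.log(y)       # theta_i(y_i)
theta_hat = np.log(mu_hat)    # theta_i(mu_hat_i)

# General deviance formula with w = 1, b(theta) = exp(theta).
D = 2 * np.sum(y * (theta_tilde - theta_hat)
               - np.exp(theta_tilde) + np.exp(theta_hat))

# The same quantity in its familiar closed form for the Poisson family:
# 2 * sum{ y log(y/mu) - (y - mu) }.
D_closed = 2 * np.sum(y * np.log(y / mu_hat) - (y - mu_hat))
```

Since $\phi = 1$ here, the deviance and the scaled deviance coincide.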
Example: Gauss-Markov linear models
For linear models, we know that
$$\tilde\theta_i = y_i, \quad \hat\theta_i = \hat\mu_i, \quad b(\theta_i) = \theta_i^2/2, \quad a(\phi) = \sigma^2, \quad \phi = \sigma^2, \quad w = 1.$$
So the deviance is
$$D(\beta) = 2\sum_{i=1}^{n}\big\{y_i(y_i - \hat\mu_i) - y_i^2/2 + \hat\mu_i^2/2\big\} = \sum_{i=1}^{n}(y_i^2 - 2\hat\mu_i y_i + \hat\mu_i^2) = \sum_{i=1}^{n}(y_i - \hat\mu_i)^2.$$
Example: Logistic regression models
For logistic regression models, we know that
$$\tilde\theta_i = \log\Big(\frac{y_i}{1 - y_i}\Big), \quad \hat\theta_i = \log\Big(\frac{\hat\mu_i}{1 - \hat\mu_i}\Big), \quad b(\theta_i) = \log(1 + e^{\theta_i}), \quad a(\phi) = 1, \quad \phi = 1, \quad w = 1.$$
So the deviance is
$$D(\beta) = 2\sum_{i=1}^{n}\Big\{y_i\log\Big(\frac{y_i}{\hat\mu_i}\Big) - y_i\log\Big(\frac{1 - y_i}{1 - \hat\mu_i}\Big) + \log\Big(\frac{1 - y_i}{1 - \hat\mu_i}\Big)\Big\} = 2\sum_{i=1}^{n}\Big\{y_i\log\Big(\frac{y_i}{\hat\mu_i}\Big) + (1 - y_i)\log\Big(\frac{1 - y_i}{1 - \hat\mu_i}\Big)\Big\},$$
where $\hat\mu_i = \hat p_i$.
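The canonical-parameter form and the simplified form above should agree term by term. A minimal check (assuming made-up observed proportions from grouped binary data, so each $y_i$ lies strictly in $(0,1)$ and the logits are finite):

```python
import numpy as np

# Hypothetical observed proportions and fitted probabilities (mu_hat_i = p_hat_i).
y = np.array([0.2, 0.6, 0.9, 0.4])
mu_hat = np.array([0.3, 0.5, 0.8, 0.45])

# Canonical parameters and cumulant function for the binomial family.
logit = lambda p: np.log(p / (1 - p))
theta_tilde, theta_hat = logit(y), logit(mu_hat)
b = lambda t: np.log1p(np.exp(t))   # b(theta) = log(1 + exp(theta))

# Deviance via the general formula (w = 1, phi = 1).
D = 2 * np.sum(y * (theta_tilde - theta_hat) - b(theta_tilde) + b(theta_hat))

# Deviance via the simplified binomial form.
D_simplified = 2 * np.sum(y * np.log(y / mu_hat)
                          + (1 - y) * np.log((1 - y) / (1 - mu_hat)))
```

For binary $y_i \in \{0, 1\}$, the simplified form is used with the convention $0 \log 0 = 0$.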
Goodness-of-fit test
- The goodness-of-fit test aims to examine whether the current full model fits the data well. It is a model diagnostic tool.
- The scaled deviance has an asymptotic $\chi^2_{n-p}$ distribution if the current model is adequate.
- But the test cannot be applied if the dispersion parameter is unknown. For example, in the linear regression model, $\phi = \sigma^2$ is unknown. An F-test may be used if the dispersion parameter is estimated using the Pearson chi-square statistic.
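When $\phi$ is known (e.g. Poisson or binomial, $\phi = 1$), the test reduces to comparing the scaled deviance with a $\chi^2_{n-p}$ tail. A sketch with hypothetical numbers (the deviance value here is invented for illustration):

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical scaled deviance from a Poisson fit (phi = 1) with n = 50, p = 3.
n, p = 50, 3
scaled_deviance = 55.0

# Under an adequate model, the scaled deviance is approximately chi^2_{n-p};
# a small upper-tail probability signals lack of fit.
p_value = chi2.sf(scaled_deviance, df=n - p)

# No evidence of lack of fit at the 5% level when the p-value exceeds 0.05.
adequate = p_value > 0.05
```

A quick rule of thumb consistent with this test: an adequate model has a deviance roughly comparable to its degrees of freedom $n - p$.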
Inference for part of parameters
For example, suppose we want to test $H_0: \beta_{l+1} = \cdots = \beta_k = 0$ for $k > l$. This can be done by comparing the following two nested models:
Model 1: $x_1, \dots, x_l$ with deviance $D_1$;
Model 2: $x_1, \dots, x_l, x_{l+1}, \dots, x_k$ with deviance $D_2$.
Under $H_0$, we know $(D_1 - D_2)/\phi \sim \chi^2_{k-l}$ if $\phi$ is known. If $\phi$ is unknown,
$$\frac{(D_1 - D_2)/(k - l)}{\hat\phi} \sim F_{k-l,\,n-p} \quad \text{under } H_0, \quad \text{where} \quad \hat\phi = \frac{1}{n - p}\sum_{i=1}^{n}\frac{(Y_i - \hat\mu_i)^2}{V(\hat\mu_i)}.$$
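Assembling the F statistic from the two deviances is mechanical once the fits are available. A sketch with invented deviances and an invented Pearson estimate of $\phi$ (in practice these come from the two fitted models):

```python
import numpy as np

# Hypothetical setup: larger model has p = k = 5 parameters, n = 40 observations.
n, p = 40, 5
k, l = 5, 2            # test H0: beta_{l+1} = ... = beta_k = 0
D1, D2 = 62.0, 48.0    # deviances of Model 1 (reduced) and Model 2 (full)
phi_hat = 1.3          # (1/(n-p)) * sum (Y_i - mu_hat_i)^2 / V(mu_hat_i)

# F statistic for the nested comparison when phi is unknown;
# compare against the F_{k-l, n-p} distribution.
F = ((D1 - D2) / (k - l)) / phi_hat
```

The reduced model always has the larger deviance ($D_1 \ge D_2$), so $F$ is nonnegative; large values favor keeping $x_{l+1}, \dots, x_k$.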
Residuals
- The deviance residual is
$$d_i = \mathrm{sign}(e_i)\Big[2\big(y_i(\tilde\theta_i - \hat\theta_i) - b(\tilde\theta_i) + b(\hat\theta_i)\big)/a(\phi)\Big]^{1/2},$$
where $e_i = Y_i - \hat\mu_i$.
- The Pearson residual is
$$r_i = \frac{Y_i - \hat\mu_i}{\sqrt{V(\hat\mu_i)}}.$$
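Both residuals are straightforward to compute once the family is fixed. A sketch for the Poisson case (made-up counts and fitted means; $V(\mu) = \mu$, $a(\phi) = 1$), which also checks that the squared deviance residuals sum to the deviance:

```python
import numpy as np

# Hypothetical Poisson fit.
y = np.array([3.0, 1.0, 4.0, 2.0])
mu_hat = np.array([2.5, 1.5, 3.5, 2.5])
e = y - mu_hat

# Pearson residual: r_i = (Y_i - mu_hat_i) / sqrt(V(mu_hat_i)), with V(mu) = mu.
r = e / np.sqrt(mu_hat)

# Deviance residual: signed square root of the unit deviance
# 2 * { y log(y/mu_hat) - (y - mu_hat) } for the Poisson family.
unit_dev = 2 * (y * np.log(y / mu_hat) - (y - mu_hat))
d = np.sign(e) * np.sqrt(unit_dev)

# sum(d_i^2) reproduces the model deviance D(beta).
```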