Generalized linear models
Linear regression model
Consider a Gauss-Markov model
$$Y_i \sim N(\mu_i, \sigma^2), \quad \text{where } \mu_i = X_i^T \beta.$$
The pdf of the response $Y_i$ is
$$f(Y_i; \mu_i, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\Big\{ -\frac{(Y_i - \mu_i)^2}{2\sigma^2} \Big\}
= \exp\Big\{ \frac{Y_i \mu_i - \mu_i^2/2}{\sigma^2} - \frac{Y_i^2}{2\sigma^2} - \log(\sqrt{2\pi}\,\sigma) \Big\}.$$
Linear regression model
Define $\theta_i = \mu_i$, $b(\theta_i) = \mu_i^2/2$, $a(\phi) = \phi$, $\phi = \sigma^2$ and
$$h(y_i, \phi) = y_i^2/(2\sigma^2) + \log(\sqrt{2\pi}\,\sigma).$$
Then the pdf of a normally distributed response can be written as
$$f(y_i; \theta_i, \phi) = \exp\Big\{ \frac{y_i \theta_i - b(\theta_i)}{a(\phi)} - h(y_i, \phi) \Big\}.$$
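As a quick sanity check of this algebra (a Python sketch, not part of the slides; the particular values of $\mu$ and $\sigma^2$ are arbitrary), we can evaluate the normal pdf both directly and in its exponential-dispersion form:

```python
import math

# Check numerically that the N(mu, sigma^2) density equals its
# exponential-dispersion form with theta = mu, b(theta) = mu^2/2,
# a(phi) = phi = sigma^2, h(y, phi) = y^2/(2 sigma^2) + log(sqrt(2 pi) sigma).
mu, sigma2 = 1.5, 2.0
sigma = math.sqrt(sigma2)

def normal_pdf(y):
    return math.exp(-(y - mu) ** 2 / (2 * sigma2)) / (math.sqrt(2 * math.pi) * sigma)

def dispersion_form(y):
    theta, phi = mu, sigma2
    b = theta ** 2 / 2
    h = y ** 2 / (2 * sigma2) + math.log(math.sqrt(2 * math.pi) * sigma)
    return math.exp((y * theta - b) / phi - h)

for y in [-1.0, 0.0, 0.7, 3.2]:
    assert abs(normal_pdf(y) - dispersion_form(y)) < 1e-12
```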
Logistic regression model
Consider a logistic regression model
$$Y_i \sim \mathrm{Bernoulli}(p_i) \quad \text{and} \quad p_i = \frac{\exp(X_i^T \beta)}{1 + \exp(X_i^T \beta)}.$$
The pmf of the response is
$$f(y_i; p_i) = \exp\Big\{ y_i \log\Big( \frac{p_i}{1 - p_i} \Big) + \log(1 - p_i) \Big\}
= \exp\big\{ y_i \theta_i - \log(1 + e^{\theta_i}) \big\}
= \exp\Big\{ \frac{y_i \theta_i - b(\theta_i)}{a(\phi)} - h(y_i, \phi) \Big\},$$
where $\theta_i = \log\{p_i/(1 - p_i)\}$ (so $p_i = e^{\theta_i}/(1 + e^{\theta_i})$), $b(\theta_i) = \log(1 + e^{\theta_i})$, $a(\phi) = 1$ and $h(y_i, \phi) = 0$.
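The same check works for the Bernoulli case (an illustrative Python sketch; the value of $p$ is arbitrary):

```python
import math

# Check that the Bernoulli(p) pmf matches exp{y*theta - log(1 + e^theta)}
# with canonical parameter theta = log(p/(1-p)), a(phi) = 1, h = 0.
p = 0.3
theta = math.log(p / (1 - p))

def bernoulli_pmf(y):
    return p ** y * (1 - p) ** (1 - y)

def dispersion_form(y):
    return math.exp(y * theta - math.log(1 + math.exp(theta)))

assert abs(bernoulli_pmf(0) - dispersion_form(0)) < 1e-12
assert abs(bernoulli_pmf(1) - dispersion_form(1)) < 1e-12
```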
Poisson regression model
Consider a Poisson regression model
$$Y_i \sim \mathrm{Poisson}(\mu_i) \quad \text{and} \quad \mu_i = \exp(X_i^T \beta).$$
The pmf of the response is
$$f(y_i; \mu_i) = \exp(-\mu_i)\, \mu_i^{y_i} / y_i!
= \exp\big\{ -\mu_i + y_i \log(\mu_i) - \log(y_i!) \big\}
= \exp\Big\{ \frac{y_i \theta_i - b(\theta_i)}{a(\phi)} - h(y_i, \phi) \Big\},$$
where $\theta_i = \log(\mu_i)$, $b(\theta_i) = e^{\theta_i}$, $a(\phi) = 1$ and $h(y_i, \phi) = \log(y_i!)$.
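And again for the Poisson case (a Python sketch with an arbitrary rate $\mu$):

```python
import math

# Check that the Poisson(mu) pmf matches its exponential-dispersion form
# with theta = log(mu), b(theta) = e^theta, a(phi) = 1, h(y, phi) = log(y!).
mu = 2.5
theta = math.log(mu)

def poisson_pmf(y):
    return math.exp(-mu) * mu ** y / math.factorial(y)

def dispersion_form(y):
    return math.exp(y * theta - math.exp(theta) - math.log(math.factorial(y)))

for y in range(6):
    assert abs(poisson_pmf(y) - dispersion_form(y)) < 1e-12
```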
Exponential dispersion family
- Assume the response $Y$ has pdf or pmf
  $$f(y; \theta, \phi) = \exp\Big\{ \frac{y\theta - b(\theta)}{a(\phi)} - h(y, \phi) \Big\},$$
  for some functions $a(\cdot)$, $b(\cdot)$, $h(\cdot, \cdot)$. The parameter $\theta$ is called the "canonical parameter" and $\phi$ is a dispersion parameter.
- The above family is called the exponential dispersion family.
- This family includes many distributions as special cases: normal, Poisson, Binomial, Gamma, ...
Statistical properties
The likelihood theory suggests that
$$E\Big\{ \frac{\partial}{\partial\theta} \log f(Y; \theta, \phi) \Big|_{\theta_0, \phi_0} \Big\} = 0$$
and
$$\mathrm{Var}\Big\{ \frac{\partial}{\partial\theta} \log f(Y; \theta, \phi) \Big|_{\theta_0, \phi_0} \Big\}
= -E\Big\{ \frac{\partial^2}{\partial\theta^2} \log f(Y; \theta, \phi) \Big|_{\theta_0, \phi_0} \Big\},$$
where $\theta_0, \phi_0$ are the true values of $\theta$ and $\phi$.
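Both identities can be verified exactly for the Poisson family, where $\log f(y; \theta) = y\theta - e^\theta - \log(y!)$, so the score is $y - e^\theta$ and the second derivative is $-e^\theta$. The Python sketch below (not from the slides; $\mu$ is arbitrary) computes the expectations by summing over the support:

```python
import math

# Verify E[d/dtheta log f] = 0 and Var[d/dtheta log f] = -E[d^2/dtheta^2 log f]
# for the Poisson family at the true value theta0 = log(mu).
mu = 3.0
theta0 = math.log(mu)

def pmf(y):
    return math.exp(-mu) * mu ** y / math.factorial(y)

# Sum over enough of the support that truncation error is negligible.
ys = range(0, 60)
mean_score = sum(pmf(y) * (y - math.exp(theta0)) for y in ys)
var_score = sum(pmf(y) * (y - math.exp(theta0)) ** 2 for y in ys)
neg_exp_hessian = math.exp(theta0)  # -E[-e^theta] evaluated at theta0

assert abs(mean_score) < 1e-9
assert abs(var_score - neg_exp_hessian) < 1e-9
```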
Mean and variance
The statistical property of the exponential dispersion family implies that
$$\mu = E(Y) = \frac{\partial b(\theta)}{\partial\theta}$$
and
$$\mathrm{Var}\Big\{ \frac{Y - b'(\theta)}{a(\phi)} \Big\} = b''(\theta)/a(\phi).$$
Combining the above two results together, we obtain
$$\mathrm{Var}(Y) = b''(\theta)\, a(\phi).$$
Mean and variance relationship
Because $\mu = b'(\theta)$, we obtain $\theta = b'^{-1}(\mu)$. As a result, the variance of $Y$ depends on $\mu$ in the following way:
$$\mathrm{Var}(Y) = a(\phi)\, V(\mu),$$
where $V(\mu) = b''\{b'^{-1}(\mu)\}$.
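The variance function $V(\mu) = b''\{b'^{-1}(\mu)\}$ can be computed purely numerically from $b$ and compared with the familiar closed forms ($V(\mu) = 1$, $\mu$, and $\mu(1-\mu)$ for the normal, Poisson and Bernoulli families). A Python sketch, with the inversion of $b'$ done by bisection since $b$ is convex:

```python
import math

# Compute V(mu) = b''(b'^{-1}(mu)) by numerical differentiation of b.
def variance_function(b, mu, lo=-20.0, hi=20.0, eps=1e-5):
    # b' is increasing (b convex), so invert it by bisection.
    bprime = lambda t: (b(t + eps) - b(t - eps)) / (2 * eps)
    while hi - lo > 1e-12:
        mid = (lo + hi) / 2
        if bprime(mid) < mu:
            lo = mid
        else:
            hi = mid
    theta = (lo + hi) / 2
    # Central-difference second derivative of b at theta.
    return (b(theta + eps) - 2 * b(theta) + b(theta - eps)) / eps ** 2

mu = 0.3
assert abs(variance_function(lambda t: t ** 2 / 2, mu) - 1.0) < 1e-3        # normal
assert abs(variance_function(math.exp, mu) - mu) < 1e-3                     # Poisson
assert abs(variance_function(lambda t: math.log(1 + math.exp(t)), mu)
           - mu * (1 - mu)) < 1e-3                                          # Bernoulli
```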
Link function
- The heart of a generalized linear model is to model some function of $\mu$ as a linear function of the predictors $X$. That is, for some link function $g(\cdot)$ and some $p$-dimensional coefficient vector $\beta$, we assume that
  $$g(\mu) = X^T \beta.$$
- A canonical link sets $\theta = b'^{-1}(\mu) = X^T \beta$.
- Therefore, the canonical link function is $g(\cdot) = b'^{-1}(\cdot)$.
Canonical link functions
- In the Gauss-Markov linear model, $\mu = \theta = E(Y)$. Then $b'(\theta) = \theta$ and $b'^{-1}(\mu) = \mu$. The canonical link function is $g(\mu) = \mu$. Therefore, $\mu = X^T \beta$.
- In the Poisson regression model, $\mu = b'(\theta) = \exp(\theta)$. Then $b'^{-1}(\mu) = \log(\mu)$. The canonical link function is $g(\mu) = \log(\mu)$. Therefore, $\log(\mu) = X^T \beta$.
- In the Bernoulli response case, $\mu = b'(\theta) = \exp(\theta)/\{1 + \exp(\theta)\}$. Then $b'^{-1}(\mu) = \log\{\mu/(1 - \mu)\}$. The canonical link function is $g(\mu) = \log\{\mu/(1 - \mu)\}$. Therefore, $\log\{\mu/(1 - \mu)\} = X^T \beta$.
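The three canonical links and their inverses are one-liners in code; the sketch below (illustrative only) checks the round trip $g^{-1}(g(\mu)) = \mu$ for each:

```python
import math

# Canonical link g(mu) = b'^{-1}(mu) and its inverse g^{-1}(eta) for the
# three examples above.
links = {
    "normal":    (lambda mu: mu,                      lambda eta: eta),
    "poisson":   (lambda mu: math.log(mu),            lambda eta: math.exp(eta)),
    "bernoulli": (lambda mu: math.log(mu / (1 - mu)), lambda eta: 1 / (1 + math.exp(-eta))),
}

mu = 0.4
for name, (g, g_inv) in links.items():
    assert abs(g_inv(g(mu)) - mu) < 1e-12
```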
Maximum likelihood
- Assume that $Y_1, \cdots, Y_n$ are an independent sample from $f(y; \theta, \phi)$ and $X_1, \cdots, X_n$ are the corresponding covariates. The log-likelihood for $\beta$ and $\phi$ is
  $$\ell(\beta, \phi) = \sum_{i=1}^n \Big[ \frac{Y_i \theta_i - b(\theta_i)}{a(\phi)} - h(Y_i, \phi) \Big],$$
  where $\theta_i = b'^{-1}(\mu_i) = b'^{-1}\{g^{-1}(X_i^T \beta)\}$.
- The maximum likelihood estimator of $\beta$ is
  $$\hat\beta = \arg\max_\beta \sum_{i=1}^n \{Y_i \theta_i - b(\theta_i)\}.$$
Score function
- The score function of $\beta$ is
  $$\frac{\partial \ell(\beta, \phi)}{\partial \beta} = \sum_{i=1}^n \{Y_i - b'(\theta_i)\} \, \frac{1}{a(\phi)\, V\{g^{-1}(X_i^T \beta)\}} \, \frac{1}{g'\{g^{-1}(X_i^T \beta)\}} \, X_i.$$
- The MLE of $\beta$ is a solution of $\partial \ell(\beta, \phi)/\partial \beta = 0$.
- Note that the solution of the above score equation does not depend on the dispersion parameter $\phi$, since $a(\phi)$ only rescales the score. The estimation of $\phi$ can be done separately.
Score function in a matrix form
Let $X = (X_1, \cdots, X_n)^T$ be an $n \times p$ matrix, $Y = (Y_1, \cdots, Y_n)^T$ and $\mu = (\mu_1, \cdots, \mu_n)^T$.
Define the following $n \times n$ matrices:
$$W^{-1} = \mathrm{diag}\big[ V(\mu_1)\{g'(\mu_1)\}^2, \cdots, V(\mu_n)\{g'(\mu_n)\}^2 \big]$$
and
$$\Delta = \mathrm{diag}\{ g'(\mu_1), \cdots, g'(\mu_n) \}.$$
The score function can be written as
$$\frac{\partial \ell(\beta, \phi)}{\partial \beta} = \{a(\phi)\}^{-1} X^T W \Delta (Y - \mu).$$
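The matrix form can be checked against a numerical gradient of the log-likelihood. The Python sketch below does this for a small logistic regression with the canonical link and $a(\phi) = 1$; all data are simulated for illustration:

```python
import numpy as np

# Compare the matrix form of the score, X^T W Delta (Y - mu), with a
# numerical gradient of the logistic log-likelihood.
rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
beta_true = np.array([0.5, -1.0, 0.25])
Y = (rng.uniform(size=n) < 1 / (1 + np.exp(-X @ beta_true))).astype(float)

def loglik(b):
    eta = X @ b
    return np.sum(Y * eta - np.log(1 + np.exp(eta)))

def score_matrix_form(b):
    mu = 1 / (1 + np.exp(-(X @ b)))      # g^{-1}(eta) for the logit link
    V = mu * (1 - mu)                    # variance function V(mu)
    gprime = 1 / (mu * (1 - mu))         # g'(mu) for the logit link
    W = np.diag(1 / (V * gprime ** 2))   # from W^{-1} = diag{V g'^2}
    Delta = np.diag(gprime)
    return X.T @ W @ Delta @ (Y - mu)

b0 = np.array([0.1, 0.2, -0.3])
eps = 1e-6
numeric = np.array([(loglik(b0 + eps * e) - loglik(b0 - eps * e)) / (2 * eps)
                    for e in np.eye(p)])
assert np.allclose(score_matrix_form(b0), numeric, atol=1e-4)
```

For the canonical link, $W\Delta$ reduces to the identity, so the score simplifies to $X^T(Y - \mu)$.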
Fisher’s information matrix
- The second derivative of the log-likelihood is
  $$\frac{\partial^2 \ell(\beta, \phi)}{\partial \beta \, \partial \beta^T} = -\{a(\phi)\}^{-1} X^T W \Delta \frac{\partial \mu}{\partial \beta^T} + \{a(\phi)\}^{-1} X^T \frac{\partial (W\Delta)}{\partial \beta^T} (Y - \mu).$$
- Since $E(Y) = \mu$, the second term has expectation zero, and $\partial \mu / \partial \beta^T = \Delta^{-1} X$, so the Fisher information matrix is
  $$I_n = \{a(\phi)\}^{-1} X^T W X.$$
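The score and Fisher information give the Fisher-scoring (IRLS) update $\beta \leftarrow \beta + (X^T W X)^{-1} X^T W \Delta (Y - \mu)$. A Python sketch for logistic regression with the canonical link (where $W\Delta(Y - \mu)$ simplifies to $Y - \mu$); the data are simulated for illustration:

```python
import numpy as np

# Fisher scoring / IRLS for logistic regression with the canonical logit link.
rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([-0.5, 1.0, 0.5])
Y = (rng.uniform(size=n) < 1 / (1 + np.exp(-X @ beta_true))).astype(float)

beta = np.zeros(3)
for _ in range(25):
    mu = 1 / (1 + np.exp(-(X @ beta)))
    W = np.diag(mu * (1 - mu))           # W = diag{1/(V g'^2)} = diag{mu(1-mu)}
    step = np.linalg.solve(X.T @ W @ X, X.T @ (Y - mu))
    beta = beta + step
    if np.max(np.abs(step)) < 1e-10:
        break

# At the MLE the score X^T(Y - mu) should be numerically zero.
mu = 1 / (1 + np.exp(-(X @ beta)))
assert np.max(np.abs(X.T @ (Y - mu))) < 1e-6
```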
Asymptotic normality of MLE β̂
Using the large-sample theory of likelihood methods, we have, approximately,
$$\hat\beta - \beta \sim N\big(0, \; a(\phi)\,(X^T W X)^{-1}\big).$$
Wald-type inference can be performed based on this asymptotic normality.
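As a concrete instance of Wald inference, consider an intercept-only Poisson regression with the canonical log link. There $X$ is a column of ones, the MLE has the closed form $\hat\beta = \log(\bar Y)$, and $X^T W X = n\bar Y$, so everything can be computed by hand. A Python sketch (the counts below are made up):

```python
import math

# Wald 95% interval for the intercept in an intercept-only Poisson GLM:
# beta_hat = log(mean(Y)), Var(beta_hat) ~ a(phi) (X^T W X)^{-1} = 1/(n*ybar)
# since a(phi) = 1 and W = diag(mu_i) with all mu_i = exp(beta_hat).
Y = [2, 0, 3, 1, 4, 2, 2, 5, 1, 3]
n = len(Y)
ybar = sum(Y) / n

beta_hat = math.log(ybar)
se = 1 / math.sqrt(n * ybar)
ci = (beta_hat - 1.96 * se, beta_hat + 1.96 * se)
assert ci[0] < beta_hat < ci[1]
```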