2. Fractional Imputation
1 Introduction
• Consider the setup of Example 2, a random effect model,
$$ y_{ij} = x_{ij}'\beta + a_i + e_{ij}, \quad i = 1, \dots, n_1, \; j = 1, \dots, n_2, $$
where $a_i \sim N(0, \sigma_a^2)$ and $e_{ij} \sim N(0, \sigma_e^2)$.
• Let $y_i = (y_{i1}, \dots, y_{in_2})'$ be observed, but $a_i$ is never observed.
• The joint density of $(y_i, a_i)$ is
$$ f(y_i, a_i; \theta) = f_1(y_i \mid a_i; \beta, \sigma_e^2) \, f_2(a_i; \sigma_a^2), $$
where
$$ f_1(y_i \mid a_i; \beta, \sigma_e^2) = (2\pi\sigma_e^2)^{-n_2/2} \exp\left\{ -\frac{1}{2\sigma_e^2} \sum_j (y_{ij} - x_{ij}'\beta - a_i)^2 \right\} $$
and
$$ f_2(a_i; \sigma_a^2) = (2\pi\sigma_a^2)^{-1/2} \exp\left( -\frac{1}{2\sigma_a^2} a_i^2 \right). $$
• The score function for $\theta_1 = (\beta, \sigma_e^2)$:
$$ S_1(\theta_1) = \sum_{i=1}^{n_1} \frac{\partial}{\partial \theta_1} \log f_1(y_i \mid a_i; \theta_1)
= \begin{pmatrix} \sigma_e^{-2} \sum_i \sum_j (y_{ij} - x_{ij}'\beta - a_i) x_{ij} \\ \sigma_e^{-4} \sum_i \sum_j \{ (y_{ij} - x_{ij}'\beta - a_i)^2 - \sigma_e^2 \} \end{pmatrix}. $$
• The score function for $\theta_2 = \sigma_a^2$:
$$ S_2(\theta_2) = \sum_{i=1}^{n_1} \frac{\partial}{\partial \theta_2} \log f_2(a_i; \theta_2) = \sigma_a^{-4} \sum_i \left( a_i^2 - \sigma_a^2 \right). $$
• EM algorithm:
  – E-step: Compute the conditional expectation of the score functions given the observed data:
    $$ E\{ S(\theta) \mid y; \hat{\theta}^{(t)} \}. $$
    To compute the conditional expectation, we need to derive $f(a_i \mid y_i)$ under the current parameter values:
    $$ f(a_i \mid y_i; \hat{\theta}^{(t)}) = \frac{ f_1(y_i \mid a_i; \hat{\theta}_1^{(t)}) \, f_2(a_i; \hat{\theta}_2^{(t)}) }{ \int f_1(y_i \mid a_i; \hat{\theta}_1^{(t)}) \, f_2(a_i; \hat{\theta}_2^{(t)}) \, da_i }. \qquad (1) $$
    When both $f_1$ and $f_2$ are normal, the above conditional distribution is also normal:
    $$ a_i \mid y_i \sim N[ E(a_i \mid y_i), V(a_i \mid y_i) ], $$
    where
    $$ E(a_i \mid y_i) = \frac{\sigma_a^2}{\sigma_a^2 + \sigma_e^2/n_2} (\bar{y}_i - \bar{x}_i'\beta)
    \quad \text{and} \quad
    V(a_i \mid y_i) = \frac{\sigma_a^2}{\sigma_a^2 + \sigma_e^2/n_2} \cdot \frac{\sigma_e^2}{n_2}. $$
  – M-step: Update the parameter by solving
    $$ E\{ S(\theta) \mid y; \hat{\theta}^{(t)} \} = 0 $$
    for $\theta$, where the conditional expectation is computed from the E-step.
• If normality fails for either $f_1$ or $f_2$, then (1) is not necessarily normal. In this case, the E-step may require a Monte Carlo approximation, as in the sketch below.
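
For the normal case above, both EM steps have closed forms. Below is a minimal Python sketch of the iteration, assuming balanced data stored as arrays; the function name, starting values, and data layout are illustrative choices, not from the notes.

```python
# Minimal sketch of EM for the normal random-intercept model (illustrative).
import numpy as np

def em_random_intercept(y, X, n_iter=100):
    """y: (n1, n2) responses; X: (n1, n2, p) covariates."""
    n1, n2 = y.shape
    p = X.shape[2]
    Xf = X.reshape(n1 * n2, p)                  # flattened design matrix
    beta = np.linalg.lstsq(Xf, y.ravel(), rcond=None)[0]
    sig2e, sig2a = 1.0, 1.0                     # starting values (assumption)
    for _ in range(n_iter):
        # E-step: a_i | y_i is normal with the mean and variance given above
        resid_bar = y.mean(axis=1) - X.mean(axis=1) @ beta     # ybar_i - xbar_i' beta
        shrink = sig2a / (sig2a + sig2e / n2)
        a_hat = shrink * resid_bar                             # E(a_i | y_i)
        v_a = shrink * sig2e / n2                              # V(a_i | y_i)
        # M-step: solve the conditional expectations of the score equations
        adj = (y - a_hat[:, None]).ravel()                     # remove predicted random effect
        beta = np.linalg.lstsq(Xf, adj, rcond=None)[0]
        resid = y - np.einsum('ijk,k->ij', X, beta) - a_hat[:, None]
        sig2e = (resid ** 2).mean() + v_a                      # E{(y - x'b - a_i)^2 | y}
        sig2a = (a_hat ** 2).mean() + v_a                      # average of E(a_i^2 | y_i)
    return beta, sig2e, sig2a
```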
2 Monte Carlo EM algorithm
• y: observed data, z: latent variable.
• We are interested in computing E{S(θ; y, Z) | y}.
• Wei and Tanner (1990) proposed the Monte Carlo EM algorithm: in the E-step, first draw
$$ z^{*(1)}, \dots, z^{*(m)} \sim f(z \mid y; \theta^{(t)}) $$
and approximate (see the sketch below)
$$ E\{ S(\theta; y, z) \mid y \} \cong \frac{1}{m} \sum_{j=1}^m S(\theta; y, z^{*(j)}). $$
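
In code, the Monte Carlo E-step is simply an average of the complete-data score over the draws. A generic sketch, where `draw_z` and `score` are hypothetical model-specific callbacks:

```python
# Generic Monte Carlo E-step: approximate E{S(theta; y, z) | y} by averaging
# the complete-data score over m draws of the latent variable.
import numpy as np

def mc_e_step(theta, y, draw_z, score, m=500, rng=None):
    rng = rng or np.random.default_rng()
    draws = [draw_z(y, theta, rng) for _ in range(m)]      # z*(1), ..., z*(m)
    return np.mean([score(theta, y, z) for z in draws], axis=0)
```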
Example 1
• Suppose that
$$ y_i \sim f(y_i \mid x_i; \theta). $$
Assume that $x_i$ is always observed but we observe $y_i$ only when $\delta_i = 1$, where $\delta_i \sim \text{Bernoulli}[\pi_i(\phi)]$ and
$$ \pi_i(\phi) = \frac{ \exp(\phi_0 + \phi_1 x_i + \phi_2 y_i) }{ 1 + \exp(\phi_0 + \phi_1 x_i + \phi_2 y_i) }. $$
• To implement the MCEM method, in the E-step we need to generate samples from
$$ f(y_i \mid x_i, \delta_i = 0; \hat{\theta}, \hat{\phi}) = \frac{ f(y_i \mid x_i; \hat{\theta}) \{ 1 - \pi_i(\hat{\phi}) \} }{ \int f(y_i \mid x_i; \hat{\theta}) \{ 1 - \pi_i(\hat{\phi}) \} \, dy_i }. $$
• We can use the following rejection method to generate samples from $f(y_i \mid x_i, \delta_i = 0; \hat{\theta}, \hat{\phi})$ (see the sketch after these steps):
  1. Generate $y_i^*$ from $f(y_i \mid x_i; \hat{\theta})$.
  2. Using $y_i^*$, compute
$$ \pi_i^*(\hat{\phi}) = \frac{ \exp(\hat{\phi}_0 + \hat{\phi}_1 x_i + \hat{\phi}_2 y_i^*) }{ 1 + \exp(\hat{\phi}_0 + \hat{\phi}_1 x_i + \hat{\phi}_2 y_i^*) }. $$
Accept $y_i^*$ with probability $1 - \pi_i^*(\hat{\phi})$.
  3. If $y_i^*$ is not accepted, go to Step 1.
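
A sketch of this rejection sampler, assuming for illustration a normal regression model $f(y \mid x; \theta) = N(\theta_0 + \theta_1 x, \sigma^2)$ (the notes leave $f$ generic):

```python
# Rejection sampler for f(y_i | x_i, delta_i = 0): propose from f(y | x; theta_hat)
# and accept with probability 1 - pi_i(phi_hat). Normal regression is assumed.
import numpy as np

def draw_missing_y(x, theta, phi, rng):
    t0, t1, s2 = theta
    p0, p1, p2 = phi
    while True:
        y_star = rng.normal(t0 + t1 * x, np.sqrt(s2))          # Step 1: propose
        eta = p0 + p1 * x + p2 * y_star
        pi_star = 1.0 / (1.0 + np.exp(-eta))                   # response probability
        if rng.random() < 1.0 - pi_star:                       # Step 2: accept w.p. 1 - pi*
            return y_star                                      # Step 3: otherwise retry
```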
• Using the $m$ imputed values of $y_i$, denoted by $y_i^{*(1)}, \dots, y_i^{*(m)}$, the M-step can be implemented by solving (a sketch follows)
$$ \sum_{i=1}^n \sum_{j=1}^m S(\theta; x_i, y_i^{*(j)}) = 0 $$
and
$$ \sum_{i=1}^n \sum_{j=1}^m \left\{ \delta_i - \pi(\phi; x_i, y_i^{*(j)}) \right\} \left( 1, x_i, y_i^{*(j)} \right) = 0, $$
where $S(\theta; x_i, y_i) = \partial \log f(y_i \mid x_i; \theta) / \partial \theta$.
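
A sketch of this M-step under the same illustrative normal-regression assumption, using `scipy.optimize.root` as a generic root-finder; the stacking scheme and function names are my own:

```python
# M-step sketch for Example 1: stack observed and imputed records, then solve
# the two estimating equations. Normal regression f(y|x) = N(t0 + t1*x, s2) is
# an assumption for illustration; the source leaves f generic.
import numpy as np
from scipy.optimize import root

def m_step(x_obs, y_obs, x_mis, y_imp, theta0, phi0):
    """y_imp: (n_mis, m) imputed values for the nonrespondents."""
    m = y_imp.shape[1]
    # replicate each record m times so respondents and imputations align
    x_all = np.concatenate([np.repeat(x_obs, m), np.repeat(x_mis, m)])
    y_all = np.concatenate([np.repeat(y_obs, m), y_imp.ravel()])
    d_all = np.concatenate([np.ones(len(x_obs) * m), np.zeros(y_imp.size)])

    def theta_eq(theta):                       # score of N(t0 + t1*x, s2)
        t0, t1, s2 = theta
        r = y_all - t0 - t1 * x_all
        return [r.sum(), (r * x_all).sum(), (r ** 2 - s2).sum()]

    def phi_eq(phi):                           # response-model estimating equation
        eta = phi[0] + phi[1] * x_all + phi[2] * y_all
        resid = d_all - 1.0 / (1.0 + np.exp(-eta))
        return [resid.sum(), (resid * x_all).sum(), (resid * y_all).sum()]

    return root(theta_eq, theta0).x, root(phi_eq, phi0).x
```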
Example 2 (Cont’d)
• Basic Setup: Let $y_{ij}$ be a binary random variable (taking the values 0 or 1) with probability $p_{ij} = \Pr(y_{ij} = 1 \mid x_{ij}, a_i)$, and assume that
$$ \text{logit}(p_{ij}) = x_{ij}'\beta + a_i, $$
where $x_{ij}$ is a $p$-dimensional covariate associated with the $j$-th repetition of unit $i$, $\beta$ is the parameter of interest that can represent the treatment effect due to $x$, and $a_i$ represents the random effect associated with unit $i$. We assume that the $a_i$ are iid $N(0, \sigma^2)$.
• Missing data: $a_i$.
• Observed likelihood:
$$ L_{obs}(\beta, \sigma^2) = \prod_i \int \left\{ \prod_j p(x_{ij}, a_i; \beta)^{y_{ij}} [1 - p(x_{ij}, a_i; \beta)]^{1 - y_{ij}} \right\} \frac{1}{\sigma} \phi\left( \frac{a_i}{\sigma} \right) da_i, $$
where $\phi(\cdot)$ is the pdf of the standard normal distribution.
• MCEM approach: generate $a_i^*$ from
$$ f(a_i \mid x_i, y_i; \hat{\beta}, \hat{\sigma}) \propto f_1(y_i \mid x_i, a_i; \hat{\beta}) \, f_2(a_i; \hat{\sigma}), $$
where
$$ f_1(y_i \mid x_i, a_i; \hat{\beta}) = \prod_j p(x_{ij}, a_i; \hat{\beta})^{y_{ij}} [1 - p(x_{ij}, a_i; \hat{\beta})]^{1 - y_{ij}} $$
and
$$ f_2(a_i; \hat{\sigma}) = \frac{1}{\hat{\sigma}} \phi\left( \frac{a_i}{\hat{\sigma}} \right). $$
• Metropolis-Hastings algorithm (see the sketch below):
  1. Generate $a_i^*$ from $f_2(a_i; \hat{\sigma})$.
  2. Set
$$ a_i^{(t)} = \begin{cases} a_i^* & \text{w.p. } \rho(a_i^{(t-1)}, a_i^*) \\ a_i^{(t-1)} & \text{w.p. } 1 - \rho(a_i^{(t-1)}, a_i^*), \end{cases} $$
where
$$ \rho(a_i^{(t-1)}, a_i^*) = \min\left\{ \frac{ f_1(y_i \mid x_i, a_i^*; \hat{\beta}) }{ f_1(y_i \mid x_i, a_i^{(t-1)}; \hat{\beta}) }, \, 1 \right\}. $$
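
Because the proposal in Step 1 is $f_2$ itself, this is an independence Metropolis-Hastings sampler and the acceptance ratio reduces to the $f_1$ ratio shown above. A minimal sketch:

```python
# Independence Metropolis-Hastings for a_i in the logistic mixed model:
# proposals come from f2(a; sigma_hat) = N(0, sigma_hat^2), so the acceptance
# ratio is the f1 likelihood ratio. Computed on the log scale for stability.
import numpy as np

def log_f1(y_i, X_i, a, beta):
    """log f1(y_i | x_i, a; beta) for one unit: Bernoulli-logit likelihood."""
    eta = X_i @ beta + a
    return np.sum(y_i * eta - np.log1p(np.exp(eta)))

def mh_random_effect(y_i, X_i, beta_hat, sigma_hat, n_draws=1000, rng=None):
    rng = rng or np.random.default_rng()
    a_curr, draws = 0.0, []
    for _ in range(n_draws):
        a_star = rng.normal(0.0, sigma_hat)              # Step 1: propose from f2
        log_rho = log_f1(y_i, X_i, a_star, beta_hat) - log_f1(y_i, X_i, a_curr, beta_hat)
        if np.log(rng.random()) < min(log_rho, 0.0):     # Step 2: accept w.p. rho
            a_curr = a_star
        draws.append(a_curr)
    return np.array(draws)
```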
Remark
• Monte Carlo EM can be used as a frequentist approach to imputation.
• Convergence is not guaranteed (for fixed m).
• The E-step can be computationally heavy (an MCMC method may be used).
3 Parametric fractional imputation
Motivation
• We are interested in computing $E\{ S(\theta; y_i, Z_i) \mid y_i; \theta^{(t)} \}$.
• The conditional distribution is not of known form:
$$ f(z_i \mid y_i; \hat{\theta}) = \frac{ f_1(y_i \mid z_i; \hat{\theta}_1) \, f_2(z_i; \hat{\theta}_2) }{ \int f_1(y_i \mid z_i; \hat{\theta}_1) \, f_2(z_i; \hat{\theta}_2) \, dz_i }. $$
• Approximate the conditional expectation by
$$ E\{ S(\theta; y_i, Z_i) \mid y_i; \hat{\theta} \} \cong \sum_{j=1}^m w_{ij}^* \, S(\theta; y_i, z_i^{*(j)}), \qquad (2) $$
where $z_i^{*(1)}, \dots, z_i^{*(m)}$ are generated from $f_2(z_i; \hat{\theta}_2)$ and
$$ w_{ij}^* = \frac{ f_1(y_i \mid z_i^{*(j)}; \hat{\theta}_1) }{ \sum_{k=1}^m f_1(y_i \mid z_i^{*(k)}; \hat{\theta}_1) }. $$
• More generally, we may use (2) where $z_i^{*(1)}, \dots, z_i^{*(m)}$ are generated from $h(z_i; \hat{\theta}_2)$ and
$$ w_{ij}^* \propto \frac{ f_1(y_i \mid z_i^{*(j)}; \hat{\theta}_1) \, f_2(z_i^{*(j)}; \hat{\theta}_2) }{ h(z_i^{*(j)}; \hat{\theta}_2) }, $$
with $\sum_j w_{ij}^* = 1$. A sketch of the weight computation follows.
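
Computing the normalized weights is straightforward on the log scale, which avoids underflow when the densities are small. A sketch where `log_f1`, `log_f2`, and `log_h` are hypothetical callables for the log-densities named above:

```python
# Normalized importance (fractional) weights for PFI, computed on the log scale.
import numpy as np

def fractional_weights(y_i, z_draws, log_f1, log_f2, log_h):
    logw = np.array([log_f1(y_i, z) + log_f2(z) - log_h(z) for z in z_draws])
    logw -= logw.max()                 # log-sum-exp trick before normalizing
    w = np.exp(logw)
    return w / w.sum()                 # fractional weights sum to one
```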
Remark
• Importance sampling idea: for sufficiently large $m$,
$$ \sum_{j=1}^m w_{ij}^* \, g(z_{ij}^*) \cong \frac{ \int g(z_i) \, \frac{f(y_i, z_i; \hat{\theta})}{h(z_i)} \, h(z_i) \, dz_i }{ \int \frac{f(y_i, z_i; \hat{\theta})}{h(z_i)} \, h(z_i) \, dz_i } = E\{ g(z_i) \mid y_i; \hat{\theta} \} $$
for any $g$ such that the expectation exists.
• In the importance sampling literature, $h(\cdot)$ is called the proposal distribution and $f(\cdot)$ is called the target distribution.
• The weights $w_{ij}^*$ are the normalized importance weights and can be called fractional weights.
• If $y_{mis,i}$ is categorical, then simply use all possible values of $y_{mis,i}$ as the imputed values and assign their conditional probabilities as the fractional weights.
Monte Carlo EM algorithm using PFI (Kim, 2011)
1. Imputation step: generate $z_i^{*(j)} \sim h(z)$, where $h(\cdot)$ does not depend on $\theta$.
2. Weighting step: compute
$$ w_{ij(t)}^* \propto f(y_i, z_i^{*(j)}; \hat{\theta}^{(t)}) / h(z_i^{*(j)}), $$
where $\sum_{j=1}^m w_{ij(t)}^* = 1$.
3. M-step: update $\hat{\theta}^{(t+1)}$ as the solution to
$$ \sum_{i=1}^n \sum_{j=1}^m w_{ij(t)}^* \, S(\theta; y_i, z_i^{*(j)}) = 0. $$
4. Repeat Steps 2 and 3 until convergence.
• "Imputation step" + "Weighting step" = E-step.
• We may add an optional step that checks whether $w_{ij(t)}^*$ is too large for some $j$; in that case, $h(z)$ needs to be changed.
• The imputed values are not changed across EM iterations; only the fractional weights are updated. As a result (see the sketch below), the algorithm is
  1. computationally efficient (because importance sampling is used only once), and
  2. convergent (because the imputed values are not changed).
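
A skeleton of the full loop, emphasizing that the draws are made once and only the weights and the parameter are updated; `log_joint`, `log_h`, and `solve_weighted_score` are hypothetical model-specific hooks:

```python
# Skeleton of the PFI-EM loop: z-draws are fixed across iterations; each
# iteration recomputes fractional weights and re-solves the weighted score.
import numpy as np

def pfi_em(y, z_draws, log_joint, log_h, solve_weighted_score,
           theta0, n_iter=50):
    """y: list of observed data; z_draws[i]: m imputed values for unit i."""
    theta = theta0
    for _ in range(n_iter):
        # Weighting step: recompute fractional weights under the current theta
        weights = []
        for y_i, z_i in zip(y, z_draws):
            logw = np.array([log_joint(y_i, z, theta) - log_h(z) for z in z_i])
            logw -= logw.max()
            w = np.exp(logw)
            weights.append(w / w.sum())
        # M-step: solve sum_i sum_j w_ij * S(theta; y_i, z_ij) = 0
        theta = solve_weighted_score(y, z_draws, weights, theta)
    return theta
```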
Return to Example 1
• Fractional imputation:
  1. Imputation step: generate $y_i^{*(1)}, \dots, y_i^{*(m)}$ from $f(y_i \mid x_i; \hat{\theta}^{(0)})$.
  2. Weighting step: using the $m$ imputed values generated from Step 1, compute the fractional weights by
$$ w_{ij(t)}^* \propto \frac{ f(y_i^{*(j)} \mid x_i; \hat{\theta}^{(t)}) }{ f(y_i^{*(j)} \mid x_i; \hat{\theta}^{(0)}) } \left\{ 1 - \pi(x_i, y_i^{*(j)}; \hat{\phi}^{(t)}) \right\}, $$
where
$$ \pi(x_i, y_i; \hat{\phi}) = \frac{ \exp(\hat{\phi}_0 + \hat{\phi}_1 x_i + \hat{\phi}_2 y_i) }{ 1 + \exp(\hat{\phi}_0 + \hat{\phi}_1 x_i + \hat{\phi}_2 y_i) }. $$
• Using the imputed data and the fractional weights, the M-step can be implemented by solving
$$ \sum_{i=1}^n \sum_{j=1}^m w_{ij(t)}^* \, S(\theta; x_i, y_i^{*(j)}) = 0 $$
and
$$ \sum_{i=1}^n \sum_{j=1}^m w_{ij(t)}^* \left\{ \delta_i - \pi(\phi; x_i, y_i^{*(j)}) \right\} \left( 1, x_i, y_i^{*(j)} \right) = 0, $$
where $S(\theta; x_i, y_i) = \partial \log f(y_i \mid x_i; \theta) / \partial \theta$. A sketch of the weighting step follows.
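
Below is a sketch of the weighting step for one nonrespondent, under the same illustrative normal-regression model used in the earlier sketches; note that the draws from $f(y \mid x; \hat{\theta}^{(0)})$ are reused at every iteration:

```python
# Fractional weights for Example 1: draws are made once from f(y | x; theta_0);
# at iteration t each draw is reweighted by the density ratio times 1 - pi.
# Normal regression f(y|x) = N(t0 + t1*x, s2) is an assumed illustration.
import numpy as np
from scipy.stats import norm

def example1_weights(x_i, y_imp, theta_t, theta_0, phi_t):
    """y_imp: m imputed values for one nonrespondent with covariate x_i."""
    def logpdf(y, th):
        t0, t1, s2 = th
        return norm.logpdf(y, loc=t0 + t1 * x_i, scale=np.sqrt(s2))
    eta = phi_t[0] + phi_t[1] * x_i + phi_t[2] * y_imp
    logw = (logpdf(y_imp, theta_t) - logpdf(y_imp, theta_0)
            - np.logaddexp(0.0, eta))          # log{1 - pi} = -log(1 + e^eta)
    logw -= logw.max()
    w = np.exp(logw)
    return w / w.sum()
```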
4 Applications (Small Area Estimation)
• Hierarchical structural model:
  1. Level-one model: $y_{ij} \sim f_1(y_{ij} \mid x_{ij}, a_i; \theta_1)$
  2. Level-two model: $a_i \sim f_2(a_i; \theta_2)$
• Instead of observing $(x_{ij}, y_{ij})$, we observe $(x_{ij}, \hat{y}_{ij})$, where
$$ \hat{y}_{ij} \mid y_{ij} \sim N(y_{ij}, v_{ij}). $$
One may think of $i$ as a state-level index and $j$ as a county-level index.
• Thus, we have two sources of missing data: $a_i$ and $y_i$.
• EM algorithm:
  1. E-step: We are interested in generating
$$ (y_i, a_i) \mid (x_i, \hat{y}_i) \sim \frac{ f_1(y_i \mid x_i, a_i; \hat{\theta}_1^{(t)}) \, f_2(a_i; \hat{\theta}_2^{(t)}) \, g(\hat{y}_i \mid y_i) }{ \int f_1(y_i \mid x_i, a_i; \hat{\theta}_1^{(t)}) \, f_2(a_i; \hat{\theta}_2^{(t)}) \, g(\hat{y}_i \mid y_i) \, dy_i }. \qquad (3) $$
One can use either an MCMC method or the PFI method to compute the conditional expectation. If PFI is used, then we may use the following steps:
  (a) Generate $a_i^{*(1)}, \dots, a_i^{*(m)}$ from $f_2(a_i; \hat{\theta}_2^{(t)})$.
  (b) For each $a_i^{*(k)}$, generate $y_{ij}^{*(k)}$ from some proposal distribution $h(y_{ij} \mid x_{ij}, a_i^{*(k)}, \hat{y}_{ij})$. One can use a normal distribution with mean
$$ E(y_{ij} \mid x_{ij}, a_i^{*(k)}, \hat{y}_{ij}) = \frac{ v_{ij} }{ \sigma_e^{2(t)} + v_{ij} } \left( x_{ij}'\hat{\beta}^{(t)} + a_i^{*(k)} \right) + \frac{ \sigma_e^{2(t)} }{ \sigma_e^{2(t)} + v_{ij} } \hat{y}_{ij} $$
and variance $v_{ij}$.
  (c) The fractional weight assigned to $(a_i^{*(k)}, y_i^{*(k)})$ is then
$$ w_{ik(t)}^* \propto \frac{ f_1(y_i^{*(k)} \mid x_i, a_i^{*(k)}; \hat{\theta}_1^{(t)}) \, g(\hat{y}_i \mid y_i^{*(k)}) }{ h(y_i^{*(k)} \mid x_i, a_i^{*(k)}, \hat{y}_i) }, $$
with $\sum_k w_{ik(t)}^* = 1$.
  2. M-step: the parameters are updated by solving
$$ \sum_i \sum_k w_{ik(t)}^* \, S_1(\theta_1; x_i, y_i^{*(k)}, a_i^{*(k)}) = 0 $$
and
$$ \sum_i \sum_k w_{ik(t)}^* \, S_2(\theta_2; a_i^{*(k)}) = 0, $$
where $S_1(\theta_1; x_i, y_i, a_i) = \partial \log f_1(y_i \mid x_i, a_i; \theta_1) / \partial \theta_1$ and $S_2(\theta_2; a_i) = \partial \log f_2(a_i; \theta_2) / \partial \theta_2$. A sketch of the E-step follows.
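
Below is a sketch of the PFI E-step for one area $i$, assuming a normal level-one model $y_{ij} \sim N(x_{ij}'\beta + a_i, \sigma_e^2)$ consistent with the proposal mean above; the function name and data layout are my own:

```python
# PFI E-step sketch for the small area model: a_i ~ f2 = N(0, sig2a),
# y_ij proposed from the normal h given above, weights = f1 * g / h.
import numpy as np
from scipy.stats import norm

def pfi_e_step_unit(X_i, yhat_i, v_i, beta, sig2e, sig2a, m, rng):
    """Returns m draws (a, y) for one area i and their fractional weights."""
    a_draws = rng.normal(0.0, np.sqrt(sig2a), size=m)        # (a) draw a from f2
    mu = X_i @ beta                                          # (n2,) level-one means
    logw = np.zeros(m)
    y_draws = np.empty((m, len(yhat_i)))
    for k in range(m):
        mean_h = (v_i * (mu + a_draws[k]) + sig2e * yhat_i) / (sig2e + v_i)
        y_k = rng.normal(mean_h, np.sqrt(v_i))               # (b) propose y from h
        y_draws[k] = y_k
        # (c) log fractional weight: log f1 + log g - log h
        logw[k] = (norm.logpdf(y_k, mu + a_draws[k], np.sqrt(sig2e)).sum()
                   + norm.logpdf(yhat_i, y_k, np.sqrt(v_i)).sum()
                   - norm.logpdf(y_k, mean_h, np.sqrt(v_i)).sum())
    logw -= logw.max()
    w = np.exp(logw)
    return a_draws, y_draws, w / w.sum()
```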
• Suppose that the problem is a measurement error problem such that we observe $(\hat{x}_{ij}, \hat{y}_{ij})$ instead of $(x_{ij}, y_{ij})$, where
$$ \hat{x}_{ij} \mid x_{ij} \sim g_1(\hat{x}_{ij} \mid x_{ij}; v_{ij1}) \quad \text{and} \quad \hat{y}_{ij} \mid y_{ij} \sim g_2(\hat{y}_{ij} \mid y_{ij}; v_{ij2}). $$
In this case, (3) changes to
$$ (x_i, y_i, a_i) \mid (\hat{x}_i, \hat{y}_i) \sim \frac{ f_1(y_i \mid x_i, a_i; \hat{\theta}_1^{(t)}) \, f_2(a_i; \hat{\theta}_2^{(t)}) \, g_2(\hat{y}_i \mid y_i) \, \tilde{g}_1(x_i \mid \hat{x}_i) }{ \int f_1(y_i \mid x_i, a_i; \hat{\theta}_1^{(t)}) \, f_2(a_i; \hat{\theta}_2^{(t)}) \, g_2(\hat{y}_i \mid y_i) \, \tilde{g}_1(x_i \mid \hat{x}_i) \, dy_i }, \qquad (4) $$
where $\tilde{g}_1(x_i \mid \hat{x}_i) \propto g_1(\hat{x}_i \mid x_i) g(x_i)$. The following PFI-EM algorithm can be used:
1. E-step:
  (a) Generate $a_i^{*(1)}, \dots, a_i^{*(m)}$ from $f_2(a_i; \hat{\theta}_2)$.
  (b) Generate $x_{ij}^{*(1)}, \dots, x_{ij}^{*(m)}$ from $h_1(x_{ij} \mid \hat{x}_{ij})$, which can be $N(\hat{x}_{ij}, v_{ij1})$.
  (c) For each $a_i^{*(k)}$, generate $y_{ij}^{*(k)}$ from some proposal distribution $h(y_{ij} \mid x_{ij}^{*(k)}, a_i^{*(k)}, \hat{y}_{ij})$. One can use a normal distribution with mean
$$ E(y_{ij} \mid x_{ij}^{*(k)}, a_i^{*(k)}, \hat{y}_{ij}) = \frac{ v_{ij} }{ \sigma_e^{2(t)} + v_{ij} } \left( x_{ij}^{*(k)\prime}\hat{\beta}^{(t)} + a_i^{*(k)} \right) + \frac{ \sigma_e^{2(t)} }{ \sigma_e^{2(t)} + v_{ij} } \hat{y}_{ij} $$
and variance $v_{ij}$.
  (d) The fractional weight assigned to $(a_i^{*(k)}, x_i^{*(k)}, y_i^{*(k)})$ is then
$$ w_{ik(t)}^* \propto \frac{ f_1(y_i^{*(k)} \mid x_i^{*(k)}, a_i^{*(k)}; \hat{\theta}_1^{(t)}) \, g_2(\hat{y}_i \mid y_i^{*(k)}) }{ h(y_i^{*(k)} \mid x_i^{*(k)}, a_i^{*(k)}, \hat{y}_i) } \times \frac{ \tilde{g}_1(x_i^{*(k)} \mid \hat{x}_i) }{ h_1(x_i^{*(k)} \mid \hat{x}_i) }, $$
with $\sum_k w_{ik(t)}^* = 1$. If we use $h_1(\cdot) = \tilde{g}_1(\cdot)$, then the second term equals one.
2. M-step: the parameters are updated by solving
$$ \sum_i \sum_k w_{ik(t)}^* \, S_1(\theta_1; x_i^{*(k)}, y_i^{*(k)}, a_i^{*(k)}) = 0 $$
and
$$ \sum_i \sum_k w_{ik(t)}^* \, S_2(\theta_2; a_i^{*(k)}) = 0. $$
A sketch of this E-step follows.
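
The previous E-step sketch extends directly to the measurement error case: $x$ is now drawn from $h_1 = N(\hat{x}_{ij}, v_{ij1})$, which equals $\tilde{g}_1$ under a flat prior $g(x)$, so the second weight factor drops out. A scalar-covariate sketch (names and layout are my own):

```python
# PFI E-step sketch for the measurement error case. Assumes a scalar covariate,
# normal level-one model, and h1 = g1-tilde (flat prior on x), so the weight
# keeps only the f1 * g2 / h factor from the display above.
import numpy as np
from scipy.stats import norm

def pfi_e_step_meas_err(xhat_i, yhat_i, v1_i, v2_i, beta, sig2e, sig2a, m, rng):
    a_draws = rng.normal(0.0, np.sqrt(sig2a), size=m)             # (a) a ~ f2
    logw, draws = np.zeros(m), []
    for k in range(m):
        x_k = rng.normal(xhat_i, np.sqrt(v1_i))                   # (b) x ~ h1
        mu = x_k * beta + a_draws[k]                              # level-one mean
        mean_h = (v2_i * mu + sig2e * yhat_i) / (sig2e + v2_i)
        y_k = rng.normal(mean_h, np.sqrt(v2_i))                   # (c) y ~ h
        draws.append((x_k, y_k))
        logw[k] = (norm.logpdf(y_k, mu, np.sqrt(sig2e)).sum()       # (d) log f1
                   + norm.logpdf(yhat_i, y_k, np.sqrt(v2_i)).sum()  #   + log g2
                   - norm.logpdf(y_k, mean_h, np.sqrt(v2_i)).sum()) #   - log h
    logw -= logw.max()
    w = np.exp(logw)
    return a_draws, draws, w / w.sum()
```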