8. Multiple Imputation: Part 2

1 Introduction
• $y_i$ ($i = 1, \cdots, n$): random sample from a joint density $f(y; \theta)$.
• The first $r$ elements are observed and the remaining $n - r$ elements are missing.
• Assume MAR (Missing At Random).
• The parameter of interest is $\eta = E\{g(Y)\} = \int g(y) f(y; \theta)\, dy$. Thus, $\eta$ is a function of $\theta$. For example, $g(Y) = I(Y < 1)$ leads to $\eta = P(Y < 1)$.
• The MI estimator is justified when $\hat\eta_n$ (the complete-sample estimator of $\eta$) is the MLE of $\eta$, that is, when $\hat\eta_n = \eta(\hat\theta_{MLE,n})$.
• What if $\hat\eta_n$ is an MME (Method of Moments estimator) of $\eta$? The MME of $\eta$ is
$$\hat\eta_{MME,n} = \frac{1}{n}\sum_{i=1}^{n} g(y_i).$$
• MI estimator of $\eta$:
$$\hat\eta_{MI} = \frac{1}{M}\sum_{k=1}^{M} \hat\eta_{MME,I}^{(k)},$$
where
$$\hat\eta_{MME,I}^{(k)} = \frac{1}{n}\left\{\sum_{i=1}^{r} g(y_i) + \sum_{i=r+1}^{n} g\big(y_i^{*(k)}\big)\right\},$$
$y_i^{*(k)}$ are generated from $f(y; \theta^{*(k)})$, and $\theta^{*(k)}$ are generated from $p(\theta \mid y_{obs})$.
• Under some regularity conditions, $p(\theta \mid y_{obs})$ converges in distribution to $N(\hat\theta, I_{obs}^{-1})$, where $\hat\theta$ is the MLE of $\theta$ and $I_{obs}$ is the information matrix obtained from the observed likelihood of $\theta$.
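The following is a minimal sketch (not from the notes) of these two steps for the example $g(Y) = I(Y < 1)$, assuming $Y \sim N(\theta, 1)$ with $\sigma^2 = 1$ known and an ignorable response mechanism, so that $\hat\theta = \bar y_r$ and $I_{obs}^{-1} = 1/r$; the function name `mi_estimate` is illustrative.

```python
# Sketch: draw theta* from the approximate posterior N(theta_hat, 1/r), impute
# y_i* ~ N(theta*, 1) for the n - r missing units, and average the M completed-data
# MME estimates of eta = P(Y < 1).  (Illustrative; assumes sigma^2 = 1 known.)
import numpy as np

def mi_estimate(y_obs, n, M, rng):
    r = len(y_obs)
    theta_hat = y_obs.mean()                     # MLE from the r respondents
    g_obs = (y_obs < 1.0).sum()                  # sum of g(y_i) over respondents
    eta_k = np.empty(M)
    for k in range(M):
        theta_star = rng.normal(theta_hat, np.sqrt(1.0 / r))  # theta* ~ p(theta | y_obs)
        y_star = rng.normal(theta_star, 1.0, size=n - r)       # y_i* ~ f(y; theta*)
        eta_k[k] = (g_obs + (y_star < 1.0).sum()) / n          # k-th completed-data MME
    return eta_k.mean()                                         # MI estimator of eta

rng = np.random.default_rng(1)
y = rng.normal(0.3, 1.0, size=200)   # full sample; suppose only the first 120 respond
print(mi_estimate(y[:120], n=200, M=50, rng=rng))
```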
• MI variance estimator:
$$\hat V_{MI} = W_M + \left(1 + \frac{1}{M}\right) B_M \qquad (1)$$
where
$$W_M = \frac{1}{M}\sum_{k=1}^{M} \hat V_{n,I}^{(k)}, \qquad B_M = \frac{1}{M-1}\sum_{k=1}^{M}\left(\hat\eta_{MME,I}^{(k)} - \hat\eta_{MI}\right)^2.$$
Here,
$$\hat V_{n,I}^{(k)} = \frac{1}{n(n-1)}\left[\sum_{i=1}^{r}\big\{g(y_i) - \hat\eta_{MME,I}^{(k)}\big\}^2 + \sum_{i=r+1}^{n}\big\{g(y_i^{*(k)}) - \hat\eta_{MME,I}^{(k)}\big\}^2\right]$$
is the variance estimator of $\hat\eta_{MME,n}$ applied to the $k$-th imputed data set.
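A minimal sketch (illustrative names, not from the notes) of how (1) would be computed once the $M$ completed data sets are available; `g_completed[k]` is assumed to hold the values $g(y_1), \cdots, g(y_r), g(y_{r+1}^{*(k)}), \cdots, g(y_n^{*(k)})$ for the $k$-th imputation.

```python
# Rubin's combining rule (1) applied to M completed data sets of g-values.
import numpy as np

def naive_variance(g):
    """V_hat_{n,I}^{(k)}: complete-sample variance estimator of the mean of g."""
    n = len(g)
    return ((g - g.mean()) ** 2).sum() / (n * (n - 1))

def mi_variance(g_completed):
    """g_completed: list of M arrays, each of length n (observed + imputed g-values)."""
    M = len(g_completed)
    eta_k = np.array([g.mean() for g in g_completed])         # eta_hat_{MME,I}^{(k)}
    W_M = np.mean([naive_variance(g) for g in g_completed])   # within-imputation variance
    B_M = eta_k.var(ddof=1)                                   # between-imputation variance
    return eta_k.mean(), W_M + (1 + 1 / M) * B_M              # (eta_hat_MI, V_hat_MI)
```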
• Is the MI variance estimator in (1) approximately unbiased for $V(\hat\eta_{MI})$?
• The answer is “Yes” if the variance satisfies
$$V(\hat\eta_{MI}) = V(\hat\eta_{MME,n}) + V(\hat\eta_{MI} - \hat\eta_{MME,n}). \qquad (2)$$
It can be shown that
$$E\{W_M\} \doteq V(\hat\eta_{MME,n}) \qquad (3)$$
and
$$E\left\{(1 + M^{-1}) B_M\right\} \doteq V(\hat\eta_{MI} - \hat\eta_{MME,n}). \qquad (4)$$
Kim et al. (2006) proved (3) and (4) for more general cases.
• Meng (1994) called condition (2) the congeniality condition.
• What is missing in (2) is the covariance term. That is, the true variance is
$$V(\hat\eta_{MI}) = V(\hat\eta_{MME,n}) + V(\hat\eta_{MI} - \hat\eta_{MME,n}) + 2\operatorname{Cov}(\hat\eta_{MME,n},\, \hat\eta_{MI} - \hat\eta_{MME,n}).$$
• We will show in Section 2 that
$$2\operatorname{Cov}(\hat\eta_{MME,n},\, \hat\eta_{MI} - \hat\eta_{MME,n}) \le 0, \qquad (5)$$
which implies that the MI variance estimator overestimates the true variance.
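A small Monte Carlo sketch (not part of the notes) can be used to check this numerically; it assumes $Y \sim N(\theta, 1)$ with $\sigma^2 = 1$ known, the first $r$ units responding, and $g(Y) = I(Y < 1)$, and compares the Monte Carlo variance of $\hat\eta_{MI}$ with the average of $\hat V_{MI}$.

```python
# Monte Carlo check: E(V_hat_MI) versus Var(eta_hat_MI) for the MME of eta = P(Y < 1).
import numpy as np

def one_replication(rng, n=200, r=120, M=30, theta=0.3):
    y_obs = rng.normal(theta, 1.0, size=r)                    # respondents
    theta_hat = y_obs.mean()
    g_obs = (y_obs < 1.0).astype(float)
    eta_k, v_k = np.empty(M), np.empty(M)
    for k in range(M):
        theta_star = rng.normal(theta_hat, np.sqrt(1.0 / r))
        g_imp = (rng.normal(theta_star, 1.0, size=n - r) < 1.0).astype(float)
        g = np.concatenate([g_obs, g_imp])                    # k-th completed data set
        eta_k[k] = g.mean()
        v_k[k] = ((g - g.mean()) ** 2).sum() / (n * (n - 1))
    return eta_k.mean(), v_k.mean() + (1 + 1 / M) * eta_k.var(ddof=1)

rng = np.random.default_rng(0)
out = np.array([one_replication(rng) for _ in range(2000)])
print("Monte Carlo Var(eta_hat_MI):", out[:, 0].var(ddof=1))
print("Average of V_hat_MI:        ", out[:, 1].mean())   # typically the larger of the two
```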
2 Congeniality
• MI estimator of $\eta$:
$$\hat\eta_{MI} = \frac{1}{n}\sum_{i=1}^{r} g(y_i) + \frac{1}{n}\,\frac{1}{M}\sum_{i=r+1}^{n}\sum_{j=1}^{M} g\big(y_i^{*(j)}\big),$$
where $y_i^{*(j)}$ are generated from $f(y; \theta^{*(j)})$ and $\theta^{*(j)}$ are generated from $p(\theta \mid y_{obs})$.
• If we define
$$\hat\eta_{MLE,r} = E\{g(Y); \hat\theta\}, \qquad \hat\eta_{MME,r} = \frac{1}{r}\sum_{i=1}^{r} g(y_i),$$
then
$$\operatorname*{plim}_{M \to \infty}\, \frac{1}{M}\sum_{j=1}^{M} g\big(y_i^{*(j)}\big) = E\left[E\{g(Y); \theta^{*}\} \mid y_{obs}\right] = E\{g(Y); \hat\theta\}$$
and the MI estimator of $\eta$ (for $M \to \infty$) can be written
$$\hat\eta_{MI} = \frac{r}{n}\,\hat\eta_{MME,r} + \frac{n-r}{n}\,\hat\eta_{MLE,r}.$$
Thus, $\hat\eta_{MI}$ is a convex combination of $\hat\eta_{MME,r}$ and $\hat\eta_{MLE,r}$. Note that
$$V(\hat\eta_{MLE,r}) = \left(\frac{\partial \eta}{\partial \theta}\right)' I_{obs}^{-1}\, \frac{\partial \eta}{\partial \theta}, \qquad V(\hat\eta_{MME,r}) = \frac{1}{r}\, V\{g(Y)\}.$$
In general, we have $V(\hat\eta_{MME,r}) \ge V(\hat\eta_{MLE,r})$.
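As a concrete illustration (not in the original notes), take $g(Y) = I(Y < 1)$ and $Y \sim N(\theta, 1)$ with the variance known, so that $\eta = \Phi(1 - \theta)$ and $\hat\theta = \bar y_r$. Then, by the delta method,
$$V(\hat\eta_{MLE,r}) \doteq \frac{\{\phi(1 - \theta)\}^2}{r}, \qquad V(\hat\eta_{MME,r}) = \frac{\Phi(1 - \theta)\{1 - \Phi(1 - \theta)\}}{r},$$
and $\{\phi(z)\}^2 \le \Phi(z)\{1 - \Phi(z)\}$ for every $z$, so the MLE-based estimator is at least as efficient, in line with the general inequality above.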
• Writing
$$\hat\eta_{MME,n} = p\,\hat\eta_{MME,r} + (1-p)\,\hat\eta_{MME,n-r},$$
where $p = r/n$, we can express
$$\begin{aligned}
\operatorname{Cov}(\hat\eta_{MME,n},\, \hat\eta_{MI} - \hat\eta_{MME,n})
&= \operatorname{Cov}\{\hat\eta_{MME,n},\, (1-p)(\hat\eta_{MLE,r} - \hat\eta_{MME,n-r})\}\\
&= (1-p)\operatorname{Cov}\{\hat\eta_{MME,n},\, \hat\eta_{MLE,r}\} - (1-p)\operatorname{Cov}\{\hat\eta_{MME,n},\, \hat\eta_{MME,n-r}\}\\
&= p(1-p)\operatorname{Cov}\{\hat\eta_{MME,r},\, \hat\eta_{MLE,r}\} - (1-p)^2 V\{\hat\eta_{MME,n-r}\}.
\end{aligned}$$
Using $\operatorname{Cov}\{\hat\eta_{MME,r},\, \hat\eta_{MLE,r}\} = V\{\hat\eta_{MLE,r}\}$ and $(1-p)^2 V\{\hat\eta_{MME,n-r}\} = p(1-p) V\{\hat\eta_{MME,r}\}$, we have
$$\operatorname{Cov}(\hat\eta_{MME,n},\, \hat\eta_{MI} - \hat\eta_{MME,n}) = p(1-p)\left\{V(\hat\eta_{MLE,r}) - V(\hat\eta_{MME,r})\right\},$$
which proves (5).
3 Application to survey sampling
• Finite population of N units identified by U = {1, · · · , N }. Let (xi , yi ) be the
measurement for unit i in the population.
• Let Ii = 1 if unit i is selected in the sample and Ii = 0 otherwise. Let A =
{i; Ii = 1} be the index set of the sample.
• Assume that x is always observed and y is subject to missingness. Let δi = 1 if
yi is observed and δi = 0 otherwise.
• Suppose that we are interested in estimating $Y = \sum_{i=1}^{N} y_i$. Let $\hat Y_n = \sum_{i \in A} w_i y_i$ be the full sample estimator of $Y$ under complete response.
• MI estimator of $Y$:
$$\hat Y_{MI} = \frac{1}{M}\sum_{k=1}^{M} \hat Y_I^{(k)} \qquad (6)$$
where
$$\hat Y_I^{(k)} = \sum_{i \in A} w_i\left\{\delta_i y_i + (1 - \delta_i)\, y_i^{*(k)}\right\}.$$
• How to obtain $y_i^{*(k)}$ in multiple imputation?
  1. Assume SMAR (Sample Missing At Random):
     $$f(y \mid x, I = 1, \delta = 1) = f(y \mid x, I = 1, \delta = 0) \qquad (7)$$
  2. Estimate $f(y \mid x, I = 1, \delta = 1)$.
  3. Generate $y_i^{*} \sim \hat f(y \mid x_i, I_i = 1, \delta_i = 1)$.
• How to achieve SMAR in (7)?
• Lemma: Assume that
  1. the sampling mechanism is completely determined by $x$, that is,
     $$P(I = 1 \mid x, y) = P(I = 1 \mid x) \qquad (8)$$
     for any $y$, and
  2. PMAR holds (i.e., $P(\delta = 1 \mid x, y) = P(\delta = 1 \mid x)$).
  Then (7) holds.
  Proof. Using Bayes' formula,
  $$f(y \mid x, I = 1, \delta = 0) = \frac{P(I = 1 \mid x, y, \delta = 0)}{P(I = 1 \mid x, \delta = 0)}\, f(y \mid x, \delta = 0).$$
  By (8), we have $f(y \mid x, I = 1, \delta = 0) = f(y \mid x, \delta = 0)$. Similarly, we can establish that $f(y \mid x, I = 1, \delta = 1) = f(y \mid x, \delta = 1)$. By PMAR, we have $f(y \mid x, \delta = 0) = f(y \mid x, \delta = 1)$, which proves (7).
• In stratified sampling, for example, we should include the stratum indicator functions in the model. (Augment $x$ to include all the design variables.)
• MI variance estimation: Use (1) with
$$W_M = \frac{1}{M}\sum_{k=1}^{M} \hat V_{n,I}^{(k)}, \qquad B_M = \frac{1}{M-1}\sum_{k=1}^{M}\left(\hat Y_I^{(k)} - \hat Y_{MI}\right)^2,$$
where $\hat V_{n,I}^{(k)}$ is the naive variance estimator of $\hat Y_I^{(k)}$ applied to the $k$-th multiply imputed data set.
• In computing $W_M$, Rubin (1987) suggests using a design-based variance estimator for $\hat V_{n,I}^{(k)}$.
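A minimal sketch (not from the notes) of (1) in this setting, assuming simple random sampling without replacement so that $w_i = N/n$ and the design-based (naive) variance estimator of a total is $N^2(1 - n/N)s^2/n$ applied to each completed data set; all names are illustrative.

```python
# MI point and variance estimator for a population total under SRS (illustrative).
import numpy as np

def mi_total(y_completed, N):
    """y_completed: (M, n) array whose k-th row is the k-th completed data set."""
    M, n = y_completed.shape
    w = N / n                                          # SRS weights w_i = N/n
    Y_k = w * y_completed.sum(axis=1)                  # completed-data totals Y_hat_I^{(k)}
    Y_MI = Y_k.mean()                                  # MI estimator (6)
    V_k = N ** 2 * (1 - n / N) * y_completed.var(axis=1, ddof=1) / n  # design-based V_hat
    V_MI = V_k.mean() + (1 + 1 / M) * Y_k.var(ddof=1)  # Rubin's formula (1)
    return Y_MI, V_MI
```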
• Let’s consider multiple imputation under the linear regression model
$$y_i = x_i'\beta + e_i, \qquad e_i \sim N(0, \sigma^2).$$
That is, the regression model holds for the sample. Also, assume SMAR.
• To implement multiple imputation, we assume that the first $r$ units are the respondents. Let $y_r = (y_1, y_2, \cdots, y_r)'$ and $X_r' = (x_1, x_2, \cdots, x_r)$ with $x_i = (1, x_i)'$. Also, let $y_{n-r} = (y_{r+1}, y_{r+2}, \cdots, y_n)'$ and $X_{n-r}' = (x_{r+1}, x_{r+2}, \cdots, x_n)$. The Bayesian regression imputation procedure proceeds as follows:
  [Posterior Step] Draw
  $$\sigma^{*2} \mid y_r \sim (r - p)\,\hat\sigma_r^2 / \chi^2_{r-p} \qquad (9)$$
  and
  $$\beta^* \mid (y_r, \sigma^{*}) \sim N\left(\hat\beta_r,\ (X_r' X_r)^{-1}\sigma^{*2}\right), \qquad (10)$$
  where $\hat\sigma_r^2 = (r - p)^{-1} y_r'\left\{I - X_r(X_r' X_r)^{-1} X_r'\right\} y_r$ and $\hat\beta_r = (X_r' X_r)^{-1} X_r' y_r$.
  [Imputation Step] For each missing unit $j = r+1, \cdots, n$, independently draw
  $$e_j^* \mid (\beta^*, \sigma^*) \sim N\left(0, \sigma^{*2}\right).$$
  Then $y_j^* = x_j'\beta^* + e_j^*$ is the imputed value associated with unit $j$.
• Kim (2004) suggested using $\chi^2_{r-p+2}$ in (9) to reduce the small-sample bias (to make $\sigma^{*2}$ unbiased).
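A minimal sketch (illustrative, not from the notes) of the Posterior Step (9)–(10) and the Imputation Step, assuming the respondents' design matrix `Xr` ($r \times p$), their responses `yr`, and the nonrespondents' design matrix `Xm` are available as arrays; the flag `kim_correction` switches to the $\chi^2_{r-p+2}$ draw of Kim (2004).

```python
# Bayesian regression imputation: posterior step (9)-(10) and imputation step (sketch).
import numpy as np

def bayes_impute(Xr, yr, Xm, rng, kim_correction=False):
    r, p = Xr.shape
    XtX_inv = np.linalg.inv(Xr.T @ Xr)
    beta_hat = XtX_inv @ Xr.T @ yr                      # beta_hat_r
    resid = yr - Xr @ beta_hat
    sigma2_hat = resid @ resid / (r - p)                # sigma_hat_r^2
    df = r - p + 2 if kim_correction else r - p         # Kim (2004) degrees of freedom
    sigma2_star = (r - p) * sigma2_hat / rng.chisquare(df)                # (9)
    beta_star = rng.multivariate_normal(beta_hat, XtX_inv * sigma2_star)  # (10)
    e_star = rng.normal(0.0, np.sqrt(sigma2_star), size=Xm.shape[0])      # e_j*
    return Xm @ beta_star + e_star                      # imputed values y_j*
```

Calling `bayes_impute` $M$ times, with fresh random draws each time, produces the $M$ sets of imputed values $y_j^{*(k)}$.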
• Note that
$$y_i^{*(k)} = x_i'\hat\beta + x_i'(\beta^{*(k)} - \hat\beta) + e_i^{*(k)}$$
and
$$\hat Y_I^{(k)} = \hat Y_{MI,\infty} + \sum_{i \in A_M} w_i x_i'(\beta^{*(k)} - \hat\beta) + \sum_{i \in A_M} w_i e_i^{*(k)}, \qquad (11)$$
where
$$\hat Y_{MI,\infty} = \sum_{i \in A_R} w_i y_i + \sum_{i \in A_M} w_i x_i'\hat\beta,$$
$A_R$ is the set of respondents, and $A_M$ is the set of nonrespondents.
• The three terms in (11) are mutually independent. Thus,
$$V(\hat Y_{MI}) = V(\hat Y_{MI,\infty}) + \frac{1}{M}\left\{\Big(\sum_{i \in A_M} w_i x_i\Big)'(X_r' X_r)^{-1}\Big(\sum_{i \in A_M} w_i x_i\Big) + \sum_{i \in A_M} w_i^2\right\}\sigma^2. \qquad (12)$$
• To discuss variance estimation, we use
$$V(\hat Y_{MI}) = V(\hat Y_n) + V(\hat Y_{MI} - \hat Y_n) + 2\operatorname{Cov}(\hat Y_n,\, \hat Y_{MI} - \hat Y_n).$$
The first component is estimated by the naive variance estimator; that is, we have $E\{W_M\} \doteq V(\hat Y_n)$. For the second part, we have
$$E\{(1 + M^{-1}) B_M\} = V(\hat Y_{MI} - \hat Y_n). \qquad (13)$$
A proof of (13) is given in Appendix A.
• It remains to check whether the third term, the covariance term, is zero. Note that
$$\begin{aligned}
\operatorname{Cov}(\hat Y_n,\, \hat Y_{MI} - \hat Y_n)
&= \operatorname{Cov}\Big\{\sum_{i \in A_R} w_i y_i + \sum_{i \in A_M} w_i y_i,\ \sum_{i \in A_M} w_i x_i'\hat\beta - \sum_{i \in A_M} w_i y_i\Big\}\\
&= \operatorname{Cov}\Big\{\sum_{i \in A_R} w_i y_i,\ \sum_{i \in A_M} w_i x_i'\hat\beta\Big\} - V\Big\{\sum_{i \in A_M} w_i y_i\Big\}\\
&= \sum_{i \in A_R}\sum_{j \in A_M} w_i w_j\, x_i'(X_r' X_r)^{-1} x_j\, \sigma^2 - \sum_{i \in A_M} w_i^2\, \sigma^2,
\end{aligned}$$
where the second equality holds because $\hat\beta$ is computed from the respondents' data only and is therefore uncorrelated with $\{y_i : i \in A_M\}$.
If $w_i = a' x_i$ for some $a$, we have
$$\sum_{i \in A_R}\sum_{j \in A_M} w_i w_j\, x_i'(X_r' X_r)^{-1} x_j = a'(X_r' X_r)(X_r' X_r)^{-1}(X_{n-r}' X_{n-r})\, a = a'(X_{n-r}' X_{n-r})\, a = \sum_{i \in A_M} w_i^2.$$
Thus, the MI variance estimator is approximately unbiased when $w_i$ is included in the column space of $X$.
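A small numerical check (illustrative, not from the notes) of the identity above when $w_i = a'x_i$:

```python
# Verify numerically: double sum over A_R x A_M equals sum of w_i^2 over A_M
# when the weights satisfy w_i = a'x_i.
import numpy as np

rng = np.random.default_rng(2)
r, m, p = 40, 15, 3
Xr = np.column_stack([np.ones(r), rng.normal(size=(r, p - 1))])   # respondents
Xm = np.column_stack([np.ones(m), rng.normal(size=(m, p - 1))])   # nonrespondents
a = rng.normal(size=p)
wr, wm = Xr @ a, Xm @ a                                           # w_i = a'x_i

lhs = (wr @ Xr) @ np.linalg.inv(Xr.T @ Xr) @ (Xm.T @ wm)
print(lhs, (wm ** 2).sum())                                       # the two values agree
```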
References
Kim, J.K. (2004). “Finite sample properties of multiple imputation estimators,” The
Annals of Statistics, 32, 766-783.
Kim, J.K., Brick, M.J., Fuller, W.A., and Kalton, G. (2006). “On the bias of the
multiple imputation variance estimator in survey sampling,” Journal of the Royal
Statistical Society: Series B, 68, 509-521.
Meng, X.L. (1994). “Multiple-imputation inferences with uncongenial sources of input (with discussion),” Statistical Science, 9, 538-573.
Rubin, D. B. (1987). Multiple Imputation for Nonresponse in Surveys. John Wiley &
Sons, New York.
Appendix

A. Proof of (13)

Let $\hat Y_{MI} = M^{-1}\sum_{k=1}^{M} \hat Y_I^{(k)}$ be the MI estimator of $Y$. We first use
$$B_M = \frac{1}{2}\cdot\frac{1}{M(M-1)}\sum_{i}\sum_{j}\big(\hat Y_I^{(i)} - \hat Y_I^{(j)}\big)^2$$
and, since $\hat Y_I^{(1)}, \cdots, \hat Y_I^{(M)}$ are identically distributed,
$$E(B_M) = 0.5\, V\big(\hat Y_I^{(1)} - \hat Y_I^{(2)}\big).$$
Thus, using (11), we have
$$E(B_M) = \left\{\Big(\sum_{i \in A_M} w_i x_i\Big)'(X_r' X_r)^{-1}\Big(\sum_{i \in A_M} w_i x_i\Big) + \sum_{i \in A_M} w_i^2\right\}\sigma^2. \qquad (14)$$
Now, using (12), note that
$$V(\hat Y_{MI} - \hat Y_n) = V(\hat Y_{MI,\infty} - \hat Y_n) + \frac{1}{M}\left\{\Big(\sum_{i \in A_M} w_i x_i\Big)'(X_r' X_r)^{-1}\Big(\sum_{i \in A_M} w_i x_i\Big) + \sum_{i \in A_M} w_i^2\right\}\sigma^2 \qquad (15)$$
and
$$V(\hat Y_{MI,\infty} - \hat Y_n) = V\Big\{\sum_{i \in A_M} w_i x_i'\hat\beta - \sum_{i \in A_M} w_i y_i\Big\} = \left\{\Big(\sum_{i \in A_M} w_i x_i\Big)'(X_r' X_r)^{-1}\Big(\sum_{i \in A_M} w_i x_i\Big) + \sum_{i \in A_M} w_i^2\right\}\sigma^2. \qquad (16)$$
Thus, inserting (16) into (15) and using (14), we prove (13).