A note on multiple imputation for general purpose estimation
Shu Yang
Jae Kwang Kim
SSC meeting
June 16, 2015
Introduction
Basic Setup
Assume simple random sampling, for simplicity.
Under complete response, suppose that

η̂_{n,g} = n^{-1} Σ_{i=1}^n g(y_i)

is an unbiased estimator of η_g = E{g(Y)}, for known g(·).
δi = 1 if yi is observed and δi = 0 otherwise.
y_i^*: imputed value of y_i for unit i with δi = 0.
Imputed estimator of η_g:

η̂_{I,g} = n^{-1} Σ_{i=1}^n {δ_i g(y_i) + (1 − δ_i) g(y_i^*)}
Need E {g (yi∗ ) | δi = 0} = E {g (yi ) | δi = 0}.
Introduction
ML estimation under missing data setup
Often, find x (always observed) such that
Missing at random (MAR) holds: f (y | x, δ = 0) = f (y | x)
Imputed values are created from f (y | x).
Computing the conditional expectation can be a challenging problem.
1. We do not know the true parameter θ in f(y | x) = f(y | x; θ):
   E{g(y) | x} = E{g(y) | x; θ}.
2. Even if we know θ, computing the conditional expectation can be numerically difficult.
Introduction
Imputation
Imputation: Monte Carlo approximation of the conditional expectation (given the observed data):

E{g(y_i) | x_i} ≅ m^{-1} Σ_{j=1}^m g(y_i^{*(j)})

1. Bayesian approach: generate y_i^* from

   f(y_i | x_i, y_obs) = ∫ f(y_i | x_i, θ) p(θ | x_i, y_obs) dθ

2. Frequentist approach: generate y_i^* from f(y_i | x_i; θ̂), where θ̂ is a consistent estimator.
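To make the frequentist recipe concrete, here is a minimal Python sketch, assuming a normal linear imputation model y | x ~ N(β0 + β1 x, σ²); the model and all names are illustrative, not from the talk.

```python
import numpy as np

def frequentist_impute(y, x, delta, M, rng):
    """Draw M imputed values y_i^{*(j)} ~ f(y | x_i; theta_hat) per missing unit."""
    # Fit theta_hat = (beta0, beta1, sigma) by least squares on the respondents.
    X = np.column_stack([np.ones(int(delta.sum())), x[delta == 1]])
    beta, *_ = np.linalg.lstsq(X, y[delta == 1], rcond=None)
    sigma = np.std(y[delta == 1] - X @ beta, ddof=2)
    # Impute for nonrespondents from the fitted conditional distribution.
    mean = beta[0] + beta[1] * x[delta == 0]
    return mean[:, None] + sigma * rng.standard_normal((mean.size, M))
```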
Introduction
Basic Setup (Cont’d)
Thus, imputation is a computational tool for computing the conditional expectation E{g(y_i) | x_i} for each missing unit i.
To compute the conditional expectation, we need to specify a model f(y | x; θ) evaluated at θ = θ̂.
Thus, we can write η̂_{I,g} = η̂_{I,g}(θ̂).
To estimate the variance of η̂_{I,g}, we need to account for the sampling variability of θ̂ in η̂_{I,g} = η̂_{I,g}(θ̂).
Introduction
Basic Setup (Cont’d)
Three approaches
Bayesian approach: multiple imputation by Rubin (1978, 1987),
Rubin and Schenker (1986), etc.
Resampling approach: Rao and Shao (1992), Efron (1994), Rao and
Sitter (1995), Shao and Sitter (1996), Kim and Fuller (2004), Fuller
and Kim (2005).
Linearization approach: Clayton et al. (1998), Shao and Steel (1999),
Robins and Wang (2000), Kim and Rao (2009).
Comparison

                      Bayesian                    Frequentist
Model                 Posterior distribution      Prediction model
                      f(latent, θ | data)         f(latent | data, θ)
Computation           Data augmentation           EM algorithm
Prediction            I-step                      E-step
Parameter update      P-step                      M-step
Parameter est'n       Posterior mode              ML estimation
Imputation            Multiple imputation         Fractional imputation
Variance estimation   Rubin's formula             Linearization or Bootstrap
Multiple Imputation
Step 1. (Imputation) Create M complete datasets by filling in missing
values with imputed values generated from the posterior predictive
distribution.
To create the jth imputed dataset, first generate θ^{*(j)} from the posterior distribution p(θ | X_n, y_obs), and then generate y_i^{*(j)} from the imputation model f(y | x_i; θ^{*(j)}) for each missing y_i.
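As a sketch of Step 1 under the same illustrative normal linear model (a noninformative prior is assumed here; none of these choices is prescribed by the talk), one posterior draw and one imputed dataset can be generated as follows.

```python
import numpy as np

def bayesian_impute_once(y, x, delta, rng):
    """One MI replicate: draw theta* from p(theta | data), then y* from f(y | x; theta*)."""
    X = np.column_stack([np.ones(int(delta.sum())), x[delta == 1]])
    yobs = y[delta == 1]
    beta_hat, *_ = np.linalg.lstsq(X, yobs, rcond=None)
    resid = yobs - X @ beta_hat
    nu = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / rng.chisquare(nu)        # sigma^2 | data (scaled inv-chi-square)
    V = sigma2 * np.linalg.inv(X.T @ X)
    beta = rng.multivariate_normal(beta_hat, V)       # beta | sigma^2, data
    x_mis = x[delta == 0]
    return beta[0] + beta[1] * x_mis + np.sqrt(sigma2) * rng.standard_normal(x_mis.size)
```

Repeating this M times gives the M imputed datasets used in Steps 2 and 3.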
Step 2. (Analysis) Apply the user’s complete-sample estimation
procedure to each imputed dataset.
Let η̂^{(j)} be the complete-sample estimator of η = E{g(Y)} applied to the jth imputed dataset and V̂^{(j)} be the complete-sample variance estimator of η̂^{(j)}.
Multiple Imputation
Step 3. (Summarize) Use Rubin’s combining rule to summarize the
results from the multiply imputed datasets.
The multiple imputation estimator of η, denoted by η̂_MI, is

η̂_MI = M^{-1} Σ_{j=1}^M η̂^{(j)}

Rubin's variance estimator is

V̂_MI(η̂_MI) = W_M + (1 + 1/M) B_M,

where W_M = M^{-1} Σ_{j=1}^M V̂^{(j)} and B_M = (M − 1)^{-1} Σ_{j=1}^M (η̂^{(j)} − η̂_MI)².
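The combining rule above is easy to implement; a small sketch, where est and var_est hold the M complete-sample point and variance estimates (illustrative names):

```python
import numpy as np

def rubin_combine(est, var_est):
    """Return (eta_MI, V_MI) from M complete-sample analyses."""
    est, var_est = np.asarray(est, float), np.asarray(var_est, float)
    M = est.size
    eta_MI = est.mean()                     # MI point estimate
    W_M = var_est.mean()                    # within-imputation variance
    B_M = est.var(ddof=1)                   # between-imputation variance
    return eta_MI, W_M + (1 + 1 / M) * B_M  # Rubin's variance estimator
```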
Multiple imputation
Motivating Example
Suppose that you are interested in estimating η = P(Y ≤ 3).
Assume a normal model for f (y | x; θ) for multiple imputation.
Two choices for η̂_n:
1. Method-of-moments (MOM) estimator: η̂_{n1} = n^{-1} Σ_{i=1}^n I(y_i ≤ 3).
2. Maximum-likelihood estimator:

   η̂_{n2} = n^{-1} Σ_{i=1}^n P(Y ≤ 3 | x_i; θ̂),

   where P(Y ≤ 3 | x_i; θ̂) = ∫_{−∞}^3 f(y | x_i; θ̂) dy.
Rubin’s variance estimator is nearly unbiased for η̂n2 , but provide
conservative variance estimation for η̂n1 (30-50% overestimation of
the variances in most cases).
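For intuition, the two complete-sample estimators in this example take only a few lines; mu_hat and sigma_hat are assumed to come from a fitted normal model y | x ~ N(μ(x), σ²) (an assumption of this sketch):

```python
import numpy as np
from scipy.stats import norm

def eta_mom(y):
    # Method-of-moments: empirical proportion of y_i <= 3.
    return np.mean(y <= 3)

def eta_mle(mu_hat, sigma_hat):
    # MLE plug-in: average of P(Y <= 3 | x_i; theta_hat) over the sample.
    return np.mean(norm.cdf(3.0, loc=mu_hat, scale=sigma_hat))
```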
Introduction
The goal is to
1. understand why MI provides biased variance estimation for MOM estimation;
2. characterize the asymptotic bias of the MI variance estimator when the MOM estimator is used in the complete-sample analysis;
3. give an alternative variance estimator that can provide asymptotically valid inference for MOM estimation.
Multiple imputation
Rubin’s variance estimator is based on the following decomposition,
var(η̂MI ) = var(η̂n ) + var(η̂MI ,∞ − η̂n ) + var(η̂MI − η̂MI ,∞ ),
(1)
where η̂n is the complete-sample estimator of η and η̂MI ,∞ is the
probability limit of η̂MI for M → ∞.
Under some regularity conditions, WM term estimates the first term,
the BM term estimates the second term, and the M −1 BM term
estimates the last term of (1), respectively.
Multiple imputation
In particular, Kim et al. (2006, JRSSB) proved that the bias of Rubin's variance estimator is

Bias(V̂_MI) ≅ −2 cov(η̂_MI − η̂_n, η̂_n).   (2)

The decomposition (1) is equivalent to assuming that cov(η̂_MI − η̂_n, η̂_n) ≅ 0, which is called the congeniality condition by Meng (1994).
The congeniality condition holds when η̂_n is the MLE of η. In such cases, Rubin's variance estimator is asymptotically unbiased.
If a method-of-moments (MOM) estimator is used to estimate η = E{g(Y)}, Rubin's variance estimator can be asymptotically biased.
MI for MOM estimator: No x variable
Assume that y is observed for the first r elements.
MI for MOM estimator of η = E {g (Y )}:
η̂_MI = n^{-1} Σ_{i=1}^r g(y_i) + (nM)^{-1} Σ_{i=r+1}^n Σ_{j=1}^M g(y_i^{*(j)}),

where the y_i^{*(j)} are generated from f(y | θ^{*(j)}) and the θ^{*(j)} are generated from p(θ | y_obs).
MI for MOM estimator: No x variable (Cont’d)
Since, conditional on the observed data,

plim_{M→∞} M^{-1} Σ_{j=1}^M g(y_i^{*(j)}) = E[E{g(Y); θ*} | y_obs] ≅ E{g(Y); θ̂},

the MI estimator of η (for M → ∞) can be written as

η̂_MI = (r/n) η̂_{MME,r} + ((n − r)/n) η̂_{MLE,r}.

Thus, η̂_MI is a convex combination of η̂_{MME,r} and η̂_{MLE,r}, where η̂_{MLE,r} = E{g(Y); θ̂} and η̂_{MME,r} = r^{-1} Σ_{i=1}^r g(y_i).
MI for MOM estimator: No x variable (Cont’d)
Since

Bias(V̂_MI) ≅ −2 Cov(η̂_{MME,n}, η̂_MI − η̂_{MME,n}),

we have only to evaluate the covariance term.
Writing

η̂_{MME,n} = p · η̂_{MME,r} + (1 − p) · η̂_{MME,n−r},

where p = r/n, we can obtain

Cov(η̂_{MME,n}, η̂_MI − η̂_{MME,n})
= Cov{η̂_{MME,n}, (1 − p)(η̂_{MLE,r} − η̂_{MME,n−r})}
= p(1 − p) Cov{η̂_{MME,r}, η̂_{MLE,r}} − (1 − p)² V{η̂_{MME,n−r}}
= p(1 − p) {V(η̂_{MLE,r}) − V(η̂_{MME,r})},

using Cov(η̂_{MME,r}, η̂_{MLE,r}) = V(η̂_{MLE,r}) and (1 − p)² V(η̂_{MME,n−r}) = p(1 − p) V(η̂_{MME,r}). Since V(η̂_{MLE,r}) ≤ V(η̂_{MME,r}), this proves Bias(V̂_MI) ≥ 0.
Bias of Rubin’s variance estimator
Theorem
Let η̂_n = n^{-1} Σ_{i=1}^n g(y_i) be the method-of-moments estimator of η = E{g(Y)} under complete response. Assume that E(V̂^{(j)}) ≅ var(η̂_n) holds for j = 1, …, M. Then for M → ∞, the bias of Rubin's variance estimator is

Bias(V̂_MI) ≅ 2n^{-1}(1 − p) ( E[var{g(Y) | X} | δ = 0] − ṁ_{θ,0}^T I_θ^{-1} ṁ_{θ,1} ),

where p = r/n, I_θ = −E{∂² log f(Y | X; θ)/∂θ∂θ^T}, m(x; θ) = E{g(Y) | x; θ}, ṁ_θ(x) = ∂m(x; θ)/∂θ, ṁ_{θ,0} = E{ṁ_θ(X) | δ = 0}, and ṁ_{θ,1} = E{ṁ_θ(X) | δ = 1}.
Bias of Rubin’s variance estimator
Under MCAR, the bias simplifies to

Bias(V̂_MI) ≅ 2p(1 − p){var(η̂_{r,MME}) − var(η̂_{r,MLE})},

where η̂_{r,MME} = r^{-1} Σ_{i=1}^r g(y_i) and η̂_{r,MLE} = r^{-1} Σ_{i=1}^r E{g(Y) | x_i; θ̂}.
This shows that Rubin's variance estimator is unbiased if and only if the method-of-moments estimator is as efficient as the maximum likelihood estimator, that is, var(η̂_{r,MME}) ≅ var(η̂_{r,MLE}).
Otherwise, Rubin's variance estimator is positively biased.
Rubin’s variance estimator can be negatively biased
Now consider a simple linear regression model which contains one
covariate X and no intercept.
By Theorem 1,

Bias(V̂_MI) ≅ (2(1 − p)σ²/n) {1 − E(X | δ = 0) E(X | δ = 1) / E(X² | δ = 1)},

which can be zero, positive, or negative.
If

E(X | δ = 0) > E(X² | δ = 1) / E(X | δ = 1),

Rubin's variance estimator can be negatively biased.
Alternative variance estimation
Decompose

var(η̂_{MI,∞}) = n^{-1} V_1 + r^{-1} V_2,

where

V_1 = var{g(Y)} − (1 − p) E[var{g(Y) | X} | δ = 0]

and

V_2 = ṁ_θ^T I_θ^{-1} ṁ_θ − p² ṁ_{θ,1}^T I_θ^{-1} ṁ_{θ,1}.

The first term, n^{-1} V_1, is the variance of the sample mean of δ_i g(y_i) + (1 − δ_i) m(x_i; θ).
The second term, r^{-1} V_2, reflects the variability associated with using the estimated value θ̂ instead of the true value θ in the imputed values.
Alternative variance estimation
Estimate n^{-1} V_1
The first term n^{-1} V_1 can be estimated by W̃_M = W_M − C_M, where W_M = M^{-1} Σ_{j=1}^M V̂^{(j)} and

C_M = n^{-2}(M − 1)^{-1} Σ_{k=1}^M Σ_{i=r+1}^n { g(y_i^{*(k)}) − M^{-1} Σ_{l=1}^M g(y_i^{*(l)}) }²,

since E(W_M) ≅ n^{-1} var{g(Y)} and E(C_M) ≅ n^{-1}(1 − p) E[var{g(Y) | X} | δ = 0].
Alternative variance estimation
Estimate r^{-1} V_2
To estimate the second term, we use over-imputation:
Table: Over-Imputation Data Structure

Row        1               ...   M               average
1          g_1^{*(1)}      ...   g_1^{*(M)}      ḡ_1^*
⋮          ⋮                     ⋮               ⋮
r          g_r^{*(1)}      ...   g_r^{*(M)}      ḡ_r^*
r+1        g_{r+1}^{*(1)}  ...   g_{r+1}^{*(M)}  ḡ_{r+1}^*
⋮          ⋮                     ⋮               ⋮
n          g_n^{*(1)}      ...   g_n^{*(M)}      ḡ_n^*
average    η̄_n^{*(1)}      ...   η̄_n^{*(M)}      η̄_n^*

(Over-imputation: imputed values y_i^{*(k)} are generated for all units, including the respondents i = 1, …, r.)
Alternative variance estimation
Estimate r^{-1} V_2
The second term r^{-1} V_2 can be estimated by D_M = D_{M,n} − D_{M,r}, where

D_{M,n} = (M − 1)^{-1} Σ_{k=1}^M (n^{-1} Σ_{i=1}^n d_i^{*(k)})² − (M − 1)^{-1} n^{-2} Σ_{k=1}^M Σ_{i=1}^n (d_i^{*(k)})²,

D_{M,r} = (M − 1)^{-1} Σ_{k=1}^M (n^{-1} Σ_{i=1}^r d_i^{*(k)})² − (M − 1)^{-1} n^{-2} Σ_{k=1}^M Σ_{i=1}^r (d_i^{*(k)})²,

with d_i^{*(k)} = g(y_i^{*(k)}) − M^{-1} Σ_{l=1}^M g(y_i^{*(l)}), since

E(D_{M,n}) ≅ r^{-1} ṁ_θ^T I_θ^{-1} ṁ_θ and E(D_{M,r}) ≅ r^{-1} p² ṁ_{θ,1}^T I_θ^{-1} ṁ_{θ,1}.
Alternative variance estimation
Theorem
Under the assumptions of Theorem 1, the new multiple imputation variance estimator is

V̂_MI = W̃_M + D_M + M^{-1} B_M,

where W̃_M = W_M − C_M and B_M is the usual between-imputation variance. This V̂_MI is asymptotically unbiased for estimating the variance of the multiple imputation estimator as n → ∞.
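A sketch of the whole estimator under our reading of the formulas above, assuming g_star is the n × M over-imputation array of g(y_i^{*(k)}) values, delta marks the respondents, and est, var_est are as in Rubin's rule; names are ours:

```python
import numpy as np

def new_mi_variance(est, var_est, g_star, delta):
    """V_hat = (W_M - C_M) + D_M + B_M / M, following the slides' formulas."""
    est, var_est = np.asarray(est, float), np.asarray(var_est, float)
    n, M = g_star.shape
    W_M, B_M = var_est.mean(), est.var(ddof=1)
    d = g_star - g_star.mean(axis=1, keepdims=True)       # d_i^{*(k)}
    # C_M: Monte Carlo spread of the imputed values for nonrespondents.
    C_M = (d[delta == 0] ** 2).sum() / (n ** 2 * (M - 1))
    def D_part(mask):                                     # D_{M,n} (all) or D_{M,r} (respondents)
        dm = d[mask]
        return ((dm.sum(axis=0) ** 2).sum() - (dm ** 2).sum()) / (n ** 2 * (M - 1))
    D_M = D_part(np.ones(n, bool)) - D_part(delta == 1)
    return (W_M - C_M) + D_M + B_M / M
```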
Simulation study
Set up 1
Samples of size n = 2,000 are independently generated from
Yi = βXi + ei ,
where β = 0.1, Xi ∼ exp(1) and ei ∼ N(0, σe2 ) with σe2 = 0.5.
Let δi be the response indicator of yi and δi ∼ Bernoulli(pi ), where
pi = 1/{1 + exp(−φ0 − φ1 xi )}.
We consider two scenarios:
(i) (φ0, φ1) = (−1.5, 2) and (ii) (φ0, φ1) = (3, −3).
The parameters of interest are η1 = E (Y ) and η2 = pr(Y < 0.15).
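For reference, a sketch of this data-generating process (scenario (i)):

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta, sigma_e = 2000, 0.1, np.sqrt(0.5)
x = rng.exponential(1.0, n)                      # X_i ~ exp(1)
y = beta * x + sigma_e * rng.standard_normal(n)  # Y_i = beta X_i + e_i
phi0, phi1 = -1.5, 2.0                           # scenario (i)
p = 1 / (1 + np.exp(-phi0 - phi1 * x))           # response probabilities
delta = rng.binomial(1, p).astype(bool)          # response indicators
```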
Simulation study
Result 1
Table: Relative biases of two variance estimators and mean width and coverages of two interval estimates under two scenarios in simulation one

                 Relative bias (%)   Mean width for 95% C.I.   Coverage for 95% C.I.
Scenario         Rubin     New       Rubin     New             Rubin   New
1        η1      96.8      0.7       0.038     0.027           0.99    0.95
         η2      123.7     2.9       0.027     0.018           1.00    0.95
2        η1      -19.8     0.4       0.061     0.069           0.91    0.95
         η2      -9.6      -0.4      0.037     0.039           0.93    0.95

C.I., confidence interval; η1 = E(Y); η2 = pr(Y < 0.15).
Simulation study
Result 1
For η1 = E (Y ), under scenario (i), the relative bias of Rubin’s
variance estimator is 96.8%, which is consistent with our result with
E1 (X 2 ) − E0 (X )E1 (X ) > 0, where E1 (X 2 ) = 3.38, E1 (X ) = 1.45, and
E0 (X ) = 0.48.
Under scenario (ii), the relative bias of Rubin’s variance estimator is
−19.8%, which is consistent with our result with
E1 (X 2 ) − E0 (X )E1 (X ) < 0, where E1 (X 2 ) = 0.37, E1 (X ) = 0.47, and
E0 (X ) = 1.73.
The empirical coverage for Rubin’s method can be over or below the
nominal coverage due to variance overestimation or underestimation.
On the other hand, the new variance estimator is essentially unbiased
for these scenarios.
Simulation study
Set up 2
Samples of size n = 200 are independently generated from
Yi = β0 + β1 Xi + ei ,
where β = (β0 , β1 ) = (3, −1), Xi ∼ N(2, 1) and ei ∼ N(0, σe2 ) with
σe2 = 1.
The parameters of interest are η1 = E (Y ) and η2 = pr(Y < 1).
We consider two different factors:
1. The response mechanism, MCAR or MAR:
   For MCAR, δi ∼ Bernoulli(0.6).
   For MAR, δi ∼ Bernoulli(pi), where pi = 1/{1 + exp(−0.28 − 0.1xi)}, with an average response rate of about 0.6.
2. The size of multiple imputation, with two levels M = 10 and M = 30.
Simulation study
Result 2
Table: Relative biases of two variance estimators and mean width and coverages of two interval estimates under two scenarios of missingness in simulation two

                     Relative bias (%)   Mean width for 95% C.I.   Coverage for 95% C.I.
         M           Rubin     New       Rubin     New             Rubin   New
Missing completely at random
η1       10          -0.9      -1.58     0.240     0.250           0.95    0.95
         30          -0.6      -1.68     0.230     0.235           0.95    0.95
η2       10          22.7      -1.14     0.083     0.083           0.98    0.95
         30          23.8      -1.23     0.082     0.075           0.98    0.95
Missing at random
η1       10          -1.0      -1.48     0.230     0.250           0.95    0.95
         30          -0.9      -1.59     0.231     0.230           0.95    0.95
η2       10          20.7      -1.64     0.081     0.081           0.98    0.95
         30          21.5      -1.74     0.074     0.071           0.98    0.95

C.I., confidence interval; η1 = E(Y); η2 = pr(Y < 1).
Simulation study
Result 2
Both Rubin’s variance estimator and our new variance estimator are
unbiased for η1 = E (Y ).
Rubin’s variance estimator is biased upward for η2 = pr(Y < 1), with
absolute relative bias as high as 24%; whereas our new variance
estimator reduces absolute relative bias to less than 1.74%.
For Rubin’s method, the empirical coverage for η2 = pr(Y < 1)
reaches to 98% for 95% confidence intervals, due to variance
overestimation.
In contrast, our new method provides more accurate coverage of
confidence interval for both η1 = E (Y ) and η2 = pr(Y < 1) at 95%
levels.
Conclusion
Investigate asymptotic properties of Rubin’s variance estimator.
If method of moment is used, Rubin’s variance estimator can be
asymptotically biased.
New variance estimator, based on multiple over-imputation, can
provide valid variance estimation in this case.
Our method can be extended to a more general class of parameters
obtained from estimating equations.
n
X
U(η; xi , yi ) = 0.
i=1
This is a topic of future study.
The end