A note on multiple imputation for general purpose estimation
Shu Yang and Jae Kwang Kim
SSC meeting, June 16, 2015

Introduction: Basic Setup

- Assume simple random sampling, for simplicity.
- Under complete response, suppose that

    \hat{\eta}_{n,g} = n^{-1} \sum_{i=1}^{n} g(y_i)

  is an unbiased estimator of η_g = E{g(Y)}, for known g(·).
- Let δ_i = 1 if y_i is observed and δ_i = 0 otherwise, and let y_i^* denote the imputed value for y_i for a unit i with δ_i = 0.
- Imputed estimator of η_g:

    \hat{\eta}_{I,g} = n^{-1} \sum_{i=1}^{n} \{ \delta_i g(y_i) + (1 - \delta_i) g(y_i^*) \}

- We need E{g(y_i^*) | δ_i = 0} = E{g(y_i) | δ_i = 0}.

Introduction: ML estimation under the missing data setup

- Often we can find x (always observed) such that missing at random (MAR) holds:

    f(y \mid x, \delta = 0) = f(y \mid x)

- Imputed values are created from f(y | x). Computing the conditional expectation, however, can be a challenging problem:
  1. We do not know the true parameter θ in f(y | x) = f(y | x; θ):

       E\{g(y) \mid x\} = E\{g(y) \mid x; \theta\}.

  2. Even if we know θ, computing the conditional expectation can be numerically difficult.

Introduction: Imputation

- Imputation is a Monte Carlo approximation of the conditional expectation (given the observed data):

    E\{g(y_i) \mid x_i\} \cong m^{-1} \sum_{j=1}^{m} g(y_i^{*(j)})

- 1. Bayesian approach: generate y_i^* from

    f(y_i \mid x_i, y_{obs}) = \int f(y_i \mid x_i, \theta) \, p(\theta \mid x_i, y_{obs}) \, d\theta

- 2. Frequentist approach: generate y_i^* from f(y_i | x_i; θ̂), where θ̂ is a consistent estimator.

Introduction: Basic Setup (Cont'd)

- Thus, imputation is a computational tool for computing the conditional expectation E{g(y_i) | x_i} for each missing unit i.
- To compute this conditional expectation, we need to specify a model f(y | x; θ) evaluated at θ = θ̂. Thus, we can write η̂_{I,g} = η̂_{I,g}(θ̂).
- To estimate the variance of η̂_{I,g}, we need to take into account the sampling variability of θ̂ in η̂_{I,g} = η̂_{I,g}(θ̂).

Introduction: Basic Setup (Cont'd)

Three approaches:
- Bayesian approach: multiple imputation, by Rubin (1978, 1987), Rubin and Schenker (1986), etc.
- Resampling approach: Rao and Shao (1992), Efron (1994), Rao and Sitter (1995), Shao and Sitter (1996), Kim and Fuller (2004), Fuller and Kim (2005).
- Linearization approach: Clayton et al. (1998), Shao and Steel (1999), Robins and Wang (2000), Kim and Rao (2009).

Comparison

                        Bayesian                    Frequentist
  Model                 Posterior distribution      Prediction model
                        f(latent, θ | data)         f(latent | data, θ)
  Computation           Data augmentation           EM algorithm
  Prediction            I-step                      E-step
  Parameter update      P-step                      M-step
  Parameter estimation  Posterior mode              ML estimation
  Imputation            Multiple imputation         Fractional imputation
  Variance estimation   Rubin's formula             Linearization or bootstrap

Multiple Imputation

- Step 1. (Imputation) Create M complete datasets by filling in missing values with imputed values generated from the posterior predictive distribution. To create the jth imputed dataset, first generate θ^{*(j)} from the posterior distribution p(θ | X_n, y_{obs}), and then generate y_i^{*(j)} from the imputation model f(y | x_i; θ^{*(j)}) for each missing y_i.
- Step 2. (Analysis) Apply the user's complete-sample estimation procedure to each imputed dataset. Let η̂^{(j)} be the complete-sample estimator of η = E{g(Y)} applied to the jth imputed dataset, and let V̂^{(j)} be the complete-sample variance estimator of η̂^{(j)}.
- Step 3. (Summarize) Use Rubin's combining rule to summarize the results from the multiply imputed datasets. The multiple imputation estimator of η, denoted by η̂_{MI}, is

    \hat{\eta}_{MI} = M^{-1} \sum_{j=1}^{M} \hat{\eta}^{(j)},

  and Rubin's variance estimator is

    \hat{V}_{MI}(\hat{\eta}_{MI}) = W_M + (1 + M^{-1}) B_M,

  where W_M = M^{-1} Σ_{j=1}^{M} V̂^{(j)} and B_M = (M − 1)^{-1} Σ_{j=1}^{M} (η̂^{(j)} − η̂_{MI})². A code sketch of Steps 1-3 is given below.
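To make Steps 1-3 concrete, the following is a minimal sketch for the no-covariate case, assuming a normal imputation model y ~ N(μ, σ²) with the noninformative prior p(μ, σ²) ∝ 1/σ². The function name, prior, and toy data are illustrative, not part of the original development.

```python
# A minimal sketch of Steps 1-3, assuming a no-covariate normal
# imputation model y ~ N(mu, sigma^2) with prior p(mu, sigma^2) propto 1/sigma^2.
import numpy as np

rng = np.random.default_rng(0)

def rubin_mi(y, delta, g, M=50):
    """Multiply impute the missing y's, estimate eta = E{g(Y)} on each
    completed dataset, and combine the results with Rubin's rule."""
    y_obs = y[delta == 1]
    r, n = y_obs.size, y.size
    eta, v = np.empty(M), np.empty(M)
    for j in range(M):
        # Step 1: draw theta*(j) = (mu*, sigma2*) from its posterior,
        # then draw y*(j) from f(y | theta*(j)) for the missing units.
        sigma2 = (r - 1) * y_obs.var(ddof=1) / rng.chisquare(r - 1)
        mu = rng.normal(y_obs.mean(), np.sqrt(sigma2 / r))
        y_imp = y.copy()
        y_imp[delta == 0] = rng.normal(mu, np.sqrt(sigma2), size=n - r)
        # Step 2: complete-sample point and variance estimators.
        gv = g(y_imp)
        eta[j], v[j] = gv.mean(), gv.var(ddof=1) / n
    # Step 3: Rubin's combining rule.
    W_M, B_M = v.mean(), eta.var(ddof=1)
    return eta.mean(), W_M + (1 + 1 / M) * B_M

# Toy example (illustrative): MCAR response, g(y) = I(y <= 3).
y = rng.normal(2.0, 1.0, size=500)
delta = rng.binomial(1, 0.7, size=500)
print(rubin_mi(y, delta, lambda t: (t <= 3).astype(float)))
```

Under this prior, the draws σ²* ~ (r − 1)s²/χ²_{r−1} and μ* | σ²* ~ N(ȳ_r, σ²*/r) are the standard posterior for the normal model.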
Multiple imputation: Motivating Example

- Suppose that you are interested in estimating η = P(Y ≤ 3), and assume a normal model for f(y | x; θ) for multiple imputation.
- Two choices for the complete-sample estimator η̂_n:
  1. Method-of-moments (MOM) estimator:

       \hat{\eta}_{n1} = n^{-1} \sum_{i=1}^{n} I(y_i \le 3).

  2. Maximum-likelihood (ML) estimator:

       \hat{\eta}_{n2} = n^{-1} \sum_{i=1}^{n} P(Y \le 3 \mid x_i; \hat{\theta}),
       where P(Y \le 3 \mid x_i; \hat{\theta}) = \int_{-\infty}^{3} f(y \mid x_i; \hat{\theta}) \, dy.

- Rubin's variance estimator is nearly unbiased for η̂_{n2}, but provides conservative variance estimation for η̂_{n1} (30-50% overestimation of the variance in most cases).

Introduction

The goal is to
1. understand why MI provides biased variance estimation for MOM estimation;
2. characterize the asymptotic bias of the MI variance estimator when the MOM estimator is used in the complete-sample analysis; and
3. give an alternative variance estimator that can provide asymptotically valid inference for MOM estimation.

Multiple imputation

- Rubin's variance estimator is based on the following decomposition:

    var(\hat{\eta}_{MI}) = var(\hat{\eta}_n) + var(\hat{\eta}_{MI,\infty} - \hat{\eta}_n) + var(\hat{\eta}_{MI} - \hat{\eta}_{MI,\infty}),   (1)

  where η̂_n is the complete-sample estimator of η and η̂_{MI,∞} is the probability limit of η̂_{MI} as M → ∞.
- Under some regularity conditions, the W_M term estimates the first term, the B_M term estimates the second term, and the M^{-1}B_M term estimates the last term of (1), respectively.

Multiple imputation

- In particular, Kim et al. (2006, JRSSB) proved that the bias of Rubin's variance estimator is

    Bias(\hat{V}_{MI}) \cong -2 \, cov(\hat{\eta}_{MI} - \hat{\eta}_n, \hat{\eta}_n).   (2)

- The decomposition (1) is equivalent to assuming that cov(η̂_{MI} − η̂_n, η̂_n) ≅ 0, which is called the congeniality condition by Meng (1994).
- The congeniality condition holds when η̂_n is the MLE of η; in such cases, Rubin's variance estimator is asymptotically unbiased.
- If the method-of-moments (MOM) estimator is used to estimate η = E{g(Y)}, Rubin's variance estimator can be asymptotically biased.

MI for MOM estimator: No x variable

- Assume that y is observed for the first r elements.
- The MI estimator for the MOM estimator of η = E{g(Y)} is

    \hat{\eta}_{MI} = n^{-1} \sum_{i=1}^{r} g(y_i) + (nM)^{-1} \sum_{i=r+1}^{n} \sum_{j=1}^{M} g(y_i^{*(j)}),

  where the y_i^{*(j)} are generated from f(y | θ^{*(j)}) and the θ^{*(j)} are generated from p(θ | y_{obs}).

MI for MOM estimator: No x variable (Cont'd)

- Conditional on the observed data,

    plim_{M \to \infty} \, M^{-1} \sum_{j=1}^{M} g(y_i^{*(j)}) = E[ E\{g(Y); \theta^*\} \mid y_{obs} ] \cong E\{g(Y); \hat{\theta}\},

  so the MI estimator of η (for M → ∞) can be written

    \hat{\eta}_{MI} = \frac{r}{n} \hat{\eta}_{MME,r} + \frac{n-r}{n} \hat{\eta}_{MLE,r}.

- Thus, η̂_{MI} is a convex combination of η̂_{MME,r} and η̂_{MLE,r}, where η̂_{MLE,r} = E{g(Y); θ̂} and η̂_{MME,r} = r^{-1} Σ_{i=1}^{r} g(y_i).
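This convex-combination limit can be checked numerically. Below is a small sketch under the same assumed no-covariate normal model as in the earlier sketch, with g(y) = I(y ≤ 3); the constants and seed are illustrative, and the agreement is only approximate, since averaging over posterior draws of θ matches the plug-in value E{g(Y); θ̂} only up to posterior-spread and Monte Carlo error.

```python
# Numerical check: for large M, eta_MI is close to
# (r/n) * eta_MME_r + ((n-r)/n) * eta_MLE_r  (illustrative constants).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n, r, M = 1000, 600, 5000
y_obs = rng.normal(2.0, 1.0, size=r)          # the r observed values

eta_mme_r = (y_obs <= 3).mean()               # MOM estimator on respondents
mu_hat, sig_hat = y_obs.mean(), y_obs.std()   # MLE of (mu, sigma)
eta_mle_r = norm.cdf((3 - mu_hat) / sig_hat)  # E{g(Y); theta_hat}

imp = 0.0                                     # average of g over imputations
for j in range(M):
    sigma2 = (r - 1) * y_obs.var(ddof=1) / rng.chisquare(r - 1)
    mu = rng.normal(mu_hat, np.sqrt(sigma2 / r))
    y_star = rng.normal(mu, np.sqrt(sigma2), size=n - r)
    imp += (y_star <= 3).mean() / M

p = r / n
print(p * eta_mme_r + (1 - p) * imp)          # eta_MI for large M
print(p * eta_mme_r + (1 - p) * eta_mle_r)    # convex combination; nearly equal
```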
MI for MOM estimator: No x variable (Cont'd)

- Since Bias(V̂_{MI}) ≅ −2 Cov(η̂_{MME,n}, η̂_{MI} − η̂_{MME,n}), we only have to evaluate the covariance term.
- Writing η̂_{MME,n} = p η̂_{MME,r} + (1 − p) η̂_{MME,n−r}, where p = r/n, we can obtain

    Cov(\hat{\eta}_{MME,n}, \hat{\eta}_{MI} - \hat{\eta}_{MME,n})
      = Cov\{\hat{\eta}_{MME,n}, (1-p)(\hat{\eta}_{MLE,r} - \hat{\eta}_{MME,n-r})\}
      = p(1-p) \, Cov\{\hat{\eta}_{MME,r}, \hat{\eta}_{MLE,r}\} - (1-p)^2 \, V\{\hat{\eta}_{MME,n-r}\}
      = p(1-p) \, \{ V(\hat{\eta}_{MLE,r}) - V(\hat{\eta}_{MME,r}) \},

  using Cov(η̂_{MME,r}, η̂_{MLE,r}) ≅ V(η̂_{MLE,r}) and (1 − p)² V(η̂_{MME,n−r}) = p(1 − p) V(η̂_{MME,r}).
- Since the MLE is at least as efficient as the MOM estimator, this covariance is nonpositive, which proves Bias(V̂_{MI}) ≥ 0 in this case.

Bias of Rubin's variance estimator

Theorem 1. Let η̂_n = n^{-1} Σ_{i=1}^{n} g(y_i) be the method-of-moments estimator of η = E{g(Y)} under complete response. Assume that E(V̂^{(j)}) ≅ var(η̂_n) holds for j = 1, ..., M. Then for M → ∞, the bias of Rubin's variance estimator is

    Bias(\hat{V}_{MI}) \cong 2 n^{-1} (1-p) \left( E[ var\{g(Y) \mid X\} \mid \delta = 0 ] - \dot{m}_{\theta,0}^T I_\theta^{-1} \dot{m}_{\theta,1} \right),

where p = r/n, I_θ = −E{∂² log f(Y | X; θ)/∂θ∂θ^T}, m(x; θ) = E{g(Y) | x; θ}, ṁ_θ(x) = ∂m(x; θ)/∂θ, ṁ_{θ,0} = E{ṁ_θ(X) | δ = 0}, and ṁ_{θ,1} = E{ṁ_θ(X) | δ = 1}.

Bias of Rubin's variance estimator

- Under MCAR, the bias simplifies to

    Bias(\hat{V}_{MI}) \cong 2 p (1-p) \{ var(\hat{\eta}_{r,MME}) - var(\hat{\eta}_{r,MLE}) \},

  where η̂_{r,MME} = r^{-1} Σ_{i=1}^{r} g(y_i) and η̂_{r,MLE} = r^{-1} Σ_{i=1}^{r} E{g(Y) | x_i; θ̂}.
- This shows that Rubin's variance estimator is unbiased if and only if the method-of-moments estimator is as efficient as the maximum-likelihood estimator, that is, var(η̂_{r,MME}) ≅ var(η̂_{r,MLE}). Otherwise, Rubin's variance estimator is positively biased.

Rubin's variance estimator can be negatively biased

- Now consider a simple linear regression model which contains one covariate X and no intercept. By Theorem 1,

    Bias(\hat{V}_{MI}) \cong \frac{2(1-p)\sigma^2}{n} \left\{ 1 - \frac{E(X \mid \delta = 0) \, E(X \mid \delta = 1)}{E(X^2 \mid \delta = 1)} \right\},

  which can be zero, positive, or negative.
- If

    E(X \mid \delta = 0) > \frac{E(X^2 \mid \delta = 1)}{E(X \mid \delta = 1)},

  Rubin's variance estimator is negatively biased.

Alternative variance estimation

- Decompose

    var(\hat{\eta}_{MI,\infty}) = n^{-1} V_1 + r^{-1} V_2,

  where

    V_1 = var\{g(Y)\} - (1-p) \, E[ var\{g(Y) \mid X\} \mid \delta = 0 ],
    V_2 = \dot{m}_\theta^T I_\theta^{-1} \dot{m}_\theta - p^2 \, \dot{m}_{\theta,1}^T I_\theta^{-1} \dot{m}_{\theta,1}.

- The first term, n^{-1}V_1, is the variance of the sample mean of δ_i g(y_i) + (1 − δ_i) m(x_i; θ).
- The second term, r^{-1}V_2, reflects the variability associated with using the estimated value θ̂ instead of the true value θ in the imputed values.

Alternative variance estimation: Estimate n^{-1} V_1

- The first term n^{-1}V_1 can be estimated by W̃_M = W_M − C_M, where W_M = M^{-1} Σ_{j=1}^{M} V̂^{(j)} and

    C_M = \frac{1}{n^2 (M-1)} \sum_{k=1}^{M} \sum_{i=r+1}^{n} \left( g(y_i^{*(k)}) - M^{-1} \sum_{l=1}^{M} g(y_i^{*(l)}) \right)^2,

  since E{W_M} ≅ n^{-1} var{g(Y)} and E(C_M) ≅ n^{-1}(1 − p) E[var{g(Y) | X} | δ = 0]. A code sketch of this computation follows the table below.

Alternative variance estimation: Estimate r^{-1} V_2

- To estimate the second term, we use over-imputation, in which imputed values y_i^{*(k)} are generated for every unit, including the respondents:

Table: Over-imputation data structure, with g_i^{*(k)} = g(y_i^{*(k)})

  Row        1                ...   M                average
  1          g_1^{*(1)}       ...   g_1^{*(M)}       ḡ_1^*
  ...        ...              ...   ...              ...
  r          g_r^{*(1)}       ...   g_r^{*(M)}       ḡ_r^*
  r+1        g_{r+1}^{*(1)}   ...   g_{r+1}^{*(M)}   ḡ_{r+1}^*
  ...        ...              ...   ...              ...
  n          g_n^{*(1)}       ...   g_n^{*(M)}       ḡ_n^*
  average    η̄_n^{*(1)}       ...   η̄_n^{*(M)}       η̄_n^*
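A sketch of the W̃_M = W_M − C_M computation from this data structure is below; here g_star is assumed to be the n × M array of over-imputed g-values from the table (rows ordered so that units 1, ..., r are the respondents), and v_hat the M complete-sample variance estimates. The names are illustrative.

```python
# W~_M = W_M - C_M: within-imputation variance, corrected by the
# within-unit spread of the imputed g-values over the missing units.
import numpy as np

def w_tilde(g_star, v_hat, r):
    n, M = g_star.shape
    W_M = v_hat.mean()
    # deviations of each imputed g-value from its unit's M-imputation
    # mean, for the missing units i = r+1, ..., n only
    d = g_star[r:, :] - g_star[r:, :].mean(axis=1, keepdims=True)
    C_M = (d ** 2).sum() / (n ** 2 * (M - 1))
    return W_M - C_M
```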
Alternative variance estimation: Estimate r^{-1} V_2 (Cont'd)

- The second term r^{-1}V_2 can be estimated by D_M = D_{M,n} − D_{M,r}, where

    D_{M,n} = (M-1)^{-1} \sum_{k=1}^{M} \left( n^{-1} \sum_{i=1}^{n} d_i^{*(k)} \right)^2 - (M-1)^{-1} n^{-2} \sum_{k=1}^{M} \sum_{i=1}^{n} (d_i^{*(k)})^2,

    D_{M,r} = (M-1)^{-1} \sum_{k=1}^{M} \left( n^{-1} \sum_{i=1}^{r} d_i^{*(k)} \right)^2 - (M-1)^{-1} n^{-2} \sum_{k=1}^{M} \sum_{i=1}^{r} (d_i^{*(k)})^2,

  with d_i^{*(k)} = g(y_i^{*(k)}) − M^{-1} Σ_{l=1}^{M} g(y_i^{*(l)}), since E(D_{M,n}) ≅ r^{-1} ṁ_θ^T I_θ^{-1} ṁ_θ and E(D_{M,r}) ≅ r^{-1} p² ṁ_{θ,1}^T I_θ^{-1} ṁ_{θ,1}. A sketch combining W̃_M, D_M, and B_M follows Theorem 2 below.

Alternative variance estimation

Theorem 2. Under the assumptions of Theorem 1, the new multiple imputation variance estimator

    \hat{V}_{MI} = \tilde{W}_M + D_M + M^{-1} B_M,

where W̃_M = W_M − C_M and B_M is the usual between-imputation variance, is asymptotically unbiased for the variance of the multiple imputation estimator as n → ∞.
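Continuing the earlier sketch, a self-contained version of the estimator in Theorem 2 might look as follows; g_star is again the assumed n × M over-imputation array, and eta_hat and v_hat are the M point and variance estimates from the ordinary imputed datasets. All names are illustrative.

```python
# New MI variance estimator V_MI = W~_M + D_M + B_M / M  (Theorem 2).
import numpy as np

def new_mi_variance(g_star, v_hat, eta_hat, r):
    n, M = g_star.shape
    d = g_star - g_star.mean(axis=1, keepdims=True)   # d_i^{*(k)}
    # W~_M = W_M - C_M, with C_M summed over the missing units only
    C_M = (d[r:, :] ** 2).sum() / (n ** 2 * (M - 1))
    W_t = v_hat.mean() - C_M
    # D_{M,rows}: both pieces of the D formula over the first `rows` units
    def D(rows):
        col = d[:rows, :].sum(axis=0) / n             # n^{-1} sum_i d_i^{*(k)}
        return ((col ** 2).sum() - (d[:rows, :] ** 2).sum() / n ** 2) / (M - 1)
    B_M = eta_hat.var(ddof=1)
    return W_t + (D(n) - D(r)) + B_M / M
```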
Simulation study: Set up 1

- Samples of size n = 2,000 are independently generated from Y_i = βX_i + e_i, where β = 0.1, X_i ~ exp(1), and e_i ~ N(0, σ_e²) with σ_e² = 0.5.
- Let δ_i be the response indicator of y_i, with δ_i ~ Bernoulli(p_i), where p_i = 1/{1 + exp(−φ_0 − φ_1 x_i)}.
- We consider two scenarios: (i) (φ_0, φ_1) = (−1.5, 2) and (ii) (φ_0, φ_1) = (3, −3).
- The parameters of interest are η_1 = E(Y) and η_2 = pr(Y < 0.15).

Simulation study: Result 1

Table: Relative biases of the two variance estimators, and mean widths and coverages of the two 95% interval estimates, under the two scenarios in simulation one

                    Relative bias (%)   Mean width (95% C.I.)   Coverage (95% C.I.)
  Scenario            Rubin    New        Rubin    New             Rubin    New
  1         η1        96.8     0.7        0.038    0.027           0.99     0.95
            η2        123.7    2.9        0.027    0.018           1.00     0.95
  2         η1        -19.8    0.4        0.061    0.069           0.91     0.95
            η2        -9.6     -0.4       0.037    0.039           0.93     0.95

C.I., confidence interval; η1 = E(Y); η2 = pr(Y < 0.15).

Simulation study: Result 1

- For η_1 = E(Y) under scenario (i), the relative bias of Rubin's variance estimator is 96.8%, which is consistent with our result, since E_1(X²) − E_0(X)E_1(X) > 0, where E_1(X²) = 3.38, E_1(X) = 1.45, and E_0(X) = 0.48.
- Under scenario (ii), the relative bias of Rubin's variance estimator is −19.8%, which is consistent with our result, since E_1(X²) − E_0(X)E_1(X) < 0, where E_1(X²) = 0.37, E_1(X) = 0.47, and E_0(X) = 1.73.
- The empirical coverage for Rubin's method can be above or below the nominal level, depending on whether the variance is overestimated or underestimated. The new variance estimator, on the other hand, is essentially unbiased in these scenarios.

Simulation study: Set up 2

- Samples of size n = 200 are independently generated from Y_i = β_0 + β_1 X_i + e_i, where β = (β_0, β_1) = (3, −1), X_i ~ N(2, 1), and e_i ~ N(0, σ_e²) with σ_e² = 1.
- The parameters of interest are η_1 = E(Y) and η_2 = pr(Y < 1).
- We consider two factors:
  1. The response mechanism, MCAR or MAR. For MCAR, δ_i ~ Bernoulli(0.6). For MAR, δ_i ~ Bernoulli(p_i), where p_i = 1/{1 + exp(−0.28 − 0.1 x_i)}, giving an average response rate of about 0.6.
  2. The number of imputations, with two levels, M = 10 and M = 30.

Simulation study: Result 2

Table: Relative biases of the two variance estimators, and mean widths and coverages of the two 95% interval estimates, under the two missingness scenarios in simulation two

                  Relative bias (%)   Mean width (95% C.I.)   Coverage (95% C.I.)
         M          Rubin    New        Rubin    New             Rubin    New
  Missing completely at random
  η1     10         -0.9     -1.58      0.240    0.250           0.95     0.95
         30         -0.6     -1.68      0.230    0.235           0.95     0.95
  η2     10         22.7     -1.14      0.083    0.083           0.98     0.95
         30         23.8     -1.23      0.082    0.075           0.98     0.95
  Missing at random
  η1     10         -1.0     -1.48      0.23     0.25            0.95     0.95
         30         -0.9     -1.59      0.231    0.23            0.95     0.95
  η2     10         20.7     -1.64      0.081    0.081           0.98     0.95
         30         21.5     -1.74      0.074    0.071           0.98     0.95

C.I., confidence interval; η1 = E(Y); η2 = pr(Y < 1).

Simulation study: Result 2

- Both Rubin's variance estimator and our new variance estimator are unbiased for η_1 = E(Y).
- Rubin's variance estimator is biased upward for η_2 = pr(Y < 1), with absolute relative bias as high as 24%, whereas our new variance estimator reduces the absolute relative bias to at most 1.74%.
- For Rubin's method, the empirical coverage for η_2 = pr(Y < 1) reaches 98% for nominal 95% confidence intervals, due to variance overestimation. In contrast, our new method provides accurate coverage for both η_1 = E(Y) and η_2 = pr(Y < 1) at the 95% level.

Conclusion

- We investigate the asymptotic properties of Rubin's variance estimator. If a method-of-moments estimator is used, Rubin's variance estimator can be asymptotically biased.
- A new variance estimator, based on multiple over-imputation, can provide valid variance estimation in this case.
- Our method can be extended to a more general class of parameters defined through estimating equations,

    \sum_{i=1}^{n} U(\eta; x_i, y_i) = 0.

  This is a topic of future study.

The end.