8. Multiple Imputation: Part 2

1 Introduction

• $y_i$ ($i = 1, \dots, n$): a random sample from a joint density $f(y; \theta)$.

• The first $r$ elements are observed and the remaining $n - r$ elements are missing.

• Assume MAR (Missing At Random).

• The parameter of interest is $\eta = E\{g(Y)\} = \int g(y) f(y; \theta)\, dy$. Thus, $\eta$ is a function of $\theta$. For example, $g(Y) = I(Y < 1)$ leads to $\eta = P(Y < 1)$.

• The MI estimator is justified when $\hat{\eta}_n$ (the complete-sample estimator of $\eta$) is the MLE of $\eta$, that is, when $\hat{\eta}_n = \eta(\hat{\theta}_{MLE,n})$.

• What if $\hat{\eta}_n$ is the MME (method-of-moments estimator) of $\eta$? The MME of $\eta$ is
\[
\hat{\eta}_{MME,n} = \frac{1}{n} \sum_{i=1}^n g(y_i).
\]

• MI estimator of $\eta$:
\[
\hat{\eta}_{MI} = \frac{1}{M} \sum_{k=1}^M \hat{\eta}_{MME,I}^{(k)},
\qquad
\hat{\eta}_{MME,I}^{(k)} = \frac{1}{n} \left\{ \sum_{i=1}^r g(y_i) + \sum_{i=r+1}^n g(y_i^{*(k)}) \right\},
\]
where the $y_i^{*(k)}$ are generated from $f(y; \theta^{*(k)})$ and the $\theta^{*(k)}$ are generated from $p(\theta \mid y_{obs})$.

• Under some regularity conditions, $p(\theta \mid y_{obs})$ converges in distribution to $N(\hat{\theta}, I_{obs}^{-1})$, where $\hat{\theta}$ is the MLE of $\theta$ and $I_{obs}$ is the information matrix obtained from the observed likelihood of $\theta$.

• MI variance estimator:
\[
\hat{V}_{MI} = W_M + \left(1 + \frac{1}{M}\right) B_M \tag{1}
\]
where
\[
W_M = \frac{1}{M} \sum_{k=1}^M \hat{V}_{n,I}^{(k)}, \qquad
B_M = \frac{1}{M-1} \sum_{k=1}^M \left( \hat{\eta}_{MME,I}^{(k)} - \hat{\eta}_{MI} \right)^2 .
\]
Here,
\[
\hat{V}_{n,I}^{(k)} = \frac{1}{n(n-1)} \left[ \sum_{i=1}^r \{ g(y_i) - \hat{\eta}_{MME,I}^{(k)} \}^2 + \sum_{i=r+1}^n \{ g(y_i^{*(k)}) - \hat{\eta}_{MME,I}^{(k)} \}^2 \right]
\]
is the variance estimator of $\hat{\eta}_{MME,n}$ applied to the $k$-th imputed data set. (A small simulation sketch of the estimator (1) is given at the end of Section 2.)

• Is the MI variance estimator in (1) approximately unbiased for $V(\hat{\eta}_{MI})$?

• The answer is "yes" if the variance satisfies
\[
V(\hat{\eta}_{MI}) = V(\hat{\eta}_{MME,n}) + V(\hat{\eta}_{MI} - \hat{\eta}_{MME,n}). \tag{2}
\]
It can be shown that
\[
E\{W_M\} \doteq V(\hat{\eta}_{MME,n}) \tag{3}
\]
and
\[
E\left\{ (1 + M^{-1}) B_M \right\} \doteq V(\hat{\eta}_{MI} - \hat{\eta}_{MME,n}). \tag{4}
\]
Kim et al. (2006) proved (3) and (4) in more general cases.

• Meng (1994) called condition (2) the congeniality condition.

• What is missing in (2) is the covariance term. That is, the true variance is
\[
V(\hat{\eta}_{MI}) = V(\hat{\eta}_{MME,n}) + V(\hat{\eta}_{MI} - \hat{\eta}_{MME,n}) + 2\,\mathrm{Cov}(\hat{\eta}_{MME,n},\, \hat{\eta}_{MI} - \hat{\eta}_{MME,n}).
\]

• We will show in Section 2 that
\[
2\,\mathrm{Cov}(\hat{\eta}_{MME,n},\, \hat{\eta}_{MI} - \hat{\eta}_{MME,n}) \le 0, \tag{5}
\]
which implies that the MI variance estimator overestimates the true variance.

2 Congeniality

• MI estimator of $\eta$:
\[
\hat{\eta}_{MI} = \frac{1}{n} \sum_{i=1}^r g(y_i) + \frac{1}{n} \sum_{i=r+1}^n \frac{1}{M} \sum_{j=1}^M g(y_i^{*(j)}),
\]
where the $y_i^{*(j)}$ are generated from $f(y; \theta^{*(j)})$ and the $\theta^{*(j)}$ are generated from $p(\theta \mid y_{obs})$.

• If we define
\[
\hat{\eta}_{MLE,r} = E\{g(Y); \hat{\theta}\}, \qquad
\hat{\eta}_{MME,r} = \frac{1}{r} \sum_{i=1}^r g(y_i),
\]
then
\[
\operatorname*{plim}_{M \to \infty} \frac{1}{M} \sum_{j=1}^M g(y_i^{*(j)}) = E\left[ E\{g(Y); \theta^*\} \mid y_{obs} \right] = E\{g(Y); \hat{\theta}\},
\]
and the MI estimator of $\eta$ (for $M \to \infty$) can be written as
\[
\hat{\eta}_{MI} = \frac{r}{n}\, \hat{\eta}_{MME,r} + \frac{n-r}{n}\, \hat{\eta}_{MLE,r}.
\]
Thus, $\hat{\eta}_{MI}$ is a convex combination of $\hat{\eta}_{MME,r}$ and $\hat{\eta}_{MLE,r}$. Note that
\[
V(\hat{\eta}_{MLE,r}) = \left( \frac{\partial \eta}{\partial \theta} \right)' I_{obs}^{-1} \left( \frac{\partial \eta}{\partial \theta} \right), \qquad
V(\hat{\eta}_{MME,r}) = \frac{1}{r} V\{g(Y)\}.
\]
In general, we have $V(\hat{\eta}_{MME,r}) \ge V(\hat{\eta}_{MLE,r})$.

• Writing
\[
\hat{\eta}_{MME,n} = p\, \hat{\eta}_{MME,r} + (1-p)\, \hat{\eta}_{MME,n-r},
\]
where $p = r/n$, so that $\hat{\eta}_{MI} - \hat{\eta}_{MME,n} = (1-p)(\hat{\eta}_{MLE,r} - \hat{\eta}_{MME,n-r})$, we can express
\begin{align*}
\mathrm{Cov}(\hat{\eta}_{MME,n},\, \hat{\eta}_{MI} - \hat{\eta}_{MME,n})
&= \mathrm{Cov}\{\hat{\eta}_{MME,n},\, (1-p)(\hat{\eta}_{MLE,r} - \hat{\eta}_{MME,n-r})\} \\
&= (1-p)\,\mathrm{Cov}\{\hat{\eta}_{MME,n},\, \hat{\eta}_{MLE,r}\} - (1-p)\,\mathrm{Cov}\{\hat{\eta}_{MME,n},\, \hat{\eta}_{MME,n-r}\} \\
&= p(1-p)\,\mathrm{Cov}\{\hat{\eta}_{MME,r},\, \hat{\eta}_{MLE,r}\} - (1-p)^2\, V\{\hat{\eta}_{MME,n-r}\}.
\end{align*}
Using $\mathrm{Cov}\{\hat{\eta}_{MME,r}, \hat{\eta}_{MLE,r}\} = V\{\hat{\eta}_{MLE,r}\}$ and $(1-p)^2 V\{\hat{\eta}_{MME,n-r}\} = p(1-p) V\{\hat{\eta}_{MME,r}\}$, we have
\[
\mathrm{Cov}(\hat{\eta}_{MME,n},\, \hat{\eta}_{MI} - \hat{\eta}_{MME,n}) = p(1-p)\left\{ V(\hat{\eta}_{MLE,r}) - V(\hat{\eta}_{MME,r}) \right\},
\]
which proves (5).
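• A minimal simulation sketch of the MI point estimator and the variance formula (1), for $\eta = P(Y < 1)$ as in the example above. The $N(\mu, \sigma^2)$ model, the MCAR deletion of the last $n-r$ values, the noninformative-prior posterior draws for $\theta^* = (\mu^*, \sigma^{*2})$ (the univariate analogue of the posterior step in Section 3), and all variable names are illustrative assumptions, not part of the original notes.

```python
# Sketch: MI point estimator and Rubin's variance formula (1) for eta = P(Y < 1)
# under an assumed N(mu, sigma^2) model with MCAR missingness (hence MAR).
import numpy as np

rng = np.random.default_rng(0)

def g(y):
    return (y < 1.0).astype(float)          # eta = E{g(Y)} = P(Y < 1)

n, r, M = 200, 120, 50                      # sample size, respondents, imputations
y = rng.normal(loc=0.0, scale=2.0, size=n)  # complete data
y_obs = y[:r]                               # first r values observed, rest deleted

mu_hat = y_obs.mean()
s2_hat = y_obs.var(ddof=1)

eta_k = np.zeros(M)
w_k = np.zeros(M)
for k in range(M):
    # posterior step: draw theta* = (mu*, sigma*^2) from p(theta | y_obs)
    # (standard noninformative-prior posterior for a normal mean and variance)
    s2_star = (r - 1) * s2_hat / rng.chisquare(r - 1)
    mu_star = rng.normal(mu_hat, np.sqrt(s2_star / r))
    # imputation step: fill in the n - r missing values from f(y; theta*)
    y_imp = np.concatenate([y_obs, rng.normal(mu_star, np.sqrt(s2_star), n - r)])
    gk = g(y_imp)
    eta_k[k] = gk.mean()                    # eta_hat_{MME,I}^{(k)}
    w_k[k] = gk.var(ddof=1) / n             # V_hat_{n,I}^{(k)} = sum of squares / {n(n-1)}

eta_MI = eta_k.mean()                       # MI point estimator
W_M = w_k.mean()
B_M = eta_k.var(ddof=1)
V_MI = W_M + (1 + 1 / M) * B_M              # Rubin's variance formula (1)
print(eta_MI, V_MI)
```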
3 Application to survey sampling

• Finite population of $N$ units identified by $U = \{1, \dots, N\}$. Let $(x_i, y_i)$ be the measurement for unit $i$ in the population.

• Let $I_i = 1$ if unit $i$ is selected in the sample and $I_i = 0$ otherwise. Let $A = \{i;\, I_i = 1\}$ be the index set of the sample.

• Assume that $x$ is always observed and $y$ is subject to missingness. Let $\delta_i = 1$ if $y_i$ is observed and $\delta_i = 0$ otherwise.

• Suppose that we are interested in estimating $Y = \sum_{i=1}^N y_i$. Let $\hat{Y}_n = \sum_{i \in A} w_i y_i$ be the full-sample estimator of $Y$ under complete response.

• MI estimator of $Y$:
\[
\hat{Y}_{MI} = \frac{1}{M} \sum_{k=1}^M \hat{Y}_I^{(k)} \tag{6}
\]
where
\[
\hat{Y}_I^{(k)} = \sum_{i \in A} w_i \{ \delta_i y_i + (1 - \delta_i) y_i^{*(k)} \}.
\]

• How do we obtain $y_i^{*(k)}$ in multiple imputation?
1. Assume SMAR (Sample Missing At Random):
\[
f(y \mid x, I = 1, \delta = 1) = f(y \mid x, I = 1, \delta = 0). \tag{7}
\]
2. Estimate $f(y \mid x, I = 1, \delta = 1)$.
3. Generate $y_i^* \sim \hat{f}(y \mid x_i, I_i = 1, \delta_i = 1)$.

• How can we achieve SMAR in (7)?

• Lemma: Assume that
1. the sampling mechanism is completely determined by $x$, that is,
\[
P(I = 1 \mid x, y) = P(I = 1 \mid x) \tag{8}
\]
for any $y$, and
2. PMAR holds (i.e., $P(\delta = 1 \mid x, y) = P(\delta = 1 \mid x)$).
Then (7) holds.

Proof. Using Bayes' theorem,
\[
f(y \mid x, I = 1, \delta = 0) = \frac{P(I = 1 \mid x, y, \delta = 0)}{P(I = 1 \mid x, \delta = 0)}\, f(y \mid x, \delta = 0).
\]
By (8), we have $f(y \mid x, I = 1, \delta = 0) = f(y \mid x, \delta = 0)$. Similarly, we can establish that $f(y \mid x, I = 1, \delta = 1) = f(y \mid x, \delta = 1)$. By PMAR, we have $f(y \mid x, \delta = 0) = f(y \mid x, \delta = 1)$, which proves (7).

• In stratified sampling, for example, we should include the stratum indicator functions in the model. (Augment $x$ to include all the design variables.)

• MI variance estimation: use (1) with
\[
W_M = \frac{1}{M} \sum_{k=1}^M \hat{V}_{n,I}^{(k)}, \qquad
B_M = \frac{1}{M-1} \sum_{k=1}^M \left( \hat{Y}_I^{(k)} - \hat{Y}_{MI} \right)^2,
\]
where $\hat{V}_{n,I}^{(k)}$ is the naive variance estimator of $\hat{Y}_I^{(k)}$ applied to the $k$-th imputed data set.

• In computing $W_M$, Rubin (1987) suggests using a design-based variance estimator for $\hat{V}_n$.

• Let us consider multiple imputation under the linear regression model
\[
y_i = x_i' \beta + e_i, \qquad e_i \sim N(0, \sigma^2).
\]
That is, the regression model holds for the sample. Also, assume SMAR.

• To implement multiple imputation, assume that the first $r$ units are the respondents. Let $y_r = (y_1, y_2, \dots, y_r)'$ and $X_r' = (x_1, x_2, \dots, x_r)$ with $x_i = (1, x_i)'$. Also, let $y_{n-r} = (y_{r+1}, y_{r+2}, \dots, y_n)'$ and $X_{n-r}' = (x_{r+1}, x_{r+2}, \dots, x_n)$. The Bayesian regression imputation procedure is as follows (a short simulation sketch is given below):

[Posterior Step] Draw
\[
\sigma^{*2} \mid y_r \sim (r - p)\, \hat{\sigma}_r^2 / \chi^2_{r-p} \tag{9}
\]
and
\[
\beta^* \mid (y_r, \sigma^*) \sim N\left( \hat{\beta}_r,\, (X_r' X_r)^{-1} \sigma^{*2} \right), \tag{10}
\]
where $\hat{\sigma}_r^2 = (r - p)^{-1} y_r' \{ I - X_r (X_r' X_r)^{-1} X_r' \} y_r$ and $\hat{\beta}_r = (X_r' X_r)^{-1} X_r' y_r$.

[Imputation Step] For each missing unit $j = r+1, \dots, n$, draw
\[
e_j^* \mid (\beta^*, \sigma^*) \sim N(0, \sigma^{*2})
\]
independently. Then $y_j^* = x_j' \beta^* + e_j^*$ is the imputed value associated with unit $j$.

• Kim (2004) suggested using $\chi^2_{r-p+2}$ in (9) to reduce the small-sample bias (to make $\sigma^{*2}$ unbiased).
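• A minimal sketch of the posterior step (9)-(10) and the imputation step above. The simulated design, the respondent count, and all variable names are illustrative only; the draws themselves follow the two steps as stated.

```python
# Sketch: one draw of the Bayesian regression imputation procedure (9)-(10).
import numpy as np

rng = np.random.default_rng(1)

n, r, p = 100, 60, 2                       # sample size, respondents, dim(beta)
x = rng.uniform(0, 4, size=n)
X = np.column_stack([np.ones(n), x])       # x_i = (1, x_i)'
y = X @ np.array([1.0, 0.5]) + rng.normal(0, 1, n)   # illustrative complete data

Xr, yr = X[:r], y[:r]                      # respondents: first r units

# respondent-based quantities: beta_hat_r and sigma_hat_r^2
XtX_inv = np.linalg.inv(Xr.T @ Xr)
beta_hat = XtX_inv @ Xr.T @ yr
resid = yr - Xr @ beta_hat
s2_hat = resid @ resid / (r - p)

def draw_imputations(rng):
    # posterior step (9): sigma*^2 | y_r ~ (r - p) * sigma_hat_r^2 / chi^2_{r-p}
    # (Kim 2004: replace r - p by r - p + 2 here to reduce small-sample bias)
    s2_star = (r - p) * s2_hat / rng.chisquare(r - p)
    # posterior step (10): beta* | (y_r, sigma*) ~ N(beta_hat_r, (Xr'Xr)^{-1} sigma*^2)
    beta_star = rng.multivariate_normal(beta_hat, XtX_inv * s2_star)
    # imputation step: y_j* = x_j' beta* + e_j*, e_j* ~ N(0, sigma*^2)
    e_star = rng.normal(0, np.sqrt(s2_star), n - r)
    return X[r:] @ beta_star + e_star

y_imp = y.copy()
y_imp[r:] = draw_imputations(rng)          # one imputed data set; repeat M times
```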
• Note that
\[
y_i^{*(k)} = x_i' \hat{\beta} + x_i' (\beta^{*(k)} - \hat{\beta}) + e_i^{*(k)}
\]
and
\[
\hat{Y}_I^{(k)} = \hat{Y}_{MI,\infty} + \sum_{i \in A_M} w_i x_i' (\beta^{*(k)} - \hat{\beta}) + \sum_{i \in A_M} w_i e_i^{*(k)}, \tag{11}
\]
where
\[
\hat{Y}_{MI,\infty} = \sum_{i \in A_R} w_i y_i + \sum_{i \in A_M} w_i x_i' \hat{\beta},
\]
$A_R$ is the set of respondents, and $A_M$ is the set of nonrespondents.

• The three terms in (11) are mutually independent. Thus,
\[
V(\hat{Y}_{MI}) = V(\hat{Y}_{MI,\infty}) + \frac{1}{M} \left\{ \Big( \sum_{i \in A_M} w_i x_i \Big)' (X_r' X_r)^{-1} \Big( \sum_{i \in A_M} w_i x_i \Big) + \sum_{i \in A_M} w_i^2 \right\} \sigma^2. \tag{12}
\]

• To discuss variance estimation, we use
\[
V(\hat{Y}_{MI}) = V(\hat{Y}_n) + V(\hat{Y}_{MI} - \hat{Y}_n) + 2\, \mathrm{Cov}(\hat{Y}_n,\, \hat{Y}_{MI} - \hat{Y}_n).
\]
The first component is estimated by the naive variance estimator; that is,
\[
E\{W_M\} \doteq V(\hat{Y}_n).
\]
For the second part, we have
\[
E\{ (1 + M^{-1}) B_M \} = V(\hat{Y}_{MI} - \hat{Y}_n). \tag{13}
\]
A proof of (13) is given in Appendix A.

• It remains to check whether the third term, the covariance term, is zero. Since the imputation-noise terms in (11) have conditional mean zero given the observed data, only $\hat{Y}_{MI,\infty}$ contributes to the covariance, and
\begin{align*}
\mathrm{Cov}(\hat{Y}_n,\, \hat{Y}_{MI} - \hat{Y}_n)
&= \mathrm{Cov}\Big\{ \sum_{i \in A_R} w_i y_i + \sum_{i \in A_M} w_i y_i,\ \sum_{i \in A_M} w_i x_i' \hat{\beta} - \sum_{i \in A_M} w_i y_i \Big\} \\
&= \mathrm{Cov}\Big\{ \sum_{i \in A_R} w_i y_i,\ \sum_{i \in A_M} w_i x_i' \hat{\beta} \Big\} - V\Big\{ \sum_{i \in A_M} w_i y_i \Big\} \\
&= \sum_{i \in A_R} \sum_{j \in A_M} w_i w_j\, x_i' (X_r' X_r)^{-1} x_j\, \sigma^2 - \sum_{i \in A_M} w_i^2\, \sigma^2,
\end{align*}
where the second equality holds because $\hat{\beta}$ depends only on the respondents' data, so the remaining cross terms vanish. If $w_i = a' x_i$ for some $a$, we have
\begin{align*}
\sum_{i \in A_R} \sum_{j \in A_M} w_i w_j\, x_i' (X_r' X_r)^{-1} x_j
&= a' (X_r' X_r) (X_r' X_r)^{-1} (X_{n-r}' X_{n-r}) a \\
&= a' (X_{n-r}' X_{n-r}) a \\
&= \sum_{i \in A_M} w_i^2,
\end{align*}
so the covariance term is zero. Thus, the MI variance estimator is approximately unbiased when $w_i$ lies in the column space of $X$. (A small numerical check of this identity is given after Appendix A.)

References

Kim, J. K. (2004). Finite sample properties of multiple imputation estimators. The Annals of Statistics, 32, 766-783.

Kim, J. K., Brick, M. J., Fuller, W. A., and Kalton, G. (2006). On the bias of the multiple imputation variance estimator in survey sampling. Journal of the Royal Statistical Society: Series B, 68, 509-521.

Meng, X. L. (1994). Multiple-imputation inferences with uncongenial sources of input (with discussion). Statistical Science, 9, 538-573.

Rubin, D. B. (1987). Multiple Imputation for Nonresponse in Surveys. John Wiley & Sons, New York.

Appendix A. Proof of (13)

Let $\hat{Y}_{MI} = M^{-1} \sum_{k=1}^M \hat{Y}_I^{(k)}$ be the MI estimator of $Y$. We first use
\[
B_M = \frac{1}{2} \frac{1}{M(M-1)} \sum_{i=1}^M \sum_{j=1}^M \left( \hat{Y}_I^{(i)} - \hat{Y}_I^{(j)} \right)^2
\]
and, since $\hat{Y}_I^{(1)}, \dots, \hat{Y}_I^{(M)}$ are identically distributed,
\[
E(B_M) = 0.5\, V\!\left( \hat{Y}_I^{(1)} - \hat{Y}_I^{(2)} \right).
\]
Thus, using (11), we have
\[
E(B_M) = \left\{ \Big( \sum_{i \in A_M} w_i x_i \Big)' (X_r' X_r)^{-1} \Big( \sum_{i \in A_M} w_i x_i \Big) + \sum_{i \in A_M} w_i^2 \right\} \sigma^2. \tag{14}
\]
Now, using (12), note that
\[
V(\hat{Y}_{MI} - \hat{Y}_n) = V(\hat{Y}_{MI,\infty} - \hat{Y}_n) + \frac{1}{M} \left\{ \Big( \sum_{i \in A_M} w_i x_i \Big)' (X_r' X_r)^{-1} \Big( \sum_{i \in A_M} w_i x_i \Big) + \sum_{i \in A_M} w_i^2 \right\} \sigma^2 \tag{15}
\]
and
\[
V(\hat{Y}_{MI,\infty} - \hat{Y}_n) = V\Big\{ \sum_{i \in A_M} w_i x_i' \hat{\beta} - \sum_{i \in A_M} w_i y_i \Big\}
= \left\{ \Big( \sum_{i \in A_M} w_i x_i \Big)' (X_r' X_r)^{-1} \Big( \sum_{i \in A_M} w_i x_i \Big) + \sum_{i \in A_M} w_i^2 \right\} \sigma^2. \tag{16}
\]
Thus, inserting (16) into (15) and using (14), we prove (13).
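• A small numerical check (illustrative only; the design matrix, the vector $a$, and the split into respondents and nonrespondents below are assumptions) of the identity used at the end of Section 3: when $w_i = a' x_i$, the double sum equals $\sum_{i \in A_M} w_i^2$, so the covariance term vanishes.

```python
# Check: sum_{i in A_R} sum_{j in A_M} w_i w_j x_i'(X_r'X_r)^{-1} x_j = sum_{i in A_M} w_i^2
# when the weights lie in the column space of X.
import numpy as np

rng = np.random.default_rng(2)
n, r = 50, 30
X = np.column_stack([np.ones(n), rng.uniform(0, 4, n)])   # design with an intercept
a = np.array([2.0, -0.3])
w = X @ a                                                  # w_i = a' x_i

Xr, Xm = X[:r], X[r:]                                      # respondents / nonrespondents
wr, wm = w[:r], w[r:]

lhs = (wr @ Xr) @ np.linalg.inv(Xr.T @ Xr) @ (Xm.T @ wm)   # double sum
rhs = (wm ** 2).sum()                                      # sum of squared weights over A_M
print(np.isclose(lhs, rhs))                                # True (up to rounding)
```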