5. Fractional Hot deck Imputation 1 Introduction • Suppose that we are interested in estimating θ1 = E(Y ) or even θ2 = P r(Y < c) where y ∼ f (y | x) where x is always observed and y is subject to missingness. • Assume MAR in the sense that P r(δ = 1 | x, y) does not depend on y. • We do not want to make strong parametric assumptions on f (y | x). • If x is a categorical variable with support {1, · · · , G}, then f (y | x = g) can be described as i.i.d y | (x = g) ∼ (µg , σg2 ) (1) which is sometimes called cell mean model. • Even when x is not categorical, one can consider an approximation f (y | x) ∼ = G X πg (x)fg (y) g=1 where πg (x) = P (z = g | x) and fg (y) = f (y | z = g). Such approximation is accurate when f (y | x, group) = f (y | group). (2) • Group g is often called imputation cell. • Under MAR, one imputation method is to take a random sample from the set of respondents in the same imputation cell. • Hot deck imputation 1 – Partition the sample into G groups: A = A1 ∪ A2 ∪ · · · ∪ AG . – In group g, we have ng elements and rg respondents. We need to impute y values for the nonrespondents. – For each group Ag , select mg = ng −rg imputed values from rg respondents with replacement (or without replacement). • Example 1 – Ag = ARg ∪ AM g with ARg = {i ∈ Ag ; δi = 1} and AM g = {i ∈ Ag ; δi = 0}. – Imputation mechanism: yj∗ ∼ U nif orm {yi ; i ∈ ARg }. That is, yj∗ = yi with probability 1/rg for i ∈ ARg and j ∈ AM g . – Note that 1 X yi ≡ ȳRg rg i∈S Rg X 1 2 VI (yj∗ ) = (yi − ȳRg )2 ≡ S̃Rg rg i∈S EI (yj∗ ) = Rg where EI (·) is the expectation taken with respect to the imputation mechanism. – Imputed estimator of θ = E(Y ): n 1X {δi yi + (1 − δi ) yi∗ } θ̂I = n i=1 – Variance n o n o V θ̂I = V EI θ̂I + E VI θ̂I ) ( ) ( G G X X 2 = V n−1 ng ȳRg + E n−2 mg S̃Rg . g=1 g=1 • Under the model i.i.d. yi | (i ∈ Sg ) ∼ µg , σg2 , the variance can be written ( ) ( ) G G X X m (m − 1) g g V θ̂I = V n−1 ng µg + E n−2 ng + 2mg + σg2 . r g g=1 g=1 2 Note that V θ̂n = V ( n−1 G X ) ng µg ( +E n−2 G X ) ng σg2 . g=1 g=1 Thus, the variance is increased. • Variance is increased after imputation. Two sources: 1. Reduced sample size. 2. Randomness due to imputation mechanism in stochastic imputation. • Variance estimation after imputation: A Naive approach is treating the imputed values as if observed and applying the standard variance estimation formula to the imputed data. Such naive approach underestimates the true variance ! • Lemma 5.1 If E(yi∗ ) = E(yi ) and V (yi∗ ) = V (yi ), the naive variance estimator V̂I = n−1 SI2 has expectation E{V̂I } = V (θ̂n ) − o 1 n V (θ̂n ) − V (θ̂I ) ∼ = V (θ̂n ). n−1 (3) A proof of Lemma 5.1 is given in Appendix A. • Example 1 (Continued): Note that V (yi∗ ) = E {VI (yi∗ )} + V {EI (yi∗ )} 2 = V (ȳRg ) + E{S̃Rg } = rg−1 σg2 + (1 − rg−1 )σg2 = σg2 = V (yi ). Thus, the assumptions for (3) are satisfied and obtain ( ) ( G ) G n o X X 1 m m (m − 1) n g g g g E V̂I = V n−1 ng µg + E ng − −2 − σg2 . n (n − 1) n n nr g g=1 g=1 Thus, we can write V θ̂I = V θ̂n + E ( G X ) cg σg2 g=1 for some cg . The (approximate) bias-corrected estimator is V̂ = V̂I + G X g=1 3 2 cg SRg . 2 Fractional Hot deck imputation • For each j ∈ AM g , select M imputed values from ARg using SRS (simple random P sampling) without replacement. Let ȳIj = M −1 i∈AR dij yi be the mean of the M imputed values for yj where dij = 1 if yi is selected as one of the imputed values for missing yj . • The resulting fractional hot deck imputation (FHDI) estimator of θ1 = E(Y ) is n θ̂F HDI,1 1X = {δj yj + (1 − δj )ȳIj } n j=1 ( ) n X 1X ∗ = δj yj + (1 − δj ) wij yi n j=1 i∈A (4) R ∗ = dij /M . Also, the FHDI estimator of θ2 = P (Y < 1) is where wij ( ) n M X 1X ∗ θ̂F HDI,2 = δi I(yi < 1) + (1 − δi ) wij I(yij∗ < 1) . n i=1 j=1 Thus, one set of FHDI data can be used to estimate several parameters. • How to estimate the variance of FHDI estimators ? • Kim and Fuller (2004) considered a modified jackknife method where the fractional weights are modified to estimate the variance correctly. • Jackknife method for complete data: For θ̂n = ȳn , the jackknife variance estimator is defined by V̂JK (ȳn ) = n X n−1 k=1 where ȳn(k) = n n X ȳn(k) − ȳn 2 (k) wi yi i=1 and (k) wi = 0 (n − 1)−1 if i = k otherwise. Note that ȳn(k) − ȳn = − 4 yk − ȳn n−1 and so n X n−1 V̂JK (ȳn ) = ȳn(k) − ȳn n k=1 2 n X 1 = (yk − ȳn )2 n(n − 1) k=1 = n−1 Sn2 and it is unbiased for V (ȳn ). Similarly, the jackknife variance estimator of θ̂n,2 = P n−1 ni=1 I(yi < 1) is computed by V̂JK (θ̂n,2 ) = n X n−1 n k=1 where (k) θ̂n,2 = n X (k) θ̂n,2 − θ̂n,2 2 (k) wi I(yi < 1). i=1 Thus, jackknife is easy to implement as we have only to repeat the same esti(k) mation procedure using wi instead of wi . • Now, to estimate the FHDI estimator in (4), first note that the FHDI estimator of θ1 = E(Y ) can be written as θ̂F HDI,1 = n X δi α i y i i=1 where 1 X 1+ dij M j∈A 1 αi = n ! M and dij = 1 if yi is selected as one of the imputed values for missing yj . The variance of θ̂F HDI,1 is then V θ̂F HDI,1 = V ( n−1 G X ) ng µg +E G X X g=1 g=1 αi2 σg2 . (5) i∈ARg • The modified jackknife method of Kim and Fuller (2004) is computed by V̂JK (θ̂F HDI,1 ) = n X n−1 n k=1 5 (k) θ̂F HDI,1 − θ̂F HDI,1 2 , (6) where (k) θ̂F HDI,1 = n X ( (k) wj ) δj yj + (1 − δj ) j=1 and X ∗(k) wij dij yi := i∈AR X (k) αi y i i∈AR ∗ wij − φk w∗ + φk (M − 1)−1 = ij ∗ wij if i = k and dkj = 1 if i 6= k and dkj = dij = 1 otherwise P ∗(k) Here, φk ∈ (0, 1/M ) is a constant to be determined. (Note that j wij = 1.) ∗(k) wij • The expected value of the modified jackknife variance estimator in (6) is given by ( E{V̂JK (θ̂F HDI,1 )} = V n−1 G X ) ng µg +E n G X n−1X X g=1 k=1 n (k) αi − αi 2 σg2 g=1 i∈ARg (7) • To obtain the variance of FHDI estimator correctly, we want to achieve n X X (k) αi − αi 2 = k=1 i∈ARg n X 2 αi . n − 1 i∈A (8) Rg • There are two approaches of computing the suitable set of replicated fractional weights that satisfy (8). One is a bisection method and the other is a closed form solution described in Appendix A. • To describe the bisection method, we use the following steps: (0) 1. Set φk = 1/(2M ). (t) 2. Using the current value of φk , compute Q∗g (φ(t) ) = n X X (k) αi − αi k=1 i∈Ag (t+1) 3. If Q∗g (φ(t) ) < 0, then set φk 2 − n X 2 αi . n − 1 i∈A Rg (t) = φk + (1/2)t+1 (1/M ) for all k ∈ ARg . (t+1) If Q∗g (φ(t) ) > 0, then set φk (t) = φk − (1/2)t+1 (1/M ) for all k ∈ ARg . Continue this update for g = 1, · · · , G. 4. Continue the process until Q∗g (φ(t) ) < for a sufficiently small > 0. 6 . 3 Fully efficient fractional imputation • Under the cell mean model (1), the best imputation is to use the cell mean imputation: θ̂Id = n−1 G X X {δi yi + (1 − δi )ȳRg } , g=1 i∈Ag which is algebraically equivalent to fully efficient fractional imputation (FEFI) estimator θ̂F EF I = n−1 G X X g=1 j∈Ag δj yi + (1 − δj ) X ∗ yi wij i∈ARg , (9) ∗ = 1/rg . The FEFI estimator uses all the respondents in the same cell with wij as the imputed values for each missing value. Note that (9) is equivalent to θ̂F EF I = n −1 G X ng X g=1 rg yi , (10) i∈ARg which is equivalent to using π̂g−1 = ng /rg as the nonresponse adjustment factor that is multiplied to the original weight of the respondents in the cell g. • For the fractional hot deck imputation estimator θ̂F HDI in Section 2, we can obtain V (θ̂F HDI ) = V θ̂F EF I + V θ̂F HDI − θ̂F EF I (11) because we have EI (θ̂F HDI ) = θ̂F EF I . The second term is the variance due to random selection in the fractional hot deck imputation and will be zero if M → ∞. • How to reduce the second term in (11) ? 1. Increase M . 2. Use calibration weighting. 3. Use a balanced sampling mechanism to select donors. • Calibration weighting approach (Fuller and Kim, 2005): 7 1. For each j ∈ AM g , we select M elements from {yi ; i ∈ ARg } at random with equal probability. ∗(0) 2. The initial fractional weight for each donor is wij = 1/M for dij = 1. 3. The fractional weights are modified to satisfy X X X yi + i∈ARg ∗ wij dij yi = ng ȳRg j∈AM g i∈ARg and X ∗ dij = 1. wij i∈ARg One solution is ∗(0) ∗ wij = wij where ȳIj = P i∈ARg ∗(0) + (ȳRg − ȳIg )Tg−1 wij dij (yi − ȳIj ) (12) P P ∗(0) ∗(0) wij yi , Tg = j∈AM g i∈ARg wij dij (yi − ȳIj )2 , and X X 1 ȳIg = ȳIj . yi + ng j∈AM g i∈ARg • Modifying the weights to satisfy certain constraints is a popular problem in statistics. Such weighting is often called calibration weighting. Regression weighting is of the form in (12) and is extensively discussed in survey sampling courses (Stat 521, Stat 621). • Balanced imputation approach (Chauvet et al., 2011): Apply a balanced sampling technique to achieve that X i∈ARg yi + X j∈AM g M −1 X dij yi = ng ȳRg . i∈ARg • Note: Balanced sampling is a set of sampling method that satisfies some constraints: X w i zi = i∈A X zi i∈U where wi is the design weight ( = inverse of the first-order inclusion probability) and zi is the design variable. Stratified sampling is one example of balanced 8 sampling when zi is categorical. For more general case, we may use Cube method (Deville and Tillé, 2004) or rejective method (Fuller, 2009). • For variance estimation, the first term of (11) is easy to estimate because we can easily take into account of the sampling variability of π̂g into estimation. That is, writing π̂ = (π̂1 , · · · , π̂G ), we can express θ̂F EF I = θ̂F EF I (π̂). To estimate the variance of θ̂F EF I , we need to incorporate the sampling variability of π̂ in θ̂F EF I . Either Taylor linearization or replication method can be used. • Linearization method: Let Ug (πg ) = P i∈Ag (δi − πg ) be an estimating function for πg . (i.e. π̂g is obtained by solving Ug (πg ) = 0 for πg .) Now, using ( ) G X ∂ θ̂ F EF I E (π̂g − πg ) θ̂F EF I (π̂) ∼ = θ̂F EF I (π) + ∂π g g=1 and 0 = Ug (π̂g ) ∼ = Ug (πg ) + E ∂Ug ∂πg (π̂g − πg ), we have ! −1 ∂ θ̂ ∂Ug F EF I ∼ E θ̂F EF I (π̂) = θ̂F EF I (π) − E Ug (πg ) ∂πg ∂πg g=1 G G 1X rg 1X X 1 ng − yi + µg = n g=1 i∈A πg n g=1 πg G X Rg = 1 n G X X δi µg + (yi − µg ) πg g=1 i∈A g n 1X ηi := n i=1 where ηi = µg + (δi /πg )(yi − µg ) if i ∈ Ag . Once ηi is calculated, we can apply a standard variance formula to η̂i = µ̂g + (δi /π̂g )(yi − µ̂g ) to obtain the linearized variance estimator. • Replication method (such as jackknife) is straightforward. 9 4 FHDI method using a parametric model • Assume y ∼ f (y | x) where x is always observed and y is subject to missingness. • Now suppose that (2) does not hold and the cell mean model (1) is not satisfied. (That is, we does not create imputation cells.) But, we still want to take the real observations as the imputed values. • Kim and Yang (2014) approach: Three steps 1. Fully efficient fractional imputation (FEFI) by choosing all the respondents as donors. That is, we use M = r imputed values for each missing unit, where r is the number of respondents in the sample. Compute the fractional weights. 2. Use a systematic PPS sampling to select m (<< r) donors from the FEFI using the fractional weights as the size measure. 3. Use a calibration weighting technique to compute the final fractional weights (which lead to the same estimates of FEFI for some items). • Step 1: FEFI step ∗(j) ∗ – Want to find the fractional weights wij when the j-th imputed value yi is taken from the j-th value in the set of the respondents. – Without loss of generality, we assume that the first r elements respond and ∗(j) write yi = yj . – Recall that ∗(j) ∗ wij ∝ f (yi ∗(j) when yi ∗(j) | xi ; θ̂)/h(yi | xi ) are generated from h(y | xi ). ∗(j) – We have only to find h(yi ∗(j) | xi ) when we use yi = yj . – We can treat {yi ; δi = 1} as a realization from f (y | δ = 1), the marginal distribution of y among respondents. 10 – Now, we can write Z f (yj |δj = 1) = f (yj | x, δj = 1) f (x | δj = 1)dx Z f (yj | x) f (x | δj = 1)dx = n 1X ∼ δk f (yj | xk ) . = r k=1 ∗(j) – Thus, the fractional weight for yi ∗ ∝ Pn wij = yj becomes f (yj | xi ; θ̂) k=1 δk f (yj with P j∈AR | xk ; θ̂) (13) ∗ wij = 1, where θ̂ is computed from n X δi S(θ; xi , yi ) = 0 i=1 with S(θ; x, y) = ∂ log f (y | x; θ)/∂θ. • Step 2: Sampling Step – FEFI uses all the elements in AR as donors for each missing i. – Want to reduce the number of donors to, say, m = 10. – For each i, we can treat the FEFI donor set as the weighted population and apply a sampling method to select a smaller set of donors. – Fractional weights (13) for FEFI can be used as the selection probabilities for the PPS sampling. – That is, our goal is to obtain a (systematic) PPS sample Di of size m from ∗ the FEFI donor set of size M = r, using wij as the selection probability P ∗ ∗ assigned to the j-th element in AR . (Note that wij satisfies M j=1 wij = 1 ∗ and wij > 0.) • Step 3: Calibration Step – After we select Di from the complete set of respondents, the selected donors ∗ in Di are assigned with the initial fractional weights wij0 = 1/m. 11 – The fractional weights are further adjusted to satisfy n X {(1 − δi ) i=1 X ∗ wij,c q(xi , yj )} i=1 j∈Di for some q(xi , yj ), and = n X P j∈Di {(1 − δi ) X ∗ wij q(xi , yj )}, (14) j∈AR ∗ ∗ is = 1 for all i with δi = 0, where wij wij,c the fractional weights for FEFI method, as defined in (13). – Regarding the choice of the control function q(x, y) in (14), we can use q(x, y) = (y, y 2 )0 , which will lead to fully efficient estimates for the mean and the variance of y. • For variance estimation, replication method can be used. The imputed values are not changed, only the fractional weights are changed for each replication. (Details skipped) Reference Chauvet, G., Deville J.C. and Haziza, D. (2011). On balanced random imputation in surveys. Biometrika, 98, 459-471. Deville, J.-C. and Tillé, Y. (2004). Efficient balanced sampling: The cube method. Biometrika, 91, 893-912. Fuller, W.A. and Kim, J.K. (2005). Hot deck imputation for the response model, Survey Methodology, 31, 139-149. Kim, J.K. and Fuller, W.A. (2004). Fractional hot deck imputation, Biometrika, 91, 559-578. Kim, J.K. and Yang, S. (2014). Fractional hot deck imputation for robust inference under item nonresponse in survey sampling, Survey Methodology, Accepted for publication. 12 Appendix A: Proof of Lemma 5.1 Write ηi = yi if δi = 1 and ηi = yi∗ if δi = 0. Since, 1 n(n − 1) V̂I = n X ηi2 − i=1 1 n−1 n 1X ηi n i=1 !2 , we have E{V̂I } = = = = = !2 n X 1 1 1 E(ηi2 ) − E ηi n(n − 1) i=1 n − 1 n i=1 !2 n n X X 1 1 1 2 E(ηi ) − E(ηi ) + V (θ̂I ) n(n − 1) i=1 n − 1 n i=1 !2 n n 1X X 1 1 E(yi2 ) − E(yi ) + V (θ̂I ) n(n − 1) i=1 n − 1 n i=1 !2 n n X X 1 1 1 E(yi2 ) − yi + V (θ̂I ) − V (θ̂n ) E n(n − 1) i=1 n−1 n i=1 o 1 n V (θ̂I ) − V (θ̂n ) . V {θ̂n } − n−1 n X Appendix B: A closed form solution to (8) To find a set of φk that satisfies (8), note first that we can express the left side of (8) as X X (k) αi − αi 2 X X + k∈AR i∈ARg (k) αi − αi 2 . k∈AM i∈ARg For k ∈ AM , we have (k) αi = (n − 1)−1 (1 + M −1 di ) (n − 1)−1 {1 + M −1 (di − dik )} if dik = 0 if dik = 1 and X i∈ARg (k) αi − αi 2 X n 1 = αi − αi − dik n−1 (n − 1)M i∈ARg 2 X 1 dik = αi − . (n − 1)2 i∈A M Rg 13 2 Thus, equation (8) reduces to X X (k) αi − αi 2 k∈AR i∈ARg 2 X X 1 dik n X 2 αi − . αi − = n − 1 i∈A (n − 1)2 k∈A i∈A M Rg If we define M (15) Rg 2 X n 1 dkj 2 Qk = α − αk − , n − 1 k (n − 1)2 j∈A M M a sufficient condition to (15) is to find φk such that (k) (αk − αk )2 + 2 2 X (k) X (k) αi − αi + αi − αi = Qk i∈R(k) / i∈R(k) where R(k) = {i; (k) (αk where dk = j∈AM − αk ) P j∈AM (k) (αi where bik = P − αi ) P 2 j∈AM 2 (16) dij dkj > 0, i 6= k}. Note that 2 1 1 1 dk 1 = 0+ − φk dk − − n−1 M n nM 2 2 1 1 dk n−1 = + φk dk − n−1 n nM dkj and, for i ∈ R(k), 2 1 1 φk bik 1 1 di 1 = + + di − − n − 1 (n − 1) M M −1 n nM 2 2 φk 1 bik αi + = n−1 M −1 dij dkj . For i ∈ / R(k), we have (k) (αi 2 − αi ) = 1 n−1 2 αi2 . Thus, (16) reduces to (1 − αk + φk dk )2 + X {αi + φk bik /(M − 1)}2 + X αi2 = (n − 1)2 Qk , i∈R(k) / i∈R(k) which is approximately equal to 2 i∈R(k) bik (M − 1)2 P 2 (1 + φk dk ) + φ2k for sufficiently large n. 14 2 dk dk = 1+ − 2, M M (17)