6. Fractional Imputation in Survey Sampling

1 Introduction

• Consider a finite population of N units identified by a set of indices U = {1, 2, · · · , N} with N known. Associated with each unit i in the population are study variables z_i = (x_i, y_i), where y_i = (y_{i1}, · · · , y_{ip}) is the vector of study variables that are subject to missingness and x_i is the vector of auxiliary variables that are always observed.

• We are interested in estimating η, defined as the (unique) solution to the population estimating equation Σ_{i=1}^N U(η; z_i) = 0. Examples of η include
  1. Population mean: U(η; x, y) = y − η
  2. Population proportion of Y less than q: U(η; x, y) = I(y < q) − η
  3. Population p-th quantile: U(η; x, y) = I(y < η) − p
  4. Population regression coefficient: U(η; x, y) = (y − xη)x′
  5. Domain mean: U(η; x, y) = (y − η)D(x), where D(x) is the domain indicator

• Let A denote the set of indices for the units in a sample selected by a probability sampling mechanism with sample size n. Under complete response, a consistent estimator of η is obtained by solving

    Σ_{i∈A} w_i U(η; z_i) = 0,     (1)

  where w_i = 1/π_i is the inverse of the first-order inclusion probability. Under some regularity conditions, we can establish that the solution η̂_n to (1) converges in probability to η and is asymptotically normally distributed. (A small numerical sketch of solving (1) is given at the end of this section.)

• Things to consider:
  – Complex sampling: unequal probabilities of selection, multi-stage sampling.
  – General-purpose estimation: we do not know which parameter η will be used by the data analyst at the time of imputation.
  – Multivariate missingness with an arbitrary missing pattern.
  – We cannot use a large imputation size M. (We do not want to create a huge data file.)

• Two concepts of missing at random (MAR). (For simplicity, assume p = 1.)
  – Population missing at random (PMAR): MAR holds at the population level,
        f(y | x) = f(y | x, δ).
    That is, Y ⊥ δ | x.
  – Sample missing at random (SMAR): MAR holds at the sample level,
        f(y | x, I = 1) = f(y | x, I = 1, δ).
    That is, Y ⊥ δ | (x, I = 1), where I_i = 1 if unit i ∈ A and I_i = 0 otherwise.

• If the sampling design is such that P(I = 1 | x, y) = P(I = 1 | x), which is often called a noninformative sampling design, then PMAR implies SMAR. For an informative sampling design, PMAR does not necessarily imply SMAR.

• Under PMAR, an imputed value y_i* for missing y_i satisfies

    E{y_i* | x_i, δ_i = 0} = E{y_i | x_i, δ_i = 0},     (2)

  while, under SMAR, it satisfies

    E{y_i* | x_i, I_i = 1, δ_i = 0} = E{y_i | x_i, I_i = 1, δ_i = 0}.     (3)

• Roughly speaking, fractional imputation is based on the PMAR assumption, while multiple imputation (which will be covered next week) is based on the SMAR assumption.
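To make the estimating-equation setup concrete, here is a minimal numerical sketch of solving the design-weighted estimating equation (1) for the population mean and a population quantile. The data, weights, and function names are hypothetical, not from the notes; the quantile case uses simple root bracketing, so this is an illustration rather than production survey software.

```python
import numpy as np
from scipy.optimize import brentq

# Hypothetical sample: study variable y and design weights w_i = 1/pi_i.
rng = np.random.default_rng(0)
y = rng.normal(50.0, 10.0, size=200)
w = rng.uniform(1.0, 5.0, size=200)      # unequal inclusion probabilities

def mean_ee(y, w):
    # U(eta; y) = y - eta:  sum_i w_i (y_i - eta) = 0 has a closed-form root.
    return np.sum(w * y) / np.sum(w)

def quantile_ee(y, w, p):
    # U(eta; y) = I(y < eta) - p: the weighted EE is a nondecreasing step
    # function of eta, so a bracketing root search locates the sign change.
    ee = lambda eta: np.sum(w * ((y < eta) - p))
    return brentq(ee, y.min() - 1.0, y.max() + 1.0)

print(mean_ee(y, w))            # design-weighted (Hajek-type) mean
print(quantile_ee(y, w, 0.5))   # design-weighted median
```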
2 Parametric fractional imputation

• We assume that the finite population at hand is a realization from an infinite population, called the superpopulation. In the superpopulation model, we often postulate a parametric distribution f(y | x; θ), which is known up to the parameter θ with parameter space Ω. The parametric model has the joint density

    f(y | x; θ) = f_1(y_1 | x; θ_1) f_2(y_2 | x, y_1; θ_2) · · · f_p(y_p | x, y_1, · · · , y_{p−1}; θ_p),     (4)

  where θ_k is the parameter in the conditional distribution of y_k given x and y_1, · · · , y_{k−1}.

• For each y_i = (y_{1i}, · · · , y_{pi}), we have δ_i = (δ_{1i}, · · · , δ_{pi}), where δ_{ki} = 1 if y_{ki} is observed and δ_{ki} = 0 otherwise.

• For example, for p = 3, there are 8 = 2³ possible missing patterns: A = A_{111} ∪ A_{110} ∪ · · · ∪ A_{000}, where, for example, A_{100} is the set of sample indices with δ_{1i} = 1, δ_{2i} = 0, and δ_{3i} = 0. For i ∈ A_{100}, we need to create imputed values for y_{2i} and y_{3i} from f(y_{2i}, y_{3i} | x_i, y_{1i}).

• Without loss of generality, we can express y_i = (y_{obs,i}, y_{mis,i}), where y_{obs,i} and y_{mis,i} are the observed and the missing parts of y_i, respectively.

• Three steps for PFI under complex sampling:
  1. Compute the pseudo maximum likelihood estimator of θ using EM by the PFI method with a sufficiently large imputation size M (say, M = 1,000).
  2. Select m (say, m = 10) imputed values from the set of M imputed values.
  3. Construct the final fractional weights for the m imputed values.

• The first step is called the Fully Efficient Fractional Imputation (FEFI) step, the second step is called the Sampling Step, and the third step is called the Weighting Step.

• Step 1 (FEFI step): The pseudo maximum likelihood estimator of θ is computed by the following EM algorithm (a small sketch of this computation is given after this step).

  1. [I-step]: Set t = 0. Obtain M imputed values of y_{mis,i} generated from a proposal distribution h(y_{mis,i} | x_i, y_{obs,i}). One simple choice of h(·) is

         h(y_{mis,i} | x_i, y_{obs,i}) = f(y_{mis,i} | x_i, y_{obs,i}; θ̂^{(0)}),     (5)

     where θ̂^{(0)} is the initial estimator of θ obtained from the available respondents. Generating samples from (5) may require an MCMC method or the SIR (Sampling Importance Resampling) method; see Appendix B for an illustration of the SIR method. Let w*_{ij(0)} = 1/M be the initial fractional weight for y^{*(j)}_{mis,i}.

  2. [M-step]: Update the parameter estimate θ̂^{(t+1)} by solving the imputed score equation

         Σ_{i∈A} w_i Σ_{j=1}^M w*_{ij(t)} S(θ; x_i, y*_{ij}) = 0,

     where y*_{ij} = (y_{obs,i}, y^{*(j)}_{mis,i}) and S(θ; x, y) = ∂ log f(y | x; θ)/∂θ is the score function of θ.

  3. [W-step]: Set t = t + 1. Using the current value of the parameter estimates θ̂^{(t)}, compute the fractional weights

         w*_{ij(t)} ∝ f(y_{obs,i}, y^{*(j)}_{mis,i} | x_i; θ̂^{(t)}) / h(y^{*(j)}_{mis,i} | x_i, y_{obs,i}),

     with Σ_{j=1}^M w*_{ij(t)} = 1. (For t = 0, we set w*_{ij(t)} = 1/M.)

  4. Check whether w*_{ij(t)} > 1/m for some j = 1, · · · , M. If yes, update the proposal distribution (5) with θ̂^{(0)} replaced by θ̂^{(t)} and go to the [I-step]; if no, go to the [M-step]. Stop if θ̂^{(t)} meets the convergence criterion.

  The [I-step] is the imputation step, the [W-step] is the weighting step, and the [M-step] is the maximization step. Note that the imputed values are not changed in the EM iteration; only the fractional weights are updated.
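The following is a minimal, self-contained sketch of the FEFI ("EM by weighting") iteration for a single continuous item under an assumed normal regression model y | x ~ N(β_0 + β_1 x, σ²), with simple random sampling and the proposal fixed at the initial respondent fit. All data and names are illustrative, not from the notes, and the re-imputation check in step 4 above is omitted for brevity.

```python
import numpy as np

# EM by weighting (Step 1, FEFI) for one continuous item under a working
# model y | x ~ N(b0 + b1*x, s2).  Hypothetical data; SRS, so w_i = 1.
rng = np.random.default_rng(1)
n, M = 300, 1000
x = rng.uniform(0.0, 2.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n)
delta = rng.uniform(size=n) < 0.7           # response indicator (MAR given x)
w = np.ones(n)                              # design weights

def weighted_mle(xv, yv, wv):
    # Weighted normal MLE: solves the weighted score equation for (b0, b1, s2).
    X = np.column_stack([np.ones_like(xv), xv])
    beta = np.linalg.solve(X.T @ (wv[:, None] * X), X.T @ (wv * yv))
    s2 = np.sum(wv * (yv - X @ beta) ** 2) / np.sum(wv)
    return beta[0], beta[1], s2

def normal_pdf(u, mean, var):
    return np.exp(-0.5 * (u - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

# Initial fit from respondents; this also defines the fixed proposal h in (5)
b0, b1, s2 = weighted_mle(x[delta], y[delta], w[delta])
h_mean, h_var = (b0 + b1 * x[~delta])[:, None], s2

# I-step: M imputed values per nonrespondent drawn from the proposal h
yimp = h_mean + np.sqrt(h_var) * rng.normal(size=(int(np.sum(~delta)), M))
fw = np.full(yimp.shape, 1.0 / M)           # initial fractional weights

for t in range(50):                         # fixed iteration count for brevity
    # M-step: pool respondents with the fractionally weighted imputed values
    xs = np.concatenate([x[delta], np.repeat(x[~delta], M)])
    ys = np.concatenate([y[delta], yimp.ravel()])
    ws = np.concatenate([w[delta], (w[~delta][:, None] * fw).ravel()])
    b0, b1, s2 = weighted_mle(xs, ys, ws)
    # W-step: fractional weights proportional to f(y* | x; theta_t) / h(y* | x)
    fw = normal_pdf(yimp, b0 + b1 * x[~delta][:, None], s2) / normal_pdf(yimp, h_mean, h_var)
    fw /= fw.sum(axis=1, keepdims=True)

print(b0, b1, s2)                           # pseudo-MLE of theta after EM
```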
• Step 2 (Sampling Step): For each i, we have M possible imputed values z*^{(j)}_i = (x_i, y_{obs,i}, y^{*(j)}_{mis,i}) with fractional weights w*_{ij}, where w*_{ij} is computed from the EM algorithm after convergence. For each i, we treat z*_i = {z*^{(j)}_i ; j = 1, 2, · · · , M} as a weighted finite population (with weights w*_{ij}) and use an unequal probability sampling method to select a sample of size m from z*_i using w*_{ij} as the selection probability. (We can use PPS sampling or systematic PPS sampling to obtain an imputed data set of size m.) Let z̃*^{(1)}_i, · · · , z̃*^{(m)}_i be the m elements sampled from z*_i by the PPS sampling. That is,

    Pr(z̃*^{(k)}_i = z*^{(j)}_i) = w*_{ij},  ∀ j = 1, · · · , M; k = 1, · · · , m.

The fractional weights for the final m imputed values are initially given by w̃*_{ij0} = 1/m.

• Step 3 (Weighting Step): Modify the initial fractional weights w̃*_{ij0} = 1/m to satisfy the calibration constraint

    Σ_{i∈A} w_i Σ_{j=1}^m w̃*_{ij} S(θ̂; x_i, ỹ*_{ij}) = 0     (6)

with Σ_{j=1}^m w̃*_{ij} = 1, where θ̂ is the pseudo MLE of θ computed from the FEFI step and ỹ*_{ij} = (y_{obs,i}, ỹ^{*(j)}_{mis,i}). That is, ỹ*_{ij} is the j-th imputed version of y_i selected by the PPS sampling in Step 2. A solution to this calibration problem is the regression-weighting form

    w̃*_{ij} = w̃*_{ij0} − w̃*_{ij0} (Σ_{i∈A} w_i S̄*_{i0})′ T̂^{−1} (S*_{ij} − S̄*_{i0}),

where

    S*_{ij} = S(θ̂; x_i, ỹ*_{ij}),
    S̄*_{i0} = Σ_{j=1}^m w̃*_{ij0} S*_{ij},
    T̂ = Σ_{i∈A} w_i Σ_{j=1}^m w̃*_{ij0} (S*_{ij} − S̄*_{i0})(S*_{ij} − S̄*_{i0})′.

• Once the final fractional weights are computed, the PFI estimator of η is obtained by solving

    Σ_{i∈A} w_i Σ_{j=1}^m w̃*_{ij} U(η; x_i, ỹ*_{ij}) = 0.     (7)

Note that this fractionally imputed estimating equation is an approximation to the expected estimating equation

    Σ_{i∈A} w_i E{U(η; x_i, y_{obs,i}, Y_{mis,i}) | x_i, y_{obs,i}; θ̂} = 0.

• For variance estimation, we can use replication methods (such as the jackknife or the bootstrap). Details are given in Appendix C.

3 Nonparametric approach: fractional hot deck imputation for multivariate continuous variables

• We do not want to make parametric model assumptions about f(y_1, · · · , y_p | x). However, some assumption on the joint distribution of (y_1, · · · , y_p) is needed in order to preserve the correlation structure between the items.

• The problem is easy if the data are categorical.

• Example (SRS of size n = 10; M denotes a missing value):

    ID   Weight   x   y_1       y_2
     1   0.10     1   y_{1,1}   y_{2,1}
     2   0.10     1   y_{1,2}   M
     3   0.10     1   M         y_{2,3}
     4   0.10     1   y_{1,4}   y_{2,4}
     5   0.10     1   y_{1,5}   y_{2,5}
     6   0.10     2   y_{1,6}   y_{2,6}
     7   0.10     2   M         y_{2,7}
     8   0.10     2   M         M
     9   0.10     2   y_{1,9}   y_{2,9}
    10   0.10     2   y_{1,10}  y_{2,10}

• Fractional imputation idea: if both y_1 and y_2 are categorical, then fractional imputation is easy to apply.
  – We have only a finite number of possible values.
  – Imputed values = possible values.
  – The fractional weights are the conditional probabilities of the possible values given the observations.
  – We can use the "EM by weighting" method of Ibrahim (1990) to compute the fractional weights.

• Example (y_1, y_2 dichotomous, taking values 0 or 1):

    ID   Weight          x   y_1       y_2
     1   0.10            1   y_{1,1}   y_{2,1}
     2   0.10 w*_{2,1}   1   y_{1,2}   0
     2   0.10 w*_{2,2}   1   y_{1,2}   1
     3   0.10 w*_{3,1}   1   0         y_{2,3}
     3   0.10 w*_{3,2}   1   1         y_{2,3}
     4   0.10            1   y_{1,4}   y_{2,4}
     5   0.10            1   y_{1,5}   y_{2,5}
     6   0.10            2   y_{1,6}   y_{2,6}
     7   0.10 w*_{7,1}   2   0         y_{2,7}
     7   0.10 w*_{7,2}   2   1         y_{2,7}
     8   0.10 w*_{8,1}   2   0         0
     8   0.10 w*_{8,2}   2   0         1
     8   0.10 w*_{8,3}   2   1         0
     8   0.10 w*_{8,4}   2   1         1
     9   0.10            2   y_{1,9}   y_{2,9}
    10   0.10            2   y_{1,10}  y_{2,10}

• The fractional weights are the conditional probabilities of the imputed values given the observations. For example,

    w*_{2,1} = P̂(y_2 = 0 | x = x_2, y_1 = y_{1,2}),
    w*_{3,1} = P̂(y_1 = 0 | x = x_3, y_2 = y_{2,3}),
    w*_{7,1} = P̂(y_1 = 0 | x = x_7, y_2 = y_{2,7}),

and

    w*_{8,1} = P̂(y_1 = 0, y_2 = 0 | x = x_8).

The conditional probabilities are computed from the joint probabilities.

• M-step: Update the joint probability π_{bc|a} = P(y_1 = b, y_2 = c | x = a) by

    π̂_{bc|a} = [Σ_{i=1}^n Σ_{j=1}^{M_i} w_i w*_{ij} I(x_i = a, y^{*(j)}_{1i} = b, y^{*(j)}_{2i} = c)] / [Σ_{i=1}^n w_i I(x_i = a)].

  A small sketch of this EM computation is given below.
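As a concrete illustration of "EM by weighting" for categorical data, here is a minimal sketch for two dichotomous items with a binary covariate. The data-generating mechanism and all names are hypothetical, and the loop runs a fixed number of iterations instead of testing convergence.

```python
import numpy as np
from itertools import product

# EM by weighting (Ibrahim 1990) for two dichotomous items y1, y2 with
# arbitrary missingness, given a discrete covariate x.  Hypothetical data.
rng = np.random.default_rng(2)
n = 500
x = rng.integers(1, 3, n)                       # x in {1, 2}
y1 = (rng.uniform(size=n) < 0.4 + 0.1 * x).astype(float)
y2 = (rng.uniform(size=n) < 0.3 + 0.3 * y1).astype(float)
y1[rng.uniform(size=n) < 0.2] = np.nan          # impose missingness
y2[rng.uniform(size=n) < 0.2] = np.nan
w = np.ones(n)                                  # design weights

# pi[a-1, b, c] = P(y1 = b, y2 = c | x = a), initialized uniformly
pi = np.full((2, 2, 2), 0.25)

for _ in range(100):                            # EM iterations
    counts = np.zeros_like(pi)
    for i in range(n):
        # enumerate the completions consistent with the observed part
        b_set = [int(y1[i])] if not np.isnan(y1[i]) else [0, 1]
        c_set = [int(y2[i])] if not np.isnan(y2[i]) else [0, 1]
        cells = list(product(b_set, c_set))
        # fractional weights = conditional probabilities of the completions
        fw = np.array([pi[x[i] - 1, b, c] for b, c in cells])
        fw /= fw.sum()
        for (b, c), f in zip(cells, fw):
            counts[x[i] - 1, b, c] += w[i] * f  # weighted fractional count
    # M-step: re-normalizing within each x-category reproduces the update for
    # pi_hat_{bc|a}, since each unit contributes total weight w_i to its category
    pi = counts / counts.sum(axis=(1, 2), keepdims=True)

print(pi)   # estimated joint cell probabilities P(y1, y2 | x)
```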
• For continuous y, let us consider an approximation using a categorical transformation. For simplicity, let Y = (Y_1, Y_2, Y_3) be the study variables that are subject to missingness.

  1. Preliminary Step: For each item k, create a transformation of Y_k into Ỹ_k, a discrete version of Y_k. The value of Ỹ_k will serve the role of imputation cell for Y_k. If Y_k is missing, then Ỹ_k is also missing. Let M_k be the number of cells for item Y_k. The maximum number of cells for p = 3 is then G = M_1 × M_2 × M_3.

  2. FEFI Step: Two-stage imputation is used (see the sketch at the end of this section). For each i in the sample, ỹ_i is decomposed into ỹ_i = (ỹ_{obs,i}, ỹ_{mis,i}). In the stage 1 imputation, we impute the imputation cells. In the stage 2 imputation, we impute the missing observations within the imputation cells. To perform the two-stage imputation, we first compute the estimated joint probabilities π̃_{ijk} = Pr(ỹ_1 = i, ỹ_2 = j, ỹ_3 = k) using the EM algorithm (or other estimation methods).

     (a) Stage 1 imputation: For each i, identify all possible values of ỹ_{mis,i}. Let G_i be the number of possible values of ỹ_{mis,i}. In the stage 1 FEFI method, we create G_i imputed values of ỹ_{mis,i}, where the fractional weight corresponding to ỹ*_{mis,i(g)} is

         w*_{ig(1)} = π̃(ỹ_{obs,i}, ỹ*_{mis,i(g)}) / Σ_g π̃(ỹ_{obs,i}, ỹ*_{mis,i(g)}),     (8)

     where ỹ*_{mis,i(g)} is the g-th realization of ỹ_{mis,i}, the missing part of Ỹ for unit i.

     (b) Stage 2 imputation: For each g-th imputed cell from the stage 1 imputation, we identify the donor set from the respondents of Y_{mis(i)} whose cells match, and the donors' observed values are used to impute within the cell. For example, suppose that we observe y_{1i} and y_{3i} but y_{2i} is not observed. In this case, the donor set for unit i and cell g is

         D_i(g) = {j ∈ A; δ_{2j} = 1, ỹ_{1j} = ỹ_{1i}, ỹ_{2j} = ỹ*_{2i(g)}, ỹ_{3j} = ỹ_{3i}},

     the set of y_2-respondents whose discretized values match the observed cells of unit i and the g-th imputed cell. The within-cell fractional weight for donor j is then

         w*_{ij(2)} = w_j / Σ_{k∈D_i(g)} w_k.

     The final fractional weight for donor j of unit i is

         w*_{ij} = w*_{ig(1)} w*_{ij(2)}.     (9)

     Note that Σ_j w*_{ij} = 1. Note also that, for the fractional weight w*_{ij}, the imputed value for y_i = (y_{obs,i}, y_{mis,i}) is y*_{ij} = (y_{obs,i}, y_{mis(i),j}), where y_{mis(i),j} is the value of the variable y_{mis(i)} for donor j.

  3. Sampling Step: From the two-stage FEFI data, we use PPS sampling (with rejective sampling) and calibration weighting to obtain an approximation. That is, for each i, from the set {(w*_{ij}, y*_{ij}); j ∈ A}, we perform a systematic PPS sampling of size m using w*_{ij} in (9) as the size measure. Let y**_{i1}, · · · , y**_{im} be the m imputed values from the PPS selection. The initial fractional weight assigned to y**_{ij} is given by w̃*_{ij0} = 1/m.

  4. Weighting Step: The fractional weights are further adjusted to match the marginal probabilities π̃_{i++}, π̃_{+j+}, and π̃_{++k}. That is, we may impose

         Σ_{i∈A} w_i {δ_i I(ỹ_i = g) + (1 − δ_i) Σ_{j=1}^m w̃*_{ij} I(ỹ**_{ij} = g)} = π̂_g,   g = 1, · · · , G,

     where δ_i = 1 if unit i responds to all items. In this case, a raking ratio estimation method can be used to achieve these calibration constraints.

For variance estimation, we can use the replication method, repeating [Step 2]–[Step 4] to obtain the replicated fractional weights.
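The following is a minimal sketch of the two-stage FEFI computation, using p = 2 continuous items to keep the code short; all data and names are hypothetical. For simplicity, the joint cell probabilities are estimated from complete cases rather than by the EM algorithm, and the sampling and weighting steps are omitted.

```python
import numpy as np

# Two-stage FEFI sketch for fractional hot deck imputation with p = 2
# continuous items; hypothetical data under SRS.
rng = np.random.default_rng(3)
n = 400
y1 = rng.normal(0.0, 1.0, n)
y2 = 0.8 * y1 + rng.normal(0.0, 0.6, n)
d1 = rng.uniform(size=n) < 0.8              # response indicators
d2 = rng.uniform(size=n) < 0.8
w = np.ones(n)                              # design weights

# Preliminary step: discretize each item into M_k = 3 cells at the terciles
def discretize(y, d):
    qs = np.quantile(y[d], [1 / 3, 2 / 3])
    return np.digitize(y, qs)               # cell labels 0, 1, 2

c1, c2 = discretize(y1, d1), discretize(y2, d2)

# Joint cell probabilities from complete cases (the EM algorithm would also
# use the partially classified cases)
both = d1 & d2
pi = np.zeros((3, 3))
for a, b, wi in zip(c1[both], c2[both], w[both]):
    pi[a, b] += wi
pi /= pi.sum()

# FEFI for one unit with y1 observed and y2 missing
i = np.where(d1 & ~d2)[0][0]
# Stage 1: fractional weights over the G_i = 3 possible cells of y2, eq. (8)
wg1 = pi[c1[i], :] / pi[c1[i], :].sum()
# Stage 2: donors are y2-respondents whose cells match (c1[i], g)
for g in range(3):
    donors = np.where(d1 & d2 & (c1 == c1[i]) & (c2 == g))[0]
    if donors.size == 0:
        continue                            # no donors here; a real implementation must handle this
    wij2 = w[donors] / w[donors].sum()      # within-cell fractional weights
    # final fractional weights w*_ij = w*_ig(1) * w*_ij(2), eq. (9)
    for j, f in zip(donors, wij2):
        print(f"donor {j}: y2* = {y2[j]:.3f}, fractional weight = {wg1[g] * f:.4f}")
```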
Appendix

A. Lemma 6.1

Lemma 6.1: If either (2) or (3) holds, the imputed estimator of Y = Σ_{i=1}^N y_i of the form

    Ŷ_I = Σ_{i∈A} w_i {δ_i y_i + (1 − δ_i) y_i*}

is unbiased for Y in the sense that E(Ŷ_I − Y) = 0.

Proof. We first introduce an extended definition of δ_i, where δ_i = 1 if unit i responds when sampled and δ_i = 0 otherwise. With this extended definition, δ_i is defined throughout the finite population. Fay (1992), Shao and Steel (1999), and Kim and Rao (2009) also used this extended definition.

We first show that (3) implies unbiasedness. Let Ŷ_n = Σ_{i∈A} w_i y_i be an unbiased estimator of Y. Note that

    Ŷ_I − Ŷ_n = Σ_{i=1}^N I_i w_i (1 − δ_i)(y_i* − y_i).     (10)

Thus, writing I = (I_1, I_2, · · · , I_N) and δ = (δ_1, · · · , δ_N),

    E{Ŷ_I − Ŷ_n | I, δ} = Σ_{i=1}^N I_i w_i (1 − δ_i) E{y_i* − y_i | I_i = 1, δ_i = 0} = 0.     (11)

Since we can write

    E{Ŷ_I − Y} = E{Ŷ_I − Ŷ_n} + E{Ŷ_n − Y},     (12)

the first term is zero by (11) and the second term is zero by the design-unbiasedness of Ŷ_n.

Finally, we show that condition (2) also implies unbiasedness. From (10), taking the expectation with respect to the sampling design, we have

    E{Ŷ_I − Ŷ_n | δ, Y} = Σ_{i=1}^N (1 − δ_i)(y_i* − y_i),

and so

    E{Ŷ_I − Ŷ_n | δ} = Σ_{i=1}^N (1 − δ_i) E(y_i* − y_i | δ_i = 0) = 0.

Therefore, the unbiasedness of the imputed estimator also follows, as the first term of (12) is zero by (2). ∎

B. SIR algorithm in the I-step

To discuss the I-step in Step 1, we give an illustration for p = 2; the extension to the p > 2 case is straightforward. Note that the joint density can be written as

    f(y_1, y_2 | x) = f_1(y_1 | x; θ_1) f_2(y_2 | x, y_1; θ_2).

The sample is partitioned into four sets, A_{11}, A_{10}, A_{01}, A_{00}, according to the missing patterns. We first obtain the initial parameter estimate θ̂^{(0)} = (θ̂_1^{(0)}, θ̂_2^{(0)}) using the available respondents. That is, we use the observations in A_{11} ∪ A_{10} to estimate θ_1 and the observations in A_{11} to estimate θ_2.

Now, we want to generate M imputed values from (5). In the case of p = 2, the proposal distribution can be written as

    h(y_{mis,i} | x_i, y_{obs,i}) = { f_2(y_{2i} | x_i, y_{1i}; θ̂_2^{(0)})   if i ∈ A_{10},
                                      f(y_{1i} | x_i, y_{2i}; θ̂^{(0)})        if i ∈ A_{01},
                                      f(y_{1i}, y_{2i} | x_i; θ̂^{(0)})        if i ∈ A_{00},

where

    f(y_{1i} | x_i, y_{2i}; θ̂^{(0)}) = f_1(y_{1i} | x_i; θ̂_1^{(0)}) f_2(y_{2i} | x_i, y_{1i}; θ̂_2^{(0)}) / ∫ f_1(y_{1i} | x_i; θ̂_1^{(0)}) f_2(y_{2i} | x_i, y_{1i}; θ̂_2^{(0)}) dy_{1i}.     (13)

Except for some special cases, such as normal f_1 and normal f_2, the conditional distribution in (13) is not of known form. Thus, some computational tool (such as the Metropolis–Hastings algorithm) is needed to generate samples from (13) for i ∈ A_{01}. We introduce the SIR (Sampling Importance Resampling) algorithm as an alternative computational tool for generating imputed values from (13). The SIR algorithm consists of the following steps:

1. Generate B (say, B = 100) samples y*_{1i} from f_1(y_{1i} | x_i; θ̂_1^{(0)}).
2. Select a PPS sample of size one from the B elements y*_{1i} with size measure f_2(y_{2i} | x_i, y*_{1i}; θ̂_2^{(0)}).
3. Repeat Step 1 and Step 2 independently M times to obtain M imputed values.

Once we obtain the M imputed values of y_{1i}, we can use

    ĥ(y_{mis,i} | x_i, y_{obs,i}) ∝ f_1(y_{1i} | x_i; θ̂_1^{(0)}) f_2(y_{2i} | x_i, y_{1i}; θ̂_2^{(0)})

as an estimator of the proposal density in (5). Since Σ_{j=1}^M w*_{ij} = 1, we do not need to compute the normalizing constant of the conditional density in (13).

C. Replication variance estimation

For variance estimation, we use a replication variance method. Let Ŷ_n = Σ_{i∈A} w_i y_i be the complete-sample estimator of Y under complete response, and let

    V̂_rep = Σ_{k=1}^n c_k (Ŷ_n^{(k)} − Ŷ_n)²

be a replication variance estimator with Ŷ_n^{(k)} = Σ_{i∈A} w_i^{(k)} y_i. (A small numerical sketch is given at the end of this appendix.)

To discuss variance estimation for the PFI method presented in Section 2, recall that the PFI method consists of three steps: (1) FEFI step, (2) Sampling Step, and (3) Weighting Step. We mimic these procedures for each replication but want to avoid regenerating the imputed values. The proposed variance estimation employs the same steps but uses the replication weights w_i^{(k)} instead of the original weights w_i.

1. FEFI step: Compute the replicate θ̂^{(k)} of θ̂ by applying the same EM algorithm with w_i replaced by w_i^{(k)}.

2. Sampling Step: We use the same imputed data for each replication. The replicates of the fractional weights for the final m imputed values are given by

    w̃*^{(k)}_{ij0} ∝ f(y_{obs,i}, ỹ^{*(j)}_{mis,i} | x_i; θ̂^{(k)}) / f(y_{obs,i}, ỹ^{*(j)}_{mis,i} | x_i; θ̂),     (14)

with Σ_{j=1}^m w̃*^{(k)}_{ij0} = 1.

3. Weighting Step: Modify the initial fractional weights w̃*^{(k)}_{ij0} in (14) to satisfy the calibration constraint

    Σ_{i∈A} w_i^{(k)} Σ_{j=1}^m w̃*^{(k)}_{ij} S(θ̂^{(k)}; x_i, ỹ*_{ij}) = 0

with Σ_{j=1}^m w̃*^{(k)}_{ij} = 1, where θ̂^{(k)} is the pseudo MLE of θ computed from the FEFI step using the replication weights.

Once the final replicated fractional weights are computed, the variance estimator of η̂_PFI obtained from (7) is

    V̂_PFI = Σ_{k=1}^n c_k (η̂_PFI^{(k)} − η̂_PFI)²,

where η̂_PFI^{(k)} is computed from

    Σ_{i∈A} w_i^{(k)} Σ_{j=1}^m w̃*^{(k)}_{ij} U(η; x_i, ỹ*_{ij}) = 0.
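As a concrete illustration of the replication idea, here is a minimal sketch of the delete-1 jackknife for the complete-sample weighted mean under SRS, with c_k = (n − 1)/n; the data are hypothetical. For the PFI estimator, Ŷ_n^{(k)} would be replaced by η̂_PFI^{(k)} computed from the replicated fractional weights described above.

```python
import numpy as np

# Delete-1 jackknife for the complete-sample weighted mean (hypothetical SRS,
# c_k = (n - 1)/n).  Replicate weights: drop unit k and rescale the rest.
rng = np.random.default_rng(5)
n, N = 100, 1000
y = rng.normal(10.0, 2.0, n)
w = np.full(n, N / n)                    # w_i = 1/pi_i = N/n under SRS

theta = np.sum(w * y) / np.sum(w)        # complete-sample (Hajek) mean
reps = np.empty(n)
for k in range(n):
    wk = w.copy()
    wk[k] = 0.0                          # delete unit k ...
    wk *= w.sum() / wk.sum()             # ... and rescale the remaining weights
    reps[k] = np.sum(wk * y) / np.sum(wk)
v_rep = (n - 1) / n * np.sum((reps - theta) ** 2)
print(theta, v_rep, np.var(y, ddof=1) / n)   # v_rep equals s^2/n (no fpc) for SRS
```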