Fractional Hot Deck Imputation for Robust Inference Jae-Kwang Kim June 26, 2013

Fractional Hot Deck Imputation for Robust Inference
Under Item Nonresponse in Survey Sampling
Jae-Kwang Kim (joint work with Shu Yang)
Iowa State University
June 26, 2013
1. Introduction
2. Review
3. Fractional Hot Deck Imputation
4. Simulation Study
5. Conclusion
Introduction
Basic Setup
U = {1, 2, · · · , N}: index set of finite population
(xi , yi ): study variables in unit i in the population.
η: parameter of interest, defined as the solution to
$$\sum_{i=1}^{N} U(\eta; x_i, y_i) = 0.$$
Examples:
1. Population mean: $U(\eta; x, y) = y - \eta$
2. Population proportion of Y less than q: $U(\eta; x, y) = I(y < q) - \eta$
3. Population p-th quantile: $U(\eta; x, y) = I(y < \eta) - p$
4. Population regression coefficient: $U(\eta; x, y) = (y - x\eta)x'$
5. Domain mean: $U(\eta; x, y) = (y - \eta)D(x)$
Basic Setup (Cont’d)
A: index set of the sample (A ⊂ U) obtained from a probability
sampling design, with πi being the first-order inclusion probability of
unit i.
From the sample, we collect measurements of $(x_i, y_i)$.
Under complete response, a consistent estimator of η can be obtained
by solving
$$\sum_{i \in A} w_i U(\eta; x_i, y_i) = 0 \qquad (1)$$
for η, where $w_i = \pi_i^{-1}$.
Under some regularity conditions, the solution to (1) is consistent and
asymptotically normally distributed (Binder and Patak, 1994).
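As an illustration of equation (1), the following minimal sketch solves the weighted estimating equation for three of the example parameters. The function names and the toy data are assumptions made for this example; the quantile solver uses the convention I(y ≤ η) because the estimating function is a step function and an exact root need not exist.

```python
def weighted_mean(y, w):
    """Solve sum_i w_i (y_i - eta) = 0 for eta (closed form)."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def weighted_proportion_below(y, w, q):
    """Solve sum_i w_i {I(y_i < q) - eta} = 0 for eta (closed form)."""
    return sum(wi for wi, yi in zip(w, y) if yi < q) / sum(w)


def weighted_quantile(y, w, p):
    """Smallest observed y at which the weighted share of values <= y reaches p.

    Assumption: uses I(y_i <= eta) rather than I(y_i < eta), a common convention
    for handling the step-function estimating equation.
    """
    total = sum(w)
    cum = 0.0
    for yi, wi in sorted(zip(y, w)):
        cum += wi
        if cum >= p * total:
            return yi
    return max(y)


# Toy sample (hypothetical): four units with inclusion probabilities pi_i.
y = [2.0, 4.0, 6.0, 8.0]
pi = [0.5, 0.5, 0.25, 0.25]
w = [1.0 / p for p in pi]  # design weights w_i = 1 / pi_i

eta_mean = weighted_mean(y, w)
eta_prop = weighted_proportion_below(y, w, 5.0)
eta_median = weighted_quantile(y, w, 0.5)
```

With these weights the two units with the smaller inclusion probability count twice as much as the others, which is what pulls the weighted mean above the unweighted mean of 5.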
Basic Setup (Cont’d)
Assume that xi are always observed and yi are subject to
non-response.
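Define
$$\delta_i = \begin{cases} 1 & \text{if } y_i \text{ is observed} \\ 0 & \text{otherwise.} \end{cases}$$
A consistent estimator of η is then obtained by taking the conditional
expectation and solving $\bar{U}(\eta) = 0$ for η, where
$$\bar{U}(\eta) = \sum_{i \in A} w_i \left[ \delta_i U(\eta; x_i, y_i) + (1 - \delta_i) E\{U(\eta; x_i, Y) \mid x_i, \delta_i = 0\} \right]. \qquad (2)$$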
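As a worked special case of (2), take the population mean, $U(\eta; x, y) = y - \eta$. Because U is linear in y, the conditional expectation passes through and the solution has a closed form:

```latex
\bar{U}(\eta)
  = \sum_{i \in A} w_i \left[ \delta_i (y_i - \eta)
    + (1 - \delta_i) \{ E(Y \mid x_i, \delta_i = 0) - \eta \} \right] = 0
\quad \Longrightarrow \quad
\hat{\eta}
  = \frac{\sum_{i \in A} w_i \left\{ \delta_i y_i
    + (1 - \delta_i) E(Y \mid x_i, \delta_i = 0) \right\}}{\sum_{i \in A} w_i}.
```

That is, each missing $y_i$ is replaced by its conditional mean. For nonlinear U (for example, a proportion or a quantile) the whole function U, not just y, has to be averaged over the conditional distribution.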
How to compute the conditional expectation in (2)?
1. Often, start by assuming missing at random (MAR). That is,
$$f(y \mid x, \delta) = f(y \mid x).$$
2. Build a (parametric) model for $f(y \mid x)$. That is,
$$f(y \mid x) = f(y \mid x; \theta)$$
for some θ.
3. Obtain a consistent estimator $\hat\theta$ of θ from the set of respondents. That is, solve
$$\sum_{i \in A} w_i \delta_i S(\theta; x_i, y_i) = 0$$
for θ, where $S(\theta; x, y)$ is the score function of θ.
4. Compute the conditional expectation by a Monte Carlo approximation using samples from $f(y \mid x; \hat\theta)$:
$$E\{U(\eta; x_i, Y) \mid x_i\} \cong \frac{1}{M} \sum_{j=1}^{M} U(\eta; x_i, y_i^{*(j)}), \quad \text{where } y_i^{*(j)} \sim f(y \mid x_i; \hat\theta).$$
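The four steps above can be sketched for the population mean as follows. This is a minimal illustration, not the authors' implementation: it assumes a normal linear regression for f(y | x; θ), for which the weighted score equation has a closed-form solution, and the function names and toy data are hypothetical.

```python
import random


def fit_weighted_normal_regression(x, y, w):
    """Step 3: solve the weighted score equation for y ~ N(b0 + b1*x, s2)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    s2 = sum(wi * (yi - b0 - b1 * xi) ** 2 for wi, xi, yi in zip(w, x, y)) / sw
    return b0, b1, s2


def imputed_mean(x, y, delta, w, M=100, seed=0):
    """Steps 1-4 for U = y - eta under MAR: average M model draws per missing y."""
    rng = random.Random(seed)
    resp = [i for i, d in enumerate(delta) if d == 1]
    b0, b1, s2 = fit_weighted_normal_regression(
        [x[i] for i in resp], [y[i] for i in resp], [w[i] for i in resp]
    )
    sd = s2 ** 0.5
    total = 0.0
    for i in range(len(x)):
        if delta[i] == 1:
            total += w[i] * y[i]
        else:
            # Step 4: Monte Carlo approximation of E(Y | x_i; theta_hat).
            draws = [rng.gauss(b0 + b1 * x[i], sd) for _ in range(M)]
            total += w[i] * sum(draws) / M
    return total / sum(w)


# Toy data (hypothetical): respondents lie exactly on y = 1 + 2x.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, None, None]
delta = [1, 1, 1, 0, 0]
w = [1.0] * 5
eta_hat = imputed_mean(x, y, delta, w)
```

Because the toy respondents have zero residual variance, every imputed draw equals the fitted value, so the result is deterministic; with real data the Monte Carlo error shrinks as M grows.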
Imputation
Imputation: Monte Carlo approximation of the conditional
expectation (given the observed data).
$$E\{U(\eta; x_i, Y) \mid x_i\} \cong \frac{1}{M} \sum_{j=1}^{M} U(\eta; x_i, y_i^{*(j)})$$
1. Bayesian approach: generate $y_i^*$ from $f(y_i \mid x_i, \theta^*)$, where $\theta^*$ is generated from $p(\theta \mid x, y)$.
2. Frequentist approach: generate $y_i^*$ from $f(y_i \mid x_i; \hat\theta)$, where $\hat\theta$ is a consistent estimator.
Once the conditional expectation is computed (approximately), we
can obtain η̂ by solving the imputed estimating equation.
Imputation
Remark
Imputation can be carried out even when the parameter of interest η is not
specified at the time of imputation. Thus, it is a useful tool for
general-purpose estimation.
Works even when M = 1 (single imputation).
To reduce the variance and to enable variance estimation, M > 1 is
often used.
Bayesian approach: Multiple imputation of Rubin (1987)
Frequentist approach: Parametric fractional imputation of Kim
(2011).
Review
Multiple imputation
Generate M imputed values (with equal weights)
Features
1. Imputed values are generated from the posterior predictive distribution, which is the average of $f(y_i \mid x_i; \theta)$ over the posterior distribution $\pi(\theta \mid x, y_{obs})$.
2. The variance estimation formula is simple (Rubin’s formula):
$$\hat{V}_{MI}(\bar\eta_M) = W_M + \left(1 + \frac{1}{M}\right) B_M,$$
where $W_M = M^{-1} \sum_{m=1}^{M} \hat{V}_{I(m)}$, $B_M = (M - 1)^{-1} \sum_{m=1}^{M} (\hat\eta_{(m)} - \bar\eta_M)^2$,
$\bar\eta_M = M^{-1} \sum_{m=1}^{M} \hat\eta_{(m)}$ is the average of the M imputed estimators of η,
and $\hat{V}_{I(m)}$ is the imputed version of the variance estimator of $\hat\eta$ under
complete response.
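Rubin's formula can be written directly from the definitions of $W_M$ and $B_M$. The sketch below is a plain transcription; the function name and the three toy estimates are assumptions for illustration.

```python
def rubin_variance(estimates, variances):
    """Combine M imputed-data results with Rubin's formula.

    estimates: the M point estimates eta_hat_(m)
    variances: the M complete-data variance estimates V_hat_I(m)
    Returns (eta_bar_M, V_hat_MI).
    """
    M = len(estimates)
    eta_bar = sum(estimates) / M
    W = sum(variances) / M  # within-imputation variance W_M
    B = sum((e - eta_bar) ** 2 for e in estimates) / (M - 1)  # between-imputation B_M
    return eta_bar, W + (1.0 + 1.0 / M) * B


# Toy input (hypothetical): M = 3 imputed estimates with equal variance estimates.
eta_bar, v_mi = rubin_variance([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

Here $W_M = 0.5$ and $B_M = 1$, so the combined variance estimate is $0.5 + (4/3) \times 1$.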
Multiple imputation
Remark
The sampling design is incorporated by including $w_i$ among the covariates in
order to make the sampling design non-informative.
Thus, the imputed values are generated from the sample model, not
from the population model:
$$y_i^* \sim f(y \mid x_i, I_i = 1),$$
where $I_i$ is the sample inclusion indicator.
MAR is assumed at the sample level:
$$f(y \mid x, I = 1, \delta = 0) = f(y \mid x, I = 1, \delta = 1),$$
which is different from MAR at the population level:
$$f(y \mid x, \delta = 0) = f(y \mid x, \delta = 1).$$
Multiple imputation
Remark (Cont’d)
If the sampling design is non-informative, then the sample model and
the population model are equivalent and the sample MAR and the
population MAR are equivalent.
Variance estimation (using Rubin’s formula) does not work when the
sampling design is informative.
Even when the sampling design is non-informative, consistency of the
variance estimator is questionable (Kim et al., 2006).
Multiple imputation
Variance estimation
Rubin’s formula is based on the following decomposition:
$$V(\hat\eta_{MI}) = V(\hat\eta_n) + V(\hat\eta_{MI} - \hat\eta_n),$$
where $\hat\eta_n$ is the complete-sample estimator.
Basically, the $W_M$ term estimates $V(\hat\eta_n)$ and the $(1 + M^{-1}) B_M$ term
estimates $V(\hat\eta_{MI} - \hat\eta_n)$.
In general, we have
$$V(\hat\eta_{MI}) = V(\hat\eta_n) + V(\hat\eta_{MI} - \hat\eta_n) + 2\,Cov(\hat\eta_{MI} - \hat\eta_n, \hat\eta_n)$$
and the covariance term can be non-negligible.
The condition of zero covariance is called congeniality by Meng
(1994).
Congeniality holds when $\hat\eta_{MI}$ is a smooth function of the MLE of θ in
$f(y \mid x; \theta)$. Otherwise, Rubin’s variance estimator can be biased,
as discussed in the simulation section.
Parametric Fractional Imputation
Parametric fractional imputation of Kim (2011)
1. More than one (say M) imputed values of $y_i$: $y_i^{*(1)}, \cdots, y_i^{*(M)}$, generated from some (initial) density $h(y_i \mid x_i)$.
2. Create the weighted data set
$$\left\{ \left( w_i w_{ij}^*, x_i, y_i^{*(j)} \right);\ j = 1, 2, \cdots, M;\ i \in A \right\},$$
where
$$w_{ij}^* \propto f(y_i^{*(j)} \mid x_i; \hat\theta) / h(y_i^{*(j)} \mid x_i)$$
and $\hat\theta$ is the (pseudo) maximum likelihood estimator of θ.
3. The weights $w_{ij}^*$ are the normalized importance weights and are called
fractional weights.
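The fractional weights are self-normalized importance weights, which the following sketch computes for a single unit i. It assumes, purely for illustration, that both the target density f(y | x_i; θ̂) and the initial density h(y | x_i) are normal; the parameter values and function names are hypothetical.

```python
import math
import random


def normal_pdf(y, mu, sd):
    """Density of N(mu, sd^2) at y."""
    z = (y - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def pfi_weights(draws, f_mu, f_sd, h_mu, h_sd):
    """Fractional weights w*_ij proportional to f(y*) / h(y*), normalized to sum to 1."""
    raw = [normal_pdf(y, f_mu, f_sd) / normal_pdf(y, h_mu, h_sd) for y in draws]
    total = sum(raw)
    return [r / total for r in raw]


rng = random.Random(1)
h_mu, h_sd = 0.0, 2.0  # initial density h(y | x_i), assumed N(0, 4)
f_mu, f_sd = 1.0, 1.0  # target f(y | x_i; theta_hat), assumed N(1, 1)

draws = [rng.gauss(h_mu, h_sd) for _ in range(5000)]
weights = pfi_weights(draws, f_mu, f_sd, h_mu, h_sd)

# Weighted average of g(y) = y approximates E(Y | x_i; theta_hat) = 1.
approx_mean = sum(wj * yj for wj, yj in zip(weights, draws))
```

Choosing h with heavier spread than f keeps the weights bounded; when θ̂ is updated, only the weights need to be recomputed, not the imputed values.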
Parametric Fractional Imputation (Cont’d)
Product: fractionally imputed data set of size nM,
$$\left\{ \left( w_i w_{ij}^*, x_i, y_i^{*(j)} \right);\ j = 1, 2, \cdots, M;\ i \in A \right\}$$
Property: for sufficiently large M,
$$\sum_{j=1}^{M} w_{ij}^* g(x_i, y_i^{*(j)}) \cong \frac{\int g(x_i, y) \frac{f(y \mid x_i; \hat\theta)}{h(y \mid x_i)} h(y \mid x_i)\,dy}{\int \frac{f(y \mid x_i; \hat\theta)}{h(y \mid x_i)} h(y \mid x_i)\,dy} = E\{g(x_i, Y) \mid x_i; \hat\theta\}$$
for any g such that the expectation exists.
Can handle an informative sampling design by incorporating the sampling
weights into the score equation. That is, solve
$$\sum_{i \in A} w_i \delta_i S(\theta; x_i, y_i) = 0 \qquad (3)$$
where $S(\theta; x, y) = \partial \log f(y \mid x; \theta) / \partial\theta$ is the score function of θ.
Parametric Fractional Imputation (Cont’d)
Remark
Imputed values are generated from the population model, not from
the sample model:
$$y_i^* \sim f(y \mid x_i) \neq f(y \mid x_i, I_i = 1).$$
Thus, we assume population MAR, not sample MAR.
For variance estimation, either a linearization method or a replication
method can be used.
Fractional Hot deck imputation
Motivation
Hot deck imputation
Imputed values are real observations
Very popular in household surveys
Want to implement hot deck version of fractional imputation.
Kim (2004) and Fuller and Kim (2005) already considered fractional
hot deck imputation: x is categorical in f (y | x).
Kim, Fuller and Bell (2011) extended the method of Kim (2004) to
nearest neighbor imputation.
We now want to extend it to the case when x has continuous
components.
Proposed method: Three steps
1. Fully efficient fractional imputation (FEFI) by choosing all the respondents as donors. That is, we use $M = n_R$ imputed values for each missing unit, where $n_R$ is the number of respondents in the sample.
2. Use systematic PPS sampling to select m ($\ll n_R$) donors from the FEFI.
3. Use a calibration weighting technique to compute the final fractional weights (which lead to the same estimates as FEFI for some items).
Step 1: FEFI step
Want to find the fractional weights $w_{ij}^*$ when the j-th imputed value
$y_i^{*(j)}$ is taken from the j-th value in the set of respondents.
Without loss of generality, we assume that the first $n_R$ elements
respond and write $y_i^{*(j)} = y_j$.
Recall that
$$w_{ij}^* \propto f(y_i^{*(j)} \mid x_i; \hat\theta) / h(y_i^{*(j)} \mid x_i)$$
when $y_i^{*(j)}$ are generated from $h(y \mid x_i)$.
We have only to find $h(y_i^{*(j)} \mid x_i)$ when we use $y_i^{*(j)} = y_j$.
Step 1: FEFI step (Cont’d)
We can treat $\{y_i;\ \delta_i = 1\}$ as a realization from $f(y \mid \delta = 1)$, the
marginal distribution of y among respondents.
Now, we can write
$$\begin{aligned}
f(y_j \mid \delta_j = 1) &= \int f(y_j \mid x, \delta_j = 1)\, f(x \mid \delta_j = 1)\,dx \\
&= \int f(y_j \mid x)\, f(x \mid \delta_j = 1)\,dx \\
&\cong \frac{1}{N_R} \sum_{k=1}^{N} \delta_k f(y_j \mid x_k),
\end{aligned}$$
where $N_R = \sum_{i=1}^{N} \delta_i$ is the population size of (potential) respondents.
Step 1: FEFI step (Cont’d)
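Using the survey weights, we can approximate
$$f(y_j \mid \delta_j = 1) \cong \frac{\sum_{k \in A_R} w_k f(y_j \mid x_k)}{\sum_{k \in A_R} w_k}$$
and the fractional weight for $y_i^{*(j)} = y_j$ becomes
$$w_{ij}^* \propto \frac{f(y_j \mid x_i; \hat\theta)}{\sum_{k \in A_R} w_k f(y_j \mid x_k; \hat\theta)} \qquad (4)$$
with $\sum_{j \in A_R} w_{ij}^* = 1$, where $A_R = \{i \in A;\ \delta_i = 1\}$ and $\hat\theta$ is computed
from the weighted score equation in (3).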
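Equation (4) can be sketched for one nonrespondent as follows. The sketch assumes a normal linear regression model for f(y | x; θ̂) with already-estimated parameters; the function names, the parameter values, and the three-donor data set are hypothetical.

```python
import math


def normal_pdf(y, mu, sd):
    """Density of N(mu, sd^2) at y."""
    z = (y - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fefi_weights(x_missing, x_resp, y_resp, w_resp, b0, b1, sd):
    """FEFI fractional weights (4) for a nonrespondent with covariate x_missing.

    Donor j gets weight proportional to
        f(y_j | x_missing) / sum_k w_k f(y_j | x_k),
    where f is the assumed N(b0 + b1*x, sd^2) density and k runs over respondents.
    """
    raw = []
    for yj in y_resp:
        num = normal_pdf(yj, b0 + b1 * x_missing, sd)
        den = sum(wk * normal_pdf(yj, b0 + b1 * xk, sd)
                  for xk, wk in zip(x_resp, w_resp))
        raw.append(num / den)
    total = sum(raw)
    return [r / total for r in raw]


# Toy respondents (hypothetical) and assumed fitted parameters.
x_resp = [0.0, 1.0, 2.0]
y_resp = [0.1, 1.2, 1.9]
w_resp = [1.0, 1.0, 1.0]
weights = fefi_weights(2.0, x_resp, y_resp, w_resp, b0=0.0, b1=1.0, sd=0.5)
```

For a nonrespondent with x = 2, the donor whose y value is most plausible under the model at x = 2 (here the third one) receives the largest fractional weight.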
Step 2: Sampling Step
FEFI uses all the elements in AR as donors for each missing i.
Want to reduce the number of donors to, say, m = 10.
For each i, we can treat the FEFI donor set as the weighted
population and apply a sampling method to select a smaller set of
donors.
Fractional weights (4) for FEFI can be used as the selection
probabilities for the PPS sampling.
That is, our goal is to obtain a (systematic) PPS sample $D_i$ of size m
from the FEFI donor set of size $M = n_R$, using $w_{ij}^*$ as the selection
probability assigned to the j-th element in $A_R$. (Note that $w_{ij}^*$
satisfies $\sum_{j=1}^{M} w_{ij}^* = 1$ and $w_{ij}^* > 0$.)
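A minimal sketch of the sampling step, assuming the textbook form of systematic PPS sampling (a random start in [0, 1/m) followed by equally spaced selection points on the cumulated probabilities). The function name is hypothetical, and the slides do not say how donors are ordered before selection.

```python
import random


def systematic_pps(probs, m, rng=None):
    """Select m donor indices by systematic PPS sampling.

    probs: selection probabilities (the FEFI fractional weights), summing to 1.
    A donor whose probability exceeds 1/m can be selected more than once.
    """
    rng = rng or random.Random()
    start = rng.random() / m  # random start in [0, 1/m)
    selected = []
    cum = 0.0  # cumulated probability of donors before index j
    j = 0
    for k in range(m):
        point = start + k / m
        # Advance to the donor whose interval [cum, cum + probs[j]) contains point.
        while j < len(probs) - 1 and cum + probs[j] <= point:
            cum += probs[j]
            j += 1
        selected.append(j)
    return selected


# With equal probabilities and m equal to the number of donors,
# every donor is selected exactly once.
donors = systematic_pps([0.25, 0.25, 0.25, 0.25], m=4, rng=random.Random(0))

# Unequal probabilities: donors with larger fractional weights are favored.
sample = systematic_pps([0.1, 0.2, 0.3, 0.4], m=2, rng=random.Random(0))
```

Under this scheme donor j is selected with expected frequency $m\,w_{ij}^*$, so assigning equal initial weights 1/m to the selected donors reproduces the FEFI weights in expectation.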
Step 3: Weighting Step
After we select $D_i$ from the complete set of respondents, the selected
donors in $D_i$ are assigned the initial fractional weights
$w_{ij0}^* = 1/m$.
The fractional weights are further adjusted to satisfy
$$\sum_{i \in A} w_i \Big\{ (1 - \delta_i) \sum_{j \in D_i} w_{ij,c}^*\, q(x_i, y_j) \Big\} = \sum_{i \in A} w_i \Big\{ (1 - \delta_i) \sum_{j \in A_R} w_{ij}^*\, q(x_i, y_j) \Big\} \qquad (5)$$
for some $q(x_i, y_j)$, and $\sum_{j \in D_i} w_{ij,c}^* = 1$ for all i with $\delta_i = 0$, where $w_{ij}^*$
is the fractional weight for the FEFI method, as defined in (4).
Regarding the choice of the control function $q(x, y)$ in (5), we can
use $q(x, y) = (y, y^2)'$, which leads to fully efficient estimates of
the mean and the variance of y.
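The slides do not say which calibration method is used. One standard choice that satisfies both constraints is a regression-type adjustment; the following is a sketch under that assumption, with $q_{ij}$, $\bar q_i$, $\Delta$ and $S$ introduced here only for illustration.

```latex
q_{ij} = q(x_i, y_j), \qquad
\bar{q}_i = \sum_{j \in D_i} w_{ij0}^* \, q_{ij}, \\
\Delta = \sum_{i \in A} w_i (1 - \delta_i)
         \Big\{ \sum_{j \in A_R} w_{ij}^* \, q_{ij}
              - \sum_{j \in D_i} w_{ij0}^* \, q_{ij} \Big\}, \\
S = \sum_{i \in A} w_i (1 - \delta_i) \sum_{j \in D_i}
    w_{ij0}^* \, (q_{ij} - \bar{q}_i)(q_{ij} - \bar{q}_i)', \\
w_{ij,c}^* = w_{ij0}^* \left\{ 1 + \Delta' S^{-1} (q_{ij} - \bar{q}_i) \right\}.
```

Since $\sum_{j \in D_i} w_{ij0}^* (q_{ij} - \bar q_i) = 0$, the adjusted weights still sum to one within each i, and summing $w_i (1 - \delta_i) w_{ij,c}^* q_{ij}$ over i and j recovers the FEFI total on the right-hand side of (5). Weights of this form are not guaranteed to be non-negative, which is one reason other calibration distances are sometimes preferred.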
Remark
For variance estimation, a replication method can be used. The
imputed values are not changed; only the fractional weights are
changed for each replicate. (Details skipped.)
The proposed fractional hot deck imputation is less sensitive to
model mis-specification in $f(y \mid x; \theta)$. (Details skipped.)
The proposed method can be extended to the non-ignorable missing
case under a parametric model assumption on the response
mechanism. (Details skipped.)
Simulation Study
Simulation Study - Study One
Factors considered
Correct vs incorrect imputation model: to see the effect of model
misspecification of f (y | x).
Imputation methods: MI, PFI, FHDI
Parameters of interest: mean, proportion
Simulation Setup
Two sets of models
1. Model A: $y_i = 0.5 x_i + e_i$, where $x_i \sim \exp(1)$ and $e_i \sim N(0, 1)$.
2. Model B: same as Model A except that $e_i \sim \{\chi^2(2) - 2\}/2$.
Response mechanism: $y_i$ is observed only when $\delta_i = 1$, where
$\delta_i \sim \text{Bernoulli}(\pi_i)$, $\pi_i = \{1 + \exp(-0.2 - x_i)\}^{-1}$.
Thus, we have MAR with 65% overall response in both models.
B = 5,000 Monte Carlo samples of size n = 200.
We used $y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$ as the imputation model in both
cases. (Thus, the imputation model is mis-specified under Model B.)
Simulation Setup (Cont’d)
Two parameters considered:
1. $\eta_1 = E(Y)$: the population mean of y
2. $\eta_2 = Pr(Y < 1)$: the proportion of Y less than one.
Four estimators computed:
1. Full sample estimator (FULL), computed using the full sample.
2. Multiple imputation (MI) estimator with imputation size m = 10
3. Parametric fractional imputation (PFI) with imputation size m = 10
4. Fractional hot deck imputation (FHDI) with imputation size m = 10
Simulation Results under Model A
Table: Point estimation
Parameter         Method   Mean   Var       Std Var
η1 = µy           Full     .50    .00625    100
                  MI       .50    .00955    153
                  PFI      .50    .00907    145
                  FHDI     .50    .00926    148
η2 = Pr(Y < 1)    Full     .68    .00107    100
                  MI       .68    .00130    126
                  PFI      .68    .00129    121
                  FHDI     .68    .00158    153
Simulation Results under Model A
Table: Variance estimation
Parameter   Method   R.B. (%)   t-statistic
V(η̂1)       MI       0.66       0.32
            PFI      2.18       1.11
            FHDI     0.44       0.22
V(η̂2)       MI       19.35      9.39
            PFI      0.99       0.50
            FHDI     5.19       2.56
Discussion for Model A results
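Point estimation is unbiased for both parameters under the correct model.
For $\eta_1 = E(Y)$, imputation increases variance by roughly 45-53%:
$$V(\hat\eta_{1,imp}) = \frac{1}{n} \sigma_y^2 + \left( \frac{1}{n_R} - \frac{1}{n} \right) \sigma_e^2 \cong \frac{1.25}{200} + \left( \frac{1}{200 \times 0.65} - \frac{1}{200} \right) \times 1 \doteq 0.00625 + 0.0027 = 0.00895$$
and 0.00895/0.00625 = 1.43.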
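The arithmetic in this approximation can be checked in a few lines. The inputs come from the Model A setup ($\sigma_y^2 = 0.5^2 \cdot Var(x) + \sigma_e^2$ with $Var(x) = 1$ for the exp(1) distribution, and an expected number of respondents of $n \times 0.65$); the variable names are chosen for this example.

```python
n = 200
response_rate = 0.65
sigma_e2 = 1.0                      # Var(e) under Model A
sigma_y2 = 0.5 ** 2 * 1.0 + sigma_e2  # Var(0.5 x) + Var(e), with Var(x) = 1
n_r = n * response_rate             # expected number of respondents

v_full = sigma_y2 / n                              # variance of the full-sample mean
v_imp = v_full + (1.0 / n_r - 1.0 / n) * sigma_e2  # approximate variance after imputation
ratio = v_imp / v_full
```

Carrying more digits gives a second term of about 0.00269 and a total of about 0.00894, which agrees with the rounded figures on the slide.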
Discussion for Model A results (Cont’d)
For $\eta_2 = Pr(Y < 1)$, imputation increases variance by roughly 25% for
MI and PFI. Note that
$$\hat\eta_{2,imp} \cong \frac{1}{n} \sum_{i=1}^{n} \left[ \delta_i I(y_i < 1) + (1 - \delta_i) E\{I(Y < 1) \mid x_i\} \right],$$
where we used the imputation model in computing the conditional
expectation.
Thus, it “borrows strength” by making use of the normality assumption
at the time of imputation.
In some sense, the above imputation estimator can be viewed as a
composite estimator, that is, a weighted
average of a “direct” estimator and a “synthetic” estimator.
Discussion for Model A results (Cont’d)
In fact, under full response, there are two estimators of
$\eta_2 = Pr(Y < 1)$:
$$\hat\eta_{2,MME} = n^{-1} \sum_{i=1}^{n} I(y_i < 1)$$
$$\hat\eta_{2,MLE} = \int_{-\infty}^{1} \frac{1}{\hat\sigma}\, \phi\!\left( \frac{y - \hat\mu}{\hat\sigma} \right) dy = \Phi\!\left( \frac{1 - \hat\mu}{\hat\sigma} \right).$$
The MLE is more efficient than the MME but it is less robust.
The congeniality condition holds when the MLE is used, but not when
the MME is used. Rubin’s variance estimator for MI requires the
congeniality condition. FI does not require congeniality.
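The two full-response estimators of Pr(Y < 1) can be compared on a small data set. This sketch uses Python's standard `statistics` module; the function names and the five toy values are assumptions for illustration.

```python
from statistics import NormalDist, fmean, pstdev


def eta2_mme(y, q=1.0):
    """Method-of-moments estimator: sample proportion of values below q."""
    return sum(1 for v in y if v < q) / len(y)


def eta2_mle(y, q=1.0):
    """Normal-model MLE: Phi((q - mu_hat) / sigma_hat), with ML (divisor n) sigma."""
    mu_hat = fmean(y)
    sigma_hat = pstdev(y)
    return NormalDist(mu_hat, sigma_hat).cdf(q)


y = [-1.0, 0.0, 1.0, 2.0, 3.0]  # toy data (hypothetical)
mme = eta2_mme(y)  # two of the five values are below 1
mle = eta2_mle(y)  # mu_hat = 1, so the normal CDF at 1 is one half
```

The MME depends only on which side of 1 each value falls, while the MLE depends on the data only through the fitted mean and standard deviation, which is why it gains efficiency under normality and loses robustness otherwise.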
Simulation Results under Model B
Table: Point estimation
Parameter         Method   Mean    Var       Std Var
η1 = µy           Full     .502    .00619    100
                  MI       .499    .00952    155
                  PFI      .501    .00917    148
                  FHDI     .500    .00911    148
η2 = Pr(Y < 1)    Full     .748    .00093    100
                  MI       .729    .00149    159
                  PFI      .730    .00144    155
                  FHDI     .751    .00147    157
Simulation Results under Model B
Table: Variance estimation
Parameter   Method            R.B. (%)   t-statistic
V(η̂1)       MI                1.43       0.71
            PFI               1.15       0.57
            FHDI (m = 10)     1.00       0.51
V(η̂2)       MI (m = 10)       -3.08      -1.52
            PFI (m = 10)      3.26       1.62
            FHDI (m = 10)     4.50       2.22
Discussion for Model B results
Point estimation is unbiased for $\eta_1 = E(Y)$ even when the imputation
model is incorrect.
Note that, for m → ∞, the imputed estimator of $\eta_1$ can be written as
$$\hat\eta_{1,imp} = \frac{1}{n} \sum_{i=1}^{n} \{\delta_i y_i + (1 - \delta_i) \hat{y}_i\} = \frac{1}{n} \sum_{i=1}^{n} \hat{y}_i,$$
which is called the projection estimator.
Kim and Rao (2012) showed design-consistency of the projection
estimator.
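The second equality in the projection estimator holds because a regression fitted with an intercept on the respondents has residuals that sum to zero over the respondents. The sketch below checks this numerically; it assumes equal weights and ordinary least squares, and the data and function name are hypothetical.

```python
def ols_fit(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


# Toy sample (hypothetical): y is missing for two of the six units.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.3, 0.2, 1.4, None, 1.9, None]
delta = [1, 1, 1, 0, 1, 0]

resp = [i for i, d in enumerate(delta) if d == 1]
b0, b1 = ols_fit([x[i] for i in resp], [y[i] for i in resp])
y_hat = [b0 + b1 * xi for xi in x]

n = len(x)
# Observed y for respondents, fitted value for nonrespondents.
eta_imputed = sum(y[i] if delta[i] == 1 else y_hat[i] for i in range(n)) / n
# Fitted value for every unit (projection estimator).
eta_projection = sum(y_hat) / n
```

The two quantities agree up to floating-point error even though the toy responses are far from linear in x, which mirrors the point that the mean is estimated without bias under a mis-specified imputation model.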
Discussion for Model B results (Cont’d)
However, all imputed estimators are biased for $\eta_2 = Pr(Y < 1)$.
The biases are much larger for MI and PFI than for FHDI: the
corresponding z-statistics are -34.8, -33.5, and 5.5 for MI, PFI, and
FHDI, respectively.
Note that the true error distribution is $e_i \sim \{\chi^2(2) - 2\}/2$, while the
imputation model errors are generated from $e_i^* \sim N(0, \hat\sigma_e^2)$. (See the
figure on the next page.)
In FHDI, the donors are still generated from the true distribution;
only the fractional weights are computed from the wrong model.
Thus, the effect of model mis-specification is less severe than for the
other imputation methods, which create synthetic values from the
wrong model.
[Figure: densities of the true error model and the imputation model. Horizontal axis: x (from -1 to 5); vertical axis: Density (from 0.0 to 0.8).]
Simulation Study - Study Two
Bivariate data $(x_i, y_i)$ of size n = 100 with
$$Y_i = \beta_0 + \beta_1 x_i + \beta_2 (x_i^2 - 1) + e_i \qquad (6)$$
where $(\beta_0, \beta_1, \beta_2) = (0, 0.9, 0.06)$, $x_i \sim N(0, 1)$, $e_i \sim N(0, 0.16)$, and
$x_i$ and $e_i$ are independent. The variable $x_i$ is always observed but the
probability that $y_i$ responds is 0.5.
The imputation model is
$$Y_i = \beta_0 + \beta_1 x_i + e_i.$$
That is, the imputer’s model uses the extra information that $\beta_2 = 0$.
From the imputed data, we fit model (6) and computed the power of the
test of $H_0: \beta_2 = 0$ at the 0.05 significance level.
In addition, we also considered the complete-case (CC) method, which
uses only the complete cases for the regression analysis.
Table 5: Simulation results for the Monte Carlo experiment based on
10,000 Monte Carlo samples.
Method   E(θ̂)    V(θ̂)      R.B. (V̂)   Power
MI       0.028   0.00056   1.81       0.044
FI       0.046   0.00146   0.02       0.314
CC       0.060   0.00234   -0.01      0.285
Table 5 shows that MI provides a more efficient point estimator than the CC
method, but its variance estimation is very conservative (more than 100%
overestimation). Because of the serious positive bias of the MI variance
estimator, the statistical power of the test based on MI is actually lower
than that of the CC method.
Conclusion
Concluding Remarks
Advantages
1. Hot deck imputation: uses real observations for imputed values.
2. Robust against model mis-specification.
3. Applicable even when the sampling design is informative.
4. Does not require the congeniality condition for valid variance estimation.
Disadvantage: May have a higher imputation variance than
imputation methods using synthetic values.
Future work
Extension to single imputation (m = 1).
Imputation variance component needs to be estimated.
Instead of the calibration weighting step (in Step 3), we may consider
using balanced imputation (Chauvet et al., 2011)
FHDI for multivariate missing
To be presented at the ISI meeting in Hong Kong
To be implemented in SAS (in Proc Surveyimpute).
References
Binder, D. and Z. Patak (1994), ‘Use of estimating functions for estimation from complex surveys’, Journal of the American Statistical Association 89, 1035–1043.
Chauvet, G., J.-C. Deville and D. Haziza (2011), ‘On balanced random imputation in surveys’, Biometrika 98, 459–471.
Fuller, W. A. and J. K. Kim (2005), ‘Hot deck imputation for the response model’, Survey Methodology 31, 139–149.
Kim, J. K. (2004), ‘Finite sample properties of multiple imputation estimators’, The Annals of Statistics 32, 766–783.
Kim, J. K. (2011), ‘Parametric fractional imputation for missing data analysis’, Biometrika 98, 119–132.
Kim, J. K. and J. N. K. Rao (2012), ‘Combining data from two independent surveys: a model-assisted approach’, Biometrika 99, 85–100.
Kim, J. K., M. J. Brick, W. A. Fuller and G. Kalton (2006), ‘On the bias of the multiple imputation variance estimator in survey sampling’, Journal of the Royal Statistical Society: Series B 68, 509–521.
Kim, J. K., W. A. Fuller and W. R. Bell (2011), ‘Variance estimation for nearest neighbor imputation for U.S. Census long form data’, Annals of Applied Statistics 5, 824–842.
Meng, X. L. (1994), ‘Multiple-imputation inferences with uncongenial sources of input (with discussion)’, Statistical Science 9, 538–573.
Rubin, D. B. (1987), Multiple Imputation for Nonresponse in Surveys, Wiley, New York.