Two-phase Fractional Hot deck Imputation 2015 Joint Statistical Meetings Seattle, August 2015

Two-phase Fractional Hot deck Imputation
Jongho Im, Jae-kwang Kim, Wayne A. Fuller
Iowa State University
2015 Joint Statistical Meetings
Seattle, August 2015
Outline
1. Missing data problem
2. Estimation
3. Imputed data set
4. Variance estimation
5. Simulation
Fuller (ISU)
Fractional Hot Deck Imputation
2015 JSM
Data
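$A$: index set of the sample
$(x_i, y_i, z_i, \delta_i)$, $i \in A$
$x_i$: discrete, with values in $\{1, 2, \ldots, G\}$
$y_i$: variable of interest
$z_i$: discrete version of $y_i$, with values in $\{1, 2, \ldots, H\}$
$\delta_i$: response indicator for $y_i$,
$$\delta_i = \begin{cases} 1 & \text{if } y_i \text{ is observed} \\ 0 & \text{if } y_i \text{ is missing} \end{cases}$$
$w_i$: sampling weight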
Discrete Variables
$x_i = g$, $z_i = h$ defines cell $gh$
Assume missing at random:
$$\pi_{h|g} \equiv P(z = h \mid x = g) = P(z = h \mid x = g, \delta = 1)$$
Parameter Estimation
$$\hat{\pi}_{h|g} = \left\{ \sum_{i \in A} w_i \delta_i I(x_i = g) \right\}^{-1} \sum_{i \in A} w_i \delta_i I(x_i = g, z_i = h)$$
Note:
1. $z_i$ is observed only for $\delta_i = 1$.
2. $\sum_{h=1}^{H} \hat{\pi}_{h|g} = 1$.
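A minimal Python sketch of $\hat{\pi}_{h|g}$, assuming NumPy, 1-based integer codes for $x_i$ and $z_i$, and a placeholder code 0 for $z_i$ when $\delta_i = 0$. The function name `pi_hat` is hypothetical. It returns a $G \times H$ array whose rows sum to 1 (Note 2), provided every group $g$ has at least one respondent; otherwise that row is NaN.

```python
import numpy as np

def pi_hat(x, z, delta, w, G, H):
    """Return pi[g-1, h-1], the weighted respondent estimate of P(z = h | x = g).

    Only respondents (delta == 1) enter, since z is observed only for them.
    x takes values 1..G, z takes values 1..H (any placeholder where delta == 0).
    """
    x, z, delta, w = map(np.asarray, (x, z, delta, w))
    pi = np.zeros((G, H))
    for g in range(1, G + 1):
        resp_g = (delta == 1) & (x == g)
        denom = w[resp_g].sum()                       # sum_i w_i delta_i I(x_i = g)
        for h in range(1, H + 1):
            num = w[resp_g & (z == h)].sum()          # sum_i w_i delta_i I(x_i = g, z_i = h)
            pi[g - 1, h - 1] = num / denom
    return pi
```

With the data of the missing-data illustration ($x = (1, 2, 2, 2, 2)$, $z = (1, 1, 1, 2, \text{missing})$, all weights 0.2), the rows are $(1, 0)$ and $(2/3, 1/3)$.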
Discrete Variable Fractional Imputation
Fully efficient fractional imputation (FEFI)
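$z_i$: some values missing
$x_i$: all observed
$H$ imputed "observations" for each $j$ with $\delta_j = 0$
Weight of imputed record $h$ for unit $j$: $w_j w^*_{hj}$, where $w^*_{hj} = \hat{\pi}_{h|g}$ for $x_j = g$
Imputed data set: $n_{obs}$ observed records plus $n_{mis} \times H$ imputed records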
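The FEFI expansion for a discrete $z$ can be sketched as follows (Python with NumPy; the function `fefi_discrete` and its tuple output layout are hypothetical). Each respondent keeps one record with fractional weight 1; each nonrespondent $j$ with $x_j = g$ becomes $H$ records with fractional weights $\hat{\pi}_{h|g}$ and final weights $w_j \hat{\pi}_{h|g}$.

```python
import numpy as np

def fefi_discrete(x, z, delta, w, G, H):
    """Expand each nonrespondent into H imputed records (FEFI for a discrete z).

    Returns (unit, z_value, fractional_weight, final_weight) tuples, units 1-based.
    """
    x, z, delta, w = map(np.asarray, (x, z, delta, w))
    # pi[g-1, h-1]: respondent-based estimate of P(z = h | x = g)
    pi = np.zeros((G, H))
    for g in range(1, G + 1):
        r = (delta == 1) & (x == g)
        for h in range(1, H + 1):
            pi[g - 1, h - 1] = w[r & (z == h)].sum() / w[r].sum()
    rows = []
    for j in range(len(x)):
        if delta[j] == 1:
            rows.append((j + 1, int(z[j]), 1.0, float(w[j])))   # observed record
        else:
            for h in range(1, H + 1):
                frac = float(pi[x[j] - 1, h - 1])               # w*_hj = pi_hat_{h|g}, g = x_j
                rows.append((j + 1, h, frac, float(w[j]) * frac))
    return rows
```

With the five-unit data of the missing-data illustration, unit 5 expands into two records with fractional weights 2/3 and 1/3, and the final weights sum to 1.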
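Continuous y-variable
Efficient estimator
$$\hat{\mu}_{y,FE} = \sum_{g} \sum_{h} \hat{\pi}_{gh} \bar{y}_{gh}$$
where
$\hat{\pi}_{gh}$ = estimated fraction for cell $gh$
$$\bar{y}_{gh} = \frac{\sum_{i \in A} w_i \delta_i I(x_i = g, z_i = h) y_i}{\sum_{i \in A} w_i \delta_i I(x_i = g, z_i = h)} = \text{sample cell mean}$$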
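A sketch of $\hat{\mu}_{y,FE}$ in Python. The slide does not spell out $\hat{\pi}_{gh}$; here it is assumed to be the weighted share of units with $x_i = g$ (all $x_i$ are observed) times $\hat{\pi}_{h|g}$. The function name `mu_fe` is hypothetical. Missing $y_i$ may be stored as NaN because only respondents enter $\bar{y}_{gh}$.

```python
import numpy as np

def mu_fe(x, z, y, delta, w, G, H):
    """Efficient estimator sum_g sum_h pi_gh * ybar_gh (sketch).

    ASSUMPTION: pi_gh = (weighted share of units with x = g) * pi_hat_{h|g}.
    """
    x, z, y, delta, w = map(np.asarray, (x, z, y, delta, w))
    total = 0.0
    for g in range(1, G + 1):
        share_g = w[x == g].sum() / w.sum()          # x is observed for every unit
        r = (delta == 1) & (x == g)
        for h in range(1, H + 1):
            cell = r & (z == h)
            if w[cell].sum() == 0:                   # empty respondent cell: pi_hat_{h|g} = 0
                continue
            pi_h_given_g = w[cell].sum() / w[r].sum()
            ybar_gh = (w[cell] * y[cell]).sum() / w[cell].sum()   # sample cell mean
            total += share_g * pi_h_given_g * ybar_gh
    return total
```

On the illustration data (taking $z_i = y_i$), this gives $0.2 \cdot 1 + 0.8 \cdot (2/3 \cdot 1 + 1/3 \cdot 2) = 19/15 \approx 1.267$.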
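Continuous y-variable (Cont’d)
Expression using fractional imputation
$$\hat{\mu}_{y,FEFI} = \sum_{g=1}^{G} \sum_{j \in A} w_j I(x_j = g) \left\{ \delta_j y_j + (1 - \delta_j) \sum_{i} w^*_{ij} y_i \right\}$$
where, for $x_j = g$ and $z_i = h$,
$$w^*_{ij} = \hat{\pi}_{h|g} \times \frac{w_i \delta_i I(x_i = g, z_i = h)}{\sum_{k \in A} w_k \delta_k I(x_k = g, z_k = h)}$$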
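The fractional weights $w^*_{ij}$ and $\hat{\mu}_{y,FEFI}$ can be sketched as below (Python with NumPy; the names `fefi_weights` and `mu_fefi` are hypothetical). The slide's expression is a weighted total; the sketch divides by $\sum_j w_j$ to report a mean, which is the same when the weights sum to 1, as in the illustration. Column $j$ of the matrix holds the donor weights for nonrespondent $j$ and sums to 1.

```python
import numpy as np

def fefi_weights(x, z, delta, w, H):
    """Matrix W[i, j] = w*_ij, the fractional weight of donor i for recipient j.

    w*_ij = pi_hat_{h|g} * w_i delta_i I(x_i=g, z_i=h) / sum_k w_k delta_k I(x_k=g, z_k=h),
    with g = x_j and h = z_i. Columns of respondents stay zero.
    """
    x, z, delta, w = map(np.asarray, (x, z, delta, w))
    W = np.zeros((len(x), len(x)))
    for j in np.flatnonzero(delta == 0):
        r = (delta == 1) & (x == x[j])           # respondents in recipient's group g
        for h in range(1, H + 1):
            cell = r & (z == h)
            if not cell.any():
                continue
            pi_hg = w[cell].sum() / w[r].sum()   # pi_hat_{h|g}
            W[cell, j] = pi_hg * w[cell] / w[cell].sum()
    return W

def mu_fefi(x, z, y, delta, w, H):
    """Weighted mean with each missing y_j replaced by sum_i w*_ij y_i."""
    x, y, delta, w = map(np.asarray, (x, y, delta, w))
    W = fefi_weights(x, z, delta, w, H)
    y_obs = np.where(delta == 1, y, 0.0)          # zero out missing placeholders
    y_fill = np.where(delta == 1, y, W.T @ y_obs)  # imputed value for each recipient j
    # ASSUMPTION: divide by sum(w) to get a mean (equal to the slide when sum(w) = 1)
    return (w * y_fill).sum() / w.sum()
```

On the illustration data, the column for unit 5 is $(0, 1/3, 1/3, 1/3, 0)$ and the estimate is $19/15$, matching $\hat{\mu}_{y,FE}$, since $\sum_i w^*_{ij} y_i = \sum_h \hat{\pi}_{h|g} \bar{y}_{gh}$.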
Missing Data Illustration
Observation   Weight   $x_i$   $y_i$
1             0.2      1       1
2             0.2      2       1
3             0.2      2       1
4             0.2      2       2
5             0.2      2       Missing
Missing Data Imputed
Observation   Weight   Fractional Wgt   Final Wgt   $x_i$   $y_i$
1             0.2      1.00             0.200       1       1
2             0.2      1.00             0.200       2       1
3             0.2      1.00             0.200       2       1
4             0.2      1.00             0.200       2       2
5             0.2      0.67             0.133       2       1
5             0.2      0.33             0.067       2       2
Final Wgt = Weight × Fractional Wgt; the final weights sum to 1.
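As an arithmetic check on the imputed table, the short Python snippet below recomputes Final Wgt as Weight × Fractional Wgt (using the exact fractions 2/3 and 1/3 that 0.67 and 0.33 round) and the resulting weighted mean of $y$.

```python
# Imputed data set from the illustration: (weight, fractional weight, y)
records = [
    (0.2, 1.0, 1), (0.2, 1.0, 1), (0.2, 1.0, 1), (0.2, 1.0, 2),
    (0.2, 2 / 3, 1),   # unit 5, imputed y = 1
    (0.2, 1 / 3, 2),   # unit 5, imputed y = 2
]
final = [w * f for w, f, _ in records]       # final weight = weight x fractional weight
mean_y = sum(fw * y for fw, (_, _, y) in zip(final, records)) / sum(final)
print([round(f, 3) for f in final])   # [0.2, 0.2, 0.2, 0.2, 0.133, 0.067]
print(round(mean_y, 4))               # 1.2667
```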
Continuous y-variable (Cont’d)
Sample y-values: two-phase sampling approach
1. Compute $w^*_{ij}$ for fully efficient fractional imputation. (Define $z_i$ and compute $\hat{\pi}_{h|g}$ for each $g$ and $h$.)
2. Select $M$ imputed values randomly, with selection probability proportional to $w^*_{ij}$:
   - Select $M$ cells with replacement, with probability proportional to $\hat{\pi}_{h|g}$, and then select an element within each selected cell with probability proportional to $w_i$.
   - If a cell is selected twice, we select two donors from that cell.
   - The fractional weights are $w^*_{ij} = 1/M$.
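A minimal sketch of step 2 for one nonrespondent, using NumPy's `Generator.choice`. The function name `two_phase_donors` is hypothetical and indices are 0-based. The slide does not say whether the donors for a cell selected more than once are drawn with or without replacement; this sketch assumes with replacement.

```python
import numpy as np

def two_phase_donors(j, x, z, delta, w, H, M, rng):
    """Select M donors for nonrespondent j by two-phase sampling (sketch).

    Phase 1: draw M cells h with replacement, P(h) = pi_hat_{h|g}, g = x_j.
    Phase 2: a cell drawn m times yields m donors, P(i) proportional to w_i.
    Returns donor indices (0-based) and fractional weights 1/M.
    """
    x, z, delta, w = map(np.asarray, (x, z, delta, w))
    r = (delta == 1) & (x == x[j])                       # respondents in group g
    cells = np.arange(1, H + 1)
    pi = np.array([w[r & (z == h)].sum() for h in cells]) / w[r].sum()
    drawn_cells = rng.choice(cells, size=M, replace=True, p=pi)
    donors = []
    for h in cells:
        m_h = int((drawn_cells == h).sum())              # times cell h was drawn
        if m_h == 0:
            continue
        idx = np.flatnonzero(r & (z == h))
        p = w[idx] / w[idx].sum()
        # ASSUMPTION: within-cell donors drawn with replacement
        donors.extend(int(i) for i in rng.choice(idx, size=m_h, replace=True, p=p))
    return donors, np.full(M, 1.0 / M)
```

For unit 5 of the illustration (index 4), donors come only from units 2 to 4 (indices 1 to 3), and with large $M$ each is selected about one third of the time, matching $w^*_{ij}$.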
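Continuous y-variable (Cont’d)
Random FI: two-phase sampling approach
Define $d_{ij} = 1$ if unit $i$ is selected as a donor for unit $j$.
Let $x_j = g$. Note that we have
$$d_{ij} = \sum_{h=1}^{H} d^{(1)}_{h|j} \, d^{(2)}_{i|hg}$$
where, for $x_j = g$,
$$E_I\left(d^{(1)}_{h|j}\right) = \hat{\pi}_{h|g}$$
and
$$E_I\left(d^{(2)}_{i|hg}\right) = \frac{w_i \delta_i I(x_i = g, z_i = h)}{\sum_{k \in A} w_k \delta_k I(x_k = g, z_k = h)}$$
Since the second-phase probability is conditional on cell $h$ having been selected, a single draw gives $E_I(d_{ij}) = \sum_h \hat{\pi}_{h|g} E_I\left(d^{(2)}_{i|hg}\right) = w^*_{ij}$, the FEFI fractional weight.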
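The decomposition can be checked numerically. The sketch below (Python with NumPy; `expected_donor_indicator` is a hypothetical name) computes $E_I(d_{ij}) = \sum_h E_I(d^{(1)}_{h|j}) \, E_I(d^{(2)}_{i|hg})$ for a single draw, treating the second-phase probability as conditional on cell $h$ having been drawn.

```python
import numpy as np

def expected_donor_indicator(j, x, z, delta, w, H):
    """E_I(d_ij) for every unit i, for recipient j and a single two-phase draw."""
    x, z, delta, w = map(np.asarray, (x, z, delta, w))
    r = (delta == 1) & (x == x[j])                   # respondents with x_i = g = x_j
    e = np.zeros(len(x))
    for h in range(1, H + 1):
        cell = r & (z == h)
        if not cell.any():
            continue
        e1 = w[cell].sum() / w[r].sum()              # E(d1_{h|j}) = pi_hat_{h|g}
        e2 = np.where(cell, w, 0.0) / w[cell].sum()  # E(d2_{i|hg}), given cell h drawn
        e += e1 * e2
    return e
```

For unit 5 of the illustration this returns $(0, 1/3, 1/3, 1/3, 0)$, the same column as the FEFI fractional weights $w^*_{ij}$, and the entries sum to 1.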
Replication variance estimation
Replicates for FEFI
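$$\hat{\mu}^{(k)}_{y,FEFI} = \sum_{g} \sum_{h} \hat{\pi}^{(k)}_{gh} \bar{y}^{(k)}_{gh}$$
where
$\hat{\pi}^{(k)}_{gh}$ = replicated cell fraction for $gh$ using $w^{(k)}_i$
$$\bar{y}^{(k)}_{gh} = \frac{\sum_{i \in A} w^{(k)}_i \delta_i I(x_i = g, z_i = h) y_i}{\sum_{i \in A} w^{(k)}_i \delta_i I(x_i = g, z_i = h)} = \text{replicated cell mean}$$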
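The slides do not say which replication method generates $w^{(k)}_i$. The sketch below assumes a delete-one jackknife, with $w^{(k)}_i = w_i \, n/(n-1)$ for $i \neq k$, $w^{(k)}_k = 0$, and variance $\frac{n-1}{n} \sum_k (\hat{\mu}^{(k)} - \hat{\mu})^2$. It also assumes $\hat{\pi}^{(k)}_{gh}$ is the replicated share of $x = g$ times the replicated $\hat{\pi}_{h|g}$. The names `mu_fefi_cells` and `jackknife_fefi` are hypothetical.

```python
import numpy as np

def mu_fefi_cells(x, z, y, delta, w, G, H):
    """sum_g sum_h pi_gh * ybar_gh with weights w (NumPy arrays expected).

    ASSUMPTION: pi_gh = (weighted share of x = g) * pi_hat_{h|g}.
    Groups or cells without respondents are skipped.
    """
    total = 0.0
    for g in range(1, G + 1):
        r = (delta == 1) & (x == g)
        if w[r].sum() == 0:
            continue
        share_g = w[x == g].sum() / w.sum()
        for h in range(1, H + 1):
            cell = r & (z == h)
            if w[cell].sum() == 0:
                continue
            ybar_gh = (w[cell] * y[cell]).sum() / w[cell].sum()
            total += share_g * (w[cell].sum() / w[r].sum()) * ybar_gh
    return total

def jackknife_fefi(x, z, y, delta, w, G, H):
    """Return (estimate, delete-one jackknife variance); the method is an ASSUMPTION."""
    x, z, y, delta = map(np.asarray, (x, z, y, delta))
    w = np.asarray(w, dtype=float)
    n = len(w)
    full = mu_fefi_cells(x, z, y, delta, w, G, H)
    reps = np.empty(n)
    for k in range(n):
        w_k = w * n / (n - 1)          # replicate weights: drop unit k, rescale the rest
        w_k[k] = 0.0
        reps[k] = mu_fefi_cells(x, z, y, delta, w_k, G, H)
    return full, (n - 1) / n * np.sum((reps - full) ** 2)
```

As a sanity check, with full response and a single cell the estimator is the sample mean and this jackknife variance reduces to $s^2/n$.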
Replication variance estimation
Replicates for random FI
The fractional weights $w^*_{ij} = 1/M$ are replicated to reflect the variation due to sampling of donors.
Imputed values remain the same:
$$\hat{\mu}^{(k)}_{y,FI} = \sum_{g=1}^{G} \sum_{j \in A} w^{(k)}_j I(x_j = g) \left\{ \delta_j y_j + (1 - \delta_j) \sum_{i} w^{*(k)}_{ij} y_i \right\}$$
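The replicate estimator can be evaluated as in the sketch below (Python with NumPy; `mu_fi_replicate` and the `(j, i, frac_k)` donor-table layout are hypothetical). The slides do not give the rule for computing $w^{*(k)}_{ij}$, so the sketch takes those weights as inputs. What it does show is that the donor values $y_i$ stay fixed while only $w^{(k)}_j$ and $w^{*(k)}_{ij}$ change.

```python
import numpy as np

def mu_fi_replicate(y, delta, w_k, donor_table):
    """Evaluate mu_hat^(k)_{y,FI} for one replicate k (sketch).

    donor_table: (j, i, frac_k) triples, one per imputed value: unit i donates
    y_i to nonrespondent j with replicated fractional weight w*_ij^(k),
    supplied by the caller. Imputed values y_i are held fixed.
    """
    y = np.asarray(y, dtype=float)
    delta = np.asarray(delta)
    w_k = np.asarray(w_k, dtype=float)
    filled = np.where(delta == 1, y, 0.0)
    for j, i, frac_k in donor_table:
        filled[j] += frac_k * y[i]                 # sum_i w*_ij^(k) y_i
    # ASSUMPTION: normalize by sum(w_k) to report a mean
    return (w_k * filled).sum() / w_k.sum()
```

With $w^{(k)} = w$ and $w^{*(k)}_{ij} = 1/M$, the function returns the full-sample FI estimate; for example, if units 2, 3 and 4 are the $M = 3$ donors for unit 5 of the illustration, it returns $19/15$.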
Extension to Multivariate Case
Define categories for each variable
Estimate the cell proportions $\pi_{ght}$
Select donors using the fractional weights of the FEFI method
Simulation
Trivariate data:
$Y_1 \sim U(0, 2)$
$Y_2 = 1 + Y_1 + e_2$, $e_2 \sim N(0, 1/4)$
$Y_3 = 2 + Y_1 + 0.5 Y_2 + e_3$, $e_3 \sim N(0, 1)$
Response indicators are generated from Bernoulli distributions with $p = (0.5, 0.7, 0.9)$ for $(Y_1, Y_2, Y_3)$, respectively.
Each of $Y_1$, $Y_2$, and $Y_3$ was transformed into a categorical variable (basically with 3 categories).
$B = 2000$ simulation samples, each of size $n = 300$.
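One simulated sample under the design above can be generated as sketched below (Python with NumPy; `simulate_sample` is a hypothetical name). Several details are assumptions not stated on the slide: the second argument of $N(\cdot, \cdot)$ is read as a variance (so $e_2$ has standard deviation 0.5), response indicators are independent across units and variables, and the three categories use the respondents' tertiles as cut points.

```python
import numpy as np

def simulate_sample(n, rng):
    """Generate one trivariate sample with item nonresponse (sketch)."""
    y1 = rng.uniform(0.0, 2.0, n)
    y2 = 1.0 + y1 + rng.normal(0.0, 0.5, n)            # e2 ~ N(0, 1/4), i.e. sd 0.5
    y3 = 2.0 + y1 + 0.5 * y2 + rng.normal(0.0, 1.0, n)
    y = np.column_stack([y1, y2, y3])
    # ASSUMPTION: independent Bernoulli response indicators, p = (0.5, 0.7, 0.9)
    delta = rng.random((n, 3)) < np.array([0.5, 0.7, 0.9])
    # ASSUMPTION: 3 categories per variable, cut at respondents' tertiles
    cats = np.zeros((n, 3), dtype=int)
    for v in range(3):
        cuts = np.quantile(y[delta[:, v], v], [1 / 3, 2 / 3])
        cats[:, v] = np.searchsorted(cuts, y[:, v]) + 1   # categories 1, 2, 3
    y_obs = np.where(delta, y, np.nan)                  # hide nonrespondents' values
    cats_obs = np.where(delta, cats, 0)                 # 0 marks a missing category
    return y_obs, cats_obs, delta
```

Repeating this $B = 2000$ times with $n = 300$ reproduces the sampling setup; the imputation and variance steps would then be applied to each sample.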
Monte Carlo Results (2,000 samples)
Parameter                       Estimator   Std Variance   Rel Bias of $\hat{V}(\hat{\mu})$
Mean $Y_1$                      FEFI        1.59           -2.9
                                FI          1.63           -2.8
Mean $Y_3$                      FEFI        1.08            0.3
                                FI          1.09            0.2
Proportion $P(Y_1 < 1, Y_2 < 2)$  FEFI      1.48            5.1
                                FI          1.51            5.7
Std Variance: variance relative to that of the full-sample estimator
Summary
The imputation procedure approximates the conditional distribution
Nearly unbiased under the cell mean model
The replication variance estimator performed well
The end