Propensity score adjustment with several followups Jae-Kwang Kim August 1, 2012

Jae-Kwang Kim, Iowa State University
Joint work with Jongho Im at Iowa State University
Call For Papers
Statistics and Its Interface (http://www.intlpress.com/SII/)
Special issue on “missing data & imputation”
Target date of publication: June 2013.
Submitted papers will go through the same review process, but the
final decision will be made relatively quickly.
Please be sure to mention the special issue on Missing data &
Imputation at the time of submission.
Kim (ISU)
Combining data
August 1, 2012
2 / 22
Outline
1. Introduction (with a motivating example)
2. Proposed method
3. Application to Korean Labor Force survey data
4. Discussion
1. Introduction
A Motivating Example
2009 Local Area Labor Force survey in Korea.
Large scale survey with about n = 157K sample households.
Obtain the employment status: Employed, Unemployed, Not in labor
force.
To obtain response, interviewers visit the sample households up to
four times. That is, the current rule allows for three follow-ups.
Each nonresponding household after the three follow-ups is replaced
by a household randomly selected by a pre-specified rule.
(substitution rule)
A Motivating Example (Cont’d)
Realized response rate (%) and the average unemployment rate (%)
for the sample, by visit of first response

                        First response at t-th visit
                        t = 1    t = 2    t = 3    t = 4    No response
Response rate (%)       43.08    24.13    14.53     8.41       9.85
Ave. unemp. rate (%)     1.81     1.98     2.08     2.15         ?
Roughly speaking, early respondents have lower unemployment rates
than late respondents.
Drew and Fuller (1980)

                First response at t-th visit
                t=1     t=2     t=3     t=4     No response
Employed        n11     n12     n13     n14
Unemployed      n21     n22     n23     n24         n0
Not in LF       n31     n32     n33     n34
Drew and Fuller (1980) proposed maximum likelihood (ML) estimation based
on the multinomial distribution, where the cell probabilities are functions
of the population proportion for category k, the fraction of hardcore
nonrespondents, and the conditional probability that a unit in
category k responds when sampled.
That is, Drew and Fuller (1980) assumed the following model:
$$E(n_{kt}/n) = \gamma (1 - p_k)^{t-1} p_k f_k$$
$$E(n_0/n) = (1 - \gamma) + \gamma \sum_{k=1}^{3} (1 - p_k)^4 f_k$$
where
$f_k$ = population proportion for category k
$1 - \gamma$ = fraction of hardcore nonresponse
$p_k$ = the conditional probability that the unit in category k responds when sampled
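As a quick check of the Drew and Fuller (1980) model above, the following minimal sketch evaluates the cell probabilities for three categories and four visits and confirms that they sum to one. The function name and the parameter values are hypothetical and chosen only for illustration; they are not estimates from the survey.

```python
import numpy as np

def drew_fuller_cell_probs(f, p, gamma, n_visits=4):
    """Return (cell, p0): cell[k, t-1] = E(n_kt / n) and p0 = E(n_0 / n)."""
    f = np.asarray(f, dtype=float)
    p = np.asarray(p, dtype=float)
    t = np.arange(1, n_visits + 1)
    # gamma * (1 - p_k)^(t-1) * p_k * f_k for each category k and visit t
    cell = gamma * (1.0 - p)[:, None] ** (t - 1)[None, :] * (p * f)[:, None]
    # hardcore nonrespondents plus units that did not respond in n_visits visits
    p0 = (1.0 - gamma) + gamma * np.sum((1.0 - p) ** n_visits * f)
    return cell, p0

# Hypothetical parameter values (assumptions for illustration only)
f = [0.55, 0.02, 0.43]      # population proportions, must sum to 1
p = [0.50, 0.45, 0.50]      # per-visit response probabilities by category
gamma = 0.95                # 1 - gamma = fraction of hardcore nonresponse

cell, p0 = drew_fuller_cell_probs(f, p, gamma)
total = cell.sum() + p0
print(cell.round(4))
print(round(p0, 4), total)
```

The total equals one because the geometric terms over t = 1, ..., 4 add up to 1 - (1 - p_k)^4 for each category, which the no-response cell complements.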
Alho (1990) extends the method of Drew and Fuller (1980) to the
case of continuous yi with a logistic regression model for the
conditional probability of response
$$p_{it} = P(\delta_{it} = 1 \mid \delta_{i,t-1} = 0, y_i) = \frac{\exp(\alpha_t + \phi y_i)}{1 + \exp(\alpha_t + \phi y_i)}$$
where δit = 1 if yi is observed (or already known) at the t-th visit and
δit = 0 otherwise.
Parameter φ is estimated by maximizing a conditional likelihood.
Parameter αt is estimated by solving an estimating equation that
makes use of the realized response rate for each visit.
2. Proposed method
Basic Setup
A: the set of indices for the original sample
$A_t$: the set of indices for the units whose $y_i$ values are available after
the t-th visit.
$$A_1 \subset A_2 \subset \cdots \subset A_T \subset A$$
where T is the final visit time.
This is another type of monotone missing structure.
Classical monotone missing structure
[Figure: diagram with regions labeled A1, A2, A3]
Monotone missing structure for the follow-up samples
[Figure: diagram with regions labeled A1, A2, A3, A]
Basic Setup
The goal is to find $\hat{\pi}_i = \hat{\pi}(y_i)$ such that
$$\hat{\theta} = \sum_{i \in A_T} w_i \frac{1}{\hat{\pi}_i} y_i$$
is (approximately) unbiased for $\theta = E\left(\sum_{i \in A} w_i y_i\right)$.
This weighting method is often called the propensity score weighting
method.
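To illustrate why weighting by the inverse response probability removes the bias, here is a minimal simulation sketch. It assumes the true response probabilities are known (in practice they must be estimated), uses unit weights, and all numbers are hypothetical.

```python
import numpy as np

def ps_weighted_total(w, y, pi_hat):
    """Sum of w_i * y_i / pi_hat_i over the final respondents A_T."""
    w, y, pi_hat = (np.asarray(a, dtype=float) for a in (w, y, pi_hat))
    return np.sum(w * y / pi_hat)

rng = np.random.default_rng(0)
n = 200_000
y = rng.binomial(1, 0.03, size=n)        # hypothetical binary study variable
w = np.ones(n)                           # assumed equal sampling weights

# Assumed nonignorable response: units with y = 1 respond less often
pi = np.where(y == 1, 0.80, 0.92)
resp = rng.random(n) < pi                # indicator of membership in A_T

full_sample_total = np.sum(w * y)                     # target: sum over A
naive_total = np.sum(w[resp] * y[resp])               # ignores nonresponse
adjusted_total = ps_weighted_total(w[resp], y[resp], pi[resp])

print(full_sample_total, naive_total, round(adjusted_total, 1))
```

The unadjusted respondent total understates the full-sample total, while the weighted total is close to it up to simulation noise.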
Basic Setup (Cont’d)
Let δit = 1 if i ∈ At and δit = 0 otherwise.
The propensity score π̂i can be obtained from a model for
πi = P(δiT = 1 | i ∈ A, yi ).
Under the model of Alho (1990), we have
$$\hat{\pi}_i = 1 - \prod_{t=1}^{T} (1 - \hat{p}_{it})$$
where
$$\hat{p}_{it} = \frac{\exp(\hat{\alpha}_t + \hat{\phi} y_i)}{1 + \exp(\hat{\alpha}_t + \hat{\phi} y_i)}.$$
Note that π̂i → 1 if T → ∞.
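The overall propensity under Alho's model can be computed directly from the per-visit probabilities. The sketch below assumes the estimates are already available; the function name and the values of the parameters are hypothetical.

```python
import numpy as np

def alho_overall_propensity(alpha_hat, phi_hat, y):
    """pi_hat_i = 1 - prod_t (1 - p_hat_it), with logit(p_hat_it) = alpha_t + phi * y_i."""
    alpha_hat = np.asarray(alpha_hat, dtype=float)   # shape (T,)
    y = np.asarray(y, dtype=float)                   # shape (n,)
    eta = alpha_hat[None, :] + phi_hat * y[:, None]  # shape (n, T)
    p = 1.0 / (1.0 + np.exp(-eta))                   # p_hat_it
    return 1.0 - np.prod(1.0 - p, axis=1)

# Hypothetical estimates for T = 4 visits (assumptions, not survey results)
alpha_hat = [-0.3, -0.5, -0.6, -0.7]
phi_hat = -0.1

pi_hat = alho_overall_propensity(alpha_hat, phi_hat, [0.0, 1.0])
pi_hat_5 = alho_overall_propensity(alpha_hat + [-0.7], phi_hat, [0.0, 1.0])
print(pi_hat.round(4), pi_hat_5.round(4))
```

Adding a fifth visit with the same kind of per-visit probability increases every overall propensity, which is the mechanism behind the limit as T grows.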
Alho (1990) method
1. Variance estimation is complicated. (It was not covered in the paper.)
2. Not directly applicable to complex survey sampling.
3. Based on a particular choice of the response model,
   $\mathrm{logit}(p_{it}) = \alpha_t + \phi y_i$.
A new approach
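Alho’s model
$$\Pr(\delta_{it} = 1 \mid \delta_{i,t-1} = 0, y_i) = \frac{\exp(\alpha_t + \phi y_i)}{1 + \exp(\alpha_t + \phi y_i)}$$
New model
$$\Pr(\delta_{it} = 1 \mid \delta_{i,t+1} = 1, y_i) = \frac{\exp(\alpha_t^* + \phi^* y_i)}{1 + \exp(\alpha_t^* + \phi^* y_i)}$$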
The proposed model is based on the reverse of the multi-phase
sampling structure in that At ⊂ At+1 . That is, we treat At as the
second phase sample from At+1 .
Propensity score estimator under the new model
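$$\hat{\theta} = \sum_{i \in A_T} w_i \frac{1}{\hat{\pi}_i} y_i$$
where
$$\hat{\pi}_i = \frac{\exp(\hat{\alpha}_t^* + \hat{\phi}^* y_i)}{1 + \exp(\hat{\alpha}_t^* + \hat{\phi}^* y_i)}.$$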
The new model can be called the reverse (two-phase) model.
Estimation of (αt∗ , φ∗ )
The new model (reverse model) can be generalized as
$$\Pr(\delta_{it} = 1 \mid \delta_{i,t+1} = 1, y_i) = \frac{\exp(\alpha_t^* + \phi_t^* y_i)}{1 + \exp(\alpha_t^* + \phi_t^* y_i)}.$$
Estimable parameters: $\alpha_1^*, \cdots, \alpha_T^*$ and $\phi_1^*, \cdots, \phi_{T-1}^*$.
For example, $(\alpha_t^*, \phi_t^*)$ can be obtained by solving
$$\sum_{i \in A_{t+1}} w_i \{\delta_{it} - p_i(\alpha_t^*, \phi_t^*)\}(1, y_i) = (0, 0)$$
or by solving
$$\sum_{i \in A_{t+1}} w_i \left\{\frac{\delta_{it}}{p_i(\alpha_t^*, \phi_t^*)} - 1\right\}(1, y_i) = (0, 0).$$
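The first estimating equation is a weighted logistic score equation restricted to the units in A_{t+1}, so it can be solved with Newton-Raphson. The sketch below is one possible solver, not the authors' implementation; the function name, the simulated data, and the true parameter values are assumptions for illustration.

```python
import numpy as np

def fit_reverse_step(delta, y, w, n_iter=25):
    """Solve sum_i w_i {delta_i - p_i(alpha, phi)} (1, y_i) = (0, 0) over A_{t+1}."""
    delta, y, w = (np.asarray(a, dtype=float) for a in (delta, y, w))
    X = np.column_stack([np.ones_like(y), y])
    beta = np.zeros(2)                        # (alpha_t*, phi_t*), started at zero
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        score = X.T @ (w * (delta - p))       # left-hand side of the equation
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X
        step = np.linalg.solve(hess, score)   # Newton-Raphson update
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

# Simulated units in A_{t+1}; delta marks those already in A_t (hypothetical values)
rng = np.random.default_rng(2)
n = 50_000
y = rng.binomial(1, 0.3, size=n).astype(float)
alpha_true, phi_true = 0.5, -0.3
delta = (rng.random(n) < 1.0 / (1.0 + np.exp(-(alpha_true + phi_true * y)))).astype(float)
w = np.ones(n)

alpha_hat, phi_hat = fit_reverse_step(delta, y, w)
p_hat = 1.0 / (1.0 + np.exp(-(alpha_hat + phi_hat * y)))
residual = np.array([np.sum(w * (delta - p_hat)), np.sum(w * (delta - p_hat) * y)])
print(round(alpha_hat, 3), round(phi_hat, 3), residual)
```

Because y_i is observed for every unit in A_{t+1}, this step needs no information on the final nonrespondents.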
Thus, we can test
$$H_0 : E(\hat{\phi}_1^*) = E(\hat{\phi}_2^*) = \cdots = E(\hat{\phi}_{T-1}^*).$$
If we cannot reject $H_0$, then a common parameter $\phi^*$ can be
estimated by solving
$$\sum_{i \in A_{t+1}} w_i \{\delta_{it} - p_i(\alpha_t^*, \phi^*)\}(1, y_i) = (0, 0), \quad t = 1, \cdots, T - 1,$$
using the generalized method of moments (GMM).
If $H_0$ is rejected, then we may consider a model such as
$\hat{\phi}_t^* = \beta_0 + \beta_1 t + e_t$ and use GMM estimation for $(\alpha_t^*, \beta_0, \beta_1)$.
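With a common slope there are 2(T - 1) moment conditions but only T unknowns, so the system is over-identified. The following sketch shows a one-step GMM with an identity weighting matrix, solved by Gauss-Newton with step halving. The weighting matrix, the scaling of each stage by its total weight, the solver, and the simulated data are all assumptions made for illustration; the slides do not specify them.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def stacked_moments(theta, stages):
    """Moments for t = 1..T-1 and their Jacobian; theta = (alpha_1..alpha_{T-1}, phi)."""
    phi = theta[-1]
    g, J = [], []
    for t, (delta, y, w) in enumerate(stages):
        X = np.column_stack([np.ones_like(y), y])
        p = expit(theta[t] + phi * y)
        n_w = w.sum()
        g.append(X.T @ (w * (delta - p)) / n_w)
        v = w * p * (1.0 - p)
        Jt = np.zeros((2, len(theta)))
        Jt[:, t] = -(X.T @ v) / n_w           # derivative w.r.t. alpha_t
        Jt[:, -1] = -(X.T @ (v * y)) / n_w    # derivative w.r.t. common phi
        J.append(Jt)
    return np.concatenate(g), np.vstack(J)

def gmm_common_phi(stages, n_iter=50):
    """Minimize g'g (identity weight) by Gauss-Newton with step halving."""
    theta = np.zeros(len(stages) + 1)
    g, J = stacked_moments(theta, stages)
    for _ in range(n_iter):
        step = np.linalg.lstsq(J, -g, rcond=None)[0]
        for _halving in range(30):
            cand = theta + step
            g_new, J_new = stacked_moments(cand, stages)
            if g_new @ g_new <= g @ g:
                break
            step = step / 2.0
        theta, g, J = cand, g_new, J_new
        if np.max(np.abs(step)) < 1e-10:
            break
    return theta

# Simulate nested sets A_1, A_2, A_3 inside A_4 backwards from the reverse model
rng = np.random.default_rng(1)
n, alpha_true, phi_true = 100_000, [0.6, 0.9, 1.2], -0.3   # hypothetical values
y = rng.binomial(1, 0.3, size=n).astype(float)
in_next = np.ones(n, dtype=bool)               # units in A_T, with T = 4
stages = [None] * 3
for t in (2, 1, 0):                            # generate A_3, then A_2, then A_1
    delta = in_next & (rng.random(n) < expit(alpha_true[t] + phi_true * y))
    stages[t] = (delta[in_next].astype(float), y[in_next], np.ones(in_next.sum()))
    in_next = delta

theta_hat = gmm_common_phi(stages)
print(theta_hat.round(3))                      # (alpha_1, alpha_2, alpha_3, phi)
```

An efficient two-step GMM would replace the identity weight with the inverse of an estimated covariance matrix of the moments; that refinement is omitted here to keep the sketch short.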
3. Application to Korean LF survey data
Realized Responses from the Korean LF survey data

Status          t=1       t=2       t=3       t=4      No response
Employment      81,685    46,926    28,124    15,992
Unemployment     1,509       948       597       352      32,350
Not in LF       57,882    32,308    19,086    10,790
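The counts above can be used to recompute the unemployment rate by visit of first response. The sketch assumes the rate is defined as unemployed divided by employed plus unemployed, with no sampling weights; under that assumption it reproduces the per-visit rates shown in the introduction and the naive estimate reported later.

```python
import numpy as np

# Realized counts by visit of first response (t = 1, 2, 3, 4)
employed = np.array([81685, 46926, 28124, 15992])
unemployed = np.array([1509, 948, 597, 352])
not_in_lf = np.array([57882, 32308, 19086, 10790])
no_response = 32350

# Assumed definition: unemployed / (employed + unemployed), in percent
rate_by_visit = 100 * unemployed / (employed + unemployed)
naive_rate = 100 * unemployed.sum() / (employed.sum() + unemployed.sum())

print(rate_by_visit.round(2))  # [1.81 1.98 2.08 2.15]
print(round(naive_rate, 2))    # 1.93
```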
Parameter estimation under generalized reverse model
$$\Pr(\delta_{it} = 1 \mid \delta_{i,t+1} = 1, y_i) = \frac{\exp(\alpha_t^* + \phi_t^* y_i)}{1 + \exp(\alpha_t^* + \phi_t^* y_i)},$$
where yi = 1 if unemployed and yi = 0 otherwise.
Result from data analysis

         $\hat{\phi}_t^*$     95% C.I. for $\phi_t^*$
t=1      -0.112      (-0.191, -0.031)
t=2      -0.112      (-0.200, -0.025)
t=3      -0.110      (-0.219, -0.002)
Thus, we can safely use the following reverse model
$$\Pr(\delta_{it} = 1 \mid \delta_{i,t+1} = 1, y_i) = \frac{\exp(\alpha_t^* + \phi^* y_i)}{1 + \exp(\alpha_t^* + \phi^* y_i)},$$
where yi = 1 if unemployed and yi = 0 otherwise.
Estimate of $\phi^*$ (obtained by GMM):
$\hat{\phi}^* = -0.110$ with 95% C.I. = (-0.163, -0.056).
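Because y_i is binary, the slope of the reverse model can be read as a log odds ratio. The short sketch below converts the reported estimate and interval to the odds-ratio scale; the interpretation in the comments follows from the model form and is not a statement made in the slides.

```python
import math

phi_hat = -0.110
ci = (-0.163, -0.056)

# Odds of having responded by visit t, among those responding by visit t+1,
# for unemployed units (y = 1) relative to all other units (y = 0)
odds_ratio = math.exp(phi_hat)
odds_ratio_ci = tuple(math.exp(bound) for bound in ci)

print(round(odds_ratio, 3), tuple(round(v, 3) for v in odds_ratio_ci))
```

An odds ratio below one is in line with the earlier observation that early respondents have lower unemployment rates than late respondents.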
Estimation of θ

          Unemployment Rate (%)    S.E. (×10^4)
Naive            1.93                 3.47
Alho             2.00                 N/A
New              1.96                 3.42
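As a rough plausibility check on the new estimator, the sketch below recomputes the adjusted unemployment rate from the realized counts and the reported slope of -0.110. It rests on several assumptions that are not stated in the slides: unit weights, the final-step intercept chosen so that the inverse propensities of the respondents sum to the full sample size, and the rate defined as unemployed over employed plus unemployed. Under these assumptions the result lands close to the reported 1.96, but it should not be read as the authors' exact computation.

```python
import numpy as np

# Counts summed over visits t = 1..4 from the realized-response table
employed = 81685 + 46926 + 28124 + 15992
unemployed = 1509 + 948 + 597 + 352
not_in_lf = 57882 + 32308 + 19086 + 10790
no_response = 32350
n = employed + unemployed + not_in_lf + no_response

phi_hat = -0.110  # reported GMM estimate of the common slope

# Assumption: the intercept alpha solves sum_{i in A_T} 1 / pi_i = n with w_i = 1.
# Since 1 / pi_i = 1 + exp(-alpha - phi * y_i), this is linear in u = exp(-alpha).
n_y1 = unemployed                 # respondents with y_i = 1
n_y0 = employed + not_in_lf       # respondents with y_i = 0
u = no_response / (n_y1 * np.exp(-phi_hat) + n_y0)
inv_pi_y1 = 1.0 + u * np.exp(-phi_hat)
inv_pi_y0 = 1.0 + u
weighted_n = n_y1 * inv_pi_y1 + n_y0 * inv_pi_y0   # should equal n

adj_unemployed = unemployed * inv_pi_y1
adj_employed = employed * inv_pi_y0
adjusted_rate = 100 * adj_unemployed / (adj_unemployed + adj_employed)
naive_rate = 100 * unemployed / (unemployed + employed)

print(round(naive_rate, 3), round(adjusted_rate, 3))
```

The adjustment raises the rate above the naive value because unemployed respondents receive slightly larger weights than the others.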
4. Conclusion
Nonignorable nonresponse is a challenging problem.
A follow-up sample is often used to investigate nonignorable
nonresponse.
If the follow-up sample also has missing data, then a response model is
needed.
Alho’s model may not hold in real data.
The proposed model provides an alternative approach that is more
flexible in the modeling and is also simple in estimation.