Propensity score adjustment with several followups

Jae-Kwang Kim (1)
Iowa State University
August 1, 2012

(1) Joint work with Jongho Im at Iowa State University

Call For Papers

Statistics and Its Interface (http://www.intlpress.com/SII/)
Special issue on “missing data & imputation”
Target date of publication: June 2013.
Submitted papers will go through the same review process, but the final decision will be made relatively quickly.
Please be sure to mention the special issue on Missing data & Imputation at the time of submission.

Kim (ISU) Combining data August 1, 2012

Outline

1 Introduction (with a motivating example)
2 Proposed method
3 Application to Korean Labor Force survey data
4 Discussion

1. Introduction

A Motivating Example

2009 Local Area Labor Force survey in Korea.
Large-scale survey with about n = 157K sample households.
Obtain the employment status: Employed, Unemployed, Not in labor force.
To obtain a response, interviewers visit the sample households up to four times. That is, the current rule allows for three follow-ups.
Each household that is still nonresponding after the three follow-ups is replaced by a household randomly selected by a pre-specified rule (substitution rule).

A Motivating Example (Cont’d)

Realized response rate (%) and the average unemployment rate (%) for the sample

                     First response at t-th visit        No
                     t = 1    t = 2    t = 3    t = 4    response
Response rate        43.08    24.13    14.53     8.41    9.85
Ave. unemp. rate      1.81     1.98     2.08     2.15    ?

Roughly speaking, early respondents have lower unemployment rates than late respondents.

A Motivating Example (Cont’d): Drew and Fuller (1980)

                 First response at t-th visit      No
                 t=1    t=2    t=3    t=4          response
Employed         n11    n12    n13    n14
Unemployed       n21    n22    n23    n24          n0
Not in LF        n31    n32    n33    n34

Drew and Fuller (1980) proposed ML estimation based on the multinomial distribution, where the cell probabilities are functions of the population proportion for category k, the fraction of hardcore nonrespondents, and the conditional probability that a unit in category k responds when sampled.

A Motivating Example (Cont’d)

That is, Drew and Fuller (1980) assumed the following model:

$$E(n_{kt}/n) = \gamma (1 - p_k)^{t-1} p_k f_k$$

$$E(n_0/n) = (1 - \gamma) + \gamma \sum_{k=1}^{3} (1 - p_k)^4 f_k$$

where
$f_k$ = population proportion for category k,
$1 - \gamma$ = fraction of hardcore nonrespondents,
$p_k$ = the conditional probability that a unit in category k responds when sampled.

Alho (1990) extends the method of Drew and Fuller (1980) to the case of continuous $y_i$ with a logistic regression model for the conditional probability of response

$$p_{it} = P(\delta_{it} = 1 \mid \delta_{i,t-1} = 0, y_i) = \frac{\exp(\alpha_t + \phi y_i)}{1 + \exp(\alpha_t + \phi y_i)}$$

where $\delta_{it} = 1$ if $y_i$ is observed (or already known) at the t-th visit and $\delta_{it} = 0$ otherwise.
Parameter $\phi$ is estimated by maximizing a conditional likelihood.
Parameter $\alpha_t$ is estimated by solving an estimating equation that makes use of the realized response rate for each visit.

2. Proposed method

Basic Setup

$A$: the set of indices for the original sample.
$A_t$: the set of indices for the units whose $y_i$ values are available after the t-th visit.
$A_1 \subset A_2 \subset \cdots \subset A_T \subset A$, where $T$ is the final visit time.
It is another type of monotone missing data.

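To make the Drew and Fuller (1980) cell probabilities from the introduction concrete, the following minimal sketch evaluates $E(n_{kt}/n)$ and $E(n_0/n)$ for given parameters. The function name and the parameter values are hypothetical, chosen only for illustration; they are not estimates from the survey.

```python
def drew_fuller_cell_probs(f, p, gamma, n_visits=4):
    """Expected cell proportions under the Drew and Fuller (1980) model.

    f[k]  : population proportion for category k
    p[k]  : probability that a unit in category k responds when sampled
    gamma : 1 - (fraction of hardcore nonrespondents)
    """
    # E(n_kt / n) = gamma * (1 - p_k)^(t-1) * p_k * f_k
    cells = [
        [gamma * (1.0 - p_k) ** (t - 1) * p_k * f_k
         for t in range(1, n_visits + 1)]
        for f_k, p_k in zip(f, p)
    ]
    # E(n_0 / n) = (1 - gamma) + gamma * sum_k (1 - p_k)^n_visits * f_k
    no_response = (1.0 - gamma) + gamma * sum(
        (1.0 - p_k) ** n_visits * f_k for f_k, p_k in zip(f, p)
    )
    return cells, no_response


# Hypothetical values (assumptions, not taken from the slides).
f = [0.60, 0.02, 0.38]   # Employed, Unemployed, Not in LF
p = [0.45, 0.40, 0.45]
gamma = 0.95

cells, no_response = drew_fuller_cell_probs(f, p, gamma)
total = sum(sum(row) for row in cells) + no_response
print(round(total, 12))
```

When the $f_k$ sum to one, the twelve respondent cells and the nonresponse cell together sum to one, which is what makes the multinomial likelihood well defined.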
Classical monotone missing structure

[Figure: diagram with sets labeled A1, A2, A3]

Monotone missing structure for the follow-up samples

[Figure: diagram with sets labeled A1, A2, A3, A]

Basic Setup

The goal is to find $\hat\pi_i = \hat\pi(y_i)$ such that

$$\hat\theta = \sum_{i \in A_T} w_i \frac{1}{\hat\pi_i} y_i$$

is (approximately) unbiased for $\theta = E\left(\sum_{i \in A} w_i y_i\right)$.
This weighting method is often called the propensity score weighting method.

Basic Setup (Cont’d)

Let $\delta_{it} = 1$ if $i \in A_t$ and $\delta_{it} = 0$ otherwise.
The propensity score $\hat\pi_i$ can be obtained from a model for $\pi_i = P(\delta_{iT} = 1 \mid i \in A, y_i)$.
Under the model of Alho (1990), we have

$$\hat\pi_i = 1 - \prod_{t=1}^{T} (1 - \hat p_{it})$$

where

$$\hat p_{it} = \frac{\exp(\hat\alpha_t + \hat\phi y_i)}{1 + \exp(\hat\alpha_t + \hat\phi y_i)}.$$

Note that $\hat\pi_i \to 1$ if $T \to \infty$.

Alho (1990) method

1 Variance estimation is complicated. (It was not covered in the paper.)
2 Not directly applicable to complex survey sampling.
3 Based on a particular choice of the response model $\mathrm{logit}(p_{it}) = \alpha_t + \phi y_i$.

A new approach

Alho’s model:

$$\Pr(\delta_{it} = 1 \mid \delta_{i,t-1} = 0, y_i) = \frac{\exp(\alpha_t + \phi y_i)}{1 + \exp(\alpha_t + \phi y_i)}$$

New model:

$$\Pr(\delta_{it} = 1 \mid \delta_{i,t+1} = 1, y_i) = \frac{\exp(\alpha_t^* + \phi^* y_i)}{1 + \exp(\alpha_t^* + \phi^* y_i)}$$

The proposed model is based on the reverse of the multi-phase sampling structure in that $A_t \subset A_{t+1}$. That is, we treat $A_t$ as the second-phase sample from $A_{t+1}$.

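For comparison with the new model, the Alho-type propensity score $\hat\pi_i = 1 - \prod_t (1 - \hat p_{it})$ and the propensity score weighting estimator from the basic setup can be sketched as follows. This is only an illustration: the function names, the parameter values, and the toy data are assumptions, not quantities from the slides.

```python
import math


def expit(x):
    """Logistic function exp(x) / (1 + exp(x))."""
    return 1.0 / (1.0 + math.exp(-x))


def alho_propensity(y, alphas, phi):
    """pi_i = 1 - prod_t (1 - p_it), with logit(p_it) = alpha_t + phi * y."""
    prob_never_responds = 1.0
    for alpha_t in alphas:
        prob_never_responds *= 1.0 - expit(alpha_t + phi * y)
    return 1.0 - prob_never_responds


def ps_weighted_total(w, y, pi_hat):
    """theta_hat = sum over respondents of w_i * y_i / pi_hat_i."""
    return sum(w_i * y_i / pi_i for w_i, y_i, pi_i in zip(w, y, pi_hat))


# Hypothetical parameter values for T = 4 visits (assumptions).
alphas = [-0.3, -0.3, -0.3, -0.3]
phi = -0.1

pi_y0 = alho_propensity(0.0, alphas, phi)
pi_y1 = alho_propensity(1.0, alphas, phi)

# Toy respondent data (assumed): design weights and binary outcomes.
w = [10.0, 10.0, 12.0, 8.0]
y = [1.0, 0.0, 0.0, 1.0]
pi_hat = [alho_propensity(y_i, alphas, phi) for y_i in y]
theta_hat = ps_weighted_total(w, y, pi_hat)
```

With a negative $\phi$, units with $y_i = 1$ get a smaller $\hat\pi_i$ and therefore a larger adjusted weight, and adding visits can only increase $\hat\pi_i$, in line with the note that $\hat\pi_i \to 1$ as $T \to \infty$.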
Propensity score estimator under the new model

$$\hat\theta = \sum_{i \in A_T} w_i \frac{1}{\hat\pi_i} y_i$$

where

$$\hat\pi_i = \frac{\exp(\hat\alpha_t^* + \hat\phi^* y_i)}{1 + \exp(\hat\alpha_t^* + \hat\phi^* y_i)}.$$

The new model can be called the reverse (two-phase) model.

Estimation of $(\alpha_t^*, \phi^*)$

The new model (reverse model) can be generalized as

$$\Pr(\delta_{it} = 1 \mid \delta_{i,t+1} = 1, y_i) = \frac{\exp(\alpha_t^* + \phi_t^* y_i)}{1 + \exp(\alpha_t^* + \phi_t^* y_i)}.$$

Estimable parameters: $\alpha_1^*, \cdots, \alpha_T^*$ and $\phi_1^*, \cdots, \phi_{T-1}^*$.
For example, $(\alpha_t^*, \phi_t^*)$ can be obtained by solving

$$\sum_{i \in A_{t+1}} w_i \{\delta_{it} - p_i(\alpha_t^*, \phi_t^*)\}(1, y_i) = (0, 0)$$

or by solving

$$\sum_{i \in A_{t+1}} w_i \left\{\frac{\delta_{it}}{p_i(\alpha_t^*, \phi_t^*)} - 1\right\}(1, y_i) = (0, 0).$$

Estimation of $(\alpha_t^*, \phi^*)$ (Cont’d)

Thus, we can test

$$H_0: E(\hat\phi_1^*) = E(\hat\phi_2^*) = \cdots = E(\hat\phi_{T-1}^*).$$

If we cannot reject $H_0$, then a common parameter $\phi^*$ can be estimated by solving

$$\sum_{i \in A_{t+1}} w_i \{\delta_{it} - p_i(\alpha_t^*, \phi^*)\}(1, y_i) = (0, 0), \quad t = 1, \cdots, T - 1$$

using the generalized method of moments (GMM).
If $H_0$ is rejected, then we may consider a model such as

$$\hat\phi_t^* = \beta_0 + \beta_1 t + e_t$$

and use GMM estimation for $(\alpha_t^*, \beta_0, \beta_1)$.

3. Application to Korean LF survey data

Realized responses from the Korean LF survey data

Status          t=1       t=2       t=3       t=4       No response
Employment      81,685    46,926    28,124    15,992
Unemployment     1,509       948       597       352    32,350
Not in LF       57,882    32,308    19,086    10,790

Parameter estimation under the generalized reverse model

$$\Pr(\delta_{it} = 1 \mid \delta_{i,t+1} = 1, y_i) = \frac{\exp(\alpha_t^* + \phi_t^* y_i)}{1 + \exp(\alpha_t^* + \phi_t^* y_i)},$$

where $y_i = 1$ if unemployed and $y_i = 0$ otherwise.

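As a rough illustration of the first estimating equation, note that for a binary $y_i$ the score equations have a closed-form solution: the fitted $p_i$ in each $y$ group equals the fraction of $A_{t+1}$ that already belongs to $A_t$. The sketch below applies this to the realized counts above. It assumes equal weights $w_i = 1$ and treats both Employed and Not in LF as $y_i = 0$, so it is not expected to reproduce survey-weighted estimates exactly; the function and variable names are hypothetical.

```python
import math

# First-response counts at visits t = 1..4, copied from the table above.
counts = {
    "employed":   [81685, 46926, 28124, 15992],
    "unemployed": [1509, 948, 597, 352],
    "not_in_lf":  [57882, 32308, 19086, 10790],
}


def logit(p):
    return math.log(p / (1.0 - p))


def fit_reverse_binary(t):
    """Solve sum_{i in A_{t+1}} {delta_it - p_i}(1, y_i) = (0, 0)
    for binary y with unit weights (an assumption of this sketch)."""
    def fraction_in_At(categories):
        in_At = sum(sum(counts[c][:t]) for c in categories)
        in_At_plus_1 = sum(sum(counts[c][:t + 1]) for c in categories)
        return in_At / in_At_plus_1

    p1 = fraction_in_At(["unemployed"])              # y = 1
    p0 = fraction_in_At(["employed", "not_in_lf"])   # y = 0
    alpha_t = logit(p0)
    phi_t = logit(p1) - logit(p0)
    return alpha_t, phi_t


estimates = {t: fit_reverse_binary(t) for t in (1, 2, 3)}
for t, (alpha_t, phi_t) in estimates.items():
    print(t, round(alpha_t, 3), round(phi_t, 3))
```

Under these assumptions all three slope estimates come out negative, meaning that among units observed by visit t+1, the unemployed are somewhat less likely to have responded by visit t.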
Result from data analysis

        $\hat\phi_t^*$    95% C.I. for $\phi_t^*$
t=1     -0.112            (-0.191, -0.031)
t=2     -0.112            (-0.200, -0.025)
t=3     -0.110            (-0.219, -0.002)

Thus, we can safely use the following reverse model

$$\Pr(\delta_{it} = 1 \mid \delta_{i,t+1} = 1, y_i) = \frac{\exp(\alpha_t^* + \phi^* y_i)}{1 + \exp(\alpha_t^* + \phi^* y_i)},$$

where $y_i = 1$ if unemployed and $y_i = 0$ otherwise.
Estimate of $\phi^*$ (obtained by GMM): $\hat\phi^* = -0.110$ with 95% C.I. = (-0.163, -0.056).

Estimation of θ

Method    Unemployment rate (%)    S.E. (×10^4)
Naive     1.93                     3.47
Alho      2.00                     N/A
New       1.96                     3.42

4. Conclusion

Nonignorable nonresponse is a challenging problem.
A followup sample is often used to investigate nonignorable nonresponse.
If the followup sample also has missing data, then a response model is needed.
Alho’s model may not hold in real data.
The proposed model provides an alternative approach that is more flexible in modeling and is also simple in estimation.
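As a closing illustration of the estimation of θ reported above, the sketch below plugs the GMM estimate $\hat\phi^* = -0.110$ into a reverse-model propensity score and reweights the respondents. Several things here are assumptions rather than the authors' procedure: unit weights are used instead of survey weights, the intercept is chosen so that the adjusted respondent weights sum to the full sample size, and the unemployment rate is taken as unemployed / (employed + unemployed).

```python
import math

# Respondent totals over the four visits and the nonrespondent count,
# summed from the realized-response table in Section 3.
employed, unemployed, not_in_lf, no_response = 172727, 3406, 120066, 32350
phi = -0.110  # GMM estimate of phi* reported above

n_y1 = unemployed              # respondents with y = 1
n_y0 = employed + not_in_lf    # respondents with y = 0

# Assumption: choose the intercept so that sum_{i in A_T} 1 / pi_i equals
# the full sample size. Since 1 / pi = 1 + exp(-(alpha + phi * y)), this
# gives exp(-alpha) in closed form.
exp_neg_alpha = no_response / (n_y0 + n_y1 * math.exp(-phi))
pi0 = 1.0 / (1.0 + exp_neg_alpha)                   # propensity for y = 0
pi1 = 1.0 / (1.0 + exp_neg_alpha * math.exp(-phi))  # propensity for y = 1

# Unemployment rate = unemployed / (employed + unemployed), in percent.
naive_rate = 100.0 * unemployed / (employed + unemployed)
adjusted_rate = 100.0 * (unemployed / pi1) / (
    unemployed / pi1 + employed / pi0
)
print(round(naive_rate, 2), round(adjusted_rate, 2))
```

Because $\hat\phi^*$ is negative, unemployed respondents receive a smaller propensity score and hence a larger weight, so the adjusted rate sits slightly above the naive rate, in the same direction as the "New" row of the table.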