A measurement error model approach to small area estimation
Jae-kwang Kim¹, Survey Sampling, Spring 2015
¹ Joint work with Seunghwan Park and Seoyoung Kim

Outline
• Introduction
• Basic Theory
• Application to Korean LFS
• Discussion

Introduction
• Small area estimation: we want to provide reliable estimates for areas with insufficient sample sizes.
• The sample is not planned to give accurate direct estimators for the domains: some domains have few or no sample observations.
• Idea: a model can be used to borrow strength from other sources of information.

Introduction
• Motivation: we want to combine several sources of information to get improved small area estimates.
• How can the direct estimators be improved using auxiliary variables?
  - from other, independent survey data
  - from census data or administrative data
• In our study, we use
  - an area-level model approach,
  - several sources of auxiliary information,
  - a measurement error model, and
  - a generalized least squares (GLS) method.

Introduction
• General setup
  - Study variable: $X_i$
  - Survey A: directly computes $\hat X_i$, subject to sampling error.
  - Survey B: computes $\hat Y_{1i}$, subject to sampling error.
  - Census: measures $\hat Y_{2i}$.
• $E_A(\hat X_i) \neq E_B(\hat Y_{1i})$ due to structural differences between the surveys.
• Structural (or systematic) differences can arise from
  - different survey modes,
  - time differences, or
  - frame differences.
• Goal: best prediction of $X_i$ by incorporating various types of auxiliary information.
Basic Steps
• Model specification: measurement error model approach
• Best prediction: BLUP
• Parameter estimation: GLS method
• MSPE estimation

Model: Measurement error model approach
• Two error models (for area i):
  - Sampling error model
    $$\hat X_{i,a} = X_i + a_i, \qquad \hat Y_{i,b} = Y_i + b_i,$$
    where $(a_i, b_i)$ represents the sampling error such that
    $$\begin{pmatrix} a_i \\ b_i \end{pmatrix} \sim \left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} V(a_i) & \mathrm{Cov}(a_i, b_i) \\ \mathrm{Cov}(a_i, b_i) & V(b_i) \end{pmatrix} \right).$$
  - Structural error model
    $$Y_i = \beta_0 + \beta_1 X_i + e_i, \qquad e_i \sim (0, \sigma_{ei}^2).$$

Model: Measurement error model approach
• The structural error model describes the relationship between the two survey measurements up to sampling error.
  - X: target measurement item (variable of primary interest)
  - Y: inaccurate measurement of X with possible systematic bias
• If X and Y measure the same item (with different survey modes), the structural error model is essentially a measurement error model ($\beta_0 = 0$, $\beta_1 = 1$ means no measurement bias).
• Why consider $Y_i = \beta_0 + \beta_1 X_i + e_i$ instead of $X_i = \beta_0 + \beta_1 Y_i + e_i$?
  1. We want to explain Y in terms of X (e.g., $\beta_0 = 0$ and $\beta_1 = 1$ means no measurement bias).
  2. Several Y variables can be handled more easily.

Prediction
• Recall the GLS method:
  $$y = Z\theta + e, \quad e \sim (0, V), \qquad \hat\theta_{GLS} = (Z'V^{-1}Z)^{-1} Z'V^{-1} y.$$
• GLS approach to combine the two error models:
  $$\begin{pmatrix} \hat X_{i,a} \\ \beta_1^{-1}(\hat Y_{i,b} - \beta_0) \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix} X_i + \begin{pmatrix} u_{1i} \\ u_{2i} \end{pmatrix},$$
  where $u_{1i} = a_i$ and $u_{2i} = \beta_1^{-1}(b_i + e_i)$. Thus,
  $$\begin{pmatrix} u_{1i} \\ u_{2i} \end{pmatrix} \sim \left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} V(a_i) & \beta_1^{-1}\mathrm{Cov}(a_i, b_i) \\ \beta_1^{-1}\mathrm{Cov}(a_i, b_i) & \beta_1^{-2}\{V(b_i) + \sigma_{ei}^2\} \end{pmatrix} \right).$$
• GLS method: the best linear unbiased estimator of $X_i$ is a linear combination of $\hat X_{i,a}$ and $\hat X_{i,b} = \beta_1^{-1}(\hat Y_{i,b} - \beta_0)$.
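A minimal numerical check of the GLS combination, written in Python with the standard library only: it builds the 2×2 covariance matrix of $(u_{1i}, u_{2i})'$ given above, applies $\hat\theta_{GLS} = (Z'V^{-1}Z)^{-1}Z'V^{-1}y$ with $Z = (1, 1)'$, and compares the implied weight on $\hat X_{i,a}$ with the closed-form weight $\alpha_i$ of the composite estimator. The function names (`gls_combine`, `alpha_closed_form`) and all numerical inputs are hypothetical, and $\beta_0$, $\beta_1$, $\sigma_{ei}^2$ and the sampling variances are treated as known.

```python
def gls_combine(x_a, y_b, beta0, beta1, var_a, var_b, cov_ab, sigma2_e):
    """GLS estimate of X_i from X_hat_{i,a} and X_hat_{i,b} = (Y_hat_{i,b} - beta0) / beta1.

    Stacked model: (x_a, x_b)' = (1, 1)' X_i + (u_1i, u_2i)', where
    V(u_1i) = V(a_i), Cov(u_1i, u_2i) = Cov(a_i, b_i) / beta1,
    V(u_2i) = {V(b_i) + sigma_ei^2} / beta1^2.
    Returns the estimate and the implied weight on x_a.
    """
    x_b = (y_b - beta0) / beta1
    v11 = var_a
    v12 = cov_ab / beta1
    v22 = (var_b + sigma2_e) / beta1 ** 2
    det = v11 * v22 - v12 ** 2
    v_inv = [[v22 / det, -v12 / det], [-v12 / det, v11 / det]]  # inverse of the 2x2 matrix V
    # With Z = (1, 1)', Z'V^{-1} is the row vector of column sums of V^{-1}
    zv = [v_inv[0][0] + v_inv[1][0], v_inv[0][1] + v_inv[1][1]]
    zvz = zv[0] + zv[1]  # Z'V^{-1}Z
    estimate = (zv[0] * x_a + zv[1] * x_b) / zvz  # (Z'V^{-1}Z)^{-1} Z'V^{-1} y
    return estimate, zv[0] / zvz


def alpha_closed_form(beta1, var_a, var_b, cov_ab, sigma2_e):
    """Composite weight alpha_i in closed form (two-source case)."""
    num = sigma2_e + var_b - beta1 * cov_ab
    den = sigma2_e + beta1 ** 2 * var_a + var_b - 2.0 * beta1 * cov_ab
    return num / den


# Hypothetical inputs for one area
est, w = gls_combine(0.040, 0.047, 0.005, 1.0, 1e-4, 2e-5, 1e-5, 3e-5)
print(round(w, 4), round(est, 5))  # 0.3077 0.04138
print(abs(w - alpha_closed_form(1.0, 1e-4, 2e-5, 1e-5, 3e-5)) < 1e-12)  # True
```

Solving the stacked system directly carries over to more than two sources by adding rows to $Z$ and $V$; the closed form for $\alpha_i$ is specific to the two-source case.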
Prediction (Cont'd)
• Under the current setup,
  $$\hat X_i^* = \alpha_i \hat X_{i,a} + (1 - \alpha_i)\hat X_{i,b},$$
  where
  $$\alpha_i = \frac{\sigma_{ei}^2 + V(b_i) - \beta_1\mathrm{Cov}(a_i, b_i)}{\sigma_{ei}^2 + \beta_1^2 V(a_i) + V(b_i) - 2\beta_1\mathrm{Cov}(a_i, b_i)}.$$
• The GLS estimator is sometimes called a composite estimator. In practice, we need to use $\hat\beta_0$, $\hat\beta_1$, and $\hat\sigma_{ei}^2$.

Parameter estimation
• The area-level model takes the form of a measurement error model (Fuller, 1987):
  $$\hat Y_i = \beta_0 + \beta_1 X_i + e_i + b_i, \qquad \hat X_i = X_i + a_i.$$
• We consider the generalized least squares (GLS) method for parameter estimation.
• GLS estimation of $(\beta_0, \beta_1)$: minimize
  $$Q_1(\beta_0, \beta_1) = \sum_i \frac{(\hat Y_i - \beta_0 - \beta_1 \hat X_i)^2}{V(\hat Y_i - \beta_0 - \beta_1 \hat X_i)} \qquad (1)$$
  with respect to $(\beta_0, \beta_1)$.

Parameter estimation (Cont'd)
• Since
  $$V(\hat Y_i - \beta_0 - \beta_1 \hat X_i) = \sigma_{ei}^2 + (-\beta_1, 1)\Sigma_i(-\beta_1, 1)', \qquad (2)$$
  where $\sigma_{ei}^2 = V(e_i)$ and $\Sigma_i = V\{(a_i, b_i)'\}$, we can write
  $$Q^*(\beta_0, \beta_1) = \sum_i w_i(\beta_1)(\hat Y_i - \beta_0 - \beta_1 \hat X_i)^2, \qquad (3)$$
  where $w_i(\beta_1) = \{\sigma_{ei}^2 + (-\beta_1, 1)\Sigma_i(-\beta_1, 1)'\}^{-1}$. Here, $\Sigma_i$ is assumed to be known.
• Note that
  $$\frac{\partial Q^*}{\partial \beta_0} = 0 \iff \sum_i w_i(\beta_1)(\hat Y_i - \beta_0 - \beta_1 \hat X_i) = 0,$$
  and so
  $$\hat\beta_0 = \bar y_w - \hat\beta_1 \bar x_w, \qquad (4)$$
  where $(\bar x_w, \bar y_w) = \{\sum_i w_i(\hat\beta_1)\}^{-1}\sum_i w_i(\hat\beta_1)(\hat X_i, \hat Y_i)$.

Parameter estimation (Cont'd)
• Plugging (4) into (3), we only have to minimize
  $$Q_1^*(\beta_1) = \sum_i w_i(\beta_1)\{\hat Y_i - \bar y_w - \beta_1(\hat X_i - \bar x_w)\}^2. \qquad (5)$$
  Thus, we need to find the solution to $\partial Q_1^*/\partial\beta_1 = 0$, where
  $$\frac{\partial Q_1^*}{\partial \beta_1} = \sum_i \frac{\partial w_i(\beta_1)}{\partial \beta_1}\{\hat Y_i - \bar y_w - \beta_1(\hat X_i - \bar x_w)\}^2 - 2\sum_i w_i(\beta_1)(\hat X_i - \bar x_w)\{\hat Y_i - \bar y_w - \beta_1(\hat X_i - \bar x_w)\}.$$
• Using
  $$\frac{\partial w_i(\beta_1)}{\partial \beta_1} = -2\{w_i(\beta_1)\}^2\{\beta_1 V(a_i) - \mathrm{Cov}(a_i, b_i)\}$$
  and approximating each squared residual by its expectation,
  $$\{\hat Y_i - \bar y_w - \beta_1(\hat X_i - \bar x_w)\}^2 \approx \sigma_{ei}^2 + (-\beta_1, 1)\Sigma_i(-\beta_1, 1)' = 1/w_i(\beta_1),$$
  the solution to $\partial Q_1^*/\partial\beta_1 = 0$ satisfies
  $$\hat\beta_1 = \frac{\sum_i w_i(\hat\beta_1)\{(\hat X_i - \bar x_w)(\hat Y_i - \bar y_w) - \mathrm{Cov}(a_i, b_i)\}}{\sum_i w_i(\hat\beta_1)\{(\hat X_i - \bar x_w)^2 - V(a_i)\}}. \qquad (6)$$

Parameter estimation: Estimation of $\sigma_{ei}^2$
• Assume $\sigma_{ei}^2 = \sigma_e^2$.
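• We can also consider an alternative assumption such as $\sigma_{ei}^2 = X_i\sigma_e^2$, but in this case a parametric model assumption is needed.
• In practice, one can consider a transformation $T(\cdot)$ such that the structural error model becomes
  $$T(Y_i) = \beta_0 + \beta_1 T(X_i) + e_i, \qquad e_i \sim (0, \sigma_e^2).$$
• Method-of-moments estimator of $\sigma_e^2$: solve
  $$\sum_i \frac{(\hat Y_i - \hat\beta_0 - \hat\beta_1 \hat X_i)^2}{\sigma_e^2 + (-\hat\beta_1, 1)\Sigma_i(-\hat\beta_1, 1)'} = H - 2, \qquad (7)$$
  where H is the total number of small areas.

Parameter estimation (Cont'd)
• Iterative algorithm for parameter estimation:
  1. Compute the initial estimator of $(\beta_0, \beta_1)$ by setting $\hat\sigma_e^2 = 0$.
  2. Using the current value of $(\hat\beta_0, \hat\beta_1)$, compute $\hat\sigma_e^2$ using (7).
  3. Using the current value of $\hat\sigma_e^2$, compute the updated estimator of $(\beta_0, \beta_1)$ using (4) and (6).
  4. Repeat steps 2 and 3 until convergence.

MSE estimation
• Recall the measurement error model structure:
  $$\hat Y_i = \beta_0 + \beta_1 X_i + e_i + b_i, \qquad \hat X_i = X_i + a_i.$$
• GLS estimator of $X_i$:
  $$\begin{aligned} \hat X_i^* &= \{(\beta_1, 1)V_i^{-1}(\beta_1, 1)'\}^{-1}(\beta_1, 1)V_i^{-1}(\hat Y_i - \beta_0, \hat X_i)' \\ &= \alpha_i \hat X_i + (1 - \alpha_i)\{\beta_1^{-1}(\hat Y_i - \beta_0)\} \\ &= \alpha_i \hat X_{i,a} + (1 - \alpha_i)\hat X_{i,b}, \end{aligned}$$
  where $V_i$ is the variance-covariance matrix of $(b_i + e_i, a_i)'$ and
  $$\alpha_i = \frac{\sigma_{ei}^2 + V(b_i) - \beta_1\mathrm{Cov}(a_i, b_i)}{\sigma_{ei}^2 + \beta_1^2 V(a_i) + V(b_i) - 2\beta_1\mathrm{Cov}(a_i, b_i)}.$$
• MSE of $\hat X_i^*$:
  $$\begin{aligned} E\{(\hat X_i^* - X_i)^2\} &= E\{\alpha_i(\hat X_{i,a} - X_i) + (1 - \alpha_i)(\hat X_{i,b} - X_i)\}^2 \\ &= \alpha_i^2 V(\hat X_{i,a}) + (1 - \alpha_i)^2 V(\hat X_{i,b}) + 2\alpha_i(1 - \alpha_i)\mathrm{Cov}(\hat X_{i,a}, \hat X_{i,b}) \\ &= \alpha_i V(\hat X_{i,a}) + (1 - \alpha_i)\mathrm{Cov}(\hat X_{i,a}, \hat X_{i,b}) := M_{1i}. \end{aligned}$$
  The last equality holds because the GLS weight minimizes the MSE over $\alpha_i$, so it satisfies $\alpha_i V(\hat X_{i,a}) + (1 - \alpha_i)\mathrm{Cov}(\hat X_{i,a}, \hat X_{i,b}) = \alpha_i \mathrm{Cov}(\hat X_{i,a}, \hat X_{i,b}) + (1 - \alpha_i)V(\hat X_{i,b})$. With moments taken conditionally on $X_i$, $V(\hat X_{i,a}) = V(a_i)$ and $\mathrm{Cov}(\hat X_{i,a}, \hat X_{i,b}) = \beta_1^{-1}\mathrm{Cov}(a_i, b_i)$.

MSE estimation
• The actual prediction for $X_i$ is computed by $\tilde X_i^* = \hat X_i^*(\hat\theta)$, where $\theta = (\beta_0, \beta_1, \sigma_e^2)$, and
  $$\mathrm{MSE}(\tilde X_i^*) = \mathrm{MSE}(\hat X_i^*) + E\{(\tilde X_i^* - \hat X_i^*)^2\} = M_{1i} + M_{2i}.$$
• Consider a jackknife approach:
  $$\hat M_{2i} = \frac{H-1}{H}\sum_{k=1}^{H}\left(\tilde X_i^{*(-k)} - \tilde X_i^*\right)^2,$$
  $$\hat M_{1i} = \hat\alpha_i^{(JK)}\hat V(a_i) + (1 - \hat\alpha_i^{(JK)})\hat\beta_1^{-1}\widehat{\mathrm{Cov}}(a_i, b_i),$$
  where $\hat\alpha_i^{(JK)} = \hat\alpha_i - \frac{H-1}{H}\sum_{k=1}^{H}(\hat\alpha_i^{(-k)} - \hat\alpha_i)$, and the superscript $(-k)$ denotes a quantity computed with the parameter estimates obtained after deleting area k.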
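The estimation steps and the plug-in predictor above can be sketched end to end in Python (standard library only). This is a minimal illustration under stated assumptions, not the authors' code: $\Sigma_i$ is treated as known; step 3 performs a single fixed-point pass of (4) and (6) per iteration, with the weights $w_i(\hat\beta_1)$ evaluated at the current $\hat\beta_1$; equation (7) is solved by bisection, and $\hat\sigma_e^2$ is truncated at zero when (7) has no positive root (an assumption); the starting value $\beta_1 = 1$ is arbitrary. The data are hypothetical simulated areas, not Korean LFS values. The jackknife term $\hat M_{2i}$, which would require refitting with each area deleted, is omitted. `fit_parameters` and `composite_predict` are illustrative names.

```python
import random


def fit_parameters(x_hat, y_hat, var_a, var_b, cov_ab, tol=1e-10, max_iter=200):
    """Iterative GLS / moment fit of (beta0, beta1, sigma_e^2); Sigma_i is treated as known."""
    H = len(x_hat)
    rows = list(zip(x_hat, y_hat, var_a, var_b, cov_ab))

    def d(b1):
        # (-beta1, 1) Sigma_i (-beta1, 1)' = beta1^2 V(a_i) - 2 beta1 Cov(a_i, b_i) + V(b_i)
        return [b1 * b1 * va - 2.0 * b1 * c + vb for _, _, va, vb, c in rows]

    def update_beta(b1, s2):
        # One pass of (4) and (6) with w_i(beta1) = 1 / {sigma_e^2 + d_i(beta1)}
        w = [1.0 / (s2 + di) for di in d(b1)]
        sw = sum(w)
        xw = sum(wi * r[0] for wi, r in zip(w, rows)) / sw
        yw = sum(wi * r[1] for wi, r in zip(w, rows)) / sw
        num = sum(wi * ((x - xw) * (y - yw) - c) for wi, (x, y, _, _, c) in zip(w, rows))
        den = sum(wi * ((x - xw) ** 2 - va) for wi, (x, _, va, _, _) in zip(w, rows))
        b1_new = num / den
        return yw - b1_new * xw, b1_new  # eq. (4): beta0 = ybar_w - beta1 * xbar_w

    def solve_sigma2(b0, b1):
        # Eq. (7): sum_i r_i^2 / (s2 + d_i) = H - 2; the left side is decreasing in s2
        r2 = [(y - b0 - b1 * x) ** 2 for x, y, _, _, _ in rows]
        di = d(b1)

        def g(s2):
            return sum(r / (s2 + dd) for r, dd in zip(r2, di)) - (H - 2)

        if g(0.0) <= 0.0:
            return 0.0  # assumption: truncate at zero when (7) has no positive root
        lo, hi = 0.0, max(di)
        while g(hi) > 0.0:
            hi *= 2.0
        for _ in range(50):  # bisection on [lo, hi]
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if g(mid) > 0.0 else (lo, mid)
        return 0.5 * (lo + hi)

    s2 = 0.0
    b0, b1 = update_beta(1.0, s2)  # step 1: sigma_e^2 = 0 (start beta1 = 1 is arbitrary)
    for _ in range(max_iter):
        s2 = solve_sigma2(b0, b1)  # step 2
        b0_new, b1_new = update_beta(b1, s2)  # step 3
        converged = abs(b0_new - b0) < tol and abs(b1_new - b1) < tol
        b0, b1 = b0_new, b1_new
        if converged:  # step 4
            break
    return b0, b1, s2


def composite_predict(x_hat, y_hat, var_a, var_b, cov_ab, b0, b1, s2):
    """Plug-in composite predictor and M_1i for every area."""
    preds, m1 = [], []
    for x, y, va, vb, c in zip(x_hat, y_hat, var_a, var_b, cov_ab):
        alpha = (s2 + vb - b1 * c) / (s2 + b1 * b1 * va + vb - 2.0 * b1 * c)
        preds.append(alpha * x + (1.0 - alpha) * (y - b0) / b1)
        # M_1i = alpha V(a_i) + (1 - alpha) Cov(a_i, b_i) / beta1
        m1.append(alpha * va + (1.0 - alpha) * c / b1)
    return preds, m1


# Hypothetical simulated areas (illustrative numbers only)
rng = random.Random(2015)
H, BETA0, BETA1, SIGMA2_E, RHO = 1000, 0.01, 0.9, 4e-5, 0.3
var_a = [5e-5 * rng.uniform(0.5, 1.5) for _ in range(H)]
var_b = [1e-5 * rng.uniform(0.5, 1.5) for _ in range(H)]
cov_ab = [RHO * (va * vb) ** 0.5 for va, vb in zip(var_a, var_b)]
x_true, x_hat, y_hat = [], [], []
for va, vb in zip(var_a, var_b):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    a = va ** 0.5 * z1
    b = vb ** 0.5 * (RHO * z1 + (1.0 - RHO ** 2) ** 0.5 * z2)  # Corr(a, b) = RHO
    x = rng.uniform(0.02, 0.08)
    x_true.append(x)
    x_hat.append(x + a)
    y_hat.append(BETA0 + BETA1 * x + rng.gauss(0.0, SIGMA2_E ** 0.5) + b)

b0_hat, b1_hat, s2_hat = fit_parameters(x_hat, y_hat, var_a, var_b, cov_ab)
preds, m1 = composite_predict(x_hat, y_hat, var_a, var_b, cov_ab, b0_hat, b1_hat, s2_hat)
mse_direct = sum((xh - x) ** 2 for xh, x in zip(x_hat, x_true)) / H
mse_composite = sum((p - x) ** 2 for p, x in zip(preds, x_true)) / H
print(f"beta0={b0_hat:.4f} beta1={b1_hat:.3f} sigma2_e={s2_hat:.2e}")
print(f"empirical MSE: direct={mse_direct:.2e} composite={mse_composite:.2e}")
```

With these simulated settings the composite predictor should show a smaller empirical MSE than the direct estimator $\hat X_i$, which is what the GLS weighting is designed to achieve; the size of the gain depends on the assumed variance components.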
Korean LFS Application
• The Labor Force Survey is a very important economic survey that measures unemployment rates.
• Several sources of information on unemployment in Korea:
  1. Korean Labor Force Survey (KLF) data: 7K sample households (monthly)
  2. Local Area Labor Force Survey (LALF) data: 200K sample households (quarterly)
  3. Census long-form data (10% of the population)
• The KLF sample is nested within the LALF sample.

Korean LFS Application
• The unemployment rate for each small area is the parameter of interest.
• Sources of information on unemployment for analysis district area i:
  - $\hat X_i$: estimates from KLF data
  - $\hat Y_{1i}$: estimates from LALF data
  - $\hat Y_{2i}$: estimates from census data
• KLF: larger sampling error, smaller measurement error.
• LALF: smaller sampling error, larger measurement error.
• Census data: smaller sampling error, larger measurement error (no updated information).

Korean LFS Application
• We can also consider census data. The stacked model for the GLS estimator then becomes
  $$\begin{pmatrix} \hat X_i \\ \hat Y_{1i} - \beta_0 \\ \hat Y_{2i} - \gamma_0 \end{pmatrix} = \begin{pmatrix} 1 \\ \beta_1 \\ \gamma_1 \end{pmatrix} X_i + \begin{pmatrix} a_i \\ b_i + e_{1i} \\ e_{2i} \end{pmatrix}.$$
• The whole process is similar to the case of combining two surveys.

Figure: Plot of unemployment rate for the KLF and LALF surveys (urban areas)

Figure: Plot of residuals against estimated values (urban areas)

Korean LFS Application: Data analysis result
• We consider four estimates:
  - KLF: KLF only
  - LALF: LALF only
  - GLS 1: combining KLF and LALF
  - GLS 2: combining KLF, LALF, and census data
• MSE summary:

  Estimator   1st Q       Median      Mean        3rd Q
  KLF         0.0000630   0.0001210   0.0002476   0.0002395
  LALF        0.0001123   0.0001330   0.0001482   0.0001695
  GLS 1       0.0000444   0.0000738   0.0000893   0.0001210
  GLS 2       0.0000405   0.0000543   0.0000575   0.0000721

Discussion
• Model specification was very difficult.
• We build models separately for urban and rural areas, which are classified by the proportion of households engaged in agricultural business.
• In the KLF survey, 25% of all areas have a zero unemployment rate because the sample sizes of individual areas are very small.
• Areas with a zero unemployment rate are excluded when the parameters are estimated.
• We have considered a structural model with zero intercept:
  $$Y_{1i} = \beta_1 X_i + e_i.$$
• A mixture model or a zero-inflated regression model can be considered.

Summary
• Motivated by real data (the Korean Labor Force Survey) in small area estimation
• GLS prediction approach under the area-level model
• Measurement error model for parameter estimation
• Instead of the GLS approach, a maximum likelihood approach is also possible under parametric model assumptions.

Reference
Kim, J.K., Park, S. and Kim, S. (2015). "Small area estimation combining information from several sources," Survey Methodology, in press.