Small Area Estimation Combining Information from Several Sources Jan 27, 2012

Seunghwan Park and Jae-kwang Kim
Survey Sampling, Jan 27, 2012

Outline
• Introduction
• Basic Theory
• Application to the Korea LFS
• Discussion

Introduction
• Small area estimation: we want to provide reliable estimates for areas with insufficient sample sizes.
• The sample is not planned to give accurate direct estimators for such domains, which have few or no sample observations.
• Idea: a model can be used to borrow strength from other sources of information.
• Motivation: we want to combine several sources of information to obtain improved small area estimates.
• How can the direct estimators be improved using auxiliary variables from
  • other independent survey data, or
  • census data or administrative data?
• In our study we use
  • an area-level model approach,
  • several sources of auxiliary information,
  • a measurement error model, and
  • a generalized least squares (GLS) method.

General setup
• Variable of interest: $Y_i$.
• Survey A: directly computes $\hat{Y}_i$, subject to sampling error.
• Survey B: computes $\hat{X}_{i1}$, subject to sampling error.
• Census: measures $\hat{X}_{i2}$.
• $E_A(\hat{Y}_i) \neq E_B(\hat{X}_{i1})$ because of structural differences between the surveys.
• Structural (systematic) differences may be due to
  • different survey modes,
  • different reference times, or
  • different sampling frames.
• Goal: improve the estimation of $Y_i$ by incorporating various types of auxiliary information.
Basic Theory

Two error models (for area i)
• Sampling error model:
$$\hat{Y}_{i,a} = Y_i + a_i, \qquad \hat{X}_{i,b} = X_i + b_i,$$
where $(a_i, b_i)$ is the sampling error, with
$$\begin{pmatrix} a_i \\ b_i \end{pmatrix} \sim \left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} V(a_i) & \mathrm{Cov}(a_i, b_i) \\ \mathrm{Cov}(a_i, b_i) & V(b_i) \end{pmatrix} \right).$$
• Structural error model:
$$X_i = \beta_0 + \beta_1 Y_i + e_i, \qquad e_i \sim (0, \sigma_{ei}^2).$$
• The structural error model describes the relationship between the two survey measurements, apart from sampling error.
  • $Y$: target measurement (variable of primary interest).
  • $X$: an inaccurate measurement of $Y$ with possible systematic bias.
• If $X$ and $Y$ measure the same item (with different survey modes), the structural error model is essentially a measurement error model ($\beta_0 = 0$, $\beta_1 = 1$ means no measurement bias).
• Why use $X_i = \beta_0 + \beta_1 Y_i + e_i$ instead of $Y_i = \beta_0 + \beta_1 X_i + e_i$? Because we want to treat $Y_i$, not $X_i$, as fixed.
• If the parameters of the structural error model are known,
$$\hat{Y}_{i,b} \equiv \beta_1^{-1}(\hat{X}_{i,b} - \beta_0)$$
is also an unbiased estimator of $Y_i$, computed from survey B. With consistent estimates $(\hat\beta_0, \hat\beta_1)$ plugged in, $\hat{Y}_{i,b}$ is often called the synthetic estimator.
• Two main issues:
  • Prediction of $Y_i$: GLS (or GMM) approach.
  • Parameter estimation: use the theory of measurement error models.

Prediction
• Recall the GLS method:
$$y = Z\theta + e, \quad e \sim (0, V), \qquad \hat\theta_{GLS} = (Z'V^{-1}Z)^{-1} Z'V^{-1} y.$$
• GLS approach to combining the two error models:
$$\begin{pmatrix} \hat{Y}_{i,a} \\ \beta_1^{-1}(\hat{X}_{i,b} - \beta_0) \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix} Y_i + \begin{pmatrix} u_{1i} \\ u_{2i} \end{pmatrix},$$
where $u_{1i} = a_i$ and $u_{2i} = \beta_1^{-1}(b_i + e_i)$. Thus
$$\begin{pmatrix} u_{1i} \\ u_{2i} \end{pmatrix} \sim \left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} V(a_i) & \beta_1^{-1}\mathrm{Cov}(a_i, b_i) \\ \beta_1^{-1}\mathrm{Cov}(a_i, b_i) & \beta_1^{-2}\{V(b_i) + \sigma_{ei}^2\} \end{pmatrix} \right).$$
• The GLS estimator is the best linear unbiased estimator of $Y_i$ among linear combinations of $\hat{Y}_{i,a}$ and $\hat{Y}_{i,b} = \beta_1^{-1}(\hat{X}_{i,b} - \beta_0)$.
• Under the current setup,
$$\hat{Y}_i^{*} = w_i \hat{Y}_{i,a} + (1 - w_i)\hat{Y}_{i,b}, \qquad w_i = \frac{\sigma_{ei}^2 + V(b_i) - \beta_1 \mathrm{Cov}(a_i, b_i)}{\sigma_{ei}^2 + \beta_1^2 V(a_i) + V(b_i) - 2\beta_1 \mathrm{Cov}(a_i, b_i)}.$$
• The GLS estimator is sometimes called the composite estimator. In practice we need to use $\hat\beta_0$, $\hat\beta_1$, and $\hat\sigma_{ei}^2$.

Parameter estimation
• Parameter estimation for the structural model:
  • Case 1: matching measurement X with measurement Y is possible (e.g., two-phase sampling, where the survey A sample is a subset of the survey B sample).
  • Case 2: matching is not possible.
• In case 1, we can easily obtain consistent estimators of the model parameters from the units that have both X and Y observed (unit-level modeling).
• In case 2, we may use an area-level model to link $\hat{X}_i$ (from survey B) and $\hat{Y}_i$ (from survey A).
• The area-level model takes the form of a measurement error model (Fuller, 1987):
$$\hat{X}_i = \beta_0 + \beta_1 Y_i + e_i + b_i, \qquad \hat{Y}_i = Y_i + a_i.$$
• Parameter estimation can then be performed with measurement error model estimation methods.

Korea LFS Application
• The Labor Force Survey is a very important economic survey whose main target is the unemployment rate.
• Several sources of information on unemployment in Korea:
  1. Korean Labor Force Survey (KLF) data: 7K sample households (monthly).
  2. Local Area Labor Force Survey (LALF) data: 200K sample households (quarterly).
  3. Census data (10% of the population).
• The KLF sample is nested within the LALF sample.
• The unemployment rate for each small area is the parameter of interest.
• Sources of information on unemployment for analysis district (area) $i$:
  • $\hat{Y}_i$: estimate from the KLF data,
  • $\hat{X}_{1i}$: estimate from the LALF data,
  • $\hat{X}_{2i}$: estimate from the census data.
• KLF: sampling error ↑, measurement error ↓.
• LALF: sampling error ↓, measurement error ↑.
• Census data: sampling error ↓, measurement error ↑ (no updated information).
• First, we construct the structural error model for area $i$ in terms of population means:
$$\bar{X}_{1i} = \beta_0 + \beta_1 \bar{Y}_i + \bar{e}_{1i}, \qquad (1)$$
where $(\bar{X}_{1i}, \bar{Y}_i, \bar{e}_{1i}) = N_i^{-1}\sum_{j \in U_i}(x_{1j}, y_j, e_{1j})$ and $\bar{e}_{1i} \sim (0, \sigma_{e1}^2/N_i)$.
• Consider the nested error model
$$e_{1j} = \epsilon_i + u_j \ (j \in U_i), \qquad \epsilon_i \sim (0, \sigma_{e1}^2), \quad u_j \sim (0, \sigma_u^2);$$
then $\bar{e}_{1i} \sim (0, \sigma_{e1}^2 + \sigma_u^2/N_i)$.
• Since $N_i$ is often quite large, we can assume $\bar{e}_{1i} \sim (0, \sigma_{e1}^2)$.
• Sampling error model, where $Y_i = N_i\bar{Y}_i$ and $X_{1i} = N_i\bar{X}_{1i}$ are area totals:
$$\begin{pmatrix} \hat{Y}_i \\ \hat{X}_{1i} \end{pmatrix} = \begin{pmatrix} Y_i \\ X_{1i} \end{pmatrix} + \begin{pmatrix} N_i a_i \\ N_i b_i \end{pmatrix}. \qquad (2)$$
• Combining (1) and (2),
$$\begin{pmatrix} \bar{y}_i \\ \bar{x}_{1i} - \beta_0 \end{pmatrix} = \begin{pmatrix} 1 \\ \beta_1 \end{pmatrix} \bar{Y}_i + \begin{pmatrix} a_i \\ b_i + \bar{e}_{1i} \end{pmatrix}, \qquad (3)$$
where $(\bar{x}_{1i}, \bar{y}_i) = N_i^{-1}(\hat{X}_{1i}, \hat{Y}_i)$.
• The variance-covariance matrix of $(a_i, b_i + \bar{e}_{1i})'$ is
$$V_i = \begin{pmatrix} V(a_i) & \mathrm{Cov}(a_i, b_i) \\ \mathrm{Cov}(a_i, b_i) & V(b_i) + \sigma_{e1}^2 \end{pmatrix}.$$

GLS estimator, with $V_i$ as above:
$$\hat{Y}_i^{GLS} = \{(1, \beta_1) V_i^{-1} (1, \beta_1)'\}^{-1} (1, \beta_1) V_i^{-1} (\bar{y}_i, \bar{x}_{1i} - \beta_0)'.$$
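The GLS formula for (3) can be checked numerically against the composite weight from the Basic Theory section. The pure-Python sketch below is a minimal illustration and not the authors' code. `gls_scalar` applies $(z'V^{-1}z)^{-1}z'V^{-1}y$ for the single unknown $\bar{Y}_i$, and `composite` evaluates the equivalent weighted average of $\bar{y}_i$ and the synthetic estimator $\tilde{y}_i = \beta_1^{-1}(\bar{x}_{1i} - \beta_0)$. Some assumptions apply:
• All function names and input values are hypothetical.
• The parameters are treated as known.
• The census row is added on the assumption, not stated in the slides, that $\bar{e}_{2i}$ is uncorrelated with $a_i$ and $b_i + \bar{e}_{1i}$.

```python
def solve(V, z):
    """Solve V s = z for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(z)
    A = [[float(v) for v in row] + [float(z[i])] for i, row in enumerate(V)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    s = [0.0] * n
    for r in range(n - 1, -1, -1):
        s[r] = (A[r][n] - sum(A[r][c] * s[c] for c in range(r + 1, n))) / A[r][r]
    return s


def gls_scalar(y, z, V):
    """GLS for y = z * theta + u, u ~ (0, V), with one unknown theta.

    Returns (theta_hat, var) with var = (z' V^{-1} z)^{-1}.
    """
    s = solve(V, z)                      # s = V^{-1} z; V is symmetric, so s'y = z'V^{-1}y
    info = sum(si * zi for si, zi in zip(s, z))
    theta = sum(si * yi for si, yi in zip(s, y)) / info
    return theta, 1.0 / info


def composite(ybar, xbar1, beta0, beta1, va, vb, cab, sigma2):
    """Composite form alpha * ybar + (1 - alpha) * ytilde and its error variance."""
    ytil = (xbar1 - beta0) / beta1       # synthetic estimator
    v_til = (vb + sigma2) / beta1 ** 2   # V(ytilde - Ybar)
    c = cab / beta1                      # Cov(ybar, ytilde)
    alpha = (v_til - c) / (va + v_til - 2.0 * c)
    est = alpha * ybar + (1.0 - alpha) * ytil
    mse = alpha * va + (1.0 - alpha) * c  # ignores the estimation of beta
    return est, mse, alpha


# Hypothetical inputs for one area (illustrative numbers, not KLF/LALF estimates).
ybar, xbar1 = 0.040, 0.035
beta0, beta1, sigma2 = 0.002, 0.9, 1e-5
va, vb, cab = 1e-4, 2e-5, 2e-5

# Two-source system (3): rows (ybar, xbar1 - beta0), design (1, beta1), covariance V_i.
y = [ybar, xbar1 - beta0]
z = [1.0, beta1]
V = [[va, cab], [cab, vb + sigma2]]
gls_est, gls_var = gls_scalar(y, z, V)
comp_est, comp_mse, alpha = composite(ybar, xbar1, beta0, beta1, va, vb, cab, sigma2)

# Census row (xbar2 - gamma0 = gamma1 * Ybar + e2), assumed uncorrelated with the others.
xbar2, gamma0, gamma1, sigma2_census = 0.030, 0.001, 0.8, 4e-5
y3 = y + [xbar2 - gamma0]
z3 = z + [gamma1]
V3 = [[va, cab, 0.0], [cab, vb + sigma2, 0.0], [0.0, 0.0, sigma2_census]]
gls3_est, gls3_var = gls_scalar(y3, z3, V3)
```

The two routes agree because dividing the second row of (3) by $\beta_1$ rescales the system by a nonsingular diagonal matrix, and GLS is invariant to such a rescaling. The census row is uncorrelated with the other two, so adding it raises $z'V^{-1}z$ by $\gamma_1^2/V(\bar{e}_{2i})$. The prediction variance can therefore only decrease.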
The GLS estimator can be expressed in composite form:
$$\hat{Y}_i^{comp} = \alpha_i \bar{y}_i + (1 - \alpha_i)\tilde{y}_i,$$
where $\tilde{y}_i = \beta_1^{-1}(\bar{x}_{1i} - \beta_0)$ is the synthetic estimator and
$$\alpha_i = \frac{V(\tilde{y}_i) - \mathrm{Cov}(\bar{y}_i, \tilde{y}_i)}{V(\bar{y}_i) + V(\tilde{y}_i) - 2\,\mathrm{Cov}(\bar{y}_i, \tilde{y}_i)}.$$
• Ignoring the effect of estimating $\beta$,
$$V(\hat{Y}_i^{comp} - \bar{Y}_i) = \alpha_i V(\bar{y}_i) + (1 - \alpha_i)\,\mathrm{Cov}(\bar{y}_i, \tilde{y}_i).$$
• We can also use the census data. Then (3) becomes
$$\begin{pmatrix} \bar{y}_i \\ \bar{x}_{1i} - \beta_0 \\ \bar{x}_{2i} - \gamma_0 \end{pmatrix} = \begin{pmatrix} 1 \\ \beta_1 \\ \gamma_1 \end{pmatrix} \bar{Y}_i + \begin{pmatrix} a_i \\ b_i + \bar{e}_{1i} \\ \bar{e}_{2i} \end{pmatrix}.$$
• The rest of the process is similar to the case of combining two surveys.

Parameter estimation
• A consistent estimator of $(\beta_0, \beta_1)$ minimizes
$$Q(\beta_0, \beta_1) = \sum_{i=1}^{H} \frac{(\bar{x}_{1i} - \beta_0 - \bar{y}_i\beta_1)^2}{V(\bar{x}_{1i} - \beta_0 - \bar{y}_i\beta_1)},$$
where $V(\bar{x}_{1i} - \beta_0 - \bar{y}_i\beta_1) = \sigma_{e1}^2 + \beta_1^2 V(a_i) - 2\beta_1\mathrm{Cov}(a_i, b_i) + V(b_i)$.
• Let $w_i(\beta_1) = \{\sigma_{e1}^2 + \beta_1^2 V(a_i) - 2\beta_1\mathrm{Cov}(a_i, b_i) + V(b_i)\}^{-1}$. Then
$$\hat\beta_0 = \bar{x}_w - \hat\beta_1 \bar{y}_w, \qquad (4)$$
$$\hat\beta_1 = \frac{\sum_{i=1}^{H} w_i(\hat\beta_1)\{(\bar{y}_i - \bar{y}_w)(\bar{x}_{1i} - \bar{x}_w) - \mathrm{Cov}(a_i, b_i)\}}{\sum_{i=1}^{H} w_i(\hat\beta_1)\{(\bar{y}_i - \bar{y}_w)^2 - V(a_i)\}}, \qquad (5)$$
where $(\bar{x}_w, \bar{y}_w) = \{\sum_{i=1}^{H} w_i(\hat\beta_1)\}^{-1}\sum_{i=1}^{H} w_i(\hat\beta_1)(\bar{x}_{1i}, \bar{y}_i)$.
• Because $w_i$ depends on $\hat\beta_1$, this solution is obtained by an iterative algorithm.
• Consider the method of moments estimator:
$$E\{(\bar{x}_{1i} - \beta_0 - \bar{y}_i\beta_1)^2 - \beta_1^2 V(a_i) + 2\beta_1\mathrm{Cov}(a_i, b_i) - V(b_i)\} = \sigma_{e1i}^2.$$
• Under the nested error model, the right-hand side is the common value $\sigma_{e1}^2$.
• Following Fuller (2009),
$$\hat\sigma_{e1}^2 = \sum_{i=1}^{H} \kappa_i \{(\bar{x}_{1i} - \hat\beta_0 - \bar{y}_i\hat\beta_1)^2 - \hat\beta_1^2\hat{V}(a_i) + 2\hat\beta_1\widehat{\mathrm{Cov}}(a_i, b_i) - \hat{V}(b_i)\}, \qquad (6)$$
where $\kappa_i \propto \{\sigma_{e1}^2 + \hat\beta_1^2\hat{V}(a_i) - 2\hat\beta_1\widehat{\mathrm{Cov}}(a_i, b_i) + \hat{V}(b_i)\}^{-1}$ and $\sum_{i=1}^{H}\kappa_i = 1$.
• We can also consider $\bar{e}_{1i} \sim (0, \bar{Y}_i\sigma_{e1}^2)$.
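Estimating equations (4)-(6), together with the alternating scheme used to solve them, can be sketched in plain Python. This is a hedged illustration, not the authors' implementation. It rests on these assumptions:
• The sampling variances $V(a_i)$, $V(b_i)$ and $\mathrm{Cov}(a_i, b_i)$ are treated as known.
• $\kappa_i$ in (6) is evaluated at the previous value of $\hat\sigma_{e1}^2$.
• A negative moment estimate is truncated at zero.
• The data are simulated with made-up parameter values (with $\mathrm{Cov}(a_i, b_i) = V(b_i)$, as for nested samples), not taken from the KLF or LALF.
• Function names are illustrative.

```python
import random


def _weights(b1, s2, va, vb, cab):
    # w_i(b1) = {s2 + b1^2 V(a_i) - 2 b1 Cov(a_i, b_i) + V(b_i)}^{-1}
    return [1.0 / (s2 + b1 * b1 * a - 2.0 * b1 * c + b) for a, b, c in zip(va, vb, cab)]


def _wmean(w, v):
    return sum(wi * vi for wi, vi in zip(w, v)) / sum(w)


def beta_step(ybar, xbar, va, vb, cab, s2, b1=1.0, tol=1e-12, max_iter=500):
    """(4)-(5) for a fixed s2; w_i depends on b1, so iterate to a fixed point."""
    for _ in range(max_iter):
        w = _weights(b1, s2, va, vb, cab)
        xw, yw = _wmean(w, xbar), _wmean(w, ybar)
        num = sum(wi * ((y - yw) * (x - xw) - c) for wi, y, x, c in zip(w, ybar, xbar, cab))
        den = sum(wi * ((y - yw) ** 2 - a) for wi, y, a in zip(w, ybar, va))
        b1_new = num / den                                   # equation (5)
        done = abs(b1_new - b1) < tol
        b1 = b1_new
        if done:
            break
    w = _weights(b1, s2, va, vb, cab)
    b0 = _wmean(w, xbar) - b1 * _wmean(w, ybar)              # equation (4)
    return b0, b1


def sigma2_step(ybar, xbar, va, vb, cab, b0, b1, s2_prev):
    """(6): kappa_i proportional to w_i, evaluated at the previous s2."""
    k = _weights(b1, s2_prev, va, vb, cab)
    terms = [(x - b0 - y * b1) ** 2 - b1 * b1 * a + 2.0 * b1 * c - b
             for y, x, a, b, c in zip(ybar, xbar, va, vb, cab)]
    return max(0.0, _wmean(k, terms))   # truncation at zero is an added assumption


def fit_structural(ybar, xbar, va, vb, cab, tol=1e-13, max_iter=200):
    s2 = 0.0                                                       # start from s2 = 0
    b0, b1 = beta_step(ybar, xbar, va, vb, cab, s2)
    for _ in range(max_iter):
        s2_new = sigma2_step(ybar, xbar, va, vb, cab, b0, b1, s2)  # update s2 via (6)
        b0, b1 = beta_step(ybar, xbar, va, vb, cab, s2_new, b1)    # update beta via (4)-(5)
        done = abs(s2_new - s2) < tol
        s2 = s2_new
        if done:
            break
    return b0, b1, s2


# Simulated areas (hypothetical values, not the Korean LFS data).
rng = random.Random(2012)
TRUE_B0, TRUE_B1, TRUE_S2 = 0.005, 0.9, 4e-6
ybar, xbar, va, vb, cab = [], [], [], [], []
for _ in range(2000):
    Y = rng.uniform(0.02, 0.08)                 # true area mean
    v_a = 2e-5 * rng.uniform(0.5, 1.5)
    v_b = v_a / 4.0                              # assumed n_a / n_b = 1/4
    c = v_b                                      # nested samples: Cov(a, b) = V(b)
    a = rng.gauss(0.0, v_a ** 0.5)
    b = (c / v_a) * a + rng.gauss(0.0, (v_b - c * c / v_a) ** 0.5)
    e = rng.gauss(0.0, TRUE_S2 ** 0.5)
    ybar.append(Y + a)
    xbar.append(TRUE_B0 + TRUE_B1 * Y + e + b)
    va.append(v_a)
    vb.append(v_b)
    cab.append(c)

b0_hat, b1_hat, s2_hat = fit_structural(ybar, xbar, va, vb, cab)
```

`beta_step` needs an inner loop because the weights in (5) depend on $\hat\beta_1$ itself. `fit_structural` then alternates between (6) and (4)-(5), starting from $\hat\sigma_{e1}^2 = 0$.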
• Iterative algorithm for parameter estimation:
  1. Compute the initial estimator of $(\beta_0, \beta_1)$ from (4) and (5), setting $\hat\sigma_{e1}^2 = 0$.
  2. Using the current value of $(\hat\beta_0, \hat\beta_1)$, compute $\hat\sigma_{e1}^2$ from (6).
  3. Using the current value of $\hat\sigma_{e1}^2$, compute the updated estimator of $(\beta_0, \beta_1)$ from (4) and (5).
  4. Repeat steps 2 and 3 until convergence.

MSE estimation
• The actual prediction for $\bar{Y}_i$ is computed as $\hat{\bar{Y}}_i(\hat\theta)$, where $\theta = (\beta_0, \beta_1, \sigma_{e1}^2)$, and
$$\mathrm{MSE}\{\hat{\bar{Y}}_i(\hat\theta)\} = \mathrm{MSE}\{\hat{\bar{Y}}_i(\theta)\} + E\{\hat{\bar{Y}}_i(\hat\theta) - \hat{\bar{Y}}_i(\theta)\}^2 = M_{1i} + M_{2i}.$$
• Consider a jackknife approach, where the superscript $(-k)$ denotes the estimate computed with area $k$ deleted:
$$\hat{M}_{2i} = \frac{H-1}{H}\sum_{k=1}^{H}\{\hat{\bar{Y}}_i^{(-k)} - \hat{\bar{Y}}_i\}^2,$$
$$\hat{M}_{1i} = \hat\alpha_i^{(JK)}\hat{V}(a_i) + (1 - \hat\alpha_i^{(JK)})\,\hat\beta_1^{-1}\widehat{\mathrm{Cov}}(a_i, b_i),$$
where $\hat\alpha_i^{(JK)} = \hat\alpha_i - \frac{H-1}{H}\sum_{k=1}^{H}(\hat\alpha_i^{(-k)} - \hat\alpha_i)$.

Data analysis result
• We compare four estimates:
  • KLF: KLF data only.
  • LALF: LALF data only.
  • GLS 1: combines KLF and LALF.
  • GLS 2: combines KLF, LALF, and census data.
• Summary of the estimated MSE over areas:

  Estimate   1st Q       Median      Mean        3rd Q
  KLF        0.0000630   0.0001210   0.0002476   0.0002395
  LALF       0.0001123   0.0001330   0.0001482   0.0001695
  GLS 1      0.0000444   0.0000738   0.0000893   0.0001210
  GLS 2      0.0000405   0.0000543   0.0000575   0.0000721

Discussion

Modeling
• Model specification was very difficult.
• We built separate models for urban and rural areas. Each area is classified as urban or rural according to the proportion of households engaged in agriculture.
• In the KLF survey, 25% of all areas have an estimated unemployment rate of 0 because the sample size in each area is quite small.
• Areas with an estimated unemployment rate of 0 are excluded when the parameters are estimated.
• We have considered a structural model with zero intercept:
$$\bar{X}_{1i} = \beta_1\bar{Y}_i + e_i.$$
• A mixture model or a zero-inflated regression model could also be considered.

Estimation
• In the real data set, no estimate of the covariance term $\widehat{\mathrm{Cov}}(a_i, b_i)$ is available, even though the covariance is not 0.
• Once the covariance term is calculated, the estimated covariance matrix is not positive definite for some areas.
• Thus a procedure for smoothing the covariance matrix is needed.
• Consider a reversed two-phase sampling design:
  • From the finite population $U$, select the first-phase sample $A_1$ of size $n_1$.
  • Select the second-phase sample $A_2$ of size $n_2$ from $U - A_1$.
  • The final sample is $A = A_1 \cup A_2$, of size $n = n_1 + n_2$.
• In fact, the LALF sample is obtained by augmenting the KLF sample with an additional sample.
• Using the properties of the reversed two-phase design, with $n_a$ and $n_b$ the KLF and LALF sample sizes:
$$V(a_i) = \left(\frac{1}{n_a} - \frac{1}{N}\right)S_y^2 \approx \frac{1}{n_a}S_y^2,$$
$$V(b_i) = \left(\frac{1}{n_b} - \frac{1}{N}\right)S_y^2 \approx \frac{1}{n_b}S_y^2,$$
$$\mathrm{Cov}(a_i, b_i) = \left(\frac{1}{n_b} - \frac{1}{N}\right)S_y^2 \approx \frac{1}{n_b}S_y^2.$$
• Hence the sampling error variance matrix can be approximated by
$$\begin{pmatrix} \hat{V}(a_i) & \widehat{\mathrm{Cov}}(a_i, b_i) \\ \widehat{\mathrm{Cov}}(a_i, b_i) & \hat{V}(b_i) \end{pmatrix} \approx \hat{V}(a_i)\begin{pmatrix} 1 & n_{ai}/n_{bi} \\ n_{ai}/n_{bi} & n_{ai}/n_{bi} \end{pmatrix}.$$

Future work
• The current MSE estimation formula does not account for the covariance-matrix smoothing procedure.
• To improve the approximation to asymptotic normality, we can consider a transformation of $\hat{X}_i$ and $\hat{Y}_i$.
• A new MSE estimation formula for the transformed case is under investigation.

Thank you!
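A small sketch illustrates the reversed two-phase approximation from the Estimation discussion. When one simple random sample is nested in the other, $\mathrm{Cov}(a_i, b_i) = V(b_i)$. The determinant of the 2x2 sampling covariance matrix is then $V(b_i)\{V(a_i) - V(b_i)\}$, which is positive whenever $n_{ai} < n_{bi}$ (and $n_{bi} < N$). The functions below are hypothetical helpers, not the authors' smoothing procedure, and the sample sizes and $S_y^2$ in the usage lines are made-up values.

```python
def reversed_two_phase_cov(s2_y, n_a, n_b, N=None):
    """Sampling covariance of (a_i, b_i) when sample A (size n_a) is nested in B (size n_b).

    V(a) = (1/n_a - 1/N) S_y^2 and V(b) = Cov(a, b) = (1/n_b - 1/N) S_y^2;
    N=None drops the finite population correction (the approximate form).
    """
    fpc = 0.0 if N is None else 1.0 / N
    v_a = (1.0 / n_a - fpc) * s2_y
    v_b = (1.0 / n_b - fpc) * s2_y
    return [[v_a, v_b], [v_b, v_b]]


def ratio_form_cov(v_a_hat, n_a, n_b):
    """Approximation V(a_i) * [[1, r], [r, r]] with r = n_a / n_b."""
    r = n_a / n_b
    return [[v_a_hat, v_a_hat * r], [v_a_hat * r, v_a_hat * r]]


def is_positive_definite_2x2(m):
    """Sylvester's criterion for a symmetric 2x2 matrix."""
    return m[0][0] > 0 and m[0][0] * m[1][1] - m[0][1] * m[1][0] > 0


# Hypothetical area: 35 KLF households nested in 1000 LALF households.
s2_y = 0.04 * 0.96                     # roughly the variance of a 0/1 variable at a 4% rate
exact = reversed_two_phase_cov(s2_y, 35, 1000, N=50000)
approx = reversed_two_phase_cov(s2_y, 35, 1000)
ratio = ratio_form_cov(s2_y / 35, 35, 1000)
```

The ratio form only needs $\hat{V}(a_i)$ and the two sample sizes. It is positive definite by construction when the KLF sample in an area is strictly smaller than the LALF sample. When the two sizes are equal, the determinant drops to zero.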