A measurement error model approach to small area estimation Jae-kwang Kim Spring, 2015

Joint work with Seunghwan Park and Seoyoung Kim
Outline
• Introduction
• Basic Theory
• Application to Korean LFS
• Discussion
Jae-kwang Kim
Survey Sampling
Spring, 2015
2 / 26
Introduction
• Small area estimation: the goal is to provide reliable estimates for areas with
insufficient sample sizes.
• The sample is not planned to give accurate direct estimators for the domains, so
some domains have few or no sample observations.
• Idea: A model can be used to borrow strength from other sources of information.
Introduction
• Motivation: want to combine several sources of information to get improved small
area estimates.
• How to improve the direct estimators using auxiliary variables,
  • from other independent survey data
  • from census data or administrative data.
• In our study,
  • Area-level model approach,
  • Several sources of auxiliary information,
  • A measurement error model,
  • Using a generalized least squares (GLS) method.
Introduction
• General Setup
  • Study variable: $X_i$
  • Survey A: Directly compute $\hat{X}_i$, subject to sampling error.
  • Survey B: Compute $\hat{Y}_{i1}$, subject to sampling error.
  • Census: Measures $\hat{Y}_{i2}$.
• $E_A(\hat{X}_i) \neq E_B(\hat{Y}_{i1})$ due to the structural differences between the surveys.
• Structural differences (or systematic differences)
  • due to different survey modes
  • due to time differences
  • due to frame differences
• Goal: Best prediction of $X_i$ by incorporating various types of auxiliary information.
Basic Steps
• Model specification: Measurement error model approach
• Best prediction: BLUP
• Parameter estimation: GLS method
• MSPE estimation
Model: Measurement error model approach
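• Two error models (for area i)
  • Sampling error model
$$\hat{X}_{i,a} = X_i + a_i, \qquad \hat{Y}_{i,b} = Y_i + b_i,$$
    where $(a_i, b_i)$ represents the sampling error such that
$$\begin{pmatrix} a_i \\ b_i \end{pmatrix} \sim \left[ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} V(a_i) & \mathrm{Cov}(a_i, b_i) \\ \mathrm{Cov}(a_i, b_i) & V(b_i) \end{pmatrix} \right]$$
  • Structural error model
$$Y_i = \beta_0 + \beta_1 X_i + e_i, \qquad e_i \sim (0, \sigma_{ei}^2)$$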
Model: measurement error model approach
• The structural error model describes the relationship between the two survey
measurements up to sampling error.
• X : target measurement item (variable of primary interest)
• Y : inaccurate measurement of X with possible systematic bias.
• If both X and Y measure the same item (with different survey modes), structural
error model is essentially a measurement error model. (β0 = 0, β1 = 1 means no
measurement bias.)
• Why consider $Y_i = \beta_0 + \beta_1 X_i + e_i$ instead of $X_i = \beta_0 + \beta_1 Y_i + e_i$?
  1 We want to explain Y in terms of X. (e.g. $\beta_0 = 0$ and $\beta_1 = 1$ means
    no measurement bias)
  2 Can handle several Y more easily.
Prediction
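• Recall GLS method:
$$y = Z\theta + e, \qquad e \sim (0, V)$$
$$\hat{\theta}_{GLS} = (Z' V^{-1} Z)^{-1} Z' V^{-1} y$$
• GLS approach to combine two error models:
$$y = Z\theta + e, \quad e \sim (0, V) \quad \Longleftrightarrow \quad \begin{pmatrix} \hat{X}_{i,a} \\ \beta_1^{-1}(\hat{Y}_{i,b} - \beta_0) \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix} X_i + \begin{pmatrix} u_{1i} \\ u_{2i} \end{pmatrix}$$
where $u_{1i} = a_i$ and $u_{2i} = \beta_1^{-1}(b_i + e_i)$. Thus,
$$\begin{pmatrix} u_{1i} \\ u_{2i} \end{pmatrix} \sim \left[ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} V(a_i) & \beta_1^{-1}\mathrm{Cov}(a_i, b_i) \\ \beta_1^{-1}\mathrm{Cov}(a_i, b_i) & \beta_1^{-2}\{V(b_i) + \sigma_{ei}^2\} \end{pmatrix} \right]$$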
Prediction
• GLS method: Best linear unbiased estimator of $X_i$ based on the linear combination
of $\hat{X}_{i,a}$ and $\hat{X}_{i,b} = \beta_1^{-1}(\hat{Y}_{i,b} - \beta_0)$.
• Under the current setup,
$$\hat{X}_i^* = \alpha_i \hat{X}_{i,a} + (1 - \alpha_i)\hat{X}_{i,b}$$
where
$$\alpha_i = \frac{\sigma_{ei}^2 + V(b_i) - \beta_1 \mathrm{Cov}(a_i, b_i)}{\sigma_{ei}^2 + \beta_1^2 V(a_i) + V(b_i) - 2\beta_1 \mathrm{Cov}(a_i, b_i)}$$
• The GLS estimator is sometimes called a composite estimator. In practice we need
to use $\hat{\beta}_0$, $\hat{\beta}_1$, and $\hat{\sigma}_{ei}^2$.
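As a minimal sketch of the composite estimator above, the following Python code computes the weight and the combined estimate for one area, and cross-checks the closed form against the matrix GLS formula with Z = (1, 1)'. The function and argument names (`composite_estimate`, `gls_estimate`, `v_a`, `v_b`, `c_ab`) are illustrative assumptions, not names from the talk, and the parameters are treated as known.

```python
import numpy as np

def composite_estimate(x_a, y_b, beta0, beta1, sigma2_e, v_a, v_b, c_ab):
    """Closed-form composite estimate; v_a = V(a_i), v_b = V(b_i), c_ab = Cov(a_i, b_i)."""
    x_b = (y_b - beta0) / beta1  # rescaled Survey B estimate
    num = sigma2_e + v_b - beta1 * c_ab
    den = sigma2_e + beta1**2 * v_a + v_b - 2.0 * beta1 * c_ab
    alpha = num / den
    return alpha * x_a + (1.0 - alpha) * x_b, alpha

def gls_estimate(x_a, y_b, beta0, beta1, sigma2_e, v_a, v_b, c_ab):
    """Same estimate via (Z'V^{-1}Z)^{-1} Z'V^{-1} y with Z = (1, 1)'."""
    y = np.array([x_a, (y_b - beta0) / beta1])
    z = np.ones(2)
    V = np.array([[v_a, c_ab / beta1],
                  [c_ab / beta1, (v_b + sigma2_e) / beta1**2]])
    Vinv_z = np.linalg.solve(V, z)  # V is symmetric, so z'V^{-1} = (V^{-1}z)'
    return (Vinv_z @ y) / (Vinv_z @ z)

# Made-up inputs for illustration only
args = dict(x_a=0.04, y_b=0.05, beta0=0.0, beta1=1.2,
            sigma2_e=1e-4, v_a=4e-4, v_b=1e-4, c_ab=5e-5)
x_star, alpha = composite_estimate(**args)
x_gls = gls_estimate(**args)
```

The two routes should agree up to floating-point error; the closed form is cheaper, while the matrix form extends directly to more than two sources.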
Parameter estimation
• The area-level model takes the form of measurement error model (Fuller, 1987)
$$\hat{Y}_i = \beta_0 + \beta_1 X_i + e_i + b_i, \qquad \hat{X}_i = X_i + a_i$$
• We will consider the generalized least squares (GLS) method for parameter estimation.
• GLS estimation of $\beta_0, \beta_1$: Minimize
$$Q_1(\beta_0, \beta_1) = \sum_i \frac{(\hat{Y}_i - \beta_0 - \beta_1 \hat{X}_i)^2}{V(\hat{Y}_i - \beta_0 - \beta_1 \hat{X}_i)} \tag{1}$$
with respect to $(\beta_0, \beta_1)$.
Parameter estimation (Cont’d)
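• Since
$$V(\hat{Y}_i - \beta_0 - \beta_1 \hat{X}_i) = \sigma_{ei}^2 + (-\beta_1, 1)\,\Sigma_i\,(-\beta_1, 1)', \tag{2}$$
where $\sigma_{ei}^2 = V(e_i)$ and $\Sigma_i = V\{(a_i, b_i)'\}$, we can write
$$Q^*(\beta_0, \beta_1) = \sum_i w_i(\beta_1)(\hat{Y}_i - \beta_0 - \beta_1 \hat{X}_i)^2, \tag{3}$$
where $w_i(\beta_1) = \{\sigma_{ei}^2 + (-\beta_1, 1)\,\Sigma_i\,(-\beta_1, 1)'\}^{-1}$. Here, $\Sigma_i$ is assumed to be known.
• Note that
$$\frac{\partial}{\partial \beta_0} Q^* = 0 \iff \sum_i w_i(\beta_1)(\hat{Y}_i - \beta_0 - \beta_1 \hat{X}_i) = 0$$
and so
$$\hat{\beta}_0 = \bar{y}_w - \hat{\beta}_1 \bar{x}_w, \tag{4}$$
where $(\bar{x}_w, \bar{y}_w) = \{\sum_i w_i(\hat{\beta}_1)\}^{-1} \sum_i w_i(\hat{\beta}_1)(\hat{X}_i, \hat{Y}_i)$.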
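Plugging (4) into (3), we have only to minimize
$$Q_1^*(\beta_1) = \sum_i w_i(\beta_1)\{\hat{Y}_i - \bar{y}_w - \beta_1(\hat{X}_i - \bar{x}_w)\}^2. \tag{5}$$
Thus, we need to find the solution to $\partial Q_1^*/\partial \beta_1 = 0$ where
$$\frac{\partial}{\partial \beta_1} Q_1^* = \sum_i \left\{\frac{\partial}{\partial \beta_1} w_i(\beta_1)\right\}\{\hat{Y}_i - \bar{y}_w - \beta_1(\hat{X}_i - \bar{x}_w)\}^2 - 2\sum_i w_i(\beta_1)(\hat{X}_i - \bar{x}_w)\{\hat{Y}_i - \bar{y}_w - \beta_1(\hat{X}_i - \bar{x}_w)\}.$$
Using
$$\frac{\partial}{\partial \beta_1} w_i(\beta_1) = -2\{w_i(\beta_1)\}^2\{\beta_1 V(a_i) - C(a_i, b_i)\},$$
and
$$\{\hat{Y}_i - \bar{y}_w - \beta_1(\hat{X}_i - \bar{x}_w)\}^2 \to_p \sigma_{ei}^2 + (-\beta_1, 1)\,\Sigma_i\,(-\beta_1, 1)' = 1/w_i(\beta_1),$$
the solution to $\partial Q_1^*/\partial \beta_1 = 0$ satisfies
$$\hat{\beta}_1 = \frac{\sum_i w_i(\hat{\beta}_1)\{(\hat{X}_i - \bar{x}_w)(\hat{Y}_i - \bar{y}_w) - C(a_i, b_i)\}}{\sum_i w_i(\hat{\beta}_1)\{(\hat{X}_i - \bar{x}_w)^2 - V(a_i)\}}. \tag{6}$$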
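The step from the derivative to (6) can be written out as follows. This is a sketch under the approximations stated on the slide: the squared residual is replaced by $1/w_i(\beta_1)$, and $\bar{x}_w$, $\bar{y}_w$ are treated as fixed when differentiating. The shorthand $r_i$ is introduced here only for readability.

```latex
% r_i is shorthand (not in the original slides) for the centered residual
\begin{align*}
r_i &= \hat{Y}_i - \bar{y}_w - \beta_1(\hat{X}_i - \bar{x}_w), \\
0 &= \sum_i \frac{\partial w_i(\beta_1)}{\partial \beta_1}\, r_i^2
     - 2\sum_i w_i(\beta_1)(\hat{X}_i - \bar{x}_w)\, r_i \\
  &\approx -2\sum_i w_i(\beta_1)\{\beta_1 V(a_i) - C(a_i,b_i)\}
     - 2\sum_i w_i(\beta_1)(\hat{X}_i - \bar{x}_w)(\hat{Y}_i - \bar{y}_w)
     + 2\beta_1\sum_i w_i(\beta_1)(\hat{X}_i - \bar{x}_w)^2 .
\end{align*}
% Collecting the terms in beta_1:
\begin{equation*}
\beta_1 \sum_i w_i(\beta_1)\{(\hat{X}_i - \bar{x}_w)^2 - V(a_i)\}
  = \sum_i w_i(\beta_1)\{(\hat{X}_i - \bar{x}_w)(\hat{Y}_i - \bar{y}_w) - C(a_i,b_i)\}.
\end{equation*}
```

The second line uses $\{w_i(\beta_1)\}^2 r_i^2 \approx w_i(\beta_1)$. Because $w_i$ itself depends on $\beta_1$, the last display is a fixed-point equation rather than an explicit solution.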
Parameter estimation: Estimation of $\sigma_{ei}^2$
• Assume $\sigma_{ei}^2 = \sigma_e^2$.
• We can also consider an alternative assumption such as $\sigma_{ei}^2 = X_i \sigma_e^2$, but in this
case, a parametric model assumption is needed.
• In practice, one can consider a transformation $T(\cdot)$ such that the structural error
model becomes
$$T(Y_i) = \beta_0 + \beta_1 T(X_i) + e_i, \qquad e_i \sim (0, \sigma_e^2).$$
• Method-of-moments estimator of $\sigma_e^2$: Solve
$$\sum_i \frac{(\hat{Y}_i - \hat{\beta}_0 - \hat{X}_i \hat{\beta}_1)^2}{\sigma_e^2 + (-\hat{\beta}_1, 1)\,\Sigma_i\,(-\hat{\beta}_1, 1)'} = H - 2, \tag{7}$$
where H is the total number of small areas.
Parameter estimation (Cont’d)
• Iterative algorithm for parameter estimation.
  1 Compute the initial estimator of $(\beta_0, \beta_1)$ by setting $\hat{\sigma}_e^2 = 0$.
  2 Using the current value of $(\hat{\beta}_0, \hat{\beta}_1)$, compute $\hat{\sigma}_e^2$ using (7).
  3 Using the current value of $\hat{\sigma}_e^2$, compute the updated estimator of
    $(\beta_0, \beta_1)$ using (4) and (6).
  4 Repeat steps 2 and 3 until convergence.
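The iteration above could be sketched in Python roughly as follows. This is an illustration under several assumptions that are not in the slides: equation (6) is solved by a simple fixed-point iteration, equation (7) by bisection with the solution truncated at zero, the starting slope is the ordinary least squares slope, and each $\Sigma_i$ is positive definite. All function names are hypothetical.

```python
import numpy as np

def weights(beta1, sigma2_e, v_a, v_b, c_ab):
    # w_i(beta1) = 1 / {sigma_e^2 + (-beta1, 1) Sigma_i (-beta1, 1)'}
    return 1.0 / (sigma2_e + beta1**2 * v_a - 2.0 * beta1 * c_ab + v_b)

def update_beta(x, y, beta1, sigma2_e, v_a, v_b, c_ab, n_inner=50, tol=1e-10):
    # Fixed-point iteration on (6) for beta1, then (4) for beta0.
    for _ in range(n_inner):
        w = weights(beta1, sigma2_e, v_a, v_b, c_ab)
        xw, yw = np.sum(w * x) / np.sum(w), np.sum(w * y) / np.sum(w)
        num = np.sum(w * ((x - xw) * (y - yw) - c_ab))
        den = np.sum(w * ((x - xw) ** 2 - v_a))
        new_beta1 = num / den
        converged = abs(new_beta1 - beta1) < tol
        beta1 = new_beta1
        if converged:
            break
    w = weights(beta1, sigma2_e, v_a, v_b, c_ab)
    xw, yw = np.sum(w * x) / np.sum(w), np.sum(w * y) / np.sum(w)
    return yw - beta1 * xw, beta1

def solve_sigma2(x, y, beta0, beta1, v_a, v_b, c_ab, n_bisect=200):
    # Method-of-moments equation (7); the left side decreases in sigma_e^2.
    H = len(x)
    r2 = (y - beta0 - beta1 * x) ** 2
    d = beta1**2 * v_a - 2.0 * beta1 * c_ab + v_b
    f = lambda s2: np.sum(r2 / (s2 + d)) - (H - 2)
    if f(0.0) <= 0.0:
        return 0.0  # truncate at zero (assumption)
    lo, hi = 0.0, np.sum(r2) / (H - 2)  # f(hi) <= 0 because d > 0
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > 0.0 else (lo, mid)
    return 0.5 * (lo + hi)

def fit(x, y, v_a, v_b, c_ab, n_outer=100, tol=1e-8):
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    v_a, v_b, c_ab = (np.broadcast_to(np.asarray(v, dtype=float), x.shape)
                      for v in (v_a, v_b, c_ab))
    beta1 = np.polyfit(x, y, 1)[0]  # OLS slope as starting value (assumption)
    sigma2_e = 0.0                  # step 1
    beta0, beta1 = update_beta(x, y, beta1, sigma2_e, v_a, v_b, c_ab)
    for _ in range(n_outer):
        new_s2 = solve_sigma2(x, y, beta0, beta1, v_a, v_b, c_ab)          # step 2
        new_b0, new_b1 = update_beta(x, y, beta1, new_s2, v_a, v_b, c_ab)  # step 3
        done = max(abs(new_b0 - beta0), abs(new_b1 - beta1),
                   abs(new_s2 - sigma2_e)) < tol
        beta0, beta1, sigma2_e = new_b0, new_b1, new_s2
        if done:                                                           # step 4
            break
    return beta0, beta1, sigma2_e
```

Convergence of the fixed-point step is not guaranteed in general; in particular, the denominator of (6) can be close to zero or negative when the sampling variances $V(a_i)$ are large relative to the spread of the direct estimates.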
MSE estimation
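• Recall the measurement error model structure
$$\hat{Y}_i = \beta_0 + \beta_1 X_i + e_i + b_i, \qquad \hat{X}_i = X_i + a_i$$
• GLS estimator of $X_i$:
$$\begin{aligned}
\hat{X}_i^* &= \{(\beta_1, 1) V_i^{-1} (\beta_1, 1)'\}^{-1} (\beta_1, 1) V_i^{-1} (\hat{Y}_i - \beta_0, \hat{X}_i)' \\
&= \alpha_i \hat{X}_i + (1 - \alpha_i)\{\beta_1^{-1}(\hat{Y}_i - \beta_0)\} \\
&= \alpha_i \hat{X}_{i,a} + (1 - \alpha_i)\hat{X}_{i,b},
\end{aligned}$$
where $V_i$ is the variance-covariance matrix of $(b_i + e_i, a_i)'$ and
$$\alpha_i = \frac{\sigma_{ei}^2 + V(b_i) - \beta_1 \mathrm{Cov}(a_i, b_i)}{\sigma_{ei}^2 + \beta_1^2 V(a_i) + V(b_i) - 2\beta_1 \mathrm{Cov}(a_i, b_i)}$$
• MSE of $\hat{X}_i^*$:
$$\begin{aligned}
E\{(\hat{X}_i^* - X_i)^2\} &= E\{\alpha_i(\hat{X}_{i,a} - X_i) + (1 - \alpha_i)(\hat{X}_{i,b} - X_i)\}^2 \\
&= \alpha_i^2 V(\hat{X}_{i,a}) + (1 - \alpha_i)^2 V(\hat{X}_{i,b}) + 2\alpha_i(1 - \alpha_i)\mathrm{Cov}(\hat{X}_{i,a}, \hat{X}_{i,b}) \\
&= \alpha_i V(\hat{X}_{i,a}) + (1 - \alpha_i)\mathrm{Cov}(\hat{X}_{i,a}, \hat{X}_{i,b}) := M_{1i}.
\end{aligned}$$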
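The last simplification in the MSE holds only at the optimal weight, which can be checked numerically. The sketch below assumes the error variances of the two estimators are V(a_i) and {V(b_i) + σ²_ei}/β1², with covariance Cov(a_i, b_i)/β1; the function names and input values are made up for illustration.

```python
import math

def optimal_alpha(v_xa, v_xb, c_x):
    # Weight on the first estimator that minimizes the variance of the combination
    return (v_xb - c_x) / (v_xa + v_xb - 2.0 * c_x)

def mse_full(alpha, v_xa, v_xb, c_x):
    # Variance of alpha * X_a + (1 - alpha) * X_b, valid for any alpha
    return alpha**2 * v_xa + (1 - alpha)**2 * v_xb + 2 * alpha * (1 - alpha) * c_x

def mse_short(alpha, v_xa, c_x):
    # Simplified form M_1i, valid only at the optimal alpha
    return alpha * v_xa + (1 - alpha) * c_x

# Illustrative inputs (not from the talk)
beta1, sigma2_e, v_a, v_b, c_ab = 1.2, 1e-4, 4e-4, 1e-4, 5e-5
v_xa = v_a                           # error variance of X_hat_{i,a}
v_xb = (v_b + sigma2_e) / beta1**2   # error variance of X_hat_{i,b}
c_x = c_ab / beta1                   # covariance of the two errors

alpha = optimal_alpha(v_xa, v_xb, c_x)
m_full = mse_full(alpha, v_xa, v_xb, c_x)
m_short = mse_short(alpha, v_xa, c_x)
```

At the optimal weight both expressions reduce to (v_xa · v_xb − c_x²)/(v_xa + v_xb − 2c_x), which is strictly smaller than v_xa unless v_xa = c_x.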
MSE estimation
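• The actual prediction for $X_i$ is computed by $\hat{X}_{ei}^* = \hat{X}_i^*(\hat{\theta})$ where $\theta = (\beta_0, \beta_1, \sigma_e^2)$.
$$MSE(\hat{X}_{ei}^*) = MSE(\hat{X}_i^*) + E\{(\hat{X}_{ei}^* - \hat{X}_i^*)^2\} = M_{1i} + M_{2i}$$
• Consider a jackknife approach,
$$\hat{M}_{2i} = \frac{H - 1}{H}\sum_{k=1}^{H}(\hat{\bar{Y}}_i^{(-k)} - \hat{\bar{Y}}_i)^2$$
$$\hat{M}_{1i} = \hat{\alpha}_i^{(JK)}\hat{V}(a_i) + (1 - \hat{\alpha}_i^{(JK)})\widehat{\mathrm{Cov}}(a_i, b_i)$$
where $\hat{\alpha}_i^{(JK)} = \hat{\alpha}_i - \frac{H - 1}{H}\sum_{k=1}^{H}(\hat{\alpha}_i^{(-k)} - \hat{\alpha}_i)$.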
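The two jackknife formulas amount to simple array arithmetic once the delete-one quantities are available. The sketch below assumes that the term inside the sum for M̂_2i is the area-i prediction recomputed with area k deleted from parameter estimation; the slide does not spell this out. The array layout (row k holds the values obtained without area k) and the function names are hypothetical.

```python
import numpy as np

def jackknife_m2(pred_full, pred_loo):
    """M2 estimate per area.

    pred_full: shape (H,), predictions using all H areas.
    pred_loo:  shape (H, H), pred_loo[k, i] = prediction for area i
               when area k is left out of parameter estimation.
    """
    H = pred_loo.shape[0]
    return (H - 1) / H * np.sum((pred_loo - pred_full) ** 2, axis=0)

def jackknife_alpha(alpha_full, alpha_loo):
    """Bias-corrected weight alpha^(JK), same array layout as above."""
    H = alpha_loo.shape[0]
    return alpha_full - (H - 1) / H * np.sum(alpha_loo - alpha_full, axis=0)

# Tiny made-up example with H = 2 areas
pred_full = np.array([1.0, 2.0])
pred_loo = np.array([[1.1, 2.0],
                     [0.9, 2.2]])
m2 = jackknife_m2(pred_full, pred_loo)

alpha_full = np.array([0.5, 0.4])
alpha_loo = np.array([[0.52, 0.40],
                      [0.50, 0.38]])
alpha_jk = jackknife_alpha(alpha_full, alpha_loo)
```

Filling `pred_loo` and `alpha_loo` requires refitting the parameters H times, once per deleted area, which is the expensive part of the procedure.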
Korean LFS Application
• Labor Force Survey: very important economic survey measuring unemployment
rates.
• Several sources of information for unemployment of Korea
  1 Korean Labor Force Survey (KLF) data - 7K sample households (monthly)
  2 Local Area Labor Force Survey (LALF) data - 200K sample households (quarterly)
  3 Census long form data (10% of the population)
• KLF sample is nested within LALF sample.
Korea LFS Application
• Unemployment rate for small area is the parameter of interest.
• Several sources of information for unemployment for analysis district area i.
• X̂i : estimates from KLF data
• Ŷ1i : estimates from LALF data
• Ŷ2i : estimates from census data
• KLF : sampling error ↑, measurement error ↓.
• LALF : sampling error ↓, measurement error ↑.
• Census data : sampling error ↓, measurement error ↑(no updated information).
Korea LFS Application
• We can also consider census data. Then (3) changes to
$$\begin{pmatrix} \hat{\bar{X}}_i \\ \hat{\bar{Y}}_{1i} - \beta_0 \\ \hat{\bar{Y}}_{2i} - \gamma_0 \end{pmatrix} = \begin{pmatrix} 1 \\ \beta_1 \\ \gamma_1 \end{pmatrix} \bar{X}_i + \begin{pmatrix} a_i \\ b_i + \bar{e}_{1i} \\ \bar{e}_{2i} \end{pmatrix}$$
• The whole process is similar to the case of combining two surveys.
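A hedged sketch of the three-source GLS predictor for a single area follows. It assumes the census error is uncorrelated with the sampling errors and with the LALF structural error, so the error covariance matrix is block diagonal; that assumption, the function name, and the argument names are illustrative and not taken from the talk.

```python
import numpy as np

def gls_three_sources(x_hat, y1_hat, y2_hat, beta0, beta1, gamma0, gamma1,
                      v_a, v_b, c_ab, sigma2_e1, sigma2_e2):
    """GLS prediction of the area mean from KLF, LALF and census estimates."""
    y = np.array([x_hat, y1_hat - beta0, y2_hat - gamma0])
    z = np.array([1.0, beta1, gamma1])
    # Covariance of (a_i, b_i + e_1i, e_2i); zero blocks are an assumption
    V = np.array([[v_a,  c_ab,            0.0],
                  [c_ab, v_b + sigma2_e1, 0.0],
                  [0.0,  0.0,             sigma2_e2]])
    Vinv_z = np.linalg.solve(V, z)      # V^{-1} z (V is symmetric)
    precision = Vinv_z @ z              # z' V^{-1} z
    pred = (Vinv_z @ y) / precision     # (z'V^{-1}z)^{-1} z'V^{-1} y
    return pred, 1.0 / precision        # prediction and its MSE for known parameters

# Made-up, error-free inputs generated from a true value of 0.03
pred, mse = gls_three_sources(
    x_hat=0.03, y1_hat=0.01 + 1.1 * 0.03, y2_hat=0.9 * 0.03,
    beta0=0.01, beta1=1.1, gamma0=0.0, gamma1=0.9,
    v_a=4e-4, v_b=1e-4, c_ab=5e-5, sigma2_e1=1e-4, sigma2_e2=2e-4)
```

With error-free inputs the predictor returns the true value exactly, and the returned MSE is no larger than V(a_i), the variance of the direct KLF estimate alone.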
Figure: Plot of Unemployment Rate for KLF and LALF Survey for Urban Area
Figure: Plot of Residuals against estimated values for Urban Area
Korea LFS Application
Data analysis Result
• Consider four estimates
• KLF : Only KLF
• LALF : Only LALF
• GLS 1 : Combine KLF and LALF
• GLS 2 : Combine KLF, LALF, and census data
• MSE

MSE      1st Q       Median      Mean        3rd Q
KLF      0.0000630   0.0001210   0.0002476   0.0002395
LALF     0.0001123   0.0001330   0.0001482   0.0001695
GLS 1    0.0000444   0.0000738   0.0000893   0.0001210
GLS 2    0.0000405   0.0000543   0.0000575   0.0000721
Discussion
• Model specification was very difficult!
• We build models separately for urban and rural areas, which are assigned based
on the proportion of households engaged in agricultural business.
• In the KLF survey, 25% of the areas have a 0 unemployment rate due to the very
small sample sizes of individual areas.
• The areas with a 0 unemployment rate are excluded when
parameters are estimated.
• We have considered the structural model with a 0 intercept:
$$\bar{Y}_{1i} = \beta_1 \bar{X}_i + e_i$$
• A mixture model or a zero-inflated regression model can be considered.
Summary
• Motivated by a real data problem: small area estimation for the Korean Labor Force Survey
• GLS prediction approach under the area-level model
• Measurement error model for parameter estimation
• Instead of GLS approach, maximum likelihood approach is also possible under
parametric model assumptions.
Reference
Kim, J.K., Park, S. and Kim, S. (2015). “Small area estimation combining information
from several sources,” Survey Methodology, In press.