Parametric Bootstrap Procedure for Small Area Prediction Variance

Andreea L. Erciulescu and Wayne A. Fuller
Department of Statistics, Iowa State University
Center for Survey Statistics and Methodology
Thanks ASA SRMS for Travel Scholarship
This research was partially supported by USDA NRCS CESU agreement 68-7482-11-534
August 6, 2014
Andreea L. Erciulescu, Wayne A. Fuller (ISU)
August 6, 2014
Unit Level Generalized Linear Mixed Models
Consider the unit level generalized linear mixed model (ULGLMM)
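$$
\begin{aligned}
y_{ij} &= g(x_{ij}, \beta, b_i) + e_{ij}, \\
x_{ij} &= \mu_x + \delta_i + \epsilon_{ij} =: \mu_{xi} + \epsilon_{ij}, \\
\tilde{\mu}_{xi} &= \mu_{xi} + u_i,
\end{aligned}
$$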
• i = 1, ..., m, where m denotes the number of areas
• j = 1, ..., ni , where ni denotes the number of units within area i
• (yij , xij ) is the vector of observed realizations
• $(b_i, \delta_i, u_i, e_{ij}, \epsilon_{ij})$ is the vector of unobserved random variables
• β is the vector of fixed effects coefficients
• In addition to xij , a vector of auxiliary information, µ̃xi , is also
available.
ULGLMM Estimation and Prediction
Objectives:
• To predict the small area mean of y,
$$
\theta_i = \int g(x_{ij}, \beta, b_i)\, dF_{xi}(x),
$$
where $F_{xi}(x)$ is the distribution of x in area i
• To estimate the prediction mean squared error, $E(\hat{\theta}_i - \theta_i)^2$, where $\hat{\theta}_i$ is the predictor
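As a rough numerical illustration of the integral that defines the small area mean, the sketch below approximates it by Monte Carlo. It assumes a logistic g (the form used in the simulation model of this talk) and a normal distribution of x within the area; the function name `small_area_mean` and its arguments are hypothetical and do not come from the slides.

```python
import numpy as np

def small_area_mean(beta0, beta1, b_i, mu_xi, sigma_eps, n_draws=200_000, seed=0):
    """Monte Carlo approximation of theta_i = integral of g(x, beta, b_i) dF_xi(x).

    Assumptions (not from the slides): g is the logistic mean function and
    F_xi is N(mu_xi, sigma_eps^2).
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(mu_xi, sigma_eps, size=n_draws)   # draws from F_xi
    eta = beta0 + beta1 * x + b_i                    # linear predictor
    return float(np.mean(1.0 / (1.0 + np.exp(-eta))))

# Example call with the simulation-model coefficients and b_i = 0.
theta_example = small_area_mean(-0.8, 1.0, 0.0, 0.0, 0.6)
```

When the whole finite population of x values in the area is known, the random draws would be replaced by those population values.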
ULGLMM Estimation and Prediction
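• The nature of the estimation-prediction problem is determined by the
distributional properties of the vector $(b_i, \delta_i, u_i, \epsilon_{ij})$.
• We consider models with
$$
\begin{aligned}
b_i &\overset{ind}{\sim} f_b(0, \sigma_b^2), \\
\delta_i &\overset{ind}{\sim} f_\delta(0, \sigma_\delta^2), \\
u_i &\overset{ind}{\sim} f_u(0, \sigma_u^2), \quad \sigma_u^2 \text{ known}, \\
\epsilon_{ij} &\overset{ind}{\sim} F_{xi}(0, \sigma_\epsilon^2),
\end{aligned}
$$
and $(b_i, \delta_i, u_i, \epsilon_{ij})$ mutually independent.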
Auxiliary information
• Known distribution of x
• Known form of distribution of x
• Known covariate mean µxi
• Unknown random covariate mean µxi , unknown µ̃xi
• Unknown random covariate mean µxi , random µ̃xi
ULNM Small Area Mean Predictions
Known Covariate Mean µxi
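• Fit the linear area model for x and estimate $\sigma_\epsilon^2$
• $\hat{\theta}_i = E\left[\hat{\theta}(b) \mid (x_i, y_i)\right]$, $\hat{\theta}(b) = \int_x g(x, b)\, d\hat{F}_x(x)$
$$
\hat{\theta}_i = \frac{\int_b \int_x g(x_{ij}, b_i)\, dF_{xi}(x) \prod_{t=1}^{n_i} f(y_{it} \mid b_i)\, f(x_{it} \mid \mu_{xi})\, f_b(b_i)\, db_i}{\int_b \prod_{t=1}^{n_i} f(y_{it} \mid b_i)\, f(x_{it} \mid \mu_{xi})\, f_b(b_i)\, db_i}
$$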
• In some finite population situations, the entire finite population of x
values may be known, and the integral over x in the predictor is then a sum
over the population
• In practice it is often necessary to estimate the parameters of the
distributions fb , Fxi
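A minimal sketch of how the ratio of integrals for the known-mean predictor could be evaluated numerically is given below. It assumes a logistic g, normal distributions for b and x, and Gauss-Hermite quadrature; none of these numerical choices is stated on the slide, and the function `predict_area_mean` and its arguments are hypothetical. The factors f(x_it | µ_xi) do not depend on b, so they cancel between numerator and denominator and are omitted in the code.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_area_mean(y, x, mu_xi, beta0, beta1, sigma_b, sigma_eps, n_nodes=40):
    """Posterior mean of theta(b) given the area data, known covariate mean.

    Assumptions (not from the slides): logistic g, b ~ N(0, sigma_b^2),
    x ~ N(mu_xi, sigma_eps^2), integrals replaced by Gauss-Hermite sums.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    z, w = hermegauss(n_nodes)        # nodes/weights for weight exp(-z^2 / 2)
    w = w / np.sqrt(2.0 * np.pi)      # rescale so the weights sum to 1
    b = sigma_b * z                   # quadrature nodes for b
    x_pop = mu_xi + sigma_eps * z     # quadrature nodes for x in the area
    # theta(b): integral of g(x, b) over the distribution of x, one value per b node
    theta_b = (expit(beta0 + beta1 * x_pop[None, :] + b[:, None]) * w[None, :]).sum(axis=1)
    # likelihood of the observed binary y values at each b node
    p = expit(beta0 + beta1 * x[None, :] + b[:, None])
    lik = np.prod(np.where(y[None, :] == 1, p, 1.0 - p), axis=1)
    post = w * lik                    # unnormalized posterior mass at each b node
    return float((theta_b * post).sum() / post.sum())
```

In practice the parameter arguments would be replaced by estimates, as the last bullet above notes.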
ULNM Small Area Mean Predictions
Unknown Random Covariate Mean µxi , Unknown µ̃xi
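• Fit the linear area model for x and estimate $(\mu_x, \sigma_\epsilon^2, \sigma_\delta^2)$
• $\tilde{\theta}_{i,1} = E\left[\hat{\theta}(b, \delta) \mid (x_i, y_i)\right]$, $\hat{\theta}(b, \delta) = \int g(\hat{\mu}_x + \delta + \epsilon, b)\, d\hat{F}_\epsilon(\epsilon)$
$$
\tilde{\theta}_{i,1} = \frac{\int_b \int_\delta \hat{\theta}(b, \delta) \prod_{t=1}^{n_i} f(y_{it} \mid x_{it}, b_i)\, f(x_{it} \mid \delta_i)\, d\hat{F}_{\delta_i}(\delta)\, d\hat{F}_{b_i}(b)}{\int_b \int_\delta \prod_{t=1}^{n_i} f(y_{it} \mid x_{it}, b_i)\, f(x_{it} \mid \delta_i)\, d\hat{F}_{\delta_i}(\delta)\, d\hat{F}_{b_i}(b)},
$$
where
• $\hat{F}_\epsilon$ and $\hat{F}_\delta$ are estimators of $F_\epsilon$ and $F_\delta$, respectively, with estimated
$\sigma_\epsilon$, $\sigma_\delta$ based on a model fit for $x_{ij}$
• $\hat{F}_b$ is the estimator of $F_b$ with estimated $\sigma_b$ based on a model fit for
$y_{ij}$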
ULNM Small Area Mean Predictions
Unknown Random Covariate Mean µxi , Random µ̃xi
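• Fit the linear area model for $(x, \tilde{\mu}_{xi})$ and estimate $(\mu_x, \sigma_\epsilon^2, \sigma_\delta^2)$
• $\tilde{\theta}_{i,2} = E\left[\hat{\theta}(b, \delta) \mid (x_i, y_i, \tilde{\mu}_{xi})\right]$, $\hat{\theta}(b, \delta) = \int g(\hat{\mu}_x + \delta + \epsilon, b)\, d\hat{F}_\epsilon(\epsilon)$
$$
\tilde{\theta}_{i,2} = \frac{\int_b \int_\delta \hat{\theta}(b, \delta) \prod_{t=1}^{n_i} f(y_{it} \mid x_{it}, b_i)\, f(x_{it} \mid \delta_i)\, f(\tilde{\mu}_{it} \mid \delta_i)\, d\hat{F}_{\delta_i}(\delta)\, d\hat{F}_{b_i}(b)}{\int_b \int_\delta \prod_{t=1}^{n_i} f(y_{it} \mid x_{it}, b_i)\, f(x_{it} \mid \delta_i)\, f(\tilde{\mu}_{it} \mid \delta_i)\, d\hat{F}_{\delta_i}(\delta)\, d\hat{F}_{b_i}(b)},
$$
where
• $\hat{F}_\epsilon$ and $\hat{F}_\delta$ are estimators of $F_\epsilon$ and $F_\delta$, respectively, with estimated
$\sigma_\epsilon$, $\sigma_\delta$ based on a model fit for $\tilde{x}_{ij} = (x_{ij}, \tilde{\mu}_{xi})$
• $\hat{F}_b$ is the estimator of $F_b$ with estimated $\sigma_b$ based on a model fit for
$y_{ij}$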
Parametric Bootstrap Notation
• $\psi = (\beta, \mu_x, \sigma_b^2, \sigma_\delta^2, \sigma_\epsilon^2)$ is the parameter vector that defines the
distribution of the sample observations
• $\hat{\psi}$ is an estimator of $\psi$
• $\psi^*$ is a parametric bootstrap (simulation) estimator of $\psi$
• The data generator is denoted $DG(\psi, r)$, where r is a random number seed
• $\alpha$ is the prediction MSE for an area
• $\alpha^* = (\hat{\theta}_i^* - \theta_i^*)^2$ is a level-one parametric bootstrap (simulation)
estimator of $\alpha$
• $\alpha^{**} = (\hat{\theta}_i^{**} - \theta_i^{**})^2$ is a level-two parametric bootstrap (simulation)
estimator of $\alpha$
Parametric Bootstrap Estimators
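Level-one bootstrap estimator, $B_1$ samples generated using $DG(\hat{\psi}, r_{1,k})$
$$
\hat{\alpha}_i^* = B_1^{-1} \sum_{k=1}^{B_1} \alpha_{i,k}^*
$$
Double bootstrap, $B_2$ samples generated using $DG(\psi_k^*, r_{k,t})$
$$
\begin{aligned}
\widehat{Bias}_{i,k} &= B_2^{-1} \sum_{t=1}^{B_2} (\alpha_{i,k,t}^{**} - \alpha_{i,k}^*), \\
\alpha_{i,k}^{**} &= \alpha_{i,k}^* - \widehat{Bias}_{i,k} = 2\alpha_{i,k}^* - \bar{\alpha}_{i,k}^{**}, \\
\hat{\alpha}_i^{**} &= 2\bar{\alpha}_i^* - B_1^{-1} B_2^{-1} \sum_{k=1}^{B_1} \sum_{t=1}^{B_2} \alpha_{i,k,t}^{**}
\end{aligned}
$$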
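The arithmetic of the level-one and double bootstrap estimators can be sketched as follows, assuming the squared errors α* and α** have already been computed and stored in arrays. The function names and the array layout (one row per level-one sample, one column per level-two sample) are illustrative choices, not part of the slides.

```python
import numpy as np

def level_one(alpha1):
    """Level-one estimator: average of alpha*_{i,k} over the B1 samples."""
    return float(np.mean(np.asarray(alpha1, dtype=float)))

def double_bootstrap(alpha1, alpha2):
    """Bias-corrected double bootstrap estimator for one area.

    alpha1: shape (B1,), level-one squared errors alpha*_{i,k}
    alpha2: shape (B1, B2), level-two squared errors alpha**_{i,k,t}
    """
    alpha1 = np.asarray(alpha1, dtype=float)
    alpha2 = np.asarray(alpha2, dtype=float)
    bias_k = (alpha2 - alpha1[:, None]).mean(axis=1)  # estimated bias for sample k
    corrected_k = alpha1 - bias_k                     # equals 2*alpha* - mean_t(alpha**)
    return float(corrected_k.mean())

# Toy numbers only, chosen so the bias is exactly 1 in every level-one sample.
a1 = [1.0, 2.0, 3.0]
a2 = [[2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
est = double_bootstrap(a1, a2)
```

Averaging the per-sample corrected values gives the same number as the closed form, twice the level-one mean minus the grand mean of the level-two values.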
Parametric Double Bootstrap Procedures
• Fast double bootstrap, $B_2 = 1$ (Davidson and MacKinnon, 2007)
• Generate one $\alpha_{i,k}^{**}$ using $DG(\psi_k^*, r_{2,k})$
$$
\hat{\alpha}_{i,C}^{**} = B_1^{-1} \sum_{k=1}^{B_1} (2\alpha_{i,k}^* - \alpha_{i,k}^{**}).
$$
• Fast double bootstrap, telescoping, $B_2 = 1$
• Generate $\alpha_{i,k}^*$ using $DG(\hat{\psi}, r_{1,k})$
• Generate $\alpha_{i,k}^{**}$ using $DG(\psi_k^*, r_{1,k+1})$
$$
\hat{\alpha}_{i,T}^{**} = (B_1 - 1)^{-1} \sum_{k=1}^{B_1 - 1} (\alpha_{i,k}^* + \alpha_{i,k+1}^* - \alpha_{i,k}^{**}).
$$
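A small sketch of the two fast double bootstrap formulas is given below, again assuming the squared errors are already available as arrays. The function names are hypothetical, and the telescoping index follows the formula as written on this slide, pairing the level-two value for sample k with the level-one values for samples k and k+1.

```python
import numpy as np

def fast_classic(alpha1, alpha2):
    """Classic fast double bootstrap: mean of (2*alpha*_k - alpha**_k), B2 = 1."""
    a1 = np.asarray(alpha1, dtype=float)
    a2 = np.asarray(alpha2, dtype=float)
    return float(np.mean(2.0 * a1 - a2))

def fast_telescoping(alpha1, alpha2):
    """Telescoping fast double bootstrap.

    alpha2[k] is assumed to come from parameters psi*_k combined with the
    random number seed of level-one sample k+1, so the sum runs over the
    first B1 - 1 samples only.
    """
    a1 = np.asarray(alpha1, dtype=float)
    a2 = np.asarray(alpha2, dtype=float)
    return float(np.mean(a1[:-1] + a1[1:] - a2[:-1]))

# Toy numbers only.
a1 = [1.0, 2.0, 3.0, 4.0]
a2 = [2.0, 3.0, 4.0, 5.0]
classic = fast_classic(a1, a2)
telescoping = fast_telescoping(a1, a2)
```

Both estimators need only one level-two sample per level-one sample, so the cost is roughly twice that of the level-one bootstrap.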
Simulation
Generation model
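• ULGLMM, $y_{ij} \mid b_i \sim \text{Binomial}(p_{ij})$,
$$
p_{ij} = \frac{\exp(-0.8 + x_{ij} + b_i)}{1 + \exp(-0.8 + x_{ij} + b_i)}, \quad b_i \sim N(0, \sigma_b^2 = 0.25),
$$
• m = 36 areas, $n_i \in \{2, 10, 40\}$ units within area i
• $b_i$ and $x_{ij}$ are mutually independent
• $\mathbf{x}_{ij} = (1, x_{ij})$
• $\mu_x = 0$
• $\delta_i \sim NI(0, \sigma_\delta^2 = 0.16)$
• $u_i \sim NI(0, \sigma_u^2 = 0.036)$
• $\mu_{xi} = \mu_x + \delta_i$, $\tilde{\mu}_{xi} = \mu_x + \delta_i + u_i$
• $x_{ij} \mid \mu_{xi} \sim NI(\mu_{xi}, \sigma_\epsilon^2 = 0.36)$
Draw 400 Monte Carlo (MC) samples, with $B_1 = 50$ and $B_2 = 1$ bootstrap samples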
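The generation model above could be coded as a seeded data generator in the spirit of DG(ψ, r). The sketch below is one possible version: the dictionary keys, the function name `dg`, and in particular the allocation of 12 areas to each sample size are assumptions, since the slide states only that the sizes are 2, 10 and 40 across 36 areas.

```python
import numpy as np

# Parameter values taken from the generation model on this slide.
PSI_TRUE = {"beta": (-0.8, 1.0), "mu_x": 0.0,
            "sigma2_b": 0.25, "sigma2_delta": 0.16, "sigma2_eps": 0.36}
SIGMA2_U = 0.036  # known variance of u_i

# Assumption: 12 areas of each size; the slide does not give the allocation.
DEFAULT_SIZES = np.repeat([2, 10, 40], 12)

def dg(psi, r, sizes=DEFAULT_SIZES, sigma2_u=SIGMA2_U):
    """Generate one sample from the model with parameters psi and seed r."""
    rng = np.random.default_rng(r)
    m = len(sizes)
    beta0, beta1 = psi["beta"]
    b = rng.normal(0.0, np.sqrt(psi["sigma2_b"]), size=m)          # area effects in y
    delta = rng.normal(0.0, np.sqrt(psi["sigma2_delta"]), size=m)  # area effects in x
    u = rng.normal(0.0, np.sqrt(sigma2_u), size=m)                 # error in auxiliary mean
    mu_xi = psi["mu_x"] + delta
    mu_tilde = mu_xi + u
    x, y = [], []
    for i, n_i in enumerate(sizes):
        x_i = mu_xi[i] + rng.normal(0.0, np.sqrt(psi["sigma2_eps"]), size=n_i)
        p_i = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x_i + b[i])))
        x.append(x_i)
        y.append(rng.binomial(1, p_i))                             # binary responses
    return {"b": b, "delta": delta, "mu_xi": mu_xi,
            "mu_tilde": mu_tilde, "x": x, "y": y}

sample = dg(PSI_TRUE, r=1)
```

Calling `dg` with an estimate of ψ in place of `PSI_TRUE` gives a level-one bootstrap sample, and reusing a seed reproduces the same random numbers.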
Bootstrap Estimators
For each Monte Carlo sample,
• Compute the REML estimator $\hat{\psi}$
• Generate $(b_{i,k}^*, \delta_{i,k}^*, x_{ij,k}^*, y_{ij,k}^*)$ using $DG(\hat{\psi}, r_{1,k})$
• Level-one estimator
$$
\hat{\alpha}_i^* = B^{-1} \sum_{k=1}^{B} \alpha_{i,k}^*
$$
• Generate $(b_{i,k}^{**}, \delta_{i,k}^{**}, x_{ij,k}^{**}, y_{ij,k}^{**})$ using $DG(\psi_k^*, r_{2,k})$
• Level-two classic estimator
$$
\hat{\alpha}_{C,i}^{**} = (B - 1)^{-1} \sum_{k=1}^{B-1} (2\alpha_{i,k}^* - \alpha_{i,k}^{**})
$$
• Generate $(b_{i,k}^{**}, \delta_{i,k}^{**}, x_{ij,k}^{**}, y_{ij,k}^{**})$ using $DG(\psi_k^*, r_{1,k+1})$
• Level-two telescoping estimator
$$
\hat{\alpha}_{T,i}^{**} = (B - 1)^{-1} \sum_{k=1}^{B-1} (\alpha_{i,k}^* + \alpha_{i,k+1}^* - \alpha_{i,k+1}^{**}).
$$
Monte Carlo Prediction MSE (×103 )
Size    (ȳ − θ)          (θ̂ − θ)¹        (θ̃1 − θ)²       (θ̃2 − θ)³
2       101.91 (1.09)    9.18 (0.18)     13.14 (0.26)    10.64 (0.22)
10      20.66 (0.27)     7.28 (0.16)     8.18 (0.18)     7.68 (0.17)
40      5.17 (0.07)      3.69 (0.07)     3.83 (0.08)     3.76 (0.08)
¹ Model 1, known µxi
² Model 2, random µxi , no µ̃xi
³ Model 3, random µxi , random µ̃xi
Monte Carlo Properties of Prediction MSE (%)
Model 3, random µxi , random µ̃xi
Size    Measure     α̂∗        α̂∗∗T
2       Rel Bias    -14.57    -9.43
2       Rel Sd      38.92     45.12
10      Rel Bias    -13.19    -6.82
10      Rel Sd      30.69     36.5
40      Rel Bias    -7.48     -1.93
40      Rel Sd      20.06     23.31
Variance Components (×106 ) in Prediction MSE
Model 3, random µxi , random µ̃xi , ni = 2
(B1 , B2 )    (100, 1)     (100, 1)     (20, 10)
Source        Level 2 T    Level 2 C    Level 2 C
Between       24.14        24.14        24.14
Within        0.45         0.69         1.69
Total         24.59        24.83        25.83
End
Thank you!
References
Battese, G.E., Harter, R.M. and Fuller, W.A. (1988), "An error component model for prediction of county crop areas using survey and satellite data," Journal of the American Statistical Association.
Beale, E.M.L. (1962), "Some uses of computers in operations research," Industrielle Organisation, 31, 51-52.
Berridge, D. and Crouchley, R. (2011), Multivariate Generalized Linear Mixed Models Using R, CRC Press.
Datta, G.S. and Lahiri, P. (2000), "A unified measure of uncertainty of best linear unbiased predictors in small area estimation problems," Statistica Sinica, 10, 613-627.
Datta, G.S., Rao, J.N.K. and Smith, D. (2005), "On measuring the variability of small area estimators under a basic area level model," Biometrika, 92, 183-196.
Datta, G.S., Rao, J.N.K. and Smith, D. (2012), "Amendments and Corrections: On measuring the variability of small area estimators under a basic area level model," Biometrika, 99(2), 509.
Davidson, R. and MacKinnon, J.G. (2007), "Improving the reliability of bootstrap tests with the fast double bootstrap," Computational Statistics and Data Analysis, 51, 3259-3281.
Erciulescu, A.L. and Fuller, W.A. (2013), "Small Area Prediction of the Mean of a Binomial Random Variable," Survey Research Methods Section, JSM Proceedings, 855-863.
Hall, P. and Maiti, T. (2006), "On parametric bootstrap methods for small area prediction," Journal of the Royal Statistical Society, Series B, 68(2), 221-238.
Harville, D.A. (1985), "Decomposition of Prediction Error," Journal of the American Statistical Association, 80(389), 132-138.
Jeong, K.M. and Son, J. (2009), "Estimation of Small Area Proportions Based on Logistic Mixed Model," The Korean Journal of Applied Statistics, 22(1), 153-161.
Kackar, R. and Harville, D.A. (1984), "Approximations for standard errors of estimators of fixed and random effects in mixed linear models," Journal of the American Statistical Association, 79, 853-862.
Pfeffermann, D. and Correa, S. (2012), "Empirical bootstrap bias correction and estimation of prediction mean square error in small area estimation," Biometrika, 99, 457-472.
Pfeffermann, D. and Glickman, H. (2004), "Mean Square Error Approximation in Small Area Estimation by Use of Parametric and Nonparametric Bootstrap," ASA Section on Survey Research Methods, 4167-4178.
Prasad, N.G.N. and Rao, J.N.K. (1990), "The estimation of mean squared error of small area estimators," Journal of the American Statistical Association, 85, 163-171.
Rao, J.N.K. (2003), Small Area Estimation, Wiley Series in Survey Methodology.
Stroup, W.W. (2012), Generalized Linear Mixed Models: Modern Concepts, Methods and Applications, Chapman and Hall, CRC Texts in Statistical Science, Taylor and Francis.
Wang, J. and Fuller, W.A. (2003), "The mean squared error of small area predictors constructed with estimated area variances," Journal of the American Statistical Association, 98, 716-723.