Parametric Bootstrap Procedure for Small Area Prediction Variance

Andreea L. Erciulescu and Wayne A. Fuller
Department of Statistics, Iowa State University
Center for Survey Statistics and Methodology

Thanks to ASA SRMS for the Travel Scholarship.
This research was partially supported by USDA NRCS CESU agreement 68-7482-11-534.

August 6, 2014

Andreea L. Erciulescu, Wayne A. Fuller (ISU), August 6, 2014

Unit Level Generalized Linear Mixed Models

Consider the unit level generalized linear mixed model (ULGLMM)

$$
\begin{aligned}
y_{ij} &= g(x_{ij}, \beta, b_i) + e_{ij}, \\
x_{ij} &= \mu_x + \delta_i + \epsilon_{ij} =: \mu_{xi} + \epsilon_{ij}, \\
\tilde{\mu}_{xi} &= \mu_{xi} + u_i,
\end{aligned}
$$

• $i = 1, \ldots, m$, where $m$ denotes the number of areas
• $j = 1, \ldots, n_i$, where $n_i$ denotes the number of units within area $i$
• $(y_{ij}, x_{ij})$ is the vector of observed realizations
• $(b_i, \delta_i, u_i, e_{ij}, \epsilon_{ij})$ is the vector of unobserved random variables
• $\beta$ is the vector of fixed effects coefficients
• In addition to $x_{ij}$, a vector of auxiliary information, $\tilde{\mu}_{xi}$, is also available.

ULGLMM Estimation and Prediction

Objectives:
• To predict the small area mean of $y$,
$$
\theta_i = \int g(x_{ij}, \beta, b_i)\, dF_{xi}(x),
$$
where $F_{xi}(x)$ is the distribution of $x$ in area $i$
• To estimate the prediction mean squared error, $E(\hat{\theta}_i - \theta_i)^2$, where $\hat{\theta}_i$ is the predictor

ULGLMM Estimation and Prediction

• The nature of the estimation-prediction problem is determined by the distributional properties of the vector $(b_i, \delta_i, u_i, \epsilon_{ij})$.
• We consider models with
$$
\begin{aligned}
b_i &\stackrel{ind}{\sim} f_b(0, \sigma_b^2), \\
\delta_i &\stackrel{ind}{\sim} f_\delta(0, \sigma_\delta^2), \\
u_i &\stackrel{ind}{\sim} f_u(0, \sigma_u^2), \quad \sigma_u^2 \text{ known}, \\
\epsilon_{ij} &\stackrel{ind}{\sim} F_{xi}(0, \sigma_\epsilon^2),
\end{aligned}
$$
and $(b_i, \delta_i, u_i, \epsilon_{ij})$ mutually independent.

Auxiliary information

• Known distribution of $x$
• Known form of distribution of $x$
• Known covariate mean $\mu_{xi}$
• Unknown random covariate mean $\mu_{xi}$, unknown $\tilde{\mu}_{xi}$
• Unknown random covariate mean $\mu_{xi}$, random $\tilde{\mu}_{xi}$
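For the "known distribution of x" case, the small area mean $\theta_i$ can be evaluated numerically once $g$ and $F_{xi}$ are specified. The sketch below is only an illustration under assumptions that are not part of the general model: it takes $g$ to be the logistic function with coefficients $(-0.8, 1)$, as in the simulation section of this talk, takes $F_{xi}$ to be $N(\mu_{xi}, \sigma_\epsilon^2)$, and uses a plain trapezoid rule. The function names `expit` and `small_area_mean` are hypothetical.

```python
import math


def expit(z):
    """Logistic function, arranged to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def small_area_mean(b_i, mu_xi, sigma_eps, beta0=-0.8, beta1=1.0,
                    n_grid=2001, width=8.0):
    """Trapezoid-rule approximation of theta_i = int g(x, beta, b_i) dF_xi(x).

    Assumptions (illustration only): g = expit(beta0 + beta1 * x + b_i) and
    F_xi = N(mu_xi, sigma_eps^2). The grid covers mu_xi +/- width * sigma_eps.
    """
    lo = mu_xi - width * sigma_eps
    h = 2.0 * width * sigma_eps / (n_grid - 1)
    norm_const = 1.0 / (sigma_eps * math.sqrt(2.0 * math.pi))
    total = 0.0
    for k in range(n_grid):
        x = lo + k * h
        weight = h if 0 < k < n_grid - 1 else 0.5 * h  # half weight at end points
        density = norm_const * math.exp(-0.5 * ((x - mu_xi) / sigma_eps) ** 2)
        total += weight * expit(beta0 + beta1 * x + b_i) * density
    return total


# Example call with an arbitrary area effect and covariate mean.
theta = small_area_mean(b_i=0.3, mu_xi=0.1, sigma_eps=0.6)
print(round(theta, 4))
```

When the entire finite population of $x$ values in an area is known, the same quantity would instead be a simple average of $g$ over those values, so no quadrature is needed.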

ULNM Small Area Mean Predictions
Known Covariate Mean $\mu_{xi}$

• Fit the linear area model for $x$ and estimate $\sigma_\epsilon^2$
• $\hat{\theta}_i = E\left[\hat{\theta}(b) \mid (x_i, y_i)\right]$, $\hat{\theta}(b) = \int_x g(x, b)\, d\hat{F}_x(x)$
$$
\hat{\theta}_i = \frac{\int_b \int_x g(x_{ij}, b_i)\, dF_{xi}(x) \prod_{t=1}^{n_i} f(y_{it} \mid b_i) f(x_{it} \mid \mu_{xi}) f_b(b_i)\, db_i}{\int_b \prod_{t=1}^{n_i} f(y_{it} \mid b_i) f(x_{it} \mid \mu_{xi}) f_b(b_i)\, db_i}
$$
• In some finite population situations, the entire finite population of $x$ values may be known, and the integral over $x$ is then the sum over the population
• In practice it is often necessary to estimate the parameters of the distributions $f_b$, $F_{xi}$

ULNM Small Area Mean Predictions
Unknown Random Covariate Mean $\mu_{xi}$, Unknown $\tilde{\mu}_{xi}$

• Fit the linear area model for $x$ and estimate $(\mu_x, \sigma_\epsilon^2, \sigma_\delta^2)$
• $\tilde{\theta}_{i,1} = E\left[\hat{\theta}(b, \delta) \mid (x_i, y_i)\right]$, $\hat{\theta}(b, \delta) = \int_\epsilon g(\hat{\mu}_x + \delta + \epsilon, b)\, d\hat{F}_\epsilon(\epsilon)$
$$
\tilde{\theta}_{i,1} = \frac{\int_b \int_\delta \hat{\theta}(b, \delta) \prod_{t=1}^{n_i} f(y_{it} \mid x_{it}, b_i) f(x_{it} \mid \delta_i)\, d\hat{F}_{\delta_i}(\delta)\, d\hat{F}_{b_i}(b)}{\int_b \int_\delta \prod_{t=1}^{n_i} f(y_{it} \mid x_{it}, b_i) f(x_{it} \mid \delta_i)\, d\hat{F}_{\delta_i}(\delta)\, d\hat{F}_{b_i}(b)},
$$
where
• $\hat{F}_\epsilon$ and $\hat{F}_\delta$ are estimators of $F_\epsilon$ and $F_\delta$, respectively, with estimated $\sigma_\epsilon$, $\sigma_\delta$ based on a model fit for $x_{ij}$
• $\hat{F}_b$ is the estimator of $F_b$ with estimated $\sigma_b$ based on a model fit for $y_{ij}$

ULNM Small Area Mean Predictions
Unknown Random Covariate Mean $\mu_{xi}$, Random $\tilde{\mu}_{xi}$

• Fit the linear area model for $(x, \tilde{\mu}_{xi})$ and estimate $(\mu_x, \sigma_\epsilon^2, \sigma_\delta^2)$
• $\tilde{\theta}_{i,2} = E\left[\hat{\theta}(b, \delta) \mid (x_i, y_i, \tilde{\mu}_{xi})\right]$, $\hat{\theta}(b, \delta) = \int_\epsilon g(\hat{\mu}_x + \delta + \epsilon, b)\, d\hat{F}_\epsilon(\epsilon)$
$$
\tilde{\theta}_{i,2} = \frac{\int_b \int_\delta \hat{\theta}(b, \delta) \prod_{t=1}^{n_i} f(y_{it} \mid x_{it}, b_i) f(x_{it} \mid \delta_i) f(\tilde{\mu}_{it} \mid \delta_i)\, d\hat{F}_{\delta_i}(\delta)\, d\hat{F}_{b_i}(b)}{\int_b \int_\delta \prod_{t=1}^{n_i} f(y_{it} \mid x_{it}, b_i) f(x_{it} \mid \delta_i) f(\tilde{\mu}_{it} \mid \delta_i)\, d\hat{F}_{\delta_i}(\delta)\, d\hat{F}_{b_i}(b)},
$$
where
• $\hat{F}_\epsilon$ and $\hat{F}_\delta$ are estimators of $F_\epsilon$ and $F_\delta$, respectively, with estimated $\sigma_\epsilon$, $\sigma_\delta$ based on a model fit for $\tilde{x}_{ij} = (x_{ij}, \tilde{\mu}_{xi})$
• $\hat{F}_b$ is the estimator of $F_b$ with estimated $\sigma_b$ based on a model fit for $y_{ij}$
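The ratio-of-integrals form of the predictor can be made concrete for the known covariate mean case. The following is a minimal sketch, not the authors' implementation. It assumes a logistic $g$, a normal $f_b$ and a normal $F_{xi}$, and it treats all parameters as known rather than estimated. Because $f(x_{it} \mid \mu_{xi})$ does not involve $b$ when $\mu_{xi}$ is known, that factor cancels between numerator and denominator and is omitted. The helper names (`grid`, `theta_of_b`, `predict_known_mean`) are hypothetical.

```python
import math


def expit(z):
    """Logistic function, arranged to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def normal_pdf(v, mean, sd):
    return math.exp(-0.5 * ((v - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def grid(center, sd, n, width=8.0):
    """Trapezoid nodes and weights on center +/- width * sd."""
    h = 2.0 * width * sd / (n - 1)
    return [(center - width * sd + k * h, h if 0 < k < n - 1 else 0.5 * h)
            for k in range(n)]


def theta_of_b(b, mu_xi, sigma_eps, beta, n_x=201):
    """theta(b) = int g(x, b) dF_x(x), assuming x ~ N(mu_xi, sigma_eps^2)."""
    return sum(w * expit(beta[0] + beta[1] * x + b) * normal_pdf(x, mu_xi, sigma_eps)
               for x, w in grid(mu_xi, sigma_eps, n_x))


def predict_known_mean(y, x, mu_xi, sigma_eps, sigma_b, beta, n_b=201):
    """Conditional expectation of theta(b) given the area data (y, x).

    Numerator:   int theta(b) * prod_t f(y_t | x_t, b) * f_b(b) db
    Denominator: int            prod_t f(y_t | x_t, b) * f_b(b) db
    Assumes binary y with P(y_t = 1) = expit(beta0 + beta1 * x_t + b).
    """
    num = den = 0.0
    for b, w in grid(0.0, sigma_b, n_b):
        lik = 1.0
        for y_t, x_t in zip(y, x):
            p = expit(beta[0] + beta[1] * x_t + b)
            lik *= p if y_t == 1 else 1.0 - p
        post = w * lik * normal_pdf(b, 0.0, sigma_b)
        num += theta_of_b(b, mu_xi, sigma_eps, beta) * post
        den += post
    return num / den


beta = (-0.8, 1.0)
x_obs = [0.2, -0.1, 0.5, 0.0]
pred_hi = predict_known_mean([1, 1, 1, 1], x_obs, 0.1, 0.6, 0.5, beta)
pred_lo = predict_known_mean([0, 0, 0, 0], x_obs, 0.1, 0.6, 0.5, beta)
print(pred_lo < pred_hi)
```

The cases with unknown $\mu_{xi}$ would add a second integration over $\delta$ in the same way. A grid is used here only to keep the sketch dependency-free; Gauss-Hermite quadrature would be a natural alternative.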

Parametric Bootstrap
Notation

• $\psi = (\beta, \mu_x, \sigma_b^2, \sigma_\delta^2, \sigma_\epsilon^2)$ is the parameter vector that defines the distribution of the sample observations
• $\hat{\psi}$ is an estimator of $\psi$
• $\psi^*$ is a parametric bootstrap (simulation) estimator of $\psi$
• The data generator is denoted $DG(\psi, r)$, where $r$ is a random number seed
• $\alpha$ = MSE of the prediction error for an area
• $\alpha^* = (\hat{\theta}_i^* - \theta_i^*)^2$ is a level-one parametric bootstrap (simulation) estimator of $\alpha$
• $\alpha^{**} = (\hat{\theta}_i^{**} - \theta_i^{**})^2$ is a level-two parametric bootstrap (simulation) estimator of $\alpha$

Parametric Bootstrap
Estimators

Level-one bootstrap estimator, $B_1$ samples generated using $DG(\hat{\psi}, r_{1,k})$:
$$
\hat{\alpha}_i^* = B_1^{-1} \sum_{k=1}^{B_1} \alpha_{i,k}^*
$$

Double bootstrap, $B_2$ samples generated using $DG(\psi_k^*, r_{k,t})$:
$$
\begin{aligned}
\widehat{\mathrm{Bias}}_{i,k} &= B_2^{-1} \sum_{t=1}^{B_2} (\alpha_{i,k,t}^{**} - \alpha_{i,k}^*), \\
\alpha_{i,k}^{**} &= \alpha_{i,k}^* - \widehat{\mathrm{Bias}}_{i,k} = 2\alpha_{i,k}^* - \bar{\alpha}_{i,k}^{**}, \\
\hat{\alpha}_i^{**} &= 2\bar{\alpha}_i^* - B_1^{-1} B_2^{-1} \sum_{k=1}^{B_1} \sum_{t=1}^{B_2} \alpha_{i,k,t}^{**},
\end{aligned}
$$
where $\bar{\alpha}_{i,k}^{**} = B_2^{-1} \sum_{t=1}^{B_2} \alpha_{i,k,t}^{**}$ and $\bar{\alpha}_i^* = B_1^{-1} \sum_{k=1}^{B_1} \alpha_{i,k}^*$.

Parametric Double Bootstrap
Procedures

• Fast double bootstrap, $B_2 = 1$ (Davidson and MacKinnon, 2007)
  • Generate one $\alpha_{i,k}^{**}$ using $DG(\psi_k^*, r_{2,k})$
$$
\hat{\alpha}_{i,C}^{**} = B_1^{-1} \sum_{k=1}^{B_1} (2\alpha_{i,k}^* - \alpha_{i,k}^{**}).
$$
• Fast double bootstrap, telescoping, $B_2 = 1$
  • Generate $\alpha_{i,k}^*$ using $DG(\hat{\psi}, r_{1,k})$
  • Generate $\alpha_{i,k}^{**}$ using $DG(\psi^*, r_{1,k+1})$
$$
\hat{\alpha}_{i,T}^{**} = (B_1 - 1)^{-1} \sum_{k=1}^{B_1 - 1} (\alpha_{i,k}^* + \alpha_{i,k+1}^* - \alpha_{i,k}^{**}).
$$

Simulation

Generation model
• ULGLMM, $y_{ij} \mid b_i \sim \mathrm{Binomial}(p_{ij})$,
$$
p_{ij} = \frac{\exp(-0.8 + x_{ij} + b_i)}{1 + \exp(-0.8 + x_{ij} + b_i)}, \quad b_i \sim N(0, \sigma_b^2 = 0.25),
$$
• $m = 36$ areas, $n_i \in \{2, 10, 40\}$ units within area $i$
• $b_i$ and $x_{ij}$ are mutually independent
• $\mathbf{x}_{ij} = (1, x_{ij})$
• $\mu_x = 0$
• $\delta_i \sim NI(0, \sigma_\delta^2 = 0.16)$
• $u_i \sim NI(0, \sigma_u^2 = 0.036)$
• $\mu_{xi} = \mu_x + \delta_i$, $\tilde{\mu}_{xi} = \mu_x + \delta_i + u_i$
• $x_{ij} \mid \mu_{xi} \sim NI(\mu_{xi}, \sigma_\epsilon^2 = 0.36)$

Draw 400 Monte Carlo (MC) samples, with $B_1 = 50$ and $B_2 = 1$ bootstrap samples.
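The generation model and the way the bootstrap estimators combine the simulated squared errors can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' code: `data_generator` plays the role of $DG(\psi, r)$ and treats $\mathrm{Binomial}(p_{ij})$ as a single Bernoulli trial, the dictionary keys for $\psi$ are hypothetical, and $\sigma_u^2$ is stored alongside $\psi$ for convenience even though it is treated as known. The model fitting and prediction steps that would produce each $\alpha^*$ and $\alpha^{**}$ are not shown; the three estimator functions simply take those values as lists. The telescoping function uses the indexing of the "Parametric Double Bootstrap Procedures" formula.

```python
import math
import random


def expit(z):
    """Logistic function, arranged to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def data_generator(psi, seed, m=36, n_i=10):
    """DG(psi, r): draw one sample of m areas with n_i units each."""
    rng = random.Random(seed)  # r is the random number seed
    beta0, beta1 = psi["beta"]
    areas = []
    for _ in range(m):
        b = rng.gauss(0.0, math.sqrt(psi["sigma2_b"]))
        delta = rng.gauss(0.0, math.sqrt(psi["sigma2_delta"]))
        u = rng.gauss(0.0, math.sqrt(psi["sigma2_u"]))
        mu_xi = psi["mu_x"] + delta
        x = [rng.gauss(mu_xi, math.sqrt(psi["sigma2_eps"])) for _ in range(n_i)]
        # Assumption: one Bernoulli trial per unit with success probability p_ij.
        y = [1 if rng.random() < expit(beta0 + beta1 * x_j + b) else 0 for x_j in x]
        areas.append({"b": b, "delta": delta, "mu_xi": mu_xi,
                      "mu_tilde_xi": mu_xi + u, "x": x, "y": y})
    return areas


def level_one(alpha1):
    """B1^{-1} * sum_k alpha*_k."""
    return sum(alpha1) / len(alpha1)


def fast_double_classic(alpha1, alpha2):
    """B1^{-1} * sum_k (2 alpha*_k - alpha**_k), with B2 = 1."""
    return sum(2.0 * a1 - a2 for a1, a2 in zip(alpha1, alpha2)) / len(alpha1)


def fast_double_telescoping(alpha1, alpha2):
    """(B1 - 1)^{-1} * sum_{k=1}^{B1-1} (alpha*_k + alpha*_{k+1} - alpha**_k)."""
    b1 = len(alpha1)
    return sum(alpha1[k] + alpha1[k + 1] - alpha2[k] for k in range(b1 - 1)) / (b1 - 1)


psi = {"beta": (-0.8, 1.0), "mu_x": 0.0, "sigma2_b": 0.25,
       "sigma2_delta": 0.16, "sigma2_u": 0.036, "sigma2_eps": 0.36}
sample = data_generator(psi, seed=1, n_i=10)
print(len(sample), len(sample[0]["y"]))
```

Using an explicit seed per call mirrors the role of $r$ in $DG(\psi, r)$: the telescoping procedure depends on reusing the seed $r_{1,k+1}$ for the level-two sample, which is straightforward when the generator is a pure function of $(\psi, r)$.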

Bootstrap Estimators

For each Monte Carlo sample,
• Compute the REML estimator $\hat{\psi}$
• Generate $(b_{i,k}^*, \delta_{i,k}^*, x_{ij,k}^*, y_{ij,k}^*)$ using $DG(\hat{\psi}, r_{1,k})$
• Level-one estimator
$$
\hat{\alpha}_i^* = B^{-1} \sum_{k=1}^{B} \alpha_{i,k}^*
$$
• Generate $(b_{i,k}^{**}, \delta_{i,k}^{**}, x_{ij,k}^{**}, y_{ij,k}^{**})$ using $DG(\psi^*, r_{2,k})$
• Level-two classic estimator
$$
\hat{\alpha}_{C,i}^{**} = (B - 1)^{-1} \sum_{k=1}^{B-1} (2\alpha_{i,k}^* - \alpha_{i,k}^{**})
$$
• Generate $(b_{i,k}^{**}, \delta_{i,k}^{**}, x_{ij,k}^{**}, y_{ij,k}^{**})$ using $DG(\psi^*, r_{1,k+1})$
• Level-two telescoping estimator
$$
\hat{\alpha}_{T,i}^{**} = (B - 1)^{-1} \sum_{k=1}^{B-1} (\alpha_{i,k}^* + \alpha_{i,k+1}^* - \alpha_{i,k+1}^{**}).
$$

Monte Carlo Prediction MSE ($\times 10^3$)

Size   $(\bar{y} - \theta)$   $(\hat{\theta} - \theta)$ [1]   $(\tilde{\theta}_1 - \theta)$ [2]   $(\tilde{\theta}_2 - \theta)$ [3]
2      101.91 (1.09)          9.18 (0.18)                     13.14 (0.26)                        10.64 (0.22)
10     20.66 (0.27)           7.28 (0.16)                     8.18 (0.18)                         7.68 (0.17)
40     5.17 (0.07)            3.69 (0.07)                     3.83 (0.08)                         3.76 (0.08)

[1] Model 1, known $\mu_{xi}$
[2] Model 2, random $\mu_{xi}$, no $\tilde{\mu}_{xi}$
[3] Model 3, random $\mu_{xi}$, random $\tilde{\mu}_{xi}$

Monte Carlo Properties of Prediction MSE (%)
Model 3, random $\mu_{xi}$, random $\tilde{\mu}_{xi}$

Size   Measure    $\hat{\alpha}^*$   $\hat{\alpha}_T^{**}$
2      Rel Bias   -14.57             -9.43
2      Rel Sd     38.92              45.12
10     Rel Bias   -13.19             -6.82
10     Rel Sd     30.69              36.5
40     Rel Bias   -7.48              -1.93
40     Rel Sd     20.06              23.31

Variance Components ($\times 10^6$) in Prediction MSE
Model 3, random $\mu_{xi}$, random $\tilde{\mu}_{xi}$, $n_i = 2$

$(B_1, B_2)$ =   (100, 1)    (100, 1)    (20, 10)
Source           Level 2 T   Level 2 C   Level 2 C
Between          24.14       24.14       24.14
Within           0.45        0.69        1.69
Total            24.59       24.83       25.83

End

Thank you!

References

Battese, G.E., Harter, R.M. and Fuller, W.A. (1988), "An error component model for prediction of county crop areas using survey and satellite data," Journal of the American Statistical Association.
Berridge, D. and Crouchley, R. (2011), Multivariate Generalized Linear Mixed Models Using R, CRC Press.
Beale, E.M.L. (1962), "Some uses of computers in operations research," Industrielle Organisation, 31, 51-52.
Datta, G.S. and Lahiri, P. (2000), "A unified measure of uncertainty of best linear unbiased predictors in small area estimation problems," Statistica Sinica, 10, 613-627.
Datta, G.S., Rao, J.N.K. and Smith, D. (2005), "On measuring the variability of small area estimators under a basic area level model," Biometrika, 92, 183-196.
Datta, G.S., Rao, J.N.K. and Smith, D. (2012), "Amendments and Corrections: On measuring the variability of small area estimators under a basic area level model," Biometrika, 99, 2, 509.
Davidson, R. and MacKinnon, J.G. (2007), "Improving the reliability of bootstrap tests with the fast double bootstrap," Computational Statistics and Data Analysis, 51, 3259-3281.
Erciulescu, A.L. and Fuller, W.A. (2013), "Small Area Prediction of the Mean of a Binomial Random Variable," Survey Research Methods Section, JSM Proceedings, 855-863.
Hall, P. and Maiti, T. (2006), "On parametric bootstrap methods for small area prediction," J.R. Statist. Soc. B, 68, 2, 221-238.
Harville, D.A. (1985), "Decomposition of Prediction Error," Journal of the American Statistical Association, 80, 389, 132-138.
Jeong, K.M. and Son, J. (2009), "Estimation of Small Area Proportions Based on Logistic Mixed Model," The Korean Journal of Applied Statistics, 22(1), 153-161.
Kackar, R. and Harville, D.A. (1984), "Approximations for standard errors of estimators of fixed and random effects in mixed linear models," Journal of the American Statistical Association, 79, 853-862.
Pfeffermann, D. and Correa, S. (2012), "Empirical bootstrap bias correction and estimation of prediction mean square error in small area estimation," Biometrika, 99, 457-472.
Pfeffermann, D. and Glickman, H. (2004), "Mean Square Error Approximation in Small Area Estimation by Use of Parametric and Nonparametric Bootstrap," ASA Section on Survey Research Methods, 4167-4178.
Prasad, N.G.N. and Rao, J.N.K. (1990), "The estimation of mean squared error of small area estimators," Journal of the American Statistical Association, 85, 163-171.
Rao, J.N.K. (2003), Small Area Estimation, Wiley Series in Survey Methodology.
Stroup, W.W. (2012), Generalized Linear Mixed Models: Modern Concepts, Methods and Applications, Chapman and Hall, CRC Texts in Statistical Science, Taylor and Francis.
Wang, J. and Fuller, W.A. (2003), "The mean squared error of small area predictors constructed with estimated area variances," Journal of the American Statistical Association, 98, 716-723.