Benchmarked Small Area Prediction

Emily J. Berg, Wayne A. Fuller, Andreea L. Erciulescu
Iowa State University
Center for Survey Statistics and Methodology, Statistics Department
September 17, 2012

Andreea L. Erciulescu (Iowa State University) September 17, 2012 1 / 35

Introduction

Small area
• A population for which reliable statistics of interest cannot be produced because of limitations in the available data

Examples
• Geographical regions: state, county, municipality
• Demographic groups: age, sex, race
• Demographic group within a geographical region

What?
• Estimation related to population counts
• Disease mapping research, a long-standing application
• Regional planning: apportionment of congressional seats and allocation of government funds (U.S. $130 billion in federal funds per year)

Introduction - WHO?
• Small Area Income and Poverty Estimates (SAIPE) by the Census Bureau - income and poverty measures for various population subgroups for states, counties, and school districts
• Local Area Unemployment Statistics (LAUS) by the Bureau of Labor Statistics - employment and unemployment for states, metropolitan areas, counties, and certain sub-county areas
• County Estimates by the National Agricultural Statistics Service - county estimates of crop yield
• Substance Abuse and Mental Health Services Administration - substance abuse in states and metropolitan areas

Introduction: Small area estimation
• Goal: reliable estimates of one or several variables of interest in areas where the available information is not sufficient
• Collection: survey in some or all areas

Problems:
• Sampling designs provide reliable data for large areas but pay little or no attention to the small areas of interest
• Fixed budget or practical constraints
• Measurement errors, in the case of administrative data (even if there is no sampling error)
• Underreported small area statistics, in the case of law enforcement crime records
• Poor quality data due to nonresponse or hard-to-find populations, in the case of census-compiled subgroup populations

Introduction: Small area estimation approaches
• Direct estimators
  - based on local data, when the sample size is large
• “Borrow strength” from other areas
  - deal with small sample size
  - data from ‘similar’ areas
  - data from previous occasions
• Models
  - to share information between areas and account for correlations
  - to link survey outcome or response variables to a set of predictor variables known for small areas

Outline
1 Small area prediction models
2 The benchmarking restriction
3 Augmented models: linear and nonlinear
4 Simulation study

Part I: Linear small area model
$$y_i = \theta_i + e_i, \qquad \theta_i = x_i'\beta + u_i, \qquad i = 1, 2, \ldots, m$$
• $m$ is the number of small areas
• $\theta_i$ is the small area mean
• $x_i$ are known fixed vectors
• $u_i$ is the random small area effect
• $e_i$ is the sampling error
• assume
$$\begin{pmatrix} e_i \\ u_i \end{pmatrix} \sim \text{IND}\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_{ei}^2 & 0 \\ 0 & \sigma_{ui}^2 \end{pmatrix} \right).$$

Linear small area model: prediction
Assume we observe $a = u + e$ and that
$$\begin{pmatrix} u \\ e \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} G & 0 \\ 0 & R \end{pmatrix} \right).$$
Question: Given the data $a$, what is our best guess (predictor) for the unobserved $u$?

Note that
$$\begin{pmatrix} a \\ u \end{pmatrix} = \begin{pmatrix} I & I \\ I & 0 \end{pmatrix} \begin{pmatrix} u \\ e \end{pmatrix}.$$
Thus,
$$\begin{pmatrix} a \\ u \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} I & I \\ I & 0 \end{pmatrix} \begin{pmatrix} G & 0 \\ 0 & R \end{pmatrix} \begin{pmatrix} I & I \\ I & 0 \end{pmatrix} \right) \overset{d}{=} N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} G + R & G \\ G & G \end{pmatrix} \right).$$

$E(u \mid a) = G(G + R)^{-1} a$ is the BLUP of $u$.
• If $G = \text{diag}(\sigma_u^2)$ and $R = \text{diag}(\sigma_e^2)$, then the BLUP is $\hat{u}_i = \dfrac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}\, a_i$
• Usually, the matrices $G$ and $R$ are unknown and we replace them by estimates $\hat{G}$ and $\hat{R}$, or $\hat\sigma_u^2$ and $\hat\sigma_e^2$, respectively.

The BLUP $\hat{u}$ satisfies:
• $\hat{u}$ is a linear function of $a$
• $\hat{u}$ is unbiased for $u$, so that $E(\hat{u} - u) = 0$
• $Var(\hat{u} - u)$ is no “larger” than $Var(v - u)$, where $v$ is any other linear and unbiased predictor

Small area information:
• $\eta(x_i, \beta) = x_i'\beta$, for linear fixed area effects
• $\eta(x_i, \beta) = g(x_i, \beta)$, for nonlinear fixed area effects
In this situation we often make predictions for quantities like $\eta(x_i, \beta) + u$, and the predictor of such a quantity is $\eta(x_i, \hat\beta) + \hat{u}$, the estimator of $\eta(x_i, \beta)$ plus the predictor of $u$.

The best linear unbiased predictor (BLUP) of $\theta_i$ is
$$\hat\theta_i = x_i'\hat\beta + \gamma_i (y_i - x_i'\hat\beta),$$
where
$$\hat\beta = \Big[ \sum_{i=1}^m x_i (\sigma_{ui}^2 + \sigma_{ei}^2)^{-1} x_i' \Big]^{-1} \sum_{i=1}^m x_i (\sigma_{ui}^2 + \sigma_{ei}^2)^{-1} y_i = (X'\Sigma_{aa}^{-1}X)^{-1} X'\Sigma_{aa}^{-1} y$$
• $\Sigma_{aa}$ is a diagonal matrix with $\sigma_{ui}^2 + \sigma_{ei}^2$ as the $i$th diagonal element
• $X$ is the $m \times k$ matrix with $x_i'$ as the $i$th row
• $\gamma_i = (\sigma_{ui}^2 + \sigma_{ei}^2)^{-1} \sigma_{ui}^2$
The variance of the prediction error is
$$V\{\hat\theta_i - \theta_i\} = \gamma_i \sigma_{ei}^2 + (1 - \gamma_i)^2\, x_i' V\{\hat\beta\}\, x_i.$$
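
The BLUP of $\theta_i$ and its prediction variance can be sketched numerically. The Python example below is a minimal illustration, assuming a single scalar covariate with no intercept ($k = 1$) and known variances; the helper name `blup_scalar` and all data values are hypothetical and are not taken from the talk.

```python
def blup_scalar(y, x, sigma2_u, sigma2_e):
    """BLUP of theta_i = x_i*beta + u_i for a scalar covariate (hypothetical helper)."""
    # Diagonal of Sigma_aa: total variance sigma2_ui + sigma2_ei for each area
    v = [su + se for su, se in zip(sigma2_u, sigma2_e)]
    # GLS estimator of beta: (sum x^2 / v)^-1 * (sum x*y / v)
    sxx = sum(xi * xi / vi for xi, vi in zip(x, v))
    sxy = sum(xi * yi / vi for xi, yi, vi in zip(x, y, v))
    beta_hat = sxy / sxx
    var_beta = 1.0 / sxx  # V{beta_hat} when the variances are treated as known
    # Shrinkage factor gamma_i = sigma2_ui / (sigma2_ui + sigma2_ei)
    gamma = [su / vi for su, vi in zip(sigma2_u, v)]
    theta_hat = [xi * beta_hat + gi * (yi - xi * beta_hat)
                 for xi, yi, gi in zip(x, y, gamma)]
    # Prediction variance: gamma_i*sigma2_ei + (1 - gamma_i)^2 * x_i^2 * V{beta_hat}
    pred_var = [gi * se + (1.0 - gi) ** 2 * xi * xi * var_beta
                for gi, se, xi in zip(gamma, sigma2_e, x)]
    return beta_hat, gamma, theta_hat, pred_var


# Made-up data for four small areas
y = [2.1, 3.9, 6.2, 7.8]
x = [1.0, 2.0, 3.0, 4.0]
sigma2_u = [0.5, 0.5, 0.5, 0.5]
sigma2_e = [0.5, 1.0, 0.25, 2.0]

beta_hat, gamma, theta_hat, pred_var = blup_scalar(y, x, sigma2_u, sigma2_e)
for i in range(len(y)):
    print(i + 1, round(gamma[i], 3), round(theta_hat[i], 3), round(pred_var[i], 4))
```

Areas with a larger sampling variance get a smaller $\gamma_i$, so their predictor moves further from the direct estimate $y_i$ toward the regression value $x_i\hat\beta$.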

Benchmarking

Survey weights
• each weight is associated with a respondent
• a weight represents a certain number of units in the sampled population
• weights compensate for unequal probabilities of selection ⇒ reduce selection bias
• calibration

Linear small area model: benchmarking restriction
• A benchmarked estimator reproduces the large area weighted estimator when aggregated:
$$\sum_{i=1}^m \omega_i \hat\theta_i = \sum_{i=1}^m \omega_i y_i,$$
where $\omega_i$ are vectors of fixed coefficients.

Linear small area model: benchmarking prediction
• No uniformly best benchmarked predictor
• Introduce an objective function

Minimize
$$Q(\hat\theta^\omega) = \sum_{i=1}^m \phi_i E(\hat\theta_i^\omega - y_i)^2$$
subject to
$$\sum_{i=1}^m \omega_i \hat\theta_i^\omega = \sum_{i=1}^m \omega_i y_i.$$

Choices for $\phi_i^{-1}$:
• $\phi_i^{-1} = (\sigma_{ui}^2 + \sigma_{ei}^2)^{-1} \sigma_{ui}^2 \sigma_{ei}^2$ ↔ BHF
• $\phi_i^{-1} = \omega_i\, \text{cov}\big(\hat\theta_i, \sum_{j=1}^m \omega_j \hat\theta_j\big)$ ↔ PB
• $\phi_i^{-1} = (\sigma_{ui}^2 + \sigma_{ei}^2)$ ↔ ITF

Augmented linear small area model: benchmarked predictor
The linear predictor that minimizes $Q(\hat\theta^\omega)$ is
$$\hat\theta_i^\omega = \hat\theta_i + \phi_i^{-1} \omega_i' \hat\beta_{aug},$$
where
$$\hat\beta_{aug} = \Big( \sum_{j=1}^m \phi_j^{-1} \omega_j \omega_j' \Big)^{-1} \Big[ \sum_{j=1}^m \omega_j (y_j - \hat\theta_j) \Big], \qquad y_j - \hat\theta_j = (1 - \gamma_j)(y_j - x_j'\hat\beta).$$
$\hat\beta_{aug}$ is a GLS coefficient, where
• $\phi_j^{-1} \omega_j$ ≈ explanatory variable
• $\phi_j$ ≈ weight
• $y_j - \hat\theta_j$ ≈ dependent variable

Augmented linear small area model
Simple ratio adjustment predictor:
$$\hat\theta_{ratio,i}^\omega = \hat\theta_i + \hat\theta_i \hat\beta_{aug},$$
where $\phi_i^{-1} = \hat\theta_i \omega_i^{-1}$ and
$$\hat\beta_{aug} = \Big( \sum_{i=1}^m \hat\theta_i \omega_i \Big)^{-1} \sum_{i=1}^m \omega_i (y_i - \hat\theta_i).$$
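
The benchmarked predictor $\hat\theta_i^\omega = \hat\theta_i + \phi_i^{-1}\omega_i\hat\beta_{aug}$ can be checked numerically for a scalar $\omega_i$. The Python sketch below assumes a single benchmarking restriction, made-up values for $y_i$, $\hat\theta_i$, $\omega_i$ and the variances, and a hypothetical helper name `benchmark`; it compares the BHF, ITF and ratio choices of $\phi_i^{-1}$ and is only an illustration of the formulas, not the authors' code.

```python
def benchmark(theta_hat, y, omega, phi_inv):
    """theta_hat_i + phi_i^{-1} * omega_i * beta_aug for scalar omega_i (hypothetical helper)."""
    # beta_aug = (sum phi_j^{-1} omega_j^2)^{-1} * sum omega_j (y_j - theta_hat_j)
    num = sum(w * (yi - th) for w, yi, th in zip(omega, y, theta_hat))
    den = sum(p * w * w for p, w in zip(phi_inv, omega))
    beta_aug = num / den
    return [th + p * w * beta_aug for th, p, w in zip(theta_hat, phi_inv, omega)]


# Made-up direct estimates, unbenchmarked predictors, weights and variances
y = [0.22, 0.71, 0.48, 0.18]
theta_hat = [0.24, 0.68, 0.49, 0.20]
omega = [0.1, 0.2, 0.3, 0.4]
sigma2_u = [0.004, 0.004, 0.004, 0.004]
sigma2_e = [0.010, 0.008, 0.004, 0.002]

phi_inv = {
    "BHF": [su * se / (su + se) for su, se in zip(sigma2_u, sigma2_e)],
    "ITF": [su + se for su, se in zip(sigma2_u, sigma2_e)],
    "ratio": [th / w for th, w in zip(theta_hat, omega)],
}
benchmarked = {name: benchmark(theta_hat, y, omega, p) for name, p in phi_inv.items()}

target = sum(w * yi for w, yi in zip(omega, y))  # weighted direct total to reproduce
for name, values in benchmarked.items():
    total = sum(w * t for w, t in zip(omega, values))
    print(name, [round(t, 4) for t in values], round(total - target, 12))
```

Every choice of $\phi_i^{-1}$ reproduces the weighted direct total; the choices differ only in how the adjustment is spread across areas. With $\phi_i^{-1} = \hat\theta_i\omega_i^{-1}$ the adjustment is the same proportional factor in every area, which is the ratio (raking) predictor.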

Augmented linear small area model: benchmarked predictor
Replace the original model with the augmented model
$$y_i = x_i'\beta + z_i'\beta_{aug} + u_i + e_i.$$
The predictor can now be written:
$$\hat\theta_i^\omega = x_i'\hat\beta + z_i'\hat\beta_{aug} + \gamma_i (y_i - x_i'\hat\beta - z_i'\hat\beta_{aug}) = x_i'\hat\beta + \phi_i^{-1}\omega_i'\hat\beta_{aug} + \gamma_i (y_i - x_i'\hat\beta) = \hat\theta_i + \phi_i^{-1}\omega_i'\hat\beta_{aug}.$$

Augmented linear small area model
The estimated GLS coefficient is
$$\hat\beta_{aug} = \Big( \sum_{i=1}^m z_i \psi_i^{-1} z_i' \Big)^{-1} \sum_{i=1}^m z_i \psi_i^{-1} (y_i - x_i'\hat\beta)$$
• $z_i = \psi_i (1 - \gamma_i)\omega_i$ ≈ new explanatory variable
• $\psi_i = \phi_i^{-1}(1 - \gamma_i)^{-2}$ ≈ new weight
• $y_i - x_i'\hat\beta$ ≈ dependent variable
• $\hat\beta_{aug}$ is not the usual regression coefficient because $z_i$ arises from a constraint that is not part of the original model.

In matrix form,
$$\hat\beta_{aug} = (Z'\Psi^{-1}Z)^{-1} Z'\Psi^{-1}\hat{a} = (W'\Phi^{-1}W)^{-1} W'(I - \Gamma)\hat{a},$$
where
• $\Psi$ is a diagonal matrix with $\psi_i$ as the $i$th diagonal element
• $Z$ is the $m \times r$ matrix with $z_i'$ as the $i$th row
• the $i$th element of $\hat{a}$ is $\hat{a}_i = y_i - x_i'\hat\beta$
• $W$ is the matrix with $\omega_i'$ as the $i$th row
• $\Phi$ is a diagonal matrix with $\phi_i$ as the $i$th diagonal element
• $\Gamma$ is a diagonal matrix with $\gamma_i$ as the $i$th diagonal element.

Augmented linear small area model: prediction error
The prediction error is
$$\hat\theta_i^\omega - \theta_i = \hat\theta_i - \theta_i + \phi_i^{-1}\omega_i'\hat\beta_{aug},$$
where
$$\hat\beta - \beta = (X'\Sigma_{aa}^{-1}X)^{-1} X'\Sigma_{aa}^{-1} a$$
$$\hat\beta_{aug} = (W'\Phi^{-1}W)^{-1} W'(I - \Gamma)\big[ (I - X(X'\Sigma_{aa}^{-1}X)^{-1} X'\Sigma_{aa}^{-1})\, a \big]$$
• $\hat\beta$ and $\hat\beta_{aug}$ are uncorrelated
• $V\{\hat\theta_i^\omega - \theta_i\} = V\{\hat\theta_i - \theta_i\} + \phi_i^{-2}\, \omega_i' V\{\hat\beta_{aug}\}\, \omega_i$
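
The two expressions for $\hat\beta_{aug}$ (the GLS regression on the new covariate $z_i$ and the direct form in terms of $\omega_i$ and $\phi_i$) can be compared numerically. The Python sketch below assumes a scalar $\omega_i$, takes the fitted values $x_i'\hat\beta$ as given, and uses the BHF-type choice of $\phi_i^{-1}$; all numbers and variable names are hypothetical.

```python
# Made-up inputs: direct estimates, fitted values x_i' beta_hat, weights, variances
y = [0.22, 0.71, 0.48, 0.18]
xb = [0.25, 0.66, 0.50, 0.21]
omega = [0.1, 0.2, 0.3, 0.4]
sigma2_u = [0.004, 0.004, 0.004, 0.004]
sigma2_e = [0.010, 0.008, 0.004, 0.002]

gamma = [su / (su + se) for su, se in zip(sigma2_u, sigma2_e)]
phi_inv = [su * se / (su + se) for su, se in zip(sigma2_u, sigma2_e)]  # BHF-type choice

# Covariate form: psi_i = phi_i^{-1} (1 - gamma_i)^{-2}, z_i = psi_i (1 - gamma_i) omega_i
psi = [p / (1.0 - g) ** 2 for p, g in zip(phi_inv, gamma)]
z = [ps * (1.0 - g) * w for ps, g, w in zip(psi, gamma, omega)]
a_hat = [yi - xbi for yi, xbi in zip(y, xb)]
beta_aug_z = (sum(zi * ai / ps for zi, ai, ps in zip(z, a_hat, psi))
              / sum(zi * zi / ps for zi, ps in zip(z, psi)))

# Direct form: regress y_i - theta_hat_i on phi_i^{-1} omega_i with weight phi_i
theta_hat = [xbi + g * (yi - xbi) for xbi, g, yi in zip(xb, gamma, y)]
beta_aug_w = (sum(w * (yi - th) for w, yi, th in zip(omega, y, theta_hat))
              / sum(p * w * w for p, w in zip(phi_inv, omega)))

# Predictor written through the augmented model and through the adjustment term
theta_aug = [xbi + zi * beta_aug_z + g * (yi - xbi - zi * beta_aug_z)
             for xbi, zi, g, yi in zip(xb, z, gamma, y)]
theta_adj = [th + p * w * beta_aug_w for th, p, w in zip(theta_hat, phi_inv, omega)]

print(beta_aug_z, beta_aug_w)
print([round(t, 5) for t in theta_aug])
print([round(t, 5) for t in theta_adj])
```

The agreement follows from $z_i\psi_i^{-1}z_i = \phi_i^{-1}\omega_i^2$ and $(1 - \gamma_i)z_i = \phi_i^{-1}\omega_i$, so the augmented-model predictor also satisfies the benchmarking restriction.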

Part II: Nonlinear small area model
$$y_i = \theta_i + e_i, \qquad \theta_i = g(x_i, \beta) + u_i, \qquad i = 1, 2, \ldots, m$$
• $m$ is the number of small areas
• $\theta_i$ is the small area mean
• $x_i$ are known fixed vectors
• $g(x_i, \beta)$ is continuous in $\beta$ with two continuous derivatives
• $u_i$ is the random small area effect
• $e_i$ is the sampling error
• assume
$$\begin{pmatrix} e_i \\ u_i \end{pmatrix} \sim \text{IND}\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_{ei}^2 & 0 \\ 0 & \sigma_{ui}^2 \end{pmatrix} \right).$$

Nonlinear small area model
The GLS estimator of $\beta$ satisfies the equation
$$\sum_{i=1}^m \frac{\partial g(x_i, \hat\beta)}{\partial \beta} (\sigma_{ui}^2 + \sigma_{ei}^2)^{-1} [y_i - g(x_i, \hat\beta)] = 0.$$
By standard approximation methods,
$$\hat\beta - \beta = (H'\Sigma_{aa}^{-1}H)^{-1} H'\Sigma_{aa}^{-1} a + o_p(m^{-0.5}) := Ma + o_p(m^{-0.5}),$$
where
• $H$ is the matrix with $h_i' = \partial g(x_i, \beta)/\partial \beta'$ as the $i$th row
• $\Sigma_{aa}$ is a diagonal matrix with $\sigma_{ui}^2 + \sigma_{ei}^2$ as the $i$th diagonal element
• $a$ is a vector with $y_i - g(x_i, \beta)$ as the $i$th element

Augmented nonlinear small area model: benchmarking
The original model can be replaced with the augmented model
$$y_i = g((x_i', z_i'), (\beta', \beta_{aug}')) + u_i + e_i.$$
The benchmarking restriction is
$$\sum_{i=1}^m \omega_i \hat\theta_i^\omega = \sum_{i=1}^m \omega_i y_i \iff \sum_{i=1}^m \omega_i (1 - \gamma_i)[y_i - g((x_i', z_i'), (\hat\beta', \hat\beta_{aug}'))] = 0.$$
The benchmarked predictor is
$$\hat\theta_i^\omega = g((x_i', z_i'), (\hat\beta', \hat\beta_{aug}')) + \gamma_i [y_i - g((x_i', z_i'), (\hat\beta', \hat\beta_{aug}'))].$$

The estimated coefficient $\hat\beta_{aug}$ satisfies
$$\sum_{i=1}^m h_{aug,i}\, \psi_i^{-1} [y_i - g((x_i', z_i'), (\hat\beta', \hat\beta_{aug}'))] = 0,$$
where
$$h_{aug,i}' = \frac{\partial g((x_i', z_i'), (\beta', \beta_{aug}'))}{\partial \beta_{aug}'},$$
and $z_i$ is chosen such that
$$\frac{\partial g((x_i', z_i'), (\hat\beta', \hat\beta_{aug}'))}{\partial \beta_{aug}} = \omega_i (1 - \gamma_i)\psi_i.$$
With this choice of $z_i$, the benchmarking restriction
$$\sum_{i=1}^m \omega_i (1 - \gamma_i)[y_i - g((x_i', z_i'), (\hat\beta', \hat\beta_{aug}'))] = 0$$
is satisfied.
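
The GLS estimating equation for the nonlinear model (without the augmenting covariate) can be solved iteratively. The Python sketch below uses a Gauss-Newton iteration with step halving, which is one common choice and not necessarily the method used by the authors. It assumes a logistic mean function $g(x, \beta) = \{1 + \exp(-\beta_0 - \beta_1 x)\}^{-1}$, known total variances $\sigma_{ui}^2 + \sigma_{ei}^2$, and synthetic data; the function names are hypothetical.

```python
import math


def g(x, beta):
    """Logistic mean function (assumed here for illustration)."""
    return 1.0 / (1.0 + math.exp(-(beta[0] + beta[1] * x)))


def score_and_info(beta, x, y, v):
    """Return H' Sigma^-1 (y - g) and the entries of the symmetric matrix H' Sigma^-1 H."""
    s0 = s1 = a00 = a01 = a11 = 0.0
    for xi, yi, vi in zip(x, y, v):
        gi = g(xi, beta)
        d = gi * (1.0 - gi)  # dg/dbeta0; dg/dbeta1 is d * xi
        r = yi - gi
        s0 += d * r / vi
        s1 += d * xi * r / vi
        a00 += d * d / vi
        a01 += d * d * xi / vi
        a11 += d * d * xi * xi / vi
    return (s0, s1), (a00, a01, a11)


def q(beta, x, y, v):
    """Weighted residual sum of squares minimized by the GLS estimator."""
    return sum((yi - g(xi, beta)) ** 2 / vi for xi, yi, vi in zip(x, y, v))


def fit(x, y, v, beta=(0.0, 0.0), max_iter=100, tol=1e-10):
    for _ in range(max_iter):
        (s0, s1), (a00, a01, a11) = score_and_info(beta, x, y, v)
        if max(abs(s0), abs(s1)) < tol:
            break
        det = a00 * a11 - a01 * a01
        d0 = (a11 * s0 - a01 * s1) / det  # Gauss-Newton step: (H'S^-1 H)^-1 H'S^-1 r
        d1 = (a00 * s1 - a01 * s0) / det
        t, q_old = 1.0, q(beta, x, y, v)
        # Halve the step while it fails to reduce the objective
        while q((beta[0] + t * d0, beta[1] + t * d1), x, y, v) > q_old and t > 1e-8:
            t /= 2.0
        beta = (beta[0] + t * d0, beta[1] + t * d1)
    return beta


# Synthetic data: logistic curve with small deterministic perturbations
x = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
noise = [0.02, -0.01, 0.015, -0.02, 0.01, -0.015, 0.02, -0.01]
y = [g(xi, (-0.3, 0.8)) + ni for xi, ni in zip(x, noise)]
v = [0.01, 0.02, 0.01, 0.02, 0.01, 0.02, 0.01, 0.02]  # sigma2_ui + sigma2_ei

beta_hat = fit(x, y, v)
score, _ = score_and_info(beta_hat, x, y, v)
print(beta_hat, score)
```

At convergence both components of the score are numerically zero, which is the estimating equation for $\hat\beta$ with $\partial g/\partial\beta$ evaluated at $\hat\beta$.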

Augmented nonlinear small area model: prediction error
The prediction error is
$$\hat\theta_i^\omega - \theta_i = \hat\theta_i - \theta_i + h_{aug,i}'\hat\beta_{aug} + o_p(m^{-0.5}),$$
where
$$\hat\beta - \beta = Ma + o_p(m^{-0.5})$$
$$\hat\beta_{aug} = (H_{aug}'\Psi^{-1}H_{aug})^{-1} H_{aug}'\Psi^{-1}(I - HM)a + o_p(m^{-0.5}),$$
• the $i$th row of $H_{aug}$ is $h_{aug,i}' = \omega_i'(1 - \gamma_i)\psi_i$
• $\hat\beta_{aug}$ and $\hat\beta$ are approximately uncorrelated
• $V\{\hat\theta_i^\omega - \theta_i\} = V\{\hat\theta_i - \theta_i\} + h_{aug,i}' V\{\hat\beta_{aug}\} h_{aug,i} + o(m^{-1})$

Part III: Simulation Model
Model (developed for the Canadian Labour Force Survey)
$$\hat{p}_i = \theta_i + e_i, \qquad \theta_i = g(x_i, \beta) + u_i$$
$$g(x_i, \beta) = \frac{\exp(\beta_0 + \beta_1 x_i)}{1 + \exp(\beta_0 + \beta_1 x_i)}$$
$$x_i = \log[p_{C,i}(1 - p_{C,i})^{-1}], \qquad p_{C,i} = \text{Census proportion}$$
• $n_i$ are the sample sizes of the small areas
• $\sigma_{ui}^2 = \alpha\, g(x_i, \beta)(1 - g(x_i, \beta))$
• $\sigma_{ei}^2$ is the variance of a 2-stage cluster sample of size $n_i$
• assume
$$\begin{pmatrix} e_i \\ u_i \end{pmatrix} \sim \text{IND}\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} n_i^{-1}\sigma_{ei}^2 & 0 \\ 0 & \sigma_{ui}^2 \end{pmatrix} \right).$$

Predictors for the Simulation
$\hat\beta$ and $\hat\sigma_{ui}^2$ are estimated iteratively
• $\hat\beta$ minimizes
$$Q(\beta) = \sum_{i=1}^m (\hat{p}_i - g(x_i, \beta))^2 (n_i^{-1}\hat\sigma_{ei}^2 + \hat\sigma_{ui}^2)^{-1}$$
• Estimated GLS for $\hat\sigma_{ui}^2$
The predictor of $\theta_i$ is
$$\hat\theta_i = g(x_i, \hat\beta) + \hat\gamma_i [\hat{p}_i - g(x_i, \hat\beta)],$$
where
$$\hat\gamma_i = (n_i^{-1}\hat\sigma_{ei}^2 + \hat\sigma_{ui}^2)^{-1} \hat\sigma_{ui}^2.$$

Benchmarked Predictors for the Simulation
Restrictions: preserve direct estimators
$$\sum_{i=1}^{10} \hat\theta_i^\omega \omega_i = \sum_{i=1}^{10} \hat{p}_i \omega_i$$
Benchmarking methods
• Ratio adjustment (raking): $\psi_i = (1 - \hat\gamma_i)^{-2} \omega_i^{-1} \hat\theta_i$
• Augmented model (BHF): $\psi_i = (1 - \hat\gamma_i)^{-2} [n_i^{-1}\hat\sigma_{ei}^2 + \hat\sigma_{ui}^2]^{-1} n_i^{-1}\hat\sigma_{ei}^2 \hat\sigma_{ui}^2$
• Augmented model ($\omega^{-1}$): $\psi_i = (1 - \hat\gamma_i)^{-2} \omega_i^{-1}$
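
For a single area, the predictor and the three $\psi_i$ choices of the simulation can be sketched as follows. This Python example assumes that $\beta$, $\alpha$ and $\sigma_{ei}^2$ are already available (in the simulation they are estimated), and every numerical value and function name is hypothetical.

```python
import math


def logistic_mean(x, beta0, beta1):
    """g(x, beta) = exp(beta0 + beta1*x) / (1 + exp(beta0 + beta1*x))."""
    e = math.exp(beta0 + beta1 * x)
    return e / (1.0 + e)


def area_predictor(p_hat, p_census, n, sigma2_e, alpha, beta0, beta1):
    """Return g, gamma_i and theta_hat_i for one small area (hypothetical helper)."""
    x = math.log(p_census / (1.0 - p_census))  # logit of the census proportion
    g = logistic_mean(x, beta0, beta1)
    sigma2_u = alpha * g * (1.0 - g)           # area-effect variance alpha*g*(1-g)
    gamma = sigma2_u / (sigma2_e / n + sigma2_u)
    theta_hat = g + gamma * (p_hat - g)
    return g, sigma2_u, gamma, theta_hat


def psi_choices(gamma, theta_hat, omega, n, sigma2_e, sigma2_u):
    """The three psi_i definitions used by the benchmarking methods."""
    c = (1.0 - gamma) ** -2
    return {
        "raking": c * theta_hat / omega,
        "aug_bhf": c * (sigma2_e / n) * sigma2_u / (sigma2_e / n + sigma2_u),
        "aug_inv_omega": c / omega,
    }


# Made-up inputs for one area with n_i = 16
p_hat, p_census, n, omega = 0.25, 0.20, 16, 0.01
sigma2_e, alpha, beta0, beta1 = 0.3, 0.05, 0.0, 1.0

g_i, sigma2_u, gamma_i, theta_i = area_predictor(p_hat, p_census, n, sigma2_e, alpha, beta0, beta1)
psi = psi_choices(gamma_i, theta_i, omega, n, sigma2_e, sigma2_u)
print(round(g_i, 4), round(gamma_i, 4), round(theta_i, 4))
print({k: round(val, 4) for k, val in psi.items()})
```

With $\beta_0 = 0$ and $\beta_1 = 1$ the mean function returns the census proportion itself, so the predictor shrinks the direct estimate $\hat{p}_i$ toward $p_{C,i}$ by the factor $\hat\gamma_i$.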

Simulation Parameters

Area   $g_i$   Sample Size      Area   $g_i$   Sample Size
1      0.20    16               6      0.75    60
2      0.75    16               7      0.50    60
3      0.50    16               8      0.20    204
4      0.20    30               9      0.75    204
5      0.75    30               10     0.50    204

• Two simulation sets
• Set 1: $\omega_i$ increases as $n_i$ increases
• Set 2: $\omega_i$ decreases as $n_i$ increases

Simulation Results: MC MSE
• Simulation set 1: $\omega_i$ increases as $n_i$ increases

                                                 Added MSE
$n_i$   $g(x_i, \beta)$   $\omega_i$   MSE($\hat\theta_i$)   Raking   Aug BHF   Aug $\omega^{-1}$
16      0.20              0.01         11                    0.0      0.0       0.0
16      0.50              0.01         21                    0.2      0.0       0.1
204     0.20              0.25         5                     0.2      0.2       0.3
204     0.50              0.25         12                    0.2      0.5       0.1

• Simulation set 2: $\omega_i$ decreases as $n_i$ increases

                                                 Added MSE
$n_i$   $g(x_i, \beta)$   $\omega_i$   MSE($\hat\theta_i$)   Raking   Aug BHF   Aug $\omega^{-1}$
16      0.20              0.25         11                    20       26        28
16      0.50              0.25         21                    46       68        30
204     0.20              0.04         5                     19       0         28
204     0.50              0.01         12                    45       0         28

MC Coverages of Nominal 95% Prediction Intervals
• Nominal 95% prediction interval: $\hat\theta_i^\omega \pm t_{n_i} \widehat{MSE}_{i,aug}^{0.5}$
• Average coverages across areas with the same sample size

Empirical Coverages (BHF)
        $n_i = 16$   $n_i = 30$   $n_i = 60$   $n_i = 204$
Set 1   0.94         0.93         0.94         0.95
Set 2   0.96         0.93         0.93         0.95

• Set 1: $\omega_i$ increases as $n_i$ increases
• Set 2: $\omega_i$ decreases as $n_i$ increases

Summary
• Augmented approach for nonlinear models
• Benchmarking effect on MSE
  - small increase when $\omega_i$ is inversely related to variance
  - large increase when $\omega_i$ is positively related to variance
• Alternative weights in the objective function
  - Aug BHF: relatively small average amount added to MSE
  - Aug $\omega^{-1}$: nearly constant increase in MSE

End
Thank you!

References:
Battese, G.E., Harter, R.M. and Fuller, W.A. (1988), “An error components model for prediction of county crop areas using survey and satellite data,” Journal of the American Statistical Association 83, 28-36.
Jiang, J. and Lahiri, P. (2006), “Mixed model prediction and small area estimation,” Test 15, 1-96.
Thomas, D.R. and Rao, J.N.K. (1987), “Small-sample comparisons of level and power for simple goodness-of-fit statistics under cluster sampling,” Journal of the American Statistical Association 82, 630-636.
Wang, J., Fuller, W.A. and Qu, Y. (2008), “Small area estimation under a restriction,” Survey Methodology 34, 29-36.
You, Y. and Rao, J.N.K. (2002), “A pseudo-empirical best linear unbiased prediction approach to small area estimation using survey weights,” Canadian Journal of Statistics 30, 431-439.