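Bootstrap Confidence Intervals for Small Area Means
Andreea L. Erciulescu and Wayne A. Fuller
Iowa State University, Department of Statistics, Ames, IA 50011

Introduction
Most small area studies focus on constructing predictors for the area means and on estimating the variance of the prediction errors. However, agencies and policy makers are often interested in confidence intervals for the small area means. We present two-sided confidence intervals for the small area means of a binary response variable, using unit level data and stochastic covariates. The estimation of the prediction error variance and the estimation of the cutoff points are key components in the construction of these intervals. We consider a linear approximation of the model and present a Taylor approximation for the prediction error variance. A simulation study compares different bootstrap methods for estimating the cutoff points.

Unit Level Model and Small Area Mean Prediction
$$y_{ij} \mid (x_{ij}, b_i) \sim \text{Bernoulli}\big(g(x_{ij}, \beta, b_i)\big), \qquad g(x_{ij}, \beta, b_i) = \frac{\exp\{(1, x_{ij})'\beta + b_i\}}{1 + \exp\{(1, x_{ij})'\beta + b_i\}}$$
$$x_{ij} = \mu_x + \delta_i + \epsilon_{ij} =: \mu_{xi} + \epsilon_{ij}, \qquad \tilde\mu_{xi} = \mu_{xi} + u_i$$
• $i = 1, \ldots, m$, $j = 1, \ldots, n_i$
• $(b_i, \delta_i, \epsilon_{ij}, u_i)$ mutually independent
• $b_i \sim \text{ind } f_b(0, \sigma_b^2)$
• $\delta_i \sim \text{ind } f_\delta(0, \sigma_\delta^2)$
• $\epsilon_{ij} \sim \text{ind } f_\epsilon(0, \sigma_\epsilon^2)$
• $u_i \sim \text{ind } f_{ui}(0, k_i^{-1}\sigma_\epsilon^2)$
• $(y_{ij}, x_{ij})$ observed in pairs
• $\tilde\mu = (\tilde\mu_{x1}, \ldots, \tilde\mu_{xm})$ observed

True small area mean of y
$$\theta_i = \int g(x, \beta, b_i)\, dF_{xi}(x)$$

Small area mean prediction
$$\hat\theta_i = \frac{\int_b \int_\delta \int_\epsilon g(\mu_x + \delta + \epsilon, \beta, b)\, dF(\epsilon) \left(\prod_{t=1}^{n_i} f(y_{it} \mid x_{it}, b)\, f(x_{it} \mid \delta)\right) f(\tilde\mu_{xi} \mid \delta)\, dF_\delta(\delta)\, dF_b(b)}{\int_b \int_\delta \left(\prod_{t=1}^{n_i} f(y_{it} \mid x_{it}, b)\, f(x_{it} \mid \delta)\right) f(\tilde\mu_{xi} \mid \delta)\, dF_\delta(\delta)\, dF_b(b)}$$

Prediction Mean Squared Error Estimators

Taylor approximation
$$g(x_{ij}, \beta, b_i) \approx g(\hat\mu_{xi} + \epsilon_{ij}, \hat\beta, 0) + \bar h_{\mu_{xi}} (\mu_{xi} - \hat\mu_{xi}) + \bar h_\beta' (\beta - \hat\beta) + \bar h_{b_i} b_i$$
$$\bar h_\psi = \int \frac{\partial g(\mu_{xi} + \epsilon, \beta, b_i)}{\partial \psi}\, dF_\epsilon(\epsilon), \qquad \psi \in \{\mu_{xi}, \beta, b_i\}$$

Taylor estimator of prediction MSE
$$\alpha_i^{Taylor} := \widehat{MSE}(\hat\theta_i - \theta_i) = \hat g_{1i}(\hat\sigma_b^2, \tilde\sigma_{ei}^2) + \hat g_{2i}(\hat\sigma_b^2, \tilde\sigma_{ei}^2) + 2\hat g_{3i}(\hat\sigma_b^2, \tilde\sigma_{ei}^2)$$
• $\hat g_{1i}(\hat\sigma_b^2, \tilde\sigma_{ei}^2) = \tilde\gamma_i\, n_i^{-1} \tilde\sigma_{ei}^2$
• $\hat g_{2i}(\hat\sigma_b^2, \tilde\sigma_{ei}^2) = (1 - \tilde\gamma_i)^2 \left[\bar h_\beta' \hat V(\hat\beta) \bar h_\beta + \bar h_{\mu_{xi}}^2 \hat V(\hat\mu_{xi})\right]$
• $\hat g_{3i}(\hat\sigma_b^2, \tilde\sigma_{ei}^2) = \bar u_{yi}^2\, \hat V(\tilde\gamma_i)$
• $\tilde\gamma_i = \left(\bar h_{b_i}^2 \hat\sigma_b^2 + n_i^{-1}\tilde\sigma_{ei}^2\right)^{-1} \bar h_{b_i}^2 \hat\sigma_b^2$
• $u_{y,ij} = y_{ij} - g(x_{ij}, \hat\beta, 0)$, $\bar u_{yi} = n_i^{-1} \sum_{j=1}^{n_i} u_{y,ij}$

Fast Double Bootstrap (FDB) Algorithm
• Vector of parameters $\psi = (\beta, \sigma_b^2, \mu_x, \sigma_\delta^2, \sigma_\epsilon^2)$
• Data generator $DG(\psi, r)$, random number seed $r$
• $\alpha_i^* = (\hat\theta_i^* - \theta_i^*)^2$, $\alpha_i^{**} = (\hat\theta_i^{**} - \theta_i^{**})^2$
• $\alpha_{i,k}^{*,Taylor} = \widehat{MSE}(\hat\theta_i^* - \theta_i^*)$, $\alpha_{i,k}^{**,Taylor} = \widehat{MSE}(\hat\theta_i^{**} - \theta_i^{**})$

LEVEL ONE: $DG(\hat\psi, r_{1,k}) \to (\psi_k^*, \theta_{i,k}^*, \alpha_{i,k}^*)$, $k = 1, \ldots, B_1$
LEVEL TWO, TELESCOPING: $DG(\psi_k^*, r_{1,k+1}) \to (\theta_{i,k}^{**}, \alpha_{i,k}^{**})$, $k = 1, \ldots, B_1$
The level-two sample for replicate $k$ uses seed $r_{1,k+1}$, the seed of level-one replicate $k+1$.

$$\hat\alpha_i^* = B_1^{-1} \sum_{k=1}^{B_1} \alpha_{i,k}^*, \qquad \hat\alpha_i^{**} = B_1^{-1} \sum_{k=1}^{B_1} \left(\alpha_{i,k}^* + \alpha_{i,k+1}^* - \alpha_{i,k}^{**}\right)$$

Confidence Intervals (CIs)

Pivot-type statistics
$$\hat T_i := \frac{\hat\theta_i - \theta_i}{\sqrt{\alpha_i^{Taylor}}}, \qquad T_{i,k}^* = \frac{\hat\theta_{i,k}^* - \theta_{i,k}^*}{\sqrt{\alpha_{i,k}^{*,Taylor}}}, \qquad T_{i,k}^{**} = \frac{\hat\theta_{i,k}^{**} - \theta_{i,k}^{**}}{\sqrt{\alpha_{i,k}^{**,Taylor}}}, \qquad k = 1, \ldots, B_1$$

Wald-type $100(1-\alpha)\%$ CI
$$I_z = \hat\theta_i \pm \hat\zeta_{i,\alpha,z}\, \widehat{se}_i$$
• $\hat\zeta_{i,\alpha,z} = \Phi^{-1}(1 - \alpha/2)$
• $\widehat{se}_i \in \left\{ (\alpha_i^{Taylor})^{1/2}, (\hat\alpha_i^*)^{1/2}, (\hat\alpha_i^{**})^{1/2} \right\}$

Symmetric two-sided 95% bootstrap CI
• $q_{i,B} := |T^*|_{i,([(1-\alpha)B_1]+1)}$, where $|T^*|_{i,(l)}$ is the $l$-th order statistic of $|T_{i,1}^*|, \ldots, |T_{i,B_1}^*|$
• FDB: $q_{i,DB} := |T^*|_{i,([(1-\alpha_B)B_1]+1)}$, with $\alpha_B = 1 - B_1^{-1} \sum_{k=1}^{B_1} I(|T_{i,k}^{**}| < q_{i,B})$

General $100(1-\alpha)\%$ CI
$$I = \hat\theta_i \pm q_{t, \widehat{df}_i, 1-\alpha/2}\, \widehat{se}_i$$
• $\widehat{se}_i = \hat\tau_i\, (\alpha_i^{Taylor})^{1/2}$
• $(\hat\tau_i, \widehat{df}_i) = \arg\min Q_i(q_i, \tau_i, df_i)$
• $Q_i(q_i, \tau_i, df_i) := (q_i - \tau_i\, qt_{df_i})\, V_{q_i}^{-1} (q_i - \tau_i\, qt_{df_i})'$
• $q_i \in \{q_{i,B}, q_{i,DB}\}$, $qt_{df_i} := F_{t_{df_i}}^{-1}(1 - \alpha/2)$
• $V_{q_i}$ based on the Bahadur representation

Simulation
• $m = 36$, $n_i \in \{2, 10, 40\}$
• $\beta = (-0.8, 1)'$, $\mu_x = 0$, $k_i = 10$
• $\sigma_b^2 = 0.25$, $\sigma_\delta^2 = 0.16$, $\sigma_\epsilon^2 = 0.36$
• Normal distributions $f_b, f_\delta, f_\epsilon, f_{ui}$
• 400 Monte Carlo samples, $B_1 = 400$, $B_2 = 1$

Simulation Results

Empirical coverages (%) for 95% Wald-type CIs
  n_i    I(α_i^Taylor)    I(α̂_i*)    I(α̂_i**)
   2         90.6           89.7        88.3
  10         92.2           90.3        89.1
  40         94.1           91.4        89.7

Empirical coverages (%) for general bootstrap CIs
                 Level 1                 Level 2
  n_i     90%     95%     99%      90%     95%     99%
   2     90.0    94.8    97.8     90.4    94.5    97.4
  10     89.6    94.4    97.9     90.5    94.5    97.5
  40     89.0    94.1    98.4     89.9    94.5    98.2

Summary
• Wald-type CIs undercover, most severely for small $n_i$
• General bootstrap CIs have coverage close to nominal at the 90% and 95% levels
• FDB does not improve the coverage accuracy

Acknowledgement
I would like to acknowledge the U.S. National Science Foundation for a travel award to attend this meeting.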
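The simulation relies on a data generator $DG(\psi, r)$. A minimal sketch of such a generator follows. It assumes the normal distributions of the simulation design and the parameter layout $\psi = (\beta, \sigma_b^2, \mu_x, \sigma_\delta^2, \sigma_\epsilon^2)$. The names `generate_data` and `logistic_mean` are hypothetical. The true $\theta_i$ is computed by Gauss-Hermite quadrature over $\epsilon$, which is our choice and not necessarily the authors'.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def logistic_mean(x, beta, b):
    """g(x, beta, b) = exp(beta_0 + beta_1 x + b) / (1 + exp(beta_0 + beta_1 x + b))."""
    eta = beta[0] + beta[1] * x + b
    return 1.0 / (1.0 + np.exp(-eta))


def generate_data(psi, n, k, seed, n_nodes=40):
    """Hypothetical DG(psi, r): one sample from the unit level model with normal errors."""
    beta, sigma2_b, mu_x, sigma2_delta, sigma2_eps = psi
    rng = np.random.default_rng(seed)
    n = np.asarray(n)
    m = n.size
    b = rng.normal(0.0, np.sqrt(sigma2_b), m)                  # area effects b_i
    mu_xi = mu_x + rng.normal(0.0, np.sqrt(sigma2_delta), m)   # mu_xi = mu_x + delta_i
    # Area-level measurement mu_tilde_i = mu_xi + u_i with Var(u_i) = sigma2_eps / k_i
    u = rng.normal(0.0, np.sqrt(sigma2_eps / np.asarray(k, dtype=float)), m)
    mu_tilde = mu_xi + u
    area = np.repeat(np.arange(m), n)                          # area index of each unit
    x = mu_xi[area] + rng.normal(0.0, np.sqrt(sigma2_eps), area.size)
    y = rng.binomial(1, logistic_mean(x, beta, b[area]))      # Bernoulli responses
    # True theta_i = E[g(x, beta, b_i)] with x ~ N(mu_xi, sigma2_eps), via Gauss-Hermite
    z, w = hermegauss(n_nodes)
    w = w / np.sqrt(2.0 * np.pi)                               # weights now sum to 1
    xs = mu_xi[:, None] + np.sqrt(sigma2_eps) * z[None, :]
    theta = logistic_mean(xs, beta, b[:, None]) @ w
    return area, x, y, mu_tilde, theta
```

For the simulation design with $n_i = 10$, one call would be `generate_data(((-0.8, 1.0), 0.25, 0.0, 0.16, 0.36), n=np.full(36, 10), k=np.full(36, 10), seed=r)`.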
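The predictor $\hat\theta_i$ is a ratio of integrals over $b$ and $\delta$. When $f_b, f_\delta, f_\epsilon, f_{ui}$ are normal, as in the simulation, both integrals can be approximated on a Gauss-Hermite grid. The sketch below is a hypothetical `predict_theta` that takes $\psi$ as given, which in practice would be an estimate $\hat\psi$. The normal-density constants are dropped because they cancel in the ratio. The quadrature approach is an assumption, not necessarily the authors' method.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def predict_theta(y_i, x_i, mu_tilde_i, k_i, psi, n_nodes=30):
    """Hypothetical quadrature version of theta_hat_i for one area (normal f's assumed)."""
    beta, s2_b, mu_x, s2_d, s2_e = psi
    y_i, x_i = np.asarray(y_i, dtype=float), np.asarray(x_i, dtype=float)
    z, w = hermegauss(n_nodes)
    w = w / np.sqrt(2.0 * np.pi)          # weights for E[h(Z)], Z ~ N(0, 1)
    b = np.sqrt(s2_b) * z                 # nodes for b ~ N(0, s2_b)
    mu = mu_x + np.sqrt(s2_d) * z         # mu_x + delta at nodes for delta ~ N(0, s2_d)

    def g(x, bb):
        return 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * x + bb)))

    # sum_t log f(y_it | x_it, b): Bernoulli log-likelihood at each b node, shape (n_b,)
    p = g(x_i[None, :], b[:, None])
    ll_y = (y_i * np.log(p) + (1.0 - y_i) * np.log1p(-p)).sum(axis=1)
    # sum_t log f(x_it | delta) + log f(mu_tilde_i | delta), up to constants, shape (n_d,)
    ll_x = -0.5 * ((x_i[None, :] - mu[:, None]) ** 2).sum(axis=1) / s2_e
    ll_u = -0.5 * k_i * (mu_tilde_i - mu) ** 2 / s2_e
    logpost = ll_y[:, None] + (ll_x + ll_u)[None, :]
    post = np.exp(logpost - logpost.max()) * w[:, None] * w[None, :]
    # Inner integral over eps: E[g(mu_x + delta + eps, beta, b)], shape (n_b, n_d)
    xs = mu[None, :, None] + np.sqrt(s2_e) * z[None, None, :]
    inner = g(xs, b[:, None, None]) @ w
    return float((inner * post).sum() / post.sum())
```

Working on the log scale and subtracting the maximum before exponentiating keeps the product of $n_i$ Bernoulli likelihoods from underflowing when $n_i = 40$.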
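The Taylor estimator $\alpha_i^{Taylor}$ combines $\hat g_{1i}$, $\hat g_{2i}$ and $\hat g_{3i}$. For the logistic $g$, $\partial g / \partial \eta = g(1-g)$ with $\eta = (1, x)'\beta + b$. This gives $\bar h_{b_i}$, $\bar h_{\mu_{xi}} = \beta_1 \bar h_{b_i}$ and $\bar h_\beta = \int (1, x)' g(1-g)\, dF_\epsilon$. The sketch below makes several assumptions. The derivatives are evaluated at $b = 0$ with normal $\epsilon$. The inputs $\tilde\sigma_{ei}^2$, $\hat V(\hat\beta)$, $\hat V(\hat\mu_{xi})$ and $\hat V(\tilde\gamma_i)$ are supplied by the caller, because the poster does not say how they are computed. The names `h_bar` and `taylor_mse` are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def h_bar(mu_xi, beta, sigma2_eps, b=0.0, n_nodes=40):
    """Averages over eps ~ N(0, sigma2_eps) of dg/d(mu_xi), dg/d(beta), dg/d(b)."""
    z, w = hermegauss(n_nodes)
    w = w / np.sqrt(2.0 * np.pi)
    x = mu_xi + np.sqrt(sigma2_eps) * z
    g = 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * x + b)))
    dg = g * (1.0 - g)                          # derivative of the logistic in eta
    h_b = dg @ w
    h_mu = beta[1] * h_b                        # chain rule: d(eta)/d(mu_xi) = beta_1
    h_beta = np.array([h_b, (x * dg) @ w])      # d(eta)/d(beta) = (1, x)'
    return h_mu, h_beta, h_b


def taylor_mse(n_i, s2_e, sigma2_b, h_mu, h_beta, h_b, V_beta, V_mu, ubar_y, V_gamma):
    """alpha_i^Taylor = g1 + g2 + 2 g3, following the poster's formulas; also returns gamma."""
    gamma = h_b**2 * sigma2_b / (h_b**2 * sigma2_b + s2_e / n_i)
    g1 = gamma * s2_e / n_i
    g2 = (1.0 - gamma) ** 2 * (h_beta @ V_beta @ h_beta + h_mu**2 * V_mu)
    g3 = ubar_y**2 * V_gamma
    return g1 + g2 + 2.0 * g3, gamma
```

The structure mirrors the familiar linear-model decomposition. $\hat g_{1i}$ is the leading term and shrinks as $n_i$ grows. $\hat g_{2i}$ accounts for estimating $\beta$ and $\mu_{xi}$, and $\hat g_{3i}$ for estimating $\tilde\gamma_i$.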
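The cutoffs $q_{i,B}$ and $q_{i,DB}$, the telescoped $\hat\alpha_i^{**}$ and the Wald interval are simple functions of the bootstrap replicates. The sketch below follows the formulas as written on the poster. It assumes level one was run for $B_1 + 1$ seeds, so that $\alpha_{i,B_1+1}^*$ exists for the last telescoped term. It uses the standard-library `statistics.NormalDist` for $\Phi^{-1}$. All function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def order_stat_cutoff(abs_t, level):
    """|T|_([level * B] + 1): the ([level*B]+1)-th smallest of the B values."""
    s = np.sort(np.asarray(abs_t, dtype=float))
    idx = int(np.floor(level * s.size + 1e-9))  # 0-based position; 1e-9 guards float error
    return s[min(idx, s.size - 1)]              # clamp when level * B == B


def bootstrap_cutoffs(t_star, t_2star, alpha=0.05):
    """Level-one cutoff q_B and FDB cutoff q_DB from |T*| and |T**|."""
    abs1, abs2 = np.abs(t_star), np.abs(t_2star)
    q_b = order_stat_cutoff(abs1, 1.0 - alpha)
    alpha_b = 1.0 - np.mean(abs2 < q_b)         # alpha_B = 1 - B1^{-1} sum I(|T**| < q_B)
    q_db = order_stat_cutoff(abs1, 1.0 - alpha_b)
    return q_b, q_db, alpha_b


def bootstrap_mse(alpha_star, alpha_2star):
    """alpha_hat* and telescoped alpha_hat**; alpha_star holds B1 + 1 level-one values."""
    a1 = np.asarray(alpha_star, dtype=float)
    a2 = np.asarray(alpha_2star, dtype=float)
    return a1[:-1].mean(), np.mean(a1[:-1] + a1[1:] - a2)


def interval(theta_hat, se, q):
    return theta_hat - q * se, theta_hat + q * se


def wald_ci(theta_hat, se, alpha=0.05):
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # zeta = Phi^{-1}(1 - alpha/2)
    return interval(theta_hat, se, z)
```

Under this sketch, the symmetric bootstrap interval $\hat\theta_i \pm q_{i,B} (\alpha_i^{Taylor})^{1/2}$ is `interval(theta_hat, np.sqrt(alpha_taylor), q_b)`, obtained by inverting $|\hat T_i| \le q_{i,B}$.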
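For the general CI, $(\hat\tau_i, \widehat{df}_i)$ minimize a quadratic form that matches the bootstrap cutoffs to scaled $t$ quantiles. The sketch below makes several assumptions. $q_i$ is taken to be a vector of cutoffs at several nominal levels, such as the 90%, 95% and 99% levels in the results table. $V_{q_i}$ is passed in, since the poster bases it on the Bahadur representation without giving details. Minimizing over $\log\tau$ and $\log df$ with SciPy's Nelder-Mead is our choice. The names `fit_scaled_t` and `general_ci` are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import t as student_t


def fit_scaled_t(q, alphas, V_q):
    """Minimize Q = (q - tau*qt_df) V_q^{-1} (q - tau*qt_df)' over tau > 0, df > 0."""
    q = np.asarray(q, dtype=float)
    probs = 1.0 - np.asarray(alphas, dtype=float) / 2.0   # qt_df = F_t^{-1}(1 - alpha/2)
    V_inv = np.linalg.inv(np.asarray(V_q, dtype=float))

    def objective(log_par):
        tau, df = np.exp(log_par)                       # log scale keeps tau, df positive
        r = q - tau * student_t.ppf(probs, df)
        return r @ V_inv @ r

    res = minimize(objective, x0=np.log([1.5, 5.0]), method="Nelder-Mead",
                   options={"xatol": 1e-8, "fatol": 1e-12, "maxiter": 10000})
    tau_hat, df_hat = np.exp(res.x)
    return tau_hat, df_hat


def general_ci(theta_hat, alpha_taylor, tau_hat, df_hat, alpha=0.05):
    """theta_hat +/- t_{df_hat, 1 - alpha/2} * tau_hat * sqrt(alpha_taylor)."""
    se = tau_hat * np.sqrt(alpha_taylor)
    q = student_t.ppf(1.0 - alpha / 2.0, df_hat)
    return theta_hat - q * se, theta_hat + q * se
```

Fitting two parameters to several cutoffs at once smooths the noise in any single bootstrap quantile, and $V_{q_i}^{-1}$ downweights the noisier extreme quantiles.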