Spatially Smoothed Kernel Density Estimation via Generalized Empirical Likelihood Kuangyu Wen & Ximing Wu Texas A&M University Info-Metrics Institute Conference: Recent Innovations in Info-Metrics October 2014 Spatial Kernel Density Estimation 1 / 20 Motivation Nonparametric density estimation offers a flexible alternative to parametric method. To obtain the same level of precision as that of its parametric counterpart, a larger sample size is desired in nonparametric estimation due to its slower convergence rate. Given similarity/dependence among distributions across geographic units, efficiency gain is possible via information pooling; particularly so in nonparametric estimations with small sample size for each unit. Example: Crop yield distributions are typically estimated (individually) at the county level. Taking into account their spatial dependence may improve estimation. Spatial Kernel Density Estimation 2 / 20 Nonparametric estimation of spatially dependent densities An information-theoretic nonparametric estimation procedure Weighted Kernel Density Estimator (WKDE) that satisfies moment conditions implied by spatial dependence among units Observation weights are the implied Generalized Empirical Likelihood (GEL) probabilities associated with spatial moment conditions. GEL offers a flexible yet simple framework to incorporate out-of-sample information into density estimation. Computationally inexpensive; joint estimation of a large number of spatially dependent densities is avoided. Spatial Kernel Density Estimation 3 / 20 Talk plan Weighted KDE (WKDE) via the GEL Spatial smoothing of KDE Monte Carlo Simulations Empirical example Spatial Kernel Density Estimation 4 / 20 Kernel Density Estimation (KDE) Data: X1 , . . . , XT i.i.d. from a continuously differentiable distribution F with density f KDE: T 1 X fˆ(x) = K Th t=1 x − Xt h ≡ T 1 X Kh (x − Xt ) T t=1 where K is a kernel function and h the bandwidth. Weighted KDE (WKDE) fˆ(x) = T X pt Kh (x − Xt ) t=1 where p = (p1 , . . . , pT ) are non-negative observation weights that sum to one. Spatial Kernel Density Estimation 5 / 20 Incorporation of out-of-sample information Incorporation of out-of-sample information, for instance population moments or theoretical properties, may enhance the KDE. Suppose out-of-sample information comes in the form E[gm (x)] = µm , m = 1, . . . , M, where gm are known real functions. Chen (1997) uses WKDE to incorporate out-of-sample information; the weights is calculated via empirical likelihood (Owen, 1988; 2001) max p1 ,...,pT s.t. T X t=1 T X log pt pt gm (Xt ) = µm , m = 1, . . . , M, t=1 pt ≥ 0, T X pt = 1. t=1 Spatial Kernel Density Estimation 6 / 20 Empirical likelihood weighted KDE: fˆEL (x) = T X p̂t Kh (x − Xt ) t=1 where p̂ = (p̂1 , . . . , p̂T ) are the implied EL probabilities. Under mild regularity conditions abias(fˆEL (x)) = abias(fˆ(x)) + o(n−1 ) avar(fˆEL (x)) = avar(fˆ(x)) − r (x)f 2 (x)/n + o(n−1 ) where r (x) : R → R+ depends on the unknown density f and moment functions g = (g1 , . . . , gM ). An important implication: fˆEL can use the optimal bandwidth for fˆ; joint search of p and bandwidth is not required. Spatial Kernel Density Estimation 7 / 20 Extension to Generalized Empirical Likelihood Cressie-Read power discrepancy CRν = T X 1 (npt )−ν − 1 ν(ν + 1) t=1 Generalized Empirical Likelihood (GEL) T X 1 min (npt )−ν − 1 p1 ,...,pT ν(ν + 1) t=1 s.t. T X pt gm (Xt ) = µm , m = 1, . . . , M, t=1 pt ≥ 0, T X pt = 1. t=1 ν = 0 =⇒ EL ν = 1 =⇒ Exponential Tilting (ET) or maximum entropy Spatial Kernel Density Estimation 8 / 20 Spatial smoothing of KDE Consider N spatially dependent densities fi , i = 1, . . . , N; for each i, we observe Yi = (Yi,1 , . . . , Yi,T ) i.i.d. from fi . Given selected moment functions g = (g1 , . . . , gM ), define their sample analog for each unit ĝ(i) T 1 X g (Yi,t ), i = 1, . . . , N. = T t=1 Spatial weights for unit i: wi = {wi,j }N j=1 ; wi,j ≥ 0 and PN j=1 wi,j = 1. Spatially smoothed sample moments g̃(i) = N X j=1 Spatial Kernel Density Estimation wi,j ! T N X 1 X g (Yj,t ) = wi,j ĝ(j) T t=1 j=1 9 / 20 Estimate fi using the WKDE fˆEL or fˆET subject to spatial moment conditions g̃(i) , i = 1, . . . , N. For implication, we need to specify spatial weights w = {w1 , . . . , wN } moment functions g = (g1 , . . . , gM ) Spatial Kernel Density Estimation 10 / 20 Construction of spatial weights di,j ≥ 0: the spatial distance between unit i and j Nik : an indicator set including unit i and its k nearest neighbors Spatial weights for unit i: 2 /γ )1(j ∈ N k ) exp(−di,j i i , j = 1, . . . , N, wi,j = PN 2 k j=1 exp(−di,j /γi )1(j ∈ Ni ) where 1 is the indicator function and 1 X di,j γi = k k j∈Ni Spatial Kernel Density Estimation 11 / 20 Specification of moment functions Consider for now the Exponential Tilting estimator fˆET . It can be shown to be asymptotically equivalent to a minimum cross entropy density estimator with the KDE fˆ as the reference density: Z f (x) f˜ET = arg min f (x) log dx fˆ(x) f Z s.t. f (x)dx = 1, Z gm (x)f (x)dx = µm , m = 1, . . . , M. The solution takes the form ( ) M X f˜ET (x) = fˆ(x) exp λ̃0 + λ̃m gm (x) = fˆET (x) + op (n−1 ) m=1 where λ̃0 is a normalization constant. Spatial Kernel Density Estimation 12 / 20 Thus fˆET is seen to modify fˆ with a multiplicative adjustment, which consists of a basis expansion in exponent. Therefore the selection of moment functions for fˆET amounts to the selection of basis functions for nonparametric jminimum cross entropy density estimation. Power series, g (x) = (x, x 2 , x 3 , ...), is customarily used, but can be sensitive to outliers and require the existence of population moments up to order M. We use spline basis due to its flexibility and robustness.; e.g., g (x) = (x, x 2 , (x − K1 )2+ , (x − K2 )2+ , . . . , (x − KM )2+ ), where (x)+ = max(0, x) and K1 , . . . , KM are spline knots. Only the first two moments are required to exist. The resultant density can be viewed as a hybrid of KDE and log spline estimator (Kooperberg and Stone, 1992; Stone, et al., 1997) Spatial Kernel Density Estimation 13 / 20 Monte Carlo Simulations N = 100 locations in a unit square I Construct a 20 × 20 grid evenly spaced over I Randomly draw 100 points from the grid without replacement 0.0 0.4 0.8 Calculate the distance matrix D = [di,j ] 0.0 Spatial Kernel Density Estimation 0.2 0.4 0.6 0.8 1.0 14 / 20 Constructions of spatially dependent distributions Assume fi ∼ Skewed Normal (µi , σi , αi ) for i = 1, · · · , N. µ = (µ1 , · · · , µN )> , σ = (σ1 , · · · , σN )> and α = (α1 , · · · , αN )> . A N-dimensional vector θ follows spatial error model SpE (β, λ, σ ) if θ =β+η η = λWη + , where ∼ N 0, σ2 IN and the spatial weighting matrix W is constructed as 1 2 ∗ ∗ W∗ = wi,j , wi,j = 1 if i 6= j and the location i is one of the 5 nearest ∗ neighbors of the location j; otherwise wi,j = 0; PN ∗ ∗ row standardization, i.e. W = [wi,j ], wi,j = wi,j / j=1 wi,j . Spatial Kernel Density Estimation 15 / 20 We set µ ∼ SpE (−1, 0.8, 0.03), log (σ) ∼ SpE (0, 0.8, 0.03) and α ∼ SpE (−0.75, 0.8, 0.03). Each vector of shape parameter is given by θ = β + (IN − λW)−1 . 0.0 0.2 0.4 Randomly draw to obtain a set of values for each µ, σ and α respectively. −5 Spatial Kernel Density Estimation −3 −1 0 1 2 16 / 20 Generate data Yi = {Yi,t }T t=1 from fi , i = 1, . . . , N Sample size T = 30 and 60 We consider the KDE fˆ and two WKDEs: fˆEL and fˆET . For each unit, KDE bandwidth is selected according to the rule of thumb. Moment functions: quadratic splines with two knots at the 33th and 66th percentiles of the pooled sample: g (y ) = (y , y 2 , (y − Q̄33 )2+ , (y − Q̄66 )2+ ), where (y )+ = max(y , 0) and Q̄α is the αth sample percentile. Number of nearest neighbors in the calculation of spatial weights: k=5 Number of repetitions: R = 500 Spatial Kernel Density Estimation 17 / 20 Second experiment Construct fi as a two-component normal mixture, i.e. fi = ωi φ(µ1,i , σ1,i ) + (1 − ωi )φ(µ2,i , σ2,i ). 0.0 0.2 0.4 Parameters: Φ−1 (ω) ∼ SpE(Φ−1 (0.2), 0.6, 0.03), µ1 ∼ SpE(−3, 0.6, 0.2), µ2 ∼ SpE(0, 0.6, 0.1), log (σ 1 ) ∼ SpE(log(1.5), 0.6, 0.03) and log (σ 2 ) ∼ SpE(0, 0.6, 0.03). −8 −6 −4 −2 Spatial Kernel Density Estimation 0 2 4 18 / 20 Evaluation of performance (r ) Denote by fˆi an estimate of fi in the r th experiment, i = 1, . . . , N and r = 1, . . . , R. Global performance: average mean integrated squared error N R Z 2 1 X X ˆ(r ) fi − fi NR i=1 r =1 Tail performance: let Qi,5 be the 5% quantile of the fi and R (r ) (r ) π̂i = x≤Qi,5 fˆi (x)dx, the average mean squared error of tail probability estimation N R 2 1 X X (r ) π̂i − 5% NR i=1 r =1 Spatial Kernel Density Estimation 19 / 20 Simulation results MSE of KDE and ratio of WKDE MSE relative to that of KDE T = 30 global SN tail global NM tail T = 60 global SN tail global NM tail Spatial Kernel Density Estimation WKDE ET KDE EL 0.0176 0.0017 0.0148 0.0013 63.51% 56.08% 77.01% 34.66% 63.89% 56.50% 78.21% 34.90% 0.0100 0.0009 0.0088 0.0006 62.88% 62.95% 75.49% 37.91% 65.05% 63.53% 76.74% 38.11% 20 / 20