Distribution Function Estimation in Small Areas for Aquatic Resources

advertisement
Distribution Function Estimation in
Small Areas for Aquatic Resources
Spatial Ensemble Estimates of Temporal Trends
in Acid Neutralizing Capacity
Mark Delorey
F. Jay Breidt
Colorado State University
This research is funded by
U.S.EPA – Science To Achieve
Results (STAR) Program
Cooperative
# CR - 829095
Agreement
Project Funding
• The work reported here was developed under the
STAR Research Assistance Agreement CR-829095
awarded by the U.S. Environmental Protection
Agency (EPA) to Colorado State University. This
presentation has not been formally reviewed by
EPA. The views expressed here are solely those of
the presenter and STARMAP, the Program he
represents. EPA does not endorse any products or
commercial services mentioned in this presentation.
Outline
• Statement of the problem: How to get a set of
estimates that are good for multiple inferences of acid
trends in watersheds?
• Hierarchical model and Bayesian inference
• Constrained Bayes estimators
- adjusting the variance of the estimators
• Conditional auto-regressive (CAR) model
- introducing spatial correlation
• Constrained Bayes with CAR
• Summary
The Problem
• Examine acid neutralizing capacity
• Supply of acids from atmospheric deposition and
watershed processes exceeds buffering capacity
• Surface waters are acidic if ANC < 0
• Temporal trends in ANC within watersheds (8-digit
HUC’s)
- characterize the spatial ensemble of trends
- make a map, construct a histogram, plot an
empirical distribution function
Data Set
• 86 HUC’s in Mid-Atlantic Highlands
• ANC in at least two years from 1993–1998
• HUC-level covariates:
- area
- average elevation
- average slope, max slope
- percents agriculture, urban, and forest
- spatial coordinates
Region of Study
Locations of Sites
Small Area Estimation
• Probability sample across region
- regional-level inferences are model-free
- samples are not sufficiently dense in small
watersheds (HUC-8)
- need to incorporate auxiliary information through
model
• Two standard types of small area models (Rao, 2003)
- area-level: watersheds
- unit-level: site within watershed
Two Inferential Goals
• Interested in estimating individual HUC-specific slopes
• Also interested in ensemble:
spatially-indexed true values:
 
m
h h 1
spatially-indexed estimates:  
- subgroup analysis: what proportion of HUC’s have ANC
increasing over time?
- “empirical” distribution function (edf):
est m
h
h 1
1 m
F ( z )   I  h  z
m h 1
Deconvolution Approach
• Treat this as measurement error problem:
ˆh   h  eh
eh  ~ N0,  h 
• Deconvolve:
- parametric: assume F in parametric class
- semi-parametric: assume F well-approximated within
class (like splines, normal mixtures)
- non-parametric: assume EF [ei] is smooth
• Not so appropriate for heteroskedastic measurements,
explanatory variables, two inferential goals
Hierarchical Area-Level Model
• Extend model specification by describing parameter
uncertainty:
ˆ    e , e  NID0,  
h
h
h
 h  xTh   h ,
• Prior specification:


h
h
 h  NID0,  2 
 
 
f  ,    f   f    f  
2
2
2
Bayesian Inference
• Individual estimates: use posterior means

 
 hB  E  h | ˆ  E  h ˆh  1   h xThˆ | ˆ
where

 2
h 
 h   2
• Do Bayes estimates yield a good ensemble estimate?
- use edf of Bayes estimates to estimate F?
• No: Bayes estimates are “over-shrunk”
- too little variability to give good representation of edf
(Louis 1984, Ghosh 1992)
m
m
2
2

B
B
ˆ




E



|


h
 h

h 1
h

1






Adjusted Shrinkage
• Posterior means not good for both individual and
ensemble estimates
• Improve by reducing shrinkage
- sample mean of Bayes estimates already matches
posterior mean of  h 
- adjust shrinkage so that sample variance of
estimates matches posterior variance of true values
• Louis (1984), Ghosh (1992)
• Cressie and Stern (1991)
Constrained Bayes Estimates
• Compute the scalars
H1 ˆ  tr Var    1 | ˆ
  

H ˆ       
m
2
h 1
B 2
B
h
• Form the constrained Bayes (CB) estimates as

CB
h

 a   1  a    a   
B
h
B
B
h
where
 
 

H 1 ˆ 

a  1

H 2 ˆ 


1
2
1
B
 
B
Shrinkage Comparisons for the Slope Ensemble
Numerical Illustration
• Compare edf’s of estimates to posterior mean of F:

m
1
FB ( z )   E I  h  z| ˆ
m h 1

• Comparison of ensemble estimates at selected
quantiles:
0.6
CB
0.4
Posterior Mean
0.2
Bayes
0.0
Cumulative Probability
0.8
1.0
Estimated EDF’s of the Slope Ensemble
-1000
-500
0
Slope in ug / L / year
500
1000
1500
Conditional Auto Regressive (CAR) Model
• Let
ˆ |  , ~ N , D 

 |  ,   ,  ~ N X ,    I  C  
2
2
1

where  is an unknown coefficient vector, C = (cij)
represents the adjacency matrix,  is a parameter
measuring spatial dependence,  is a known diagonal
matrix of scaling factors for the variance in each
HUC, and  is an unknown parameter.
• Adjacency matrix C can reflect watershed structure
HUC Structure
• First level (2-digit) divides U.S. into 21 major
geographic regions
• Second level (4-digit) identifies area drained by a
river system, closed basin, or coastal drainage area
• Third level (6-digit) creates accounting units of
surface drainage basins or combination of basins
• Fourth level (8-digit) distinguishes parts of drainage
basins and unique hydrologic features
Neighborhood Structure
• All watersheds within the same HUC-6 region were
considered part of same neighborhood
• No spatial relationship among HUC-4 regions or
HUC-2 regions considered at this point
Model Specifications
• Adjacency matrix:
1 distance between HUC - 8 centroids ; h and k neighbors
chk  
0
; otherwise

•  = Im
•  fixed at three different levels: 0.09, 0.01, 0
Constrained Bayes with CAR
• Again, compute H1 and H2:
  
H ˆ   E  | ˆ   I  P  
H1 ˆ  tr Var  1 2  I  P  | ˆ
T
T

2
1

I  P E | ˆ 
where

P  X X  X
1
T

1
X T 1
• Then,

CB
h





 H 1 ˆ

ˆ
ˆ

 aE  h |   1  a P E  h |  , where a   1 
ˆ
H2  


1
2
1.0
0.8
0.6
0.4
0.2
CB
phi=0.01
0.0
-1000
-500
0
500
1000
1500
0.8
0.6
0.4
0.0
0.2
CB
phi=0
-1000
-500
0
500
1000
Slope in ug / L / year
-1000
-500
0
500
1000
Slope in ug / L / year
1.0
Slope in ug / L / year
Cumulative Probability
Cumulative Probability
0.8
0.6
0.4
0.2
CB
phi=0.09
0.0
Cumulative Probability
1.0
Comparison of Estimated EDF’s
1500
1500
Spatial Structure
0
0
0
0
0
0
0
0
0
0
0
0
0
Summary
• In Bayesian context, posterior means are overshrunk;
in order to obtain estimates appropriate for ensemble,
need to adjust
• In CAR model, value of  does have an effect on edf,
but not large effect, possibly due to CB calculation
• Edf of CB estimates in CAR shifts more mass
towards positive values
• Contour plot indicates that trend slopes of ANC are
smoothed within HUCs
Further Work
• Restrict to acid-sensitive waters
• Combine probability and convenience samples
• Other covariates?
- deposition maps/trends from CASTNet?
• Modify spatial structure
• Site-level model?
- useful sub-watershed covariates?
- spatial scales: HUC to HUC, site to site
- more concern with design, normality assumptions
Download