Evaluation of GIS Interpolation Techniques for unmeasured radon

Evaluation of Five GIS based Interpolation
Techniques for Estimating the Radon
Concentration for Unmeasured Zip Codes
in the State of Ohio
By
Suman Maroju
Department of Civil Engineering
The University of Toledo
Advisor: Ashok Kumar PhD
Introduction
 Radon is a naturally occurring radioactive gas produced by the
breakdown of uranium in soil, rock and water.
 Radon is the second most common cause of lung cancer after
cigarette smoking, accounting for 15,000 to 22,000 cancer deaths
per year in the US alone, according to the National Cancer Institute
(USA).
 Radon gas is believed to cause about 14% of lung cancer deaths
(1000+ deaths) in Ohio annually.
 45% of homes in Ohio exceed the USEPA action level.
 62.5% of schools in Ohio have at least one room in excess of the
USEPA action level.
Data Collection
 Data collected from various county health
departments, commercial testing services and
university researchers.
 Original database – Kumar et al. (1990)
 1996 and 1997 – 82,000
 New data being constantly added
 Total of 130,826 observations used in this study
Objectives
 To identify the best interpolation technique for
the radon data set.
 To apply this interpolation technique to the
whole radon data set, obtain a prediction map
and estimate concentrations for unmeasured
zip codes.
 To present the impact of the results obtained
from this study.
ArcGIS Geostatistical Analyst
Geostatistical Analyst provides a wide variety of
tools for spatial data exploration, identification of
data anomalies, evaluation of error in prediction
surface models, statistical estimation and optimal
surface creation.
Exploratory Spatial Data Analysis
(ESDA) Tool
 The ESDA tools are designed to explore the
distribution of the data, look for global trends in the
data, examine spatial autocorrelation and
understand the correlation between multiple
data sets.
 Tools include Histogram, Normal QQ Plot, Trend
Analysis, Semivariogram/Covariance Cloud.
Histogram
 The Histogram tool in ESDA provides a univariate
(one-variable) description of the data.
 The plot shows the frequency distribution
for the radon data set.
Normal QQ Plot
 The QQ Plot is used to compare the distribution
of the data to a standard normal distribution.
Trend Analysis
 The Trend Analysis tool can help identify
global trends in the input data set.
[Figure: trend analysis plot showing the East-West and North-South axes with their trend lines.]
Semivariogram/Covariance Cloud
[Figure: semivariogram cloud; each point represents a pair of locations.]
Approach
 The geometric mean of the radon concentration values is
entered for each zip code, and a value of zero is assigned
to the zip codes that have not been measured.
 The polygon features of the Ohio zip code shapefile are
converted into point features, which serve as the point data
source for the interpolation techniques.
 The point shapefile is then divided into two
shapefiles: one with the 1066 zip codes that have radon
concentration data and one with the 796 zip codes
that have no measured radon concentration data.
Approach
 The first step is to evaluate the best interpolation
technique.
 The point shapefile is divided into 80%
training data points and 20% test data points
(see the sensitivity analysis for the division of the data set).
 The different interpolation techniques are then
run on the training data points, each creating
a layer of spatial variation, and the
predictions are evaluated at the test data points.
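The 80/20 division can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not say how the points were assigned to each subset, so a seeded random shuffle is assumed here, and the `points` list is a made-up stand-in for the 1066 measured zip-code points.

```python
import random

def split_train_test(points, train_fraction=0.8, seed=0):
    """Shuffle the points and split them into training and test subsets."""
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)  # assumption: random assignment
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical stand-in for the measured zip-code points: (id, x, y, geometric mean)
points = [(i, float(i % 40), float(i // 40), 1.0 + (i % 7)) for i in range(1066)]
train, test = split_train_test(points)
print(len(train), len(test))
```

Changing `train_fraction` to 0.7 or 0.6 would give the other divisions of the data set considered in the sensitivity analysis.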
Approach
 Second part
– The best interpolation technique is chosen based on the values
of the statistical parameters.
– Modeling is done for the whole radon data set, which
creates a surface of spatial variation, and the predictions
for unmeasured zip codes (where no data were collected)
are read from the surface created.
Interpolation methods
 Five Interpolation Techniques
 Ordinary Kriging
 Inverse Distance Weighting (IDW)
 Radial Basis Function (RBF)
 Local Polynomial Interpolation
 Global Polynomial Interpolation
Ordinary Kriging
 Kriging is divided into two distinct tasks:
 Quantify the spatial structure of the data
(known as variography), i.e., fit a spatial dependence
model to the data.
 Make a prediction for the unknown value at a
specific location, using the fitted
model from the variography, the spatial data
configuration and the values of the measured
sample points around the prediction location.
Ordinary Kriging
The equation used in Ordinary Kriging is:
$$Z^*(u) = \sum_{\alpha=1}^{n(u)} \lambda_\alpha(u)\, Z(u_\alpha) + \left[ 1 - \sum_{\alpha=1}^{n(u)} \lambda_\alpha(u) \right] m$$
 Z*(u) is the Ordinary Kriging estimate at spatial
location u
 n(u) is the number of data values used from the
known locations within the search neighborhood
 Z(uα) are the n(u) measured data values at locations uα
located close to u
 m is the mean of the distribution
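As a hedged aside, Ordinary Kriging is usually stated with the unbiasedness constraint that the weights sum to one. Under that constraint the term multiplying the mean m vanishes, so m never needs to be known:

$$\sum_{\alpha=1}^{n(u)} \lambda_\alpha(u) = 1 \quad\Longrightarrow\quad Z^*(u) = \sum_{\alpha=1}^{n(u)} \lambda_\alpha(u)\, Z(u_\alpha)$$

For illustration only, with two hypothetical neighbors measured at 2.0 and 4.0 pCi/l and weights 0.7 and 0.3, the estimate would be

$$Z^*(u) = 0.7 \times 2.0 + 0.3 \times 4.0 = 2.6 \ \text{pCi/l}$$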
Ordinary Kriging
λα(u) are the weights for location uα, computed from
the spatial covariance matrix based on the
spatial continuity (variogram) model, which is
given by:
$$\gamma(h) = \frac{1}{2n} \sum_{i=1}^{n} \left( z(u_i) - z(u_i + h) \right)^2$$
 n is the number of data pairs separated by distance h
 z(ui) and z(ui+h) are the data values at locations
separated by distance h
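The variogram formula above can be sketched as an empirical semivariogram computed over distance bins. The binning by `lag_width` and the tiny `sample` data set are assumptions made for illustration; they are not taken from the study.

```python
import math

def empirical_semivariogram(points, lag_width, n_lags):
    """Return (mean pair distance, gamma, pair count) for each non-empty distance bin.

    points is a list of (x, y, z); bin k covers distances in [k*lag_width, (k+1)*lag_width).
    """
    sums = [0.0] * n_lags    # running sum of squared differences per bin
    dists = [0.0] * n_lags   # running sum of pair distances per bin
    counts = [0] * n_lags    # number of pairs per bin
    for i in range(len(points)):
        xi, yi, zi = points[i]
        for j in range(i + 1, len(points)):
            xj, yj, zj = points[j]
            h = math.hypot(xi - xj, yi - yj)
            k = int(h // lag_width)
            if k < n_lags:
                sums[k] += (zi - zj) ** 2
                dists[k] += h
                counts[k] += 1
    return [
        (dists[k] / counts[k], sums[k] / (2 * counts[k]), counts[k])
        for k in range(n_lags)
        if counts[k] > 0
    ]

# Hypothetical sample: four points on a line, values rising by 1 per unit distance
sample = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (2.0, 0.0, 3.0), (3.0, 0.0, 4.0)]
for h, gamma, n in empirical_semivariogram(sample, lag_width=1.0, n_lags=4):
    print(f"h={h:.1f} gamma={gamma:.2f} pairs={n}")
```

A model such as the spherical one is then fitted to these binned values.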
Ordinary Kriging

There are three primary parameters that describe
the autocorrelation of radon concentrations:
range, sill and nugget.
– Range: the distance at which the best-fit line starts to
level off (46.55). Within the range, all data are
correlated.
– Sill: the maximum semivariogram value (0.2869).
– Nugget: the data variation due to measurement
errors (0.20487).
The fitted model is the spherical model.
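To make the three parameters concrete, the sketch below evaluates a spherical semivariogram with the values quoted on this slide. Two assumptions are made: 0.2869 is treated as the total sill (the plateau), following the slide's wording, and the range is taken in whatever distance units the source used. If the software reported a partial sill instead, the plateau would be the nugget plus that value.

```python
def spherical_semivariogram(h, nugget, sill, rng):
    """Spherical model: jumps to the nugget just past h = 0 and levels off at the sill for h >= rng."""
    if h <= 0.0:
        return 0.0
    if h >= rng:
        return sill
    ratio = h / rng
    return nugget + (sill - nugget) * (1.5 * ratio - 0.5 * ratio ** 3)

# Values quoted on the slide; treating SILL as the total sill is an assumption
NUGGET, SILL, RANGE = 0.20487, 0.2869, 46.55

for h in (0.0, 10.0, RANGE / 2, RANGE, 60.0):
    print(h, round(spherical_semivariogram(h, NUGGET, SILL, RANGE), 4))
```

Beyond the range the function stays at the sill, which is the sense in which pairs farther apart than the range are no longer correlated.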
Inverse Distance Weighting (IDW)
 IDW interpolation assumes that things close to one
another are more alike than those farther apart.
 To predict a value for any unmeasured location, IDW will
use the measured values surrounding the prediction
location.
 Measured values closest to the prediction location will
have more influence on the predicted value than those
farther away.
 IDW assumes that each measured point has a local
influence that diminishes with distance.
Inverse Distance Weighting
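 A simple IDW weighting function, as defined by
Shepard, is:
$$w(d) = \frac{1}{d^{p}}$$
where w(d) is the weighting factor applied to a known value,
d is the distance between the known value and the prediction location, and
p is the power parameter (the most common value is 2).
 A general form of interpolating a value using IDW is:
$$Z^*(u) = \frac{\sum_{\alpha=1}^{n} w(d_\alpha)\, Z(u_\alpha)}{\sum_{\alpha=1}^{n} w(d_\alpha)}$$
where d_α is the distance from the measured location u_α to the prediction location u.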
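A minimal sketch of IDW, assuming the weight is the inverse of the distance raised to the power p and that every sample contributes to the estimate. A real GIS tool would normally restrict the calculation to a search neighborhood; the `samples` values are hypothetical.

```python
import math

def idw_predict(x, y, samples, power=2.0):
    """Inverse-distance-weighted estimate at (x, y) from (xi, yi, zi) samples."""
    weighted_sum = 0.0
    weight_total = 0.0
    for xi, yi, zi in samples:
        d = math.hypot(x - xi, y - yi)
        if d == 0.0:
            return zi  # prediction location coincides with a measured point
        w = 1.0 / d ** power
        weighted_sum += w * zi
        weight_total += w
    return weighted_sum / weight_total

# Hypothetical measured points: (x, y, radon geometric mean)
samples = [(0.0, 0.0, 2.0), (2.0, 0.0, 4.0), (0.0, 2.0, 6.0)]
print(idw_predict(1.0, 0.0, samples))
```

Because the estimate is a weighted average with positive weights, it always lies between the smallest and largest measured values.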
Radial Basis Function (RBF)
 RBF is an exact interpolation technique: the
surface it creates must pass through
each measured sample value.
 It is similar to IDW, except that it can predict values
above the maximum and below the minimum
measured values.
Global Polynomial Interpolation
 Global Polynomial Interpolation fits a smooth surface,
defined by a single polynomial, to all of the measured
data points. A first-order polynomial gives a plane.
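As an illustration, the simplest global polynomial is a first-order one, a plane z = a + b·x + c·y fitted to all points at once by least squares. The sketch below is a pure-Python version under that assumption; the helper names and the `samples` values are hypothetical, and the order of polynomial used in the study is not stated in the slides.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_plane(samples):
    """Least-squares fit of z = a + b*x + c*y to all (x, y, z) samples."""
    basis = [(1.0, x, y) for x, y, _ in samples]
    z = [s[2] for s in samples]
    # Normal equations: (A^T A) coeffs = A^T z
    ata = [[sum(r[i] * r[j] for r in basis) for j in range(3)] for i in range(3)]
    atz = [sum(r[i] * zi for r, zi in zip(basis, z)) for i in range(3)]
    return solve_linear(ata, atz)

# Hypothetical points lying on the plane z = 1 + 2x - 0.5y
samples = [(0.0, 0.0, 1.0), (1.0, 0.0, 3.0), (0.0, 1.0, 0.5),
           (1.0, 1.0, 2.5), (2.0, 1.0, 4.5)]
a, b, c = fit_plane(samples)
print(round(a, 6), round(b, 6), round(c, 6))
```

With noisy data the plane would not pass through the points, which is why this method is an inexact interpolator. The sketch does not guard against degenerate input such as collinear points.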
Local Polynomial Interpolation
 While Global Polynomial Interpolation fits
a polynomial to the entire surface, Local
Polynomial Interpolation fits many polynomials,
each within specified overlapping neighborhoods.
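A rough sketch of the local idea: for each prediction location, fit a first-order polynomial only to the samples inside a circular neighborhood, with nearer samples weighted more heavily. The circular neighborhood, the quadratic distance-decay weight and the `grid` data are all assumptions for illustration; the settings used in the study are not given in the slides.

```python
import math

def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def local_plane_predict(x0, y0, samples, radius):
    """Fit a distance-weighted plane to samples within radius of (x0, y0); return its value there."""
    rows, weights, values = [], [], []
    for x, y, z in samples:
        d = math.hypot(x - x0, y - y0)
        if d < radius:
            rows.append((1.0, x - x0, y - y0))       # coordinates centred on the target
            weights.append((1.0 - d / radius) ** 2)  # assumed kernel, decays to 0 at the edge
            values.append(z)
    if len(rows) < 3:
        raise ValueError("need at least three neighbours to fit a plane")
    ata = [[sum(w * r[i] * r[j] for w, r in zip(weights, rows)) for j in range(3)]
           for i in range(3)]
    atz = [sum(w * r[i] * z for w, r, z in zip(weights, rows, values)) for i in range(3)]
    return solve_linear(ata, atz)[0]  # intercept is the fitted value at (x0, y0)

# Hypothetical 4 x 4 grid sampled from the curved surface z = x * y
grid = [(float(i), float(j), float(i * j)) for i in range(4) for j in range(4)]
print(local_plane_predict(1.5, 1.5, grid, radius=1.5))
```

Each prediction location gets its own fitted plane, so the resulting surface can follow curvature that a single global plane would miss.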
Evaluation Criteria
 Several statistical indicators (Root Mean Square Error
(RMSE), Mean Error (ME), Mean Absolute Error (MAE) and
Mean Square Error (MSE)) are computed on observed and
predicted radon concentrations.
 Confidence limits on the statistics for Normalized Mean
Square Error (NMSE), Fractional Bias (FB), and Coefficient
of Correlation (r) are calculated using Bootstrap application
to identify the most suitable interpolation technique.
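The indicators named above can be sketched as follows. The formulas are the commonly used definitions and are an assumption here, since the slides do not spell them out: errors are taken as predicted minus observed, NMSE is the mean squared error divided by the product of the two means, and FB is twice the difference of the means divided by their sum. The `observed` and `predicted` lists are made-up values.

```python
import math

def evaluation_metrics(observed, predicted):
    """Compute ME, MAE, MSE, RMSE, NMSE, FB and Pearson r for paired values."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]  # assumed sign: predicted - observed
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    mse = sum(e * e for e in errors) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return {
        "ME": sum(errors) / n,
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "NMSE": mse / (mean_o * mean_p),
        "FB": 2.0 * (mean_o - mean_p) / (mean_o + mean_p),
        "r": cov / math.sqrt(var_o * var_p),
    }

# Hypothetical observed and predicted radon values (pCi/l)
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 4.0, 5.0, 8.0]
metrics = evaluation_metrics(observed, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

A bootstrap analysis would recompute these statistics on many resampled sets of the observed and predicted pairs to obtain confidence limits.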
Results
Measured vs. Predicted Radon Conc. Values for the test
datasets
[Figure: scatter plots of measured values against predicted values, each with a linear trend line, for the Ordinary Kriging, RBF and IDW estimates for the test dataset; both axes run from 0.00 to 10.00.]
Results
Measured vs. Predicted Radon Conc. Values for the test
datasets
[Figure: scatter plots of measured values against predicted values, each with a linear trend line, for the LPI and GPI estimates for the test dataset; both axes run from 0.00 to 10.00.]
Results
ME, MAE, MSE and RMSE values of different
interpolation techniques for geometric mean of
radon concentration test predictions

Statistic   Ordinary Kriging   IDW    RBF    Global Polynomial Interpolation   Local Polynomial Interpolation
ME          0.09               0.17   0.19   0.1                               0.14
MAE         1.33               1.45   1.44   1.46                              1.4
MSE         4.99               5.77   5.57   5.15                              5.21
RMSE        2.23               2.4    2.36   2.27                              2.28
Results
NMSE, FB and Corr. (r) Values from Bootstrap Method

Statistic   Ordinary Kriging   IDW      RBF      Global Polynomial Interpolation   Local Polynomial Interpolation
NMSE        0.41               0.46     0.44     0.42                              0.42
FB          -0.026             -0.047   -0.055   -0.027                            -0.041
Corr. (r)   0.5                0.42     0.45     0.48                              0.47
Results
Summary of Robust and Seductive 95%
Confidence Limits Analyses on Each Technique

Statistic   Ordinary Kriging   IDW   RBF   Global Polynomial   Local Polynomial
NMSE        X                  X     X     X                   X
FB
Corr. (r)   X                  X     X     X                   X

Note:
X indicates significantly different from zero.
Blank indicates not significantly different from zero.
Results
Summary of Robust and Seductive 95% Confidence Limits Analyses
among Each Technique
[Table: for each pair of interpolation techniques, whether NMSE, FB and Corr. (r) differ significantly (Yes/No); the individual cell values are not recoverable from the source.]
Technique pairs compared:
Ordinary Kriging - IDW
Ordinary Kriging - RBF
Ordinary Kriging - GPI
Ordinary Kriging - LPI
IDW - RBF
IDW - GPI
IDW - LPI
RBF - GPI
RBF - LPI
GPI - LPI
Note:
Yes indicates significantly different from zero.
No indicates not significantly different from zero.
Comparison of the behavior of the prediction maps with the soil
uranium concentrations map
Results
Predicted Geometric Mean of Radon
Concentrations Using Ordinary Kriging technique
for Lucas County

ZIP CODE   COUNTY   PREDICTED GM (pCi/l)
43402      LUCAS    1.88
43445      LUCAS    2.96
43449      LUCAS    2.89
43460      LUCAS    2.35
43522      LUCAS    1.80
43551      LUCAS    2.28
43558      LUCAS    1.92
Conclusion
 Prediction maps were created using the training data set for all five
interpolation techniques and projected values were estimated for the
test data set.
 Statistical parameters (error values) were evaluated and the
prediction maps generated from these techniques were compared to
the soil uranium concentration map.
 It was inferred that any of the four (Ordinary Kriging, IDW, RBF and
Local Polynomial) interpolation techniques can be used for predicting
the radon concentrations for unmeasured zip codes.
 Ordinary Kriging technique was chosen and the geometric means of
radon concentrations were evaluated for unmeasured zip codes.
Conclusion
 In the data sets available prior to the study, the number of zip codes
with a geometric mean radon concentration over 4.0 pCi/l is 390.
 After using the Ordinary Kriging interpolation technique to calculate
the predictions for unmeasured zip codes, the number of zip codes
with a radon concentration over 4.0 pCi/l is 688.
 The predicted radon concentrations for unmeasured zip codes were
all found to be below 8 pCi/l.
 Therefore, the number of zip codes whose geometric mean radon
concentration exceeds 8 pCi/l or 20 pCi/l is the same after
interpolation as in the existing data (85 and 9 zip codes,
respectively).
Thank you
Sensitivity Analysis for division of
data set
RMSE for each training-test division of the data set (%)

Interpolation Technique   80-20   70-30   60-40
Ordinary Kriging          2.23    3.33    2.86
IDW                       2.4     3.31    2.29
RBF                       2.36    3.31    2.93
Global Polynomial         2.27    3.57    3.06
Local Polynomial          2.28    3.3     2.91