The research community has recognized the limitations of simple inverse distance weighted
interpolation, and both modifications to 1/r^2 weighting and new statistical estimation
methods have been proposed. The most popular of these is kriging. In addition to accounting for
separation distance, it also uses a “statistical distance” that accounts for the covariance
between stations. If two stations have a high covariance, their effective separation distance
is reduced: they are treated as closer together than their Euclidean distance because of their
similarities in concentration. The addition of the covariance makes the estimation
method responsive to the “clustering” of monitoring stations.
The difficulty in applying kriging techniques to air pollution data arises from some of its
assumptions. Kriging assumes that the pollutant has a constant mean and variance over
the region to which the variogram is applied. Air pollutant concentrations fail this
criterion because of their dependence on terrain, local emission sources, and meteorology.
Air pollution data have definite spatial, particularly directional, behavior and do not adhere
to the spatial stationarity assumption. In other words, generating a complete variogram
model is not possible because the relationships between air pollutant concentrations are
sensitive to their spatial locations, not just their separation distances.
The literature has current examples that work around these assumptions. Haas (1990) developed a moving
window kriging scheme so that separate variograms were modeled for each subregion.
Two recent Ph.D. dissertations dealt with the issue of non-stationarity in air pollutant data
by applying spatial and temporal filters to the data so that the residuals are spatially and
temporally stationary. At the University of Washington, Wendy Meiring used a spatial
deformation technique that realigned the coordinate system after filtering out spatial and
temporal trends in the data. Vyas at the University of North Carolina also applied filters
to the observation data before modeling the variogram and then added the trends back to
the estimated data.
Other problems include the fact that, for adequate reliability, kriging,
like all interpolators, requires high spatial coverage by the monitoring network. It only
accounts for the behavior of the pollutant at those locations where monitoring sites are
located. If no sites exist near an emission source, then that emission source will be
ignored in the estimates derived near that source.
-----------------------------------------------------------------------
(From Isaaks, p30,56)
Correlation

$$\rho = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - m_x)(y_i - m_y)}{\sigma_x\,\sigma_y}$$

Covariance

$$C_{XY} = \frac{1}{n}\sum_{i=1}^{n}(x_i - m_x)(y_i - m_y)$$

Moment of Inertia

$$\text{Moment of inertia} = \frac{1}{2n}\sum_{i=1}^{n}(x_i - y_i)^2$$
Much of the current work aimed at improving the interpolation process for air pollutant
concentrations involves geostatistical techniques. A frequently invoked technique,
kriging, accounts for the spatial variability in the data as well as their spatial distribution
(Isaaks and Srivastava, 1989). Kriging was originally developed as a statistical tool in
the mining industry for estimating ore deposits. Like simple distance weighted
interpolation, kriging estimation is based on the separation distance between the
monitoring sites and the estimation location with the estimate being a linear combination
of weighted concentrations at neighboring monitoring stations. Kriging distinguishes
itself in that the weights are determined by minimizing the estimation variance. The
estimation variance is derived through covariances that are dependent on a random
variable model called the variogram. The variogram model is developed by comparing
the concentrations between all pairs of monitoring stations. The distance separating each
monitoring station pair is used to place the pair in a distance bin and the covariance is
calculated for all station pairs within each bin. Plotting the covariance values against the
bin distances results in the sample variogram. A function is fitted to the sample
variogram and is applied to the error minimization.
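The binning procedure described above might look like the following sketch. It is only an illustration under stated assumptions: the text speaks of a covariance per bin, whereas this sketch accumulates the semivariance (half the mean squared difference, the moment of inertia), which is the quantity usually plotted in a sample variogram. The function name, the `(x, y, value)` site layout, and the sample data are hypothetical.

```python
from math import hypot


def sample_variogram(sites, bin_width, n_bins):
    """Return (bin_center, semivariance, n_pairs) for each non-empty bin.

    sites: list of (x, y, value) tuples, one per monitoring station.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i in range(len(sites)):
        for j in range(i + 1, len(sites)):
            xi, yi, vi = sites[i]
            xj, yj, vj = sites[j]
            h = hypot(xi - xj, yi - yj)      # separation distance of the pair
            b = int(h // bin_width)          # index of the distance bin
            if b < n_bins:
                sums[b] += (vi - vj) ** 2
                counts[b] += 1
    return [
        ((b + 0.5) * bin_width, sums[b] / (2 * counts[b]), counts[b])
        for b in range(n_bins)
        if counts[b] > 0
    ]


# Four hypothetical stations on a line, 1 unit apart.
sites = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (2.0, 0.0, 4.0), (3.0, 0.0, 7.0)]
variogram = sample_variogram(sites, bin_width=1.5, n_bins=2)
```

A model function (spherical, exponential, and so on) would then be fitted to these points; that fitting step is not shown here.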
The kriging error is a function of the variance at the estimation location, the covariance
among the sampling sites involved, and the covariance between each sampling site and the
estimation location. The covariances are found through the variogram model. The
variance at the estimation point is simply the nugget value. The covariances among
samples and between sample and estimation point are derived through the variogram for
the appropriate separation distances. In equation form, the error is
$$\sigma_e^2 = \sigma_v^2 - 2\sum_{i=1}^{n} a_i\,\sigma_{VX_i} + \sum_{i=1}^{n}\sum_{j=1}^{n} a_i a_j\,\sigma_{X_i X_j}$$
where $\sigma_v^2$ is the variance at the estimation location
$\sigma_{VX_i}$ is the covariance of the sampling and estimation locations
$\sigma_{X_i X_j}$ is the covariance among the sampling locations
$a_i$ is the weight assigned to sampling site i
n is the number of sampling sites
Therefore, if the covariance between the sampling and estimation locations is large, the error
is reduced, and if the covariance among samples is large, the error is increased due to
redundancy. Arguably, a large covariance among samples should not increase the error because the
estimate is not any worse for it. If anything, it should reduce the error because
multiple sites with the same value solidify the estimate. Granted, the
error is increased so that the estimate is not biased by, say, a cluster of sites, and in that
way it is useful, but as a measure of the uncertainty in the estimate it may be misleading.
Lefohn et al. (1987) describe the estimation error as having three components. One term
measures the closeness of the samples to the location being estimated. As this distance
increases, the term becomes larger and the error is increased. A second term measures the
size of the area being estimated (others define this as the variance at the estimation
location). As the size increases, the error decreases. The third term measures the spatial
relationship of the samples to each other. If the samples are clustered, this term is small
and the error remains high.
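To make the behavior of the error expression concrete, the sketch below evaluates it for two equally weighted sites, once with weakly related sites and once with highly redundant ones. It assumes the form with a factor of 2 on the site-to-location term (as in the cokriging version of the equation); the function name and all numbers are hypothetical.

```python
def estimation_variance(weights, var_v, cov_to_point, cov_among_sites):
    """var_v - 2 * sum(a_i * cov(V, X_i)) + sum_i sum_j a_i * a_j * cov(X_i, X_j)."""
    n = len(weights)
    to_point = sum(weights[i] * cov_to_point[i] for i in range(n))
    among = sum(
        weights[i] * weights[j] * cov_among_sites[i][j]
        for i in range(n)
        for j in range(n)
    )
    return var_v - 2.0 * to_point + among


weights = [0.5, 0.5]
cov_to_point = [0.6, 0.6]            # same closeness to the estimation location

# Weakly related sites versus nearly redundant sites.
independent = estimation_variance(weights, 1.0, cov_to_point, [[1.0, 0.2], [0.2, 1.0]])
redundant = estimation_variance(weights, 1.0, cov_to_point, [[1.0, 0.9], [0.9, 1.0]])
```

With identical weights and identical closeness to the estimation location, the only difference is the site-to-site covariance, and the redundant pair yields the larger error, which is the effect discussed above.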
Subsequently, the kriging weights are derived as
$$w_{ij} = C_{j \times j}^{-1}\, D_{ij}$$
where $w_{ij}$ is the weight assigned to monitoring site j,
$C_{j \times j}^{-1}$ is the inverse of the matrix that contains the covariance between all pairs of monitoring
stations,
$D_{ij}$ is a vector that contains the covariance between monitoring sites j and the
estimation point i.
The covariance vector, $D_{ij}$, can be interpreted as the weights obtained in inverse distance
weighted interpolation except the distances are statistical in nature in that they account
for the covariance between the monitoring sites and estimation points as well as their
separation distance. The inverse covariance matrix, $C_{j \times j}^{-1}$, accounts for the separation distances
and covariances between all pairs of monitoring sites. This allows kriging to incorporate
aspects of the monitoring network that simple interpolation schemes do not, namely the
clustering and redundancy of sites.
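A literal reading of the weight equation, weights equal to the inverse covariance matrix times the covariance vector, can be sketched as follows. This is an illustration only: the exponential covariance model, its `sill` and `rng` parameters, and the helper names are assumptions, and no unbiasedness constraint is applied (ordinary kriging would add a row and column of ones to the system).

```python
from math import exp, hypot


def exp_cov(h, sill=1.0, rng=10.0):
    """Assumed exponential covariance model; not taken from the source."""
    return sill * exp(-3.0 * h / rng)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def kriging_weights(sites, target, cov=exp_cov):
    """w = C^-1 D, with C the site-to-site and D the site-to-target covariances."""
    c = [[cov(hypot(a[0] - b[0], a[1] - b[1])) for b in sites] for a in sites]
    d = [cov(hypot(s[0] - target[0], s[1] - target[1])) for s in sites]
    return solve(c, d)


# Two hypothetical sites placed symmetrically around the estimation point.
sites = [(-1.0, 0.0), (1.0, 0.0)]
weights = kriging_weights(sites, (0.0, 0.0))
```

For this symmetric layout both sites receive the same weight, c1 / (1 + c2), where c1 is the site-to-target covariance and c2 the site-to-site covariance.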
Kriging has been applied to atmospheric variables such as wind speed and direction, acid
precipitation, tropospheric ozone, and precipitation (Lefohn et al., 1987; Seilkop and
Finkelstein, 1987; Eynon, 1987; Venkatram, 1988; Palomino and Martin, 1994; Liu and
Rossini, 1996).
Declustering in Kriging
Kriging accounts for the redundancies found in clusters of stations. For example, two
samples close together will generally contribute less information to the estimation than
samples farther apart and will be reflected by larger values in the C matrix off-diagonal
entries. The [raw] weight assigned to these samples will generally be redistributed to
other samples that are farther away but less redundant. This possible redundancy between
samples does not just depend on the geometric distance between them, but also depends
on the spatial continuity. The combination of these two factors is referred to as statistical
distance.
Therefore, kriging considers a group of sites a cluster if they have the same 'value' and
are very close to each other. It does not base the 'clusterness' of a group on its relative
distance from the estimation location because it applies a single variogram model to the
entire area being interpolated. If an urban area has multiple sites that have a large
variability, the group of stations will be considered less of a cluster than if the sites all
had low variance and high covariance. A group of stations with high covariance is
considered to be more redundant and therefore will have its total weight reduced in the
interpolation.
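The declustering effect can be illustrated by comparing inverse distance squared weights with ordinary kriging weights for one isolated site and two nearly collocated sites, all about the same distance from the estimation point. This is a sketch under assumptions: the exponential covariance model and its parameters, the site layout, and the function names are hypothetical, and the ordinary kriging system is written with the usual unbiasedness constraint (a row and column of ones).

```python
from math import exp, hypot


def exp_cov(h, sill=1.0, rng=10.0):
    """Assumed exponential covariance model; not taken from the source."""
    return sill * exp(-3.0 * h / rng)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ordinary_kriging_weights(sites, target, cov=exp_cov):
    n = len(sites)
    c = [[cov(hypot(a[0] - b[0], a[1] - b[1])) for b in sites] + [1.0] for a in sites]
    c.append([1.0] * n + [0.0])          # unbiasedness constraint: weights sum to 1
    d = [cov(hypot(s[0] - target[0], s[1] - target[1])) for s in sites] + [1.0]
    return solve(c, d)[:n]               # drop the Lagrange multiplier


def inverse_distance_squared_weights(sites, target):
    raw = [1.0 / hypot(s[0] - target[0], s[1] - target[1]) ** 2 for s in sites]
    return [r / sum(raw) for r in raw]


# One isolated site to the west, two clustered sites to the east.
sites = [(-1.0, 0.0), (1.0, 0.0), (1.0, 0.1)]
target = (0.0, 0.0)
ok = ordinary_kriging_weights(sites, target)
idw = inverse_distance_squared_weights(sites, target)
```

Inverse distance squared gives each site roughly a third of the weight, so the cluster dominates; kriging shifts weight toward the isolated site because the two clustered sites are largely redundant.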
Isaaks and Srivastava (1989) indicate that once some form of declustering is performed
on inverse distance weighted interpolation, "the advantage of ordinary kriging over
inverse distance squared becomes slight" (p. 346).
Kriging with a Trend, "Universal Kriging"
Kriging with a trend assumes there is a known trend in the data that can be modeled with a
function. For drift, the
function must be valid at all points to be estimated as well as all points where the primary
data are located. Kriging with a trend only requires that the function be valid at the
estimation locations.
Exhaustive secondary information refers to secondary information that is available at all
primary data locations and at all locations being estimated.
The secondary data is the "trend" for the primary data. The "residual" component is still
estimated with the primary data but the trend that it gets added to is a function (usually
required to be linear) of the secondary data at the estimation location.
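The trend-plus-residual idea can be sketched as below. This is a simplification, closer to regression on the secondary variable followed by interpolation of the residuals than to a full external-drift kriging system: the linear trend is fitted by ordinary least squares, and the residual weights are simply passed in (they would come from kriging the residuals). All names and numbers are hypothetical.

```python
def fit_linear_trend(secondary, primary):
    """Least-squares intercept and slope of primary as a function of secondary."""
    n = len(primary)
    mean_s = sum(secondary) / n
    mean_p = sum(primary) / n
    sxx = sum((s - mean_s) ** 2 for s in secondary)
    sxy = sum((s - mean_s) * (p - mean_p) for s, p in zip(secondary, primary))
    slope = sxy / sxx
    return mean_p - slope * mean_s, slope


def estimate_with_trend(primary, secondary, residual_weights, secondary_at_target):
    """Trend evaluated at the target plus a weighted sum of site residuals."""
    intercept, slope = fit_linear_trend(secondary, primary)
    residuals = [p - (intercept + slope * s) for p, s in zip(primary, secondary)]
    residual_estimate = sum(w * r for w, r in zip(residual_weights, residuals))
    return intercept + slope * secondary_at_target + residual_estimate


# Hypothetical data that lie exactly on the line primary = 1 + 2 * secondary.
primary = [3.0, 5.0, 7.0]
secondary = [1.0, 2.0, 3.0]
estimate = estimate_with_trend(primary, secondary, [0.2, 0.3, 0.5], 4.0)
```

In this made-up case the residuals are all zero, so the estimate is just the trend evaluated with the secondary value at the estimation location.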
Cokriging
Cokriging is an extension of ordinary kriging. A higher resolution network of a
secondary variable is used to improve the estimation of the main variable. Like ordinary
kriging, a variogram model is developed for the main variable. A second variogram
model is also developed for the secondary variable, and a third variogram model is
generated from the cross correlation of the main and secondary variables.
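The cokriged estimate is determined by
$$c_j = \sum_{i=1}^{n} w_{ij}\,c_i + \sum_{k=1}^{m} w_{kj}\,c_k$$
where the first sum runs over the n primary sampling sites and the second over the m
secondary sampling sites.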
The secondary data are "transformed" to the scale of the primary data, and a variogram and
cross variogram are calculated. The secondary data are then treated identically to the primary
data in that the nearest sites are found and then the appropriate variogram or cross
variogram is applied for determining the covariance matrix and the estimation-location-to-site
covariance vector.
The third variogram is called the cross-variogram and is generated by pairing up
monitoring stations for the primary and secondary variable in specific distance bins. For
instance, all fine mass and visibility monitoring stations between 50 and 75 kilometers
apart would have their concentrations or values correlated so that you would get a
primary vs. secondary scatter plot. Once the correlations are determined for all of the
bins, a correlogram or variogram can be constructed.
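The pairing-by-distance-bin procedure described above can be sketched as follows. It follows the text's description (correlating primary values against secondary values for station pairs in each bin) rather than the formal cross-variogram definition; the function names, the `(x, y, value)` layout, the bin edges, and the data are all hypothetical.

```python
from math import hypot, sqrt


def pearson(pairs):
    """Correlation of (primary, secondary) value pairs; None if undefined."""
    n = len(pairs)
    mean_p = sum(p for p, _ in pairs) / n
    mean_s = sum(s for _, s in pairs) / n
    cov = sum((p - mean_p) * (s - mean_s) for p, s in pairs)
    var_p = sum((p - mean_p) ** 2 for p, _ in pairs)
    var_s = sum((s - mean_s) ** 2 for _, s in pairs)
    if var_p == 0.0 or var_s == 0.0:
        return None
    return cov / sqrt(var_p * var_s)


def cross_correlogram(primary, secondary, bin_edges):
    """One correlation per distance bin [bin_edges[b], bin_edges[b + 1])."""
    bins = [[] for _ in range(len(bin_edges) - 1)]
    for px, py, pv in primary:
        for sx, sy, sv in secondary:
            h = hypot(px - sx, py - sy)
            for b in range(len(bins)):
                if bin_edges[b] <= h < bin_edges[b + 1]:
                    bins[b].append((pv, sv))
                    break
    return [pearson(pairs) if len(pairs) >= 2 else None for pairs in bins]


# Hypothetical fine mass (primary) and visibility (secondary) sites, in km.
fine_mass = [(0.0, 0.0, 1.0), (10.0, 0.0, 2.0), (20.0, 0.0, 3.0)]
visibility = [(0.0, 1.0, 10.0), (10.0, 1.0, 20.0), (20.0, 1.0, 30.0)]
correlogram = cross_correlogram(fine_mass, visibility, [0.0, 5.0, 75.0])
```

Each entry of `correlogram` corresponds to one primary versus secondary scatter plot; plotting the entries against the bin distances gives the correlogram to which a model would be fitted.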
The kriging error equation becomes
$$\sigma_e^2 = \sigma_v^2 - 2\sum_{i=1}^{n} a_i\,\sigma_{VX_i} - 2\sum_{k=1}^{m} b_k\,\sigma_{VX_k} + \sum_{i=1}^{n}\sum_{j=1}^{n} a_i a_j\,\sigma_{X_i X_j} + \sum_{k=1}^{m}\sum_{l=1}^{m} b_k b_l\,\sigma_{X_k X_l} + 2\sum_{i=1}^{n}\sum_{k=1}^{m} a_i b_k\,\sigma_{X_i X_k}$$
where $\sigma_v^2$ is the variance at the estimation location
$\sigma_{VX_i}$ is the covariance of the primary sampling and estimation locations
$\sigma_{X_i X_j}$ is the covariance among the primary sampling locations
$\sigma_{VX_k}$ is the covariance of the secondary sampling and estimation locations
$\sigma_{X_k X_l}$ is the covariance among the secondary sampling locations
$\sigma_{X_i X_k}$ is the covariance among the primary and secondary sampling locations
$a_i$ and $b_k$ are the weights assigned to the primary and secondary sampling sites
n is the number of primary sampling sites
m is the number of secondary sampling sites
If the secondary information is exhaustive, that is it is available at all locations where a
primary estimate is sought, then instead of the rigorous cokriging, kriging with an
external drift can be used.
Step-by-step method
Loop over the output grid.
    For each grid node get the coordinates.
    Loop over data file
        For each data point get the coordinates
        Calculate the data point distance from the grid node
        Check if the distance is less than search radius for primary variable
        If it is then check if the value is valid and add to primary data array and
        sort in order of increasing distance
        Do the same for the secondary variable
    End data file loop
    Send the nearest data to the cokriging module
    Create the kriging matrices.
        For the pair of values (i,j), if
            i and j are both primary variables then use primary variogram
            i and j are both secondary variables then use secondary variogram
            i and j are one primary and one secondary variable then use cross variogram
    Solve kriging equation to get estimate and variance
End output grid loop
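The search and matrix-assembly steps of the loop above might be sketched as follows. This is a partial illustration under assumptions: it stops before solving the cokriging system, it uses one search radius for both variables, it treats `None` as the marker for an invalid value, and the three covariance models in `models` are made-up placeholders rather than fitted variograms.

```python
from math import exp, hypot


def nearest_within(points, node, radius, max_points):
    """Valid points closer than `radius` to `node`, nearest first."""
    found = []
    for x, y, value in points:
        if value is None:                 # assumed marker for an invalid value
            continue
        distance = hypot(x - node[0], y - node[1])
        if distance < radius:
            found.append((distance, x, y, value))
    found.sort(key=lambda item: item[0])
    return found[:max_points]


def select_model(kind_i, kind_j, models):
    """Primary, secondary, or cross model depending on the pair type."""
    return models[kind_i] if kind_i == kind_j else models["cross"]


def covariance_matrix(samples, models):
    matrix = []
    for kind_i, xi, yi, _ in samples:
        row = []
        for kind_j, xj, yj, _ in samples:
            model = select_model(kind_i, kind_j, models)
            row.append(model(hypot(xi - xj, yi - yj)))
        matrix.append(row)
    return matrix


def assemble_systems(grid_nodes, primary, secondary, radius, models, max_points=8):
    """For each grid node, collect nearby samples and build their covariance matrix."""
    systems = {}
    for node in grid_nodes:
        near_p = nearest_within(primary, node, radius, max_points)
        near_s = nearest_within(secondary, node, radius, max_points)
        samples = [("primary", x, y, v) for _, x, y, v in near_p]
        samples += [("secondary", x, y, v) for _, x, y, v in near_s]
        systems[node] = (samples, covariance_matrix(samples, models))
    return systems


# Placeholder covariance models and data, for illustration only.
models = {
    "primary": lambda h: 1.0 * exp(-h),
    "secondary": lambda h: 2.0 * exp(-h),
    "cross": lambda h: 0.5 * exp(-h),
}
primary = [(1.0, 0.0, 5.0), (3.0, 0.0, 6.0), (0.5, 0.0, None)]
secondary = [(0.0, 1.5, 3.0), (0.0, 1.0, 2.0)]
systems = assemble_systems([(0.0, 0.0)], primary, secondary, 2.0, models)
samples, matrix = systems[(0.0, 0.0)]
```

The remaining step, solving the system for the weights, estimate, and variance, would add the site-to-node covariance vector and the unbiasedness constraints before calling a linear solver.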
-----------------------------------------------------------------------
Mining Geostatistics
A.G. Journel, Ch. J. Huijbregts
Academic Press
New York
1978
"Geostatistics is the application of the formalism of random functions to the
reconnaissance and estimation of natural phenomena." -G. Matheron (1962), p.1
Estimation variance:

$$E\{[Z_V - Z_v]^2\} = 2\,\bar{\gamma}(V, v) - \bar{\gamma}(V, V) - \bar{\gamma}(v, v)$$
The estimation variance, i.e. the quality of the estimation is dependent on:
1. The relative distances between the estimation location and the monitoring sites used
to estimate it. This is embodied in the first term.
2. The size and geometry of the estimation location, which is embodied in the second
term.
3. The quantity and spatial arrangement of the monitoring sites, which is embodied in
the third term.
4. The degree of continuity of the phenomenon under study, which is conveyed by its
characteristic semi-variogram.