Determining homogeneous regions:
considerations for water quality
management
Sylvia R. Esterby
Mathematics, Statistics and Physics
University of British Columbia Okanagan
Kelowna BC Canada
Week 2 January 14-18 of:
Data-driven and Physically-based Models for Characterization of Processes in
Hydrology, Hydraulics, Oceanography and Climate Change
Institute for Mathematical Sciences, National University of Singapore
January 7-28, 2008
• Motivating examples
• One method: cluster analysis
• Example: clustering lakes
• Example: clustering profiles
Esterby-IMS Jan 18, 2008
2
Figure 1 of the article cited below
shows India divided into regions
considered to be homogeneous with
respect to susceptibility to drought.
Droughts over Homogeneous
Regions of India: 1871-1990, B.
Parthasarathy, A. A. Munot, and D.
R. Kothawale, Indian Institute of
Tropical Meteorology, Pune, India
http://ndmc.unl.edu/pubs/dnn/arch22.pdf
Summer monsoon (June through September)
Agriculture and food production depend on these rains
Studies: understanding or prediction of monsoon rainfall behavior
All-India treatment: studies have considered the country as one unit
Different regions show considerable spatial variability
Hence the limitations of the all-India average rainfall used at present
The first map on the web-site quoted below
shows the conterminous 48 states of USA
divided into 3000 ecoregions on the basis
of areas of 1 square kilometer. The cluster
membership of the square kilometer is
represented by the color of the square. This
was achieved by assigning red, green, and
blue colors according to the principal
component scores associated with the
ranges of the nine variables defining each
cluster.
Objective: create
geographic ecoregions
which are
homogeneous with
regard to the growth of
woody vegetation.
Ecoregions: based on
multivariate geographic
clustering of 9
variables important to
tree growth in 3 groups
- elevation, soil or
edaphic factors, and
climatic factors.
http://www.geobabble.org/~hnw/esri98/
A New High-Resolution National Map of Vegetation Ecoregions
Produced Empirically Using Multivariate Spatial Clustering
A parallel supercomputer
Divide conterminous 48 states of USA into 1000, 2000, 3000, 5000, and 7000 ecoregions
Relatively homogeneous values of elevation, edaphic, and climatic variables
Method: iterative multivariate clustering technique.
Resolution of the clustered maps is 1 square kilometer; each national map
has over 7.7 million cells. Each cell has nine variables from maps with
values for elevation, soil nitrogen, soil organic matter, soil water capacity,
depth to water table, mean precipitation, solar irradiance, degree-day heat
sum, and degree-day cold sum.
The resultant national maps objectively capture the ecological patterns of
spatial variance in physical, edaphic, and climatic factors relevant for the
distribution and growth of plants and animals. Assignment of red, green,
and blue colors according to the principal component scores
associated with the ranges of the nine variables defining each cluster
results in a map where the ecological similarity of adjacent cluster regions is
readily apparent. Maps with this gradually-changing color spectrum illustrate
ecological relationships for plant growth derived from soil factors,
physiognomy, and climate across the 48 states at user-defined resolutions.
The clustering technique is being used as a way to spatially extend the
results of simulation models by reducing the number of runs needed to
obtain output over a larger area.
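The coloring step described above can be sketched as follows. This is a minimal illustration, not the procedure used for the published maps: it assumes each cluster already has three principal component scores, and it rescales each score to the 0-255 range of one color channel. The function name scores_to_rgb and the score values are hypothetical.

```python
def scores_to_rgb(scores):
    """Map each cluster's three component scores to an (R, G, B) triple in 0..255."""
    columns = list(zip(*scores))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    colors = []
    for row in scores:
        rgb = []
        for value, lo, hi in zip(row, lows, highs):
            span = hi - lo
            # A constant component carries no information; use a mid level.
            level = 128 if span == 0 else round(255 * (value - lo) / span)
            rgb.append(level)
        colors.append(tuple(rgb))
    return colors


# Hypothetical scores for four clusters (illustrative, not from the source map).
cluster_scores = [
    (-2.0, 0.5, 1.0),
    (-1.8, 0.4, 0.9),
    (3.0, -1.5, -1.0),
    (0.5, 0.0, 0.0),
]
cluster_colors = scores_to_rgb(cluster_scores)
for scores, color in zip(cluster_scores, cluster_colors):
    print(scores, "->", color)
```

Clusters with similar scores (the first two here) receive similar colors, which is why ecologically similar regions blend gradually on the map.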
http://aquagap.cfe.cornell.edu/discuss.htm
APPLICATION OF GAP ANALYSIS
TO AQUATIC BIODIVERSITY CONSERVATION
A pilot study by the
New York Cooperative Fish & Wildlife Research Unit
Stream gradient acts as a surrogate for substrate by
separating organisms which favor sand, silt and clay (low
gradient streams) from those which favor cobble, boulders and
rock (high gradient streams). Observed median gradients
plotted against dominant substrate in the Allegheny River
watershed lent support for the placement of the classification
criterion used to separate sites with dominant fine sediment
substrate from those with dominant coarse substrate (gravel,
pebble and cobble). Thus, the classification criterion used
here is successful at separating sites based on substrate
composition in the Allegheny River watershed. Automated
Influence of the size of homogeneous regions on the goodness of fit of ungauged
river floods in Quebec.
Anctil, F., Mathevet, T.,
Département de génie civil, Pavillon Adrien-Pouliot, Université Laval, Québec.
Canadian Water Resources Journal, 2004 (Vol. 29) (No. 1) 47-58
Abstract:
The influence of the size of homogeneous regions on the goodness of fit of
ungauged river floods in Quebec, Canada, is studied by cross validation. Two initial
regions, one homogeneous and the other potentially homogeneous, formed by 38
and 34 rivers were used. Homogeneous sub-regions of various sizes were randomly
created to study the behaviour of the non-selected rivers, considered as ungauged for
the purpose of this study. Results have shown that the size of the sub-regions has less
impact on the χ² test results than the inherent quality of each river. In fact, the size of the
sub-regions was inversely proportional to the variability, which means that a region of
small size has a larger chance to lead to realization exceeding the χ² test critical value
than a region of large size. In spite of this finding, the influence of the size of the regions
was small if one considers that for the worst case scenario (homogeneous sub-regions of
five rivers), the percentage of failure of the χ² test was increased by only approximately
3%. However, the distribution of the regional L-moment ratios decreases with the size of
the sub-regions. The selection of larger homogeneous regions thus allows a
reduction in the variability of the estimation of regional T-year events.
How can cluster analysis be considered a method for
constructing zones?
• Zone – some definitions
• Data from specific locations
• Classical cluster analysis methods
• Cluster methods with contiguity constraint
Characteristics of data sets we are considering
• Multiple variables are important
• Observed at a number of locations
• Are there homogeneous sub-regions?
Zone: a bounded area with a constant value of
characteristic Y.
For zone k, $E(y_{ij}) = \mu_k$ for $y_{ij} \in R_k$
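Under this definition, a natural estimate of the zone mean µk is the average of the observations that fall in Rk. A minimal sketch, assuming the zone membership of each observation is already known; the zone labels and values are hypothetical.

```python
from statistics import mean

# Hypothetical observations of Y, grouped by the zone each site falls in.
observations = {
    "R1": [4.0, 5.0, 6.0],
    "R2": [9.0, 11.0],
}

# Estimate each zone mean by the sample mean of its observations.
zone_means = {zone: mean(values) for zone, values in observations.items()}
print(zone_means)
```

In practice the zones are not known in advance, which is where cluster analysis comes in.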
Classical clustering methods
One example
• calculate similarity measure for pairs of sites
• hierarchically group sites, e.g. by adding progressively
less similar sites to clusters
k-means (non-hierarchical)
a site is added to a cluster if it is closest to that cluster's centroid
Now, plot sites on map
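The hierarchical steps above can be sketched as follows. The slide does not name a similarity measure or a linkage rule, so this example assumes Euclidean distance and single linkage (cluster distance = distance between the closest pair of members). The function name agglomerate and the site values are hypothetical.

```python
from itertools import combinations
from math import dist


def agglomerate(sites, n_clusters):
    """Single-linkage agglomerative clustering; returns a list of sets of site names."""
    clusters = [{name} for name in sites]
    while len(clusters) > n_clusters:
        # Find the pair of clusters whose closest members are nearest.
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            gap = min(dist(sites[p], sites[q])
                      for p in clusters[a] for q in clusters[b])
            if best is None or gap < best[0]:
                best = (gap, a, b)
        _, a, b = best
        # Merge the most similar pair; a < b, so deleting b keeps a valid.
        clusters[a] |= clusters[b]
        del clusters[b]
    return clusters


# Hypothetical sites: name -> (variable 1, variable 2)
sites = {
    "A": (1.0, 1.0),
    "B": (1.2, 0.9),
    "C": (5.0, 5.1),
    "D": (5.2, 4.9),
    "E": (9.0, 1.0),
}
groups = agglomerate(sites, 3)
print(groups)
```

Note that nothing here uses site coordinates on the map: sites are grouped by similarity of their measurements, and only afterwards plotted to see whether the clusters form contiguous regions.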
• We do sample at discrete locations
• For the objective of finding similar locations
• Cluster analysis is a method for determining similar sites
Matrix of data
First case: n rows, each holding the m variables measured at
one of the n sites
Second case (profiles): n rows, each holding one variable
measured m times at one site
Cluster rows (sites) in each case
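Clustering the rows of such a matrix can be sketched with k-means, where each row is assigned to the closest cluster centroid. This is a minimal sketch of the standard Lloyd iteration, assuming Euclidean distance and user-supplied starting centroids; the function name k_means and the data values are hypothetical.

```python
from math import dist


def k_means(rows, centroids, max_iter=100):
    """Lloyd's algorithm: returns (labels, centroids) for the given starting centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Assign each row (site) to its closest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: dist(row, centroids[k]))
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Recompute each centroid as the mean of its member rows.
        for k in range(len(centroids)):
            members = [row for row, lab in zip(rows, labels) if lab == k]
            if members:  # keep the old centroid if a cluster empties
                centroids[k] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels, centroids


# Hypothetical 6 x 3 data matrix: 6 sites (rows), 3 variables (columns).
data = [
    (1.0, 2.0, 1.5),
    (1.1, 1.9, 1.4),
    (0.9, 2.1, 1.6),
    (6.0, 7.0, 6.5),
    (6.2, 6.8, 6.4),
    (5.9, 7.1, 6.6),
]
labels, centers = k_means(data, centroids=[data[0], data[3]])
print(labels)
```

The same code applies to the profile case: each row then holds the m repeated measurements of one variable at a site.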
Determine stations with
similar water quality
characteristics as
relevant to
anthropogenic
acidification
Is it possible to reduce
the number of sites
sampled?
Esterby, El-Shaarawi, Howell, Clair 1989
Constrained clustering – take
contiguity relationships into
account in some way
(Gordon, 1980)
Example: order of observations
in time is of importance
• Clustering of diatom profiles
in lake sediment cores
• Motivation: Inferred pH
through regression of pH on
index
To explain, look at individual
profiles
Esterby 1988
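One simple way to respect order in a profile is to allow only adjacent segments to merge. The sketch below is a generic order-constrained agglomeration, not necessarily the method of Gordon (1980) or Esterby (1988): it assumes a single variable per level and merges the adjacent pair that adds least to the within-segment sum of squares. The function name constrained_clusters and the profile values are hypothetical.

```python
def constrained_clusters(profile, n_segments):
    """Merge adjacent segments only; returns (start, end) index pairs, end exclusive."""

    def sse(values):
        centre = sum(values) / len(values)
        return sum((v - centre) ** 2 for v in values)

    # Start with every observation in its own segment.
    segments = [(i, i + 1) for i in range(len(profile))]
    while len(segments) > n_segments:
        costs = []
        for j in range(len(segments) - 1):
            start, mid = segments[j]
            _, end = segments[j + 1]
            # Increase in within-segment sum of squares if j and j+1 merge.
            increase = (sse(profile[start:end])
                        - sse(profile[start:mid])
                        - sse(profile[mid:end]))
            costs.append(increase)
        j = costs.index(min(costs))
        segments[j:j + 2] = [(segments[j][0], segments[j + 1][1])]
    return segments


# Hypothetical inferred-pH values ordered by depth in a core (not real data).
profile = [6.8, 6.9, 6.7, 6.8, 5.9, 5.8, 6.0, 5.1, 5.0]
zones = constrained_clusters(profile, 3)
print(zones)
```

Because only neighbours can merge, every resulting cluster is a contiguous stretch of the profile, which an unconstrained method does not guarantee.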
• Cluster methods
• Other methods (e.g. fit a surface and obtain contours)
• Establishing relationships between variables
• E.g. application to predicting streamflow
Homogeneity over time and space in parameter estimation
for data-driven models
i.e. relevance to data sets used with models
When trying to predict change by modelling processes, do we
have evidence?