Determining homogenous regions: considerations for water quality management

Sylvia R. Esterby
Mathematics, Statistics and Physics
University of British Columbia Okanagan
Kelowna BC Canada

Week 2, January 14-18, of:
Data-driven and Physically-based Models for Characterization of Processes in Hydrology, Hydraulics, Oceanography and Climate Change
Institute for Mathematical Sciences, National University of Singapore
January 7-28, 2008

• Motivating examples
• One method: cluster analysis
• Example: clustering lakes
• Example: clustering profiles

Esterby-IMS Jan 18, 2008 2

Figure 1 in the article on the web-site quoted shows India divided into regions considered to be homogeneous with respect to susceptibility to drought.

Droughts over Homogeneous Regions of India: 1871-1990, B. Parthasarathy, A. A. Munot, and D. R. Kothawale, Indian Institute of Tropical Meteorology, Pune, India
http://ndmc.unl.edu/pubs/dnn/arch22.pdf

• Summer monsoon (June through September)
• Agriculture and food production depend on these rains
• Studies: understanding or prediction of monsoon rainfall behavior
• Under the all-India treatment, studies have considered the country as one unit
• Different regions have considerable spatial variability
• Limitations on the All-India average rainfall used at present

The first map on the web-site quoted below shows the conterminous 48 states of USA divided into 3000 ecoregions on the basis of areas of 1 square kilometer. The cluster membership of each square kilometer is represented by the color of the square. This was achieved by assigning red, green, and blue colors according to the principal component scores associated with the ranges of the nine variables defining each cluster.

Objective: create geographic ecoregions which are homogeneous with regard to the growth of woody vegetation.
Ecoregions: based on multivariate geographic clustering of 9 variables important to tree growth in 3 groups: elevation, soil or edaphic factors, and climatic factors.

http://www.geobabble.org/~hnw/esri98/
A New High-Resolution National Map of Vegetation Ecoregions Produced Empirically Using Multivariate Spatial Clustering

• A parallel supercomputer
• Divide conterminous 48 states of USA into 1000, 2000, 3000, 5000, and 7000 ecoregions
• Relatively homogeneous values of elevation, edaphic, and climatic variables
• Method: iterative multivariate clustering technique

Resolution of the clustered maps is 1 square kilometer; each national map has over 7.7 million cells. Each cell has nine variables from maps with values for elevation, soil nitrogen, soil organic matter, soil water capacity, depth to water table, mean precipitation, solar irradiance, degree-day heat sum, and degree-day cold sum.

The resultant national maps objectively capture the ecological patterns of spatial variance in physical, edaphic, and climatic factors relevant for the distribution and growth of plants and animals. Assignment of red, green, and blue colors according to the principal component scores associated with the ranges of the nine variables defining each cluster results in a map where the ecological similarity of adjacent cluster regions is readily apparent. Maps with this gradually-changing color spectrum illustrate ecological relationships for plant growth derived from soil factors, physiognomy, and climate across the 48 states at user-defined resolutions.

The clustering technique is being used as a way to spatially extend the results of simulation models by reducing the number of runs needed to obtain output over a larger area.
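The iterative multivariate clustering of map cells described above can be sketched in a few lines. This is only an illustration under stated assumptions: the slides do not name the algorithm, so plain k-means on standardized variables is assumed here; the six cells, their three variables, and the function names (standardize, kmeans) are hypothetical and stand in for the 7.7 million cells and nine variables of the real maps.

```python
import random

def standardize(rows):
    """Scale each variable (column) to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column would give sd = 0; fall back to 1.0 to avoid dividing by zero.
    sds = [(sum((x - m) ** 2 for x in c) / len(c)) ** 0.5 or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two cells."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(rows, k, n_iter=100, seed=0):
    """Assumed method: k-means. Assign each cell to its nearest centroid,
    recompute centroids, and repeat until the assignment stops changing."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(r, centroids[j]))
                      for r in rows]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids

# Hypothetical cells: (elevation in m, mean precipitation in mm, degree-day heat sum)
cells = [
    (200, 1200, 3000), (220, 1150, 3100), (210, 1180, 3050),  # lowland, wet, warm
    (1800, 400, 900), (1750, 450, 950), (1850, 420, 880),     # upland, dry, cool
]
labels, centroids = kmeans(standardize(cells), k=2)
print(labels)
```

Standardizing first keeps a variable with large numeric values, such as the heat sum, from dominating the distance. Plotting each cell's label at its map position is the step that turns clusters into regions.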
http://aquagap.cfe.cornell.edu/discuss.htm
APPLICATION OF GAP ANALYSIS TO AQUATIC BIODIVERSITY CONSERVATION
A pilot study by the New York Cooperative Fish & Wildlife Research Unit

Stream gradient acts as a surrogate for substrate by separating organisms which favor sand, silt and clay (low gradient streams) from those which favor cobble, boulders and rock (high gradient streams). Observed median gradients plotted against dominant substrate in the Allegheny River watershed lent support for the placement of the classification criterion used to separate sites with dominant fine sediment substrate from those with dominant coarse substrate (gravel, pebble and cobble). Thus, the classification criterion used here is successful at separating sites based on substrate composition in the Allegheny River watershed.

Automated

Influence of the size of homogeneous regions on the goodness of fit of ungauged river floods in Quebec. Anctil, F., Mathevet, T., Département de génie civil, Pavillon Adrien-Pouliot, Université Laval, Québec. Canadian Water Resources Journal, 2004 (Vol. 29) (No. 1) 47-58

Abstract: The influence of the size of homogeneous regions on the goodness of fit of ungauged river floods in Quebec, Canada, is studied by cross validation. Two initial regions, one homogeneous and the other potentially homogeneous, formed by 38 and 34 rivers were used. Homogeneous sub-regions of various sizes were randomly created to study the behaviour of the non-selected rivers, considered as ungauged for the purpose of this study. Results have shown that the size of the sub-regions has less impact on the χ² test results than the inherent quality of each river. In fact, the size of the sub-regions was inversely proportional to the variability, which means that a region of small size has a larger chance to lead to realization exceeding the χ² test critical value than a region of large size.
In spite of this finding, the influence of the size of the regions was small if one considers that for the worst case scenario (homogeneous sub-regions of five rivers), the percentage of failure of the χ² test was increased by only approximately 3%. However, the distribution of the regional L-moment ratios decreases with the size of the sub-regions. The selection of larger homogeneous regions thus allows a reduction in the variability of the estimation of regional T-year events.

How can cluster analysis be considered a method for constructing zones?
• Zone: some definitions
• Data from specific locations
• Classical cluster analysis methods
• Cluster methods with contiguity constraint

Characteristics of data sets we are considering
• Multiple variables are important
• Observed at a number of locations
• Are there homogeneous sub-regions?

Zone: bounded area with constant value of characteristic Y.
For zone k, $E(y_{ij}) = \mu_k$ for $y_{ij}$ in $R_k$

Classical clustering methods
One example
• calculate similarity measure for pairs of sites
• hierarchically group sites, e.g. including progressively less similar sites in clusters
k-means
• non-hierarchical
• sites added to a cluster if closest to that cluster centroid
Now, plot sites on map

• We do sample at discrete locations
• For the objective of finding similar locations
• Cluster analysis is a method for determining similar sites

Matrix of data
• First case: n rows corresponding to m variables measured at each of the n sites
• Profiles: n rows corresponding to a variable measured m times at each site
Cluster rows (sites) in each case

Determine stations with similar water quality characteristics as relevant to anthropogenic acidification
Possible to reduce the number of sites sampled?
Esterby, El-Shaarawi, Howell, Clair 1989

Constrained clustering: take contiguity relationships into account in some way (Gordon, 1980)
Example: order of observations in time is of importance
• Clustering of diatom profiles in lake sediment cores
• Motivation: inferred pH through regression of pH on index
To explain, look at individual profiles
Esterby 1988

• Cluster methods
• Other methods (e.g. fit a surface and obtain contour)
• Establishing relationships between variables
• E.g. application to predicting streamflow

Homogeneity over time and space in parameter estimation for data-driven models, i.e. relevance to data sets used with models
Trying to predict change by modelling processes, do we have evidence?
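To make the constrained clustering idea from the earlier slide concrete (contiguity taken into account, with the order of observations mattering), here is a minimal sketch. It is an assumption-laden illustration, not the method of Esterby 1988 or Gordon 1980: it uses agglomerative merging restricted to adjacent groups, with the increase in within-group sum of squares as the merge cost. The function names and the seven index values down a core are hypothetical.

```python
def sse(segment):
    """Within-group sum of squared deviations from the group mean, over all variables."""
    n = len(segment)
    total = 0.0
    for col in zip(*segment):
        mean = sum(col) / n
        total += sum((x - mean) ** 2 for x in col)
    return total

def constrained_clusters(profile, k):
    """Merge only neighbouring groups, cheapest merge first, until k groups remain.

    Returns (start, end) index pairs, so every cluster is a contiguous run of
    observations in the original order (depth or time).
    """
    groups = [[row] for row in profile]
    while len(groups) > k:
        # Cost of a merge = increase in within-group sum of squares (assumed criterion).
        costs = [sse(groups[i] + groups[i + 1]) - sse(groups[i]) - sse(groups[i + 1])
                 for i in range(len(groups) - 1)]
        i = costs.index(min(costs))
        groups[i:i + 2] = [groups[i] + groups[i + 1]]
    bounds, start = [], 0
    for g in groups:
        bounds.append((start, start + len(g) - 1))
        start += len(g)
    return bounds

# Hypothetical diatom index values at successive depths in one sediment core
profile = [[5.1], [5.0], [5.2], [6.4], [6.5], [6.3], [6.6]]
bounds = constrained_clusters(profile, k=2)
print(bounds)  # [(0, 2), (3, 6)]
```

Because only adjacent groups may merge, the result is a set of zones along the core rather than an arbitrary partition of the samples, which is the sense in which contiguity is built into the clustering.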