Center for Statistical Ecology and Environmental Statistics

Digital Governance and Hotspot Geoinformatics of Biodiversity Measurement, Comparison and Management in the Age of Indicators and Information Technology

By Ganapati P. Patil¹ and Wayne L. Myers²

¹ Center for Statistical Ecology and Environmental Statistics, Department of Statistics, Penn State University, University Park, PA 16802, USA
² School of Forest Resources and Office for Remote Sensing and Spatial Information Resources, and Penn State Institutes of Environment, Penn State University, University Park, PA 16802, USA

This material is based upon work supported by the United States National Science Foundation under Grant No. 0307010. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the agencies.

[Invited Paper for the ISI Platinum Jubilee Volume for International Biodiversity Conference Symposium]

Technical Report Number 2008-1121
TECHNICAL REPORTS AND REPRINTS SERIES
November 2008

Department of Statistics
The Pennsylvania State University
University Park, PA 16802

G. P. Patil
Distinguished Professor and Director
Tel: (814)865-9442 Fax: (814)865-1278
Email: gpp@stat.psu.edu
http://www.stat.psu.edu/~gpp
http://www.stat.psu.edu/hotspots

Abstract: Biodiversity measurement, comparison, and related hotspot geoinformatics are challenging issues and opportunities in the twenty-first century of statistical ecology, environmental statistics, risk analysis, knowledge discovery, decision making, and decision support in the age of ecological, environmental, and socio-economic indicators. This paper covers diversity measurement and comparison, diversity profiles, selection of biodiversity indicators for monitoring, etiology, early warning, and management, substantive and geographic considerations, etc. The paper also attempts to demonstrate that the societal biodiversity issues and concerns for knowledge discovery and decision making have led to interesting and innovative mathematical, statistical, computational, visualizational, and software developmental issues and approaches for decision support in the area of multi-scale advanced raster map analysis in this age of indicators and information technology of digital governance. A unique and innovative prototype district-level initiative in the triadic spirit of digital governance and hotspot geoinformatics for natural resources monitoring, etiology, early warning, and sustainable management is briefly introduced within the context of district-level linking of small rivers and streams as a vehicle for monsoon rainwater harvesting and management in the face of water scarcity, to help provide a shot in the arm to restore and enhance agriculture, biodiversity, nature conservation, drinking water, and eco-cultural community life.

1. Introduction, Background, and Motivation

It is a great delight for me (GP) to be invited to speak at the Platinum Jubilee of the Indian Statistical Institute in its important biodiversity component. It is with a sense of gratitude and affection that I am here celebrating the golden jubilee of my own time at the Indian Statistical Institute, in great company with C. R. Rao and J. Roy and subsequently with Professor Mahalanobis and Ranidi. It has been an honor to have received the only D.Sc. of Theoretical and Applied Statistics of the Institute so far. My wife and I have fond memories of our stay with Professor and Ranidi at Amrapalli in the Poet’s Room! It was a treat to be at the Silver Jubilee and at the Golden Jubilee of the Institute together with great stalwarts of the whole variety of disciplines and fields in which the Institute has been known to be involved worldwide.

Diversity measurement and comparison has been an important issue for a long time. These days biodiversity measurement, comparison, and related hotspot geoinformatics are challenging issues and opportunities in the twenty-first century of statistical ecology, environmental statistics, risk analysis, knowledge discovery, decision making, and decision support in the age of ecological, environmental, and socio-economic indicators. The challenges and opportunities multiply all the more in the present-day setting of information technology and digital governance. The whole issue is very exciting, and so are its individual parts. We will touch on these as we move along.

We will cover diversity measurement and comparison, diversity profiles, selection of biodiversity indicators for monitoring, etiology, early warning, and management, substantive and geospatial considerations, etc.
We will demonstrate that the societal biodiversity issues and concerns for knowledge discovery and decision making have led to interesting and innovative mathematical, statistical, computational, visualization, and software developmental issues and approaches for decision support in the area of multi-scale advanced raster map analysis in this age of indicators and information technology of digital governance. Toward the end, we will share a unique and innovative prototype district-level initiative of linking small rivers and streams as a vehicle for monsoon rainwater harvesting and management in the face of water scarcity, to help restore and enhance agriculture, biodiversity, nature conservation, drinking water, and diverse eco-cultural community development.

2. Ecological Diversity as a Motivating Example

2.1 Quantification for Ecological Diversity

Conservation biology, landscape ecology, and ecosystem-oriented natural resources management lend considerable urgency to issues and approaches concerning biodiversity assessment. Most of the traditional approaches and statistical tools are plot-based with a goal of definitive characterization. Diversity, however, is relative to a spatial scale, temporal scale, and taxocene spectrum. Patterns may be more informative than absolutes in this regard.

The issues are fundamental in that explaining the effects of environment on the distribution and abundance of species is the essence of much ecological work. The controversies arise from the intrinsic scientific importance of diversity theories, as well as from the broad economic and social ramifications of considering biodiversity in land use decisions. At the heart of the scientific and social controversies regarding diversity are problems of quantification, interpretation, and analysis.

The classical view of diversity remains important for intensive studies of particular ecological communities and forest stands (Gove et al., 1994).
However, the emerging sciences of landscape ecology and conservation biology have made evident the logistical and economic impracticality of such intensive observational coverage for regions on the order of square kilometers and larger (Scott et al., 1989). These spatial scales are necessarily encompassed by contemporary ecosystem-oriented resource management and design of regional/national networks of biodiversity reserves. Furthermore, species/area and minimum viable population issues become fundamental in these matters.

The multidimensional character of diversity can be revealed by establishing an intrinsic, index-free diversity ordering. In effect, diversity may appear to have decreased when viewed from one vantage point (i.e., index), and increased when viewed from a different perspective. In view of the inadequacy of a single index, Patil and Taillie (1979, 1982) quantify diversity by means of diversity profiles. A diversity profile is a curve depicting the simultaneous values of a large collection of diversity indices. Thus, the profile portrays the views of diversity from many different vantage points simultaneously and in a single picture.

Differences in community diversity are studied by comparing profiles. If the two communities are intrinsically comparable, then one profile will lie uniformly above the other. Conversely, when the communities are not intrinsically comparable, their profiles may intersect. But even here, the profiles can reveal which portions of the community have undergone opposing diversity changes.

2.2 Indicators for Ecological Diversity

We proceed to consider ways of coping with complexity and confounding that embrace multiple indicators rather than agonizing over choices and conflicts of diversity measures. We contemplate enlarging the orders of indicators to encompass some interactions in a formal manner that accommodates both parameterization and visualization.
We conclude by noting the convergence of biodiversity and ecological community concepts at meter scales, but not for broader landscape, regional, and global scales of ecological organization.

The indefiniteness regarding biodiversity that can give rise to frustration is well expressed by L. R. Taylor (1978) in the following quote:

"Diversity so pervades every aspect of biology that each author may safely interpret the word as he wishes and there is consequently no central theme to the subject. We cannot be sure if this flexibility is healthy or due to lack of discipline, but it can be traced back to the beginnings of interest in biological diversity …"

The recent programs of the U.S. National Science Foundation probing biocomplexity in many contexts provide evidence that the flexibility addressed by Taylor is both healthy and indicative of a need for strengthening discipline with regard to scientific constructs and the means by which they are made operational.

Indicators/expressions of this nature are appropriate, and therefore of value, if they convey the desired information within the budget and delivery delays that are acceptable. One method is more efficient than another if it conveys the required information either more rapidly or at less cost. Conveying more information at the same cost and timing is not necessarily desirable if unwanted information has to be processed or filtered.

Increasingly sophisticated management, intervention, remediation, and regulation require a continuing flow of multiple indicators for various aspects of ecosystems. What ecosystem managers and regulators seek is a complementary set of indicators that captures aspects of interest. We thus have entered the exciting age of indicators for ecological diversity and biodiversity.

3. Biodiversity Measurement and Comparison

3.1 Biodiversity with Presence/Absence Data

Biodiversity is perhaps best revealed by a species list.
Biodiversity may evade specific definition, but there is very strong consensus that the current loss of species, along with the subsequent loss of genetic diversity, is unacceptable if we are to maintain a healthy ecosystem. Such a concern pertains to ecosystems at many spatial scales, whether a state park of 10 km², a whole state, a nation, or the entire globe. Indeed, environmental concerns have traditionally been more localized; however, contemporary issues like global warming, ozone depletion, and biodiversity loss are very large-scale concerns.

Large-scale monitoring for biodiversity assessment typically allows for only a species list to be acquired in an area of concern. There is simply too much ground to cover for estimating relative abundances as well. If the species list is acquired from a sampled sub-area, then how do we estimate the total number of species, known as species richness, for the larger area of concern? We cannot simply estimate the average number of species per unit area and multiply by the whole area. If one sample unit has 3 species and another has 9, averaging to (9+3)/2 = 6 species per unit and scaling up by area is not meaningful: some species may be present in both units, so that 3 species plus 9 species may total fewer than 12 species. Biodiversity as species richness is determined by what becomes of $s(2) = 1+1$, $s(3) = 1+1+1$, …, $s(n) = 1+1+\cdots+1$ with $n$ summands for $n$ investigators or $n$ individuals.

An approach to this problem of estimating the total of a non-additive variable is to apply the concept of a species-area curve. The number of species increases with increasing area sampled in a non-linear manner, rising rapidly at first, then reaching a point of diminishing returns. The challenge is then to maximally accelerate the empirical species-area curve so that the point of diminishing returns is achieved in as small an area as possible.
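To make the non-additivity of species richness concrete, the following minimal Python sketch pools hypothetical species lists from three sample units and records the cumulative richness, which is the empirical species accumulation (species-area) curve for that visiting order. The species lists and the function name `accumulation_curve` are illustrative assumptions, not data or code from the study.

```python
# Hypothetical species lists for three sample units (illustrative only).
units = [
    {"Pinus strobus", "Quercus rubra", "Acer rubrum"},
    {"Quercus rubra", "Acer rubrum", "Tsuga canadensis", "Betula papyrifera"},
    {"Acer rubrum", "Betula papyrifera"},
]

def accumulation_curve(sample_units):
    """Cumulative species richness as units are pooled in the given order."""
    pooled = set()
    curve = []
    for species in sample_units:
        pooled |= species           # set union: a shared species is counted once
        curve.append(len(pooled))
    return curve

# Adding per-unit counts overstates richness whenever units share species.
naive_total = sum(len(u) for u in units)
curve = accumulation_curve(units)

print("sum of per-unit counts:", naive_total)
print("cumulative richness   :", curve)
```

Reordering `units` changes how quickly the curve rises, which is the sense in which a well-chosen sampling order can accelerate the empirical curve.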
Knowledge of habitat may help to achieve this sampling objective by providing covariate information that helps us to direct which sample units to measure.

3.2 Biodiversity with Relative Abundance Data

3.2.1 Am I a Specialist or a Generalist?

The degree of specialization/diversification has to be relative to the categories identified.

3.2.2 Resource Apportionment

Resource may take the form of time, energy, biomass, abundance, etc. Degree of specialization/diversification does not depend on the identity of the categories. It is permutation-invariant.

3.2.3 Diversity as Average Species Rarity

Let $C = (s, \pi) = (\pi) = (\pi_1, \pi_2, \ldots, \pi_s)$ be an ecological community of $s$ species with relative abundance vector $\pi$. Let $R(i; \pi)$ be the rarity measure of the $i$th species in the community with relative abundance vector $\pi$. Diversity of the community $\pi$ is then measured by its average species rarity, given by

$$\Delta(\pi) = \sum_{i=1}^{s} \pi_i \, R(i; \pi).$$

Several of the most frequently used diversity indices may be conveniently expressed under the umbrella of average species rarity through judicious choice of rarity functions. Species richness, species count, Shannon's, and Simpson's indices all may be derived from this theory as follows:

$$\Delta_{SR} = \sum_{i=1}^{s} \left(\frac{1}{\pi_i}\right) \pi_i = s \qquad \text{species richness,} \tag{1}$$

$$\Delta_{SC} = \sum_{i=1}^{s} \left(\frac{1}{\pi_i} - 1\right) \pi_i = s - 1 \qquad \text{species count,} \tag{2}$$

$$\Delta_{Sh} = \sum_{i=1}^{s} \left(-\log \pi_i\right) \pi_i = -\sum_{i=1}^{s} \pi_i \log \pi_i \qquad \text{Shannon,} \tag{3}$$

$$\Delta_{Si} = \sum_{i=1}^{s} \left(1 - \pi_i\right) \pi_i = 1 - \sum_{i=1}^{s} \pi_i^2 \qquad \text{Simpson,} \tag{4}$$

where the term in parentheses denotes the species rarity function used in each case.

Table 1 presents a hypothetical example of three forest stands composed of just five or fewer species of trees. The relative abundances of these tree species based on some quantitative measure of abundance are given, and the diversity indices (1) through (4) calculated from these relative abundances also are shown for each community.
The example clearly shows the inconsistency of the different indices in their ranking of these three communities. For example, $\Delta_{SC}$(Stand 1) > $\Delta_{SC}$(Stand 2), but $\Delta_{Sh}$(Stand 1) < $\Delta_{Sh}$(Stand 2) and $\Delta_{Si}$(Stand 1) < $\Delta_{Si}$(Stand 2). This is an interesting comparison because it illustrates how one may be led to the conclusion that a community with fewer species (Stand 2) can be more diverse than one with more species (Stand 1) using either Shannon's or Simpson's index. Similar inconsistencies among the indices may be found by comparing Stands 1 and 3. The only comparison that is consistently ordered with all indices is $\Delta$(Stand 2) > $\Delta$(Stand 3). This inconsistency of different diversity indices evidently is quite common when making comparisons between communities and arises from a lack of intrinsic diversity ordering between the communities being compared (see the following section).

Table 1: Three hypothetical forest stand communities composed of five or fewer species of trees.

Species              Stand 1   Stand 2   Stand 3
Relative abundance
Pinus strobus        0.50      0.25      0.35
Quercus rubra        0.30      0.25      0.35
Tsuga canadensis     0.10      0.25      0.30
Acer rubrum          0.05      0.25      0.00
Betula papyrifera    0.05      0.00      0.00
Total:               1.00      1.00      1.00
Diversity index
$\Delta_{SR}$        5         4         3
$\Delta_{SC}$        4         3         2
$\Delta_{Sh}$        1.24      1.39      1.10
$\Delta_{Si}$        0.65      0.75      0.67

3.3 Diversity Profiles

Diversity profiles allow the graphical comparison of diversity between communities. One set of profiles that incorporates indices (2) through (4) as point estimates along the curve is the so-called $\Delta_\beta$ profiles of Patil and Taillie. Since the $\Delta_\beta$ profile incorporates indices developed from dichotomous-type rarity measures, it too may be developed in the same manner:

$$\Delta_\beta = \sum_{i=1}^{s} \frac{\left(1 - \pi_i^{\beta}\right)}{\beta} \, \pi_i = \frac{1 - \sum_{i=1}^{s} \pi_i^{\beta+1}}{\beta}, \qquad \beta \ge -1.$$

The restriction that $\beta \ge -1$ assures that $\Delta_\beta$ has certain desirable properties. The species count, Shannon, and Simpson indices are related to $\Delta_\beta$ by $\Delta_{SC} = \Delta_{-1}$, $\Delta_{Sh} = \Delta_{0}$ (taken as the limit $\beta \to 0$), and $\Delta_{Si} = \Delta_{1}$.
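As a check on the formulas above, the following Python sketch evaluates $\Delta_\beta$ for the three stands of Table 1 and locates where two profiles change order. It is a minimal illustration under stated assumptions: the function names `delta_beta` and `crossing` are hypothetical, zero abundances are treated as absent species, the value at $\beta = 0$ is taken as the Shannon limit, and plain bisection is a stand-in choice, not an algorithm prescribed by the authors.

```python
import math

# Relative abundances from Table 1 (a zero entry means the species is absent).
stands = {
    "Stand 1": [0.50, 0.30, 0.10, 0.05, 0.05],
    "Stand 2": [0.25, 0.25, 0.25, 0.25, 0.00],
    "Stand 3": [0.35, 0.35, 0.30, 0.00, 0.00],
}

def delta_beta(pi, beta):
    """Delta_beta = (1 - sum_i pi_i^(beta+1)) / beta, with the Shannon limit at beta = 0."""
    if beta < -1:
        raise ValueError("beta must be >= -1")
    present = [x for x in pi if x > 0]       # absent species contribute nothing
    if abs(beta) < 1e-12:
        return -sum(x * math.log(x) for x in present)
    return (1.0 - sum(x ** (beta + 1) for x in present)) / beta

def crossing(pi_a, pi_b, lo, hi, tol=1e-6):
    """Bisection for a beta in [lo, hi] where two profiles meet (requires a sign change)."""
    def diff(b):
        return delta_beta(pi_a, b) - delta_beta(pi_b, b)
    if diff(lo) * diff(hi) > 0:
        raise ValueError("profiles do not change order on this interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if diff(lo) * diff(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

for name, pi in stands.items():
    sc, sh, si = (delta_beta(pi, b) for b in (-1, 0, 1))
    print(f"{name}: SC={sc:.0f}  Sh={sh:.3f}  Si={si:.3f}")

print("Stands 1 and 2 cross near beta =",
      round(crossing(stands["Stand 1"], stands["Stand 2"], -1.0, 0.0), 3))
print("Stands 1 and 3 cross near beta =",
      round(crossing(stands["Stand 1"], stands["Stand 3"], 0.0, 1.0), 3))
```

Bisection only finds one crossing inside an interval whose endpoints bracket a change of order, so profiles that cross more than once would need a scan over several sub-intervals.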
The $\Delta_\beta$ diversity profiles for the three stands in Table 1 are presented in Figure 1. Note that the profile for Stand 1 crosses both profiles for Stands 2 and 3. The profile for Stand 1 crosses that of Stand 2 at $\beta = -0.45$, which explains why both $\Delta_{Sh}$ and $\Delta_{Si}$ rank diversity of these two communities differently from $\Delta_{SC}$. On the other hand, the profiles for Stands 1 and 3 cross at $\beta = 0.62$, showing how $\Delta_{SC}$ and $\Delta_{Sh}$ rank these two communities differently from $\Delta_{Si}$. In general, it also is possible for two $\Delta_\beta$ profiles to cross at $\beta > 1$ or for them to cross more than once; in either case, even calculating all three indices ($\Delta_{SC}$, $\Delta_{Sh}$, and $\Delta_{Si}$) may not be enough to show the inconsistent ranking of communities at larger $\beta$. Calculating and plotting $\Delta_\beta$ profiles for $\beta > 1$ may not be helpful either, because the profiles tend to converge quickly beyond this point and intersections do not resolve; an algorithm for numerically finding the intersections of any two $\Delta_\beta$ profiles is required in this case.

Figure 1: $\Delta_\beta$ profiles for the three hypothetical forest stand communities in Table 1.

Perhaps the most useful way to compare diversity between communities is by the concept of intrinsic diversity ordering. This concept may be defined as follows: Community $C'$ is intrinsically more diverse than community $C$ (written $C' \overset{\mathrm{imd}}{>} C$) provided $C$ leads to $C'$ by a finite sequence of

1. introducing a species,
2. transferring abundance from more to less abundant species without reversing the rank-order of the species, and
3. relabeling species (i.e., permuting the components of the abundance vector).

Note that this ordering is only partial and two given communities need not be intrinsically comparable.

A diversity profile approach has been developed by Patil and Taillie using a rank-type rarity measure on $\pi^{\#}$ that incorporates the concepts of intrinsic diversity ordering defined above. Let

$$R(i) = \begin{cases} 1 & \text{if } i > j, \\ 0 & \text{if } i \le j, \end{cases} \qquad \text{for } 1 \le j \le s.$$
Then average species rarity is given as

$$T_j = \sum_{i=j+1}^{s} \pi_i^{\#}, \qquad j = 1, \ldots, s-1, \tag{7}$$

where $T_s = 0$ and $T_0 = 1$. The quantity in (7) is termed the right tail-sum of the ranked relative abundance vector $\pi^{\#}$, and when a plot of the $(j, T_j)$ pairs is constructed for each community, the resulting profiles are termed intrinsic diversity profiles. Any intrinsic orderings of the communities, if they exist, can be determined with the intrinsic diversity ($T_j$) profiles.

The right tail-sum profiles for the three stands in Table 1 are plotted in Figure 2. Notice that the profile for Stand 1 crosses both those for Stands 2 and 3, but that the profile for Stand 2 is everywhere above that for Stand 3. It follows that the only intrinsic diversity ordering for these stands is $C(\text{Stand 2}) \overset{\mathrm{imd}}{>} C(\text{Stand 3})$. This is consistent with the findings of the indices in the section on Average Species Rarity and the $\Delta_\beta$ profiles. The $\Delta_\beta$ profiles are isotonic to intrinsic diversity ordering in that, if an intrinsic diversity ordering exists, they will preserve it. However, the $\Delta_\beta$ profiles may not cross even if the $T_j$ profiles do; therefore, the $\Delta_\beta$ profiles do not necessarily reflect intrinsic diversity ordering. Since the diversity indices discussed have the same properties as the $\Delta_\beta$ profiles, it should be emphasized that, of the methods presented thus far, the $T_j$ profiles are the most reliable measure of intrinsic diversity ordering between communities.

Figure 2: Right tail-sum ($T_j$) profiles for the three hypothetical forest stand communities.

4. Exploring Patterns of Habitat Diversity Across Landscapes Using Partial Ordering

Different aspects of diversity for a particular complement of biotic elements in a locale present copious mathematical and even conceptual challenges, as considered to this point.
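As a compact recap of the intrinsic diversity comparison considered to this point, the right tail-sums of Section 3.3 can be sketched in a few lines of Python for the Table 1 stands. The function names `tail_sums` and `intrinsically_more_diverse` are hypothetical, and the sketch assumes that one community is intrinsically more diverse than another when its tail-sum profile is nowhere below and somewhere above the other's; a small tolerance guards against floating-point noise.

```python
# Relative abundances from Table 1 (a zero entry means the species is absent).
stands = {
    "Stand 1": [0.50, 0.30, 0.10, 0.05, 0.05],
    "Stand 2": [0.25, 0.25, 0.25, 0.25, 0.00],
    "Stand 3": [0.35, 0.35, 0.30, 0.00, 0.00],
}

def tail_sums(pi):
    """Right tail-sums T_0, ..., T_s of the ranked (descending) abundance vector."""
    ranked = sorted((x for x in pi if x > 0), reverse=True)
    # T_j sums the ranked abundances from position j+1 onward, so T_0 = 1 and T_s = 0.
    return [sum(ranked[j:]) for j in range(len(ranked) + 1)]

def intrinsically_more_diverse(pi_a, pi_b, eps=1e-12):
    """True if the T_j profile of a is nowhere below and somewhere above that of b."""
    ta, tb = tail_sums(pi_a), tail_sums(pi_b)
    n = max(len(ta), len(tb))
    ta = ta + [0.0] * (n - len(ta))      # T_j stays at 0 past the last species
    tb = tb + [0.0] * (n - len(tb))
    never_below = all(a >= b - eps for a, b in zip(ta, tb))
    somewhere_above = any(a > b + eps for a, b in zip(ta, tb))
    return never_below and somewhere_above

for a in stands:
    for b in stands:
        if a != b and intrinsically_more_diverse(stands[a], stands[b]):
            print(f"{a} is intrinsically more diverse than {b}")
```

Pairs for which neither direction holds have crossing tail-sum profiles and are not intrinsically comparable.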
However, the problematic nature of capturing diversity in nature compounds rapidly when one extends the consideration to real-world contexts of natural complexity and human influence at landscape and regional levels of geographical scope. Organisms do not naturally occur in cages, aquariums, and other such laboratory vessels. They occupy environmental contexts that we usually call habitats. Habitat is itself indefinite in the abstract, implying that a particular organism is capable of sustained occupancy in that context. If the organism in question is not actually known to be present, then habitat is a hypothesis. If the organism is present and sustaining occupancy, then its environmental context is habitat. Extended study of the circumstances in which an organism sustains occupancy leads to a habitat model. If a locational context matches a habitat model but lacks occupancy, it does not follow that the habitat model is erroneous since individuals of the organism may not have found their way into (colonized) the locale in its current condition within current lifespan for individuals. We do know from collective scientific experience that there is some specificity among some kinds of organisms in their habitat requirements. Therefore, it is not expected that certain kinds of diversity will co-exist in a region unless there are adequate expanses of the respective types of habitat in some sort of spatial mosaic. Thus, habitat diversity is requisite for having certain types of biodiversity. What constitutes an adequate expanse is another issue, both in aggregate and as spatial instances. Partitioning a region into smaller and smaller instances of different types of habitats will increase the spatial diversity of habitat, but excessively small instances create fragmentation which is detrimental to sustaining biodiversity. 
Also, the greater the distance between spatial instances of habitat the more perilous becomes the dispersal between instances to replenish occupancy under incidents of attrition. Thus, habitat diversity in both different kinds of habitat and complexity of spatial arrangement can be seen as increasing biodiversity; but only up to some point of diminishing returns. Thus, indicators of a particular aspect of biodiversity are not necessarily monotone increasing at a landscape/regional spatial scope. Some different kinds of organisms have varying degrees of similarity in habitat requirements. When there is sufficient similarity that suites of different organisms are typically found jointly occupying an environmental context, then such a suite can be considered a community of organisms. Thus, habitat diversity is generally consonant with community diversity. Joint occupancy may or may not be individually and/or collectively beneficial for a particular kind of organism, and these relationships may or may not be reciprocal. Thus, one kind of organism may prey upon another, but in so doing may prevent the prey from over-exploiting resources that it requires to the collective detriment. If all spatial instances of habitat are accessible to all individuals over a chain of generations, then adaptive diversification is largely retarded by extensive genetic intermixing so that speciation to produce substantively different (new) kinds of organisms does not proceed rapidly. If spatial instances of habitat are both small and inaccessible, on the other hand, then the (meta)population in that locale becomes more subject to catastrophic extermination without replenishment. The consequence of all of these biological, spatial and temporal interactions compounded by human influence is that biodiversity extended to ecological diversity is anything but simple. 
It must therefore be approached from a multidimensional perspective of pattern and complementary indicators, and any consideration will always be partial in some sense. Consideration of biological/ecological diversity having meaningful implications will always entail perspectives and priorities. It is also quite possible (probable) for different perspectives to be conflicting in various respects.

4.1 Partial Prioritization with Multiple Indicators

Beyond the academic, we must speak of particular places and consider specific sorts of diversity under definitive priorities for protection, remediation or other relevant regards. In such situations, we will often not be in a position to expend equal effort on everything. It will thus be in order to identify instances that are particularly poor or problematic and instances that are particularly positive or prime. The intermediary instances will be of less immediate interest since they will not support special attention for either remediation or retention. The intermediary instances are the great middle ground where multiple use interests of humanity are served under a complex of considerations and land tenure.

We must unambiguously delineate the spatial instances at some selected scale and characterize them in terms of a suite of indicators that cover the concerns on at least an ordinal basis. We can then proceed to prioritize at both the poor and prime poles. Spatial proximity or adjacency among the instances of interest may or may not be a collateral concern. The special (spatial) instances at either end of the evaluations will be considered here as salient sets in the sense that they stand out in a salient manner relative to the intermediaries.

To explicate the partial prioritization process and protocols we use data from a biodiversity assessment that was conducted in the state of Pennsylvania in the northeastern USA (Myers et al., 2000), whereby the state was divided at a first level into 635 km² hexagonal cells.
Pennsylvania has New York State on its northern border, Lake Erie at its northwest corner, and Delaware/Chesapeake Bay of the Atlantic Ocean at its southeast corner. Figure A depicts the physiographic character of Pennsylvania by hill-shading, and Figure B shows hexagons (with their identification numbers) covering what is called the Ridge and Valley physiographic region of Pennsylvania, containing the Appalachian Mountains as remnant flanks of major geological folds after eons of erosion have excavated the fractured tops of the folds. Pennsylvania has a moist temperate climatic regime with the natural vegetation cover being predominantly forested.

Figure A. Hill-shading depiction of the physiographic characteristics of Pennsylvania.

Figure B. Hexagonal zones in Ridge and Valley Region of Pennsylvania.

Several indicators of biodiversity were evaluated for each hexagon during the GAP Analysis biodiversity assessment program. We select five of these indicators for present purposes. Four of these are considered to be positive indicators: 1) number of bird species expected to have viable breeding populations; 2) number of mammal species expected to have viable breeding populations; 3) variability of elevation; and 4) percent of forest cover. A fifth indicator has the negative sense of extensive disturbance, being the percent of the hexagon in one non-forest extent when forest patches covering less than one square kilometer are ignored. The four positive indicators are listed by hexagon in Table A.

The purpose here is to find an area consisting of several adjacent hexagons that is prime relative to the suite of indicators without collapsing indicator dimensions in some composite manner. We thus wish to honor all of the dimensions of indications simultaneously. We use the negative indicator of openness as a coupling criterion here. Among the available adjacencies, we intend to give preference where the candidate pairs do not exhibit a high degree of openness.
The prime area is to be assembled progressively by first finding salience relative to the four positive indicators, and then examining neighbors according to both positive indicators and avoidance of openness. The necessary computational capabilities have been configured as modules in the nonproprietary R statistical software system (Venables & Smith, 2004).

The logic of the prioritization process rests on dual ordination of subsets segregated based on both domination and subordination in terms of the mathematics of partial ordering (Patil & Taillie, 2004). One unit or case of interest (hexagon) dominates another if it is at least as good on all indicators and better on at least one indicator (Myers et al., 2006). Conversely, one unit (case) is subordinate to another if it is no better on any indicator and is worse on at least one indicator. While the domination and subordination views are related, they are not equivalent inasmuch as they do not necessarily extract identical subsets.

Subsets of domination (or lack thereof) are segregated recursively in terms of numbered levels. For level number 1 the subset consists of all units of interest (cases) that are not dominated by any other unit. Domination level 2 is obtained by removing units in the level 1 subset from consideration, and then segregating all units among the remainder that are not dominated by any other units among the remainder. Subsets for subsequent levels are extracted likewise until there are no dominations among the remainder. Since remaining units for successive levels are increasingly dominated by members of prior subsets, increasing level numbers for domination reflect increasing inferiority in the sense of greater consensus on inferiority across indicators. Thus, a level 1 unit might be undominated by virtue of having a maximal value on one indicator, but still have a relatively poor value on one or more other indicators.
However, higher level numbers are increasingly characterized by lack of good values on any indicators. It can thereby be seen that the process gives equal voice to all indicators without formulating a composite metric in any manner.

Table A. Conservation characteristics for hexagonal zones in the Ridge and Valley Region for Pennsylvania prototype (from RVhexs.txt file).

ZoneNum  BirdSp  MamlSp  TopoVarI  PctForst
2409     130     45      89        80.8
2410     128     43      105       85.4
2529     133     45      103       74.3
2530     123     45      83        82.5
2649     127     47      65        66.0
2650     120     46      56        69.8
2651     121     46      62        62.5
2652     129     45      70        80.1
2771     135     47      81        48.9
2772     130     46      54        47.5
2773     122     46      101       76.6
2774     126     46      80        77.1
2894     126     47      114       89.4
2895     135     47      130       84.7
2896     123     49      114       75.4
2897     129     48      114       83.2
3019     133     53      102       59.3
3020     136     51      117       68.7
3021     122     46      110       89.8
3022     128     47      123       87.0
3023     118     46      110       74.4
3145     131     50      94        67.4
3146     128     48      103       68.8
3147     129     48      94        69.7
3148     120     47      92        78.1

Subsets for levels of subordination unfold in a parallel manner, but with an opposite sense of quality. The level one subset consists of units (cases) that do not have any other units that are clearly inferior in the sense of being no better on any indicator and being worse on at least one indicator. Level two applies the same criterion to the remainder after removing level 1 from consideration. Thus increasing level of subordination reflects more and more units in lower levels that are clearly inferior according to the subordination view. Thus increasing level number for subordination can be interpreted as increasing consensus on superiority.

A joint view breaks the subsets from separate views down into subsets of units (cases) having a particular level of domination coupled with a particular level of subordination. We refer to these jointly classified sets as being salience sets (Myers & Patil, 2008).
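The level-peeling logic just described can be sketched as follows for the Table A data. This is a minimal Python illustration, not the authors' R modules: the names `dominates` and `peel_levels` are hypothetical, the comparisons are made on the raw indicator values rather than on ranks (which gives the same dominance relations), and all four indicators are assumed to be "larger is better".

```python
# Table A: zone -> (BirdSp, MamlSp, TopoVarI, PctForst); larger is taken as better.
hexagons = {
    2409: (130, 45, 89, 80.8),  2410: (128, 43, 105, 85.4),
    2529: (133, 45, 103, 74.3), 2530: (123, 45, 83, 82.5),
    2649: (127, 47, 65, 66.0),  2650: (120, 46, 56, 69.8),
    2651: (121, 46, 62, 62.5),  2652: (129, 45, 70, 80.1),
    2771: (135, 47, 81, 48.9),  2772: (130, 46, 54, 47.5),
    2773: (122, 46, 101, 76.6), 2774: (126, 46, 80, 77.1),
    2894: (126, 47, 114, 89.4), 2895: (135, 47, 130, 84.7),
    2896: (123, 49, 114, 75.4), 2897: (129, 48, 114, 83.2),
    3019: (133, 53, 102, 59.3), 3020: (136, 51, 117, 68.7),
    3021: (122, 46, 110, 89.8), 3022: (128, 47, 123, 87.0),
    3023: (118, 46, 110, 74.4), 3145: (131, 50, 94, 67.4),
    3146: (128, 48, 103, 68.8), 3147: (129, 48, 94, 69.7),
    3148: (120, 47, 92, 78.1),
}

def dominates(a, b):
    """a is at least as good as b on every indicator and better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def peel_levels(data, beats):
    """Level 1 = units that no remaining unit 'beats'; remove them and repeat."""
    levels, remaining, level = {}, dict(data), 1
    while remaining:
        front = [k for k, v in remaining.items()
                 if not any(beats(w, v) for j, w in remaining.items() if j != k)]
        for k in front:
            levels[k] = level
            del remaining[k]
        level += 1
    return levels

# Domination view: level 1 holds the units that nothing dominates.
dominance = peel_levels(hexagons, dominates)
# Subordination view: level 1 holds the units that dominate nothing.
subordinance = peel_levels(hexagons, lambda w, v: dominates(v, w))

# Salience set membership: (domination level, subordination level) per hexagon.
salience = {k: (dominance[k], subordinance[k]) for k in hexagons}
best_sub = max(s for d, s in salience.values() if d == 1)
anchors = sorted(k for k, ds in salience.items() if ds == (1, best_sub))
print("candidate anchors:", anchors)
```

Each pass scans all remaining pairs, which is adequate for a few dozen hexagons; larger problems would call for a more economical partial-order routine.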
A plot is then generated of instances of salience sets with domination level on the abscissa (X- or horizontal axis) and subordination level on the ordinate (Y- or vertical axis). Instances plotting in the upper-left are preferred by virtue of high superiority and low inferiority, whereas instances in the lower-right exhibit low superiority and high inferiority. Instances plotting on an upper-left to lower-right diagonal are consistent with regard to the two views. Greater departure from this diagonal implies that the messages conveyed by the indicators are more mixed. Where primary interest lies in the best of the best or the worst of the worst, such a graphic is highly appropriate for making selections.

Since the logic of prioritization according to salience rests only on the orderings or rankings of the cases on the respective indicators, this is a non-parametric approach involving only sorting and ordering. Accordingly, it is expedient, and lends clarity to the subsequent aspects of the process, to convert the data on indicators to ranks and then do the comparative computations on the ranks.

The first stage of the process is to determine preferred patches to anchor the building of the network consisting of patches and patch pairings. This initial determination is based entirely on the data in Table A without regard to the data on pairings. The resulting salience plot is presented in Figure C, and the salience ratings used to make the plot are listed in Table B. From the plot of salience in Figure C it can be seen that preference should be given to patches (hexagons) having subordinance level 4 and dominance level 1 as initial anchors for the prospective network. Accordingly, it can be seen in Table B that the hexagons having these ratings are 2897 and 3020, which also happen to be neighbors. Thus the initial core for the prospective network is as shown in Figure D.

Figure C.
Graph of salience sets for prioritization of initial hexagons.

Table B. Membership of hexagons in salience sets.

Hexagon  Dominance  Subordinance
2409     2          2
2410     2          1
2529     2          1
2530     2          1
2649     3          2
2650     3          1
2651     4          1
2652     3          1
2771     2          2
2772     3          1
2773     2          2
2774     2          2
2894     1          3
2895     1          3
2896     1          2
2897     1          4
3019     1          2
3020     1          4
3021     1          3
3022     1          3
3023     2          1
3145     2          3
3146     2          3
3147     2          3

Figure D. Initial units according to prioritization by salience.

Having obtained the initial units by the prioritization protocol, we can proceed to a first-stage expansion through suitable modification for selection of pairings. The major modification consists of allowing only those hexagons bordering 2897 or 3020 to become candidates. With considerably fewer candidate linkages than hexagons, there are only two levels on the salience axes, as given in Table C. However, there is still a definite segregation of preference for linking hexagons 2896 and 3021 into the network, with both of these linkages connecting to 2897. Thus the first-stage expansion map is as depicted in Figure E. Subsequent expansion of the network is a matter of repeating this scenario until the hexagons that would be linked into the network fall into the less desirable positions of Figure C. Of the two elements just added, both are in the most favorable column with regard to dominance. Unit 3021 is in the second best position with respect to subordination, and 2896 is one step below that.

Table C. Membership of first-stage linkages in salience sets.

Hexagon  Dominance  Subordinance
2774     1          1
3019     1          1
2896     2          1
2896     1          2
3021     1          2
3145     1          1
3021     2          1

Figure E. Map of developing network after first-stage expansion.

4.2 Pattern Extraction

Pattern extraction from multivariate environmental information is notably important in our biodiversity work from two standpoints. One of these pertains to the pursuit of prioritization with multiple indicators as above.
The computations involved in partial ordering are highly iterative and recursive, which can lead to combinatorial constraints on practicality as the number of cases (instances) increases from a few hundred to thousands. One way of coping with these computational challenges is to use multivariate pattern extraction techniques to obtain collectives of cases that have similar patterns of indicator values. The collectives can then be treated comparatively for salience in terms of central values of indicators for the cases comprising a collective. Cases comprising the salient collectives can then be further prioritized among individuals in a multi-stage modality. Pattern extraction, in the form of strategically chosen clustering techniques, is thus used as a kind of data compression. In so doing it is essential that emphasis be placed on obtaining a high degree of homogeneity within clusters as opposed to obtaining fewer clusters that have large intercluster differences.

The other major role for pattern extraction is in its signal processing sense for provisional partitioning of landscapes into mosaics from remotely sensed multispectral data. Placement of spatial instances of the spectral patterns on the landscape provides a point of departure for comparative landscape ecological investigation of habitat diversity. The image data are thus progenitors for pattern maps of the landscape, with patterns providing a further framework for analysis of attributes at multiple scales. Without such a quasi-intelligent spatial organization of the landscape, it becomes quite difficult to conduct synoptic investigations and to detect secondary patterns of change in the landscape over time. Landscape change dynamics determine sustainability of ecological diversity. Our poly-pattern approach to landscapes through remote sensing (Figure F) provides an adaptable image model that serves purposes well beyond the initial images.

Figure F.
Pattern-based view of a landscape in Jalgaon, India from remote sensing.

The pattern picture of the landscape in Figure F could not be rendered in this manner by standard remote sensing techniques such as color infrared (CIR) composites. It draws information from all bands, not just three bands, giving contrasts and coloration not otherwise possible. Coloring healthy vegetation green is much more easily understood by the public than a reddish rendering in CIR. In pattern mode, each pattern can be rendered selectively without affecting the rendering of other patterns. This is more informative and gives greater distinctions among landscape elements than classical remote sensing methods of enhancement. Furthermore, the areas occupied by each individual color casting can be easily extracted since this is a map of image information rather than a 3-band composite. Pattern-Based Compression of Multi-Band Image Data for Landscape Analysis (Myers & Patil, 2006, Springer) offers a guidebook for landscape pattern perspectives from remote sensing.

5. Digital Governance and Hotspot Geoinformatics with Forest Cover Data

5.1 Introduction

The impact of modern communication and information technologies on society cannot be overstated. Its recognition is reflected in the emergence of digital governance worldwide. The purpose of digital governance, stated variously, is to empower the public with information access and analysis to enable transparency, accuracy, and efficiency for the good of society at large. In this context, the development and application of methodologies for geoinformatic hotspot analysis of spatial and temporal data are of utmost importance. Patil and Taillie (2004a) proposed the upper level set (ULS) scan statistic. Patil et al. (2008a) report a software implementation of the ULS scan statistic.
The ULS scan statistic and its software implementation differ from the widely used SaTScan system in three main respects:

• The ULS scan statistic uses an irregularly shaped scanning window, unlike the circular window used by SaTScan.
• The ULS scan statistic can be used to detect hotspots in any structure with a network topology, whereas SaTScan is applicable to geospatial regions only.
• The software provides the option of using the gamma distribution to model continuous response data, in addition to the binomial and Poisson models.

The second item in the above list is significant in view of the wide interest in hotspots in network settings such as sensor networks [Patil et al. (2008b)].

In addition to responses that can be modeled using binomial, Poisson, and gamma distributions, there is a need for a model that can handle continuous fractional responses. In this section, we present a model that can be used to analyze this kind of data for hotspot detection. The beta distribution is a natural choice for modeling continuous fractional data. But because it lacks the additive property, it is not suitable for generating the simulated replications of data that are essential for computing p-values. We propose a suitable transformation of the data so that the gamma distribution serves as a reasonably good approximate model. The software reported in Patil et al. (2008a) now has the capability to process continuous fractional data. We illustrate the use of the software and the viability of the proposed model for hotspot detection with forest cover data. Forest cover may at times be regarded as a biodiversity indicator.

5.2 Scan statistics for geospatial hotspot detection

We have the following:

R: a connected geographical region,
T: a set of ‘cells’ forming a partition or tessellation of R,
N: the cardinality of T,
n1, n2, …, nN: ‘sizes’ of the N cells, and
y1, y2, …, yN: responses of interest for the N cells.
Here y1, y2, …, yN are assumed to be a particular realization of independently distributed random variables Y1, Y2, …, YN that have distributions of a common form but with different parameter values that account for cell-to-cell response variation. The interpretation of the size of a cell depends on the context. For example, if Y1, Y2, …, YN are binomial random variables, then n1, n2, …, nN are the respective numbers of trials. If Y1, Y2, …, YN have Poisson distributions, where, for a = 1, 2, …, N, Ya represents the number of events of a given type that occur at random in cell a with intensity λa, then n1, n2, …, nN are the areas of the N cells and E[Ya] = naλa. In general, for a = 1, 2, …, N, ya/na is the response rate or the intensity of the response for cell a.

The spatial scan statistic seeks to identify “hotspots” or clusters of cells that have elevated response or, more precisely, elevated response rates compared with the rest of the region. Clearly, we are interested in the responses adjusted for cell sizes rather than in raw responses. It is possible that adjustment for some other characteristic such as gender or age is meaningful in some studies. Given a cluster of cells, C, the response rate for the cluster is the ratio

\[ \sum_{a \in C} y_a \Big/ \sum_{a \in C} n_a , \]

where the sums run over the cells a belonging to the cluster C. This suggests that we assume that the parametrized family of distributions of Y1, Y2, …, YN is additive.

In addition, a cluster of cells to be considered as a potential hotspot or a candidate hotspot needs to satisfy two geometrical properties:

(1) Cells within the cluster should be connected, that is, any two cells a1, a2 in the cluster should be adjacent to each other or there should be a sequence of cells b1, b2, …, bk, k ≥ 1, all inside the cluster, such that a1 is adjacent to b1, a2 is adjacent to bk, and any two successive cells in the sequence are adjacent to each other. Such a cluster of connected cells will be called a zone.
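The connectedness condition and the cluster response rate defined above can be checked mechanically. The following Python sketch is illustrative only: the helper names `is_zone` and `response_rate`, the five-cell adjacency structure, and the sizes and responses are hypothetical and are not taken from the software discussed in this paper.

```python
from collections import deque


def is_zone(cells, adjacency):
    """True if `cells` is non-empty and connected using only cells in the set."""
    cells = set(cells)
    if not cells:
        return False
    start = next(iter(cells))
    seen = {start}
    queue = deque([start])
    while queue:
        a = queue.popleft()
        for b in adjacency[a]:
            # Only step to neighbors that belong to the candidate cluster.
            if b in cells and b not in seen:
                seen.add(b)
                queue.append(b)
    return seen == cells


def response_rate(cells, y, n):
    """Size-adjusted response of a cluster: sum of y_a over sum of n_a."""
    return sum(y[a] for a in cells) / sum(n[a] for a in cells)


# Hypothetical region: five cells arranged in a chain 0-1-2-3-4.
ADJACENCY = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
SIZES = {0: 10, 1: 20, 2: 10, 3: 40, 4: 20}
RESPONSES = {0: 1, 1: 2, 2: 6, 3: 20, 4: 3}

if __name__ == "__main__":
    print(is_zone({2, 3}, ADJACENCY))                # connected, so a zone
    print(is_zone({0, 2}, ADJACENCY))                # cell 1 is missing
    print(response_rate({2, 3}, RESPONSES, SIZES))   # 26 / 50
```

A breadth-first search restricted to the candidate cells is one simple way to test the adjacency-chain condition; any graph traversal would serve equally well.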
The set of all zones in R will be denoted by Ω. The connectedness requirement simply makes precise the idea that two disjoint clusters with significantly elevated responses constitute two distinct hotspots.

(2) The zone should not be so large that its complement, rather than the zone itself, would constitute the background. This is achieved by limiting the search for hotspots to zones that do not comprise more than a certain threshold percentage, say fifty percent, of the entire region in size.

The process of hotspot detection then involves testing, for each eligible zone in Ω, the null hypothesis that its response rate is the same as that of the rest of the region, that is, the zone is not a hotspot, against the alternative hypothesis that its response rate is higher than that of the rest of the region. We conclude that there is no hotspot if the null hypothesis is not rejected for any eligible zone. This hypothesis testing model is formulated precisely as described below, with the binomial response model used for illustration.

Under the binomial response model, each Ya is Binomial(na, pa), 1 ≤ a ≤ N, and Y1, Y2, …, YN are independently distributed. Then the null hypothesis that there is no hotspot, that is, that the response rates for all cells are equal, is stated as

H0: p1 = p2 = … = pN = p0, say,

against the alternative hypothesis that there exists a non-empty zone Z in Ω for which the response rate is higher than that for the rest of the region. Formally, the alternative hypothesis is

H1: There is a non-empty zone Z ∈ Ω and values 0 ≤ pnz, pz ≤ 1 such that

\[ p_a = \begin{cases} p_z & \text{for all cells } a \text{ in } Z, \\ p_{nz} & \text{for all cells } a \text{ in } R - Z, \end{cases} \]

and pz > pnz.

The zone Z specified in the alternative hypothesis is an unknown parameter along with pz and pnz. Thus the full model involves three parameters: Z, pz, and pnz, with Z ∈ Ω and H0 implying Z = ∅.
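For a fixed candidate zone, the comparison of H1 with H0 under the binomial model reduces to a few sums. The Python sketch below is a minimal illustration, assuming the usual log-likelihood ratio in which the binomial coefficients cancel and the estimates are the response ratios y/n inside the zone, outside the zone, and overall; the function names and the four-cell data are hypothetical.

```python
import math


def _xlogp(x, p):
    """x * log(p), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(p)


def binom_kernel(y, n, p):
    """Binomial log-likelihood without the binomial coefficient."""
    return _xlogp(y, p) + _xlogp(n - y, 1.0 - p)


def log_likelihood_ratio(zone, y, n):
    """log L1(Z) - log L0 for one zone; 0 if the zone rate is not elevated."""
    zone = set(zone)
    y_z = sum(y[a] for a in zone)
    n_z = sum(n[a] for a in zone)
    y_all, n_all = sum(y), sum(n)
    y_nz, n_nz = y_all - y_z, n_all - n_z
    if n_z == 0 or n_nz == 0:
        return 0.0
    p_0 = y_all / n_all      # estimate under H0
    p_z = y_z / n_z          # estimate inside the zone under H1
    p_nz = y_nz / n_nz       # estimate outside the zone under H1
    if p_z <= p_nz:          # H1 requires p_z > p_nz
        return 0.0
    return (binom_kernel(y_z, n_z, p_z)
            + binom_kernel(y_nz, n_nz, p_nz)
            - binom_kernel(y_all, n_all, p_0))


if __name__ == "__main__":
    y = [2, 2, 8, 9]         # hypothetical responses for cells 0..3
    n = [10, 10, 10, 10]     # hypothetical numbers of trials
    for zone in ({3}, {2, 3}, {0, 1}):
        print(sorted(zone), round(log_likelihood_ratio(zone, y, n), 3))
```

In this toy example the zone {2, 3} yields a larger ratio than {3} alone, and {0, 1} yields zero because its rate is not elevated. The sketch evaluates one zone at a time and does not address the search over Ω.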
For testing the null hypothesis using the likelihood ratio under the binomial response model, the maximum likelihood estimates (MLEs) p̂0, p̂z, and p̂nz of p0 under H0 and of pz and pnz for a given zone Z under H1 are readily obtained as the respective response ratios, so that the likelihood functions L0(p̂0) and L1(Z, p̂z, p̂nz) are available. Our objective is to maximize L1(Z, p̂z, p̂nz) as Z varies over Ω, that is, to compute the MLE of Z. If the maximized ratio L1(Z, p̂z, p̂nz)/L0(p̂0) is significantly high, then the MLE of Z is declared a hotspot. However, Ω is generally so large that it is impractical to maximize L1(Z, p̂z, p̂nz) over Ω by exhaustive search.

One approach to obtaining an approximate solution to the maximization problem is to replace the original parameter space Ω by a smaller, more tractable subset Ω0 of Ω, and to maximize L1(Z, p̂z, p̂nz) as Z varies over Ω0 by exhaustive search. The success of this reduction of the parameter space depends on how well the reduced parameter space Ω0 brackets the MLE over the full Ω. The widely used SaTScan software [Kulldorff (2006)] uses Ω0 = ΩSaTScan, obtained as the set of zones covered by a collection of series of expanding concentric circles centered at the centroid of each cell. It may do a poor job of detecting actual hotspots that are not quite compact.

Below we review the ULS scan statistic developed by Patil and Taillie (2004a), an alternative to the circular scan statistic that depends on the data and takes care of connectedness of clusters using adjacency. It is based on the concept of the upper level set (ULS) tree. For a comparison of the circular scan statistic and the ULS scan statistic, the reader is referred to Patil and Taillie (2004a).

Because of their wide applicability in epidemiology, the binomial and Poisson models are implemented in SaTScan, which has a large following.
Continuous response models have received much less attention. Patil and Taillie (2004a) discuss an approach to modeling continuous response distributions, with the gamma and lognormal as illustrations.

5.3 Continuous Fractional Response Model and Forest Cover Data Analysis

The gamma model discussed in the previous section is applicable in hotspotting when continuous responses are positive valued and additive in nature, a situation that occurs quite frequently. Another situation that occurs frequently in practice is when continuous responses lie between 0 and 1, in which case it seems plausible to postulate Y ~ beta(α, β) with the pdf

\[ f_Y(y; \alpha, \beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, y^{\alpha-1} (1-y)^{\beta-1}, \quad 0 < y < 1, \quad \alpha > 0,\ \beta > 0, \]

where Y represents a typical cell response. However, the beta family does not possess the additive property. Hence, to begin with, we propose the transformation X = Y/(1-Y). X has the beta distribution of the second kind (Patil et al., 1984) with the pdf

\[ f_X(x; \alpha, \beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, \frac{x^{\alpha-1}}{(1+x)^{\alpha+\beta}}, \quad x > 0, \quad \alpha > 0,\ \beta > 0. \]

We note that this distribution also arises as a mixture of the gamma distribution gamma(k, α) on parameter k where 1/k ~ gamma(1, β) [Patil et al. (1984)]. In the absence of an exact model whose properties conform to our guiding principles, it appears reasonable to approximate the exact model, namely the mixture of the gamma distribution, with a straight gamma distribution that satisfies our criteria. Thus we propose to treat Y/(1-Y) as gamma(k, β).

In many situations with continuous fractional response, the beta distribution of the first kind (Patil et al., 1984) rather than the standard beta may be more applicable. The beta distribution of the first kind with parameters r, s, α, and β has the pdf

\[ f_Y(y; r, s, \alpha, \beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, \frac{(y-r)^{\alpha-1} (s-y)^{\beta-1}}{(s-r)^{\alpha+\beta-1}}, \quad r < y < s, \quad 0 \le r < s \le 1,\ \alpha > 0,\ \beta > 0. \]

The simple transformation Y’ = (Y – r)/(s – r) takes us to the beta scenario.
However, r and s are typically unknown. Hence, to be able to deal with the beta distribution of the first kind using the technique developed for the standard beta distribution, one may use the transformation

\[ Y' = \frac{Y - \hat{r}}{\hat{s} - \hat{r}}, \tag{1} \]

where r̂ and ŝ are reasonable estimates of r and s, respectively. For our purpose, we will use ymin and ymax for r̂ and ŝ, respectively, where ymin = min { ya | a ∈ T } and ymax = max { ya | a ∈ T }, mostly because of computational ease.

In Section 5.4, we describe an application of the continuous fractional response model to Jalgaon district forest cover data using the software implementation of the ULS scan statistic described in detail in Patil et al. (2008a). Its current version is able to handle the continuous fractional response model in addition to the binomial, Poisson, and gamma models. Results reported in Section 5.4 indicate that application of the gamma model to approximate the beta model of the second kind is a viable technique for hotspot detection with data for which the beta model is appropriate.

5.4. Jalgaon District Forest Cover

Table 2 shows Jalgaon district (Maharashtra) forest cover 2001-02 data by tehsil.

Table 2.
Jalgaon District Forest Cover

Serial Number  Tehsil Name  Geographical Area (Hectares)  Forest Area (Hectares)  Forest Cover
0              Amalner      844.15                        21.90                   0.02594
1              Bhadgaon     484.53                        78.49                   0.16199
2              Bhusawal     413.38                        29.60                   0.07160
3              Bodvad       398.77                        56.37                   0.14136
4              Chalisgaon   1217.63                       121.11                  0.09946
5              Chopda       954.36                        162.13                  0.16988
6              Dharangaon   463.53                        19.47                   0.04200
7              Edlabad      646.11                        132.57                  0.20518
8              Erandol      511.03                        24.26                   0.04747
9              Jalgaon      825.07                        142.68                  0.17293
10             Jamner       1360.72                       155.72                  0.11444
11             Pachora      820.41                        72.46                   0.08832
12             Parola       791.21                        98.59                   0.12461
13             Raver        935.70                        264.05                  0.28220
14             Yawal        954.38                        308.18                  0.32291
Total                       11620.98                      1687.58                 0.14522

We intend to determine whether some cluster of tehsils can be considered a hotspot based on the data in the last column of the table, using the ULS software mentioned above (referred to as the ‘program’ hereafter). Figure 3 shows the contents of the input file to the program, which are described below:

Cell identifier: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Cell size: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Response: 0.025943257 0.161992034 0.071604819 0.141359681 0.099463712 0.169883482 0.042003754 0.205181780 0.047472751 0.172930782 0.114439414 0.088321693 0.124606615 0.282195148 0.322911209
Adjacent cells: 5 4 3 2 1 0 0 2 1 2 2 1 0 2 2 6 8 7 7 11 6 5 3 6 5 3 4 1 7 5 12 11 9 10 12 9 8 13 9 6 9 8 4 14 9 12 10 13 14 14 9 12 11 8 11 9 6 12 10 11 14 10 8 13

Figure 3. Input File for Jalgaon Forest Cover Data

The program requires that each cell in the region be identified serially as 0, 1, 2, …, with one line of input for each cell in the data file in that order. The first entry in a given line is the cell identifier, the second entry is the ‘size’ of the cell, and the third entry is the response. The remaining entries in the line are identifiers of cells that are adjacent to the cell mentioned at the beginning of the line. For fractional continuous data, as in the current case, the model to be used is specified by the user as ‘beta’.
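The line layout just described (identifier, size, response, then identifiers of adjacent cells) is simple enough to read with a few lines of code. The Python sketch below is an illustration, not the program's own reader: it assumes whitespace-separated fields, the function names `parse_cells` and `adjacency_is_symmetric` are hypothetical, and the three-cell input is made up for the example.

```python
def parse_cells(text):
    """Parse lines of the form 'id size response neighbor neighbor ...'.

    Assumes whitespace-separated fields and cells listed serially from 0.
    """
    cells = []
    lines = [line for line in text.splitlines() if line.strip()]
    for expected_id, line in enumerate(lines):
        fields = line.split()
        cell_id = int(fields[0])
        if cell_id != expected_id:
            raise ValueError(f"expected cell {expected_id}, found {cell_id}")
        cells.append({
            "id": cell_id,
            "size": float(fields[1]),
            "response": float(fields[2]),
            "neighbors": [int(f) for f in fields[3:]],
        })
    return cells


def adjacency_is_symmetric(cells):
    """Check that b lists a as a neighbor whenever a lists b."""
    return all(cell["id"] in cells[b]["neighbors"]
               for cell in cells for b in cell["neighbors"])


# Hypothetical three-cell input: cell 1 touches cells 0 and 2.
EXAMPLE = """\
0 1 0.10 1
1 1 0.25 0 2
2 1 0.40 1
"""

if __name__ == "__main__":
    cells = parse_cells(EXAMPLE)
    print(cells[1]["neighbors"], adjacency_is_symmetric(cells))
```

A symmetry check of this kind is a cheap safeguard against transcription errors in hand-built adjacency lists.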
For the beta model, each cell size needs to be input as 1 and the response is assumed to be between 0 and 1. The program automatically unitizes the data using transformation (1) above. However, it is necessary to adjust the data values 0 and 1 so that computed probability densities are not zero. The program replaces the zero data value by u/2 and the unit data value by (1 + v)/2, where u is the smallest non-zero data value and v is the largest non-unit data value after unitization. Unitized data values are further subjected to the transformation y/(1-y) before application of the gamma model.

The program also allows the user to specify the threshold percentage to limit the size of a hotspot relative to the total size of the entire region. We ran the program five times with threshold values of 10%, 20%, 30%, 40%, and 50%. Results of the five runs are summarized in Table 3. A map of Jalgaon district identifying each tehsil and the hotspot consisting of three tehsils as detected by the program is shown in Figure 4.

Table 3. Jalgaon District Forest Cover Hotspots

Threshold %  Member Count  Member Tehsils  p-value
10           1             14              0.180
20           3             7, 13, 14       0.040
30           3             7, 13, 14       0.049
40           3             7, 13, 14       0.083
50           3             7, 13, 14       0.106

Incidentally, it may be worth noting that the zone consisting of tehsils 7, 13, and 14 has the maximum likelihood value for threshold values of 20%, 30%, 40%, and 50%, but with different p-values. This is explained by the fact that, when we increase the threshold, we enlarge the set of competing candidate zones, and the maximum likelihood values occurring in the simulated samples more often exceed the likelihood of the top candidate zone for the actual data set. On the other hand, with the threshold of 10%, the p-value of the zone {14} is greater than that of the zone {7, 13, 14} when the threshold is 20%. This is due to a greater probability of a high response over a small area purely by chance.
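The preprocessing applied under the beta model (unitization by transformation (1), adjustment of the values 0 and 1, and the transformation y/(1-y)) can be traced on the forest cover responses of Figure 3. The Python sketch below follows that description only; it is not the program's code, the function names are hypothetical, and it assumes the responses are not all equal.

```python
# Forest cover responses for tehsils 0..14, as listed in Figure 3.
RESPONSES = [
    0.025943257, 0.161992034, 0.071604819, 0.141359681, 0.099463712,
    0.169883482, 0.042003754, 0.205181780, 0.047472751, 0.172930782,
    0.114439414, 0.088321693, 0.124606615, 0.282195148, 0.322911209,
]


def unitize(values):
    """Transformation (1) with r-hat = min and s-hat = max of the data."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def adjust_endpoints(values):
    """Replace 0 by u/2 and 1 by (1 + v)/2, as described in the text."""
    u = min(x for x in values if x > 0.0)   # smallest non-zero value
    v = max(x for x in values if x < 1.0)   # largest non-unit value
    adjusted = []
    for x in values:
        if x <= 0.0:
            adjusted.append(u / 2.0)
        elif x >= 1.0:
            adjusted.append((1.0 + v) / 2.0)
        else:
            adjusted.append(x)
    return adjusted


def to_odds(values):
    """Map y in (0, 1) to y / (1 - y), a positive value for the gamma model."""
    return [x / (1.0 - x) for x in values]


def preprocess(values):
    return to_odds(adjust_endpoints(unitize(values)))


if __name__ == "__main__":
    for tehsil, x in enumerate(preprocess(RESPONSES)):
        print(tehsil, round(x, 4))
```

Each step is monotone, so the ranking of tehsils by forest cover is unchanged; only the scale is altered so that a gamma model can be applied.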
We conclude that the choice of threshold is an important consideration in hotspot analysis from the point of view of the manager responsible for making practical decisions.

Figure 4. The shaded area is a hotspot at the 5% level for thresholds of 20% and 30%.

The three tehsils making up the hotspot in Figure 4 are Yawal, Raver, and Edlabad (now known as Muktainagar). All three tehsils are located in the Satpuda mountain region and are known for their forest. Appropriately, they have been identified as a hotspot. More importantly, the model presented in the paper expresses through the p-value the degree to which the tehsils stand out as forest-covered areas within the district.

6. A Unique Prototype Novel and Innovative District Level Initiative to Help Restore and Enhance Agriculture, Biodiversity, Nature Conservation, and Diverse Eco-Cultural Community Development

A district level watershed surveillance and research institute (JalaSRI): This is now functional in the spirit of triadic digital governance and hotspot geoinformatics for natural resource monitoring, etiology, early warning, and sustainable management, with emphasis on a model watershed, rural entrepreneurial youth brigades, appropriate smart sensor networks, etc., to help with improved restoration, enhancement, and impact assessment in response to district level linking of small rivers and streams as a vehicle for monsoon rainwater harvesting and management in the face of water scarcity. This is to provide a shot in the arm to restore and enhance agriculture, biodiversity, nature conservation, drinking water, and community life. The Jalgaon district model of digital governance in this context is developing into a prototype model, bringing together academia, agencies, and communities at the district level to innovatively improve the synergy of present-day science and technology and local wisdom for watershed assessment, development, and sustainable livelihood.
JalaSRI is a leading partner together with the district Collectorate, watershed communities, and others, such as the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT).

Model Watershed Development: Most rainfed areas in the tropical developing world face water scarcity even during the crop-growing monsoon season. Through community watershed management, scarce water resources can be conserved through rainwater harvesting and used efficiently for enhancing agricultural productivity, improving livelihoods, and minimizing land degradation. Through a project sponsored by the Ministry of Agriculture, Government of India, a model watershed of 1000 ha is being established in Jalgaon district by adopting a collective action, convergence, capacity building, and consortium approach for harvesting rainwater, using it efficiently to increase agricultural productivity, improving livelihoods through income-generating activities, and building sustainability through capacity building of all the stakeholders. Dr. SP Wani, Principal Scientist (Watersheds) and Regional Theme Coordinator, GT-Agroecosystems, Asia, from ICRISAT, is leading this project. With ICRISAT and JalaSRI as close collaborators, Dr. SP Wani is building a partnership between JalaSRI and ICRISAT with the intention of sharing the experiences of both, not only in the district but also with other organizations working in the area of water management across the tropical developing world.

Digital Governance and Hotspot Geoinformatics: The NSF DGP project, with Dr. G.P. Patil as the “Principal Investigator”, has been instrumental in conceptualizing a surveillance geoinformatics partnership among interested cross-disciplinary scientists in academia, agencies, and the private sector across nations.
Under his able leadership, JalaSRI efforts are driven by a wide variety of case studies of interest to agencies, academia, and the private sector involving critical societal issues, such as public health, ecosystem health, ecohealth, financial health, biodiversity and threats to biodiversity, emerging infectious diseases, water management and conservation, persistent poverty, environmental justice, social networks, sensor networks, energy conservation, early warning, and disaster management. It involves space-time research on disease, poverty, pollution, object identification and tracking, early detection, early warning, and hotspot trajectories and trends.

River Connectivity Project in Jalgaon District: Monsoon water need not flow away wasted; through river connectivity it can be used for drinking, irrigation, and other purposes. The district administration in Jalgaon, under the versatile collaborative leadership of district executive engineer VD Patil and district collector Vijay Singhal, IAS, M.Tech. IIT, Delhi, has implemented connectivity of rivers and streams. Four existing canals were used and repaired, and their capacities were also enhanced. Existing natural nalas and river beds were used to a great extent, and some additional canals and channels were also dug. For this work, 2 crores of rupees from the scarcity fund were obtained from the Government. These 2 crores of rupees saved at least nine crores of rupees on drinking water alone, which would otherwise have been spent on tankers supplying drinking water. Thousands of hectares have come under irrigation, and water losses have been reduced because of the canal repairs. Total benefits received by agriculturalists have ranged between 45 and 50 crores. This has helped improve the economic condition of the farming community, preventing migration of local and poor people to other areas in search of jobs.
People in the area are now well versed in the modalities of the project and have benefited much from the results. Such is the success story of river connectivity in Jalgaon district. It has had its impact on the potential for restoration and enhancement of biodiversity, in conjunction with the potential for reductions in threats to biodiversity.

Biodiversity and Habitat Conservation Working Group: Under the leadership of Dr. Gauri Rane, JalaSRI has had an active biodiversity and habitat conservation group since its inception. Dr. Rane received her training in Pune, Penn State, and Dehradun. The activities of the Group are directed to study the topography and the biodiversity of the Jalgaon district forest, to identify biodiversity-rich areas and their status in the study area, to list endemic, endangered, and threatened species and medicinal plants of the forest, to observe the geographical distribution of plants and animals and the distributional pattern of the species, to suggest potential sites for corridor building for the comeback of the tiger, and to engage in agroforestry for the district irrigation schemes.

JalaSRI has some fifteen Working Groups across the spectrum relevant at district level. Interestingly, JalaSRI has its anthem, a powerful anthem. One part speaks of:

To conserve bio-diversity
Is a necessity
Nature has a whole lot of purpose
In its variety
Promote the culture of embracing nature
Take to the task of shaping our future

Geo-Informatics
Environmental statistics
Public health officials
And social scientists
Working together all in accordance
To realise the dream of digital governance

For more equally exciting information, including JalaSRI on Stage Dance Drama, see Patil et al. (2008). May this Jalgaon district JalaSRI prototype example be instructive and inspirational to districts of similar makeup in Maharashtra, in India, and in the world.
The following diagrams may be suggestive:

[Diagram: An Innovative and Unique Prototype District Level River Linking Initiative]
[Diagram: Project presentation by Vijay Singhal (IAS), Collector, Jalgaon, to Hon’ble H.E. Smt. Pratibhatai Patil, President of India, and Hon’ble Sh. Sharad Pawar, Union Minister for Agriculture]
[Diagram: District level river linking field trip under the leadership of V.D. Patil, District Executive Engineer]
[Diagram: Model Watershed, Ideal Watershed, Jalgaon, MS, India, in the river linking project area; one thousand hectares (approx. 2500 acres); JalaSRI, District Collectorate, and ICRISAT (CGIAR/World Bank); permanent field work station for sustainable livelihood, youth investment, sensor network, digital governance and hotspot geoinformatics, and degree programs in geoinformatics]

8. References

Grassle, J.F., Patil, G.P., Smith, W.K., and Taillie, C. (1979). Ecological Diversity in Theory and Practice. International Co-operative Publishing House, Fairland, MD.
Gove, J.H., Patil, G.P., and Taillie, C. (1994). A mathematical programming model for maintaining structural diversity in uneven-aged forest stands with implications to other formulations. Ecological Modelling, 79, 11-19.
Johnson, G.D. and Patil, G.P. (2006). Landscape Pattern Analysis for Assessing Ecosystem Condition. Environmental and Ecological Statistics Series, Volume 1. Springer, New York, NY.
Kulldorff, M. (1997). A spatial scan statistic. Communications in Statistics: Theory and Methods, 26(6), 1481-1496.
Kulldorff, M. (2006). SaTScan™ v 7.0: Software for the spatial and space-time scan statistics. Information Management Services Inc., Silver Spring, MD.
Kulldorff, M., Nagarwalla, N. (1995).
Spatial Disease Clusters: Detection and Inference, Statistics in Medicine, 14, 799--810. Kulldorff, M., Rand, K., Gherman, G., Williams, G., and DeFrancesco, D. (1998). SaTScan v 2.1: Software for the spatial and space-time scan statistics, National Cancer Institute, Bethesda, MD. Myers, W.L. and G.P. Patil (2006). Biodiversity in the Age of Ecological Indicators, Acta Biotheoretica, 54, pp. 119-123. Myers, W., J. Bishop, R. Brooks and G.P. Patil (2001). Composite spatial indexing of regional habitat importance. Community Ecology 2(2): 213–220. Myers, W. & Patil, G.P. (2006). Environmental and Ecological Statistics Series: Volume 2: Pattern-based Compression of Multi-band Image Data for Landscape Analysis. New York, NY: Springer. Myers, W., Bishop, J., Brooks, R. and Patil, G. P. (2001). Composite Spatial Indexing of Regional Habitat Importance. Community Ecology 2(2): 213-220. Myers, W. and G. P. Patil. 2008. Semi-subordination sequences in multi-measure prioritization problems. Chapter 7 in: R. Todeschini and M. Pavan, Eds. Ranking Methods: Theory and Applicatons, Volume 27 of Data Handling in Science and Technology. Amsterdam: Elsevier. Myers, W., G. P. Patil and Y. Cai. 2006. Exploring patterns of habitat diversity across landscapes using partial ordering. In: R. Bruggemann and L. Carlsen, Eds. Partial Order in Environmental Sciences and Chemistry. Berlin: Springer. Pp. 309-325. Myers, W., J. Bishop, R. Brooks, T. O’Connell, D. Argent, G. Storm and J. Stauffer, Jr. 2000. The Pennsylvania GAP Analysis final report. The Pennsylvania State University, Univ. Park, PA 16802. Myers, W., J. Bishop, R. Brooks and G. P. Patil. 2001. Composite spatial indexing of regional habitat importance. Community Ecology 2(2): 213-220. 28 Patil, G.P. & Taillie, C. (1982). Diversity as a concept and its measurement. Journal of the American Statistical Association, 77, 548-567. (Invited discussion paper). Patil, G.P. & Taillie, C. (2004a). 
Upper level set scan statistic for detecting arbitrarily shaped hotspots. Environmental and Ecological Statistics, 11, 183-197. Patil, G.P. & Taillie, C. (2004b). Multiple indicators, partially ordered sets, and linear extensions: Multi-criterion ranking and prioritization. Environmental and Ecological Statistics, 11, 199-228. Patil, G.P. (2002). Diversity profiles. Technical Report 2001-0206. Also in: A. El-Shaarawi and W.W. Piegorsch (Eds). Encyclopedia of Environmentrics. Wiley, pp. 555-61. Patil, G.P. (2001). Statistical ecology and environmental statistics. Technical Report 2001-0401. Also in: Jeff Wood (Ed). Encyclopedia of Life Support Systems. EOLSS Publisher. United Nations Project. Patil, G. P. and Taillie, C. (1979a). An overview of diversity. In Ecological Diversity in Theory and Practice 5, (eds. J. F. Grassle, G. P. Patil, W. K. Smith and C. Taillie), 3-27. Fairland, Maryland, USA: International Co-operative Publishing House. Patil, G. P., and Taillie, C. (1979b). A study of diversity profiles and orderings for a bird community in the vicinity of Colstrip, Montana. In Contemporary Quantitative Ecology and Related Ecometric, (eds. Patil, G. P. and Rosenzweig, M.), 23-48. Burtonsville, MD, USA: International Co-operative Publishing House. Patil, G.P. (2007). Statistical geoinformatics of geographic hotspot detection and multicriteria prioritization for monitoring, etiology, early warning and sustainable management for digital governance in agriculture, environment, and ecohealth, Journal of Indian Society of Agricultural Statistics, 61, 132--146. Patil, G.P., Boswell, M.T., and Ratnaparkhi, M.V. (1984). Dictionary and Classified Bibliography of Statistical Distributions in Scientific Work. Vol. 2: Univariate Continuous Models, International Co-operative Publishing House, Burtonsville, MD. Patil, G.P., Taillie, C. (2003). Geographic and network surveillance via scan statistics for critical area detection, Statistical Science, 18(4), 457--465. 
Patil, G.P., Acharya, R., Glasmeier, A., Myers, W., Phoha, S., and Rathbun, S. (2006a). Hotspot detection and prioritization geoinformatics for digital governance. In Digital Government: Advanced Research and Case Studies (Eds. H. Chen, L. Brandt, V. Gregg, R. Traunmuller, S. Dawes, E. Hovy, A. Macintosh, C. Larson), Springer Publishers, US.
Patil, G.P., Modarres, R., Myers, W.L., and Patankar, P. (2006b). Spatially constrained clustering and upper level set scan hotspot detection in surveillance geoinformatics. Environmental and Ecological Statistics, 13, 365-377.
Patil, G.P., Acharya, R., Myers, W., Phoha, S., and Zambre, R. (2007). Hotspot geoinformatics for detection, prioritization, and security. In Encyclopedia of Geographical Information Science (Eds. S. Shekhar and H. Xiong), Springer Publishers.
Patil, G.P., Acharya, R., and Phoha, S. (2007). Digital governance, hotspot detection, and homeland security. In Encyclopedia of Quantitative Risk Analysis, Wiley, New York.
Patil, G.P., Acharya, R., Modarres, R., Myers, W.L., and Rathbun, S.L. (2007). Hotspot geoinformatics for digital government. In Encyclopedia of Digital Government, Volume II (Eds. Ari-Veikko Anttiroiko and Matti Malkia), 919-927, Idea Group Reference, Hershey, PA.
Patil, G.P., Joshi, S.W., and Rathbun, S.L. (2007). Hotspot geoinformatics, environmental risk, and digital governance. In Encyclopedia of Quantitative Risk Analysis, Wiley, New York.
Patil, G.P., Joshi, S.W., Myers, W.L., and Koli, R.E. (2008a). ULS scan statistic for hotspot detection with continuous gamma response. In Joe Naus Volume (Eds. Glaz, Joseph et al.), Birkhauser, Boston, MA (in press).
Patil, G.P., Patil, V.D., Pawde, S.P., Phoha, S., Singhal, V., and Zambre, R. (2008b). Digital governance, hotspot geoinformatics, and sensor networks for monitoring, etiology, early warning, and sustainable management. In Geoinformatics for Natural Resource Management (Ed. P.K. Joshi), Nova Science Publishers, New York (in press).
Patil, G.P., Pawde, S.P., Rane, G.M., Zambre, R.A., Wani, S.P., and Paranjape, Jhelum (2008). A Picturesque Informative Pamphlet: District Level Watershed Surveillance and Research Institute JalaSRI. Penn State CSEES TR 2008-1204.
Rane, Gauri M., Pandey, Rahul K., Bhardwaj, Jaya, Murthy, Rama, Myers, Wayne, and Patil, Ganapati (2008). Biographical and Bio-cultural Context for Collaborative Conservation and Resource Sustainability in Jalgaon District of Maharashtra, India. JalaSRI TR 2008-1015.
Scott, J.M., F. Davis, B. Csuti, R. Noss, B. Butterfield, C. Groves, H. Anderson, S. Caicco, F. D’Erchia, T.C. Edwards, Jr., J. Ulliman, and R.G. Wright (1993). GAP Analysis: a geographic approach to protection of biological diversity. Wildlife Monographs No. 123.
Scott, J., Csuti, B., Estes, J., and Anderson, H. (1989). Status assessment of biodiversity protection. Conservation Biology, 3, 85-87.
Taylor, L.R. (1978). Bates, Williams, Hutchinson – a variety of diversities. In Diversity of Insect Faunas (Eds. L.A. Mound and N. Waloff), pp. 1-18. Oxford: Blackwell Scientific Publications.
Venables, W. and D. Smith (2004). An introduction to R, revised and updated. Bristol, UK: Network Theory Limited. 146 p.