Covariate-Directed Sampling for Assessing Species Richness

G.P. Patil, Glen Johnson and Matteo Grigoletto¹

Abstract.-Since species richness is a spatially non-additive variable, it cannot be estimated with conventional moment estimators. We may, however, exploit the species-area relationship, which implies that the number of species increases according to a law of diminishing returns as the sampled area increases. An efficient sampling method would then maximize the number of species encountered within a fixed amount of area that can be affordably sampled. We suggest that when spatial covariate information is available, it should be exploited to direct the location of sample units in a manner that increases the habitat diversity observed within a minimum number of sample units. This approach was evaluated through a retrospective assessment of breeding bird species richness in Pennsylvania using cumulative tree richness as the covariate. Working with EMAP hexagons (635 square kilometers) as the primary sampling unit, we found that although tree richness was a fairly weak covariate, tree-directed sampling still outperformed random sampling.

INTRODUCTION

Given the strong and legitimate concern over the alarming loss of biodiversity on our planet (Wilson, 1988; Stevens, 1995), monitoring methods are essential for quantifying the biodiversity of large geographic regions. Given all the shortcomings of "indices", actual species richness (the number of different species) appears to be the least controversial measure of biodiversity. Meanwhile, biodiversity researchers recognize that reliable methods of estimation still require development (Yoon, 1995).
Since species richness is a spatially non-additive variable, we cannot estimate its total with conventional moment estimators, such as multiplying a sample mean by the number of population units in the sampling frame. For this reason, we turn to species-area curves, which ecologists use for several reasons (Kilburn, 1966), including the prediction of species richness in areas larger than those sampled (Evans, Clark and Brand, 1955).

¹ Center for Statistical Ecology and Environmental Statistics, Department of Statistics, Penn State University, University Park, PA 16801

The species-area relationship states that as the area within a homogeneous habitat increases, the number of different species encountered also increases until "a point of no return", after which increasing the area does not further increase the number of different species encountered. The common model for this process, as originally proposed by Arrhenius (1921), is a power function, $S = kA^z$, where S is the species richness and A is the area, while z and k are population-specific parameters. For applications to wildlife conservation issues, see Usher (1985). Although the power function has traditionally been used to model the species-area relationship within homogeneous habitat, Johnson and Patil (1995) observed a classical power function response for breeding bird richness across the whole state of Pennsylvania, which encompasses very heterogeneous habitat. If the power function is fit from sample data, its ability to extrapolate to a larger area is limited by upward bias, since this model is unbounded (Williams, 1995).

Our objective is to develop a sampling plan that maximizes the acceleration of a species-area curve towards its plateau in order to encounter the most species within a sampled area.
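As a minimal sketch of how a power function of this kind is commonly fit and extrapolated, the model can be linearized as log S = log k + z log A and fit by ordinary least squares. The function names and the synthetic data below are hypothetical illustrations and are not taken from the Pennsylvania database; only the 635 km² unit size comes from the text.

```python
import numpy as np

def fit_power_law(area, richness):
    """Least-squares fit of log S = log k + z log A; returns (k, z)."""
    log_a = np.log(np.asarray(area, dtype=float))
    log_s = np.log(np.asarray(richness, dtype=float))
    z, log_k = np.polyfit(log_a, log_s, 1)  # slope first, then intercept
    return float(np.exp(log_k)), float(z)

def predict_richness(k, z, area):
    """Evaluate S = k * A**z at the given area(s)."""
    return k * np.asarray(area, dtype=float) ** z

# Hypothetical data: cumulative area of 1..10 units of 635 km^2 each,
# with richness generated from an exact power law (k = 40, z = 0.15).
areas = 635.0 * np.arange(1, 11)
richness = 40.0 * areas ** 0.15

k_hat, z_hat = fit_power_law(areas, richness)

# The model has no plateau: doubling the area always multiplies
# the predicted richness by 2**z, however large the area already is.
ratio = predict_richness(k_hat, z_hat, 2 * areas[-1]) / predict_richness(
    k_hat, z_hat, areas[-1]
)
```

Because the fitted curve never levels off, every doubling of area raises the predicted richness by the same factor, which is one way to see where upward bias in extrapolation can come from.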
Encountering species quickly is especially critical when sampling from a very large geographic area like the state of Pennsylvania, since sampling can rapidly become very expensive.

SAMPLING STRATEGIES

When constructing a species-area curve from successive aggregation of discrete sample units, the usual approaches are either to combine sample units in a spatially continuous fashion or to combine units that are obtained at random from throughout the region of interest. When habitat is diverse across the region, random aggregation may result in a steeper curve than continuous aggregation, because spatially discontinuous sample units may encounter more diverse habitats and therefore increase the chance of encountering different species. The expected number of species encountered in n sample units of equal size obtained at random, $E[S_n]$, can be readily computed (Kobayashi, 1979), providing a benchmark against which to compare other sampling protocols.

An alternative to continuous aggregation or random sampling is to perform directed sampling based on values of some covariate that are readily available for the sample units. With the advent of geographic information systems, we feel that such covariate information is becoming more readily available for geographic areas at the landscape scale and above. The desired property of a covariate is that it directs sampling in a manner that accelerates the species-area curve faster than would be observed with spatially continuous or random sampling.

BREEDING BIRDS IN PENNSYLVANIA

Figure 1: Bird richness in the hexagons. [Map legend: number of species, in classes 55 to 64, 64 to 73, 73 to 82, 82 to 91, 91 to 100 and 100 to 109.]

We evaluated the proposed approach of covariate-directed sampling through a retrospective study of a known community of breeding birds in Pennsylvania. Our database is described in Johnson and Patil (1995).
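The random-sampling benchmark introduced under SAMPLING STRATEGIES can be made concrete with a short sketch. It assumes presence/absence data and units drawn at random without replacement, in which case a species occurring in m of N units is missed by a sample of n units with probability C(N - m, n) / C(N, n). This is a standard hypergeometric argument; whether it matches the exact form given by Kobayashi (1979) is an assumption, and the function name and toy numbers are hypothetical.

```python
from math import comb

def expected_richness(occupancy, n_units, n_sampled):
    """Expected number of species seen in n_sampled units drawn at random
    without replacement from n_units.

    occupancy: for each species, the number of units in which it occurs.
    """
    total = comb(n_units, n_sampled)
    # comb(a, b) is 0 when b > a, so a species present in more than
    # n_units - n_sampled units is certain to be encountered.
    return sum(
        1.0 - comb(n_units - m, n_sampled) / total for m in occupancy
    )

# Hypothetical example: 4 units; three species found in 4, 1 and 2 units.
curve = [expected_richness([4, 1, 2], 4, n) for n in range(1, 5)]
```

Evaluating the function for n = 1, 2, ... traces the expected species-area curve under random sampling, against which a directed ordering can be compared.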
The sampling frame consists of a tessellation of Pennsylvania by hexagons, each 635 km², of the Environmental Protection Agency's Environmental Monitoring and Assessment Program (EMAP). Associated with each hexagon are species lists for breeding birds, other vertebrate groups and trees. While the other groups are based on records of occurrence, the breeding birds are based on the much more thorough Pennsylvania Breeding Bird Survey (Brauning, 1992). The distribution of species richness for breeding birds with respect to EMAP hexagons is displayed in Figure 1 in the form of a greyscale thematic map.

Of the information available in our database, tree species presented the most promising covariate for choosing an optimal hexagon ordering for ultimately measuring bird species richness. We hypothesized that differences in bird species are likely to be associated with differences in tree species; therefore, if hexagons are chosen in an order that corresponds to maximum acceleration of the tree species richness curve, using this same ordering will accelerate the bird richness curve.

Figure 2: Species-area curve from tree-directed sampling for one cycle, followed by the expected value from subsequent random sampling. The expected curve from completely random sampling is also shown. [Horizontal axis: Hexagons Observed. Plotted series: first directed cycle; mean for random sampling after first cycle; mean for random sampling.]

The optimal tree species richness curve was constructed as follows. The first hexagon chosen was the one with the highest tree species richness. The species found in that hexagon were then deleted from the species lists of all remaining hexagons. The second hexagon was chosen as the one with the highest remaining tree richness, and its species were deleted in turn. These two steps, selection and deletion, were repeated until all the tree species had been accounted for. We discovered that all tree species were accounted for within the first nine hexagons sampled.
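The selection-and-deletion procedure above can be sketched as a greedy loop. This is a minimal illustration under stated assumptions rather than the code used for the study: the function names, the tie-breaking rule (smallest hexagon identifier first) and the toy species lists are all hypothetical.

```python
def directed_cycle(covariate_lists, candidates):
    """One cycle of greedy covariate-directed selection.

    covariate_lists: dict mapping unit id -> set of covariate species.
    candidates: unit ids that are still available for sampling.
    Returns the chosen units in order of selection.
    """
    remaining = {u: set(covariate_lists[u]) for u in candidates}
    order = []
    while remaining:
        # Pick the unit with the most not-yet-seen covariate species;
        # ties go to the smallest id (an assumption, not from the paper).
        best = max(sorted(remaining), key=lambda u: len(remaining[u]))
        if not remaining[best]:
            break  # every covariate species has been accounted for
        found = remaining.pop(best)
        order.append(best)
        for species in remaining.values():
            species.difference_update(found)  # delete the species just seen
    return order

def accumulation_curve(target_lists, order):
    """Cumulative number of target species along a given unit ordering."""
    seen, curve = set(), []
    for u in order:
        seen |= target_lists[u]
        curve.append(len(seen))
    return curve

# Hypothetical toy data: four hexagons with tree and bird species lists.
trees = {
    "h1": {"oak", "maple", "pine"},
    "h2": {"oak", "maple"},
    "h3": {"birch", "pine"},
    "h4": {"hemlock"},
}
birds = {
    "h1": {"robin", "wren"},
    "h2": {"robin"},
    "h3": {"wren", "owl"},
    "h4": {"siskin"},
}

order = directed_cycle(trees, trees.keys())
bird_curve = accumulation_curve(birds, order)
```

In this toy case hexagon h2 is never selected because it adds no new tree species, and the bird curve is read off along the tree-directed ordering.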
At this point we experimented with two techniques. The first is to randomly sample some of the remaining hexagons. The second is to reintroduce all of the tree species that occur within the unsampled hexagons and repeat the procedure used for the first cycle, and so on for a certain number of cycles (we call this a completely directed procedure). At each step the new bird species found in the hexagon were recorded. The completely directed procedure performed somewhat better, as seen in Figures 2 and 3, which provide results over the whole state.

Figure 3: Species-area curve from completely tree-directed sampling, with demarcation of each sampling cycle. The expected curve from completely random sampling is also shown for comparison. [Horizontal axis: Hexagons Observed.]

For the protocol which sampled the first nine hexagons by tree-directed sampling, we fit the power function model, via linear regression, for additional random samples of 20, 30 and 50 hexagons. The results are presented in Table 1, where extrapolation to the total area (statewide) appears to yield unacceptably high bias.

Since we are using a covariate (number of tree species) for the first cycle followed by random sampling, the number of new tree species considered increases only over the first nine hexagons and remains constant thereafter (only the area increases). To take this into account, we might use a model of the form $S = kA^z T^p$, where T is the number of tree species and p is a new parameter. Such a model has been used by Rafe et al. (1985); what they call habitat heterogeneity is represented in our case by the number of tree species. The extrapolated curves for both models are compared in Figure 4 for the sampling protocol of one tree-directed cycle, followed by 20 randomly chosen hexagons.

Table 1: Extrapolated species richness (EX) for one tree-directed cycle (9 hexagons) plus subsequent random samples of size 20, 30 and 50. Bias equals the extrapolated minus the total known species.
Sample size    k        z        EX     Bias
29             36.87    0.157    230    +52

Here we see a substantial reduction in bias from incorporating the covariate. The main drawback of a model which incorporates the covariate is that there is no trivial way to use it when all sample units are selected by covariate-directed sampling (the completely directed procedure): there are no clear values of the variable T to use in the second and following cycles. This problem is important, since we obtain the steepest species-area curve with the completely directed procedure.

Besides tree species, we also had available information on (i) fish, (ii) mammals, (iii) reptiles and amphibians and (iv) butterflies and skippers. Using these covariates we repeated the same analysis as with trees, but did not discover a stronger covariate than tree richness. Results from studying these other covariates can be found in Grigoletto et al. (1995).

Figure 4: Extrapolating curves for Model 1 ($S = kA^z$) and Model 2 ($S = kA^z T^p$). [Horizontal axis: Hexagons Observed. Plotted series: extrapolating curve for model (1); extrapolating curve for model (2).]

DISCUSSION

Most of the cycle lengths in Figure 5 are close to 10 hexagons. Why does that happen? If we look at Figure 5, it is clear that the observed sites tend to cluster. Since we are using tree-directed sampling, this happens because if there are certain areas consisting of multiple hexagons with a high number of tree species, then in different cycles we tend to re-observe these areas. In fact, the larger clusters do contain hexagons from each of the four cycles.

Figure 5: Observed hexagons and unencountered species after 4 cycles. [Map legend: hexagons observed in cycles 1 to 4; unencountered species: Tundra Swan, Short-eared Owl, Sedge Wren, Swainson's Thrush, Summer Tanager, Pine Siskin.]

After four cycles, only six bird species remain unencountered. In Figure 5, we see that most of the unencountered species are rare and appear in just one, two or three hexagons.
The Pine Siskin is an exception, since this bird is fairly widespread over north-central Pennsylvania. Since this region is mostly forested, this suggests that if the "area covered by forest" could be used jointly with tree richness, we would have a stronger covariate.

The primary purpose of this paper is to suggest the idea of covariate-directed sampling for achieving observational economy when sampling to estimate species richness, a spatially non-additive variable. We illustrated the approach with a retrospective study of breeding birds in Pennsylvania, where the database at our disposal provided species lists for several other communities. Of these, tree richness appeared to be the best available covariate. Although tree richness was not strongly correlated with bird species richness, tree-directed sampling did accelerate the bird species-area curve faster than was expected with completely random sampling. Therefore, if other covariates can be obtained that are more strongly correlated with bird richness, such covariates are expected to further improve the sampling efficiency.

An outstanding question is how to estimate species richness statistically, in a manner that allows construction of confidence bounds. Progress has been made (Bunge and Fitzpatrick, 1993) for the case when population densities are also measured within each sampled species; however, we were faced with presence/absence responses for each species, where the data-analytic approach appears to be the only alternative.

Acknowledgments

Prepared with partial support from the United States Environmental Protection Agency, Environmental Monitoring and Assessment Program, EMAP Design and Statistics Group, under Cooperative Agreement Number CR821783. The contents have not been subjected to Agency review and therefore do not necessarily reflect the views of the Agency, and no official endorsement should be inferred.

REFERENCES

Arrhenius, O. 1921. Species and area.
Journal of Ecology, 9:95-99.
Brauning, D.W. 1992. Atlas of Breeding Birds in Pennsylvania. University of Pittsburgh Press, Pittsburgh. 484 pp.
Bunge, J. and Fitzpatrick, M. 1993. Estimating the number of species: a review. Journal of the American Statistical Assoc., 88:364-373.
Evans, F.C., Clark, P.J. and Brand, R.H. 1955. Estimation of the number of species present on a given area. Ecology, 36:342-343.
Grigoletto, M., Johnson, G., Patil, G.P. and Taillie, C. 1995. Using Covariate-Directed Sampling of EMAP Hexagons to Assess the Statewide Species Richness of Breeding Birds in Pennsylvania. Technical Report no. 951102. Center for Statistical Ecology and Environmental Statistics, Department of Statistics, Penn State University, University Park, PA.
Johnson, G.D. and Patil, G.P. 1995. Estimating statewide species richness of breeding birds in Pennsylvania. Coenoses, 10(2-3):81-87.
Kilburn, P.D. 1966. Analysis of the species-area relation. Ecology, 47:831-843.
Kobayashi, S. 1979. Species-area curves. In Ord, J.K., Patil, G.P. and Taillie, C. (eds.), Statistical Distributions in Ecological Work, pp. 349-368. International Co-operative Publishing House, Fairland, Maryland.
Rafe, R.W., Usher, M.B. and Jefferson, R.G. 1985. Birds on reserves: the influence of area and habitat on species richness. Journal of Applied Ecology, 22:327-335.
Stevens, W.K. 1995. How many species are being lost? Scientists try new yardstick. The New York Times, p. C4, July 25.
Usher, M.B. 1985. Implications for the species-area relationship for wildlife conservation. Journal of Environmental Mngt., 21:181-191.
White, D., Kimerling, A.J. and Overton, W.S. 1992. Cartographic and geometric components of a global sampling design for environmental monitoring. Cartographic and Geographic Information Systems, 19(1):5-22.
Williams, M.R. 1995. An extreme-value function model of the species incidence and species-area relations. Ecology, 76(8):2607-2616.
Wilson, E.O. 1988. Biodiversity. National Academy Press, Washington, D.C., 521 pp.
Yoon, C.K. 1995. Monumental inventory of Costa Rican forest's insects under way. The New York Times, p. C4, July 11.