Appendix A - Proceedings of the Royal Society B

ELECTRONIC SUPPLEMENTARY MATERIAL
Appendix A. Detailed description of methods of point pattern analysis (adapted from Wang
et al. [1] and Wiegand et al. [2]). In this Appendix A only, reference numbers refer to the
separate reference list provided at the end of the appendix.
Summary statistics
To quantify the spatial patterns found at the three forests we used recent techniques of spatial
point pattern analysis [3-6] and summary statistics such as the pair-correlation function [4],
Ripley’s K-function [3], and the cumulative distribution function of nearest neighbour
distances [5]. The bivariate pair-correlation function g12(r) can be estimated using the quantity
λ2g12(r), which is the mean density of trees of species 2 at distance r away from trees of the
focal species 1, whereby λ2 is the mean density of trees of species 2 in the whole study area.
The estimation of the pair correlation function requires use of a kernel function to define “at
distance r”, which basically places rings with radius r and width dr around the focal points
[6,7]. Ripley’s K-function K(r) [3] is the cumulative version of the pair-correlation function,
i.e. the quantity λ2K12(r) is the average number of trees of species 2 within distance r from
trees of the focal species 1. For simulation envelopes or goodness-of-fit tests, the
transformation L12(r) = (K12(r)/π)^0.5 – r of the K-function is usually used because it stabilizes
the variance [5,6] and shows an expectation of L12(r) = 0 for independence, as opposed to the
expectation K12(r) = πr^2 of the K-function.
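As an illustration of the quantities defined above, the following minimal Python sketch estimates λ2K12(r) as the mean number of species 2 trees within distance r of a species 1 tree and then applies the L-transformation. It is not the estimator used in this study: it ignores edge correction, and the function names (k12_estimate, l12_transform) and the plot area argument are assumptions made for the example.

```python
import math

def k12_estimate(focal, other, r, area):
    """Naive estimate of the bivariate K-function K12(r), without edge correction.

    focal: list of (x, y) coordinates of species 1 trees
    other: list of (x, y) coordinates of species 2 trees
    area:  area of the study plot, used for lambda2 = n2 / area
    """
    lambda2 = len(other) / area
    # Count species 2 trees within distance r of each species 1 tree.
    total = 0
    for (x1, y1) in focal:
        for (x2, y2) in other:
            if math.hypot(x1 - x2, y1 - y2) <= r:
                total += 1
    mean_neighbours = total / len(focal)  # estimates lambda2 * K12(r)
    return mean_neighbours / lambda2

def l12_transform(k_value, r):
    """L12(r) = sqrt(K12(r) / pi) - r, which is 0 when K12(r) = pi * r^2."""
    return math.sqrt(k_value / math.pi) - r
```

Under independence K12(r) = πr^2, so l12_transform(math.pi * r**2, r) returns 0 up to rounding.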
Although the pair correlation function and the K-function are closely related [i.e.,
dK12(r)/dr = 2π r g12(r)], the K-function is a cumulative statistic and needs careful
interpretation. When using the K-function, interaction effects at small distances (such as
repulsion) are only gradually diluted out by independence at larger distances and can lead to a
superficial impression of repulsion over longer distances than operate in reality [7]. Therefore
the pair correlation function is better suited to quantify scale-dependent interaction effects
in analysis 2. However, if the interest is to quantify how individuals of a species 2 are
distributed within neighbourhoods of individuals of a species 1 (i.e., analysis 1), cumulative
summary statistics are better suited.
To describe additional characteristics of the spatial patterns we used the bivariate
distribution function D12(r) which gives the fraction of trees of the focal species 1 that have
their nearest species 2 neighbour within distance r [5,6]. Note that D(r) is often referred to as
G(y) in the literature [5], but we have adopted the notation of the recent textbook by Illian et
al. [6].
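The definition of D12(r) translates directly into a few lines of code. The sketch below is a plain empirical version without edge correction; the function name d12_estimate is an assumption made for the example and does not refer to any particular software package.

```python
import math

def d12_estimate(focal, other, r):
    """Fraction of species 1 trees whose nearest species 2 neighbour lies within r."""
    hits = 0
    for (x1, y1) in focal:
        # Distance from this species 1 tree to its nearest species 2 tree.
        nearest = min(math.hypot(x1 - x2, y1 - y2) for (x2, y2) in other)
        if nearest <= r:
            hits += 1
    return hits / len(focal)
```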
The g12-, K12- and D12- statistics are usually interpreted for homogeneous patterns to
indicate interactions among pairs of points. In this case they reflect properties of a “typical
tree” of the pattern [6]. However, the bivariate patterns at our study site are certainly not all
homogeneous, which means that a typical tree of a pattern may not exist. Instead we
interpreted the g- and K- functions as averages taken over all trees of the focal pattern and
designed our analyses and null models so as to account for potential heterogeneities.
Homogeneous Poisson null model (analysis 1)
Our basic question was conceptually simple: we wanted to know how the trees of a given
species 2 were distributed within local neighbourhoods of the trees of a focal species 1. Did
they occur on average more (or less) frequently within the neighbourhoods than expected by
chance alone, and was this association homogeneous or heterogeneous? In the heterogeneous
case this distribution varies substantially among trees of the focal species, e.g. some species 1
trees may have many species 2 neighbours but other species 1 trees have few species 2
neighbours. To distinguish the various types of spatial associations from those that may arise
purely by chance (individuals of species 2 co-occur within a neighbourhood of species 1 at a
frequency no different from that expected by chance alone), we compared the observed
bivariate point patterns with a null model in which the locations of the focal species remained
unchanged but trees of species 2 were distributed randomly and independently of the locations
of species 1 (i.e. a homogeneous Poisson process). Clearly, testing against this null model is
often not very informative [7]; however, we used this test to quantify and categorize the
overall bivariate spatial associations based on a scheme developed by Wiegand et al. [2]. The
scheme uses the bivariate K12(r) and D12(r) as test statistics and distinguishes four significant
types of spatial associations that may occur between two (possibly heterogeneous) patterns. A
“no association” type arises if neither K12(r) nor D12(r) shows a significant departure from the
homogeneous Poisson null model.
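A realization of the null model of analysis 1 can be sketched as follows. The example assumes a rectangular plot and conditions on the observed number of species 2 trees (both are assumptions of this sketch, as are the function names and the default seed); the locations of species 1 are simply left untouched.

```python
import random

def simulate_csr(n_points, width, height, rng):
    """Place n_points uniformly and independently in a width x height rectangle."""
    return [(rng.uniform(0.0, width), rng.uniform(0.0, height))
            for _ in range(n_points)]

def null_realizations(species2, width, height, m=199, seed=0):
    """Generate m random patterns with the same number of trees as species 2."""
    rng = random.Random(seed)
    return [simulate_csr(len(species2), width, height, rng) for _ in range(m)]
```

Each realization is then paired with the unchanged species 1 pattern before the summary statistics are recomputed.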
Heterogeneous Poisson null model (analysis 2)
The external effect of the environment often follows topographic gradients [8,9] and Seidler
& Plotkin [10] found that cluster sizes in tropical forests were related to seed dispersal
syndromes, ranging from approximately 20m for ballistic, 50m for gravity, gyration and wind,
and >100m for animal dispersal. Thus, the external effect of the environment and that of
dispersal limitation result at our scale of observation (i.e., 25-50ha) in patchy distribution
patterns predominantly on intermediate spatial scales. However, the impact of direct species
interactions such as competition or facilitation is limited to the immediate neighbourhood of
the trees. Thus, acknowledging the multi-scale nature of the spatial association patterns, we
selectively studied the small-scale association pattern by using a null model which
randomizes the trees of species 2 conditionally on their observed large-scale pattern and
where the trees of species 1 are kept unchanged. In practice, this can be done by displacing the
known locations of the trees of species 2 randomly within a neighbourhood with radius R (i.e. a
heterogeneous Poisson null model [2,11]). This displacement leaves the density λ2(x) of species 2 unchanged,
but the local displacement of species 2 removes potential signal of small-scale interactions at
scales r < R. Contrasting the observed pattern to realizations of this null model will therefore
detect only small-scale interspecific interaction effects.
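The displacement idea can be sketched in a few lines. The example below moves every species 2 tree to a location drawn uniformly from a disc of radius R around its observed position; the uniform draw, the redrawing of locations that fall outside a rectangular plot, and the function name displace are assumptions of this sketch, not details taken from the analysis itself.

```python
import math
import random

def displace(points, radius, width, height, rng):
    """Move each point to a uniformly random location within `radius` of itself."""
    moved = []
    for (x, y) in points:
        while True:
            # sqrt() of a uniform number gives a uniform density over the disc.
            dist = radius * math.sqrt(rng.random())
            angle = rng.uniform(0.0, 2.0 * math.pi)
            nx = x + dist * math.cos(angle)
            ny = y + dist * math.sin(angle)
            # Redraw if the new location falls outside the plot (assumption).
            if 0.0 <= nx <= width and 0.0 <= ny <= height:
                moved.append((nx, ny))
                break
    return moved
```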
While this analysis can be conducted for any displacement distance R, it is desirable to
use a distance which is likely to separate biological effects. In general it is expected that direct
interactions among larger trees only occur within a limited spatial separation (say < 30 m).
For example, Hubbell et al. [12] found that the neighbourhood effects of conspecific density
on survival disappeared within approximately 12–15 m of the focal plant. Several other
studies using individual-based analyses of local neighbourhood effects on tree growth and
survival confirmed this result [13-15] suggesting that direct plant–plant interactions in forests
may fade away at larger scales. We therefore used a separation distance of R = 30m [2].
An interesting question is whether separation of scales occurred: this can be tested in a
simple way. Because the heterogeneous Poisson process conditions on the spatial structure for
scales larger than 30m, it is only able to indicate significant effects at scales smaller than 30m.
In cases without separation of scales we expect therefore that the frequency of significant
effects, taken over all pairs of species, should fade away smoothly at 30 m. However, if
small-scale effects operate only over a short range (i.e. r << 30 m), the frequency of significant
effects should disappear well below the threshold of 30 m.
Technically, we implement this null model as a heterogeneous Poisson process [7] for
the second species (the individuals of the focal species remain unchanged) where its intensity
function λ2(x) is non-parametrically estimated by using an Epanechnikov kernel with
bandwidth of R = 30m. The Epanechnikov kernel is defined as
\[
e_R(d) =
\begin{cases}
\dfrac{3}{4R}\left(1 - \dfrac{d^2}{R^2}\right) & -R \le d \le R \\
0 & \text{otherwise}
\end{cases}
\]
where d is the distance from a focal point, and R the bandwidth. For a given location (x, y) the
intensity λ2(x, y) is constructed by using a moving window with circular shape and radius R
around location (x, y) and summing up all points in the circle, but weighting them with factor
eR(d) according to their distance d from the focal location (x, y). Clearly, the intensity estimate
depends on the bandwidth R: for large R one obtains smooth intensity functions and for small
R the estimated function is rough and may obscure the fundamental structure of the
distribution [4]. We used a biological argument and defined the bandwidth R = 30 m as the
maximal scale at which second-order effects are expected in tropical forests.
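The moving-window construction described above can be written out as a short sketch. It returns the kernel-weighted sum of points exactly as described in the text; any further rescaling (for example so that the estimated intensity integrates to the number of species 2 trees) is left out, and the function names are assumptions made for the example.

```python
import math

def epanechnikov(d, bandwidth):
    """Epanechnikov kernel e_R(d) with bandwidth R."""
    if -bandwidth <= d <= bandwidth:
        return 3.0 / (4.0 * bandwidth) * (1.0 - d * d / (bandwidth * bandwidth))
    return 0.0

def kernel_weighted_count(x, y, points, bandwidth):
    """Sum of kernel weights of all points within `bandwidth` of location (x, y)."""
    return sum(epanechnikov(math.hypot(x - px, y - py), bandwidth)
               for (px, py) in points)
```

With R = 30, a tree located exactly at (x, y) contributes 3/(4·30) = 0.025, and trees 30 m or more away contribute nothing.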
Goodness-of-fit test to assess significance of bivariate patterns against a null model.
For a given species pair we contrasted the observed summary statistics to that expected under
an appropriate null model. We used a Monte-Carlo approach to test for significant departures
from the null model. To this end, we generated for each species pair 199 realizations of the
given null model and used a goodness-of-fit test (GoF) to evaluate the overall ability of the
null model to fit a summary statistic for a given distance interval [5,6,17,18]. The GoF test
collapses the scale-dependent information of a functional summary statistic [e.g., g12(r)] into a
single index ui. The index ui represents the accumulated deviation of the observed summary
statistic from the expected summary statistic under the null model, summed up over an
appropriate distance interval (rmin, rmax) (i.e., a Cramer-von Mises type statistic as e.g. used in
Plotkin et al. [19]):
\[
u_i = \sum_{r = r_{\min}}^{r_{\max}} \left( \hat{H}_i(r) - H(r) \right)^2 \qquad (1)
\]
where $\hat{H}_i(r)$ is the empirical summary statistic of the observed pattern (i = 0) and that of
the simulated patterns (i = 1,...m), and H(r) the expected summary statistic under the null
model. If the expected test statistic H(r) is not known analytically, H(r) can be replaced by
\[
\bar{H}_i(r) = \frac{1}{m} \sum_{j = 0,\, j \neq i}^{m} \hat{H}_j(r) \qquad (2)
\]
which is the average over all summary statistics $\hat{H}_j(r)$ except the one with index i. Note that
$\bar{H}_0(r)$ yields the average over the summary statistic of all m simulated patterns and provides
therefore an unbiased estimate of H(r) under the null model [5: p. 14].
For the GoF test the ui are calculated for the observed data (i = 0) and for the simulated
data (i = 1...m) and the rank of u0 among all ui is determined. The observed P value of this test
is
\[
\hat{p} = 1 - \frac{\mathrm{rank}[u_0] - 1}{m + 1} \qquad (3)
\]
For example, if the u0 computed for the observed pattern was larger than the ui computed for
each of the m = 199 simulations of the null model we have rank[u0] = 200 and
$\hat{p} = 1 - (199 / 200) = 0.005$. Details can be found in Diggle [5] and Loosmore & Ford [17].
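Equations (1) to (3) can be combined into a short sketch. It takes the observed summary statistic and the m simulated ones, each evaluated at the same distances between rmin and rmax, and uses the leave-one-out mean of equation (2) in place of H(r). The function name gof_p_value and the conservative handling of ties are assumptions made for the example.

```python
def gof_p_value(observed, simulated):
    """Rank-based goodness-of-fit P value following equations (1)-(3).

    observed:  list of summary statistic values at the chosen distances
    simulated: list of m such lists, one per null model realization
    """
    curves = [observed] + simulated  # index 0 is the observed pattern
    m = len(simulated)
    n_r = len(observed)
    totals = [sum(c[k] for c in curves) for k in range(n_r)]
    u = []
    for c in curves:
        # Equation (2): mean of the other m curves; equation (1): squared deviations.
        u.append(sum((c[k] - (totals[k] - c[k]) / m) ** 2 for k in range(n_r)))
    # Rank of u0 among all m + 1 values (1 = smallest); ties do not raise the rank.
    rank = 1 + sum(1 for ui in u[1:] if ui < u[0])
    # Equation (3).
    return 1.0 - (rank - 1) / (m + 1)
```

If u0 exceeds all 199 simulated values, the function returns 1 - 199/200 = 0.005, matching the example above.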
It is recommended to use the GoF test together with visual inspection of simulation
envelopes [5] which are, for example, the 5th lowest and highest value of the summary
statistic at a given distance r calculated from the 199 simulated patterns. These simulation
envelopes are equivalent to a GoF test applied for the single distance r with a 5% error rate
(i.e., the null hypothesis is rejected if the rank of u0 is larger than 190) [18]. The GoF test
statistic ui reduces in this case to the quantity $u_i = [\hat{H}_i(r) - H(r)]^2$ where H(r) is the
expectation of the summary statistic for distance r under the null model [e.g., g12(r) = 1 for the
homogeneous Poisson null model in analysis 1].
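Pointwise simulation envelopes of this kind can be computed as sketched below: for every distance, the simulated values are sorted and the k-th lowest and k-th highest are taken (k = 5 for 199 simulations). The function name simulation_envelope and the list-of-lists data layout are assumptions of the sketch.

```python
def simulation_envelope(simulated, k=5):
    """Return (lower, upper) envelopes: k-th lowest and k-th highest value per distance.

    simulated: list of curves, each a list of summary statistic values
               evaluated at the same distances r.
    """
    n_r = len(simulated[0])
    lower, upper = [], []
    for j in range(n_r):
        values = sorted(curve[j] for curve in simulated)
        lower.append(values[k - 1])  # k-th lowest
        upper.append(values[-k])     # k-th highest
    return lower, upper
```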
Note that this GoF approach does not strictly test if the null model is accepted or
rejected, but only if the specific index u0 calculated for the observed pattern for the chosen
functional summary statistic over the specified distance interval (rmin, rmax) is within the range
of the ui calculated for the stochastic realizations i of the null model [17]. This means in
practice that the GoF test is somewhat sensitive to the distance interval selected. For example,
if the departure from the null model occurs only at small scales of say 5m, but the test is
conducted over an interval of 0-100 m, a true departure may be masked and go undetected.
Therefore, the P-value alone does not convey the nature of discrepancy between the data and
the null model. It should always be used in conjunction with visual inspection of the
simulation envelopes.
Scheme to characterize bivariate association in analysis 1
In analysis 1 we want to characterise how species 2 is distributed within neighbourhoods
(with radius r) around a species 1. This suggests use of cumulative summary statistics. The
spatial association between two species can be characterized by the cumulative distribution
function P12(n, r) that gives the probability of finding n trees of species 2 within
neighbourhoods of radius r around trees of species 1. If the point configurations between pairs
of trees of the two species are the same all over the study plot except for stochastic variation
(i.e. homogeneous patterns), we do not need the full distribution P12(n, r) to describe the
association between the two species. In this case the mean of P12(n, r) with respect to n
suffices, which is given by λ2K12(r). However, we cannot expect that a typical bivariate point
configuration exists at the three forest plots because the patterns of several species show
heterogeneities. We therefore need a second characteristic of P12(n, r) in order to characterize
the bivariate spatial association patterns more fully. This is because the same value of the
mean (i.e. λ2K12(r)) may arise for substantially different situations, e.g. if (i) all trees of
species 1 have more or less the same number of neighbours of species 2 (i.e. a homogeneous
pattern) or if (ii) a few trees of species 1 have many species 2 neighbours but many trees of
species 1 have no species 2 neighbours (an extremely heterogeneous pattern). Wiegand et al.
[2] selected the value of the distribution P12(n, r) at n = 0 as an additional summary statistic.
P12(n = 0, r) is the probability that a tree of species 1 has within distance r no neighbour of
species 2, i.e. P12(n = 0, r) = 1 - D12(r). Because the summary statistics K12(r) and D12(r)
express fundamentally different properties of bivariate point patterns [6], they are a good
choice for classifying different types of bivariate associations. The expectations of the two
summary statistics under the null model yield $D_{12}^{\mathrm{exp}}(r) = 1 - e^{-\lambda_2 \pi r^2}$ and
$K_{12}^{\mathrm{exp}}(r) = \pi r^2$, where the
“exp” superscript indicates “expected by the null model of no spatial patterning”. The two
axes of the scheme are defined as
\[
\hat{P}(r) = \hat{D}_{12}(r) - \left(1 - e^{-\lambda_2 \pi r^2}\right)
\]
\[
\hat{M}(r) = \ln\left(\hat{K}_{12}(r)\right) - \ln\left(\pi r^2\right)
\]
whereby the hat symbol indicates the observed value. We subtracted the theoretical values
under the null model to move null association onto the origin of the scheme (i.e. no departure
from the null model) and log-transformed the K-function in order to weight positive or
negative departures from the null model in the same way [2].
The two-axis scheme allows for four fundamental types of bivariate association. In the
case of “segregation” (type I), both the average number of neighbours within distance r and
the proportion of nearest neighbours within distance r are smaller than expected [i.e. $\hat{M}(r) < 0$
and $\hat{P}(r) < 0$]. In the case of “mixing” (type III), both are larger than expected [i.e. $\hat{M}(r) > 0$
and $\hat{P}(r) > 0$]. In the case of “partial overlap” (type II), the mean number of trees of species
2 within neighbourhoods of radius r around trees of species 1 is larger than would be expected
according to the null model [i.e. $\hat{M}(r) > 0$] and the probability that a tree of species 1 has a
neighbour of species 2 within distance r is smaller than expected [i.e. $\hat{P}(r) < 0$]. This configuration is only
possible for heterogeneous patterns if some trees of species 1 are surrounded at the given
neighbourhood r by many trees of species 2 but others are surrounded by few (or no) trees of
species 2. Finally, in the case opposite to partial overlap (type IV), trees of species 1 are
highly clustered and trees of species 2 overlap the cluster of species 1 (see figure 1f). As a
result, the mean number of species 2 neighbours is smaller than expected [i.e. $\hat{M}(r) < 0$], but the
probability to have the nearest neighbour of species 2 within distance r is larger than expected
[i.e. $\hat{P}(r) > 0$]. This is because a few trees of species 2 are in fact the nearest neighbour of
most trees of the highly clustered species 1. Type IV associations will rarely occur [2].
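The two axes and the four quadrants of the scheme can be sketched as follows. The code only reads off the signs of the two axes for given observed values of K12(r) and D12(r); it does not assess significance, it requires an observed K12(r) > 0 for the logarithm, and the function names are assumptions made for the example.

```python
import math

def association_axes(k_obs, d_obs, r, lambda2):
    """Return the axes (P, M) of the scheme for one neighbourhood radius r."""
    p_axis = d_obs - (1.0 - math.exp(-lambda2 * math.pi * r * r))
    m_axis = math.log(k_obs) - math.log(math.pi * r * r)
    return p_axis, m_axis

def association_type(p_axis, m_axis):
    """Map the signs of the two axes onto the four association types."""
    if m_axis < 0 and p_axis < 0:
        return "segregation"      # type I
    if m_axis > 0 and p_axis < 0:
        return "partial overlap"  # type II
    if m_axis > 0 and p_axis > 0:
        return "mixing"           # type III
    if m_axis < 0 and p_axis > 0:
        return "type IV"
    return "on an axis"           # at least one axis shows no departure
```

For example, with r = 10 and λ2 = 0.01, an observed K12 of twice πr^2 combined with an observed D12 of 0.5 (below the expected value of about 0.96) falls into the partial overlap quadrant.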
Determining the association type in analysis 1
For each neighbourhood r we classified the bivariate pattern of a given species pair into one
of the five association types “no association”, “segregation”, “partial overlap”, “mixing” and
“type IV” and counted the number of cases for the different distances r (see figure 2). The
simplest approach to accomplish this would be to determine if the values of the two observed
summary statistics D12(r) and K12(r) were for distance r outside the simulation envelopes. If
both observed summary statistics were located within the simulation envelopes the “no
association” type would be assigned to this species pair, and if at least one of the summary
statistics were located outside the simulation envelope one of the four remaining types would
be assigned as explained above. However, this approach is prone to problems associated with
multiple testing because we repeated this assessment for different distances r. We therefore
eliminated in a previous step all species pairs for which the GoF test conducted over the entire
2-50m distance interval did not detect significant departures from the null model. We
assigned these species pairs for all neighbourhoods r the “no association” type.
We conducted the GoF test for two summary statistics, the distribution function D12(r)
of the distances to the nearest neighbour and the L-function L12(r). Use of the L-function instead
of the K-function is recommended here to stabilize the variance [18]. Because we used two summary
statistics D12(r) and L12(r) simultaneously to assess departures from the null model we
conducted the GoF test for each summary statistic with an error rate of 2.5% which yields an
approximate error rate of 5% for both summary statistics together.
For species pairs that did not fit the null model we counted for a given association type
the number of cases where the observed value of the summary statistic was located outside the
simulation envelopes. This occurred if the GoF test for distance r yielded for at least one of
the two summary statistics L12(r) and D12(r) a rank larger than 195. This corresponds to a
2.5% error rate for a single summary statistic and an error rate of ≈ 5% for both summary
statistics together. The species pair was then assigned to the corresponding association type
(i.e., segregation, partial overlap, mixing, or type IV).
Determining the association type in analysis 2
For each distance r we classified the bivariate pattern of a given species pair into one of the
three interaction types “no interaction”, “repulsion” and “attraction”. If the GoF test
conducted with the pair correlation function over the distance interval 2-30m indicated no
significant departure from the heterogeneous Poisson null model, the species pair was
assigned the “no interaction” type for all distances r. However, even if the GoF test indicated
an overall departure from the null model, the significant effect may not occur at a given
neighbourhood r. We therefore assigned a species pair at neighbourhood r the “no
interaction” type if the observed pair correlation function was inside the simulation envelopes
with a 5% error rate (i.e., between the fifth lowest and fifth highest values). If the observed value of the pair
correlation function was below or above the envelopes the species pair showed repulsion or
attraction, respectively.
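The decision rule for a single distance r can be summarised in a short sketch. It assumes that the overall GoF result and the envelope values at distance r have already been computed; the function name interaction_type and its arguments are assumptions made for the example.

```python
def interaction_type(g_obs, lower, upper, gof_significant):
    """Classify one species pair at one distance r in analysis 2.

    g_obs:           observed pair correlation function at distance r
    lower, upper:    simulation envelope values at distance r
    gof_significant: True if the GoF test over 2-30 m rejected the null model
    """
    if not gof_significant:
        return "no interaction"  # assigned for all distances r
    if g_obs < lower:
        return "repulsion"
    if g_obs > upper:
        return "attraction"
    return "no interaction"      # within the envelopes at this distance
```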
REFERENCES
1. Wang, X., Wiegand, T., Hao, Z., Li, B., Ye, J. & Zhang, J. 2010 Species associations
in an old-growth temperate forest in north-eastern China. J. Ecol. 98, 674–686.
(doi: 10.1111/j.1365-2745.2010.01644.x)
2. Wiegand, T., Gunatilleke, C.V.S. & Gunatilleke, I.A.U.N. 2007a. Species
associations in a heterogeneous Sri Lankan Dipterocarp forest. Am. Nat. 170, E77–E95.
(doi: 10.1086/521240)
3. Ripley, B.D. 1981 Spatial statistics, New York: Wiley.
4. Stoyan, D. & Stoyan, H. 1994 Fractals, random shapes and point fields: methods of
geometrical statistics, New York: Wiley.
5. Diggle, P.J. 2003 Statistical analysis of point patterns, London: Arnold.
6. Illian, J., Penttinen, A., Stoyan, H. & Stoyan, D. 2008 Statistical analysis and
modeling of spatial point patterns. Chichester: Wiley and Sons.
7. Wiegand, T. & Moloney, K.A. 2004 Rings, circles, and null-models for point pattern
analysis in ecology. Oikos 104, 209-229. (doi: 10.1111/j.0030-1299.2004.12497.x)
8. Harms, K.E., Condit, R., Hubbell, S.P. & Foster, R.B. 2001 Habitat associations of
trees and shrubs in a 50-ha neotropical forest plot. J. Ecol. 89, 947-959. (doi:
10.1111/j.1365-2745.2001.00615.x)
9. Comita, L.S., Condit, R. & Hubbell, S.P. 2007 Developmental changes in habitat
associations of tropical trees. J. Ecol. 95, 482-492. (doi:
10.1111/j.1365-2745.2007.01229.x)
10. Seidler, T.G. & Plotkin, J.B. 2006 Seed dispersal and spatial pattern in tropical trees.
PLoS. Biol. 4, 2132–2137. (doi:10.1371/journal.pbio.0040344)
11. Wiegand, T., Gunatilleke, C.V.S., Gunatilleke, I.A.U.N. & Huth, A. 2007b How
individual species structure diversity in tropical forests. Proc. Natl. Acad. Sci. U.S.A.
104, 19029-19033. (doi: 10.1073/pnas.0705621104)
12. Hubbell, S.P., Ahumada, J.A., Condit, R. & Foster, R.B. 2001 Local neighborhood
effects on long-term survival of individual trees in a neotropical forest. Ecol. Res. 16,
859–875. (doi: 10.1046/j.1440-1703.2001.00445.x)
13. Uriarte, M., Condit, R., Canham, C.D. & Hubbell, S.P. 2004 A spatially explicit model
of sapling growth in a tropical forest: does the identity of neighbours matter? J. Ecol.
92, 348–360. (doi: 10.1111/j.0022-0477.2004.00867.x)
14. Stoll, P. & Newbery, D.M. 2005 Evidence of species-specific neighborhood effects in
the dipterocarpaceae of a Bornean rain forest. Ecology 86, 3048–3062. (doi:
10.1890/04-1540)
15. Uriarte, M., Hubbell, S.P., John, R., Condit, R. & Canham, C.D. 2005 Neighborhood
effects on sapling growth and survival in a neotropical forest and the ecological
equivalence hypothesis. In Biotic interactions in the tropics: their role in the
maintenance of species diversity (eds D. Burslem, M. Pinard & S. Hartley), pp. 89–
106. Cambridge: Cambridge University Press.
16. Law, R., Illian, J., Burslem, D.F.R.P., Gratzer, G., Gunatilleke, C.V.S. & Gunatilleke,
I.A.U.N. 2009 Ecological information from spatial patterns of plants: insights from
point process theory. J. Ecol. 97, 616–628. (doi: 10.1111/j.1365-2745.2009.01510.x)
17. Loosmore, N.B., & Ford, E.D. 2006 Statistical inference using the G or K point
pattern spatial statistics. Ecology 87, 1925-1931.
(doi: 10.1890/0012-9658(2006)87[1925:SIUTGO]2.0.CO;2)
18. Grabarnik, P., Myllymäki, M. & Stoyan, D. 2011 Correct testing of mark
independence for marked point patterns. Ecol. Model. 222, 3888-3894. (doi:
10.1016/j.ecolmodel.2011.10.005)
19. Plotkin, J.B., Potts, M.D., Leslie, N., Manokaran, N., LaFrankie, J. & Ashton, P.S.
2000 Species-area curves, spatial aggregation, and habitat specialization in tropical
forests. J. Theor. Biol. 207, 81–99. (doi:10.1006/jtbi.2000.2158)