ELECTRONIC SUPPLEMENTARY MATERIAL

Appendix A. Detailed description of methods of point pattern analysis (adapted from Wang et al. [1] and Wiegand et al. [2]). In this Appendix A only, reference numbers refer to a separate reference list provided at the end of the appendix.

Summary statistics

To quantify the spatial patterns found at the three forests we used recent techniques of spatial point pattern analysis [3-6] and summary statistics such as the pair-correlation function [4], Ripley’s K-function [3], and the cumulative distribution function of nearest neighbour distances [5]. The bivariate pair-correlation function g12(r) can be estimated using the quantity λ2g12(r), which is the mean density of trees of species 2 at distance r away from trees of the focal species 1, whereby λ2 is the mean density of trees of species 2 in the whole study area. The estimation of the pair-correlation function requires a kernel function to define “at distance r”, which in essence places rings with radius r and width dr around the focal points [6,7]. Ripley’s K-function K(r) [3] is the cumulative version of the pair-correlation function, i.e. the quantity λ2K12(r) is the average number of trees of species 2 within distance r from trees of the focal species 1. For simulation envelopes or goodness-of-fit tests, the transformation L12(r) = (K12(r)/π)^0.5 - r of the K-function is usually used because it stabilizes the variance [5,6] and has an expectation of L12(r) = 0 under independence, as opposed to the expectation K12(r) = πr² of the K-function. Although the pair-correlation function and the K-function are closely related [i.e., dK12(r)/dr = 2π r g12(r)], the K-function is a cumulative statistic and needs careful interpretation. When using the K-function, interaction effects at small distances (such as repulsion) are only gradually diluted by independence at larger distances and can give a superficial impression of repulsion over longer distances than operates in reality [7].
Therefore the pair-correlation function is better suited to quantify scale-dependent interaction effects in analysis 2. However, if the interest is to quantify how individuals of a species 2 are distributed within neighbourhoods of individuals of a species 1 (i.e., analysis 1), cumulative summary statistics are better suited. To describe additional characteristics of the spatial patterns we used the bivariate distribution function D12(r), which gives the fraction of trees of the focal species 1 that have their nearest species 2 neighbour within distance r [5,6]. Note that D(r) is often referred to as G(y) in the literature [5], but we have adopted the notation of the recent textbook by Illian et al. [6].

The g12-, K12- and D12-statistics are usually interpreted for homogeneous patterns to indicate interactions among pairs of points. In this case they reflect properties of a “typical tree” of the pattern [6]. However, the bivariate patterns at our study site are certainly not all homogeneous, which means that a typical tree of a pattern may not exist. Instead we interpreted the g- and K-functions as averages taken over all trees of the focal pattern and designed our analyses and null models so as to account for potential heterogeneities.

Homogeneous Poisson null model (analysis 1)

Our basic question was conceptually simple: we wanted to know how the trees of a given species 2 were distributed within local neighbourhoods of the trees of a focal species 1. Did they occur on average more (or less) frequently within the neighbourhoods than expected by chance alone, and was this association homogeneous or heterogeneous? In the heterogeneous case this distribution varies substantially among trees of the focal species, e.g. some species 1 trees may have many species 2 neighbours but other species 1 trees have few species 2 neighbours.
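The cumulative statistics K12(r) and D12(r) introduced above can be illustrated with a minimal Python sketch. It is not the estimator used in the analyses: it assumes a rectangular plot of known area, applies no edge correction (published analyses normally use edge-corrected estimators), and the function names `cross_distances`, `k12_naive`, `d12_naive` and `l12` are hypothetical names chosen for illustration.

```python
import numpy as np

def cross_distances(p1, p2):
    """Euclidean distances between all species 1 and species 2 trees, shape (n1, n2)."""
    diff = p1[:, None, :] - p2[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def k12_naive(p1, p2, radii, area):
    """Naive K12(r): mean number of species 2 trees within r of a species 1 tree,
    divided by the overall density lambda2 (assumption: no edge correction)."""
    d = cross_distances(p1, p2)
    lam2 = len(p2) / area
    mean_counts = np.array([(d <= r).sum(axis=1).mean() for r in radii])
    return mean_counts / lam2

def d12_naive(p1, p2, radii):
    """Naive D12(r): fraction of species 1 trees whose nearest species 2
    neighbour lies within distance r (assumption: no edge correction)."""
    nearest = cross_distances(p1, p2).min(axis=1)
    return np.array([(nearest <= r).mean() for r in radii])

def l12(k, radii):
    """Variance-stabilising transformation L12(r) = sqrt(K12(r) / pi) - r."""
    return np.sqrt(np.asarray(k) / np.pi) - np.asarray(radii)

# Toy coordinates (metres) in a hypothetical 10 m x 10 m plot.
species1 = np.array([[0.0, 0.0], [10.0, 0.0]])
species2 = np.array([[1.0, 0.0], [0.0, 2.0], [10.0, 3.0]])
radii = np.array([1.5, 2.5, 3.5])

k_values = k12_naive(species1, species2, radii, area=100.0)
d_values = d12_naive(species1, species2, radii)
l_values = l12(k_values, radii)
```

In the toy data one species 1 tree has two close species 2 neighbours and the other has only one, further away, so K12 (a mean over focal trees) and D12 (a proportion of focal trees) respond differently to the same configuration.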
To distinguish the various types of spatial associations from those that may arise purely by chance (individuals of species 2 co-occur within a neighbourhood of species 1 at a frequency no different from that expected by chance alone), we compared the observed bivariate point patterns with a null model in which the locations of the focal species remained unchanged but trees of species 2 were distributed randomly and independently of the locations of species 1 (i.e. a homogeneous Poisson process). Clearly, testing against this null model is often not very informative [7]; however, we used this test to quantify and categorize the overall bivariate spatial associations based on a scheme developed by Wiegand et al. [2]. The scheme uses the bivariate K12(r) and D12(r) as test statistics and distinguishes four significant types of spatial associations that may occur between two (possibly heterogeneous) patterns. A “no significant” type arises if neither K12(r) nor D12(r) shows significant departures from the homogeneous Poisson null model.

Heterogeneous Poisson null model (analysis 2)

The external effect of the environment often follows topographic gradients [8,9], and Seidler & Plotkin [10] found that cluster sizes in tropical forests were related to seed dispersal syndromes, ranging from approximately 20 m for ballistic dispersal, 50 m for gravity, gyration and wind dispersal, and >100 m for animal dispersal. Thus, at our scale of observation (i.e., 25-50 ha), the external effect of the environment and that of dispersal limitation result in patchy distribution patterns predominantly on intermediate spatial scales. However, the impact of direct species interactions such as competition or facilitation is limited to the immediate neighbourhood of the trees.
Thus, acknowledging the multi-scale nature of the spatial association patterns, we selectively studied the small-scale association pattern by using a null model that randomizes the trees of species 2 conditionally on their observed large-scale pattern while the trees of species 1 are kept unchanged. In practice, this can be done by displacing the known locations of trees randomly within a neighbourhood with radius R (i.e. a heterogeneous Poisson null model [2,11]). This displacement leaves the density λ2(x) of species 2 unchanged, but the local displacement of species 2 removes any potential signal of small-scale interactions at scales r < R. Contrasting the observed pattern to realizations of this null model will therefore detect only small-scale interspecific interaction effects.

While this analysis can be conducted for any displacement distance R, it is desirable to use a distance that is likely to separate biological effects. In general it is expected that direct interactions among larger trees only occur within a limited spatial separation (say < 30 m). For example, Hubbell et al. [12] found that the neighbourhood effects of conspecific density on survival disappeared within approximately 12–15 m of the focal plant. Several other studies using individual-based analyses of local neighbourhood effects on tree growth and survival confirmed this result [13-15], suggesting that direct plant–plant interactions in forests may fade away at larger scales. We therefore used a separation distance of R = 30 m [2].

An interesting question is whether separation of scales occurred; this can be tested in a simple way. Because the heterogeneous Poisson process conditions on the spatial structure for scales larger than 30 m, it is only able to indicate significant effects at scales smaller than 30 m. In cases without separation of scales we therefore expect that the frequency of significant effects, taken over all pairs of species, should fade away smoothly at 30 m.
However, if small-scale effects operate only over a short range (i.e. r << 30 m), the frequency of significant effects should disappear well below the threshold of 30 m.

Technically, we implemented this null model as a heterogeneous Poisson process [7] for the second species (the individuals of the focal species remain unchanged), where the intensity function λ2(x) is estimated non-parametrically using an Epanechnikov kernel with a bandwidth of R = 30 m. The Epanechnikov kernel is defined as

$$ e_R(d) = \begin{cases} \dfrac{3}{4R}\left(1 - \dfrac{d^2}{R^2}\right) & -R \le d \le R \\ 0 & \text{otherwise} \end{cases} $$

where d is the distance from a focal point, and R the bandwidth. For a given location (x, y) the intensity λ(x, y) is constructed by using a moving window with circular shape and radius R around location (x, y) and summing up all points in the circle, but weighting them with the factor e_R(d) according to their distance d from the focal location (x, y). Clearly, the intensity estimate depends on the bandwidth R: for large R one obtains smooth intensity functions, and for small R the estimated function is rough and may obscure the fundamental structure of the distribution [4]. We used a biological argument and defined the bandwidth R = 30 m as the maximal scale at which second-order effects are expected in tropical forests.

Goodness-of-fit test to assess significance of bivariate patterns against a null model

For a given species pair we contrasted the observed summary statistics to those expected under an appropriate null model. We used a Monte Carlo approach to test for significant departures from the null model. To this end, we generated for each species pair 199 realizations of the given null model and used a goodness-of-fit (GoF) test to evaluate the overall ability of the null model to fit a summary statistic for a given distance interval [5,6,17,18]. The GoF test collapses the scale-dependent information of a functional summary statistic [e.g., g12(r)] into a single index $u_i$.
The index $u_i$ represents the accumulated deviation of the observed summary statistic from the expected summary statistic under the null model, summed up over an appropriate distance interval (rmin, rmax) (i.e., a Cramer-von Mises type statistic as used e.g. in Plotkin et al. [19]):

$$ u_i = \sum_{r = r_{\min}}^{r_{\max}} \left( \hat{H}_i(r) - H(r) \right)^2 \qquad (1) $$

where $\hat{H}_i(r)$ is the empirical summary statistic of the observed pattern (i = 0) and that of the simulated patterns (i = 1,...,m), and H(r) is the expected summary statistic under the null model. If the expected test statistic H(r) is not known analytically, H(r) can be replaced by

$$ \bar{H}_i(r) = \frac{1}{m} \sum_{j = 0,\, j \neq i}^{m} \hat{H}_j(r) \qquad (2) $$

which is the average over all summary statistics $\hat{H}_j(r)$ except the one with index i. Note that $\bar{H}_0(r)$ yields the average over the summary statistics of all m simulated patterns and therefore provides an unbiased estimate of H(r) under the null model [5: p. 14]. For the GoF test the $u_i$ are calculated for the observed data (i = 0) and for the simulated data (i = 1,...,m), and the rank of $u_0$ among all $u_i$ is determined. The observed P value of this test is

$$ \hat{p} = 1 - \frac{\mathrm{rank}[u_0] - 1}{m + 1} \qquad (3) $$

For example, if the $u_0$ computed for the observed pattern was larger than the $u_i$ computed for each of the m = 199 simulations of the null model, we have rank[$u_0$] = 200 and $\hat{p} = 1 - (199/200) = 0.005$. Details can be found in Diggle [5] and Loosmore & Ford [17].

It is recommended to use the GoF test together with visual inspection of simulation envelopes [5], which are, for example, the 5th lowest and highest values of the summary statistic at a given distance r calculated from the 199 simulated patterns. These simulation envelopes are equivalent to a GoF test applied for the single distance r with a 5% error rate (i.e., the null hypothesis is rejected if the rank of $u_0$ is larger than 190) [18].
The GoF test statistic $u_i$ reduces in this case to the quantity $u_i = [\hat{H}_i(r) - H(r)]^2$, where H(r) is the expectation of the summary statistic for distance r under the null model [e.g., g12(r) = 1 for the homogeneous Poisson null model in analysis 1].

Note that this GoF approach does not strictly test whether the null model is accepted or rejected, but only whether the specific index $u_0$ calculated for the observed pattern, for the chosen functional summary statistic over the specified distance interval (rmin, rmax), is within the range of the $u_i$ calculated for the stochastic realizations i of the null model [17]. In practice this means that the GoF test is somewhat sensitive to the distance interval selected. For example, if the departure from the null model occurs only at small scales of, say, 5 m, but the test is conducted over an interval of 0-100 m, a true departure may be masked and not detected. Therefore, the P-value alone does not convey the nature of the discrepancy between the data and the null model. It should always be used in conjunction with visual inspection of the simulation envelopes.

Scheme to characterize bivariate association in analysis 1

In analysis 1 we want to characterise how species 2 is distributed within neighbourhoods (with radius r) around trees of species 1. This suggests use of cumulative summary statistics. The spatial association between two species can be characterized by the distribution function P12(n, r) that gives the probability of finding n trees of species 2 within neighbourhoods of radius r around trees of species 1. If the point configurations between pairs of trees of the two species are the same all over the study plot except for stochastic variation (i.e. homogeneous patterns), we do not need the full distribution P12(n, r) to describe the association between the two species. In this case the mean of P12(n, r) with respect to n suffices, which is given by λ2K12(r).
However, we cannot expect that a typical bivariate point configuration exists at the three forest plots because the patterns of several species show heterogeneities. We therefore need a second characteristic of P12(n, r) in order to characterize the bivariate spatial association patterns more fully. This is because the same value of the mean (i.e. λ2K12(r)) may arise for substantially different situations, e.g. if (i) all trees of species 1 have more or less the same number of neighbours of species 2 (i.e. a homogeneous pattern) or if (ii) a few trees of species 1 have many species 2 neighbours but many trees of species 1 have no species 2 neighbours (an extremely heterogeneous pattern). Wiegand et al. [2] selected the value of the distribution P12(n, r) at n = 0 as an additional summary statistic. P12(n = 0, r) is the probability that a tree of species 1 has no neighbour of species 2 within distance r, i.e. P12(n = 0, r) = 1 - D12(r). Because the summary statistics K12(r) and D12(r) express fundamentally different properties of bivariate point patterns [6], they are a good choice for classifying different types of bivariate associations.

The expectations of the two summary statistics under the null model are $D_{12}^{\mathrm{exp}}(r) = 1 - e^{-\lambda_2 \pi r^2}$ and $K_{12}^{\mathrm{exp}}(r) = \pi r^2$, where the “exp” superscript indicates “expected by the null model of no spatial patterning”. The two axes of the scheme are defined as

$$ \hat{P}(r) = \hat{D}_{12}(r) - \left(1 - e^{-\lambda_2 \pi r^2}\right) $$

$$ \hat{M}(r) = \ln\left(\hat{K}_{12}(r)\right) - \ln\left(\pi r^2\right) $$

whereby the hat symbol indicates the observed value. We subtracted the theoretical values under the null model to move null association onto the origin of the scheme (i.e. no departure from the null model) and log-transformed the K-function in order to weight positive and negative departures from the null model in the same way [2]. The two-axis scheme allows for four fundamental types of bivariate association.
In the case of “segregation” (type I), both the average number of neighbours within distance r and the proportion of nearest neighbours within distance r are smaller than expected [i.e. $\hat{M}(r)$ < 0 and $\hat{P}(r)$ < 0]. In the case of “mixing” (type III), both are larger than expected [i.e. $\hat{M}(r)$ > 0 and $\hat{P}(r)$ > 0]. In the case of “partial overlap” (type II), the mean number of trees of species 2 within neighbourhoods of radius r around trees of species 1 is larger than would be expected according to the null model [i.e. $\hat{M}(r)$ > 0], and the probability that a tree of species 1 has no neighbour of species 2 within distance r is larger than expected [i.e. $\hat{P}(r)$ < 0]. This configuration is only possible for heterogeneous patterns in which some trees of species 1 are surrounded at the given neighbourhood r by many trees of species 2 but others are surrounded by few (or no) trees of species 2. Finally, in the case opposite to partial overlap (type IV), trees of species 1 are highly clustered and trees of species 2 overlap the cluster of species 1 (see figure 1f). As a result, the mean number of species 2 neighbours is smaller than expected [$\hat{M}(r)$ < 0], but the probability of having the nearest neighbour of species 2 within distance r is larger than expected [i.e. $\hat{P}(r)$ > 0]. This is because a few trees of species 2 are in fact the nearest neighbour of most trees of the highly clustered species 1. Type IV associations will rarely occur [2].

Determining the association type in analysis 1

For each neighbourhood r we classified the bivariate pattern of a given species pair into one of the five association types “no association”, “segregation”, “partial overlap”, “mixing” and “type IV” and counted the number of cases for the different distances r (i.e., figure 2). The simplest approach to accomplish this would be to determine whether the values of the two observed summary statistics D12(r) and K12(r) were outside the simulation envelopes for distance r.
If both observed summary statistics were located within the simulation envelopes, the “no association” type would be assigned to this species pair, and if at least one of the summary statistics were located outside the simulation envelope, one of the four remaining types would be assigned as explained above. However, this approach is prone to problems associated with multiple testing because we repeated this assessment for different distances r. We therefore eliminated in a preliminary step all species pairs for which the GoF test conducted over the entire 2-50 m distance interval did not detect significant departures from the null model. We assigned these species pairs the “no association” type for all neighbourhoods r. We conducted the GoF test for two summary statistics, the distribution function D12(r) of the distances to the nearest neighbour and the L-function L12(r). Use of the L-function instead of the K-function is recommended here to stabilize the variance [18]. Because we used the two summary statistics D12(r) and L12(r) simultaneously to assess departures from the null model, we conducted the GoF test for each summary statistic with an error rate of 2.5%, which yields an approximate error rate of 5% for both summary statistics together.

For species pairs that did not fit the null model we counted for a given association type the number of cases where the observed value of the summary statistic was located outside the simulation envelopes. This occurred if the GoF test for distance r yielded a rank larger than 195 for at least one of the two summary statistics L12(r) and D12(r). This corresponds to a 2.5% error rate for a single summary statistic and an error rate of 5% for both summary statistics together. The species pair was then assigned to the corresponding association type (i.e., segregation, partial overlap, mixing, or type IV).
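The GoF test of equations (1)-(3) and the two-axis classification of analysis 1 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the code used for the analyses: the function names `gof_p_value` and `association_type` are hypothetical, the expected statistic is always replaced by the leave-one-out mean of equation (2), ties in the ranking are resolved in the conservative direction, and a value lying exactly on an axis is arbitrarily counted as positive.

```python
import numpy as np

def gof_p_value(h_obs, h_sim):
    """GoF test for one summary statistic.

    h_obs: observed statistic at n_r distances, shape (n_r,)
    h_sim: statistic for m null-model simulations, shape (m, n_r)
    Returns (u, rank, p) with u[0] belonging to the observed pattern.
    """
    h_all = np.vstack([h_obs, h_sim])             # index 0 = observed pattern
    m = h_sim.shape[0]
    # Equation (2): mean over all patterns except pattern i.
    h_bar = (h_all.sum(axis=0) - h_all) / m
    # Equation (1): squared deviations summed over the distance interval.
    u = ((h_all - h_bar) ** 2).sum(axis=1)
    # Rank of u0 among all u_i (1 = smallest); ties keep the rank low.
    rank = 1 + int(np.sum(u[1:] < u[0]))
    # Equation (3).
    p = 1.0 - (rank - 1) / (m + 1)
    return u, rank, p

def association_type(k_obs, d_obs, r, lam2, significant):
    """Classify one species pair at distance r; k_obs must be positive."""
    if not significant:
        return "no association"
    m_axis = np.log(k_obs) - np.log(np.pi * r ** 2)
    p_axis = d_obs - (1.0 - np.exp(-lam2 * np.pi * r ** 2))
    if m_axis < 0:
        return "segregation" if p_axis < 0 else "type IV"
    return "partial overlap" if p_axis < 0 else "mixing"

# Toy example: 199 simulated curves near zero and a clearly deviating observation.
h_sim = np.tile(np.linspace(-0.1, 0.1, 199)[:, None], (1, 3))
h_obs = np.array([5.0, 5.0, 5.0])
u, rank, p = gof_p_value(h_obs, h_sim)

# With 199 simulations, a rank above 195 corresponds to the 2.5% error rate.
significant = rank > 195
label = association_type(k_obs=100.0, d_obs=0.5, r=10.0, lam2=0.01,
                         significant=significant)
```

In the toy example the observed index exceeds all simulated ones, so the rank is 200 and the P value follows from equation (3). Both observed statistics lie below their null expectations, which places the pair in the lower-left quadrant of the scheme.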
Determining the association type in analysis 2

For each distance r we classified the bivariate pattern of a given species pair into one of the three interaction types “no interaction”, “repulsion” and “attraction”. If the GoF test conducted with the pair-correlation function over the distance interval 2-30 m indicated no significant departure from the heterogeneous Poisson null model, the species pair was assigned the “no interaction” type for all distances r. However, even if the GoF test indicated an overall departure from the null model, the significant effect may not occur at a given neighbourhood r. We therefore assigned a species pair at neighbourhood r the “no interaction” type if the observed pair-correlation function was within the simulation envelopes with a 5% error rate (i.e., the fifth lowest and highest values). If the observed value of the pair-correlation function was below or above the envelopes, the species pair showed repulsion or attraction, respectively.

REFERENCES

1. Wang, X., Wiegand, T., Hao, Z., Li, B., Ye, J. & Zhang, J. 2010 Species associations in an old-growth temperate forest in north-eastern China. J. Ecol. 98, 674–686. (doi:10.1111/j.1365-2745.2010.01644.x)
2. Wiegand, T., Gunatilleke, C.V.S. & Gunatilleke, I.A.U.N. 2007a Species associations in a heterogeneous Sri Lankan Dipterocarp forest. Am. Nat. 170, E77–E95. (doi:10.1086/521240)
3. Ripley, B.D. 1981 Spatial statistics. New York: Wiley.
4. Stoyan, D. & Stoyan, H. 1994 Fractals, random shapes and point fields: methods of geometrical statistics. New York: Wiley.
5. Diggle, P.J. 2003 Statistical analysis of point patterns. London: Arnold.
6. Illian, J., Penttinen, A., Stoyan, H. & Stoyan, D. 2008 Statistical analysis and modeling of spatial point patterns. Chichester: Wiley and Sons.
7. Wiegand, T. & Moloney, K.A. 2004 Rings, circles, and null-models for point pattern analysis in ecology. Oikos 104, 209–229. (doi:10.1111/j.0030-1299.2004.12497.x)
8. Harms, K.E., Condit, R., Hubbell, S.P. & Foster, R.B. 2001 Habitat associations of trees and shrubs in a 50-ha neotropical forest plot. J. Ecol. 89, 947–959. (doi:10.1111/j.1365-2745.2001.00615.x)
9. Comita, L.S., Condit, R. & Hubbell, S.P. 2007 Developmental changes in habitat associations of tropical trees. J. Ecol. 95, 482–492. (doi:10.1111/j.1365-2745.2007.01229.x)
10. Seidler, T.G. & Plotkin, J.B. 2006 Seed dispersal and spatial pattern in tropical trees. PLoS Biol. 4, 2132–2137. (doi:10.1371/journal.pbio.0040344)
11. Wiegand, T., Gunatilleke, C.V.S., Gunatilleke, I.A.U.N. & Huth, A. 2007b How individual species structure diversity in tropical forests. Proc. Natl. Acad. Sci. U.S.A. 104, 19029–19033. (doi:10.1073/pnas.0705621104)
12. Hubbell, S.P., Ahumada, J.A., Condit, R. & Foster, R.B. 2001 Local neighborhood effects on long-term survival of individual trees in a neotropical forest. Ecol. Res. 16, 859–875. (doi:10.1046/j.1440-1703.2001.00445.x)
13. Uriarte, M., Condit, R., Canham, C.D. & Hubbell, S.P. 2004 A spatially explicit model of sapling growth in a tropical forest: does the identity of neighbours matter? J. Ecol. 92, 348–360. (doi:10.1111/j.0022-0477.2004.00867.x)
14. Stoll, P. & Newbery, D.M. 2005 Evidence of species-specific neighborhood effects in the dipterocarpaceae of a Bornean rain forest. Ecology 86, 3048–3062. (doi:10.1890/04-1540)
15. Uriarte, M., Hubbell, S.P., John, R., Condit, R. & Canham, C.D. 2005 Neighborhood effects on sapling growth and survival in a neotropical forest and the ecological equivalence hypothesis. In Biotic interactions in the tropics: their role in the maintenance of species diversity (eds D. Burslem, M. Pinard & S. Hartley), pp. 89–106. Cambridge: Cambridge University Press.
16. Law, R., Illian, J., Burslem, D.F.R.P., Gratzer, G., Gunatilleke, C.V.S. & Gunatilleke, I.A.U.N. 2009 Ecological information from spatial patterns of plants: insights from point process theory. J. Ecol. 97, 616–628. (doi:10.1111/j.1365-2745.2009.01510.x)
17. Loosmore, N.B. & Ford, E.D. 2006 Statistical inference using the G or K point pattern spatial statistics. Ecology 87, 1925–1931. (doi:10.1890/0012-9658(2006)87[1925:SIUTGO]2.0.CO;2)
18. Grabarnik, P., Myllymäki, M. & Stoyan, D. 2011 Correct testing of mark independence for marked point patterns. Ecol. Model. 222, 3888–3894. (doi:10.1016/j.ecolmodel.2011.10.005)
19. Plotkin, J.B., Potts, M.D., Leslie, N., Manokaran, N., LaFrankie, J. & Ashton, P.S. 2000 Species-area curves, spatial aggregation, and habitat specialization in tropical forests. J. Theor. Biol. 207, 81–99. (doi:10.1006/jtbi.2000.2158)