Incorporating dominant species as proxies for biotic interactions strengthens plant community models

Peter C. le Roux, Loïc Pellissier, Mary S. Wisz and Miska Luoto

Appendix S1. Supplementary materials

Supplementary materials and methods

Vascular plant species cover and environmental characteristics were quantified at two sites on the Saana massif (69° N 20° E) in north-western Finland. The study sites were located on a north- and a south-facing slope, separated by 2.5 km. Both sites were above the birch (Betula pubescens ssp. czerepanovii) treeline, at c. 700 m a.s.l. in the Saana Nature Reserve. At the nearby Kilpisjärvi research station (< 2 km away; 480 m a.s.l.) January and July temperatures average -13.4 and 11.0 °C, with mean annual precipitation of 458 mm (1961–2011; Finnish Meteorological Institute, Finland).

The mesotopography of each quadrat was classified on a ten-point scale following Bruun et al. (2006), with convex ridge tops assigned the maximum value and the bottom of depressions the minimum. Soil moisture and temperature were measured in each quadrat during the peak growing season (16 & 17 July 2012 for the north and south sites respectively; > 24 hours after previous rainfall) using a hand-held time-domain reflectometry sensor (FieldScout TDR 300, Spectrum Technologies, Plainfield, IL, USA; using 7.5 cm sensor rods) and a digital temperature probe (TFX 392 SKW-T thermometer, Ebro Electronic, Ingolstadt, Germany; 10 cm depth; see Aalto, le Roux & Luoto 2013 for details). Maximum potential solar radiation (i.e. assuming clear sky conditions) was calculated for each quadrat (McCune & Keon 2002), using slope and aspect values recorded in the field. Soil samples were collected at 4 m intervals across each grid, after which soil pH was determined in the Laboratory of Geoscience and Geography (University of Helsinki) from air-dried soil samples following the standardized ISO 10390:1994(E) procedure.
Bilinear interpolation was subsequently used to estimate the pH within each quadrat. Rock cover (i.e. percentage areal cover) was visually estimated in each quadrat from exposed rock.

All analyses were conducted in R statistical software (R Development Core Team 2011), using the mgcv (Wood 2011) and gbm (Elith et al. 2008) packages to implement generalized additive models and boosted regression trees, and the quantreg package (Koenker 2009) to run quantile regression.

References for supplementary materials

Aalto, J., le Roux, P. C. & Luoto, M. (2013) Vegetation mediates soil temperature and moisture in arctic-alpine environments. Arctic, Antarctic, and Alpine Research, 45, 111.

Bruun, H. H., Moen, J., Virtanen, R., Grytnes, J. A., Oksanen, L. & Angerbjörn, A. (2006) Effects of altitude and topography on species richness of vascular plants, bryophytes and lichens in alpine communities. Journal of Vegetation Science, 17, 37-46.

Elith, J., Leathwick, J. R. & Hastie, T. (2008) A working guide to boosted regression trees. Journal of Animal Ecology, 77, 802-813.

Koenker, R. (2009) quantreg: Quantile Regression. Retrievable from http://CRAN.R-project.org/package=quantreg.

McCune, B. & Keon, D. (2002) Equations for potential annual direct incident radiation and heat load. Journal of Vegetation Science, 13, 603-606.

R Development Core Team (2011) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

Wood, S. N. (2011) Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73, 3-36.

Appendix S2.
Modelling the cover of dominant species (Supplementary materials and results)

Analyses in this study use the observed cover of three dominant plant species as predictor variables, assuming plant cover to be a reasonable proxy for the frequency (and therefore also total intensity) with which these species interact with, and impact upon, co-occurring subdominant plant species. Since Betula nana, Empetrum nigrum ssp. hermaphroditum and Juniperus communis contributed significantly to explaining patterns of community richness, composition and functional structure at our two study sites, accurate predictions of their cover outside of our study location would likely benefit models of arctic-alpine tundra vegetation elsewhere in this habitat type.

To determine the accuracy with which the cover of the three species could be predicted we used a six-fold non-random cross-validation method (implemented using generalized linear models, generalized additive models and boosted regression trees), based on the same six abiotic predictor variables described in the main text: mesotopography, soil temperature, soil moisture, maximum potential solar radiation, soil pH and rock cover. Boosted regression trees provided the most accurate predictions of the cover of these species, explaining on average 36 and 30% of the deviance in their cover at the north and south sites respectively. Generalized linear models and generalized additive models performed much worse, explaining 11 and 10% of deviance in cover at the south site, and 2 and 2% at the north site respectively. Variable importance varied strongly between sites, but was more consistent between methods (Fig. S7). Thus, at fine scales the six abiotic variables have relatively low predictive power for the cover of the dominant species.
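The notion of deviance explained under a non-random k-fold cross-validation can be illustrated with a minimal Python sketch. This is not the R code used for the analyses: the Gaussian deviance (residual sum of squares), the contiguous-block fold assignment, the single-predictor linear model standing in for the GLM, GAM and BRT fits, and all function names are assumptions made purely for illustration.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b

def deviance_explained(y_obs, y_pred, y_null):
    """Fraction of Gaussian deviance explained: 1 - residual / null deviance."""
    resid = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    null = sum((o - y_null) ** 2 for o in y_obs)
    return 1.0 - resid / null

def blocked_cv(x, y, k=6):
    """k-fold cross-validation with contiguous (non-random) folds.

    Each fold is held out in turn, the model is fitted to the remaining
    data, and the deviance explained on the held-out fold is averaged.
    The null model is the mean of the training response (an assumption).
    """
    n = len(x)
    bounds = [round(i * n / k) for i in range(k + 1)]
    scores = []
    for lo, hi in zip(bounds, bounds[1:]):
        x_tr, y_tr = x[:lo] + x[hi:], y[:lo] + y[hi:]
        a, b = fit_line(x_tr, y_tr)
        pred = [a + b * xi for xi in x[lo:hi]]
        scores.append(deviance_explained(y[lo:hi], pred, mean(y_tr)))
    return mean(scores)
```

Under these assumptions a value near 1 indicates that held-out cover is predicted almost perfectly, while values near 0 indicate predictions no better than the training mean.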
Therefore, in the absence of additional abiotic predictor variables with which to improve models of dominant species cover, field-quantified measurements (and not modelled estimates) of dominant species cover appear necessary to improve community models of arctic-alpine vegetation.

Table S1. Characteristics of the two study sites, including total vascular plant cover, the cover of the three dominant plant species, and vascular plant biomass. All values are mean ± SE, except for soil pH where the median value is presented.

                                              North            South
Vegetation cover (%)
  Betula nana                                 5.76 ± 0.25      4.00 ± 0.24
  Empetrum nigrum                             13.87 ± 0.66     20.57 ± 0.72
  Juniperus communis                          0.65 ± 0.13      5.52 ± 0.43
  All vascular species                        27.23 ± 0.73     43.98 ± 0.74
Biomass (grams per 0.04 m2)                   9.86 ± 0.34      15.63 ± 0.52
Vascular plant species richness (1 m2)        11.19 ± 0.22     14.21 ± 0.20
Median vegetation height (cm)                 4.19 ± 0.07      6.54 ± 0.13
Leaf dry matter content (mg.g-1)              300.7 ± 13.0     279.4 ± 12.8
Soil characteristics
  Moisture (%)                                31.29 ± 0.48     28.71 ± 0.29
  Temperature (°C)                            8.50 ± 0.05      8.93 ± 0.06
  pH (median)                                 4.63 ± 0.01      5.37 ± 0.02
  Rock cover (%)                              9.20 ± 0.43      20.62 ± 0.78
Potential solar radiation (MJ.cm–2.yr–1)      0.26 ± 0.01      0.72 ± 0.01

Figure S1. The mean (± SE) fit of simple (six abiotic predictor variables) and full (six abiotic and three biotic predictors) models of species occurrence, measured by the area under the curve of a receiver operating characteristic plot (AUC) and true skill statistic (TSS), for both the northern (N; n = 43 species) and southern (S; n = 54 species) study sites. Three species distribution modelling techniques were implemented: generalized linear models (GLM), generalized additive models (GAM), and boosted regression trees (BRT). The significance of the improvement was assessed using one-tailed paired t-tests.

Figure S2.
Variable importance (%; mean ± SE) for the six abiotic and three biotic predictor variables when modelling the occurrence of every subordinate species, as determined from the full models in both the north (n = 43 species) and south (n = 54) study sites. Three statistical techniques were implemented: generalized linear models (GLM), generalized additive models (GAM), and boosted regression trees (BRT). Variable importance was calculated for GLMs and GAMs from each variable’s drop contribution (i.e. change in deviance associated with exclusion of that variable from a model containing all the other predictors), while the method of Friedman (2001) was used for BRTs. Variables’ contributions were scaled to sum to 100, with higher values indicating stronger influence on the response variable.

Figure S3. Loess smooth (± 95% CI) fitted to the relationship between community-weighted mean leaf dry matter content (observed and predicted) and the combined cover of the three dominant species (Betula nana + Empetrum nigrum ssp. hermaphroditum + Juniperus communis) in the north and south site. Three statistical methods were used: GLM = generalized linear models, GAM = generalized additive models, BRT = boosted regression trees. The “simple” predictions are from models using only abiotic predictor variables, while the “full” model comprised both abiotic and biotic predictor variables. Observed and predicted species richness exclude occurrences of the three dominant plant species.

Figure S4. Relationship between community-weighted mean leaf dry matter content and the cover of the three dominant plant species (assumed here to represent a gradient of increasing competitive pressure). Dashed lines represent results from quantile regression performed on the upper and lower 20th percentiles of the data.

Figure S5.
Variable importance when directly modelling subordinate species richness and community-weighted mean leaf dry matter content (based on full models comprising six abiotic and three biotic predictor variables) for both the northern and southern sites.

Figure S6. Histogram of leaf dry matter content (LDMC) of all modelled species for which data were available. The LDMC values for the three dominant species (Empetrum nigrum ssp. hermaphroditum, Betula nana and Juniperus communis) are indicated with arrows, and rank 22nd, sixth and second respectively when compared to the values of the modelled subdominant species.

Figure S7. Mean (and maximum) relative importance of six abiotic predictor variables when modelling the cover of the three dominant species (Betula nana, Empetrum nigrum ssp. hermaphroditum and Juniperus communis). GLM = generalized linear model, GAM = generalized additive model, BRT = boosted regression trees, Mesotop. = mesotopography, Moisture = soil moisture, Temperature = soil temperature, Radiation = maximum potential solar radiation, pH = soil pH, Rock = rock cover.
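The scaling of variable contributions to sum to 100, as described in the caption of Figure S2, can be sketched in Python as follows. This is an illustration only and not the R code used for the analyses: it assumes the per-variable drop contributions (changes in deviance) have already been computed, and the function name and example values are hypothetical.

```python
def relative_importance(drop_deviance):
    """Scale per-variable deviance changes so that they sum to 100."""
    total = sum(drop_deviance.values())
    if total <= 0:
        raise ValueError("drop contributions must sum to a positive value")
    return {name: 100.0 * d / total for name, d in drop_deviance.items()}

# Hypothetical deviance changes, for illustration only (not values from the study).
example = {"Mesotop.": 4.0, "Moisture": 2.0, "Temperature": 1.0, "Rock": 1.0}
scaled = relative_importance(example)
```

After scaling, a variable's value can be read as its percentage share of the total contribution across all predictors in that model.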