International Journal of Geographical Information Science: Special Issue Paper

Which environmental variables should I use in my biodiversity model?

Supplementary Information

Kristen J Williams¹, Lee Belbin¹,², Michael P Austin¹, Janet L. Stein³, Simon Ferrier¹

1. CSIRO Ecosystem Sciences, Canberra, Australia
2. Atlas of Living Australia, Hobart, Australia
3. Fenner School of Environment and Society, Australian National University, Canberra, Australia

Correspondence: Kristen J Williams (kristen.williams@csiro.au); CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601, Australia, phone +61 2 62464213

CONTENTS

Table 1. Substitutable subsets of variables that are alternatives and not sensibly included in the same model (without justification).
Table 2. Similar variables that can be included in the same model (not substitutable).
Table 3. MaxEnt model results for Eucalyptus delegatensis in southeastern Australia.
Table 4. Summary of relative contribution and permutation importance (percents) of variables included in the four MaxEnt models (from Table 3) summarised by two climate and three substrate groups.
Table 5. Relative contribution and permutation importance of variables included in the four MaxEnt models for Eucalyptus delegatensis presence (from Table 3).
Table 6. GDM model results for vascular plants across the Australian continent.
Table 7. Summary of relative contribution (sum of coefficient values) and partial deviance explained (percent) by variables included in the two GDM models (from Table 6).
Table 8. Relative contribution (sum of coefficients) and partial deviance explained (%) by variables included in the two GDM models for vascular plant compositional dissimilarity (from Table 6).
Table 9. Overview of variables used in MaxEnt and GDM models compared with variables tested.
Figure 1. Predicted (probability of presence) natural distribution of Eucalyptus delegatensis for four alternative MaxEnt models.
Table 1. Substitutable subsets of variables that are alternatives and not sensibly included in the same model (without justification).

Group 1. Atmospheric water
  Subset 1: RAINI, RAINX, EVAPI, EVAPX
  Subset 2: ADEFI, ADEFX (possibly also ARID_MIN, ARID_MAX)
  Comments: Atmospheric water deficit is derived from rainfall and evaporation (no recommendation; user preference; test both separately). Possibly consider the aridity indices in subset 2, but these may also be tested in combination with subset 1.

Group 2. Rainfall seasonality
  Subset 1: SLRAIN1, SLRAIN2
  Subset 2: SRAIN1MP, SRAIN2MP
  Comments: Subset 1 is a factor ratio using the logarithm of rainfall; subset 2 is a simple rainfall ratio (test alternatives, or choose subset 1 for continental studies and subset 2 for regional studies).

Group 3. Geological age
  Subset 1: GEOLLMNAGE, GEOLLRNGEAGE
  Subset 2: GEOLMEANAGE, GEOLRANGEAGE
  Comments: Subset 1 is the logarithm of age; subset 2 is in millions of years (recommend using subset 1).

Group 4. Soil hydrology
  Subset 1: SOLPAWHC
  Subset 2: SOLDEPTH, CLAY
  Comments: Subset 1 is a derivative of the variables in subset 2 (recommend using subset 2: it gives a more direct interpretation of the soil attribute, and clay % is also a factor in soil nutrient supply). CLAY contributes information about soil nutrient status and structure (in addition to hydrology) and could be included in a model with SOLPAWHC.

Group 5. Soil pedality
  Subset 1: PEDALITY
  Subset 2: HPEDALITY
  Comments: Subset 1 is a categorical variable; subset 2 is ordered. Recommend using subset 2.

Table 2. Similar variables that can be included in the same model (not substitutable).
Group 1. Substrate fertility
  Subset 1: FERT
  Subset 2: NUTRIENTS
  Comments: Subset 1 is derived from 1:1M geology mapping by expert interpretation; subset 2 is derived from the Atlas of Australian Soils using data and expert interpretation. Correlated but can be tested together. Recommend both.

Group 2. Terrain flatness
  Subset 1: MRRTF, MRVBF
  Subset 2: RIDGETOP, VALLEYBOTTOM
  Comments: Subset 1 and subset 2 are different summaries, at 1 km grid, of the value and heterogeneity within 9 sec grid estimates of MRRTF and MRVBF. Correlated but can be tested together. Recommend both.

Group 3. Soil attribute reliability
  Subset 1: DATASUPT
  Subset 2: WR_UNR, KSAT_ERR
  Comments: Subset 1 is an overall estimate of soil property interpretation reliability based on available data; the subset 2 variables are specific to particular soil attributes. Correlated but can be tested together. Recommend subset 1.

Group 4. Temperature extremes
  Subset 1: MINTI, MAXTX
  Subset 2: TMINABSI, TMAXABSX
  Comments: Subset 1 comprises long-term average monthly maximum and minimum temperatures generated at 1 km grid using ANUCLIM v5.1; subset 2 comprises absolute monthly maximum and minimum values over 50 years generated from daily 5 km grid SILO surfaces. Correlated but can be tested together. Recommend subset 1.

Group 5. Temperature range
  Subset 1: TRNGX
  Subset 2: TRNGA
  Comments: Subset 1 measures the maximum of monthly diurnal temperature ranges; subset 2 measures the annual difference between the hottest day and the coldest night. Correlated but can be tested together. Recommend subset 1.

Group 6. Humidity
  Subset 1: RH2MIN, RH2MAX
  Subset 2: VPD2MIN, VPD2MAX
  Comments: Subset 1 is the relative humidity (ratio); subset 2 is the vapour pressure deficit (difference). Correlated but can be tested together. Recommend subset 1.

Group 7. Temperature
  Subset 1: MAXTI, MAXTX
  Subset 2: MINTI, MINTX
  Comments: Subset 1 measures daytime temperatures; subset 2 measures night-time temperatures. Correlated but can be tested together. Recommend both.

Table 3. MaxEnt model results for Eucalyptus delegatensis in southeastern Australia.
Summary of training and test statistics (AUC) using 25% test data (54 presence records) and 75% training data (162 presence records) to validate the relative performance of each model generated. Different models were generated using substitutable subsets of environmental variables.

Model 1a: evaporation and precipitation, potential soil water holding capacity.
Model 1b: evaporation and precipitation, soil depth and texture (percent clay).
Model 2a: precipitation deficit and aridity, potential soil water holding capacity.
Model 2b: precipitation deficit and aridity, soil depth and texture (percent clay).

Statistic                                 Model 1a        Model 1b        Model 2a        Model 2b
100% Training AUC                         0.947           0.945           0.945           0.944
75% Training AUC                          0.949           0.949           0.949           0.952
25% Test AUC and standard deviation¹      0.890 ± 0.022   0.903 ± 0.019   0.892 ± 0.022   0.887 ± 0.022
Regularized training gain                 1.692           1.649           1.649           1.678
Unregularized training gain               2.057           1.996           1.994           2.049
Unregularized test gain                   1.020           1.230           1.145           1.145
# variables                               23              23              24              23
Most important variable²                  MAXTI (38.2)    MAXTI (38.4)    MAXTI (40.0)    MAXTI (38.7)

¹ Calculated as per equation 2 in DeLong et al. (1988).
² Training data percent contribution.

Table 4. Summary of relative contribution and permutation importance (percent) of variables included in the four MaxEnt models, summarised by two climate and three substrate groups. Results for individual variables are given in Table 5.

Group        Model 1a                    Model 1b                    Model 2a                    Model 2b
             Contribution  Importance    Contribution  Importance    Contribution  Importance    Contribution  Importance
water        18.14         27.42         14.30         21.63         12.47         20.80         15.39         13.88
energy       65.48         33.69         69.73         40.03         70.21         31.37         66.24         34.79
soil         5.45          4.43          7.27          12.96         6.19          4.42          6.74          12.04
geoscience   5.49          3.34          4.16          2.04          6.75          5.23          7.02          7.02
terrain      5.43          31.12         4.53          23.33         4.38          38.17         4.61          32.28

Table 5.
Relative contribution and permutation importance of variables included in the four MaxEnt models for Eucalyptus delegatensis presence. Shading identifies the two climate and three substrate groups summarised in Table 4.

Model 1a Variable Model 1b Model 2a contribution permutation contribution permutation ARID_MAX - - - - ARID_MIN - - - - 4.0792 RAINX 3.7543 4.2896 2.4593 2.5732 EVAPI 2.2372 2.5127 2.5033 2.146 EVAPX 4.2323 8.59 5.0813 10.5557 - - - - RPRECMAX 0.6217 0.9927 0.7386 1.3318 0.5955 1.2418 RPRECMIN 0.2922 1.4098 1.261 0.9936 SLRAIN1 4.3668 8.0886 4.0575 11.5284 4.9931 7.38 SLRAIN2 2.6392 1.5412 2.3375 3.7179 2.5196 1.4208 MAXTI 46.792 1.9771 47.9136 8.0951 51.131 9.3364 46.106 2.2444 MINTI 1.9949 12.127 2.3754 8.5935 1.1464 4.6968 1.8137 8.9536 MINTX 10.3755 3.0802 9.3235 5.7213 10.2811 2.5018 12.8647 7.4872 0.8582 1.1989 1.0081 1.7875 RADNX 0.5848 4.1639 4.0676 5.86 0.521 2.0353 RH2MIN 0.3801 3.5035 0.4372 2.0157 0.4141 1.1714 0.4341 1.267 0.7235 3.9214 0.7418 3.459 TRNGX 4.4175 6.0087 4.2738 6.0571 4.189 4.2713 4.2759 11.3744 RTXMIN 0.9399 2.8247 0.4833 2.4898 0.7984 1.6509 BDENSITY 0.4486 1.4896 0.3293 1.1452 CLAY 0.428 2.883 0.3121 2.3552 3.4562 5.7047 2.6453 2.8373 RADNI 4.2558 6.357 RH2MIN KSAT contribution Model 2b contribution permutation 2.4471 1.293 3.2273 4.8328 2.5394 - - - - - - - - 0.8174 1.2585 NUTRIENTS 2.8679 2.9751 3.6106 5.8895 3.1319 SOLDEPTH - - 1.966 1.4443 - SOLPAWHC 2.5854 1.4576 - 3.0532 permutation 2.9543 1.4672 -

Variable      Model 1a                  Model 1b                  Model 2a                  Model 2b
              contribution permutation  contribution permutation  contribution permutation  contribution permutation
FERT          1.4663       1.2863       -            -            1.5678       1.3989       1.4921       1.2053
GEOLLMEANAGE  4.0206       2.0491       4.1638       2.0385       4.3935       2.5159       4.8102       2.8582
GRAVITY       -            -            -            -            0.7857       1.32         0.7194       2.9548
EROSIONAL     1.5473       1.1891       0.4266       1.0958       0.6216       1.3208       0.4793       1.2603
MRRTF         0.6365       13.4384      0.6205       12.2037      0.6462       20.9168      0.5587       12.8779
MRVBF         0.218        2.435
                                        0.2383       2.1633       0.1072       2.4392       0.2043       5.3782
SLOPE         1.9549       2.7581       2.3919       1.9169       2.0103       3.8092       2.5314       5.225
VALLEYBOTTOM  1.0748       11.3015      0.8559       5.9536       0.9963       9.6859       0.8375       7.5376

Table 6. GDM model results for vascular plants across the Australian continent. Percent deviance explained, sum of coefficient values and intercept. Different models were generated using substitutable subsets of environmental variables. Model 1: evaporation and precipitation. Model 2: precipitation deficit and aridity. The two substitutable soil variables (soil depth and water holding capacity) did not contribute the minimum level of partial deviance explained required to be retained in the final model.

Model     Number of predictors¹   % deviance explained (> 0.02)   Intercept   Sum of coefficient values for all predictors
Model 1   27                      50.33                           1.4462      26.02
Model 2   28                      50.17                           1.4468      24.48

¹ Number of predictors includes the sampling covariates and the geographic distance predictor.

Table 7. Summary of relative contribution (sum of coefficient values) and partial deviance explained (percent) by variables included in the two GDM models. Results for individual variables are given in Table 8.

                 Model 1                                  Model 2
Variable group   Summed coefficients    Partial %         Summed coefficients    Partial %
                 (relative              deviance          (relative              deviance
                 contribution %)        explained         contribution %)        explained
water            11.25 (43.25%)         1.10              9.06 (37.02%)          1.16
energy           7.49 (28.78%)          0.77              8.40 (34.33%)          1.20
soil             1.35 (5.18%)           0.35              1.36 (5.56%)           0.36
geoscience       2.06 (7.90%)           0.43              2.08 (8.51%)           0.43
terrain          0.91 (3.51%)           0.12              0.98 (3.99%)           0.13
other            2.96 (11.38%)          0.56              2.60 (10.60%)          0.54

Table 8. Relative contribution (sum of predictor coefficient values) and partial deviance explained (%) by variables included in the two GDM models for vascular plant compositional dissimilarity.
Shading identifies the variable groups: two climate, three substrate and covariates summarised in Table 7. COVARPLANTS refers to two sampling covariates which take into account sampling inadequacies through the number of species and observation records aggregated at the 0.01° grid scale (see Williams et al. 2010).

                      Model 1                                                 Model 2
Variable              Relative contribution   Partial deviance explained (%)  Relative contribution   Partial deviance explained (%)
RAINX                 5.460645                0.17308                         -                       -
EVAPX                 1.125213                0.090956                        -                       -
EVAPI                 0.840935                0.069545                        -                       -
ADEFX                 -                       -                               3.113301                0.028234
ADEFI                 -                       -                               1.525199                0.243337
RPRECMIN              1.115339                0.066047                        1.586704                0.154569
SLRAIN1               2.466874                0.645147                        2.608743                0.676711
SLRAIN2               0.24507                 0.053681                        0.227876                0.056239
MAXTX                 1.131683                0.02804                         0                       0
MAXTI                 2.006137                0.148737                        2.736282                0.398598
TMAXABSX              0.314126                0.024183                        0.413698                0.030735
RADNX                 0.959297                0.10692                         1.126785                0.065975
RADNI                 0.729256                0.123138                        0.982256                0.253473
TRNGX                 1.025938                0.095511                        1.081095                0.096862
RTIMIN                0.653083                0.112414                        0.87582                 0.205535
RH2MAX                0                       0                               0.525599                0.034251
WINDSPMIN             0.669577                0.133405                        0.662541                0.113334
HPEDALITY             0.212145                0.127517                        0.208824                0.125604
COARSE                0.18639                 0.055869                        0.182173                0.053381
CLAY                  0.446703                0.095465                        0.451461                0.096171
CALCRETE              0.148287                0.041682                        0.173019                0.059234
BDENSITY              0.354797                0.030082                        0.344541                0.028426
WII_WGS1KB            0.294236                0.056399                        0.305238                0.057207
GRAVITY               1.152121                0.159316                        1.161817                0.162091
GEOLLRNGEAGE          0.609643                0.211048                        0.615012                0.214859
EROSIONAL             0.122406                0.028733                        0.120382                0.028761
ROUGHNESS             0.47921                 0.053093                        0.498765                0.056615
MRVBF                 0.310472                0.037011                        0.358424                0.047646
COVARPLANTS (2)       0.187043                0.021233                        0.187441                0.033362
GEOGRAPHIC DISTANCE   2.77459                 0.537892                        2.407314                0.506649

Table 9. Overview of variables used in MaxEnt and GDM models compared with variables tested. Model details are given in Table 5 for the MaxEnt analysis and Table 8 for the GDM analysis.
Group Variable MaxEnt 1a MaxEnt 1b MaxEnt 2a GDM 2 Included ADEFI 1 1 ADEFX 1 1 ARID_MAX ARID_MIN water 1 MaxEnt 2b GDM 1 1 1 1 2 EVAPI 1 1 1 3 EVAPX 1 1 1 3 RAINI 0 RAINX 1 RPRECMAX 1 1 1 RPRECMIN 1 1 SLRAIN1 1 SLRAIN2 1 1 1 3 1 3 1 1 4 1 1 1 1 6 1 1 1 1 5 SRAIN1MP 0 SRAIN2MP 0 MAXTI 1 1 1 1 MINTI 1 1 1 1 4 MINTX 1 1 1 1 4 MAXTX 1 1 1 RADNI 1 1 1 1 1 4 RADNX 1 1 1 1 1 5 RH2MAX 1 1 1 1 1 5 1 1 RH2MIN 2 RTIMAX 0 RTIMIN energy 1 1 RTXMAX RTXMIN 2 0 1 1 1 3 TMAXABSX 1 1 2 TMINABSI 0 TRNGA 0 TRNGI 0 TRNGX 1 1 1 1 1 1 6 VPD2MAX 0 VPD2MIN 0 WINDRI 0 WINDRX 0 WINDSPMAX 0 WINDSPMIN BDENSITY 1 1 1 1 CALCRETE soil 6 CLAY COARSE 1 1 2 1 1 4 1 1 2 1 1 4 1 1 2 1 1 DATASUPT HPEDALITY KS_ERR 0 2 0

Group Variable MaxEnt 1a KSAT NUTRIENTS MaxEnt 2a MaxEnt 2b GDM 1 GDM 2 1 1 4 1 2 1 1 SOLDEPTH SOLPAWHC MaxEnt 1b 1 1 1 1 1 2 WR_UNR GEOLLMEANAGE 0 1 1 1 1 GEOLLRNGEAGE geoscience FERT 4 1 1 GRAVITY 1 1 1 1 1 2 3 1 1 MAGNETICS terrain Included 4 0 WIII_WGS1KB1 - - - - 1 1 2 EROSIONAL 1 1 1 1 1 1 6 MRRTF 1 1 1 1 MRVBF 1 1 1 1 4 1 1 RELIEF 0 RIDGETOPFLAT 0 ROUGHNESS SLOPE 1 1 1 1 1 1 1 1 1 TWI VALLEYBOTTOM 6 1 2 4 0 4

1. Weathering intensity index not tested in the MaxEnt models.

Figure 1. Predicted probability of presence (natural distribution) of Eucalyptus delegatensis for four alternative MaxEnt models. White areas indicate prediction values <0.1 or locations outside the analysis domain.
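
The group summaries in Table 7 can be cross-checked against the per-variable values in Table 8 and the model total in Table 6. The following is a minimal Python sketch of that arithmetic, not the authors' procedure. It assumes that the water group for Model 1 comprises RAINX, EVAPX, EVAPI, RPRECMIN, SLRAIN1 and SLRAIN2 (the shading that marks group membership is not reproduced in this text, so the membership is an assumption), and the function name `summarise_group` is hypothetical.

```python
# Model 1 values transcribed from Table 8:
# variable -> (relative contribution, partial deviance explained %)
# Group membership is an ASSUMPTION; the table shading is not available here.
model1_water = {
    "RAINX": (5.460645, 0.17308),
    "EVAPX": (1.125213, 0.090956),
    "EVAPI": (0.840935, 0.069545),
    "RPRECMIN": (1.115339, 0.066047),
    "SLRAIN1": (2.466874, 0.645147),
    "SLRAIN2": (0.24507, 0.053681),
}

# Sum of coefficient values for all predictors in Model 1 (Table 6).
MODEL1_TOTAL_COEFFICIENTS = 26.02


def summarise_group(values, model_total):
    """Return (summed coefficients, share of model total in %, summed partial deviance)."""
    summed = sum(coefficient for coefficient, _ in values.values())
    partial = sum(deviance for _, deviance in values.values())
    share = 100.0 * summed / model_total
    return summed, share, partial


summed, share, partial = summarise_group(model1_water, MODEL1_TOTAL_COEFFICIENTS)
print(f"water: {summed:.2f} ({share:.2f}%), partial deviance {partial:.2f}")
```

Under these assumptions the rounded results agree with the water row for Model 1 in Table 7. The same function could be applied to any other group once its member variables are known.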