Supporting Information

Appendix S1. Species distribution modelling details

Table S1. Background on derivation of environmental variables for each grid cell used in the modelling process.

Variable            Derivation
roads               Total length of all sealed and unsealed roads (not including unmapped tracks) in each grid cell (km)
towns               Averaged for each grid cell from a 100 m raster of Euclidean distance from the nearest town with a population over 10,000 (km)
Perth               Averaged for each grid cell from a 100 m raster of Euclidean distance from the capital city Perth, where 75% of the population of the area live (km)
intensive land use  Total area of intensive land use, including residential, industrial, commercial, recreational, intensive animal production and horticulture, calculated per grid cell (Department of Agriculture and Food 2006)*. Converted into presence/absence to deal with skewed distribution (0 = absent, 1 = present)
habitat diversity   Pre-European extent from the best large-scale vegetation map of 376 plant associations in Western Australia (Beard 1980a; Beard 1980b)**, overlaid with a current remnant vegetation layer. Number of different habitat types currently present within each grid cell calculated
protected areas     Number of protected areas in each grid cell calculated from a map of all protected areas (including IUCN levels 1-6, unallocated Crown Land and Freehold, and State Forest/Timber Reserve) managed for conservation (by WA Department of Environment and Conservation)
coast               Coastline present in any part of the grid cell. Converted into presence/absence to deal with skewed distribution (0 = absent, 1 = present)
creeks              Total length of all permanent and semi-permanent watercourses in each grid cell (km)
distance water      Averaged for each grid cell from a 100 m raster of Euclidean distance from the nearest saltwater or freshwater body (creek, river, ocean, lake, pool, estuary) (km)
distance coast      Averaged for each grid cell from a 100 m raster of Euclidean distance from the coastline (km)
remnant veg         Total summed area of vegetation per grid cell calculated from the current remnant vegetation layer (km²)
water               Permanent water land use present in any part of the grid cell (reservoir/dam, lake, estuary, or creek/river), calculated by merging the data for ‘creeks’ and ‘water’, then converting into presence/absence to deal with over-dispersion (0 = absent, 1 = present)
forestry            Total area of forestry land use, including plantation or production, calculated per grid cell (Department of Agriculture and Food 2006)*. Converted into presence/absence to deal with skewed distribution (0 = absent, 1 = present)
visitor centres     Averaged for each grid cell from a 100 m raster of Euclidean distance from the nearest tourist visitor centre (km)
DEC web             At least one protected area in the grid cell advertised on the internet: http://www.dec.wa.gov.au (WA Department of Environment and Conservation) (0 = absent, 1 = present)
DEC named           At least one named (gazetted or State approved) protected area (WA Department of Environment and Conservation), where un-named areas are largely unofficial or newly acquired lands (0 = absent, 1 = present)
DEC trails          At least one marked trail (not a sealed or unsealed road) in a protected area (WA Department of Environment and Conservation), calculated from a review of information available for all protected areas and supplemented by overlaying a map of tracks over the existing layer of DEC protected areas (0 = absent, 1 = present)
urban               Point layer data of all WA towns and cities buffered by 100 m and merged with urban land use (residential) layer (Department of Agriculture and Food 2006)*. Total area calculated per grid cell and converted into presence/absence to deal with skewed distribution (0 = absent, 1 = present)
recreation          Recreational land use present (e.g. local parks, gardens, cultural services) (Department of Agriculture and Food 2006)*. Converted into presence/absence to deal with skewed distribution (0 = absent, 1 = present)
agriculture         Total area of dryland agricultural land use (grazing modified pastures, cropping, seasonal horticulture), calculated per grid cell (Department of Agriculture and Food 2006)*
threatened sp       Grid overlaid on partitioned atlas survey records from between 1998 and 2002 of only threatened species (Wildlife Conservation Act 1950 in WA) to calculate number of threatened species detected in this time for each grid cell. Converted into presence/absence to deal with skewed distribution (0 = absent, 1 = present)

* Land use mapped at 1:25 000 in urban areas, 1:100 000 in agricultural areas and 1:250 000 in pastoral zones
** Remnant habitat type mapped at 1:250 000, describing pre-cleared Western Australian vegetation types (Beard 1980a; Beard 1980b)

Table S2. GLM of the importance of prior knowledge for predicting future surveys (response variable: number of surveys per grid cell 2003–2007)

Covariates                                                   Estimate  Std. Error  z value  Pr(>|z|)
Intercept                                                    0.31      0.01        35.04    <0.001
Number of surveys per grid cell 1998 to 2002 (standardised)  0.80      0.02        45.46    <0.001
Null deviance                                                1773.5
Residual deviance                                            1155.5
Explained deviance                                           0.35

Table S3. Model parameters for the three accessibility models.

Model          Covariates          Estimate  Std. Error  z value  Pr(>|z|)
tourism roads  Intercept           -1.43     0.05        -30.61   <0.001
               water               0.61      0.21        2.97     0.003
               DEC trails          0.72      0.13        5.67     <0.001
               roads               1.79      0.13        13.69    <0.001
               visitor centres     -0.11     0.10        -1.13    0.257
access roads   Intercept           -1.40     0.05        -30.65   <0.001
               roads               1.80      0.13        13.95    <0.001
               intensive land use  0.60      0.13        4.73     <0.001
roads          Intercept           -1.37     0.05        -29.91   <0.001
               roads               2.07      0.12        17.11    <0.001

Figure S1.
Distribution of model parameters showing (a) probability of survey using logistic regression and a threatened species and protected areas hypothesis (‘conservation concern’), with important explanatory variables of (b) frequency of protected areas per grid cell, (c) number of different habitats per grid cell, (d) detection of a threatened species during the main atlas period in 1998–2002, and (e) road density, a single variable that explained a high level of deviance.

[Figure S2 image: panel (a) density of predicted value (bandwidth = 0.05478); panel (b) predicted probability of survey by actual cell value 2003–2007 (absence, presence); panel (c) true positive rate against false positive rate, AUC = 0.829]

Figure S2. Discrimination capacity of the optimal distribution model developed for surveys between 2003 and 2007 in the south-west biodiversity hotspot, showing (a) distribution of predicted probability values associated with either surveyed (solid line) or unsurveyed (dotted line) cells, (b) a boxplot of absence vs. presence values for 2003–2007 relative to their predicted probability of being surveyed, and (c) the ROC curve (AUC = 0.829).

[Figure S3 image: panel (a) density of predicted value (bandwidth = 0.0531); panel (b) predicted probability of survey by actual cell value 2008–2011 (absence, presence); panel (c) true positive rate against false positive rate, AUC = 0.729]

Figure S3. Discrimination capacity of the optimal distribution model (‘conservation concern’), tested with surveys between 2008 and 2011 in the south-west biodiversity hotspot, showing (a) distribution of predicted probability values associated with either surveyed (solid line) or unsurveyed (dotted line) cells, (b) a boxplot of absence vs. presence values for 2008–2011 relative to their predicted probability of being surveyed, and (c) the ROC curve (AUC = 0.729).
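
The AUC values reported for Figures S2 and S3 can be read as the probability that a randomly chosen surveyed cell receives a higher predicted probability than a randomly chosen unsurveyed cell, with ties counted as half. The following Python sketch illustrates that interpretation only. The function name `auc_from_scores` and the toy data are hypothetical and are not taken from the study, which does not state here what software produced its ROC curves.

```python
def auc_from_scores(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for a surveyed cell, 0 for an unsurveyed cell.
    scores: predicted probability of survey for each cell.
    """
    surveyed = [s for y, s in zip(labels, scores) if y == 1]
    unsurveyed = [s for y, s in zip(labels, scores) if y == 0]
    if not surveyed or not unsurveyed:
        raise ValueError("need at least one surveyed and one unsurveyed cell")

    wins = 0.0
    for p in surveyed:
        for n in unsurveyed:
            if p > n:
                wins += 1.0   # surveyed cell ranked above unsurveyed cell
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(surveyed) * len(unsurveyed))


# Hypothetical toy data (not from the study): eight grid cells.
cell_surveyed = [1, 1, 1, 0, 0, 0, 0, 1]
cell_predicted = [0.9, 0.7, 0.4, 0.5, 0.2, 0.1, 0.3, 0.6]
print(auc_from_scores(cell_surveyed, cell_predicted))
```

An AUC of 0.5 corresponds to a model that ranks cells no better than chance, and 1.0 to perfect separation of surveyed from unsurveyed cells. The pairwise loop is quadratic in the number of cells, so a rank-based implementation would be preferable for a full grid.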