NaturePrint v2.0: Elements and integration
A technical explanation of the NaturePrint v2.0 analysis

Purpose of facts sheet

This facts sheet is a detailed, technical description of each of the elements of NaturePrint v2.0. If you are interested in a short description of NaturePrint v2.0 and its potential uses, please refer to the Introduction to NaturePrint v2.0 facts sheet.

The first part of this facts sheet describes the species information, including how we analysed species records, generated species assemblage distribution models, incorporated plants and fish, categorised wetlands, treated threatened species records and recognised important differences between feeding and breeding habitat for threatened birds and bats. A description of how we generated the species distribution models (SDMs) is available in a separate facts sheet: NaturePrint: Terrestrial species distribution models. The first part also explains when the primary data was captured from standard databases, how non-standard data was generated, our logic for including and excluding data types and the methods we used to generate the inputs for the optimisation analysis. This document also describes some of the approaches we explored that were unsuccessful.

The second part of this facts sheet describes how these elements were used in the optimisation analysis to generate the integrated Strategic Natural Values map. We explain how the optimisation analysis works to rank each pixel (75 x 75 m cell) across Victoria and describe the spatial surfaces we used to mediate this process, including habitat condition, connectivity, and whole of landscape connectivity.

In the final part of the facts sheet we describe how we generated the risk layers (probability of clearing and probability of weed invasion) and incorporated these into the integrated analysis using the concept of retention.
Part 1: Generating species information

There are over 3500 plant species in Victoria - how were they incorporated in NaturePrint v2?

1. Zonation – the software we have used to undertake the integrated statewide analysis – can use species level inputs. We decided against building SDMs for each of Victoria’s plant species for the NaturePrint v2 analysis because:
o If we generated SDMs for each plant species we would need to have 5000 inputs into Zonation. We currently do not have powerful enough computers to consider so many inputs at a useful scale and extent using Zonation or any other spatial prioritisation tool.
o We want to incorporate species from different groups (birds, mammals, plants etc) equitably into the analysis irrespective of their relative richness.

2. Given the reasons described above, we decided to integrate all our species level data in an ecologically sensible way such that we had fewer inputs. So we developed 100 terrestrial ‘assemblages’ (or spatially co-occurring ‘communities’) of vascular plants and vertebrate animals as inputs into the analysis.

Actions for Biodiversity Conservation

How were the terrestrial assemblages generated?

3. In summary, we used Victorian vegetation quadrats and terrestrial vertebrate fauna SDMs (see NaturePrint: Terrestrial species distribution models) to find groups of species that co-occur. All of these sites with their constituent taxa were then analysed to find patterns and coherent groups within the data. We then modelled the distribution of these groups, or ‘assemblages’, in the same way that we might model species distributions (SDMs). Paragraphs 4-13 describe these methods in detail.

4. The data. We noted the presence or absence of 3228 plant species at each of 30,423 quadrat sites across Victoria. We extracted this data from the Victorian Biodiversity Atlas in September 2011, which equates to the 2010-data release.
We checked each of the first generation vertebrate SDMs at each quadrat location to see whether habitat for that species was more than 70% likely to be found at that location. This produced a matrix of 30,423 sites x 3722 taxa (3228 plant taxa plus 494 terrestrial fauna SDMs) for analysis (see Figure 1).

Figure 1. Example of the matrix of plant and terrestrial fauna present (signified by ‘1’) and absent (signified by ‘0’) in each quadrat. The pattern of presence and absence is used to cluster for assemblages.

Quadrat location    Plant taxa (a, b, c, d, ... 3228)    Terrestrial fauna SDMs (a, b, c, d, ... 494)
1                   0 1 1 0                              1 0 1 0
2                   1 0 1 1                              1 1 0 0
3                   0 0 0 1                              0 1 1 0
4                   1 1 0 0                              1 0 1 1
...
30,423

5. We then wanted to look for similar patterns across the matrix as these would tell us about groups of species that co-occur. There are many approaches to aggregating and classifying biological data. Some are long established, and all have their strengths and weaknesses. One of the key limitations to clustering these data is purely computational - many intensive numerical methods simply cannot be usefully applied to such a large dataset with existing computing power. We investigated many clustering methods, and selected the ‘self-organising maps’ algorithm for its tractability and useful outputs. The number of assemblages to be discovered in the data (100) was chosen arbitrarily, but with a view to the computational efficiency of the spatial optimisation process.

6. Each of the 100 assemblages was checked to see whether the groupings of species made ecological sense. One assemblage consisted of a single quadrat site and another assemblage consisted of outliers such as beaches and highly disturbed sites. Both were removed from further consideration. The remaining 98 were considered ecologically useful.

Modelling the distribution of the assemblages

7. The above methods effectively provided ‘presence’ data for each assemblage. We also need data about the geographic ‘absence’ of assemblages to create distribution models.
One method to assign absences could be to randomly distribute them across Victoria. However, this has the potential to miss rare or very localised types of the environment such as water bodies or rocky outcrops which might be important in predicting where the assemblage occurs and doesn’t occur.

8. Our aim was to stratify absences within environmental types to ensure we don’t miss important but geographically restricted types. To do this we needed to know where environmental types occur across Victoria and we used a similar procedure as for generating the assemblages. We clustered a subset of the spatially explicit or mapped environmental predictor data (see appendix) for south east Australia into ‘environmental types or domains’.

9. We then assigned an equal number of absence data randomly to each of these environmental types and these 200,000 random stratified sites formed the pool from which random absences could be drawn.

10. Once we had presence data for our 98 assemblages and a stratified pool of potential absence data, we modelled the spatial distribution of assemblages using Random Forests. This is a machine-learning process that aggregates hundreds of randomly generated decision trees. Spatial expressions of the assemblages were produced at a resolution of 75 m (Figure 2).

Figure 2. Spatial expression of the assemblages.

Mapping assemblages

11. The distributions of assemblages are continuous and overlapping. That is, at each pixel you might have a 10% probability of assemblage A, 15% probability of assemblage B, etc.

12. The combination of these 98 assemblage distribution models can be used to form a ‘maximum likelihood’ view of which assemblage is most likely to be present at each pixel. Our flat map of the distribution of the 98 terrestrial assemblages shows the assemblage that has the maximum likelihood of all the assemblages for the pixel (Figure 3).

13.
This map also helps to inform us about targeting areas to collect new biodiversity information (e.g. survey for plants and animals) where our knowledge is poor and potentially affecting our decision-making. That is, where the maximum likelihood is low, we are less certain about what terrestrial biodiversity occurs in those locations, and the map can be used to stratify future sampling efforts.

Figure 3. Maximum likelihood of assemblages

How are freshwater species included in NaturePrint v2?

14. Freshwater fish assemblages (including yabbies and freshwater crayfish) were derived from the Sustainable Rivers Audit (SRA) data. Although the SRA data doesn’t provide information on small tributaries, it comes from repeated sampling over the long term, can broadly be interpreted as presence/absence data and is relatively comprehensive for Victoria.

15. The assemblages were found from the SRA data using a clustering method called expectation maximisation (with default settings). Expectation maximisation clustering discovers the number of groups in the data that satisfy predetermined levels of difference/similarity.

16. These methods found 17 fish communities/assemblages in the SRA data. The assemblages were considered ecologically reasonable by fish biologists from DSE’s Arthur Rylah Institute.

17. The distribution of the fish assemblages was derived using the same methods as the terrestrial assemblage distributions, that is, stratifying ‘absences’ using the environmental domains and modelling spatial distribution using Random Forests (see above for more detail). Similar to the terrestrial assemblages, the distributions of the fish assemblages are continuous and overlapping and can be mapped on the maximum likelihood view with the terrestrial assemblages.

How did we generate assemblages for areas that don’t have vegetation quadrats, for example wetlands?

18.
We looked at the waterfowl data to see if there were places that people went regularly over time and recorded birds. We hoped to use the waterfowl dataset to generate ‘bird quadrats’ specific to wetlands, similar to the concept of vegetation quadrats. We found that some wetlands had thousands of repeat surveys while other wetlands had only been surveyed once. As a consequence, wetlands with thousands of repeat surveys invariably had a comparatively high recorded species richness and those that had few visits had a low recorded species richness. These differences tended to swamp the expected differences between bird assemblages at hydrologically/geomorphologically different wetlands. While there are many potential approaches to subsample from the data to remove these biases, this is currently beyond the capacity of NaturePrint v2.

19. In NaturePrint v2.0 we included wetland types based on Corrick’s 53 major and minor mapping units as surrogates for wetland communities. As new data becomes available in the future we intend to review the modelling and representation of wetland assemblages.

How are threatened species treated in NaturePrint v2?

20. We incorporated post-1970 records of 777 VROT plant and animal species as point files in the optimisation. See NaturePrint: Terrestrial species distribution models for further information on accuracy of records. We have also incorporated the VROT fish ‘points’ as reaches. The point data and the modelled assemblage data are both concurrently part of the optimisation objective (see How does the NaturePrint analysis rank each pixel across Victoria? for details).

21. The following dot points explain our reasons for including point records in the analysis as well as assemblage distribution models when we have previously explained that SDMs are much more useful than points.
o Outliers in species records can be either erroneous or true records.
If true, the outliers can be relatively more important than the majority of other records in telling us about the types of environments in which species live and hence inform the species distribution model. However, models are generally more interested in the core body of the data that describes the environment of species. We build processes into how we generate models to look after outliers but there is only so much we can do. Incorporating rare and threatened species records directly into the optimisation reduces this problem.
o In a species distribution model we make a prediction about the probability (between 0 and 1) of the species’ habitat occurring in each 75 m pixel across the state. The probability of occurrence in a pixel where the species was actually detected, given assumptions of accuracy of identification and location, is probably approaching 1. That is, the probability that the pixel forms part of the actual distribution is quite high. The model doesn’t explicitly know that and could assign a lower probability to the pixel where the species was found.
o There are some phenomena in the environment that we can’t currently predict very well with our existing environmental data. For example, some locations that have high utility for rare and threatened species are very difficult to ‘describe’ in terms of environmental variables such as climate, terrain and soil. In these cases there are likely to be other drivers or habitat features that we either do not yet understand, or for which we find it difficult to produce useful spatial environmental surfaces to predict a species’ occurrence (e.g. it is currently very difficult to produce a credible spatial representation of tree hollows across the state, yet we know that these habitat features are critical to key taxa). In these instances, retaining point data can provide extra information about locations (e.g.
a good example of this in Victoria is the Chiltern region where many rare and threatened species persist despite its apparently ‘poor’ landscape connectivity and condition).
o There is capacity in the Zonation spatial optimisation software to combine both the naturalist view and the data view. This approach seems reasonable, given that the naturalist view embodies a significant component of our stakeholder base and directly shapes our priority setting.

Considering breeding habitat in the optimisation analysis

22. Some fauna species have two distinct and important habitat types. We reflected the importance of both types in the optimisation analysis by splitting the site records of selected species (see below) into two subsets per species: 1) records collected during the breeding season (or from colonial roosting/breeding sites) and 2) all other records (within accuracy constraints). We included each subset of records as a separate ‘species’ in the analysis (e.g. ‘Red-tailed Black Cockatoo feeding’ and ‘Red-tailed Black Cockatoo breeding’).

23. Only rare and threatened bird and bat species were considered for inclusion using this method.

24. The birds included Australasian Bittern, Red-tailed Black Cockatoo, Whiskered Tern, Lewin’s Rail, Turquoise Parrot, Powerful Owl, Regent Parrot, Superb Parrot, Regent Honeyeater, Little Tern, Fairy Tern and Hooded Plover. The bats included the Southern and Eastern Bentwing Bats and the Eastern Horseshoe Bat.

25. Future NaturePrint versions could consider two types of habitat for fish, e.g. spawning and feeding. This is partly accounted for in the aquatic assemblages described above.

26. All VROT site records data (168 separate text files indicating locations at 75 m resolution) were supplied to Zonation and treated in the same way as the assemblage models. That is, each record of each species was used in the analysis.
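The record splitting described in paragraph 22 can be sketched as follows. This is a minimal illustration only, not the method actually used: the function names, the record layout and the breeding months in BREEDING_MONTHS are all hypothetical placeholders, and real breeding seasons would need to come from species experts.

```python
from collections import defaultdict
from datetime import date

# ASSUMPTION: placeholder breeding seasons as (start_month, end_month).
# These are illustrative values, not real breeding seasons.
BREEDING_MONTHS = {"Powerful Owl": (5, 10), "Hooded Plover": (8, 3)}


def in_season(month, start, end):
    """Return True if month falls in the season, allowing wrap past December."""
    if start <= end:
        return start <= month <= end
    return month >= start or month <= end


def split_records(records):
    """Split site records into '<species> breeding' and '<species> feeding'.

    records: iterable of (species, record_date, x, y, at_colony) tuples,
    where at_colony flags a colonial roosting/breeding site (hypothetical layout).
    Species without a listed season are kept as a single layer.
    """
    layers = defaultdict(list)
    for species, when, x, y, at_colony in records:
        season = BREEDING_MONTHS.get(species)
        if season is None:
            layers[species].append((x, y))
            continue
        breeding = at_colony or in_season(when.month, *season)
        label = "breeding" if breeding else "feeding"
        layers[f"{species} {label}"].append((x, y))
    return dict(layers)
```

Each key of the returned dictionary would then be supplied to the optimisation as a separate ‘species’, in the same spirit as ‘Red-tailed Black Cockatoo feeding’ and ‘Red-tailed Black Cockatoo breeding’.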
Part 2: Producing the integrated maps using optimisation

How does the NaturePrint analysis rank each pixel across Victoria?

27. The software we used to do the analysis for the integrated maps is called Zonation. Zonation seeks to satisfy its ‘objective’ in a balanced way while iteratively removing pixels (reducing the area of the state). For NaturePrint v2.0, the objective is to maximise the area (i.e. number of pixels) retained for each of the values (assemblages and VROT points), while also maximising their retention in areas of high landscape connectivity and condition. It also has a preference for a more complete (i.e. vegetated), compact, connected landscape, based on the logic that it is cheaper, easier and more likely to succeed to manage a more aggregated landscape.

28. Zonation starts by removing the pixel which is the least useful in meeting the objective, and then reassesses the whole stack of information again. The removal of a pixel fundamentally changes the relative importance of the remaining pixels. Therefore, the pixel to be removed in the next iteration is not necessarily the pixel that was judged ‘second least useful’ in the prior iteration. This is why Zonation is iterative as opposed to ranking the whole landscape in a single pass.

29. The process is repeated until all the cells are removed. The order in which cells are removed is recorded, resulting in a ranking - this ranking determines the colours of the value gradations on the Strategic Natural Values map. The result is not a single set of sites; rather, it is a complete and nested solution.

30. For further technical explanation please see the Zonation user manual: http://www.helsinki.fi/bioscience/consplan/software/Zonation/Zonation_UserManual_v1.0.pdf

What are the different integrated maps produced in NaturePrint v2.0?

31. NaturePrint v2 includes two integrated products: the Strategic Natural Values map and the Priorities for Action map.

32.
The Strategic Natural Values map (protecting the full range of biodiversity values across Victoria) aims to maximise the biodiversity values protected while minimising the amount of area required to achieve this. Based on existing data, it ranks the relative contribution that each place (75 x 75 m pixel) makes towards protecting the full range of biodiversity values across Victoria.

33. The Priorities for Action map re-considers the same ranking problem as above except that it is recalculated in the context of the potential for values to be lost or degraded over the next ten years (see below for detail). Apart from the most unique and irreplaceable sites, the ranking of many sites will change. It is intended that the map will be used to assist in identifying candidate areas for actions to achieve biodiversity conservation (actions may include combinations of regulation, publicly funded activities and strategic planning).

34. Both the Strategic Natural Values map and the Priorities for Action map were generated by optimising the species and assemblage information described above along with the context, condition, retention and risk information described below.

The Strategic Natural Values map

35. The optimisation for the Strategic Natural Values map seeks the objective described above and is mediated by two separate surfaces: connectivity and condition.

Connectivity

36. We approached connectivity differently in NaturePrint versions 1 and 2. In NaturePrint v1 we incorporated connectivity into the ranking analysis in two ways.
o Each species distribution model was ‘smoothed’ at a spatial scale relevant to each species’ capacity to move and disperse. This smoothing process was indicative of the amount of habitat within a region. Each of these smoothed versions of the species distribution models (in addition to the ‘original’ species model) was included in the analysis.
o We attempted to include preferred and potential movement pathways between selected large remnants of discrete but broadscale habitat types using circuit theory modelling in the software “Circuitscape”. Moilanen (2011) clearly explains some of the problems we encountered with this approach.

37. NaturePrint v2 does not incorporate these approaches to connectivity because of the significant changes to the nature of the version 2 inputs and the greatly expanded complexity of the analysis - both in terms of the numbers of inputs and the resolution of these inputs.

38. NaturePrint v2 employs more generalised methods to yield a connected and aggregated (and also more computationally tractable) solution. These methods, Edge Removal and Landscape Connectivity, are based on the logic that it is generally cheaper to connect vegetated areas if they are proximal.

Edge removal

39. The pixel removal procedure was scheduled so that any new pixels chosen for removal could only be selected from the pixels remaining at the interface between the already removed and the retained pixels.

40. As a consequence the sequence of pixel removal has profound implications for how the analysis proceeds. To counter some of the perverse outcomes that may result from this rule, Zonation was initiated with preseeded edges within the state. These were regions that were defined as treeless and urban including the state boundaries and coastline.

Whole of landscape connectivity

41. We used DSE’s landscape connectivity surface to regulate the pixel ranking process. This was achieved by using landscape connectivity as a ‘cost’ surface. That is, places with a higher landscape connectivity were assigned a lower cost. Remnant vegetation was uniformly assigned the lowest ‘cost’ and as we moved beyond the edge of remnant vegetation (as defined by NV extent 2005) costs increased linearly as connectivity values diminished.
Consequently, if all other inputs to the pixel ranking decision are equal, pixel removal will proceed by first selecting pixels with the lowest landscape connectivity.

42. An alternative to using landscape connectivity to produce pixel ranking solutions that are aggregated would be to use a boundary length penalty function (BLP) in Zonation. However, using boundary length penalty for such a large and complex analysis (i.e. in terms of the number of pixels to juggle in each iteration) is computationally intensive and the analysis would take several months to complete even using a very powerful computer.

Condition

43. We further regulated and mediated the pixel ranking process using site condition. Site condition is a modelled surface representing understorey condition based on habitat hectares. For more information on the derivation of this surface, please see:

Newell, G., White, M., Griffioen, P. and Conroy, M. (2006) Vegetation condition mapping at a landscape-scale across Victoria. Ecological Management and Restoration 7: S65-S68

Kocev, D., Dzeroski, S., White, M.D., Newell, G.R. and Griffioen, P. (2009) Using single and multi-target regression trees and ensembles to model a compound index of vegetation condition. Ecological Modelling 220(8): 1159-1168

Department of Sustainability and Environment (2008) Native Vegetation Net gain accounting - first approximation report. State of Victoria, Department of Sustainability and Environment, East Melbourne.

44. Consequently, if all other inputs to the pixel ranking decision are equal, pixel removal will proceed by first removing pixels with the lower condition – retaining pixels with the higher condition.

45. In effect we are supplying the ‘pre-European settlement’ view of the distribution of each assemblage – see the outer polygon in Figure 4 – and the modelled condition of the assemblage currently.
In the simplified example in Figure 4, the condition is 5%, 10%, 20%, 40% and 50% of the maximum habitat hectares score. In this hypothetical example, the polygons with 40% and 50% represent the extant extent of the assemblage.

46. Zonation attempts to balance the proportion of habitat lost across all species (in our case assemblages) at each step of the ranking process. Therefore habitat and condition interact rather like habitat hectares – as each species' habitat is removed (all other things being equal) the lowest quality habitat will be removed first.

Figure 4. Hypothetical condition in each part of the distribution of an assemblage. Percentages are the proportion of maximum condition score.

Part 3: Incorporating risk in the integrated map

47. The Priorities for Action map further refines the analysis by incorporating some of the risks posed to the persistence of the values we are attempting to protect. A pixel’s place in the statewide ranking may alter if we also preference locations where values are insecure and potentially may be lost over those locations where the values will be retained and are secure.

48. In order to implement this in the analysis we included a risk horizon of ten years and attempted to impute the potential of substantially losing values as a consequence of land-use intensification or weed invasion. Of course, these are not the only threats to the persistence of biodiversity values; however, these are two threats for which we were able to generate credible statewide surfaces. As work progresses we hope to bring in realistic spatial representations of other threats.

49. In summary, the Priorities for Action map could be considered a proof-of-concept demonstrating how we might incorporate some major risks but it cannot be interpreted as a comprehensive picture of priorities for action as it only incorporates two types of potential loss.

Retention

50.
The logic here is that there is less need to reserve and manage sites where there are no anticipated threats. Zonation incorporates the idea of favouring sites at risk over secure sites by including a “retention” layer. “Retention”, like condition and other ‘cost-type’ inputs, further mediates the pixel removal process – all other things being equal, pixels that are subject to greater risk (in this case via land use change and weed invasion) will be retained before pixels subject to less risk.

How did we predict the probability of loss due to land use intensification over the next ten years?

51. We aimed to give every pixel across Victoria a probability of loss of values over the next ten years. Loss includes clearing and intensive agricultural use such as addition of superphosphate or change from grazing to intensive cropping. However, data is limited and the following paragraphs explain how we estimated the probabilities.

52. The Net Gain Accounting Report estimates that we lose, on average, 4,500 ha of native vegetation per year across Victoria. This equates to 45,000 ha lost over the next ten years. Almost all of this loss will be from private land, whether accidental or intentional.

53. We started with each of the pixels that currently support native vegetation on private land having an equal probability of being lost at the background level of clearing. Then we re-weighted each pixel by the historic level of clearing within each environmental domain (see paragraph 8 for a description of environmental domains).

How did we predict the relative probability of weed invasion over the next ten years?

54. We aimed to give every pixel across Victoria a probability of loss of values over the next ten years. Loss includes gross degradation due to weed invasion. However, data is limited and the following paragraphs explain how we estimated the probabilities.

55. We identified transformer weed species in Victoria.
‘Transformer species’ are a subset of invasive plants that have the capacity to change the character, condition, form or nature of ecosystems over substantial areas relative to the extent of that ecosystem.

56. We used DSE vegetation quadrat data to identify places where transformer species have been detected.

57. We used the MAXENT modelling process (www.cs.princeton.edu/~schapire/maxent/) to build a presence-only distribution model of transformer species using the environmental predictor variables listed in the appendix along with variables related to propagule pressure such as distances from edges, land uses, habitation and road types.

58. The resultant map presents as a surface predicting the ‘probability’ that a pixel has a transformer weed present. This is not the same as losing all the values at the site. Therefore we rescaled this to produce a relative risk of loss due to weed invasion, employing the upper and lower bounds of the clearing probabilities.

59. These two relative risk surfaces were added together to create the ‘retention’ layer for inclusion in the Zonation analysis.

Appendix

Spatially explicit environmental predictors used to model the distribution of environmental domains (see paragraph 8). We have only used environmental predictors that are mapped comprehensively across the whole of Victoria. Potential predictor variables include gridded climate data derived from the ANUCLIM model, an independent data set of radiometric data that is a useful surrogate for soils and regolith, and a range of variables derived from digital elevation modelling such as terrain wetness and height above water feature.

Note: Ecological Vegetation Classes are not used as environmental predictor variables for generating distribution models. See NaturePrint: Terrestrial species distribution models for further explanation.
Terrain related variables
o Anisotropic Heating Index
o Annual Insolation
o Visible Sky (or exposure) Index
o Terrain Wetness index

Soil
o Radiometrics (Potassium / Thorium)
o Radiometrics (Inverse Potassium / Thorium)

Satellite data and derived indices
o ModisMean_EVI_Jan2000_June2011
o ModisMean_MIR_Jan2000_June2011

Geographic location
o Location X = easting for study area / log10
o Location Y = northing for study area / log10

Climate
o Evaporation - January
o Evaporation - July
o Mean max daily temp - July
o Mean max daily temp - June
o Rainfall - January
o Rainfall - July
o Raindays - January
o Raindays - July

Water
o Log of horiz. distance from major stream + vertical distance above stream
o Log of horiz. distance from all mapped streams + vertical distance above all mapped streams
o Log of horiz. distance from saline waterbody + vertical (altitude) distance above waterbody
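Worked example: building a retention layer

As an illustration of paragraphs 52, 58 and 59, the arithmetic can be sketched in a few lines. This is a hedged sketch under stated assumptions rather than the procedure actually used: the function names are hypothetical, the rescaling is assumed to be a simple linear (min-max) stretch onto the bounds of the clearing probabilities, and pixels are represented as flat lists rather than rasters.

```python
# Known quantities from the text: 75 x 75 m pixels, 4,500 ha lost per year,
# ten-year risk horizon.
PIXEL_AREA_HA = 75 * 75 / 10_000           # 0.5625 ha per pixel
expected_loss_ha = 4_500 * 10              # 45,000 ha over ten years
expected_loss_pixels = expected_loss_ha / PIXEL_AREA_HA  # 80,000 pixels


def rescale(values, lower, upper):
    """Linearly stretch values onto [lower, upper] (ASSUMED min-max rescaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat surface carries no relative information; use the lower bound.
        return [lower for _ in values]
    return [lower + (v - lo) * (upper - lower) / (hi - lo) for v in values]


def retention_layer(clearing_risk, weed_probability):
    """Add clearing risk to weed risk rescaled onto the clearing-risk bounds.

    clearing_risk: per-pixel probability of loss to land use intensification.
    weed_probability: per-pixel modelled probability of transformer weed presence.
    """
    weed_risk = rescale(weed_probability, min(clearing_risk), max(clearing_risk))
    return [c + w for c, w in zip(clearing_risk, weed_risk)]
```

For example, with clearing risks of 0.01, 0.02 and 0.05 and weed probabilities of 0.0, 0.5 and 1.0, the weed surface is stretched onto the range 0.01 to 0.05 before the two surfaces are summed, so neither threat dominates the layer purely because of its original units.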