NaturePrint technical fact sheet - Department of Environment, Land

NaturePrint v2.0:
Elements and integration
A technical explanation of the NaturePrint v2.0 analysis
Purpose of facts sheet
This facts sheet is a detailed, technical description of each of the elements of NaturePrint v2.0. If you
are interested in a short description of NaturePrint v2.0 and its potential uses, please refer to the
Introduction to NaturePrint v2.0 facts sheet.
The first part of this facts sheet describes the species information including how we analysed species
records, generated species assemblage distribution models, incorporated plants and fish, categorised
wetlands, treated threatened species records and recognised important differences between feeding
and breeding habitat for threatened birds and bats. A description of how we generated the species
distribution models (SDMs) is available in a separate facts sheet: NaturePrint: Terrestrial species
distribution models.
The first part also explains when the primary data was captured from standard databases,
how non-standard data was generated, our logic for including and excluding data types and the
methods we used to generate the inputs for the optimisation analysis. This document also describes
some of the approaches we explored that were unsuccessful.
The second part of this facts sheet describes how these elements were used in the optimisation analysis
to generate the integrated Strategic Natural Values map. We explain how the optimisation analysis
works to rank each pixel (75 x 75 m cell) across Victoria and describe the spatial surfaces we used to
mediate this process including habitat condition, connectivity, and whole of landscape connectivity.
In the final part of the facts sheet we describe how we generated the risk layers: probability of clearing
and weed invasion and incorporated these into the integrated analysis using the concept of retention.
Part 1: Generating species information
There are over 3500 plant species in Victoria - how were they incorporated in NaturePrint
v2?
1. Zonation – the software we have used to undertake the integrated statewide analysis – can use
species level inputs. We decided against building SDMs for each of Victoria’s plant species for the
NaturePrint v2 analysis because:
o If we generated SDMs for each plant species we would need to have 5000 inputs into Zonation.
We currently do not have powerful enough computers to consider so many inputs at a useful
scale and extent using Zonation or any other spatial prioritisation tool.
o We want to incorporate species from different groups (birds, mammals, plants etc) equitably
into the analysis irrespective of their relative richness.
2. Given the reasons described above, we decided to integrate all our species level data in an
ecologically sensible way such that we had fewer inputs. So we developed 100 terrestrial
‘assemblages’ (or spatially co-occurring ‘communities’) of vascular plants and vertebrate animals as
inputs into the analysis.
Actions for Biodiversity Conservation
How were the terrestrial assemblages generated?
3. In summary, we used Victorian vegetation quadrats and terrestrial vertebrate fauna SDMs (see NaturePrint:
Terrestrial species distribution models) to find groups of species that co-occur. All of these sites with their
constituent taxa were then analysed to find patterns and coherent groups within the data. We then modelled
the distribution of these groups, or ‘assemblages’ in the same way that we might model species distributions
(SDMs). Paragraphs 4-13 describe these methods in detail.
4. The data. We noted the presence or absence of 3228 plant species at each of 30,423 quadrat sites across
Victoria. We extracted this data from the Victorian Biodiversity Atlas in September 2011, which equates to
the 2010-data release. We checked each of the first generation vertebrate SDMs at each quadrat location to
see whether habitat for that species was more than 70% likely to be found at that location. This produced a
matrix of 30,423 sites x 3722 taxa for analysis (see Figure 1).
Figure 1. Example of the matrix of plant and terrestrial fauna present (signified by ‘1’) and absent (signified
by ‘0’) in each quadrat. The pattern of presence and absence is used to cluster for assemblages.

Quadrat        Plant taxa                     Terrestrial fauna SDMs
location       a    b    c    d   ...  3228   a    b    c    d   ...  494
1              0    1    1    0               1    0    1    0
2              1    0    1    1               1    1    0    0
3              0    0    0    1               0    1    1    0
4              1    1    0    0               1    0    1    1
...
30,423
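The construction of one matrix row can be sketched as follows. This is a minimal illustration, assuming plant presences arrive as a set of taxon names per quadrat and SDM outputs as likelihoods between 0 and 1; the taxon names, quadrat values and helper `build_row` are hypothetical, and only the 70% threshold comes from paragraph 4.

```python
# Hypothetical taxon lists; the real matrix has 3228 plant and 494 fauna columns.
PLANT_TAXA = ["plant_a", "plant_b", "plant_c"]
FAUNA_TAXA = ["fauna_a", "fauna_b"]
SDM_THRESHOLD = 0.70  # habitat counts as present when likelihood is more than 70%


def build_row(plants_recorded, sdm_likelihoods):
    """Return one 0/1 row: plant columns first, then fauna SDM columns."""
    plant_part = [1 if taxon in plants_recorded else 0 for taxon in PLANT_TAXA]
    fauna_part = [
        1 if sdm_likelihoods.get(taxon, 0.0) > SDM_THRESHOLD else 0
        for taxon in FAUNA_TAXA
    ]
    return plant_part + fauna_part


# Hypothetical quadrats: (plants recorded, SDM likelihood at the quadrat location).
quadrats = {
    1: ({"plant_b", "plant_c"}, {"fauna_a": 0.91, "fauna_b": 0.12}),
    2: ({"plant_a"}, {"fauna_a": 0.70, "fauna_b": 0.85}),
}
matrix = {qid: build_row(plants, sdm) for qid, (plants, sdm) in quadrats.items()}
```

In this sketch a likelihood of exactly 0.70 is treated as absent, because the text says "more than 70%".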
5. We then wanted to look for similar patterns across the matrix as these would tell us about groups of species
that co-occur. There are many approaches to aggregating and classifying biological data, and all have their
strengths and weaknesses. One of the key limitations to clustering these data is purely computational - many
intensive numerical methods simply cannot be usefully applied to such a large dataset with existing computing
power. We investigated many clustering methods, and selected the ‘self-organising maps’ algorithm for its
tractability and useful outputs. The number of assemblages to be found in the data (100) was chosen arbitrarily,
but with a view to the computational efficiency of the spatial optimisation process.
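The idea behind self-organising maps can be illustrated with a toy one-dimensional version. This is a simplified sketch and not the implementation used for NaturePrint: the number of units, learning rate, neighbourhood radius, number of epochs and the initialisation from evenly spaced data rows are all assumptions chosen for illustration.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def best_matching_unit(weights, row):
    """Index of the unit whose weight vector is closest to the row."""
    return min(range(len(weights)), key=lambda j: squared_distance(weights[j], row))


def train_som(rows, n_units, epochs=10, lr0=0.5, radius0=1):
    """Train a 1-D self-organising map on presence/absence rows."""
    # Initialise unit weights from evenly spaced data rows (assumption).
    step = (len(rows) - 1) / max(n_units - 1, 1)
    weights = [[float(x) for x in rows[round(j * step)]] for j in range(n_units)]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs      # shrinks from 1 towards 0
        lr = lr0 * decay                  # learning rate for this epoch
        radius = int(radius0 * decay)     # neighbourhood radius for this epoch
        for row in rows:
            bmu = best_matching_unit(weights, row)
            for j in range(n_units):
                if abs(j - bmu) <= radius:
                    # Pull the unit's weights towards the observed row.
                    weights[j] = [w + lr * (x - w) for w, x in zip(weights[j], row)]
    return weights


# Four hypothetical quadrats forming two obvious groups of co-occurring taxa.
rows = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
weights = train_som(rows, n_units=2)
labels = [best_matching_unit(weights, row) for row in rows]
```

Each unit plays the role of one candidate assemblage, and `labels` records which unit each quadrat is assigned to.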
6. Each of the 100 assemblages was checked to see whether the groupings of species made ecological sense.
One assemblage consisted of a single quadrat site and another assemblage consisted of outliers such as
beaches and highly disturbed sites. Both were removed from further consideration. The remaining 98 were
considered ecologically useful.
Modelling the distribution of the assemblages
7. The above methods effectively provided ‘presence’ data for each assemblage. We also need data about the
geographic ‘absence’ of assemblages to create distribution models. One method to assign absences could be
to randomly distribute them across Victoria. However, this has the potential to miss rare or very localised
types of the environment such as water bodies or rocky outcrops which might be important in predicting
where the assemblage occurs and doesn’t occur.
8. Our aim was to stratify absences within environmental types to ensure we don’t miss important but
geographically restricted types. To do this we needed to know where environmental types occur across
Victoria and we used a similar procedure as for generating the assemblages. We clustered a subset of the
spatially explicit or mapped environmental predictor data (see appendix) for south east Australia into
‘environmental types or domains’.
9. We then assigned an equal number of absence data randomly to each of these environmental types and
these 200,000 random stratified sites formed the pool from which random absences could be drawn.
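Stratifying absences by environmental type can be sketched as below. This is an illustration only: the domain names, pixel identifiers, the helper `stratified_absence_pool` and the per-domain count are hypothetical, and the sketch simply draws the same number of random pixels from each domain.

```python
import random


def stratified_absence_pool(domain_pixels, per_domain, seed=0):
    """Draw the same number of random pixels from every environmental domain."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    pool = []
    for domain, pixels in sorted(domain_pixels.items()):
        k = min(per_domain, len(pixels))  # a tiny domain may have fewer pixels
        pool.extend((domain, pixel) for pixel in rng.sample(pixels, k))
    return pool


# Hypothetical domains: one widespread, one geographically restricted.
domain_pixels = {
    "plains": list(range(1000)),
    "rocky_outcrop": list(range(1000, 1010)),
}
pool = stratified_absence_pool(domain_pixels, per_domain=5)
counts = {}
for domain, _pixel in pool:
    counts[domain] = counts.get(domain, 0) + 1
```

A purely random draw across all 1010 pixels would rarely select the restricted domain, whereas the stratified draw guarantees it is represented.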
10. Once we had presence data for our 98 assemblages and a stratified pool of potential absence data, we
modelled the spatial distribution of assemblages using Random Forests. This is a machine-learning process
that aggregates hundreds of randomly generated decision trees. Spatial expressions of the assemblages were
produced at a resolution of 75 m (Figure 2).
Figure 2. Spatial expression of the assemblages.
Mapping assemblages
11. The distributions of assemblages are continuous and overlapping. That is, at each pixel you might have a 10%
probability of assemblage A, 15% probability of assemblage B, etc.
12. The combination of these 98 assemblage distribution models can be used to form a ‘maximum likelihood’
view of which assemblage is most likely to be present at each pixel. Our flat map of the distribution of the 98
terrestrial assemblages shows the assemblage that has the maximum likelihood of all the assemblages for the
pixel (Figure 3).
13. This map also helps to inform us about targeting areas to collect new biodiversity information (e.g. survey for
plants and animals) where our knowledge is poor and potentially affecting our decision-making. That is,
where the maximum likelihood is low, we are less certain about what terrestrial biodiversity occurs in those
locations, and the map can be used to stratify future sampling efforts.
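The maximum likelihood view and its use for targeting surveys can be sketched as follows. The pixel identifiers, probabilities and the 0.2 cut-off are hypothetical; the source does not state a threshold for "low" maximum likelihood.

```python
def maximum_likelihood_view(pixel_probs):
    """Return (assemblage, probability) for the most likely assemblage at a pixel."""
    best = max(pixel_probs, key=pixel_probs.get)
    return best, pixel_probs[best]


def survey_candidates(pixels, threshold):
    """Pixels whose best assemblage is still unlikely, i.e. poorly known places."""
    return [
        pixel_id
        for pixel_id, probs in pixels.items()
        if maximum_likelihood_view(probs)[1] < threshold
    ]


# Hypothetical per-pixel assemblage probabilities.
pixels = {
    "p1": {"A": 0.10, "B": 0.15, "C": 0.60},
    "p2": {"A": 0.10, "B": 0.15, "C": 0.12},
}
flat_map = {pid: maximum_likelihood_view(probs)[0] for pid, probs in pixels.items()}
to_survey = survey_candidates(pixels, threshold=0.2)  # hypothetical cut-off
```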
Figure 3. Maximum likelihood of assemblages
How are freshwater species included in NaturePrint v2?
14. Freshwater fish assemblages (including yabbies and freshwater crayfish) were derived from the Sustainable
Rivers Audit (SRA) data. Although the SRA data doesn’t provide information on small tributaries, it is based on
repeated sampling over the long term, can broadly be interpreted as presence/absence data and is
relatively comprehensive for Victoria.
15. The assemblages were found from the SRA data using a clustering method called expectation maximisation
(with default settings). Expectation maximisation clustering discovers the number of groups in the data that
satisfy predetermined levels of difference/similarity.
16. These methods found 17 fish communities/assemblages in the SRA data. The assemblages were considered
ecologically reasonable by fish biologists from DSE’s Arthur Rylah Institute.
17. The distribution of the fish assemblages was derived using the same methods as the terrestrial assemblage
distributions, that is stratifying ‘absences’ using the environmental domains and modelling spatial distribution
using Random Forests (see above for more detail). As with the terrestrial assemblages, the distributions of
the fish assemblages are continuous and overlapping and can be mapped on the maximum likelihood view
with the terrestrial assemblages.
How did we generate assemblages for areas that don’t have vegetation quadrats, for example
wetlands?
18. We looked at the waterfowl data to see if there were places that people went regularly over time and
recorded birds. We hoped to use the waterfowl dataset to generate ‘bird quadrats’ specific to wetlands,
similar to the concept of vegetation quadrats. We found that some wetlands had thousands of repeat
surveys while other wetlands had only been surveyed once. As a consequence, wetlands with thousands of
repeat surveys invariably had a comparatively high recorded species richness and those that had few visits
had a low recorded species richness. These differences tended to swamp the expected differences between
bird assemblages at hydrologically/geomorphologically different wetlands. While there are many potential
approaches to subsample from the data to remove these biases, this is currently beyond the capacity of
NaturePrint v2.
19. In NaturePrint v2.0 we included wetland types based on Corrick’s 53 major and minor mapping units as
surrogates for wetland communities. As new data becomes available in the future we intend to review the
modelling and representation of wetlands assemblages.
How are threatened species treated in NaturePrint v2?
20. We incorporated post-1970 records of 777 VROT plant and animal species as point files in the optimisation.
See NaturePrint: Terrestrial species distribution models for further information on accuracy of records. We
have also incorporated the VROT fish ‘points’ as reaches. The point data and the modelled assemblage data
are both concurrently part of the optimisation objective (see How does the NaturePrint analysis rank each
pixel across Victoria? for details).
21. The following dot points explain our reasons for including point records in the analysis as well as assemblage
distribution models when we have previously explained that SDMs are much more useful than points.
o Outliers in species records can be either erroneous or true records. If true, the outliers can be relatively
more important than the majority of other records in telling us about the types of environments in which
species live and hence inform the species distribution model. However, models are generally more
interested in the core body of the data that describes the environment of species. We build processes
into how we generate models to look after outliers but there is only so much we can do. This
incorporation of rare and threatened species records directly into the optimisation dilutes this problem.
o In a species distribution model we make a prediction about the probability (between 0 and 1) of the
species’ habitat occurring in each 75 m pixel across the state. The probability of occurrence in a pixel
where the species was actually detected, given assumptions of accuracy of identification and location, is
probably approaching 1. That is, the probability that the pixel forms part of the actual distribution is quite
high. The model doesn’t explicitly know that and could assign a lower probability to the pixel where the
species was found.
o There are some phenomena in the environment that we can’t currently predict very well with our existing
environmental data. For example, some locations that have high utility for rare and threatened species
are very difficult to ‘describe’ in terms of the environmental variables such as climate, terrain and soil. In
these cases there are likely to be other drivers or habitat features that we either do not yet understand,
or we find it difficult to produce useful spatial environmental surfaces to predict a species occurrence
(e.g. it is currently very difficult to produce a credible spatial representation of tree hollows across the
state, yet we know that these habitat features are critical to key taxa). In these instances, retaining point
data can provide extra information about locations (e.g. a good example of this in Victoria is the Chiltern
region where many rare and threatened species persist despite its apparently ‘poor’ landscape
connectivity and condition).
o There is capacity in Zonation spatial optimisation software to combine both the naturalist view and the
data view. This approach seems reasonable – given that the naturalist view embodies a significant
component of our stakeholder base and directly shapes our priority setting.
Considering breeding habitat in the optimisation analysis
22. Some fauna species have two distinct and important habitat types. We reflected the importance of both
types in the optimisation analysis by splitting the site records of selected (see below) species into two subsets
per species: 1) records collected during the breeding season (or from colonial roosting/breeding sites) and 2)
all other records (within accuracy constraints). We included each sub-set of records as separate ‘species’ in
the analysis (e.g. ‘Red-tailed Black Cockatoo feeding’ and ‘Red-tailed Black Cockatoo breeding’).
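Splitting one species' records into two pseudo-species can be sketched as below. The record structure, the `colony_site` flag and the breeding months are hypothetical; the source does not list breeding seasons for any species.

```python
from datetime import date


def split_records(species, records, breeding_months):
    """Split site records into 'breeding' and 'feeding' subsets for one species."""
    breeding_key = f"{species} breeding"
    feeding_key = f"{species} feeding"
    groups = {breeding_key: [], feeding_key: []}
    for record in records:
        in_season = record["date"].month in breeding_months
        at_colony = record.get("colony_site", False)  # colonial roosting/breeding site
        key = breeding_key if (in_season or at_colony) else feeding_key
        groups[key].append(record)
    return groups


# Hypothetical records and breeding months (September to December).
records = [
    {"date": date(2010, 11, 3)},
    {"date": date(2010, 5, 20)},
    {"date": date(2010, 6, 1), "colony_site": True},
]
groups = split_records("Red-tailed Black Cockatoo", records, breeding_months={9, 10, 11, 12})
```

Each key of `groups` would then be supplied to the optimisation as a separate 'species'.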
23. Only rare and threatened bird and bat species were considered for inclusion using this method.
24. The birds included Australasian Bittern, Red-tailed Black Cockatoo, Whiskered Tern, Lewin’s Rail, Turquoise
Parrot, Powerful Owl, Regent Parrot, Superb Parrot, Regent Honeyeater, Little Tern, Fairy Tern and Hooded
Plover. The bats included the Southern and Eastern Bentwing Bats and the Eastern Horseshoe Bat.
25. Future NaturePrint versions could consider two types of habitat for fish e.g. spawning and feeding. This is
partly accounted for in the aquatic assemblages described above.
26. All VROT site records data (168 separate text files indicating locations at 75 m resolution) were supplied to
Zonation and treated in the same way as the assemblage models. That is, each record of each species was
used in the analysis.
Part 2: Producing the integrated maps using optimisation
How does the NaturePrint analysis rank each pixel across Victoria?
27. The software we used to do the analysis for the integrated maps is called Zonation. Zonation seeks to satisfy
its ‘objective’ in a balanced way while iteratively removing pixels (reducing the area of the state). For
NaturePrint v2.0, the objective is to maximise the area (i.e. number of pixels) retained for each of the values
(assemblages, and VROT points), while also maximising their retention in areas of high landscape connectivity
and condition. It also has a preference for a more complete (i.e. vegetated), compact, connected landscape,
based on the logic that it is cheaper, easier and more likely to succeed to manage a more aggregated
landscape.
28. Zonation starts by removing the pixel which is the least useful in meeting the objective, and then reassesses
the whole stack of information again. The removal of a pixel fundamentally changes the relative importance
of the remaining pixels. Therefore, the pixel to be removed in the next iteration is not necessarily the pixel
that was judged ‘second least useful’ in the prior iteration. This is why Zonation is iterative as opposed to
ranking the whole landscape in a single pass.
29. The process is repeated until all the cells are removed. The order in which cells are removed is recorded,
resulting in a ranking. This ranking determines the colour gradations of values on the Strategic Natural
Values map. The result is not a single set of sites; rather, it is a complete and nested solution.
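The iterative removal described in paragraphs 27-29 can be sketched with a toy example. The marginal value used here (the largest share of any feature's remaining total that a cell holds) follows the core-area idea described in the Zonation documentation; the source does not state which removal rule NaturePrint used, so treat that choice as an assumption. Cell names and amounts are hypothetical, and cost, condition and edge rules are left out.

```python
def rank_cells(cells):
    """cells: {cell_id: {feature: amount}}. Return cell ids in order of removal."""
    remaining = dict(cells)
    removal_order = []
    while remaining:
        # Recompute what is left of every feature after the previous removal.
        totals = {}
        for amounts in remaining.values():
            for feature, amount in amounts.items():
                totals[feature] = totals.get(feature, 0) + amount

        def value(cell_id):
            # A cell is worth as much as its most important remaining feature share.
            shares = [
                amount / totals[feature]
                for feature, amount in remaining[cell_id].items()
                if totals[feature] > 0
            ]
            return max(shares, default=0.0)

        least_useful = min(sorted(remaining), key=value)
        removal_order.append(least_useful)
        del remaining[least_useful]
    return removal_order


# Hypothetical landscape: three cells hold assemblage A, two hold assemblage B.
cells = {
    "c1": {"A": 10},
    "c2": {"A": 12},
    "c3": {"A": 18},
    "c4": {"B": 35},
    "c5": {"B": 65},
}
order = rank_cells(cells)
```

Before any removal the initial values are c1 0.25, c2 0.30, c4 0.35, so c2 is 'second least useful'. Once c1 is removed, c2's share of the remaining A rises to 0.40 and c4 is removed next instead, which is why a single-pass ranking would give a different answer.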
30. For further technical explanation please see the Zonation user manual:
http://www.helsinki.fi/bioscience/consplan/software/Zonation/Zonation_UserManual_v1.0.pdf
What are the different integrated maps produced in NaturePrint v2.0?
31. NaturePrint v2 includes two integrated products: the Strategic Natural Values map and the Priorities for
Action map.
32. The Strategic Natural Values map: protecting the full range of biodiversity values across Victoria aims to
maximise the biodiversity values protected while minimising the amount of area required to achieve this.
Based on existing data it ranks the relative contribution that each place (75 x 75 m pixels) makes towards
protecting the full range of biodiversity values across Victoria.
33. The Priorities for Action map re-considers the same ranking problem as above except that it is recalculated in
the context of the potential for values to be lost or degraded over the next ten years (see below for detail).
Apart from the most unique and irreplaceable sites, the ranking of many sites will change. It is intended that
the map will be used to assist in identifying candidate areas for actions to achieve biodiversity conservation
(actions may include combinations of regulation, publicly funded activities and strategic planning).
34. Both the Strategic Natural Values map and the Priorities for Action map were generated by optimising the species and
assemblage information described above along with the context, condition, retention and risk information
described below.
The Strategic Natural Values map
35. The optimisation for the Strategic Natural Values map seeks the objective described above and is mediated by
two separate surfaces: connectivity and condition.
Connectivity
36. We approached connectivity differently in NaturePrint versions 1 and 2. In NaturePrint v1 we incorporated
connectivity into the ranking analysis in two ways.
o Each species distribution model was ‘smoothed’ at a spatial scale relevant to each species’ capacity to
move and disperse. This smoothing process was indicative of the amount of habitat within a region. Each
of these smoothed versions of the species distribution models (in addition to the ‘original’ species model)
was included in the analysis.
o We attempted to include preferred and potential movement pathways between selected large remnants
of discrete but broadscale habitat types using circuit theory modelling in the software “Circuitscape”.
Moilanen (2011) clearly explains some of the problems we encountered with this approach.
37. NaturePrint v2 does not incorporate these approaches to connectivity because of the significant changes to
the nature of the version 2 inputs and the greatly expanded complexity of the analysis - both in terms of the
numbers of inputs and the resolution of these inputs.
38. NaturePrint v2 employs more generalised methods to yield a connected and aggregated (and also more
computationally tractable) solution. These methods, Edge Removal and Landscape Connectivity, are based
on the logic that it is generally cheaper to connect vegetated areas if they are proximal.
Edge removal
39. The pixel removal procedure was scheduled so that any new pixels chosen for removal could only be selected
from the pixels remaining at the interface between the already removed and the retained pixels.
40. As a consequence the sequence of pixel removal has profound implications for how the analysis proceeds. To
counter some of the perverse outcomes that may result from this rule, Zonation was initiated with pre-seeded
edges within the state. These were regions that were defined as treeless and urban including the
state boundaries and coastline.
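The edge rule can be sketched on a small grid. This is an illustration only: it assumes 4-neighbour adjacency and treats the grid boundary as a pre-seeded edge (standing in for the state boundary and coastline); the function name and grid are hypothetical.

```python
def removable_cells(retained, removed, n_rows, n_cols):
    """Retained cells that touch a removed cell or the outer boundary of the grid."""
    frontier = set()
    for (row, col) in retained:
        for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            n_row, n_col = row + d_row, col + d_col
            outside = not (0 <= n_row < n_rows and 0 <= n_col < n_cols)
            if outside or (n_row, n_col) in removed:
                frontier.add((row, col))
                break
    return frontier


# 3 x 3 grid with nothing removed yet: only the border cells are candidates.
all_cells = {(r, c) for r in range(3) for c in range(3)}
frontier_start = removable_cells(all_cells, set(), 3, 3)

# After removing one border cell, the centre cell joins the interface.
removed = {(0, 1)}
frontier_next = removable_cells(all_cells - removed, removed, 3, 3)
```

Pre-seeding treeless and urban regions as already removed would add interior starting edges in the same way that `removed` does here.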
Whole of landscape connectivity
41. We used DSE’s landscape connectivity surface to regulate the pixel ranking process. This was achieved by
using landscape connectivity as a ‘cost’ surface. That is, places with a higher landscape connectivity were
assigned a lower cost. Remnant vegetation was uniformly assigned the lowest ‘cost’ and as we moved
beyond the edge of remnant vegetation (as defined by NV extent 2005) costs increased linearly as
connectivity values diminished. Consequently if all other inputs to the pixel ranking decision are equal, pixel
removal will proceed by first selecting pixels with the lowest landscape connectivity.
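The shape of such a cost surface can be sketched as follows. The cost bounds and the assumption that connectivity is scaled from 0 to 1 are hypothetical; only the rules (remnant vegetation gets the lowest cost, and cost rises linearly as connectivity falls) come from paragraph 41.

```python
MIN_COST = 1.0   # hypothetical cost assigned to remnant vegetation
MAX_COST = 10.0  # hypothetical cost at zero landscape connectivity


def connectivity_cost(connectivity, is_remnant):
    """Cost of a pixel given its landscape connectivity (0 = none, 1 = highest)."""
    if is_remnant:
        return MIN_COST  # remnant vegetation is uniformly assigned the lowest cost
    clipped = min(max(connectivity, 0.0), 1.0)
    # Cost increases linearly as connectivity diminishes.
    return MIN_COST + (MAX_COST - MIN_COST) * (1.0 - clipped)


remnant_cost = connectivity_cost(0.2, is_remnant=True)
near_cost = connectivity_cost(0.5, is_remnant=False)
far_cost = connectivity_cost(0.0, is_remnant=False)
```

With all other inputs equal, the pixel with the highest cost (lowest connectivity) would be removed first.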
42. An alternative to using landscape connectivity to produce pixel ranking solutions that are aggregated would
be to use a boundary length penalty function (BLP) in Zonation. However, using boundary length penalty for
such a large and complex analysis (i.e. in terms of the number of pixels to juggle in each iteration) is
computationally intensive and the analysis would take several months to complete even using a very
powerful computer.
Condition
43. We further regulated and mediated the pixel ranking process using site condition. Site condition is a
modelled surface representing understorey condition based on habitat hectares. For more information on
the derivation of this surface, please see:

Newell, G., White, M., Griffioen, P. and Conroy, M. (2006) Vegetation condition mapping at a landscape-scale across Victoria. Ecological Management and Restoration 7: S65-S68

Kocev, D., Dzeroski, S., White, M.D., Newell, G.R. and Griffioen, P. (2009) Using single and multi-target
regression trees and ensembles to model a compound index of vegetation condition. Ecological Modelling
220(8): 1159-1168

Department of Sustainability and Environment (2008) Native Vegetation Net gain accounting - first
approximation report. State of Victoria, Department of Sustainability and Environment, East Melbourne.
44. Consequently if all other inputs to the pixel ranking decision are equal, pixel removal will proceed by first
removing pixels with the lower condition – retaining pixels with the higher condition.
45. In effect we are supplying the ‘pre-European settlement’ view of the distribution of each assemblage – see
the outer polygon in Figure 4 – and the modelled condition of the assemblage currently. In the simplified
example in Figure 4, the condition is 5%, 10%, 20%, 40% and 50% of the maximum habitat hectares score. In
this hypothetical example, the polygons with 40 and 50% represent the extant extent of the assemblage.
46. Zonation attempts to balance the proportion of habitat lost across all species (in our case assemblages) at
each step of the ranking process. Therefore habitat and condition interact rather like habitat hectares – as
each species' habitat is removed (all other things being equal) the lowest quality habitat will be removed first.
Figure 4. Hypothetical condition in each part of the distribution of an assemblage. Percentages are the
proportion of maximum condition score.
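The way habitat and condition interact can be illustrated with the percentages from the hypothetical example in Figure 4. The polygon names and the equal 100 ha areas are assumptions; the weighting (area multiplied by the proportion of the maximum condition score) mirrors the habitat hectares idea rather than reproducing Zonation's internal calculation.

```python
AREA_HA = 100.0  # hypothetical area of each polygon

# Condition as a percentage of the maximum habitat hectares score (from Figure 4).
condition_pct = {"core": 50, "inner": 40, "mid": 20, "outer": 10, "fringe": 5}

# Condition-weighted habitat for each polygon.
weighted_ha = {name: AREA_HA * pct / 100 for name, pct in condition_pct.items()}

# All other things being equal, the lowest quality habitat is removed first.
removal_order = sorted(weighted_ha, key=weighted_ha.get)
total_weighted_ha = sum(weighted_ha.values())
```

Here 500 ha of pre-European distribution carries 125 condition-weighted hectares, and the 5% and 10% polygons are given up before the extant 40% and 50% polygons.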
Part 3: Incorporating risk in the integrated map
47. The Priorities for Action map further refines the analysis by incorporating some of the risks posed to the
persistence of the values we are attempting to protect. A pixel’s place in the statewide ranking may alter if
we also preference locations where values are insecure and potentially may be lost over those locations
where the values will be retained and are secure.
48. In order to implement this in the analysis we included a risk horizon of ten years and attempted to impute the
potential of substantially losing values as a consequence of land-use intensification or weed invasion. Of
course, these are not the only threats to the persistence of biodiversity values, however these are two threats
for which we were able to generate credible statewide surfaces. As work progresses we hope to bring
realistic spatial representations of other threats.
49. In summary, the Priorities for Action map could be considered a proof-of-concept demonstrating how we
might incorporate some major risks but it cannot be interpreted as a comprehensive picture of priorities for
action as it only incorporates two types of potential loss.
Retention
50. The logic here is there is less need to reserve and manage sites where there are no anticipated threats.
Zonation incorporates the idea of favouring sites at risk over secure sites by including a “retention” layer.
“Retention” like condition and other ‘cost-type’ inputs further mediates the pixel removal process – all other
things being equal pixels that are subject to greater risk (in this case via land use change and weed invasion)
will be retained before pixels subject to less risk.
How did we predict the probability of loss due to land use intensification over the next ten years?
51. We aimed to give every pixel across Victoria a probability of loss of values over the next ten years. Loss
includes clearing and intensive agricultural use such as addition of superphosphate or change from grazing to
intensive cropping. However, data is limited and the following paragraph explains how we estimated the
probabilities.
52. The Net Gain Accounting Report estimates that we lose, on average, 4,500 ha of native vegetation per year
across Victoria. This equates to 45,000 ha lost over the next ten years. Almost all of this loss will be from private
land, either accidentally or intentionally.
53. We started with each of the pixels that currently support native vegetation on private land having an equal
probability of being lost at the background level of clearing. Then we re-weighted each pixel by the historic
level of clearing within each environmental domain (see paragraph 8 for a description of environmental
domains).
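One way to turn these figures into per-pixel probabilities is sketched below. The domain names, areas and historic clearing fractions are hypothetical, and the normalisation (scaling by each domain's historic clearing relative to the area-weighted mean so that the expected total loss stays at 45,000 ha) is an assumption; the source does not give the re-weighting formula.

```python
ANNUAL_LOSS_HA = 4500
HORIZON_YEARS = 10
EXPECTED_LOSS_HA = ANNUAL_LOSS_HA * HORIZON_YEARS  # 45,000 ha over ten years


def clearing_probabilities(domain_area_ha, historic_cleared_fraction):
    """Ten-year probability of loss for private-land native vegetation, per domain."""
    total_area = sum(domain_area_ha.values())
    baseline = EXPECTED_LOSS_HA / total_area  # equal probability for every pixel
    # Area-weighted mean of historic clearing, used to keep the expected loss fixed.
    mean_historic = sum(
        domain_area_ha[d] * historic_cleared_fraction[d] for d in domain_area_ha
    ) / total_area
    return {
        d: baseline * historic_cleared_fraction[d] / mean_historic
        for d in domain_area_ha
    }


# Hypothetical domains: area of private-land native vegetation and historic clearing.
domain_area_ha = {"plains": 300_000, "ranges": 600_000}
historic_cleared_fraction = {"plains": 0.8, "ranges": 0.2}
probabilities = clearing_probabilities(domain_area_ha, historic_cleared_fraction)
expected_loss = sum(domain_area_ha[d] * probabilities[d] for d in domain_area_ha)
```

In this example the baseline of 0.05 becomes 0.10 on the heavily cleared plains and 0.025 in the ranges. A real surface would also need a guard against probabilities above 1 in very small, heavily cleared domains.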
How did we predict the relative probability of weed invasion over the next ten years?
54. We aimed to give every pixel across Victoria a probability of loss of values over the next ten years. Loss
includes gross degradation due to weed invasion. However, data is limited and the following paragraphs
explain how we estimated the probabilities.
55. We identified transformer weed species in Victoria. ‘Transformer species’ are a subset of invasive plants that
have the capacity to change the character, condition, form or nature of ecosystems over substantial areas
relative to the extent of that ecosystem.
56. We used DSE vegetation quadrat data to identify places where transformer species have been detected.
57. We used the MAXENT modelling process (www.cs.princeton.edu/~schapire/maxent/) to build a presence-only
distribution model of transformer species using environmental predictor variables described above along with
variables related to propagule pressure such as distances from edges, land uses, habitation and road types.
58. The resultant map presents as a surface predicting the ‘probability’ that a pixel has a transformer weed
present. This is not the same as losing all the values at the site. Therefore we rescaled this to produce a
relative risk of loss due to weed invasion employing the upper and lower bounds of the clearing probabilities.
59. These two relative risk surfaces were added together to create the ‘retention’ layer for inclusion in the
Zonation analysis.
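Paragraphs 58 and 59 can be sketched as a rescale followed by a sum. The pixel values are hypothetical, and linear min-max rescaling is an assumption; the source says only that the upper and lower bounds of the clearing probabilities were employed.

```python
def rescale(values, new_min, new_max):
    """Linearly rescale values so they span new_min to new_max."""
    old_min, old_max = min(values), max(values)
    if old_max == old_min:
        return [new_min for _ in values]  # flat surface: no contrast to preserve
    scale = (new_max - new_min) / (old_max - old_min)
    return [new_min + (v - old_min) * scale for v in values]


# Hypothetical values for three pixels.
clearing_risk = [0.025, 0.10, 0.05]   # probability of loss from clearing
weed_presence = [0.0, 0.5, 1.0]       # modelled 'probability' a transformer weed is present

# Bring weed presence onto the same range as the clearing probabilities.
weed_risk = rescale(weed_presence, min(clearing_risk), max(clearing_risk))

# The retention layer is the sum of the two relative risk surfaces.
retention = [c + w for c, w in zip(clearing_risk, weed_risk)]
```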
Appendix
Spatially explicit environmental predictors used to model the distribution of environmental domains
(see paragraph 8).
We have only used environmental predictors that are mapped comprehensively across the whole of Victoria.
Potential predictor variables include gridded climate data derived from the ANUCLIM model, an independent data
set of radiometric data that is a useful surrogate for soils and regolith, and a range of variables derived from
digital elevation modelling such as terrain wetness and height above water feature.
Note: Ecological Vegetation Classes are not used as environmental predictor variables for generating distribution
models. See NaturePrint: Terrestrial species distribution models for further explanation.
Terrain related variables
o Anisotropic Heating Index
o Annual Insolation
o Visible Sky (or exposure) Index
o Terrain Wetness index
Soil
o Radiometrics (Potassium / Thorium)
o Radiometrics (Inverse Potassium / Thorium)
Satellite data and derived indices
o ModisMean_EVI_Jan2000_June2011
o ModisMean_MIR_Jan2000_June2011
Geographic location
o Location X = easting for study area / log10 (altitude)
o Location Y = northing for study area / log10 (altitude)
Climate
o Evaporation - January
o Evaporation - July
o Mean max daily temp - July
o Mean max daily temp - June
o Rainfall - January
o Rainfall - July
o Raindays - January
o Raindays - July
Water
o Log of horiz. distance from major stream + vertical distance above stream
o Log of horiz. distance from all mapped streams + vertical distance above all mapped streams
o Log of horiz. distance from saline waterbody + vertical distance above waterbody