Chapter 14 - Environmental niche modelling

BCB 341: Principles of
Conservation Biology
APPROACHES TO NICHE-BASED
MODELLING – THEORY AND PRACTICE
Material: Dr Barend Erasmus
LECTURE STRUCTURE
• Why model species ranges?
• What is a niche? – fundamental and realised
• Correlative range modelling – background and assumptions
• Distribution datasets
• Variables and their selection
• Models and their selection
• Model calibration and evaluation
WHY MODEL SPECIES RANGES?
We need to know where species occur and why they occur where they do:
• We want to predict where a particular species occurs.
• We want to know more about organism-environment relationships.
USED IN RESPONSE TO
• increasing rates of habitat and species loss,
• incomplete (spatial and temporal) distribution information for a large number of taxa,
• existing distribution data collected in an ad hoc fashion.
Given the rate of species loss, it is unlikely that we will obtain the distribution data we need in time if we rely on conventional survey techniques.
Atlases are an invaluable data source; although they cover very few taxa, they are very important for model development and calibration.
DISTRIBUTION MODELS HAVE BEEN USED TO PREDICT
• species richness (Jetz & Rahbek 2002),
• centres of endemism (Johnson, Hay & Rogers 1998),
• the occurrence of particular species assemblages (Neave, Norton & Nix 1996),
• the occurrence of individual species (Gibson et al. 2004),
• the location of unknown populations (Raxworthy et al. 2004),
• the location of suitable breeding habitat (Osborne, Alonso & Bryant 2001),
• breeding success (Paradis et al. 2000),
• abundance (Jarvis & Robertson 1999),
• genetic variability of species (Scribner et al. 2001).
THEY HAVE ALSO BEEN USED TO
• help target field surveys (Engler, Guisan & Rechsteiner 2004),
• aid in the design of reserves (Li et al. 1999),
• inform wildlife management outside protected areas (Milsom et al. 2000),
• guide mediatory actions in human–wildlife conflicts (Sitati et al. 2003),
• monitor declining species (Osborne, Alonso & Bryant 2001),
• predict range expansions of recovering species (Corsi, Dupre & Boitani 1999),
• estimate the likelihood of species’ long-term persistence in areas considered for protection (Cabeza et al. 2004),
• identify locations suitable for introduction (Debeljak et al. 2001),
• identify locations suitable for reintroduction (Glenz et al. 2001),
• identify sites vulnerable to local extinction (Gates & Donald 2000),
• identify sites vulnerable to species invasion (Kriticos et al. 2003),
• explore the potential consequences of climate change (Erasmus et al. 2002).
PRINCIPLES: FUNDAMENTAL NICHE
Definition: n-dimensional hypervolume
described by n environmental and
resource constraints within which a
species can maintain a viable
population.
The combination of conditions and
resources required by an individual
species defines the area in which it is
able to live.
(from Begon, Harper & Townsend 1990)
PRINCIPLES: REALISED NICHE
• The fundamental niche is never completely occupied, owing to competitive interactions.
• The niche space actually occupied while maintaining a viable population is a subset of the fundamental niche: the realised niche.

PRINCIPLES: RANGE EDGES
What determines the edge of geographic ranges?
• Local population dynamics change at the edge of a distribution, where population losses exceed gains.
These population-level changes are brought about by:
• changes in abiotic factors (physical barriers, climatic factors, absence of essential resources) and biotic factors (impact of competitors, predators or parasites);
• genetic mechanisms that prevent species from becoming more widespread.
Abiotic and biotic factors are limiting only because a species has not evolved the morphological, physiological or ecological means to overcome them.
PRINCIPLES: RESPONSE CURVES
A response curve plots species presence against variation in some environmental variable.
• Most models assume a Gaussian response, but in fact responses are seldom Gaussian and may take on a variety of shapes. Especially in complex communities, response curves may be truncated because of biotic interactions.
• The ability of the chosen model to represent this response curve is critical to model performance.
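To make the contrast concrete, here is a minimal Python sketch of a symmetric Gaussian response curve and a truncated version of it. The parameter names (`optimum`, `tolerance`, `upper_limit`) and the abrupt cut-off standing in for competitive exclusion are illustrative assumptions, not taken from any particular modelling package.

```python
import math


def gaussian_response(x, optimum, tolerance, max_prob=1.0):
    """Probability of presence under a symmetric (Gaussian) response curve."""
    return max_prob * math.exp(-((x - optimum) ** 2) / (2.0 * tolerance ** 2))


def truncated_response(x, optimum, tolerance, upper_limit, max_prob=1.0):
    """Gaussian curve cut to zero above upper_limit.

    Hypothetical stand-in for a competitor that excludes the species from the
    upper part of the gradient; real truncation is rarely this abrupt.
    """
    if x > upper_limit:
        return 0.0
    return gaussian_response(x, optimum, tolerance, max_prob)


# Hypothetical species: optimum 20 °C, tolerance 4 °C, excluded above 24 °C
for temp in range(10, 31, 5):
    print(temp,
          round(gaussian_response(temp, 20, 4), 3),
          round(truncated_response(temp, 20, 4, 24), 3))
```

A model that can only fit the symmetric curve will overpredict presence on the truncated side of the gradient, which is why the shape a model can represent matters.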

RESPONSE CURVE ESTIMATION BY DIFFERENT MODELS
Source: Guisan and Zimmermann, 2000
SPECIFICS: NICHE-BASED MODELLING
Workflow:
• Species distribution data and environmental variables are used for model calibration.
• Is an independent evaluation dataset available?
  - Yes: evaluate with the independent evaluation dataset.
  - No: use a 70/30% random calibration/evaluation sample.
• Model evaluation
• The final model is used to project current and future distributions.
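When no independent evaluation dataset exists, the 70/30 random split in the workflow above can be sketched in Python as follows. The function name and the `seed` parameter are hypothetical; the sketch shuffles a copy of the records and keeps 70% for calibration and 30% for evaluation.

```python
import random


def split_calibration_evaluation(records, calibration_fraction=0.7, seed=None):
    """Randomly split records into calibration and evaluation subsets.

    Intended only for the case where no independent evaluation dataset exists.
    """
    if not 0.0 < calibration_fraction < 1.0:
        raise ValueError("calibration_fraction must be between 0 and 1")
    rng = random.Random(seed)
    shuffled = list(records)  # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    n_cal = round(len(shuffled) * calibration_fraction)
    return shuffled[:n_cal], shuffled[n_cal:]


# Usage: 100 hypothetical (site_id, presence) records
sites = [(i, i % 3 == 0) for i in range(100)]
calibration, evaluation = split_calibration_evaluation(sites, seed=42)
print(len(calibration), len(evaluation))
```

Fixing the seed makes the split reproducible, which helps when comparing several models on the same evaluation subset.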
NICHE-BASED MODELLING – ASSUMPTIONS
Assumptions:
• Environmental factors drive species distribution.
• Species are in equilibrium with their environment.
• Limiting variables: are they really limiting?
• Coincidence with climate or climate shift
• Evidence for species dying or not reproducing due to climate
• Collinearity of variables
• Assumption of assembly rules: niche assembly vs dispersal assembly
• Static vs dynamic approaches: data snapshot or time-series response?
CAUTIONARY NOTE ON MODELLING IN
GENERAL
Risk of all models: GIGO (garbage in, garbage out)
• We need to understand the assumptions, both explicit and implicit.
• Models are an abstraction of reality, meant to improve our understanding of core processes.

SPECIFICS: VARIABLE SELECTION
Direct variables
• Definition: variables with a biological relationship with the study species
• Examples: climate, nesting sites, soil nutrients (plants), interacting species, site isolation
• Strengths: model structure is easily interpreted in biologically meaningful terms; a direct biological relationship should generalize better to new areas and be more effective for climate change modelling than indirect predictors; provides more information for conservation management
• Weaknesses: variables require greater effort to record; data sets may need to be estimated for large spatial extents (using indirect variables), reducing overall accuracy
Indirect variables
• Definition: variables that correlate with the study species because they correlate with a series of intermediate direct factors, rather than through a direct relationship
• Examples: elevation, soil, topography, geology, soil nutrients (animals)
• Strengths: data sets widely available in GIS; low cost and ease of collection; can be effective predictors, e.g. elevation in mountainous areas; encompass a range of correlated variables, so should result in parsimonious models (recording fewer variables) if variable selection is applied
• Weaknesses: correlations with direct variables tend to be location specific; limited interpretation, because biological meaning is inferred, resulting in increased uncertainty
EXAMPLE OF HOW DIRECT/INDIRECT VARIABLES
MAY AFFECT A PLANT SPECIES
VARIABLES AND THEIR SELECTION
• Species select their habitats only in the broadest sense (Heglund 2002); distribution patterns are the cumulative result of a large number of fine-scale decisions made to maximize resource acquisition.
• The more accurately these fine-scale resources can be approximated, and access to them quantified, the better a model should perform, all else being equal.
• Predictions at broad scales can use broader environmental variables, often associated with the fundamental niche.
• Finer-scale predictions need to concern themselves more with the variables that determine the realized niche.
(Pearson & Dawson 2003)

[Figure (from Van Horne 2002): variable selection determines the generality vs specificity of modelled output. Labels: process (i.e. habitat selection, reproduction); pattern (e.g. habitat occupancy); theoretical models; specific models; general empirical models.]
ENVIRONMENTAL VARIABLES
• MAP (mean annual precipitation), Psummer, Pwinter
• MAT (mean annual temperature), Tmin, Tmax, Tmin06
• Soil (pH, texture, organic C, fertility)
• Avoid indirect measures of a variable (e.g. slope, aspect, altitude); these are a challenge to project into the future.
• Difficult variables: solar radiation, wind

DERIVED VARIABLES
• Growing degree days (e.g. base 5°C)
• PET (potential evapotranspiration): Thornthwaite, Priestley-Taylor, Linacre
• Water balance: crudely defined as MAP – PET
• Favourable soil moisture days: modelled using e.g. ACRU, WATBUG
• Palmer Drought Severity Index (PDSI) program
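Two of these derived variables are simple enough to sketch directly. The Python below assumes daily minimum and maximum temperatures for growing degree days (averaging method, base 5°C), and uses the basic Thornthwaite formula to compute monthly PET and the crude water balance MAP – PET. It omits the day-length correction and the separate treatment of very hot months. Function names and the example site's temperatures are hypothetical.

```python
def growing_degree_days(tmin, tmax, base=5.0):
    """Growing degree days: sum over days of max(0, daily mean temperature - base).

    Daily mean is taken as (Tmin + Tmax) / 2, one of several GDD conventions.
    """
    return sum(max(0.0, (lo + hi) / 2.0 - base) for lo, hi in zip(tmin, tmax))


def thornthwaite_pet(monthly_mean_temp):
    """Unadjusted Thornthwaite PET (mm/month) from 12 monthly mean temperatures (°C).

    Assumes 12-hour days and 30-day months (no day-length correction), and does
    not include the separate formula often used for months above about 26.5 °C.
    """
    # Annual heat index I: only months above 0 °C contribute
    heat_index = sum((t / 5.0) ** 1.514 for t in monthly_mean_temp if t > 0)
    if heat_index == 0:
        return [0.0] * len(monthly_mean_temp)
    # Empirical exponent a, a cubic function of I
    a = (6.75e-7 * heat_index ** 3 - 7.71e-5 * heat_index ** 2
         + 1.792e-2 * heat_index + 0.49239)
    return [16.0 * (10.0 * t / heat_index) ** a if t > 0 else 0.0
            for t in monthly_mean_temp]


def water_balance(map_mm, monthly_mean_temp):
    """Crude annual water balance: MAP minus the sum of monthly PET."""
    return map_mm - sum(thornthwaite_pet(monthly_mean_temp))


# Hypothetical site: January-December mean temperatures (°C) and MAP (mm)
temps = [22, 22, 20, 17, 14, 11, 11, 13, 16, 18, 20, 21]
print(round(sum(thornthwaite_pet(temps))), round(water_balance(700, temps)))
print(growing_degree_days([5, 0, 10], [15, 6, 30]))
```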
RECOMMENDATIONS FOR VARIABLE SELECTION
(each recommendation is followed by its potential advantages)
1. Use variables that show a direct relationship with the organism.
   Potential advantages: improved predictive ability, especially over large geographical extents or when predicting responses to environmental change.
2. Consider interacting species.
   Potential advantages: improved predictive ability, greater biological validity (modelling of the realized niche), greater explanatory power and ease of interpretation.
3. Identify the complete geographical region of interest prior to sampling (Thuiller et al 2004).
   Potential advantages: improved predictive ability with new data, because the model does not need to extrapolate beyond the conditions under which it was constructed; explanatory conclusions are more widely applicable.
4. Use environmental stratification, with equal samples between strata.
   Potential advantages: improved predictive ability, more accurate explanatory analysis.
5. Take a multiscale approach to sampling.
   Potential advantages: improved predictive ability, greater explanatory understanding, more relevant to conservation planning.
6. Aim to sample at least 10 sites for every environmental variable considered.
   Potential advantages: more reliable model development and explanatory analysis, improved predictive ability.
7. Aim to model spatial autocorrelation where present; when designing the sampling scheme, test that statistical power is adequate for autocorrelation analyses (Keitt et al 2002, Dungan et al 2002; for more background see Legendre 2002, Perry et al 2002).
   Potential advantages: facilitated detection, characterization and subsequent modelling of autocorrelation; improved understanding of the mechanisms generating the distribution pattern; greater predictive accuracy.
8. Collect independent evaluation data, using environmental stratification in the process.
   Potential advantages: essential for testing models; increases the scientific rigour of observational analyses; gives an idea of model generality and predictive ability.
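Recommendation 7 asks whether spatial autocorrelation is present. One widely used statistic for this is global Moran's I; the Python sketch below computes it with inverse-distance weights, which is an assumption (contiguity or distance-band weights are common alternatives). Positive values indicate that nearby sites have similar values (clustering); negative values indicate that neighbours tend to differ. It is usually applied to model residuals, and significance would normally be assessed (e.g. by permutation), which this sketch omits.

```python
import math


def morans_i(values, coords):
    """Global Moran's I with inverse-distance weights (w_ij = 1/d_ij, w_ii = 0).

    values: e.g. model residuals per site; coords: (x, y) per site, all distinct.
    """
    n = len(values)
    if n < 2 or n != len(coords):
        raise ValueError("need at least two sites, each with one coordinate pair")
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denominator = sum(d * d for d in dev)
    if denominator == 0:
        raise ValueError("values have zero variance")
    numerator = 0.0
    weight_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dist = math.dist(coords[i], coords[j])
            if dist == 0:
                raise ValueError("duplicate coordinates")
            w = 1.0 / dist
            numerator += w * dev[i] * dev[j]
            weight_sum += w
    return (n / weight_sum) * (numerator / denominator)


# Ten hypothetical sites along a transect
line = [(float(x), 0.0) for x in range(10)]
print(morans_i([1, 1, 1, 1, 1, 0, 0, 0, 0, 0], line))  # clustered pattern
print(morans_i([1, 0, 1, 0, 1, 0, 1, 0, 1, 0], line))  # alternating pattern
```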
SPECIES DISTRIBUTION DATASETS
• Museum/herbarium data, e.g. Precis (Sabonet)
• Survey atlas data, e.g. Protea Atlas
• Expert atlas, e.g. Birds of Africa
• Field data, e.g. Ackdat or TSP databases
• Presence/absence data
• Georeference accuracy, e.g. GPS / QDS (quarter-degree square)
• Taxonomy affects species numbers
• Taxonomic updates of older museum data
Data sources and their typical scales
Data source: locality type
• Museum specimens: presence
• Herbaria specimens: presence
• Expert atlas: presence/absence
• Survey atlas: presence/absence
• Fieldwork: presence/absence
Typical scales, finest to coarsest: 1–1000 m, 1–5 km, 1–15 minutes, 0.25–1 degree, 1–5 degrees
SPECIES DISTRIBUTION DATASETS…2
Using existing data
• Ad hoc museum data: presence only (Brotons et al 2004)
• Atlases: may be presence/absence. Scaling down atlas data is not a good idea without due caution and model validation (Araujo et al 2005).
• Flagship/indicator species: the choice depends on the objective of the model (ecosystem function vs biodiversity vs change detection).
• The adaptation response depends on the selected flagship species, e.g. Proteas in the CFR (Cape Floristic Region).
SPECIES DISTRIBUTION DATASETS…3
Collecting new data to model
• Gradsect sampling: maximizing samples across gradients (Wessels et al 1998)
• Focussed vs random (Hirzel & Guisan 2002): ‘regular’ and ‘equal-stratified’ sampling strategies are more accurate and more robust.
Improve sample design:
• (1) increase sample size,
• (2) prefer systematic to random sampling, and
• (3) include environmental information in the design.
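The ‘equal-stratified’ strategy can be sketched as: divide an environmental gradient into strata and draw the same number of candidate sites from each. In this Python sketch the strata are equal-width intervals of a single variable (an assumption; strata could equally combine several variables), and the function and field names are hypothetical.

```python
import random


def equal_stratified_sample(sites, key, n_strata, per_stratum, seed=None):
    """Draw up to per_stratum sites from each of n_strata equal-width strata of key(site)."""
    rng = random.Random(seed)
    values = [key(s) for s in sites]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_strata or 1.0  # guard against a gradient with no range
    strata = [[] for _ in range(n_strata)]
    for site, v in zip(sites, values):
        # The maximum value would fall one past the last stratum, so clamp it
        idx = min(int((v - lo) / width), n_strata - 1)
        strata[idx].append(site)
    sample = []
    for members in strata:
        # A sparse stratum contributes all the sites it has
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample


# Hypothetical candidate sites along an altitude gradient of 0-999 m
sites = [{"id": i, "altitude": i} for i in range(1000)]
chosen = equal_stratified_sample(sites, key=lambda s: s["altitude"],
                                 n_strata=4, per_stratum=5, seed=7)
print(sorted(s["altitude"] for s in chosen))
```

Unlike a simple random draw, every part of the gradient is guaranteed representation, even when most candidate sites are concentrated in one part of it.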
HOW DO WE CHOOSE A MODEL TYPE?
DIFFERENT TYPES OF MODELS
• Bioclimatic envelope, e.g. Bioclim
• Ordinary regression, e.g. included in Arc-SDM
• Generalised additive models (GAM), e.g. GRASP
• Generalised linear models (GLM), e.g. included in Biomod
• Ordination (e.g. CCA), e.g. ENFA
• Classification and regression trees (CART), e.g. included in Biomod
• Genetic algorithm, e.g. GARP
• Artificial neural networks, e.g. SPECIES
• Bayesian, e.g. WinBUGS

PRINCIPLES
• What question do you want to answer?
• Data considerations:
  - What environmental data do you have access to?
  - What is the resolution and extent of these data?
  - Are the data categorical or continuous?
• Scale considerations (Thuiller et al 2003: GAMs perform more consistently across scales because of their ability to model complex response curves)
• Different variables are important at different scales (Pearson & Dawson 2003).
• A good example of an informed modelling solution: Gibson et al 2004
• Different models compared: summaries of such studies in Segurado & Araujo 2005, Thuiller et al 2003.
VARIOUS DECISION TREES FROM THE
LITERATURE
(Guisan and Zimmermann, 2000)
DECISION TREES FROM THE LITERATURE (2)
Type of model: potential application
1. Empirical behaviour of species presence/absence in relation to environmental variables prioritized (e.g. nonparametric models such as GAM, classification trees and neural networks)
   Potential application: complex distribution patterns, i.e. where occurrences do not respond to environmental variables according to a predefined ‘shape’, e.g. widespread species.
2. Focus on the general trend of the presence/absence response (e.g. parametric models such as GLM)
   Potential application: expected to provide reasonable models for species responding to environmental gradients as predicted by simple response curves.
3. Use presence-only data to seek relationships with environmental predictors (DOMAIN and ENFA)
   Potential application: expected to provide models with high sensitivity (low misclassification of true presences) but low overall performance, because they ignore the response of absence data to environmental variables. Useful if no reliable species absence data are available.
4. Use presence-only data and their geographic positions to develop predictions (spatial interpolators)
   Potential application: complex distribution patterns, i.e. where occurrences do not respond to environmental variables according to a predefined ‘shape’, e.g. widespread species. Expected to provide models with high sensitivity (low misclassification of true presences) but low overall performance, because they ignore the response of absence data to environmental variables.
(Segurado & Araujo 2005)
IN CONCLUSION
• In general, neural networks and GAM (possibly with an autocorrelation coefficient) are the most robust.
• Neural networks are black boxes: biological interpretation is difficult.
• Two options:
  - Choose an expert system (e.g. BIOMOD) that compares models automatically and selects the best one, or choose a model that is generally robust.
  - Choose a method particularly suited to the question asked, e.g. ENFA when only presence data are available.
• However, GAM with pseudo-absences may outperform presence-only techniques (Brotons et al 2004).
MODEL CALIBRATION AND EVALUATION
Once you have decided on a model type, you need a methodology to select the best model from a suite of potential models, each with a different combination of the selected environmental variables.
Stepwise selection of variables: the order in which variables are entered does not matter in GAM, but it does in GLM.
(from Johnson & Omland 2004, Rushton et al 2004).
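Johnson & Omland (2004) discuss information-theoretic model selection. As a minimal sketch of that idea, the Python below fits a logistic regression (a binomial GLM with logit link) to every subset of candidate variables and ranks the subsets by AIC = 2k – 2 ln L, where k is the number of estimated coefficients. This is an all-subsets comparison swapped in for the stepwise procedure on the slide; the Newton-Raphson fitter is written out only to keep the example self-contained, and the data are simulated.

```python
import itertools
import math
import random


def _solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(X, y, iterations=25):
    """Fit ln[p/(1-p)] = a + sum(B_j X_j) by Newton-Raphson; return (coefs, log-likelihood)."""
    rows = [[1.0] + list(x) for x in X]  # column of 1s for the intercept a
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]  # Fisher information X'WX
        for r, yi in zip(rows, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, r))))
            for i in range(k):
                grad[i] += (yi - p) * r[i]
                for j in range(k):
                    info[i][j] += p * (1.0 - p) * r[i] * r[j]
        beta = [b + s for b, s in zip(beta, _solve(info, grad))]
    loglik = 0.0
    for r, yi in zip(rows, y):
        p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, r))))
        loglik += math.log(p) if yi else math.log(1.0 - p)
    return beta, loglik


def aic_table(candidates, y):
    """Rank every non-empty subset of candidate variables by AIC = 2k - 2 ln L."""
    results = []
    for size in range(1, len(candidates) + 1):
        for subset in itertools.combinations(candidates, size):
            X = list(zip(*(candidates[name] for name in subset)))
            beta, loglik = fit_logistic(X, y)
            results.append((2 * len(beta) - 2 * loglik, subset))
    return sorted(results)


# Simulated data: presence depends on temperature only; "noise" is irrelevant
rng = random.Random(1)
temp = [rng.uniform(-2, 2) for _ in range(300)]
noise = [rng.uniform(-2, 2) for _ in range(300)]
y = [1 if rng.random() < 1 / (1 + math.exp(-(0.5 + 2.0 * t))) else 0 for t in temp]
for aic, subset in aic_table({"temp": temp, "noise": noise}, y):
    print(round(aic, 1), subset)
```

With many candidate variables the number of subsets grows quickly, which is one reason stepwise procedures are used in practice.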
MODELS AND THEIR SELECTION - BIOCLIMATIC ENVELOPE
[Figure: species distribution overlaid on environmental variables; frequency of occurrences plotted across value classes of each variable]
IF
Tann =[23,29] °C AND Tmin06=[5,12] °C AND
Rann=[609,1420] AND Soils=[1,4,5,8]
THEN SP=PRESENT
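The rule above defines a box in environmental space. A minimal Python sketch, assuming Rann is in mm and soils are categorical class codes: `fit_envelope` derives a min-max range per variable from presence records (BIOCLIM implementations often trim these ranges to percentiles instead), and `in_envelope` applies the IF...THEN test. Both function names are hypothetical.

```python
def fit_envelope(presence_records, variables):
    """Min-max range of each continuous variable over the presence records."""
    return {v: (min(r[v] for r in presence_records),
                max(r[v] for r in presence_records))
            for v in variables}


def in_envelope(site, envelope, soil_classes=None):
    """True if every variable lies inside its range and, if given, the soil class is allowed."""
    inside = all(lo <= site[v] <= hi for v, (lo, hi) in envelope.items())
    if soil_classes is not None:
        inside = inside and site["Soils"] in soil_classes
    return inside


# The example rule from the slide, written as an envelope
slide_envelope = {"Tann": (23, 29), "Tmin06": (5, 12), "Rann": (609, 1420)}
slide_soils = {1, 4, 5, 8}
site = {"Tann": 25.0, "Tmin06": 8.0, "Rann": 800.0, "Soils": 4}
print("PRESENT" if in_envelope(site, slide_envelope, slide_soils) else "ABSENT")
```

Because every variable is treated independently, the envelope cannot represent interactions between variables or graded suitability within the box.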
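MODELS AND THEIR SELECTION - GAM MODELING
For linear regression there is a dependent variable Y and predictor variables X_1 … X_m such that

$$Y = \alpha + \sum_{j=1}^{m} \beta_j X_j + \varepsilon$$

Additive models replace each linear term $\beta_j X_j$ with a smoothed non-linear function $f_j$:

$$Y = \alpha + \sum_{j=1}^{m} f_j(X_j) + \varepsilon$$

Owing to the binomial (presence/absence) nature of the dependent variable, we need to use the binomial family with a logit link (a non-linear transformation), where p is the probability of presence:

$$\ln\left[\frac{p}{1-p}\right] = \alpha + \sum_{j=1}^{m} \beta_j X_j$$

For a GAM, the linear terms $\beta_j X_j$ on the right-hand side are again replaced by the smooth functions $f_j(X_j)$.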
HOW GOOD ARE THE PREDICTIONS?
(Fielding & Bell 1997, Guisan and Zimmermann, 2000)
• Output data = probability values
• Observed data = presence/absence data
How to compare?
• A probability threshold is needed to derive a misclassification matrix (MM):

                 Actual +               Actual -
Predicted +      True positive (a)      False positive (b)
Predicted -      False negative (c)     True negative (d)
KAPPA STATISTIC
• Based on the MM
• Takes chance agreement into account
• Kappa is estimated for a range of thresholds and the best is kept
Ke = [(TN + FN) × (TN + FP) + (FP + TP) × (FN + TP)] / n²
Ko = (TN + TP) / n
K = (Ko – Ke) / (1 – Ke)
• K is at most 1, and values at or below 0 indicate agreement no better than chance; >0.7 good, 0.4–0.7 fair, <0.4 poor
(Thuiller 2004, pers comm.)
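The threshold search described above can be sketched in Python using the Ko and Ke definitions given on this slide. The probabilities, observations and candidate thresholds are made-up illustration data, and the function names are hypothetical.

```python
def confusion_matrix(probabilities, observed, threshold):
    """Counts (tp, fp, fn, tn) after converting probabilities to presence at threshold."""
    tp = fp = fn = tn = 0
    for p, present in zip(probabilities, observed):
        predicted = p >= threshold
        if predicted and present:
            tp += 1
        elif predicted:
            fp += 1
        elif present:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def cohen_kappa(tp, fp, fn, tn):
    """Kappa from observed (Ko) and chance-expected (Ke) agreement, as on the slide."""
    n = tp + fp + fn + tn
    ko = (tn + tp) / n
    ke = ((tn + fn) * (tn + fp) + (fp + tp) * (fn + tp)) / n ** 2
    if ke == 1:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (ko - ke) / (1 - ke)


def best_kappa(probabilities, observed, thresholds):
    """(kappa, threshold) pair with the highest kappa over the candidate thresholds."""
    return max((cohen_kappa(*confusion_matrix(probabilities, observed, t)), t)
               for t in thresholds)


probs = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]  # hypothetical model output
obs = [1, 1, 0, 1, 0, 0]                 # hypothetical presence/absence
print(best_kappa(probs, obs, [0.2, 0.5, 0.65]))
```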
RECEIVER OPERATING CHARACTERISTIC ANALYSIS (ROC)
• Sensitivity = TP/(FN + TP) (true positive fraction)
• Specificity = TN/(FP + TN) (true negative fraction)
• Plot sensitivity against 1 – specificity for a range of thresholds
• Calculate the area under the curve (AUC): >0.8 good, 0.6–0.8 fair, <0.6 poor; 0.5 is no better than random
[Figure: example ROC curve; y-axis sensitivity, x-axis 1 - specificity, both from 0 to 1]
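A minimal Python sketch of ROC analysis: for each candidate threshold it computes sensitivity and 1 – specificity, then integrates the resulting curve with the trapezoidal rule. Using every distinct predicted probability as a threshold (plus one above the maximum) is an assumption of this sketch, and the data are made up. The AUC obtained this way equals the probability that a randomly chosen presence receives a higher predicted probability than a randomly chosen absence.

```python
def sensitivity_specificity(probabilities, observed, threshold):
    """Sensitivity TP/(FN+TP) and specificity TN/(FP+TN) at one threshold."""
    tp = fn = tn = fp = 0
    for p, present in zip(probabilities, observed):
        if present:
            if p >= threshold:
                tp += 1
            else:
                fn += 1
        elif p >= threshold:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one presence and one absence")
    return tp / (fn + tp), tn / (fp + tn)


def roc_curve(probabilities, observed):
    """(1 - specificity, sensitivity) points, using each distinct probability as a threshold."""
    thresholds = sorted(set(probabilities)) + [float("inf")]  # inf: everything predicted absent
    points = []
    for t in thresholds:
        sens, spec = sensitivity_specificity(probabilities, observed, t)
        points.append((1.0 - spec, sens))
    return sorted(points)


def auc_trapezoid(points):
    """Area under a sorted ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


probs = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]  # hypothetical model output
obs = [1, 1, 0, 1, 0, 0]                 # hypothetical presence/absence
print(round(auc_trapezoid(roc_curve(probs, obs)), 3))
```

Unlike kappa, AUC summarizes performance across all thresholds, so no single threshold has to be chosen.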
HOW GOOD ARE THE PREDICTIONS?
• Testing and training data sets (30:70)
• Comparison across models, or across variables with the same model
• Number of explanatory variables
• Model development and improvement is an iterative process
• Delineating the predictive ability of predictor variables (Lobo et al 2002)
• Evaluating model output against historical data (Hilbert et al 2004)
• Use of modelled data in conservation planning (Hannah et al; Cabeza et al 2004; Loiselle et al 2003)