Ordination

Adapted from Ecological Statistical Workshop,
FLC, Daniel Laughlin
Distance Measures and Ordination
Goals of Ordination
• To arrange items along an axis or multiple
axes in a logical order
• To extract a few major gradients that
explain much of the variability in the total
dataset
• Most importantly: to interpret the gradients, because they were generated by important ecological processes
http://ordination.okstate.edu/
What makes ordination possible?
• Variables (species) are “correlated” (in a
broad sense)
• Correlated variables = redundancy
• Ordination thrives on the complex network
of inter-correlations among species
Ordination helps to:
• Describe the strongest patterns of
community composition
• Separate strong patterns from weak ones
• Reveal unforeseen patterns and suggest
unforeseen processes
“Direct” gradient analysis
• Order plots along measured environmental
gradients
• e.g., regress diatom abundance on salinity
“Indirect” gradient analysis
• Order plots according to
– covariation among species, or
– dissimilarity among sample units
• Following this step, we can then examine
correlations between environment and
ordination axes
• Axes = Gradients
• In PCA, these are called “Principal Components”
Data reduction
• Goal: to reduce the dimensionality of community
datasets
– (i.e., from 100 species down to 2 or 3 main gradients)
n × p (plots × species) → n × d (plots × axes)
These d dimensions
represent the strongest
correlation structure in
the data
This is possible because of redundancy in the data (i.e., species are “correlated”)
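A minimal sketch of the n × p → n × d reduction, assuming PCA-style eigenanalysis of the covariance matrix. The toy plots × species table is invented: species 2 is exactly twice species 1 and species 3 is species 1 plus one, so the data are fully redundant and a single axis (d = 1) captures all of the variance. Power iteration stands in for a full eigendecomposition only to keep the example dependency-free; real PCA software would use a complete eigendecomposition or SVD.

```python
import math

def covariance_matrix(data):
    """Sample covariance matrix of an n x p table (rows = plots, columns = species)."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    return cov, centered

def leading_eigenpair(matrix, iterations=200):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix (power iteration)."""
    p = len(matrix)
    vec = [1.0] * p
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in product))
        vec = [v / norm for v in product]
    # Rayleigh quotient: eigenvalue associated with the converged unit vector
    mv = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
    return sum(vec[i] * mv[i] for i in range(p)), vec

# Hypothetical 5 plots x 3 species table with perfectly correlated species
plots = [
    [1.0, 2.0, 2.0],
    [2.0, 4.0, 3.0],
    [3.0, 6.0, 4.0],
    [4.0, 8.0, 5.0],
    [5.0, 10.0, 6.0],
]

cov, centered = covariance_matrix(plots)
total_variance = sum(cov[i][i] for i in range(len(cov)))
eigenvalue, axis = leading_eigenpair(cov)
# Project each plot onto the axis: the n x p table becomes an n x 1 table of scores
scores = [sum(r[j] * axis[j] for j in range(len(axis))) for r in centered]
print(round(eigenvalue / total_variance, 6))  # 1.0: one axis holds all the variance
```

With real community data the species are only partly correlated, so the first few axes share the variance instead of one axis taking all of it.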
Ordination Diagrams
Know two things:
1) What the points
represent (plots or
species?)
2) Distance in the diagram
is proportional to
compositional
dissimilarity
NMS Ordination
[Figure: NMS ordination diagram; Axis 1: “Abiotic”, Axis 2: “Biotic”; labelled variables: Litter, Pine, Sand, pH, Elev, OM, Grazing, Clay, Nitrogen]
Do not seek patterns as you would with a regression: the axes are orthogonal (uncorrelated)
How many axes?
• “How many discrete signals can be
detected against a background of noise?”
• Typically we expect 2 or 3 gradients to be
sufficient, but if we know that 5
independent environmental gradients are
structuring the vegetation (water, light,
CO2, nutrients, grazers, etc.), then
perhaps 5 axes are justified
Two basic techniques
• Eigenanalysis methods: use information from the variance-covariance matrix or correlation matrix
(e.g., PCA)
– Appropriate for linear models since covariance is a
measure of a linear association
• Distance-based methods: use information from a distance matrix (e.g., NMS)
– Appropriate for nonlinear models since some distance
measures and ordination techniques can “linearize”
nonlinear associations
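Why covariance suits linear but not nonlinear associations: a sketch with invented responses along a symmetric gradient. A species that rises linearly has a correlation of 1 with the gradient, while a species with a unimodal (bell-shaped) response, which depends on the gradient just as strongly, has a correlation of essentially 0.

```python
import math

def pearson(x, y):
    """Pearson correlation: the covariance rescaled by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

gradient = [float(g) for g in range(-5, 6)]  # hypothetical gradient, symmetric about 0
linear_species = [2.0 * g + 10.0 for g in gradient]
unimodal_species = [math.exp(-(g ** 2) / 4.0) for g in gradient]  # peaks mid-gradient

print(round(pearson(gradient, linear_species), 3))  # 1.0
print(abs(round(pearson(gradient, unimodal_species), 3)))  # 0.0: no linear association
```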
A summary table of ordination methods
Ecological Distance Measures
Distance measures
• Distance = Difference = Dissimilarity
• Distance matrix is like a triangular mileage
chart on maps (symmetric)
• We are interested in the distances
between sample units (plots) in species
space
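A small sketch of the mileage-chart idea, using three hypothetical plots described by the abundances of two species and Euclidean distance (Python's `math.dist`). The full matrix is symmetric with a zero diagonal, so printing only the lower triangle loses nothing.

```python
import math

# Hypothetical plots x species abundances (3 plots, 2 species)
plots = {"A": (0.0, 0.0), "B": (3.0, 4.0), "C": (6.0, 8.0)}

def distance_matrix(samples, dist=math.dist):
    """Full square matrix of pairwise distances between sample units."""
    names = list(samples)
    return names, [[dist(samples[a], samples[b]) for b in names] for a in names]

names, d = distance_matrix(plots)
# Like a mileage chart, the lower triangle is enough: d[i][j] == d[j][i], d[i][i] == 0
for i, a in enumerate(names):
    print(a, [round(d[i][j], 2) for j in range(i)])
# A []
# B [5.0]
# C [10.0, 5.0]
```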
Distance measures
• In univariate species space (one species),
the distance between two points is their
difference in abundances
• We will examine two kinds of distance
measures:
– Euclidean distance, and
– Bray-Curtis (Sørensen) distance
Domains and Ranges
Distance     Domain of x    Range of d = f(x)
Euclidean    all            non-negative (d ≥ 0)
Sørensen     x ≥ 0          0 ≤ d ≤ 1 (0 ≤ d ≤ 100 as a percentage)
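The two measures, sketched as plain Python functions. Bray-Curtis is written here in its quantitative form, Σ|x − y| / Σ(x + y), which reduces to Sørensen distance for presence/absence data. The checks mirror the domains in the table: Bray-Curtis needs non-negative abundances, and this sketch also rejects two empty sample units, where the ratio is undefined. Multiply by 100 for the percentage scale.

```python
import math

def euclidean(x, y):
    """Euclidean distance: defined for any real values; d >= 0 with no upper bound."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def bray_curtis(x, y):
    """Bray-Curtis (Sørensen) distance: requires non-negative values; 0 <= d <= 1."""
    if any(v < 0 for v in (*x, *y)):
        raise ValueError("Bray-Curtis requires non-negative abundances")
    total = sum(x) + sum(y)
    if total == 0:
        raise ValueError("undefined when both sample units are empty")
    return sum(abs(a - b) for a, b in zip(x, y)) / total

same = ([2, 5, 0], [2, 5, 0])      # identical plots
disjoint = ([2, 5, 0], [0, 0, 9])  # no species in common
print(bray_curtis(*same), bray_curtis(*disjoint))  # 0.0 1.0
print(euclidean(*disjoint))  # sqrt(110), about 10.49; grows without bound with abundance
```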
Which one works best?
“If species respond noiselessly to
environmental gradients, then we seek a
perfect linear relationship between
distances in species space and distances
in environmental space. Any departure
from that represents a partial failure of our
distance measure.”
McCune p. 51
Easy dataset (low beta diversity): Figure 6.6
Difficult dataset (high beta diversity)
Intuitive property: Figure 6.7
NMS is able to linearize the relationship between distance in
species space and environmental distance because it is
based on ranked distances (stay tuned)
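A small illustration of why ranks help, with invented numbers: compositional distance often saturates as environmental distance grows (once two plots share no species, it cannot keep rising). Pearson's r on the raw values is well below 1, but on ranks the relationship is perfectly linear. This only illustrates the rank idea; it is not an NMS run.

```python
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks 1..n (the toy data below contain no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, i in enumerate(order, start=1):
        result[i] = position
    return result

env_distance = [0.5 * k for k in range(1, 11)]
species_distance = [1.0 - math.exp(-1.5 * e) for e in env_distance]  # saturating curve

raw_r = pearson(env_distance, species_distance)
rank_r = pearson(ranks(env_distance), ranks(species_distance))
print(round(raw_r, 2), rank_r)  # raw r is noticeably below 1; rank r is 1.0
```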
Theoretical basis
• Our choice is primarily empirical: we should select measures that have shown superior performance
• One important theoretical basis: Euclidean distance (ED) measures distance through uninhabitable, impossibly species-rich space
• In contrast, city-block distances are measured along the edges of species space, exactly where the sample units lie in the dust bunny distribution!
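A worked example of the problem with Euclidean distance, using three invented plots: plot 3 has exactly the same species as plot 1, only more abundant, while plot 2 shares no species with plot 1. Euclidean distance nevertheless puts plot 1 closer to plot 2. Bray-Curtis, a city-block distance relativized by the total abundance of the two plots, ranks them the ecologically sensible way. (Raw, unrelativized city-block distance would still rank them like Euclidean distance here.)

```python
import math

def bray_curtis(x, y):
    """Sum of absolute differences divided by the total abundance of both plots."""
    return sum(abs(a - b) for a, b in zip(x, y)) / (sum(x) + sum(y))

plot1 = (0, 1, 1)  # species 2 and 3 present
plot2 = (1, 0, 0)  # only species 1: no species in common with plot1
plot3 = (0, 4, 4)  # same species as plot1, four times as abundant

print(math.dist(plot1, plot2), math.dist(plot1, plot3))      # about 1.73 vs 4.24
print(bray_curtis(plot1, plot2), bray_curtis(plot1, plot3))  # 1.0 vs 0.6
```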
Nonmetric Multidimensional Scaling (NMS, NMDS, MDS, NMMDS, etc.)
NMS
• Uses a distance/dissimilarity matrix
• Makes no assumptions regarding linear
relationships among variables
• Arranges plots in a space that best
approximates the distances in a distance
matrix
From a map to a distance matrix
Calculate distances
From a distance matrix to a map
NMS
Question: How well do the distances in the ordination match the
distances in the distance matrix?
Advantages of NMS
• Avoids the assumptions of linear relations
• The use of ranked distances tends to
linearize the relationship between
distances in species space and distances
in environmental space
• You can use any distance measure
Historical disadvantages of NMS
• Failing to find the best solution (low
“stress”) due to local minima
• Slow computation time
These concerns have largely been dealt with
given modern computer power
In a nutshell
• NMS is an iterative search for the best
positions of n entities on k dimensions
(axes) that minimizes the stress of the k-dimensional configuration
• “Stress” is a measure of departure from
monotonicity in the relationship between
the original distance matrix and the
distances in the ordination diagram
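A sketch of stress for one candidate configuration, assuming Kruskal's stress formula 1 with the monotonic fit found by pool-adjacent-violators (isotonic) regression; ties among the original dissimilarities are not handled specially. Each input list holds one value per pair of plots, and the example values are invented. The result is on a 0-1 scale; stress values quoted on a 0-100 scale would be this value × 100, assuming that convention.

```python
import math

def isotonic_fit(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit to `values`."""
    blocks = []  # each block is [sum, count]; its fitted value is sum / count
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while a block's mean exceeds the mean of the block after it
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted

def kruskal_stress(dissimilarities, ordination_distances):
    """Stress formula 1 (0-1 scale): departure of ordination distances from the best
    monotonic (non-decreasing) function of the original dissimilarities."""
    order = sorted(range(len(dissimilarities)), key=lambda k: dissimilarities[k])
    d = [ordination_distances[k] for k in order]
    d_hat = isotonic_fit(d)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(d, d_hat)) / sum(a * a for a in d))

original = [0.2, 0.5, 0.9]   # hypothetical Bray-Curtis dissimilarities, one per pair
monotonic = [1.0, 1.5, 4.0]  # ordination distances in the same rank order
reversed_ = [4.0, 1.5, 1.0]  # ordination distances in the opposite order

print(kruskal_stress(original, monotonic))            # 0.0: perfectly monotonic
print(round(kruskal_stress(original, reversed_), 3))  # high stress
```

Only the rank order of the original dissimilarities enters the fit, which is what makes the method nonmetric.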
Achieving monotonicity
Fig 16.2
The closer the points lie to a
monotonic line, the better the
fit and the lower the stress.
If S* = 0, the relationship is perfectly monotonic
Blue = perfect fit, monotonic
Red = high stress, not monotonic
Instability
• Instability is calculated as the standard deviation of stress over the preceding 10 iterations (sd = √variance)
• Instabilities of 0.0001 (or lower) are generally preferred
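A minimal sketch of the instability check, assuming the sample standard deviation (`statistics.stdev`) and an invented stress sequence that settles toward 10. The loop stops once the standard deviation of the last 10 stress values falls below the 0.0001 criterion.

```python
import statistics

def instability(stress_history, window=10):
    """Standard deviation of stress over the last `window` iterations (None until enough)."""
    if len(stress_history) < window:
        return None
    return statistics.stdev(stress_history[-window:])

criterion = 0.0001
history = []
for iteration in range(1, 201):
    history.append(10.0 + 5.0 * 0.5 ** iteration)  # hypothetical stress, settling at 10
    value = instability(history)
    if value is not None and value < criterion:
        break
print(iteration, value)
```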
Mini Example
Landscape analogy for NMS
Global minimum
Local minimum
(strong, regular, geometric patterns emerge)
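A toy one-dimensional stand-in for the stress landscape (not NMS itself): gradient descent on f(x) = x⁴ − 3x² + x, which has a local minimum near x ≈ 1.13 and a global minimum near x ≈ −1.30. A single start can get trapped in the local basin; several starts with the best result kept, the same idea as multiple real NMS runs, find the global one. The starts are evenly spaced so the example is reproducible.

```python
def f(x):
    return x ** 4 - 3 * x ** 2 + x

def gradient(x):
    return 4 * x ** 3 - 6 * x + 1

def descend(x, step=0.01, iterations=2000):
    """Plain gradient descent: always moves downhill, so it stops in the nearest basin."""
    for _ in range(iterations):
        x -= step * gradient(x)
    return x

starts = [-2.0, -1.0, 0.0, 1.0, 2.0]
ends = [descend(x0) for x0 in starts]
best = min(ends, key=f)

print([round(x, 2) for x in ends])  # runs from 1.0 and 2.0 stop in the local minimum
print(round(best, 2), round(f(best), 2))
```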
Reliability of Ordination
• Low stress and stable solutions
• Proportion of variance represented (R²)
• Monte Carlo tests
Variance represented?
• “Ode to an eigenvalue”
• NMS is not based on partitioning variance, so there is no direct measure of variance represented
• Instead, calculate R² for the relationship between Euclidean distances in the ordination and Bray-Curtis distances in the original distance matrix
Axis   Increment   Cumulative R²
1      0.37        0.37
2      0.20        0.57
3      0.15        0.72
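A sketch of the R² idea, assuming R² is the squared Pearson correlation between the two sets of pairwise distances (one value per pair of plots; the values here are invented). The last lines check that the increments in the table accumulate to the cumulative column.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

# Hypothetical distances for the same six pairs of plots
bray_curtis_distances = [0.10, 0.25, 0.40, 0.55, 0.70, 0.90]
ordination_distances = [0.30, 0.60, 1.10, 1.20, 1.90, 2.20]
r2 = r_squared(ordination_distances, bray_curtis_distances)
print(round(r2, 2))

# Increments from the table accumulate to the cumulative R2 column
increments = [0.37, 0.20, 0.15]
cumulative = [round(sum(increments[:k + 1]), 2) for k in range(len(increments))]
print(cumulative)  # [0.37, 0.57, 0.72]
```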
Monte Carlo test
• Has the final NMS configuration extracted stronger axes
than expected by chance?
• Compare stress obtained using your data with stress
obtained from multiple runs of randomized versions of
your data (randomly shuffled within columns)
• P-value = (1+n)/(1+N)
n = # of random runs with final stress less than or equal
to the observed minimum stress,
N = number of randomized runs
In words, the P-value is (approximately) the proportion of randomized runs with stress less than or equal to the observed stress
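A sketch of the two ingredients of the Monte Carlo test: shuffling each species' column independently (which keeps every species' values but destroys the correlations among species) and the P = (1 + n)/(1 + N) formula. The NMS runs themselves are not shown; the table and the 20 randomized stress values are invented.

```python
import random

def shuffle_within_columns(table, rng):
    """Shuffle each species' column independently: keeps each species' values,
    destroys the correlation structure among species."""
    columns = [list(column) for column in zip(*table)]
    for column in columns:
        rng.shuffle(column)
    return [list(row) for row in zip(*columns)]

def monte_carlo_p(observed_stress, randomized_stresses):
    """P = (1 + n) / (1 + N), n = randomized runs with stress <= observed."""
    n = sum(1 for s in randomized_stresses if s <= observed_stress)
    return (1 + n) / (1 + len(randomized_stresses))

rng = random.Random(42)
table = [[5, 0, 1], [3, 2, 0], [0, 4, 6], [1, 5, 7]]  # hypothetical plots x species
randomized_table = shuffle_within_columns(table, rng)  # would be fed to a new NMS run

# Hypothetical final stresses from 20 runs on randomized data, vs. 12.4 for the real data
randomized_stresses = [24.1, 22.8, 25.3, 23.9, 21.7, 24.6, 23.2, 22.5, 25.0, 24.4,
                       23.7, 22.9, 24.8, 23.3, 21.9, 24.0, 23.6, 22.2, 25.5, 24.3]
print(monte_carlo_p(12.4, randomized_stresses))  # (1 + 0) / (1 + 20), about 0.048
```

With 20 randomized runs, the smallest attainable P-value is 1/21, which is one reason for more randomized runs in the slower settings.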
Monte Carlo tests
Autopilot mode in PC-ORD
Table 16.3 in McCune and Grace (2002)
PARAMETER                       Quick and dirty   Medium   Slow and thorough
Maximum number of iterations    75                200      400
Instability criterion           0.001             0.0001   0.00001
Starting number of axes         3                 4        6
Number of real runs             5                 15       40
Number of randomized runs       20                30       50
Choosing the best solution
1. Select the appropriate number of
dimensions
2. Seek low stress
3. Use a Monte Carlo test
4. Avoid unstable solutions
1. How many dimensions?
One dimension is
generally not used,
unless the data are
known to be
unidimensional.
More than three
becomes difficult to
interpret.
[Figure 16.3: stress versus number of dimensions, with the “elbow” marked]
Find the elbow and
inspect Monte Carlo
tests.
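One way to make the “find the elbow” advice explicit, offered as a hypothetical rule of thumb rather than the book's procedure: keep adding axes while each extra axis lowers final stress by at least a chosen amount and the Monte Carlo test for that dimensionality remains significant. The 5-point threshold, the 0.05 cutoff, and the stress and P-values are all invented for illustration.

```python
def choose_dimensions(stress_by_k, p_by_k, min_drop=5.0, alpha=0.05):
    """Hypothetical elbow rule: add axes while each extra axis lowers final stress
    by at least `min_drop` (0-100 stress scale) and its Monte Carlo p <= alpha."""
    ks = sorted(stress_by_k)
    chosen = ks[0]
    for k in ks[1:]:
        if stress_by_k[chosen] - stress_by_k[k] < min_drop or p_by_k[k] > alpha:
            break
        chosen = k
    return chosen

# Hypothetical final stress and Monte Carlo P-values for 1- to 6-dimensional solutions
stress = {1: 42.0, 2: 21.5, 3: 13.2, 4: 10.9, 5: 9.8, 6: 9.1}
p_values = {1: 0.02, 2: 0.02, 3: 0.02, 4: 0.04, 5: 0.20, 6: 0.40}
print(choose_dimensions(stress, p_values))  # 3: the fourth axis lowers stress by only 2.3
```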
2. Seek low stress
• <5 = excellent
• 5-10 = good
• 10-20 = fair, useable
• 20-30 = not great, still useable
• >30 = dangerously close to random
Adapted from Table 16.4, p. 132
A general procedure
• Carefully read pages 135-136
• In your papers, you should report the
information that is listed on page 136
• Autopilot mode works really well, but don’t
publish ordinations obtained using the
Quick and Dirty option! Be sure to publish
the parameter settings.
Interpreting NMS axes
• Two main/complementary approaches
– Evaluate how species abundances are
correlated with NMS axes
– Evaluate how environmental variables are
correlated with NMS axes
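A sketch of both approaches with invented axis scores and two variables borrowed by name from the figure (their values are hypothetical): Pearson's r measures linear association with the axis, while Kendall's tau (a rank correlation, written here in its simple no-ties form) picks up any monotonic trend.

```python
import math
from itertools import combinations

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs / all pairs; assumes no ties."""
    pairs = list(combinations(range(len(x)), 2))
    score = 0
    for i, j in pairs:
        product = (x[i] - x[j]) * (y[i] - y[j])
        score += (product > 0) - (product < 0)
    return score / len(pairs)

axis1 = [-1.2, -0.5, 0.1, 0.8, 1.5]  # hypothetical plot scores on NMS axis 1
variables = {
    "Pine (species)": [0, 2, 5, 9, 12],          # hypothetical abundances
    "Clay (environment)": [30, 25, 22, 15, 10],  # hypothetical soil values
}
for name, values in variables.items():
    print(name, round(pearson(axis1, values), 2), kendall_tau(axis1, values))
```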
Overlays
• Overlays: flexible way to see whether a variable is
patterned on an ordination; not limited to linear
relationships
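Species versus Axes
[Figure: species abundance plotted against axis scores, showing a unimodal pattern and a linear pattern]
Resist the temptation to use p-values when examining these relationships! The relationships are often nonlinear, and testing them is circular reasoning (the axes were derived from the species data).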
Environmental Variables
• Joint plots: diagrams of radiating lines, where the angle and length of each line indicate the direction and strength of the relationship
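A sketch of how a joint-plot line can be derived, assuming its endpoint is the variable's correlation with each axis (a common construction; plotting software may rescale lengths for display). Scores and the environmental variable are invented; because the two toy axes are uncorrelated and the variable lies exactly in their plane, the line has length 1.

```python
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

axis1 = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical NMS scores (uncorrelated with axis2)
axis2 = [1.0, -1.0, 0.0, -1.0, 1.0]
env = [a + b for a, b in zip(axis1, axis2)]  # hypothetical environmental variable

r1, r2 = pearson(axis1, env), pearson(axis2, env)
length = math.hypot(r1, r2)               # strength of the relationship
angle = math.degrees(math.atan2(r2, r1))  # direction, counterclockwise from axis 1
print(round(r1, 3), round(r2, 3), round(length, 3), round(angle, 1))
```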
PerMANOVA
The analysis of community
composition
• Continuous covariates
– Use ordination to produce a continuous response
variable (i.e., axis)
– Use covariance analysis (multiple regression, SEM)
to explain variance of the axis
• Categorical groups
– Ordination is not required (remember, ordination is
not the test)
– Permutational MANOVA (PerMANOVA): can be used with any experimental design
– MRPP (only one-way or blocked designs)
– ANOSIM (up to two factors, in R and PRIMER)
MANOVA
• Multivariate Analysis of Variance
• Traditional parametric method
• Assumes linear relations among variables,
multivariate normality, equal variances and
covariances
• Not appropriate for community data
PerMANOVA
• Permutational MANOVA
• Straightforward extension of ANOVA
• Decomposes variance in the distance
matrix
• No distributional assumptions
• Can still be sensitive to heterogeneous
variances (dispersion) among groups
• Anderson, M. 2001. Austral Ecology
ANOVA
• Compare variability within groups versus
variability among different groups
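[Figure: Elevation (m), axis from 2200 to 2800]
Decomposing an observation (y_{ij})
$$ y_{ij} = \bar{y}_{\cdot\cdot} + (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}) + (y_{ij} - \bar{y}_{i\cdot}) $$
$$ \sum_{i}\sum_{j} (y_{ij} - \bar{y}_{\cdot\cdot})^2 = n \sum_{i} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 + \sum_{i}\sum_{j} (y_{ij} - \bar{y}_{i\cdot})^2 $$
where n is the number of observations in each treatment.
Variability of observations about the grand mean = Variability of the ith trt mean about the grand mean + Variability of observations within each treatment
SStotal = SSamong (SStreatment) + SSwithin (SSerror)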
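A quick numeric check of the decomposition with a balanced, invented example (two treatments, n = 3 observations each, values in metres of elevation): SStotal equals SSamong plus SSwithin. Note that every term is computed from means.

```python
groups = {
    "low": [2200.0, 2300.0, 2250.0],  # hypothetical elevations (m)
    "high": [2600.0, 2700.0, 2650.0],
}
n = 3  # observations per treatment (balanced design)
all_obs = [y for ys in groups.values() for y in ys]
grand_mean = sum(all_obs) / len(all_obs)
means = {g: sum(ys) / len(ys) for g, ys in groups.items()}

ss_total = sum((y - grand_mean) ** 2 for y in all_obs)
ss_among = n * sum((m - grand_mean) ** 2 for m in means.values())
ss_within = sum((y - means[g]) ** 2 for g, ys in groups.items() for y in ys)
print(ss_total, ss_among, ss_within)  # 250000.0 240000.0 10000.0
```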
PROBLEM: WE CAN’T CALCULATE MEANS (CENTROIDS) WITH SEMIMETRIC BRAY-CURTIS DISTANCES
A simple 2-D case
Centroids are unknowable with semimetric Bray-Curtis distances
The key link
• The key to this method is that “the sum of squared
distances between points and their centroid is equal to
(and can be calculated directly from) the sum of squared
interpoint distances divided by the number of points.”
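A numeric check of the key identity, using four invented points at the corners of a square in two dimensions (Euclidean distances): the sum of squared distances from each point to the centroid equals the sum of squared interpoint distances, over all pairs, divided by the number of points. Only the right-hand side is computed from pairwise distances alone, which is why any distance matrix can be plugged in.

```python
import math
from itertools import combinations

points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
n = len(points)

# Left side: needs the centroid (a "mean" location)
centroid = tuple(sum(coord) / n for coord in zip(*points))
ss_centroid = sum(math.dist(p, centroid) ** 2 for p in points)

# Right side: needs only the pairwise distances
ss_pairs = sum(math.dist(p, q) ** 2 for p, q in combinations(points, 2)) / n

print(ss_centroid, ss_pairs)  # both 8 (up to floating-point rounding)
```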
Why is this important?
• We couldn’t use semimetric Bray-Curtis distance in an ANOVA context because central locations (centroids) cannot be found
• But with this finding, we no longer have to calculate the central locations
• The analysis can proceed by using the distances in any distance matrix
One-way perMANOVA with two groups
Permuted p-values
P = (No. of Fπ ≥ F) / (Total no. of Fπ)
where each Fπ is obtained with randomly shuffled data
• Use at least 999 random permutations
• I tend to use 9999 permutations
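A compact sketch of one-way PerMANOVA in the spirit of Anderson (2001), with sums of squares computed from the distance matrix alone. Assumptions: Bray-Curtis distances on an invented grazed/ungrazed table, pseudo-F = [SSamong/(a − 1)] / [SSwithin/(N − a)], and a P-value that adds 1 to the numerator and denominator so the observed F counts as one of the permutations (the same convention as the Monte Carlo P-value formula earlier). The function names are illustrative; this is not the code of any particular package.

```python
import random
from itertools import combinations

def bray_curtis(x, y):
    return sum(abs(a - b) for a, b in zip(x, y)) / (sum(x) + sum(y))

def ss_from_distances(members, d):
    """Sum of squares about the (unknown) centroid: squared interpoint distances / n."""
    return sum(d[i][j] ** 2 for i, j in combinations(members, 2)) / len(members)

def pseudo_f(d, labels):
    groups = {}
    for index, label in enumerate(labels):
        groups.setdefault(label, []).append(index)
    N, a = len(labels), len(groups)
    ss_total = ss_from_distances(list(range(N)), d)
    ss_within = sum(ss_from_distances(m, d) for m in groups.values())
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (N - a))

def permanova(d, labels, permutations=999, seed=1):
    rng = random.Random(seed)
    f_observed = pseudo_f(d, labels)
    shuffled = list(labels)
    at_least_as_large = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # randomly reassign plots to groups
        if pseudo_f(d, shuffled) >= f_observed:
            at_least_as_large += 1
    return f_observed, (at_least_as_large + 1) / (permutations + 1)

# Hypothetical plots x species abundances
table = [[10, 2, 0], [8, 3, 1], [9, 1, 0], [11, 2, 1],  # grazed
         [1, 3, 9], [0, 2, 11], [2, 4, 8], [1, 2, 10]]  # ungrazed
labels = ["grazed"] * 4 + ["ungrazed"] * 4
d = [[bray_curtis(x, y) for y in table] for x in table]

f_obs, p = permanova(d, labels)
print(round(f_obs, 1), p)
```

With only 8 plots in two groups of 4 there are just 70 distinct splits, so the P-value cannot go much below about 2/70 no matter how many permutations are drawn.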
The link with ANOVA
• This F statistic is equal to Fisher’s original
F-ratio in the case of one variable and
when Euclidean distances are used
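A quick check of this equivalence with one invented variable and two groups: the distance-based pseudo-F (sums of squares computed from squared interpoint distances, here with Euclidean distance, which for one variable is the absolute difference) and Fisher's F from the classical one-way ANOVA table agree.

```python
from itertools import combinations

def pseudo_f(d, labels):
    """Distance-based F: sums of squares from squared interpoint distances only."""
    groups = {}
    for index, label in enumerate(labels):
        groups.setdefault(label, []).append(index)
    ss = lambda m: sum(d[i][j] ** 2 for i, j in combinations(m, 2)) / len(m)
    N, a = len(labels), len(groups)
    ss_within = sum(ss(m) for m in groups.values())
    ss_among = ss(list(range(N))) - ss_within
    return (ss_among / (a - 1)) / (ss_within / (N - a))

values = [2.0, 3.0, 4.0, 6.0, 7.0, 8.0]  # one hypothetical variable
labels = ["a", "a", "a", "b", "b", "b"]

# Euclidean distance on a single variable is just the absolute difference
d = [[abs(x - y) for y in values] for x in values]
print(pseudo_f(d, labels))  # 24.0

# Fisher's F from the classical one-way ANOVA table, for comparison
groups = {g: [v for v, lab in zip(values, labels) if lab == g] for g in set(labels)}
grand = sum(values) / len(values)
ss_among = sum(len(ys) * (sum(ys) / len(ys) - grand) ** 2 for ys in groups.values())
ss_within = sum((y - sum(ys) / len(ys)) ** 2 for ys in groups.values() for y in ys)
a, N = len(groups), len(values)
print((ss_among / (a - 1)) / (ss_within / (N - a)))  # 24.0
```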
Example: grazing effects (one-way)
F = 36.6
Example: two-way factorial