Analysis of temporal (stratigraphical) and spatial data

NUMERICAL ANALYSIS OF
BIOLOGICAL AND
ENVIRONMENTAL DATA
Analysis of Temporal (Stratigraphic)
and Spatial Data
John Birks
ANALYSIS OF TEMPORAL AND SPATIAL DATA
Introduction
Temporal stratigraphic data
Single sequence
Partitioning or zonation
Sequence splitting
Rate-of-change analysis
Gradient analysis and summarisation
Analogue matching
Relationships between two or more sets of variables in same sequence
Two or more sequences
Sequence comparison and correlation
Multi-proxy studies
Hypothesis testing
Spatial geographical data
Spatial autocorrelation
Spatially constrained clusterings
Spatially constrained ordinations
Predictive models for spatial data
INTRODUCTION
Classical analysis of quadrats, lakes, streams, etc. assumes no autocorrelation,
namely that the values of a variable at some point in space cannot be predicted from
known values at other sampling points.
PALAEOECOLOGY – fixed sample order in time.
strong autocorrelation – temporal autocorrelation
STRATIGRAPHICAL DATA
biostratigraphic, lithostratigraphic, geochemical, geophysical,
morphometric, isotopic
multivariate
continuous or discontinuous time series
ordering very important – display, partitioning, trends, interpretation
SPATIAL DATA
many types, spatial autocorrelation
spatial or geographical co-ordinates very important
raises problems of statistical inference as samples not independent
TEMPORAL STRATIGRAPHIC DATA
ANALYSIS OF SINGLE SEQUENCE
ZONATION OR PARTITIONING
Useful for:
1) description
2) discussion and interpretation
3) comparisons in time and space
“sediment body with a broadly similar composition that differs
from underlying and overlying sediment bodies in the kind and/
or amount of its composition”.
CONSTRAINED CLUSTERINGS
1) Constrained agglomerative procedures
CONSLINK
CONISS
2) Constrained binary divisive procedures
Partition into g groups by placing g – 1 boundaries.
Number of possibilities: $\binom{n-1}{g-1}$, which equals $n - 1$ for $g = 2$.
Compared with non-constrained situation: $2^{n-1} - 1$ for $g = 2$.
Criteria – within-group sum-of-squares or variance
– within-group information $\sum_{i=1}^{n} \sum_{k=1}^{m} p_{ik} \log (p_{ik} / q_{ik})$
SPLITLSQ
SPLITINF
3) Constrained optimal divisive analysis
OPTIMAL
(Figure: optimal partitions into 2 groups (boundary n1), 3 groups (boundaries n1, n2) and 4 groups (boundaries n1, n2, n3).)
4) Variable barriers approach
BARRIER
All methods in one program:
ZONE
RIOJA
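The optimal sum-of-squares idea can be sketched with a small dynamic programme. This is only an illustration under stated assumptions, not the OPTIMAL, ZONE or RIOJA code: the function names and the toy sequence are hypothetical, and a boundary is reported as the 0-based index at which a new group starts.

```python
from math import comb

def within_ss(rows):
    """Within-group sum of squares of a list of equal-length sample vectors."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(m)]
    return sum((r[k] - means[k]) ** 2 for r in rows for k in range(m))

def optimal_partition(samples, g):
    """Best split of an ordered sequence into g contiguous groups.

    Returns (total within-group sum of squares, sorted boundary indices).
    """
    n = len(samples)
    # cost[i][j] is the within-group sum of squares of samples[i:j].
    cost = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(i + 1, n + 1):
            cost[i][j] = within_ss(samples[i:j])
    inf = float("inf")
    # best[h][j] is the minimal sum of squares for samples[:j] in h groups.
    best = [[inf] * (n + 1) for _ in range(g + 1)]
    back = [[0] * (n + 1) for _ in range(g + 1)]
    best[0][0] = 0.0
    for h in range(1, g + 1):
        for j in range(h, n + 1):
            for i in range(h - 1, j):
                cand = best[h - 1][i] + cost[i][j]
                if cand < best[h][j]:
                    best[h][j] = cand
                    back[h][j] = i
    # Trace the g - 1 boundaries back from the end of the sequence.
    bounds, j = [], n
    for h in range(g, 1, -1):
        j = back[h][j]
        bounds.append(j)
    return best[g][n], sorted(bounds)

# Hypothetical sequence: 8 levels, 2 taxa, two abrupt changes.
seq = [[1, 9], [1, 9], [1, 9], [5, 5], [5, 5], [9, 1], [9, 1], [9, 1]]
ss, boundaries = optimal_partition(seq, 3)
# Number of ways of placing g - 1 = 2 boundaries among n - 1 = 7 gaps.
n_constrained = comb(len(seq) - 1, 3 - 1)
```

Unlike a binary divisive procedure, the boundaries are searched afresh for each g, so the markers for g groups need not include those found for g – 1 groups.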
Pollen diagram and numerical zonation analyses for the complete Abernethy Forest 1974 data set.
Birks & Gordon (1985)
CONISS = constrained incremental sum-of-squares
(= constrained Ward's minimum variance)
OPTIMAL SUM OF SQUARES PARTITIONS OF
THE ABERNETHY FOREST 1974 DATA
Number of groups g (zones)   Percentage of total sum-of-squares   Markers
 2                           59.3                                 15
 3                           28.4                                 15 32
 4                           18.9                                 15 33 41
 5                           14.7                                 15 33 41 45
 6                           10.6                                 15 32 34 41 45
 7                            8.1                                 15 26 32 34 41 45
 8                            5.8                                 8 15 26 32 34 41 45
 9                            4.7                                 8 15 24 29 32 34 41 45
10                            3.9                                 8 15 24 29 32 33 34 41 45
HOW MANY ZONES?
K D Bennett (1996)
Determination of the number of zones
in a bio-stratigraphical sequence. New
Phytologist 132, 155-170
Broken stick model
$\Pr_k = \frac{1}{n} \sum_{i=k}^{n} \frac{1}{i}$
RIOJA (R)
BSTICK
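A minimal sketch of the broken-stick comparison, assuming that zones are accepted while their observed share of variance exceeds the model expectation. The function names and the observed proportions are hypothetical, and the exact rule used in BSTICK or RIOJA may differ.

```python
def broken_stick(n):
    """Expected proportions for n pieces, largest first: (1/n) * sum_{i=k}^{n} 1/i."""
    return [sum(1.0 / i for i in range(k, n + 1)) / n for k in range(1, n + 1)]

def zones_exceeding_model(observed):
    """Count leading zones whose variance share exceeds the broken-stick value."""
    expected = broken_stick(len(observed))
    count = 0
    for obs, exp in zip(observed, expected):
        if obs <= exp:
            break
        count += 1
    return count

# Hypothetical proportions of total variance accounted for by successive zones.
observed = [0.55, 0.30, 0.08, 0.04, 0.03]
expected = broken_stick(len(observed))
n_zones = zones_exceeding_model(observed)
```

The expected proportions always sum to 1, so they can be compared directly with observed proportions of total variance.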
Ioannina Basin
Tzedakis (1994)
Pollen percentage diagram plotted against depth. Lithostratigraphic column
is represented; symbols are based on Troels-Smith (1955).
Variance accounted for by the nth zone as a proportion of the total variance
(fluctuating curve) compared with values from a broken-stick model (smooth curve):
(a) randomized data set,
(b) original data set.
Original data
Broken stick model
Zonation method: binary
divisive using the
information content
statistic.
Data set; Ioannina.
Bennett (1996)
Technical Point
It turns out that the binary divisive procedures SPLITINF and SPLITLSQ of
Gordon and Birks (1972) are an early implementation of De’ath’s
(2002) multivariate regression trees (MRT) discussed in the Modern
Regression lecture.
Both are MRTs where a vector of sample depths or ages is used as the
sole explanatory predictor variable.
SPLITINF = distance-based MRT with information content as the
dissimilarity measure
SPLITLSQ = MRT with Euclidean distance as the distance measure
The advantage of MRT over SPLITINF/SPLITLSQ as a zonation procedure is
that the k-fold cross-validation in CARTs provides a simple way to
assess the number of zones into which the stratigraphical sequence
should be split.
MRT using the optimal partitioning approach is still to be implemented.
mvpart (R)
SEQUENCE SPLITTING
Walker & Wilson (1978) J. Biogeog. 5, 1–21
Walker & Pittelkow (1981) J. Biogeog. 8, 37–51
SPLIT, SPLIT2
BOUND2
Need statistically ‘independent’ curves
Pollen influx (grains cm–2 year–1)
PCA or CA or DCA axes
Aitchison log-ratio transformation
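$Z_{ik} = \log (p_{ik} / \bar{p}_i)$
where
$\log \bar{p}_i = \frac{1}{m} \sum_{k=1}^{m} \log p_{ik}$
(i.e. $\bar{p}_i$ is the geometric mean of the m proportions in sample i)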
CANOCO
LOGRATIO
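The log-ratio transformation above can be written in a few lines. This is a sketch only: the function name and the sample are hypothetical, and it assumes all proportions are strictly positive (zeros would need replacing beforehand).

```python
import math

def log_ratio(p):
    """Centred log-ratio of one sample: Z_k = log p_k - mean over k of log p_k."""
    logs = [math.log(v) for v in p]
    mean_log = sum(logs) / len(logs)   # log of the geometric mean
    return [v - mean_log for v in logs]

# Hypothetical sample of m = 3 proportions, all greater than zero.
sample = [0.5, 0.3, 0.2]
z = log_ratio(sample)
```

The transformed values of each sample sum to zero, which removes the constant-sum constraint of percentage data.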
Correlograms of
sequence splits with
charcoal, inorganic
matter and total
pollen influxes for
three sections of
the pollen record.
The vertical scales
give correlations;
the horizontal
scales give time lag
in years (assuming a
sampling interval of
50 years).
Technical Point
The sequence splitting of Walker and Wilson (1978) is a
precursor of regression trees within CART (see Modern
Regression lecture).
In a regression tree a quantitative response variable, in our
case a stratigraphical sequence of taxon A, is repeatedly
split so that at each partition the sequence is divided into
two mutually exclusive groups, each of which is as
homogeneous as possible.
In the regression tree implementation, a vector of sample
depths or ages is used as the sole explanatory predictor
variable. The splitting is then applied to each group
separately until some stopping rule is reached.
Usually k-fold cross-validation is used to find the optimal
tree size using cost-complexity (CC) pruning.
CC = Timpurity + α(Tcomplexity)
where Timpurity is the impurity of the current tree over all
terminal nodes; Tcomplexity is the number of terminal
leaves; and α is a real number > 0.
α is the tuning parameter in CC pruning. It represents the
trade-off between tree size and goodness-of-fit.
Small values of α give large trees; large values of α lead
to small trees.
Starting with the full tree, search to identify the terminal
node that results in the lowest CC for a given value of α.
As the penalty α on tree complexity is increased, the tree that
minimises CC will become smaller and smaller until the
penalty is so great that a tree with a single node (i.e. the
original data) has the lowest CC.
The search produces a sequence of progressively smaller trees
with associated CC.
k-fold cross-validation is used to find the optimal value of
α that gives the minimal root mean squared error (RMSE).
An alternative is to select the smallest tree that lies within 1
standard error of the RMSE of the best tree.
rpart (R)
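The trade-off controlled by the penalty can be illustrated with a toy calculation. This sketch is not the rpart algorithm: the sequence of pruned trees and its impurities are hypothetical, and the penalty is applied as impurity plus penalty times the number of terminal leaves.

```python
def cost_complexity(impurity, n_leaves, alpha):
    """CC = impurity + alpha * number of terminal leaves."""
    return impurity + alpha * n_leaves

def best_tree(trees, alpha):
    """Return the (impurity, n_leaves) pair with the lowest CC for this alpha."""
    return min(trees, key=lambda t: cost_complexity(t[0], t[1], alpha))

# Hypothetical nested sequence of pruned trees: (total impurity, leaves).
trees = [(100.0, 1), (40.0, 2), (25.0, 3), (20.0, 4), (18.0, 5)]

small_penalty = best_tree(trees, 1.0)     # favours the largest tree
medium_penalty = best_tree(trees, 10.0)   # favours an intermediate tree
large_penalty = best_tree(trees, 100.0)   # favours the single-node tree
```

In practice the penalty would be chosen by k-fold cross-validation rather than fixed by hand as here.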
RATE OF CHANGE ANALYSIS
Amount of palynological compositional change per unit time.
Calculate dissimilarity between pollen assemblages of two adjacent
samples and standardise to constant time unit, e.g. 250 14C years.
Jacobson & Grimm (1986) Ecology 67, 958-966
Grimm & Jacobson (1992) Climate Dynamics 6, 179-184
RATEPOL
POLSTACK
(TILIA)
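A minimal sketch of the calculation, assuming chord distance as the dissimilarity and distinct sample ages. The function names are hypothetical and this is not the RATEPOL or TILIA implementation; Jacobson & Grimm (1986) worked in ordination space rather than on raw proportions.

```python
import math

def chord_distance(a, b):
    """Chord distance between two samples expressed as proportions."""
    return math.sqrt(sum((math.sqrt(x) - math.sqrt(y)) ** 2 for x, y in zip(a, b)))

def rate_of_change(samples, ages, unit=250.0):
    """Dissimilarity between adjacent samples, standardised to `unit` years."""
    rates = []
    for i in range(len(samples) - 1):
        dt = abs(ages[i + 1] - ages[i])   # assumed non-zero
        rates.append(chord_distance(samples[i], samples[i + 1]) / dt * unit)
    return rates

# Hypothetical three-level sequence of two taxa with ages in 14C years.
samples = [[0.5, 0.5], [0.5, 0.5], [1.0, 0.0]]
ages = [0.0, 100.0, 600.0]
rates = rate_of_change(samples, ages)
```

Standardising by the age difference matters because unevenly spaced samples would otherwise show spuriously large change across long gaps.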
Graph of distance (number of
standard deviations) moved every
100 yr in the first three dimensions
of the ordination vs age. Greater
distance indicates greater change in
pollen spectra in 100yr.
Jacobson & Grimm (1986)
Jacobson & Grimm (1986)
GRADIENT ANALYSIS OF SINGLE SEQUENCE
Ordination methods
CA/DCA
joint plot
or PCA
biplot
Constrained CA or PCA
Sample summary
CA/DCA/PCA
Species arrangement
CCA or simple discriminants
CA = correspondence analysis
DCA = detrended correspondence analysis
PCA = principal components analysis
CCA = canonical correspondence analysis
VEGAN
CANOCO
PCA Biplot
74.6%
Gordon, 1982
Biplot of the Kirchner Marsh data; C2 = 0.746. The lengths of the Picea
and Quercus vectors have been scaled down relative to the other vectors.
Stratigraphically neighbouring levels are joined by a line.
CA Joint Plot
62%
Gordon, 1982
Correspondence analysis representation of the Kirchner Marsh data; C2 =
0.620. Stratigraphically neighbouring levels are joined by a line.
Stratigraphical plot of sample scores on the first correspondence analysis axis (left)
and of rarefaction estimate of richness (E(Sn)) (right) for Diss Mere, England.
Major pollen-stratigraphical and cultural levels are also shown. The vertical axis is
depth (cm). The scale for sample scores runs from –1.0 (left) to +1.2 (right).
Haberle & Bennett 2004
The 1st and 2nd axis of the Detrended Correspondence Analysis for Laguna Oprasa
and Laguna Facil plotted against calibrated calendar age (cal yr BP). The 1st axis
contrasts taxa from warmer forested sites with cooler herbaceous sites. The 2nd
axis contrasts taxa preferring wetter sites with those preferring drier sites.
Species arrangement
Percentage pollen and spore diagram from Abernethy Forest, Inverness-shire. The percentages
are plotted against time, the age of each sample having been estimated from the deposition
time. Nomenclatural conventions follow Birks (1973a) unless stated in Appendix 1. The
sediment lithology is indicated on the left side, using the symbols of Troels-Smith (1955). The
pollen sum, P, includes all non-aquatic taxa. Aquatic taxa, pteridophytes, and algae are
calculated on the basis of P + Σ group as indicated.
Pollen types re-arranged on the basis of the weighted average for depth
TRAN
ANALOGUE ANALYSIS
Modern training set
– similar taxonomy
– similar sedimentary environment
Compare fossil sample 1 with all modern samples, use appropriate
DC, find sample in modern set ‘most like’ (i.e. lowest DC) fossil
sample 1, call it ‘closest analogue’, repeat for fossil sample 2, etc.
Overpeck et al. (1985) Quat. Res. 23, 87–108
ANALOG
MATCH
MAT
ANALOGUE – R package
RIOJA
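The matching loop can be sketched as below, assuming the squared chord distance as the dissimilarity coefficient (DC). The function names and the toy data are hypothetical; this is not the ANALOG, MATCH or RIOJA code.

```python
def squared_chord(a, b):
    """Squared chord distance between two samples expressed as proportions."""
    return sum((x ** 0.5 - y ** 0.5) ** 2 for x, y in zip(a, b))

def closest_analogues(fossil, modern):
    """For each fossil sample, return (index of closest modern sample, its DC)."""
    result = []
    for f in fossil:
        dcs = [squared_chord(f, m) for m in modern]
        j = min(range(len(modern)), key=dcs.__getitem__)
        result.append((j, dcs[j]))
    return result

# Hypothetical modern training set and fossil samples (three taxa each).
modern = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7], [0.3, 0.4, 0.3]]
fossil = [[0.6, 0.3, 0.1], [0.2, 0.2, 0.6]]
matches = closest_analogues(fossil, modern)
```

The minimum DC returned for each fossil sample is worth keeping, because a 'closest' analogue can still be a poor match.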
Compare fossil sample i
with modern sample j
Repeat for all
modern samples
Repeat for all
fossil samples
Calculate similarity
between i and j
Sij
Find modern sample with
highest similarity 'ANALOGUE'
? Evaluation
Dissimilarity
coefficients,
radiocarbon dates,
pollen zones, and
vegetation types
represented by the
top ten analogues
from the Lake West
Okoboji site.
Maps of squared chord distance values with
modern samples at selected time intervals
Plots of minimum squared chord-distance for each fossil spectrum at each of the eight sites.
Analogues and lake restoration
Flower et al. (1997)
A schematic representation of how
fossil diatom zones/samples in a
sediment core from an acidified
lake can be compared numerically
with modern surface sediment
samples collected from potential
modern analogue lakes. In this
space-for-time model the vertical
axis represents sedimentary
diatom zones defined by depth
and time; the horizontal axis
represents spatially distributed
modern analogue lakes and the
dotted lines indicate good floristic
matches (dij = <0.65), as defined
by the mean squared Chi-squared
estimate of dissimilarity (SCD, see
text).
Flower et al. (1997)
COMPARISON AND CORRELATION
BETWEEN TIME SERIES
Two or more stratigraphical sets of variables from same sequence.
Are the temporal patterns similar?
(1) Separate ordinations
Oscillation log-likelihood G-test or χ² test
(2) Constrained ordinations
Pollen data - 3 or 4 ordination axes or major patterns of
variation Y
Chemical data - 3 or 4 ordination axes X
Depth as a covariable
Does 'chemistry' explain or predict 'pollen'? i.e. is variance in Y
well explained by X?
Lotter et al. (1992) J. Quat. Sci. Pollen 16O/18O (depth)
34% 16% 12%
79%
12%
4%
1%
Pollen, oxygen-isotope stratigraphy, and sediment composition of Aegelsee core AE-1
(after Wegmüller and Lotter 1990)
Pollen and oxygen-isotope stratigraphy of Gerzensee core
G-III (after Eicher and Siegenthaler 1976)
Is there a statistically significant relationship between the pollen
stratigraphy and the stable-isotope record?
Summary of the results from detrended correspondence analysis (DCA)
of late-glacial pollen spectra from five sequences. The percentage
variance represented by each DCA axis is listed.
Reduce pollen data to DCA axes. Use these then as ‘responses’
Site               No. of samples   No. of taxa   DCA axis 1   DCA axis 2   DCA axis 3   DCA axis 4
Aegelsee AE-1      100              26            57.2         12.0         2.3          1.4
Aegelsee AE-3       54              32            44.3          3.3         1.5          1.4
Gerzensee G-III     65              28            37.6          4.0         1.2          0.9
Faulenseemoos       62              25            44.1         18.8         5.0          3.8
Rotsee RL-250       44              23            38.2         13.3         3.1          2.3
Results of redundancy analysis and partial redundancy analysis
permutation tests for the significance of axis 1 when oxygen isotopes and
depth are predictor variables, when oxygen is the only predictor, and when
oxygen isotopes are the predictor variable and depth is a covariable.
Site               Predictor: δ18O and depth   Predictor: δ18O; covariable: depth   Predictor: δ18O   Number of response variables (DCA axes)
Pollen DCA axes
Aegelsee AE-1      0.01a                       0.01a                                0.02a             2
Aegelsee AE-3      0.01a                       0.16                                 0.20              1
Gerzensee G-III    0.01a                       0.46                                 0.57              1
Faulenseemoos      0.01a                       0.01a                                0.01a             3
Rotsee RL-250      0.01a                       0.21                                 0.08              2
a Significant at p < 0.05
(Lotter et al. 1992)
MULTI-PROXY STUDIES
In multi-proxy studies (e.g. pollen, diatoms, chironomids,
etc. studied on the same core), important question is
‘are the major stratigraphical patterns of variation
(‘signal’) the same in all proxies?’
Laguna Facil, southern Chile
Massaferro et al. 2005 Quaternary Science Reviews 24:
2510-2522
Pollen and chironomids studied on the same core
Simplified each data-set to the first ordination axes of a
correspondence analysis (CA) and a principal components
analysis (PCA) for both data-sets
Chironomid stratigraphy
Massaferro et al. 2005
Pollen stratigraphy
Massaferro et al. 2005
Massaferro et al. 2005
Can detect similarities in both proxies and differences
1. Major change in both prior to 14,700 cal yr BP.
2. Changes in the chironomids tend to lag behind changes
in the pollen. Perhaps a chironomid response to
changes in vegetation (tree canopy and forest type) or
lake chemistry, resulting from changes in catchment
soils as a result of vegetational change.
3. At about 7200 cal yr BP, chironomids change before the
pollen. May be a response to climate change.
4. Strong correlations between the charcoal stratigraphy
and pollen and chironomid stratigraphies. Probable
importance of fire and/or vulcanism in influencing both
vegetational and limnological dynamics.
Massaferro et al. 2005
Can use ordination methods to summarise several palaeoecological
proxies and to compare with other proxies
Haberle et al. 2006
Lake Euramoo, NE Queensland,
last 800 years
Major changes between pre-European period (A) and
European settlement (B)
Tested how well different proxies ‘predict’ or ‘explain’ (in
a statistical sense) other proxies
Only proxy that significantly predicted other proxies was
pollen that predicted changes in diatoms (25.4%) and
chironomids (15.4%)
Illustrates the importance of catchment and its vegetation
on the lake and its biota
Assessing Potential External 'Drivers' on an Aquatic
Ecosystem
Bradshaw et al. 2005 The Holocene 15: 1152-1162
Dalland Sø, a small (15 ha), shallow (2.6 m) lowland
eutrophic lake on the island of Funen, Denmark.
Catchment (153 ha) today:
agriculture      77 ha
built-up areas   41 ha
woodland         32 ha
wetlands          3 ha
Nutrient rich – total P 65-120 µg l–1
Map of
Dalland Sø
Multi-proxy study to assess role of potential external 'drivers' or
forcing functions on changes in the lake ecosystem in last 7000 yrs.
Data:
Proxy                                            No. of samples   Transformation
Sediment loss-on-ignition %                      560              None
Sediment dry mass accumulation rate              560              Log (x + 1)
Sediment minerogenic matter accumulation rate    560              Log (x + 1)
Plant macrofossil concentrations                 280              Log (x + 1)
Pollen %                                          90              None
Diatoms %                                        118              None
Diatom inferred total P                          118              None
Biogenic silica                                   84              Not used
Pediastrum %                                      90              None
Zooplankton                                       31              Not used
Terrestrial landscape or
catchment development
Bradshaw
et al. 2005
Aquatic ecosystem development
Bradshaw et al. 2005
DCA of pollen and diatom data separately to summarise major
underlying trends in both data sets
Pollen – high scores for trees, low
scores for light-demanding
herbs and crops
Diatom - high scores mainly
planktonic and large
benthic types, low scores
for Fragilaria spp. and
eutrophic spp. (e.g.
Cyclostephanos dubius)
Bradshaw et al. 2005
Major contrast between samples before and
after Late Bronze Age forest clearances
'Lake'
Prior to clearance,
lake experienced
few impacts.
After the clearance,
lake heavily
impacted.
'Catchment'
Bradshaw et al. 2005
Canonical Correspondence Analysis
Response variables:
Diatom taxa
Predictor variables:
Pollen taxa, LOI, dry mass and minerogenic accumulation rates,
plant macrofossils, Pediastrum
Covariable:
Age
69 matching samples
Partial CCA with age partialled out as a covariable. Makes
interpretation of effects of predictors easier by removing
temporal trends and temporal autocorrelation
Partial CCA all variables:
18.4% of variation in diatom data explained by Poaceae pollen,
Cannabis-type pollen, and Daphnia ephippia, the only three
independent and statistically significant predictors.
As different external factors may be important at different times, divided
data into 50 overlapping data sets – sample 1-20, 2-21, 3-22, etc.
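The moving-window subsets can be generated as in the sketch below. The function name is hypothetical; the window width of 20 and the 69 matching samples are taken from the text, and sample numbers are 1-based as on the slide.

```python
def moving_windows(n_samples, width):
    """Overlapping windows of consecutive samples as (first, last), 1-based."""
    return [(start, start + width - 1) for start in range(1, n_samples - width + 2)]

windows = moving_windows(69, 20)
```

With 69 samples and a width of 20 this gives 69 – 20 + 1 = 50 subsets, each of which would then be analysed by its own partial CCA.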
Bradshaw
et al. 2005
CCA of 50 subsets from bottom to top and % variance explained
1. 4520-1840 BC Poaceae is sole predictor variable (20-22% of
diatom variance)
2. 3760-1310 BC LOI and Populus pollen (16-33%)
3. 3050-600 BC Betula, Ulmus, Populus, Fagus, Plantago, etc.
(17-40%)
i.e. in these early periods, diatom change influenced to some
degree by external catchment processes and terrestrial
vegetation change.
4. 2570 BC – 1260 AD Erosion indicators (charcoal, dry
mass accumulation), retting indicator Linum capsules,
Daphnia ephippia, Secale and Hordeum pollen (11-52%)
i.e. changing water depth and external factors
5. 160 BC – 1900 AD Hordeum, Fagus, Cannabis pollen,
Pediastrum boryanum, Nymphaea seeds (22-47%)
i.e. nutrient enrichment as a result of retting hemp,
also changes in water depth and water clarity
Bradshaw
et al. 2005
Strong link between inferred catchment change and within-lake development. Timing
and magnitude are not always perfectly matched, e.g. transition to Mediæval Period
ANALYSIS OF TWO OR MORE SEQUENCES
Regional zones, description of common features, interpretation,
detection of unique features.
Sequence comparison and correlation.
Sequence slotting
SLOTSEQ
FITSEQ
CONSSLOT
Combined scaling of two or more sequences.
CANOCO
SLOTSEQ
Slotting of the sequences S1 (A1, A2, ..., A10) and S2 (B1, B2, ..., B7), illustrating
the contributions to the measure of discordance ψ(S1, S2) and the 'length' of the
sequences, m(S1, S2).
The results of sequence-slotting of the Wolf Creek and Horseshoe Lake pollen
sequences (ψ = 2.095). Radiocarbon dates for the pollen zone boundaries are
also given, expressed as radiocarbon years before present (BP).
Birks & Gordon (1985)
Comparison of oxygen-isotope records from Swiss lakes Aegelsee (AE-3), Faulenseemoos
(FSM) and Gerzensee (G-III) with the Greenland Dye 3 record (Dansgaard et al, 1982). LST
marks the position of the Laacher See Tephra (11,000 yr BP). Letters and numbers mark
the position of synchronous events (for details see text).
Lotter et al. (1992)
Psi values for pair-wise sequence slotting of the stable-isotope stratigraphy at
five Swiss late-glacial sites and the Dye 3 site in Greenland. Values above the
diagonal are constrained slotting, using the three major shifts shown in previous
figure; values below the diagonal are for sequence slotting in the absence of any
external constraints. The mean δ18O and standard deviation for each sequence
are also listed.
CONSLOXY
FUGLA NESS, Shetland
Pollen diagram from Sel Ayre showing the frequencies of all determinable
and indeterminable pollen and spores expressed as percentages of total
pollen and spores (P).
Abbreviations: undiff. = undifferentiated, indet = indeterminable.
Comparison of Bjärsjöholmssjön and Färskesjön using principal component
analysis. The mean scores of the local pollen zones and the ranges of the sample
scores in each zone are plotted on the first and second principal components, and
are joined up in stratigraphic order. The Blekinge regional pollen assemblage
zones are also shown.
Birks &
Berglund
(1979)
Comparison of Färskesjön and Lösensjön using principal component analysis. The mean scores
of the local pollen zones and the ranges of the sample scores in each zone are plotted on the
first and second principal components, and are joined up in stratigraphic order. The regional
pollen assemblage zones are also shown.
Haberle & Bennett, 2004
The 1st and 2nd axis of the Detrended Correspondence Analysis for Laguna Oprasa
and Laguna Facil plotted against calibrated calendar age (cal yr BP). The 1st axis
contrasts taxa from warmer forested sites with cooler herbaceous sites. The 2nd
axis contrasts taxa preferring wetter sites with those preferring drier sites
Tzedakis & Bennett (1995)
Pollen percentage diagram of selected taxa plotted against depth.
Lithostratigraphic symbols are based on Troels-Smith (1955). For
correlations and ages see Tzedakis (1993, 1994).
Pollen percentage
diagrams of selected
arboreal taxa of the
Metsovon, Zista,
Pamvotis, and
Dodoni I and II forest
periods of Ioannina
249.
5e
7c
9c
11a + b + c
Tzedakis & Bennett (1995)
Solar insolation values of mid-month day for selected periods at latitude
39º40'N. Values are given for July and January extremes and July minus
January for each interglacial period calculated at thousand year intervals.
Values are expressed in cal cm–2 day–1. In parentheses are percentage
differences from 10 ka values. Timing of extreme insolation excursions
also given. Data from a computer program written by N.G. Pisias, based on
Berger (1978). Chronology based on Imbrie et al. (1984) and Martinson et
al. (1987)
Tzedakis & Bennett, 1995
Combined plot of sample scores
on the first two principal
components for Metsovon, Zista,
Pamvotis, and Dodoni I forest
periods. Asterisks indicate the
base of the intervals considered.
Results of comparison of
vegetation and climatic signatures
of different interglacial periods.
'+' sign means similar and '-' means
different. First sign refers to
climate and second to vegetation
character.
Different climate, similar pollen
in one comparison
TEMPORAL DATA
- few (10–20) points
e.g. from monitoring
Rate of change
Gradient analysis (unconstrained, constrained)
Principal response curves
Variance partitioning
Trend analysis – regression against time,
Monte Carlo permutation testing
- many (>100) points
Time-series analysis – see Gavin Simpson’s lecture
HYPOTHESIS TESTING
Lake Development and Catchment Change
Assessing potential 'drivers' on aquatic ecosystems.
What determines changes in lake organisms and lake
sediments?
1. External climate forcing functions
2. Catchment forcing functions
3. Lake as isolated system that evolves through time with its
own internal dynamics
Birks et al.
2000
(a) Sägistalsee,
Bernese
Oberland, Swiss
Alps
Andy Lotter
A.F. Lotter et al. 2003
J. Paleolimnology 30: 253-342
Lotter &
Birks 2003
Age-depth model
Sedimentation rate
Lotter & Birks 2003
Wick et al. 2003
Wick et al. 2003
Heiri & Lotter 2003
Sägistalsee, Switzerland
Ideal study:
1. Critical ecological situation at tree-line today; sensitive
2. One core. Many proxies (pollen, macros, chironomids, cladocera,
grain size, sediment magnetics, sediment geochemistry)
3. Well dated; 18 AMS 14C dates on terrestrial plant material
4. Well co-ordinated by A.F. Lotter
5. High quality data:
Data-set        No. of samples   No. of taxa/variables
Pollen          212              203
Plant macros    372               53
Chironomids      82               30
Cladocera       112                7
Geochemistry    176               14
Grain-size      294                6
Magnetics       504                5
6. Consistent numerical methodology on all proxies
7. Numerical methods used to test hypotheses about the
influence of climate and catchment processes on the
aquatic ecosystem in the perspective of the Holocene
time-scale. (Partial redundancy analysis with restricted
Monte Carlo permutation tests)
Of the catchment changes, the main ones appear to be the
spread of Picea abies at about 6300 cal BP and Bronze Age
and subsequent forest clearances and conversion to grazing
pastures.
8. Split proxy data into one predictor variable (plant
macrofossils as a reflection of catchment vegetation)
and several response variables (cladocera, chironomids,
pollen, sediment grain-size, magnetics, geochemistry)
Predictor variables:
Lotter &
Birks
2003
Hypotheses tested:
1. Climate has had a significant control on lake ecosystem changes
2. Catchment vegetation has played significant role on lake changes
"Responses"
(proxies)
Terrestrial
Pollen
Macrofossils
Lake biotic
Chironomids
Cladocera
Lake abiotic
Grain size
Magnetics
Geochemistry
Scale
Climate a significant predictor?
Catchment vegetation a
significant predictor?
Y
Y
-
-
Lake
Lake
N
N
Y
Y
Lake
Lake
Lake
*
Y
Y
(Y)
#
Catchment &
regional
Catchment
* Tested against insolation, central
European cold phases, & Atlantic IRD record
# Veg phases: Betula-Pinus cembra; Alnus-Pinus cembra; Picea abies ~
6300 cal BP; Pasture phases from Bronze Age to present
SPATIAL GEOGRAPHICAL DATA
Geographical co-ordinates X, Y
Spatial analysis
Legendre & Fortin (1989) Vegetatio 80: 107-138
Legendre (1993) Ecology 74: 1659-1673
Koenig (1999) Trends in Ecology & Evolution 14: 22-26
Borcard et al. (2004) Ecology 85: 1826-1832
STATISTICAL ANALYSIS
Random sample assumption
Spatial autocorrelation
Effect of spatial autocorrelation on tests of correlation
coefficients for randomly generated, positively
autocorrelated data
(Figure: confidence interval of a correlation coefficient r on a scale from –1 through 0 to +1.)
True interval: r not significantly different from zero
Confidence interval computed from the usual tables: r ≠ 0 ***
‘Liberal’ results – too many coefficients will be judged
statistically significant when, in reality, they are not
SPATIAL AUTOCORRELATION
Classical statistics assumes independence of observations.
Ecological variables very commonly show spatial structure in
the sample space.
Variable is autocorrelated when it is possible to predict values
of this variable at some points in space from the known values
at other sampling points whose spatial positions are known.
Correlation in relative mean density of mountain hares between eleven provinces in
Finland over 39 years (1946–85) plotted against distance between centres of provinces.
HOW TO TEST FOR SPATIAL STRUCTURE?
Spatial autocorrelation coefficients – Moran's I
H0 – no spatial autocorrelation
Each value of the I coefficient is equal to
$E(I) = -(n-1)^{-1} \approx 0$
where E(I) is the expected I and n is the number of data points
H1 – there is significant spatial autocorrelation
The value of I is significantly different from E(I)


$I(d) = \dfrac{n \sum_{i} \sum_{j} w_{ij} (y_i - \bar{y})(y_j - \bar{y})}{W \sum_{i} (y_i - \bar{y})^2}$
where y represents the values of the variables, all summations
are for i and j varying from 1 to n, the number of data points but
excluding where i = j. The wij's take the value 1 when the pair
(i,j) relates to distance class d (the one being computed) and is 0
otherwise, W is the sum of the wij's or the number of pairs (in the
whole square matrix of distances between points) taken into
account when computing coefficients for a given distance class.
I(d) is computed for each distance class d.
Moran's I usually lies between –1 and +1 but can exceed these values.
Positive I suggests positive autocorrelation.
Negative I suggests negative autocorrelation.
Can test for significance by standard errors and confidence
intervals or by randomisation tests.
Behaves like Pearson's correlation coefficient r as its numerator
is sum of cross-products of centred terms (covariance term),
comparing in turn the values found at all pairs of points in the
given distance class.
Sensitive to extreme values, like r is.
Plot a CORRELOGRAM where Moran's I is plotted against
distance (d).
All-directional correlogram – assume that the phenomenon is
isotropic, namely that the autocorrelation function is the same
whatever direction is considered.
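Moran's I for a distance class, and an all-directional correlogram built from it, can be sketched as follows. The function name, the transect and the distance classes are hypothetical; the weights are 1 for pairs whose separation falls in the class and 0 otherwise, as defined above, and no significance test is included.

```python
import math

def morans_i(coords, y, d_min, d_max):
    """Moran's I for pairs whose separation lies in the class (d_min, d_max]."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    num = 0.0
    w_total = 0   # W: number of ordered pairs (i, j) in this distance class
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if d_min < math.dist(coords[i], coords[j]) <= d_max:
                num += dev[i] * dev[j]
                w_total += 1
    den = sum(v * v for v in dev)
    if w_total == 0 or den == 0:
        return None   # undefined for an empty class or a constant variable
    return n * num / (w_total * den)

# Hypothetical transect of six equally spaced points.
transect = [(float(i), 0.0) for i in range(6)]
trend = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

i_trend = morans_i(transect, trend, 0.0, 1.0)
i_alt = morans_i(transect, alternating, 0.0, 1.0)
# Correlogram: I(d) for successive unit-width distance classes.
correlogram = [morans_i(transect, trend, d, d + 1.0) for d in range(5)]
```

In this toy transect the largest distance class contains a single pair of points, and its I falls below –1, a reminder that classes with few pairs are unreliable.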
Correlograms for artificial data. Black squares are significant at α = 0.05.
Legendre & Fortin 1989
Moran's I correlogram for cross-validation residuals for transfer functions. See
low I in MAT and ANN, high I in WA and GLR (ML) (spatial autocorrelation not
sucked in by these methods), intermediate I in WAPLS
SPATIALLY CONSTRAINED CLUSTERINGS
Legendre (1987) In: Evolutionary Biogeography of the Marine Algae of the North Atlantic (eds. D.J. Garbary & R.R. Soult). Springer
Legendre & Legendre (1984) Can. J. Fish. Aquat. Sci. 41, 1781-1802
Andersson (1988) Vegetatio 74, 95-106
Openshaw (1974) Computer Applic. 3-4, 136-160
Webster & Burrough (1972) J. Soil Sc. 23, 222-234
REGIONALISATION
REGULAR GRID
A)
Only group objects if they are adjacent
CONCLUST
DC matrix of objects D
Adjacency matrix (1/0) A (adjacent if have side or corner in common)
Compare D and A. If not adjacent, flag as negative DC and ignore.
Generalised agglomerative strategy
7 methods
As fuse, update adjacency matrix
If Dab or Dbc positive, Dabc must be positive
Plot results as map for 10, 9, 8... 2 groups
CONCMAP – printer
CONCSCR – screen colours
Observations:
1) Little difference in results between clustering methods (cf. unconstrained cluster analysis).
Little difference with different DCs (within reason!).
2) Faster than unconstrained cluster analysis.
3) Spatial constraints with biogeographical data make little difference, i.e. the data are
strongly structured themselves.
IRREGULAR GRID
B)
Weight DC matrix between objects
Geog distance
Webster & Burrough (1972)
Dij  dij dmax .w
distance weighting
D 
inverse square

D

d
d
ij
max .w
Dijd  ij
1  w
exponential
d
ij
CONDCMAT
Weighting factor
1w

Dijd  Dij 1  e
 dij / w
where
w  w

Similar results to CONCLUST, but does not have to be grid pattern.
dij2
Andersson (1988) neighbour weighting 1/0 data for species (variable) analysis
NEIWEI
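(Figure: each presence (1) receives a score of 1 plus the number of neighbouring cells (+) in which the species is also present, e.g. 1 + 8 = 9 and 1 + 3 = 4.)
'pseudofrequency' scores
(Figure: presence/absence grid for Species A and the corresponding grid of pseudofrequency scores.)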
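The pseudofrequency scoring can be sketched on a small grid. This is an illustration, not the NEIWEI program: the function name and the grid are hypothetical, neighbours are taken as the eight cells sharing a side or corner, and cells where the species is absent are assumed to score 0.

```python
def pseudofrequency(grid):
    """Score each presence as 1 plus the number of occupied neighbouring cells."""
    rows, cols = len(grid), len(grid[0])
    scores = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1:
                continue   # assumption: absences keep a score of 0
            neighbours = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols and grid[rr][cc] == 1:
                        neighbours += 1
            scores[r][c] = 1 + neighbours
    return scores

# Hypothetical 3 x 5 grid in which the species is present in every cell.
species_a = [[1] * 5 for _ in range(3)]
scores = pseudofrequency(species_a)
```

Interior cells reach the maximum score of 9 and corner cells score 4, matching the two worked scores on the slide.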
SPATIALLY CONSTRAINED ORDINATIONS
CCA or RDA
detect simple gradients using x and y co-ordinates
$b_1 x + b_2 y$
Direction of gradient is $\tan^{-1}(b_2/b_1)$
Complex gradients
quadratic: $b_1 x + b_2 y + b_3 x^2 + b_4 xy + b_5 y^2$
cubic: $b_1 x + b_2 y + b_3 x^2 + b_4 xy + b_5 y^2 + b_6 x^3 + b_7 x^2 y + b_8 xy^2 + b_9 y^3$
Trend-surface analysis
Can partial out spatial effects – remove effects of spatial autocorrelation.
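The spatial predictors for a trend-surface analysis can be generated as below. This is a sketch with hypothetical function names; the fitting itself (CCA or RDA, e.g. in CANOCO) is not shown, and the gradient direction uses the two-argument arctangent so that the quadrant is resolved.

```python
import math

def trend_surface_terms(x, y, degree=3):
    """Monomials x^a * y^b with 1 <= a + b <= degree, ordered by total degree.

    For degree 3 the order is x, y, x^2, xy, y^2, x^3, x^2 y, x y^2, y^3,
    matching coefficients b1 ... b9 of the cubic trend surface.
    """
    terms = []
    for total in range(1, degree + 1):
        for b in range(total + 1):
            a = total - b
            terms.append((x ** a) * (y ** b))
    return terms

def gradient_direction(b1, b2):
    """Direction of a linear gradient b1*x + b2*y, in degrees from the x axis."""
    return math.degrees(math.atan2(b2, b1))

# Hypothetical site at co-ordinates (2, 3) and hypothetical coefficients.
terms = trend_surface_terms(2, 3)
direction = gradient_direction(1.0, 1.0)
```

Co-ordinates are often centred before the polynomial terms are formed, to reduce collinearity among them; that step is omitted here.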
CANOCO
CCA site scores
WA species scores
Maps obtained by block kriging for the sample scores, on canonical axes 1 (top) and 2
(bottom), in the species space (left) and in the trend-surface geographic space (right);
values multiplied by 100 for mapping. Peaks are shadowed. No samples had been taken
from the blanked area on the left.
Axis 1
Axis 2
CCA site scores linear
combinations of env. variables
VARIANCE PARTITIONING INTO FOUR
ADDITIVE COMPONENTS
a) Non-spatial environmental variation
i.e. environmental effects after partialling geographical
variation
Local environmental
b) Spatially structured environmental variation
i.e. spatially covarying environmental variation
Regional environmental
c) Spatial variation not shared by environmental variables
i.e. spatial effects after partialling environmental
variables
Pure spatial
d) Unexplained
Analysis          Explanatory vars   Covariables   Sum of canonical eigenvalues   %
1) CCA            Envir              -             0.268                          18.6
2) CCA            Geography          -             0.373                          25.9
3) partial CCA    Envir              Geography     0.156                          10.8
4) partial CCA    Geography          Envir         0.261                          18.1
Total inertia 1.443
a) Non-spatial environmental variation (analysis 3)                              10.8%
b) Spatially covarying environmental variation (analysis 1 minus analysis 3)     7.8%
c) Pure spatial (analysis 4)                                                     18.1%
d) Unexplained                                                                   63.3%
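The arithmetic behind the four fractions can be sketched as below, using the sums of canonical eigenvalues from the four (partial) CCAs and the total inertia. The function name and the consistency tolerance are illustrative choices.

```python
def partition_variation(env, geo, env_given_geo, geo_given_env, total_inertia):
    """Return fractions a-d as percentages of total inertia.

    env, geo: sums of canonical eigenvalues of the two unconstrained-by-
    covariable CCAs; env_given_geo, geo_given_env: those of the partial CCAs.
    """
    a = env_given_geo                  # non-spatial environmental
    b = env - env_given_geo            # spatially covarying environmental
    c = geo_given_env                  # pure spatial
    # b can also be obtained as geo - geo_given_env; the two should agree
    if abs(b - (geo - geo_given_env)) > 0.01:  # tolerance is an arbitrary choice
        raise ValueError("inconsistent eigenvalue sums")
    d = total_inertia - (a + b + c)    # unexplained
    return {name: 100.0 * value / total_inertia
            for name, value in zip("abcd", (a, b, c, d))}

fractions = partition_variation(0.268, 0.373, 0.156, 0.261, 1.443)
rounded = {name: round(value, 1) for name, value in fractions.items()}
# rounded == {'a': 10.8, 'b': 7.8, 'c': 18.1, 'd': 63.3}
```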
[Figure: Variation partitioning of a species data table, showing that fraction (b) is the intersection of the environmental and spatial components of the species variation. Environmental variance = (a) + (b); spatial structure variance = (b) + (c); unexplained = (d).]
[Figure: Variation partitioning of the oribatid mites data matrix (stacked bar, percent of variation): Environment 13.7%; Env + space 31.0%; Space 12.2%; Undetermined 43.0%.]
Fraction A: Non-spatial environmental variation, 13.7%. 'Local environment' or 'pure environment', independent of space.
Fraction B: Spatially-structured environmental variation, 31.0% (spatial component of the environmental influence). Substrate moisture content.
Fraction C: Non-environmentally explained variation, 12.2%. Spatial structure independent of the environmental variables; 'pure spatial'.
Theoretical causal relationships between environmental variables (representing processes) and community structure. Fractions (a), (b), (c) and (d) of the community data variation refer to Figure 5. ECM: Environmental control model. BCM: Biotic control model. HD: Historical dynamics. Asterisks (*) indicate factors not explicitly spelled out in the model.
Fractions: (a) non-spatial environmental variation (local environment); (b) spatially structured env. variation (covariation between environment and space); (c) non-envir. spatial variation (spatial); (d) unexplained.
Fraction | Causal factor | Process | Effect
(a) | Environmental factor | ECM | Community structure
(a)* | Non-spatially structured factor not included in the analysis | ECM | Env. variable in the analysis; non-spatial community var.
(a)* | Historical events without spatial structure at the study scale | HD | Env. variable in the analysis; non-spatial community var.
(b) | Env. factor with spatial structure | ECM | Community spatial structure
(b)* | Spatially structured env. factor not included in the analysis | ECM | Env. variable in the analysis; community spatial structure
(b)* | Spatially structured historical events | HD | Env. variable in the analysis; community spatial structure
(c)* | Spatially structured factors not included in the analysis | ECM | Community spatial structure
(c)* | Spatially structured historical events | HD | Community spatial structure
(c)* | Predation, competition, etc. | BCM | Community spatial structure
(d)* | Factor not included in the analysis, not spatially structured (at study scale) | ECM | Non-explained community var.
(d)* | Biotic control factors not spatially structured (at study scale) | BCM | Non-explained community var.
(d)* | Random variation, sampling error, etc. | Noise | Non-explained community var.
Major limitation of this approach is that it is unsuitable for
spatial structures present at a WIDE range of different spatial
scales.
Principal co-ordinates analysis of neighbour matrices (PCNM).
Borcard & Legendre (2002) Ecological Modelling 153: 51-68
Borcard et al. (2004) Ecology 85: 1826-1832
Eigenvalue decomposition of a truncated matrix of geographic distances between the sampling sites.
Eigenvectors corresponding to positive eigenvalues are used as spatial descriptors in regression or canonical ordinations.
Software: SPACEMAKER; PCNM (R); spacemakeR
Borcard & Legendre (2002)
PCNM of a linear transect of 100 samples, 1 m apart.
Distance threshold set at 1 m to retain only the closest neighbours; all other distances replaced by 4 x 1 m = 4 m.
Principal co-ordinates correspond to a series of sinusoids with decreasing periods. The largest period is n + 1, the smallest is ~3.
Borcard & Legendre (2002)
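The PCNM steps for a linear transect can be sketched with NumPy as follows: truncate the distance matrix at the threshold, replace larger distances by four times the threshold, run a principal co-ordinates analysis, and keep the axes with positive eigenvalues. This is an illustration under those stated steps, not the SPACEMAKER or R implementation; the function name and the numerical tolerance are assumptions.

```python
import numpy as np

def pcnm(coords, threshold):
    """Return (eigenvalues, PCNM variables) for sites at the given coordinates."""
    coords = np.asarray(coords, dtype=float)
    if coords.ndim == 1:
        coords = coords[:, None]          # transect: one coordinate per site
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Keep distances up to the threshold, set all others to 4 x threshold
    trunc = np.where(dist > threshold, 4.0 * threshold, dist)
    # Principal co-ordinates analysis: double-centre -0.5 * D^2
    n = trunc.shape[0]
    centring = np.eye(n) - np.ones((n, n)) / n
    gower = centring @ (-0.5 * trunc ** 2) @ centring
    eigval, eigvec = np.linalg.eigh(gower)
    order = np.argsort(eigval)[::-1]      # largest eigenvalue first
    eigval, eigvec = eigval[order], eigvec[:, order]
    keep = eigval > 1e-8 * eigval.max()   # tolerance is an arbitrary choice
    return eigval[keep], eigvec[:, keep] * np.sqrt(eigval[keep])

transect = np.arange(100.0)               # 100 samples, 1 m apart
values, vectors = pcnm(transect, threshold=1.0)
```

The columns of `vectors` are mutually orthogonal and can be passed to a regression or a canonical ordination as spatial descriptors, usually after forward selection.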
Ecological data: Adiantum tomentosum abundance along transects in NE Peru; 260 adjacent 5 x 5 m subplots.
[Figure panels: (a) fern (thick line), PCNM (thin line); (b) very broad scale (thick line), broad scale (thin line); (c) medium scale; (d) fine scale.]
Oribatid mites and PCNM: irregular two-dimensional sampling
PCNM gives 43 variables with a truncation distance of 1.012 m.
These show both coarse broad-scale patterns and fine-scale patterns.
Forward selection in RDA retains 12 PCNM variables, which explain 45.1% of the variance (cf. 43.2% in simple RDA).
RDA Axis 1: 22.6% variance, R2 = 0.48; shrubs or no shrubs
RDA Axis 2: 8.4% variance, R2 = 0.11; shrubs or hummocks
RDA Axis 3: 4.5% variance, R2 = 0.34; areas of low water content and no shrubs
When environmental variables and a simple X-Y trend are used as covariables in an RDA with PCNM variables, two significant axes remain. These may reflect unmeasured abiotic or biotic mechanisms, such as food sources.
Atlantic foraminifera & SST, Telford & Birks (2005)
Matrix of PCNM variables created from a matrix of distances between N Atlantic sites truncated at 781 km, the minimum distance that links all sites into a single network.
385 orthogonal PCNM variables represent space.
Forward selection in CCA retained 37 of these. They represent large spatial patterns.
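The truncation distance described here, the minimum distance that links all sites into a single network, equals the longest edge of the minimum spanning tree of the sites. A minimal sketch using Prim's algorithm is given below. It assumes planar co-ordinates and Euclidean distances for simplicity (an ocean-wide data-set would need great-circle distances); the site co-ordinates are made up for illustration.

```python
import math

def min_linking_distance(points):
    """Longest edge of the minimum spanning tree of the points (Prim)."""
    n = len(points)
    if n < 2:
        return 0.0
    best = [math.inf] * n      # shortest distance from each point to the tree
    in_tree = [False] * n
    best[0] = 0.0
    longest = 0.0
    for _ in range(n):
        # Add the point that is closest to the current tree
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        longest = max(longest, best[u])
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return longest

# Hypothetical site co-ordinates
sites = [(0, 0), (1, 0), (2, 0), (2, 3), (10, 3)]
threshold = min_linking_distance(sites)   # 8.0: the edge needed to reach (10, 3)
```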
SST independent of space           1.8% variance
Covariation between SST & space    29.9% variance
Space independent of SST           42.5% variance
Unexplained                        25.7%
Pure space explains most. Therefore there are important unknown spatial structures in the data. If only SST is considered, expect strong spatial autocorrelation in the residuals of SST transfer function models.
Lowest autocorrelation in MAT and ANN residuals.
Highest autocorrelation in WA and GLR (= ML) residuals.
Highlights the 'secret assumption' of transfer functions.
PREDICTIVE MODELS FROM SPATIAL DATA
Nature management
– well explored areas, poorly explored areas
Lesotho bird atlas
Habitat variables → PCA axes
Logistic regression to model species occurrences and absences in terms of habitat PCA axes and recording effort:
$\log\left(\frac{p}{1-p}\right) = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + b_4 x_4 + b_5 x_5$
where $x_1$ to $x_4$ are PCA site scores and $x_5$ is recording effort.
Wildlife management
GIS
Mt Graham red squirrel in relation to env vars
Logistic regression
Pereira & Itami (1991) Photogr. Engin. & Remote Sensing 57, 1475–1486
Summary of the overall logistic models. The upper data are regression coefficients with their standard errors in brackets.
            Pied crow      Ground woodpecker   Cape vulture 1   Cape vulture 2*
PC1         -0.90 (0.28)   0.54 (0.18)         0.40 (0.14)      0.85 (0.28)
PC2         -0.14 (0.41)   -0.72 (0.29)        0.02 (0.22)      -0.15 (0.25)
PC3         -0.49 (0.35)   0.01 (0.28)         -0.31 (0.23)     -0.44 (0.27)
PC4         -0.34 (0.29)   -0.24 (0.29)        0.02 (0.29)      0.76 (0.48)
Effort      0.15 (0.09)    0.31 (0.14)         0.04 (0.03)      0.10 (0.04)
Constant    -2.43 (0.92)   -1.52 (0.84)        -0.75 (0.42)     -1.96 (0.79)
Deviance    33.95          45.21               62.73            48.88
Df          49             49                  49               47
P-value+    0.95           0.63                0.09             0.40
* Cape vulture 2 excludes data for two squares identified as having a disproportionate effect on the model using all the data (Cape vulture 1).
+ The P-value is best interpreted as a measure of standardized deviance, useful for comparing models with differing degrees of freedom.
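A minimal sketch of how such a fitted model turns habitat scores into a probability of occurrence. The coefficients are those listed for Pied crow in the summary of logistic models (assuming the first column of values belongs to Pied crow); the PCA scores and the effort value for the example square are hypothetical.

```python
import math

# Constant, PC1-PC4 and effort coefficients as listed for Pied crow
PIED_CROW = {
    "constant": -2.43,
    "pc": (-0.90, -0.14, -0.49, -0.34),
    "effort": 0.15,
}

def occurrence_probability(coef, pc_scores, effort):
    """Inverse-logit of the linear predictor b0 + sum(b_k * x_k) + b5 * effort."""
    eta = coef["constant"] + coef["effort"] * effort
    eta += sum(b * x for b, x in zip(coef["pc"], pc_scores))
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical grid square: PCA site scores on axes 1-4 and recording effort
p = occurrence_probability(PIED_CROW, pc_scores=(-1.0, 0.2, 0.0, 0.5), effort=10.0)
baseline = occurrence_probability(PIED_CROW, pc_scores=(0, 0, 0, 0), effort=0.0)
```

Including effort as a predictor lets the maps be drawn for a common level of recording effort, so poorly explored squares are not simply predicted as absences.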
Distribution maps for three bird species in Lesotho produced by logistic modelling of presence-absence data. Higher probabilities of occurrence are indicated by increasing circle size and actual field records are shown as filled circles.
Hill (1991) J. Biogeogr. 18, 247–255
CCA
species data +/–
environmental data: max altitude, annual rainfall, mean temperature, geology, presence of coast
$\log\left(\frac{p}{1-p}\right) = b_0 + b_1 x_1 + b_2 x_1^2 + b_3 x_2 + b_4 x_3 + b_5 x_4$
$x_1$ to $x_4$ are site scores in CCA
Predict distributions given simple environmental data.
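The squared term in x1 allows a unimodal response along the first CCA axis: when b2 is negative, the logit (and hence the probability) peaks at x1 = -b1 / (2 b2). The sketch below evaluates the six-parameter model; the coefficient values are hypothetical and the function names are illustrative.

```python
import math

def logit_probability(b, x):
    """Six-parameter model: b = (b0, ..., b5), x = (x1, ..., x4) CCA site scores."""
    b0, b1, b2, b3, b4, b5 = b
    x1, x2, x3, x4 = x
    eta = b0 + b1 * x1 + b2 * x1 ** 2 + b3 * x2 + b4 * x3 + b5 * x4
    return 1.0 / (1.0 + math.exp(-eta))

def axis1_optimum(b1, b2):
    """Position on CCA axis 1 where the probability is highest (needs b2 < 0)."""
    if b2 >= 0:
        raise ValueError("no maximum along axis 1 unless b2 is negative")
    return -b1 / (2.0 * b2)

coef = (-1.0, 2.0, -1.0, 0.5, 0.0, -0.3)      # hypothetical coefficients
optimum = axis1_optimum(coef[1], coef[2])     # 1.0 for these values
p_at_optimum = logit_probability(coef, (optimum, 0.0, 0.0, 0.0))
p_off_optimum = logit_probability(coef, (optimum + 1.5, 0.0, 0.0, 0.0))
```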
[Figure: Actual and predicted distributions of species using logit regression with six parameters. The species are Dipper (Cinclus cinclus), Little Ringed Plover (Charadrius dubius) and Common Rockrose (Helianthemum nummularium). Circles of increasing size signify categories of probability as follows: 1-4%; 5-10%; 11-30%; 31-50%; 51-75%; 76-100%. Panels: actual and predicted Dipper; actual and predicted Little Ringed Plover; actual and predicted Rockrose.]
PREDICTION OF UPLAND PLANT COMMUNITY
DISTRIBUTION USING LOGISTIC REGRESSION
54 upland vegetation types recorded in 1,514 ten-kilometre grid
squares in the uplands of Scotland, England, and Wales.
Environmental variables from National Land Characteristics Data Bank.
Topography    13 variables   (22 possible)
Climate       18 variables   (29 possible)
Geology       19 variables   (29 possible)
Soil types    8 variables    (8 possible)
Land-use      2 variables    (22 possible)
Reduced the 31 topography + climate variables to 5 PCA axes (63.6% of variance) and the 27 geology + soil type variables to 2 PCA axes (20.3%).
Used the 5 PCA axes + their squared terms, the 2 PCA axes, and the land-use variables as predictors in logistic regression, with the presence/absence (+/-) of each vegetation type as the response variable.
54 models
7 have rho (r2) < 0.20
26 have rho 0.20 - 0.40
20 have rho 0.40 - 0.60
2 have rho > 0.60
Mean rho values
Calcareous grassland     0.38
Heaths                   0.41
Mires                    0.26
Other grasslands         0.41
Woodland & scrub         0.40
Alpine snow-beds etc.    0.52
Poorest fits: Heaths 1; Mires 5; Grasslands 1
Predicted and known 10 km square distribution of NVC U20 (Pteridium aquilinum – Galium saxatile community). Predictions were not made for lowland areas.
Predicted and known 10 km square distribution of NVC U10 (Carex bigelowii – Racomitrium lanuginosum moss-heath).
Predicted and known 10 km square distribution of NVC H13 (Calluna vulgaris – Cladonia arbuscula heath).
Predicted and known 10 km square distribution of NVC H9 (Calluna vulgaris – Deschampsia flexuosa heath) in the uplands.
Predicted and known 10 km square distribution of NVC M6 (Carex echinata – Sphagnum recurvum/auriculatum mire).
Predicted and known 10 km square distribution of NVC M10 (Carex dioica – Pinguicula vulgaris mire).
Predicted and known 10 km square distribution of NVC W19 (Juniperus communis – Oxalis acetosella woodland).
Salix herbacea-Racomitrium heterostichum snow-bed
Cryptogramma crispa-Athyrium distentifolium snow-bed
Luzula sylvatica-Geum rivale tall-herb community
Saxifraga aizoides-Alchemilla glabra banks
Nardus stricta-Galium saxatile grassland
Festuca ovina-Agrostis capillaris-Galium saxatile grassland
Festuca ovina-Agrostis capillaris-Rumex acetosella grassland
Calluna vulgaris-Erica cinerea heath
Erica tetralix-Sphagnum compactum wet heath
Erica tetralix-Sphagnum papillosum raised and blanket mire
PREDICTING THE PROBABILITY OF SPECIES
OCCURRENCE USING SURVEY DATA
Le Duc et al. (1992) Watsonia 19: 97-105
Le Duc et al. (1992) Aspects of Applied Biology 29: 41-48
Firbank et al. (1998) Weed Research 35: 1-10
Plant recording: 10 km grid squares
Tetrads: 2 km grid squares
Impossible to record all tetrads; only 3 are recorded (A, J, and W).
Convert tetrad data to probabilities of species occurrence, introducing some spatial smoothing in the interpolation.
[Figure: Layout of the botanical monitoring scheme of the BSBI.]
[Figure: Gaussian smoothing of occurrence in tetrads, from species occurrence to probability of species occurrence.]
To predict species occurrence, need external predictors (e.g. soil type, land-use classes) and logistic regression.
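One way to picture the smoothing step is a Gaussian-kernel weighted mean of the 1/0 tetrad records around a target location. The slides only state that Gaussian smoothing is used, so the exact form below, the bandwidth, and the example records are all assumptions made for illustration.

```python
import math

def smoothed_probability(target, records, bandwidth_km):
    """Gaussian-kernel weighted mean of 1/0 records around target (x, y) in km.

    records: iterable of ((x, y), present) with present equal to 1 or 0.
    """
    num = den = 0.0
    for (x, y), present in records:
        d2 = (x - target[0]) ** 2 + (y - target[1]) ** 2
        weight = math.exp(-d2 / (2.0 * bandwidth_km ** 2))
        num += weight * present
        den += weight
    return num / den if den > 0 else float("nan")

# Hypothetical tetrad records: two presences in the west, two absences in the east
records = [((0, 0), 1), ((2, 0), 1), ((10, 0), 0), ((12, 0), 0)]
p_west = smoothed_probability((1, 0), records, bandwidth_km=3.0)
p_east = smoothed_probability((11, 0), records, bandwidth_km=3.0)
```

A larger bandwidth gives smoother probability surfaces at the cost of blurring local detail; the smoothed value can then be used alongside external predictors in a logistic regression.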
[Figure: Veronica montana. (a) data; (b) estimated probability; (c) estimated probability using soil groups; (d) estimated probability using land-use classes.]
Soil type main predictor
Predicting weed distribution using tetrad data and soil types.
Firbank et al. (1998)
$\log_e\left(\frac{p}{1-p}\right) = a + b \times \mathrm{Smooth} + c \times \mathrm{Soil}$
Soil: 16 classes
[Figure: Alopecurus myosuroides. (a) tetrads; (b) smoothed probability of occurrence; (c) prediction using (b) + soils; (d) 10 km square map.]
[Figure: (a) Elymus repens; (b) Legousia hybrida; (c) Papaver rhoeas; (d) Senecio jacobaea.]
Species pool of cereal weeds greatest in central and southern England. Does not entirely coincide with distribution of arable farming.
[Figure: (a) grass weeds of cereals; (b) broad-leaved weeds; (c) distribution of arable land.]
PREDICTION OF FUTURE CHANGES - TROLLIUS EUROPAEUS
Watt et al. (1997)
OBSERVED today: known distribution of globeflower (Trollius europaeus) (data from the Biological Records Centre).
PREDICTED today: predicted current distribution using Jan min. & July max. temp and annual precipitation as independent variables in a logistic regression.
PREDICTED future: predicted distribution in 2050 using the same model but imposing the UK transient climate scenario for 2050.
KEY RESEARCHERS IN ANALYSIS OF
TEMPORAL PALAEOECOLOGICAL DATA
Steve Juggins
Ed Cushing
Eric Grimm
Bent Odgaard
Allan Gordon
Keith Bennett
Andy Lotter
KEY RESEARCHERS IN SPATIAL ANALYSIS
OF ECOLOGICAL DATA
Daniel Borcard
Pierre Legendre
Mark Hill
Richard Telford
Marie-Josée Fortin