Environment and species occurrence

Advanced analytical approaches in ecological data analysis
The world comes in fragments
Multivariate approaches to biodiversity
Data matrices:
• Environmental variable matrix V (sites × variables)
• Site GPS location matrix D (sites × L)
• Species abundance matrix M (sites × species)
Methods: spatial regression, co-occurrence mapping, regression tree, impact analysis
The raw data
Plot  Achillea_pannonica  Agrostis_capillaris  Agrostis_stolonifera_agg.  Agrostis_vinealis  Ajuga_genevensis
G6-3  0                   0.5                  0                          0                  0
A2-2  0.1                 0.5                  0                          0                  0
C4-4  2                   0.5                  0                          0.5                0
J4-4  0.5                 0.5                  0                          0                  0
D2-4  0                   0                    0                          0                  0
K7-2  0                   0.5                  0                          0                  0
K7-4  0                   0                    0                          0                  0
F1-3  0                   0.5                  0                          0                  0
M7-2  0.5                 0.5                  0                          0                  0.5
Plot  Longitude  Latitude
G6-3  317.78     266.85
A2-2  187.24     307.27
C4-4  237.32     299.92
J4-4  322.62     188.9
D2-4  217.79     259.69
K7-2  388.38     209.6
K7-4  382.38     209.6
F1-3  226.3      221.79
M7-2  412.75     177.88
Plot  CaCO3  Sand   pH
G6-3  0.95   85.66  8.69
A2-2  0.11   81.31  8.01
C4-4  0.85   74.42  7.97
J4-4  1.53   74.24  8.05
D2-4  1.93   74.24  8.08
K7-2  0.58   83.45  8.23
K7-4  0.58   83.45  8.23
F1-3  0.38   78.45  8.25
M7-2  0.63   82.15  8.4
Basic questions:
• Do soil characteristics influence species abundances and diversity?
• How do these relationships change in time?
Starting hypotheses:
• Neighbouring plots are similar in species composition.
• CaCO3 is of major importance for plant diversity.
• Species occurrence is not random with respect to soil characteristics.
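Neighbouring plots are similar in species composition
Mantel test
We calculate the Sørensen (Dice) index of species similarity S and transform it into a distance matrix (D = 1 - S).
We calculate the distance matrix of the GPS data.
The Mantel test then correlates the two distance matrices and assesses significance by permutation.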
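A minimal sketch of such a Mantel test in plain Python. It assumes that presence means abundance > 0, that two plots with no listed species get distance 0, and that significance comes from a one-sided permutation test; all function names are illustrative. The data are only the five-species excerpt of the raw data table, so the result says nothing about the full data set.

```python
import math
import random

# Excerpt of the raw data: abundances of the five listed species per plot.
abundance = {
    "G6-3": [0, 0.5, 0, 0, 0], "A2-2": [0.1, 0.5, 0, 0, 0],
    "C4-4": [2, 0.5, 0, 0.5, 0], "J4-4": [0.5, 0.5, 0, 0, 0],
    "D2-4": [0, 0, 0, 0, 0], "K7-2": [0, 0.5, 0, 0, 0],
    "K7-4": [0, 0, 0, 0, 0], "F1-3": [0, 0.5, 0, 0, 0],
    "M7-2": [0.5, 0.5, 0, 0, 0.5],
}
coords = {
    "G6-3": (317.78, 266.85), "A2-2": (187.24, 307.27),
    "C4-4": (237.32, 299.92), "J4-4": (322.62, 188.9),
    "D2-4": (217.79, 259.69), "K7-2": (388.38, 209.6),
    "K7-4": (382.38, 209.6), "F1-3": (226.3, 221.79),
    "M7-2": (412.75, 177.88),
}

def sorensen_distance(a, b):
    """Distance 1 - S with S = 2c / (n_a + n_b), from presence/absence."""
    pa = [x > 0 for x in a]
    pb = [x > 0 for x in b]
    shared = sum(1 for x, y in zip(pa, pb) if x and y)
    total = sum(pa) + sum(pb)
    # Assumption: two empty plots are treated as identical (distance 0).
    return 0.0 if total == 0 else 1.0 - 2.0 * shared / total

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def upper(m, order):
    """Upper triangle of m after reordering rows and columns by `order`."""
    n = len(order)
    return [m[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]

def mantel(d1, d2, permutations=999, seed=1):
    """Mantel correlation and one-sided permutation p-value."""
    rng = random.Random(seed)
    ident = list(range(len(d1)))
    y = upper(d2, ident)
    r_obs = pearson(upper(d1, ident), y)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        perm = ident[:]
        rng.shuffle(perm)
        if pearson(upper(d1, perm), y) >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)

plots = list(abundance)
d_species = [[sorensen_distance(abundance[p], abundance[q]) for q in plots]
             for p in plots]
d_space = [[math.hypot(coords[p][0] - coords[q][0],
                       coords[p][1] - coords[q][1]) for q in plots]
           for p in plots]
r, p = mantel(d_species, d_space)
print(f"Mantel r = {r:.3f}, p = {p:.3f}")
```

A positive r with a small p would support the hypothesis that neighbouring plots are more similar in composition than distant ones.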
CaCO3 is of major importance for plant diversity
The SAM input file (Species = species richness)
Plot  Long    Lat     Year  Species  Abundance  CaCO3  Sand   pH
A3-2  203.09  319.46  2006  3        0.7        0.95   85.66  8.69
A3-3  197.09  325.46  2006  6        1          0.11   81.31  8.01
A3-4  197.09  319.46  2006  4        0.8        0.85   74.42  7.97
A4-2  218.95  331.64  2006  4        0.8        1.53   74.24  8.05
A4-3  212.95  337.64  2006  4        1.2        1.93   74.24  8.08
B3-3  209.28  309.6   2006  3        0.7        0.58   83.45  8.23
B4-2  231.14  315.78  2006  4        0.8        0.58   83.45  8.23
B4-4  225.14  315.78  2006  3        1.5        0.38   78.45  8.25
B5-2  247     327.97  2006  3        0.7        0.63   82.15  8.4
B5-4  241     327.97  2006  3        0.7        2.21   80.01  7.78
C1-1  195.75  269.37  2006  6        1.8        1.51   79.16  8.02
C2-2  211.61  275.55  2006  3        0.3        0.1    84.09  7.9
General linear models in the face of spatial autocorrelation
[Figure: species richness at sites of different area; regression of richness on area: y = 0.94x - 2.47, r2 = 0.93, P < 0.01]
We did not include the spatial distance in the regression.
Spatial autocorrelation is inevitable in ecology.
Collinearity and spatial autocorrelation
Temperature  Precipitation  Abundance  Aridity
8.9          56.5           28.3       0.15
10.9         799.5          17.7       0.94
8.4          343.5          13.5       0.94
1.2          305.2          16.1       0.00
8.3          952.3          26.2       0.75
15.0         286.3          29.0       0.69
5.6          651.5          11.7       0.59
3.2          572.1          17.4       0.11
0.5          836.6          3.7        0.83
3.4          399.0          10.1       0.45
0.2          984.3          1.5        0.56
5.7          655.6          3.2        0.11
13.7         269.6          21.2       0.26
9.0          561.8          14.4       0.56
18.5         457.8          0.7        0.94
Spatial autocorrelation is inevitable.
All ecological field data sets have a spatial structure.
Plot  Species  CaCO3  Sand   pH
A3-2  3        0.95   85.66  8.69
A3-3  6        0.11   81.31  8.01
A3-4  4        0.85   74.42  7.97
A4-2  4        1.53   74.24  8.05
A4-3  4        1.93   74.24  8.08
$$F = \frac{r^2}{1 - r^2}\,\frac{n - k - 1}{k}$$
For fixed r2 and k, F increases in proportion to the degrees of freedom n - k - 1, that is, with the number of data points n.
P decreases with increasing number of data points (sample size).
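The effect of sample size on F can be made concrete with a few lines of Python. This is only an illustrative sketch: the function name is hypothetical, and r2 = 0.02 with k = 3 predictors are example values in the range of the soil regressions in these slides.

```python
def f_statistic(r2, n, k):
    """F = r2 / (1 - r2) * (n - k - 1) / k for a regression with k predictors."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("r2 must lie in [0, 1)")
    if n - k - 1 < 0:
        raise ValueError("need at least k + 1 data points")
    return r2 / (1.0 - r2) * (n - k - 1) / k

# Same r2, same number of predictors, growing sample size.
for n in (10, 100, 1000, 10000):
    print(n, round(f_statistic(0.02, n, 3), 2))
```

With r2 held constant, F grows linearly with n - k - 1, so even a weak relationship ends up with a large F once the sample is big enough.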
Bivariate case:
$$F = \frac{r^2}{1 - r^2}\,(n - 2)$$
[Figure: r2 needed for statistical significance at the 1% error level, plotted against sample size N (log-log axes, N from 1 to 10000); fitted curve y = 5.0x^-0.96]
Any nonzero correlation, however weak, will eventually become statistically significant if you only increase the sample size.
[Figure: map of sites with species richness values]
Ignoring spatial autocorrelation (n = 18, k = 3):
$$F = \frac{r^2}{1 - r^2}\,\frac{n - k - 1}{k} = \frac{r^2}{1 - r^2}\,\frac{18 - 3 - 1}{3} = 4.5$$
With spatial autocorrelation (n = 4, k = 3):
$$F = \frac{r^2}{1 - r^2}\,\frac{n - k - 1}{k} = \frac{r^2}{1 - r^2}\,\frac{4 - 3 - 1}{3} = 0$$
Spatial autocorrelation reduces the effective degrees of freedom.
Using spatially autocorrelated data, we artificially inflate the degrees of freedom and the F-score.
We too often get statistically significant results.
What to do???
First, test for spatial autocorrelation:
Moran’s I
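A minimal sketch of Moran's I in plain Python, computed as I = (n / W) Σ w_ij (x_i - mean)(x_j - mean) / Σ (x_i - mean)^2, with W the sum of all weights. The inverse-distance weights w_ij = 1 / d_ij^alpha and the function name are assumptions made for illustration; other weighting schemes (for example distance classes) are equally common. The data are the ten plots with richness S from the SAM input table in these slides.

```python
import math

lon = [317.78, 187.24, 237.32, 322.62, 217.79,
       388.38, 382.38, 226.3, 412.75, 300.57]
lat = [266.85, 307.27, 299.92, 188.9, 259.69,
       209.6, 209.6, 221.79, 177.88, 198.58]
richness = [15, 17, 16, 16, 13, 21, 15, 16, 13, 12]

def morans_i(values, coords, alpha=1.0):
    """Moran's I with inverse-distance weights (assumed weighting scheme)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0   # sum of w_ij * dev_i * dev_j over all i != j
    w_sum = 0.0   # sum of all weights
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = math.hypot(coords[i][0] - coords[j][0],
                           coords[i][1] - coords[j][1])
            w = 1.0 / d ** alpha
            cross += w * dev[i] * dev[j]
            w_sum += w
    return (n / w_sum) * cross / sum(d * d for d in dev)

i_obs = morans_i(richness, list(zip(lon, lat)))
expected = -1.0 / (len(richness) - 1)  # expectation without autocorrelation
print(f"Moran's I = {i_obs:.3f}, expected under independence = {expected:.3f}")
```

Values clearly above the expectation -1/(n - 1) indicate positive spatial autocorrelation; a permutation test (shuffling the values over the plots) would be one way to judge significance.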
What to do???
Reduce the degrees of freedom
N = 15
Neff = 4
$$F_{\mathrm{eff}} = F\,\frac{4 - 4}{4} = 0$$
$$t_{\mathrm{eff}} = t\,\frac{4 - 4}{4} = 0$$
[Figures: neighbor joining cluster analysis; UPGMA cluster analysis]
What to do???
Correct for the effects of spatial autocorrelation
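$$\mathbf{Y} = \mathbf{X}b + c\,\mathbf{Lat} + d\,\mathbf{Long} + \mathbf{E}$$
$$\mathbf{Y} = \mathbf{X}b + c_1\mathbf{Lat} + c_2\mathbf{Lat}^2 + d_1\mathbf{Long} + d_2\mathbf{Long}^2 + \mathbf{E}$$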
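The following plain-Python sketch shows one way to fit such a model by ordinary least squares: latitude and longitude (optionally with squared terms) are simply appended to the predictors. The helper names are hypothetical, the coordinates are centred before squaring to keep the normal equations well behaved (an implementation choice, not something stated in the slides), and the data are the ten plots of the SAM input table with pH as the only soil predictor. The published regression tables were evidently computed from more data, so the coefficients will not match them.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    x = [[1.0] + list(r) for r in rows]
    p = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(x, y)) for i in range(p)]
    return solve(xtx, xty)

def trend_surface_rows(predictors, lon, lat, quadratic=False):
    """Append centred Lat and Long (and optionally their squares)."""
    mlon, mlat = sum(lon) / len(lon), sum(lat) / len(lat)
    rows = []
    for p, lo, la in zip(predictors, lon, lat):
        lo_c, la_c = lo - mlon, la - mlat
        row = list(p) + [la_c, lo_c]
        if quadratic:
            row += [la_c ** 2, lo_c ** 2]
        rows.append(row)
    return rows

lon = [317.78, 187.24, 237.32, 322.62, 217.79,
       388.38, 382.38, 226.3, 412.75, 300.57]
lat = [266.85, 307.27, 299.92, 188.9, 259.69,
       209.6, 209.6, 221.79, 177.88, 198.58]
richness = [15, 17, 16, 16, 13, 21, 15, 16, 13, 12]
ph = [[8.69], [8.01], [7.97], [8.05], [8.08],
      [8.23], [8.23], [8.25], [8.4], [7.78]]

coef = ols(trend_surface_rows(ph, lon, lat), richness)
for name, value in zip(["Constant", "pH", "Lat", "Long"], coef):
    print(f"{name:9s}{value: .4f}")
```

Passing quadratic=True adds the squared terms of the second model; with only ten plots that leaves few degrees of freedom, so it is shown here only as an option.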
Euclidean distances
      G6-3   A2-2   C4-4   J4-4   D2-4
G6-3  0.0    136.7  87.0   78.1   100.3
A2-2  136.7  0.0    50.6   179.8  56.5
C4-4  87.0   50.6   0.0    140.0  44.7
J4-4  78.1   179.8  140.0  0.0    126.5
D2-4  100.3  56.5   44.7   126.5  0.0
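Trend surface analysis is able to capture broad scale trends.
Eigenvector regression or eigenvector mapping
$$\mathbf{Y} = \mathbf{X}b + \mathbf{U}c + \mathbf{E}$$
$$\boldsymbol{\Sigma}\mathbf{U} = \lambda\mathbf{U}$$
$$(\boldsymbol{\Sigma} - \lambda\mathbf{I})\mathbf{U} = 0$$
Eigenvalues: 2140.4, -938.7
Eigenvectors (values as listed):
0.191  0.046  0.307   0.394  0.244
0.316  0.176  -0.146  0.236  0.284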
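As a rough illustration of how a spatial eigenvector can be obtained, the sketch below rebuilds the Euclidean distance matrix of the five plots G6-3, A2-2, C4-4, J4-4 and D2-4 from their coordinates and extracts its dominant eigenpair by power iteration. Several things here are assumptions: the slides do not say which matrix is decomposed, so the raw distance matrix is used; power iteration stands in for a full eigen-decomposition; and established eigenvector mapping methods such as PCNM truncate and centre the distance matrix first. Because only five plots are used, the eigenvalue will not match the values quoted in the slides.

```python
import math

plots = ["G6-3", "A2-2", "C4-4", "J4-4", "D2-4"]
xy = [(317.78, 266.85), (187.24, 307.27), (237.32, 299.92),
      (322.62, 188.9), (217.79, 259.69)]

def euclidean_matrix(points):
    """Symmetric matrix of pairwise Euclidean distances."""
    return [[math.hypot(px - qx, py - qy) for qx, qy in points]
            for px, py in points]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def dominant_eigenpair(m, iterations=500):
    """Largest-magnitude eigenvalue and unit eigenvector by power iteration."""
    n = len(m)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = matvec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(a * b for a, b in zip(v, matvec(m, v)))  # Rayleigh quotient
    return lam, v

dist = euclidean_matrix(xy)
lam, ev1 = dominant_eigenpair(dist)
print("G6-3 to A2-2:", round(dist[0][1], 1))
print("dominant eigenvalue:", round(lam, 1))
for plot, value in zip(plots, ev1):
    print(plot, round(value, 3))
```

The entries of ev1, one per plot, could then enter the regression as an additional spatial predictor in the role of U in Y = Xb + Uc + E.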
Autocorrelation models
Multiply Y and X by a spatial corrective C:
$$\mathbf{C}\mathbf{Y} = \mathbf{C}\mathbf{X}b$$
$$b = (\mathbf{X}^T\mathbf{C}\mathbf{X})^{-1}\mathbf{X}^T\mathbf{C}\mathbf{Y}$$
Spatial weights of C:
$$w_{ij} = \frac{1}{d_{ij}^{\alpha}}$$
The larger α, the more variance goes into space.
α = 0 means no spatial effect (OLS).
Often the whole variance goes into the spatial component, leaving no room for the predictors.
r is an additional weight factor (r ≤ 1).
r = 0 means no spatial effect (OLS).
r = 1 means all variance goes into space.
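A small sketch of the inverse-distance weights, using the distances among G6-3, A2-2 and C4-4 from the Euclidean distance table. The function name is hypothetical, and setting the diagonal to zero is an assumption made here to avoid dividing by d_ii = 0; how the weights enter C in a particular program is not covered.

```python
def spatial_weights(dist, alpha):
    """w_ij = 1 / d_ij**alpha off the diagonal; diagonal set to 0 (assumed)."""
    n = len(dist)
    return [[0.0 if i == j else 1.0 / dist[i][j] ** alpha
             for j in range(n)] for i in range(n)]

# Distances among G6-3, A2-2 and C4-4.
dist = [[0.0, 136.7, 87.0],
        [136.7, 0.0, 50.6],
        [87.0, 50.6, 0.0]]

for alpha in (0, 1, 2):
    w = spatial_weights(dist, alpha)
    print(alpha, [round(x, 5) for x in w[0]])
```

With alpha = 0 every pair of plots gets the same weight, so distance plays no role; as alpha grows, near neighbours dominate ever more strongly.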
The input tab delimited text file for SAM
Plot  Longitude  Latitude  EV1       S   CaCO3  Sand   pH
G6-3  317.78     266.85    0.01639   15  0.95   85.66  8.69
A2-2  187.24     307.27    -0.03864  17  0.11   81.31  8.01
C4-4  237.32     299.92    -0.02015  16  0.85   74.42  7.97
J4-4  322.62     188.9     0.04304   16  1.53   74.24  8.05
D2-4  217.79     259.69    -0.01349  13  1.93   74.24  8.08
K7-2  388.38     209.6     0.05755   21  0.58   83.45  8.23
K7-4  382.38     209.6     0.05561   15  0.58   83.45  8.23
F1-3  226.3      221.79    0.001452  16  0.38   78.45  8.25
M7-2  412.75     177.88    0.0756    13  0.63   82.15  8.4
I3-1  300.57     198.58    0.03283   12  2.21   80.01  7.78
No clear spatial trend in species richness.
Do soil properties influence species richness?

OLS
Variables  Coeff.  Std.err.  t      p     R^2
Constant   6.14    5.62      1.09   0.28  0.00
CaCO3      -0.10   0.37      -0.27  0.79  0.00
Sand       -0.02   0.04      -0.42  0.67  0.00
pH         1.27    0.50      2.52   0.01  0.02
r2         0.02
P          0.08

Trend surface analysis
Variables  Coeff.  Std.err.  t      p     R^2
Constant   6.13    5.58      1.10   0.27  0.00
Longitude  0.01    0.00      3.00   0.00  0.02
Latitude   0.01    0.00      1.75   0.08  0.00
CaCO3      -0.35   0.38      -0.91  0.36  0.00
Sand       -0.06   0.05      -1.31  0.19  0.00
pH         1.00    0.51      1.97   0.05  0.02
r2         0.04
P          0.007

The dependence of richness on pH weakens after accounting for spatial structure (p rises from 0.01 to 0.05).
Eigenvector mapping
Variables  Coeff.  Std.err.  t      p     R^2
Constant   6.34    5.62      1.13   0.26  0.00
EV1        8.08    5.72      1.41   0.16  0.01
CaCO3      -0.19   0.38      -0.51  0.61  0.00
Sand       -0.01   0.04      -0.26  0.80  0.00
pH         1.16    0.51      2.27   0.02  0.02
r2         0.02
P          0.06
The Hühnerwasser catchment is divided into a western and an eastern part that differ in soil sand content and pH.
Trend surface analysis captures this gradient.