Advanced analytical approaches in ecological data analysis

The world comes in fragments

Multivariate approaches to biodiversity

[Diagram: three data matrices feed the analyses]
• Environmental variable matrix V (sites × variables)
• Site GPS location matrix D (sites × locations L)
• Species abundance matrix M (sites × species)
Analyses: spatial regression, co-occurrence mapping, regression tree, impact analysis

The raw data

Species abundances (excerpt):

Plot   Achillea_pannonica   Agrostis_capillaris   Agrostis_stolonifera_agg.   Agrostis_vinealis   Ajuga_genevensis
G6-3   0                    0.5                   0                           0                   0
A2-2   0.1                  0.5                   0                           0                   0
C4-4   2                    0.5                   0                           0.5                 0
J4-4   0.5                  0.5                   0                           0                   0
D2-4   0                    0                     0                           0                   0
K7-2   0                    0.5                   0                           0                   0
K7-4   0                    0                     0                           0                   0
F1-3   0                    0.5                   0                           0                   0
M7-2   0.5                  0.5                   0                           0                   0.5

Plot locations:

Plot   Longitude   Latitude
G6-3   317.78      266.85
A2-2   187.24      307.27
C4-4   237.32      299.92
J4-4   322.62      188.9
D2-4   217.79      259.69
K7-2   388.38      209.6
K7-4   382.38      209.6
F1-3   226.3       221.79
M7-2   412.75      177.88

Soil variables:

Plot   CaCO3   Sand    pH
G6-3   0.95    85.66   8.69
A2-2   0.11    81.31   8.01
C4-4   0.85    74.42   7.97
J4-4   1.53    74.24   8.05
D2-4   1.93    74.24   8.08
K7-2   0.58    83.45   8.23
K7-4   0.58    83.45   8.23
F1-3   0.38    78.45   8.25
M7-2   0.63    82.15   8.4

Basic questions:
• Do soil characteristics influence species abundances and diversity?
• How do these relationships change in time?

Starting hypotheses:
• Neighbouring plots are similar in species composition.
• CaCO3 is of major importance for plant diversity.
• Species occurrence is not random with respect to soil characteristics.

Neighbouring plots are similar in species composition

Mantel test
• We calculate the Soerensen (Dice) index of species similarity S and transform it to a distance matrix (D = 1 − S).
• We calculate the distance matrix of the GPS data.
• The Mantel test correlates the two distance matrices; significance is assessed by permuting the plots of one matrix.

CaCO3 is of major importance for plant diversity

The SAM input file

Plot   Long     Lat      Year   Species richness   Abundance   CaCO3   Sand    pH
A3-2   203.09   319.46   2006   3                  0.7         0.95    85.66   8.69
A3-3   197.09   325.46   2006   6                  1           0.11    81.31   8.01
A3-4   197.09   319.46   2006   4                  0.8         0.85    74.42   7.97
A4-2   218.95   331.64   2006   4                  0.8         1.53    74.24   8.05
A4-3   212.95   337.64   2006   4                  1.2         1.93    74.24   8.08
B3-3   209.28   309.6    2006   3                  0.7         0.58    83.45   8.23
B4-2   231.14   315.78   2006   4                  0.8         0.58    83.45   8.23
B4-4   225.14   315.78   2006   3                  1.5         0.38    78.45   8.25
B5-2   247      327.97   2006   3                  0.7         0.63    82.15   8.4
B5-4   241      327.97   2006   3                  0.7         2.21    80.01   7.78
C1-1   195.75   269.37   2006   6                  1.8         1.51    79.16   8.02
C2-2   211.61   275.55   2006   3                  0.3         0.1     84.09   7.9

General linear models in the face of spatial autocorrelation

Species richness at sites of different area

[Figure: map of sites of different area labelled with species richness, and scatter plot of species richness (Species) against Area with fitted line y = 0.94x − 2.47, r² = 0.93, P < 0.01. Areas shown: 31, 5, 9, 15, 22, 50, 5. Richness values shown: 35, 5, 7, 5, 17, 41, 1.]

The regression did not include the spatial distance between sites.

Spatial autocorrelation is inevitable in ecology

Collinearity

Temperature   Precipitation
8.9           56.5
10.9          799.5
8.4           343.5
1.2           305.2
8.3           952.3
15.0          286.3
5.6           651.5
3.2           572.1
0.5           836.6
3.4           399.0
0.2           984.3
5.7           655.6
13.7          269.6
9.0           561.8
18.5          457.8

Autocorrelation

Spatial autocorrelation

Abundance (15 sites): 28.3, 17.7, 13.5, 16.1, 26.2, 29.0, 11.7, 17.4, 3.7, 10.1, 1.5, 3.2, 21.2, 14.4, 0.7

Spatial autocorrelation is inevitable. All ecological field data sets have a spatial structure.
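The Mantel test described for the hypothesis that neighbouring plots are similar in species composition can be sketched in a few lines. This is a minimal illustration, not the procedure used for the slides. It assumes NumPy, uses only the five-species excerpt of the abundance table (plots D2-4 and K7-4 are left out because they contain none of these five species, which leaves the Soerensen index undefined), and uses a one-sided permutation test with 999 permutations. The function names are hypothetical.

```python
import numpy as np

# Plots with at least one of the five excerpted species present:
# G6-3, A2-2, C4-4, J4-4, K7-2, F1-3, M7-2 (coordinates from the raw data).
coords = np.array([
    [317.78, 266.85], [187.24, 307.27], [237.32, 299.92], [322.62, 188.90],
    [388.38, 209.60], [226.30, 221.79], [412.75, 177.88],
])
# Presence/absence derived from the abundance excerpt (abundance > 0).
presence = np.array([
    [0, 1, 0, 0, 0], [1, 1, 0, 0, 0], [1, 1, 0, 1, 0], [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 0], [0, 1, 0, 0, 0], [1, 1, 0, 0, 1],
], dtype=float)


def sorensen_distance(pa):
    """Return D = 1 - S, where S = 2 * shared / (richness_i + richness_j)."""
    shared = pa @ pa.T
    richness = pa.sum(axis=1)
    return 1.0 - 2.0 * shared / (richness[:, None] + richness[None, :])


def euclidean_distance(xy):
    """Return the matrix of pairwise Euclidean distances between plots."""
    diff = xy[:, None, :] - xy[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def mantel(d1, d2, n_perm=999, seed=0):
    """Correlate two distance matrices; one-sided permutation p-value."""
    rng = np.random.default_rng(seed)
    upper = np.triu_indices_from(d1, k=1)
    r_obs = np.corrcoef(d1[upper], d2[upper])[0, 1]
    hits = 0
    for _ in range(n_perm):
        order = rng.permutation(d1.shape[0])
        # Permute rows and columns of d1 together to keep it a distance matrix.
        r_perm = np.corrcoef(d1[order][:, order][upper], d2[upper])[0, 1]
        if r_perm >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)


species_dist = sorensen_distance(presence)
geo_dist = euclidean_distance(coords)
r_obs, p_value = mantel(species_dist, geo_dist)
print(f"Mantel r = {r_obs:.3f}, p = {p_value:.3f}")
```

A positive r with a small p would support the hypothesis that plots further apart differ more in composition. With only seven plots and five species the result is purely illustrative.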
Aridity (15 sites): 0.15, 0.94, 0.94, 0.00, 0.75, 0.69, 0.59, 0.11, 0.83, 0.45, 0.56, 0.11, 0.26, 0.56, 0.94

Plot   Species   CaCO3   Sand    pH
A3-2   3         0.95    85.66   8.69
A3-3   6         0.11    81.31   8.01
A3-4   4         0.85    74.42   7.97
A4-2   4         1.53    74.24   8.05
A4-3   4         1.93    74.24   8.08

$F = \frac{r^2}{1 - r^2} \cdot \frac{n - k - 1}{k}$

F increases in proportion to the degrees of freedom n − k − 1, that is, with the number of data points n. P decreases with increasing number of data points (sample size).

Bivariate case:

$F = \frac{r^2}{1 - r^2} (n - 2)$

[Figure: "Statistical significance at the 1% error level"; r² plotted against N on logarithmic axes, fitted curve $y = 5.0x^{-0.96}$.]

Any statistical test will eventually become significant if you only increase the sample size.

[Figure: map of sites labelled with species richness.]

Treating all n = 18 sites as independent, with k = 3 predictors:

$F = \frac{r^2}{1 - r^2} \cdot \frac{n - k - 1}{k} = \frac{r^2}{1 - r^2} \cdot \frac{18 - 3 - 1}{3} = 4.5$

Spatial autocorrelation

With only n = 4 effectively independent sites:

$F = \frac{r^2}{1 - r^2} \cdot \frac{n - k - 1}{k} = \frac{r^2}{1 - r^2} \cdot \frac{4 - 3 - 1}{3} = 0$

Spatial autocorrelation reduces the effective degrees of freedom.

Using spatially autocorrelated data we artificially increase the degrees of freedom and the F-score. We obtain statistically significant results too often.

What to do? First, test for spatial autocorrelation: Moran's I

What to do? Reduce the degrees of freedom

N = 15, N_eff = 4

$F_{\mathrm{eff}} = F \cdot \frac{4 - 4}{4} = 0$

$t_{\mathrm{eff}} = t \cdot \frac{4 - 4}{4} = 0$

[Figure: neighbor joining cluster analysis and UPGMA cluster analysis of the sites.]
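The dependence of F on sample size discussed above can be checked numerically. The sketch below evaluates $F = \frac{r^2}{1 - r^2} \cdot \frac{n - k - 1}{k}$ for a fixed r². The value r² = 0.3 is an assumption chosen for illustration and does not come from the slides; the function name is hypothetical.

```python
def f_statistic(r2, n, k):
    """F = r^2 / (1 - r^2) * (n - k - 1) / k for a regression with k predictors."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("r2 must lie in [0, 1)")
    if k < 1 or n - k - 1 < 0:
        raise ValueError("need k >= 1 and n >= k + 1")
    return r2 / (1.0 - r2) * (n - k - 1) / k


R2_ASSUMED = 0.3  # illustrative value, not taken from the slides
K = 3             # number of predictors, as in the slide example

# All 18 sites treated as independent data points.
f_nominal = f_statistic(R2_ASSUMED, n=18, k=K)
# Only 4 effectively independent sites: no residual degrees of freedom remain.
f_effective = f_statistic(R2_ASSUMED, n=4, k=K)
# With r^2 held fixed, F grows linearly with n - k - 1.
growth = [(n, f_statistic(R2_ASSUMED, n, K)) for n in (10, 100, 1000)]

print(f"F with n = 18: {f_nominal:.2f}")
print(f"F with n = 4:  {f_effective:.2f}")
for n, f_value in growth:
    print(f"n = {n:5d}  F = {f_value:.1f}")
```

Holding r² fixed while n grows shows why a weak relationship becomes formally significant in a large sample, and why counting autocorrelated sites as independent inflates F.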
Correct for the effects of spatial autocorrelation

Trend surface analysis

$Y = Xb + c\,\mathrm{Lat} + d\,\mathrm{Long} + E$

$Y = Xb + c_1 \mathrm{Lat} + c_2 \mathrm{Lat}^2 + d_1 \mathrm{Long} + d_2 \mathrm{Long}^2 + E$

Trend surface analysis is able to capture broad-scale trends.

Eigenvector regression or eigenvector mapping

Euclidean distances between plots (excerpt):

       G6-3    A2-2    C4-4    J4-4    D2-4
G6-3   0.0     136.7   87.0    78.1    100.3
A2-2   136.7   0.0     50.6    179.8   56.5
C4-4   87.0    50.6    0.0     140.0   44.7
J4-4   78.1    179.8   140.0   0.0     126.5
D2-4   100.3   56.5    44.7    126.5   0.0

$\Sigma U = \lambda U$

$(\Sigma - \lambda I) U = 0$

Eigenvalues: 2140.4, -938.7
Eigenvectors (values as shown): 0.191, 0.046, 0.307, 0.394, 0.244, 0.316, 0.176, -0.146, 0.236, 0.284

The eigenvectors U of the distance matrix enter the regression as additional predictors:

$Y = Xb + Uc + E$

Autocorrelation models

Multiply Y and X by a spatial corrective C:

$CY = CXb$

$b = (X^T C X)^{-1} X^T C Y$

Spatial weights of C:

$w_{ij} = \frac{1}{d_{ij}^{\alpha}}$

The larger α, the more variance goes into space. α = 0 means no spatial effect (OLS). Often the whole variance goes into the spatial component, leaving no room for the predictors.

r is an additional weight factor (r < 1). r = 0 means no spatial effect (OLS). r = 1 means all variance goes into space.

The input tab-delimited text file for SAM

Plot   Longitude   Latitude   EV1        S    CaCO3   Sand    pH
G6-3   317.78      266.85     0.01639    15   0.95    85.66   8.69
A2-2   187.24      307.27     -0.03864   17   0.11    81.31   8.01
C4-4   237.32      299.92     -0.02015   16   0.85    74.42   7.97
J4-4   322.62      188.9      0.04304    16   1.53    74.24   8.05
D2-4   217.79      259.69     -0.01349   13   1.93    74.24   8.08
K7-2   388.38      209.6      0.05755    21   0.58    83.45   8.23
K7-4   382.38      209.6      0.05561    15   0.58    83.45   8.23
F1-3   226.3       221.79     0.001452   16   0.38    78.45   8.25
M7-2   412.75      177.88     0.0756     13   0.63    82.15   8.4
I3-1   300.57      198.58     0.03283    12   2.21    80.01   7.78

No clear spatial trend in species richness.

Do soil properties influence species richness?

OLS

Variable    Coeff.   Std.err.   t       p      R^2
Constant    6.14     5.62       1.09    0.28   0.00
CaCO3       -0.10    0.37       -0.27   0.79   0.00
Sand        -0.02    0.04       -0.42   0.67   0.00
pH          1.27     0.50       2.52    0.01   0.02
r² = 0.02, P = 0.08

Trend surface analysis

Variable    Coeff.   Std.err.   t       p      R^2
Constant    6.13     5.58       1.10    0.27   0.00
Longitude   0.01     0.00       3.00    0.00   0.02
Latitude    0.01     0.00       1.75    0.08   0.00
CaCO3       -0.35    0.38       -0.91   0.36   0.00
Sand        -0.06    0.05       -1.31   0.19   0.00
pH          1.00     0.51       1.97    0.05   0.02
r² = 0.04, P = 0.007

The dependence of richness on pH weakens after accounting for spatial structure: p rises from 0.01 in the OLS model to 0.05 in the trend surface model.

Eigenvector mapping

Variable    Coeff.   Std.err.   t       p      R^2
Constant    6.34     5.62       1.13    0.26   0.00
EV1         8.08     5.72       1.41    0.16   0.01
CaCO3       -0.19    0.38       -0.51   0.61   0.00
Sand        -0.01    0.04       -0.26   0.80   0.00
pH          1.16     0.51       2.27    0.02   0.02
r² = 0.02, P = 0.06

The Hühnerwasser catchment is divided into a western and an eastern part with different soil sand content and pH. Trend surface analysis captures this gradient.
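The comparison between the plain OLS model and the linear trend surface model $Y = Xb + c\,\mathrm{Lat} + d\,\mathrm{Long} + E$ can be reproduced in outline with ordinary least squares. The sketch below is not the SAM procedure. It assumes NumPy and fits only the ten-plot excerpt of the SAM input table, so its coefficients will not match the tables above, which are based on the full data set. The helper name `ols_fit` is hypothetical.

```python
import numpy as np

# Ten-plot excerpt of the SAM input table.
# Columns: Longitude, Latitude, S (species richness), CaCO3, Sand, pH.
data = np.array([
    [317.78, 266.85, 15, 0.95, 85.66, 8.69],
    [187.24, 307.27, 17, 0.11, 81.31, 8.01],
    [237.32, 299.92, 16, 0.85, 74.42, 7.97],
    [322.62, 188.90, 16, 1.53, 74.24, 8.05],
    [217.79, 259.69, 13, 1.93, 74.24, 8.08],
    [388.38, 209.60, 21, 0.58, 83.45, 8.23],
    [382.38, 209.60, 15, 0.58, 83.45, 8.23],
    [226.30, 221.79, 16, 0.38, 78.45, 8.25],
    [412.75, 177.88, 13, 0.63, 82.15, 8.40],
    [300.57, 198.58, 12, 2.21, 80.01, 7.78],
])
lon, lat, richness = data[:, 0], data[:, 1], data[:, 2]
soil = data[:, 3:]  # CaCO3, Sand, pH


def ols_fit(predictors, y):
    """Least-squares fit with an intercept; returns coefficients and r^2."""
    X = np.column_stack([np.ones(len(y)), predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef, r2


# Plain OLS: richness ~ CaCO3 + Sand + pH
coef_ols, r2_ols = ols_fit(soil, richness)
# Linear trend surface: Longitude and Latitude added as predictors.
coef_trend, r2_trend = ols_fit(np.column_stack([lon, lat, soil]), richness)

print("OLS   (Constant, CaCO3, Sand, pH):", np.round(coef_ols, 3), "r2 =", round(r2_ols, 3))
print("Trend (Constant, Long, Lat, CaCO3, Sand, pH):", np.round(coef_trend, 3), "r2 =", round(r2_trend, 3))
```

Because the trend surface model nests the OLS model, its r² cannot be lower. What matters for interpretation is how the soil coefficients and their significance change once the coordinates are included.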