Multivariate analysis in community ecology
Gerry Quinn
Deakin University

Data sets in community ecology
• Multivariate abundance data
• Sampling or experimental units
  – plots, cores, panels, quadrats …
  – usually in hierarchical spatial or temporal structure
• Abundances recorded for multiple taxa in each unit
  – simple counts, densities, % cover, presence-absence …
• Environmental variables recorded in each unit
  – pH, salinity, temperature, nutrients, sediment load, elevation …

Typical aims
• Examine spatial and temporal patterns in species composition
  – assemblage/community “structure”, more than simply biodiversity (e.g. taxon richness/diversity)
  – test formal hypotheses about spatial and temporal differences in composition
• Relate patterns to unit (or higher) level environmental predictors
  – typical linear model type question
• Determine which taxa are most important in “driving” the patterns
  – which taxa most typify differences across spatial and temporal hierarchies

Why multivariate?
• Individual taxa of main interest
  – concern over multiple univariate hypothesis testing (Type 1 error rates)
  – referees and editors won’t accept paper with 50-100 ANOVAs
• Community (assemblage) structure interest
  – recognition of limitations of univariate biodiversity (richness, diversity, evenness) measures
  – hypotheses about community/assemblage composition
• Most multivariate analyses in community ecology also incorporate univariate (individual taxa or environmental predictors) models

Forest bird communities
• Does bird community composition vary between forest types?
  – 5 types (box-ironbark, river redgum, Gippsland manna gum etc.) plus mixed
• Maximum bird abundance (across 4 seasons)
  – 102 species across 37 sites
• Mac Nally (1989)
[Images: beechworthonline.com.au; Swift parrot - Wikipedia]

Estuary nematode communities
• Does nematode community composition vary between sites and with environmental variables?
• Nematode abundance (6 seasonal “replicates”)
  – 182 species across 19 “sites”
• Environmental variables
  – 6 (sediment particle size, % organic matter etc.) at each site
• Clarke & Warwick (1993)
[Images: Exe estuary - Wikipedia; Marine nematodes http://www.ipm.iastate.edu]

Site  Sp1  Sp2  Sp3  Sp4  Sp5  Sp6  Sp7  etc.
1     90   187  90   23   123  28   5    …
2     54   158  66   51   22   10   5    …
3     47   117  28   97   9    26   3    …
4     52   27   6    72   1    3    1    …
5     0    0    0    0    0    0    0    …
6     5    0    0    0    0    0    1    …
7     8    14   145  0    0    4    120  …
8     3    18   35   0    0    17   94   …
9     51   2    206  0    0    1    76   …
10    0    0    0    0    0    0    0    …
11    0    0    0    0    2    0    0    …
12    0    0    0    0    0    0    0    …
13    0    0    0    0    0    0    0    …
14    1    0    0    0    0    0    0    …
15    0    0    0    0    0    0    0    …
16    0    0    0    0    0    0    0    …
17    0    0    0    0    0    0    0    …
18    0    0    0    0    0    0    0    …
19    1    0    0    0    0    0    0    …

Part size  WTab depth  H2S depth  Shore height  % organ  Salinity
0.06       0           2.167      4             6.43     24.833
0.06       0           3.183      3             7.06     22.833
0.06       0           1.817      2             7.99     17.833
0.06       0           2.02       1             7.15     16.2
1.275      20          20         5             0.24     10
0.562      3.417       2.95       4             0.37     76.6
0.06       0           2.167      3             1.98     76
0.177      0           2.683      2             2.22     81.2
0.06       0           2.66       1             5.88     71.2
0.451      20          20         5             0.09     10
0.205      4.417       7.25       4             0.39     88
0.528      20          20         3             0.09     88
0.598      20          20         2             0.06     88
0.769      0           20         1             0.09     88.5
0.468      14.917      20         5             0.06     89
0.837      6.333       20         4             0.04     90.875
0.797      6.75        20         3             0.06     91.667
1.141      3.667       20         2             0.07     89.4
0.223      0           20         1             0.09     90.833

Impact assessment
• Does sessile marine animal community composition vary between sewage impact and control sites?
  – 3 control and 1 impact locations
  – 4 randomly chosen times
  – replicate sites and photographic quadrats at each location
• Percent cover of 58 taxa
• Classical “beyond” BACI design
  – split-plot type linear model
• Terlizzi et al (2005)
http://www.conisma.it/total/t_aim.html

Three broad approaches
• Eigenanalyses
  – distance measure implied
• Distance-based analyses
  – distance measure explicit and user-selected
• Multi-species linear models
  – combine taxon-specific univariate (linear) models
  – no distance measure required

Eigenanalysis methods
• Principal components analysis (PCA)
  – implied Euclidean distance
• Correspondence analysis (CA)
  – implied chi-square distance
• Canonical correspondence analysis (CCA/CANOCO)
  – constrains ordination based on linear modelling with environmental variables
• Strengths
  – biplots of sample and species ordinations
  – CCA provides measures of fit with covarying environmental variables
[Photo: Cajo ter Braak]

Rodents in habitat fragments
Bolger et al (1997)

SITE         AREA  DISTX  AGE  RRATTUS  MMUS  PCALIF  PEREM  RMEGAL  NFUSC  NLEPID  PFALLAX  MCALIF
Florida      25    2100   50   0        13    3       1      1       16     0       0        0
Sandmark     84.1  914    20   0        1     57      65     2       12     8       2        3
34street     53.8  1676   34   0        4     36      0      9       8      0       0        0
Balboaterr   51.8  243    34   0        4     53      1      16      0      0       18       3
Katesess     25.6  822    16   0        2     63      21     2       0      0       0        0
Altalajolla  32.1  121    14   0        1     48      35     9       0      12      2        2
Laurel       9.7   1554   79   0        11    0       0      5       0      0       0        0
Canon        8.7   1219   58   0        16    0       0      30      0      0       0        0
Zena         8.5   2865   36   3        8     0       0      11      0      0       0        0
etc.

Rodent data – CA biplot
[Figure: CA biplot, Axis 1 vs Axis 2. Labelled points include species Rr and Mm, sites Florida, Zena, Acuna, El mac, 54th Street, Baja, 32nd Street and Sth Oakcrest, and a cluster labelled “7 fragments”.]

Rodent data – CCA triplot
[Figure: CCA triplot, Axis 1 vs Axis 2. Species Mc, Pe, Nl, Mm, Rr; sites include Sandmark, 34th Street, Laurel, Balboa, Spruce, El mac, Edison, Acuna, 54th Street and Montanosa; environmental vectors Area, Dist, Age.]

Issues
• Both methods “compress” distances at ends of axes (so-called arch or horseshoe effect)
  – detrended CA brute force “fix” for this effect
• CA and CCA implicitly upweight rarer taxa by use of chi-square distance
• No choice of distance measure
[Figure: PCA of bird community data, Comp 1 vs Comp 2]

Distance-based methods
• Include principal coordinates analysis (PCoA), multidimensional scaling (MDS), generalised dissimilarity modelling (GDM)
• Hypothesis testing
  – compare groups using multi-response permutation procedure (MRPP), analysis of similarities (ANOSIM), permutational multivariate ANOVA (PERMANOVA)
  – relate to environmental variables with Mantel test, BIO-ENV
[Photos: Marti Anderson, John Curtis, Bob Clarke]

Distance-based methods
• Strengths
  – flexibility of distance/dissimilarity measure, standardisation and transformation
  – consistency in that ordination and subsequent analyses based on original dissimilarities
  – some dissimilarities can be “decomposed” into relative taxon contributions (similarity percentages - SIMPER)

[Figure slides: nMDS – bird community data; PERMANOVA – bird community data; nMDS – subtidal reef data; PERMANOVA – subtidal reef data]

Issues
• Flexible choice of distance/dissimilarity measure
  – ecologists nearly always default to Bray-Curtis
  – does B-C represent ecological differences of interest?
• Modelling dissimilarities tricky
  – appropriate probability distributions
  – permutation procedures usually applied
  – robustness for complex models?
  – PERMANOVA only partitions SS not likelihoods
  – lack of independence
  – rely on permutation robustness
• Limited predictive capacity
• Distance-based methods cannot easily separate location and dispersion effects

Location vs dispersion
• Warton et al (2012)

Location vs dispersion
• Transformation of abundances may help BUT many taxa have very skewed distributions
• Issue recognised by PRIMER/PERMANOVA
  – “we can consider the homogeneity of dispersions to be included as part of the general null hypothesis of "no differences" among groups being tested by PERMANOVA (even though the focus of the PERMANOVA test is to detect location effects)” (PERMANOVA manual p.22)
• Ongoing debate PRIMER/PERMANOVA vs mvabund

“Univariate” linear model approach
• Fit separate generalised linear models to each taxon
  – based on negative binomial distribution (over-dispersed counts)
• Testing overall group or covariate effects
  – sum likelihood ratio (LR) tests across taxa
  – use permutation (resampling) methods to generate the null distribution of the test statistic
• Relative taxon contribution to patterns
  – LR statistic as measure of strength of individual taxon contributions
• Strengths
  – linear models framework, univariate predictive capacity
  – handles mean-variance relationship
• Issues
  – not an “ordination” method
[Photo: David Warton]

Methods in community ecology
• Journals searched 2011-2012
  – Austral Ecology
  – Oikos
• Analyses of community/assemblage (species abundance incl.
pres-abs data)
  – 62 papers found
• Methods used
  – overall multivariate “philosophy”
  – choice of dissimilarity measure (if relevant)
  – transformation/standardisation used
  – modeling (hypothesis testing) method
  – choice of “ordination” plot

Multivariate approach
Approach                                 # papers  % papers
Eigenanalysis                            15        24
Distance-based                           47        76
Combined taxon-specific linear models    0         0

Eigenanalyses
Approach                                           # papers
MANOVA / DFA                                       3
PCA                                                0
Correspondence analysis (incl. detrended)          8
Constrained (canonical) correspondence analysis    4
Majority of “ordinations” based on biplots, many with vectors fitted for environmental predictors (triplots)

Distance-based
Dissimilarity measure  # papers
Bray-Curtis            31
Sorensen               4
Jaccard                2
Gower                  2

Distance/dissimilarity
• Why do ecologists default to Bray-Curtis?
  – Faith et al (1987 – Vegetatio) strongly recommended B-C as robust indicator of ecological gradients
  – ranges between 0 (identical samples) and 1 (no species in common)
  – handles joint absences (taxa missing from both samples)
  – default in PRIMER/PERMANOVA, PC-ORD
• Does B-C represent patterns ecologists are really interested in?

Distance-based
Approach                        # papers
Comparing groups
  ANOSIM / PERMANOVA / dbRDA    24
  MRPP                          6
  ANOVA on MDS axis scores      2
Majority of “ordinations” based on non-metric MDS, 3 papers used cluster analysis

Distance-based
Approach                                         # papers
Relating to env predictors
  BIO-ENV / Relate                               24
  Mantel tests                                   6
  Regression/correlation with MDS axis scores    2
  Generalised dissimilarity modelling            1
Determining taxa driving group differences
  SIMPER                                         9

Transformations
• Transformations of abundances common in ecology
  – log (y+1) or square/fourth root
  – original PRIMER program had 4th root as default!
• Most common reason - to reduce the influence of most abundant (dominant) taxa and give relatively greater weighting to rarer taxa
  – each taxon will be affected differently depending on its distribution?
  – effects on interaction terms almost never considered
• Issues of unequal dispersions almost never raised in ecological papers
  – “it is not at all difficult to understand that transformations will also affect relative dispersions in multivariate space” (PERMANOVA manual p. 97)

Standardisations
• Invertebrate assemblages in lake (Quinn et al 1996)
• Four site-season combinations
• nMDS on Bray-Curtis
• Four standardisations:
  – None
  – By sample totals
  – By taxa totals
  – Double
• Bray-Curtis vs Canberra
[Figure: nMDS panels titled None, Sample, Taxa, Double]

To Bayes or not to Bayes….

Bayesian approaches
• Detecting transitions between upslope and riparian vegetation
  – management of stream riparian zones
• Based on plant assemblage data (% cover) along transects away from stream
  – pairwise Canberra distances between quadrats along each transect
• Aim - to find the model with the highest probability of being the break between riparian and upslope vegetation
  – usual MCMC estimation of models
[Figure: Acheron River; Bayes factors > 10; panels for higher elevation sites and lower elevation sites]
Mac Nally et al (2008) Plant Ecology

Bayesian approaches
• Maybe more robust than ML for complex models
  – already being used for variance estimation and confidence (credible) intervals in some mixed model software
• Straightforward(?) under mvabund generalised linear model approach
  – select suitable probability distributions for parameters
  – use uninformative prior if appropriate
• More difficult with distance-based methods
  – but can be adapted (see Mac Nally 2005 Divers & Distr)
  – other examples using MDS and clustering (Oh & Raftery 2007 J Comp Graph Stat) focus on graphical representation (“ordination”)

Questions for discussion
• Is the confounding of location and dispersion a “fatal” flaw for distance-based measures?
  – more direct comparisons between distance-based and linear model approaches needed
• Comparison to other new methods
  – generalised dissimilarity modelling (Ferrier et al 2007)
  – gradient forests (Ellis et al 2012)
• If distance-based measures are used:
  – what does Bray-Curtis actually measure ecologically?
• What do multivariate models actually predict?

Questions for discussion
• Should ecologists re-think their use of transformations?
  – NOT just a multivariate issue!
• How do ecologists determine optimum sample sizes for community ecology?
  – power characteristics will vary between taxa in linear models approach
  – power for distance-based permutation analyses?
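
Worked sketch: Bray-Curtis and transformations
The points made earlier about Bray-Curtis and fourth-root transformation can be illustrated with a few lines of Python. This is a minimal sketch, assuming the usual Bray-Curtis formula (sum of absolute differences divided by the sum of all abundances). The counts are made up for illustration and do not come from any data set in this talk, and the function names are hypothetical. It reports how much of the dissimilarity is due to the dominant taxon before and after transformation.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: 0 for identical samples, 1 for no shared taxa."""
    if len(x) != len(y):
        raise ValueError("samples must cover the same taxa")
    total = sum(x) + sum(y)
    if total == 0:
        raise ValueError("Bray-Curtis is undefined for two empty samples")
    return sum(abs(a - b) for a, b in zip(x, y)) / total


def taxon_shares(x, y):
    """Fraction of the Bray-Curtis numerator contributed by each taxon."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    total = sum(diffs)
    return [d / total for d in diffs] if total > 0 else [0.0] * len(diffs)


def fourth_root(x):
    """Fourth-root transformation of an abundance vector."""
    return [v ** 0.25 for v in x]


# Hypothetical counts for five taxa in two samples (illustration only).
sample_a = [120, 30, 4, 1, 0]
sample_b = [60, 28, 0, 2, 3]

raw_bc = bray_curtis(sample_a, sample_b)
root_bc = bray_curtis(fourth_root(sample_a), fourth_root(sample_b))
raw_shares = taxon_shares(sample_a, sample_b)
root_shares = taxon_shares(fourth_root(sample_a), fourth_root(sample_b))

print(f"Bray-Curtis, raw counts:  {raw_bc:.3f}")
print(f"Bray-Curtis, fourth root: {root_bc:.3f}")
print("Share from dominant taxon: "
      f"raw {raw_shares[0]:.2f}, fourth root {root_shares[0]:.2f}")
```

With raw counts the first (dominant) taxon accounts for most of the dissimilarity. After the fourth-root transformation the rarer taxa carry relatively more weight, which is the effect described under Transformations.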
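
Worked sketch: one-way PERMANOVA by permutation
The note that PERMANOVA partitions sums of squares and relies on permutation can be sketched as follows. This is a simplified one-factor version, assuming the standard pseudo-F built from squared pairwise dissimilarities and unrestricted permutation of group labels. It is not the PRIMER/PERMANOVA implementation. The abundance values and group labels "A" and "B" are hypothetical, as are all function names.

```python
import random


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    return sum(abs(a - b) for a, b in zip(x, y)) / (sum(x) + sum(y))


def distance_matrix(samples):
    """Symmetric matrix of pairwise Bray-Curtis dissimilarities."""
    n = len(samples)
    return [[0.0 if i == j else bray_curtis(samples[i], samples[j])
             for j in range(n)] for i in range(n)]


def sum_sq_pairs(dist, idx):
    """Sum of squared dissimilarities over all pairs of the given samples."""
    return sum(dist[i][j] ** 2 for k, i in enumerate(idx) for j in idx[k + 1:])


def pseudo_f(dist, groups):
    """Pseudo-F: among-group over within-group sums of squares, each per df."""
    n = len(groups)
    labels = sorted(set(groups))
    ss_total = sum_sq_pairs(dist, list(range(n))) / n
    ss_within = 0.0
    for g in labels:
        idx = [i for i in range(n) if groups[i] == g]
        ss_within += sum_sq_pairs(dist, idx) / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (len(labels) - 1)) / (ss_within / (n - len(labels)))


def permanova(dist, groups, n_perm=999, seed=0):
    """Observed pseudo-F and permutation p-value from shuffled group labels."""
    rng = random.Random(seed)
    f_obs = pseudo_f(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= f_obs:
            hits += 1
    return f_obs, (hits + 1) / (n_perm + 1)


# Hypothetical abundances: 8 samples x 4 taxa, two groups of four samples.
samples = [
    [12, 8, 0, 1], [15, 6, 1, 0], [10, 9, 0, 2], [14, 7, 1, 1],
    [1, 0, 11, 9], [0, 2, 14, 7], [2, 1, 9, 12], [1, 1, 13, 8],
]
groups = ["A"] * 4 + ["B"] * 4

f_obs, p_value = permanova(distance_matrix(samples), groups)
print(f"pseudo-F = {f_obs:.2f}, permutation p = {p_value:.3f}")
```

Because the statistic depends only on pairwise dissimilarities, a large pseudo-F can arise from differences in group location, group dispersion, or both, which is the confounding raised in the discussion questions.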
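
Worked sketch: summing likelihood ratio statistics across taxa
The “univariate” linear model approach (separate model per taxon, LR statistics summed across taxa, significance by resampling) can be sketched in the same style. Two simplifications are assumed here and should be read as such: a Poisson model replaces the negative binomial named in the talk so that the LR statistic has a closed form, and plain permutation of group labels stands in for the resampling schemes used in practice. This is not the mvabund implementation. Taxon names, counts and function names are hypothetical.

```python
import math
import random


def poisson_lr(counts, groups):
    """LR statistic for a one-way group effect on one taxon (Poisson model)."""
    total = sum(counts)
    if total == 0:
        return 0.0  # a taxon absent everywhere carries no information
    grand_mean = total / len(counts)
    stat = 0.0
    for g in sorted(set(groups)):
        ys = [y for y, label in zip(counts, groups) if label == g]
        if sum(ys) > 0:
            # 2 * (group total) * log(group mean / grand mean)
            stat += 2.0 * sum(ys) * math.log((sum(ys) / len(ys)) / grand_mean)
    return stat


def sum_of_lr(abundance, groups):
    """Summed LR statistic plus the per-taxon contributions."""
    per_taxon = {taxon: poisson_lr(counts, groups)
                 for taxon, counts in abundance.items()}
    return sum(per_taxon.values()), per_taxon


def permutation_test(abundance, groups, n_perm=999, seed=1):
    """Permute group labels across whole samples, keeping taxa together."""
    rng = random.Random(seed)
    observed, per_taxon = sum_of_lr(abundance, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if sum_of_lr(abundance, shuffled)[0] >= observed:
            hits += 1
    return observed, per_taxon, (hits + 1) / (n_perm + 1)


# Hypothetical counts: 8 samples (4 control, 4 impact) for three taxa.
groups = ["control"] * 4 + ["impact"] * 4
abundance = {
    "Sp1": [20, 25, 18, 22, 2, 4, 1, 3],
    "Sp2": [5, 6, 4, 5, 5, 4, 6, 5],
    "Sp3": [0, 1, 0, 1, 3, 2, 4, 3],
}

observed, per_taxon, p_value = permutation_test(abundance, groups)
print(f"sum of LR = {observed:.2f}, permutation p = {p_value:.3f}")
for taxon, lr in sorted(per_taxon.items(), key=lambda kv: -kv[1]):
    print(f"  {taxon}: LR = {lr:.2f}")
```

Labels are shuffled across whole samples, so correlations among taxa within a sample are preserved in every permutation. The per-taxon LR values give the ranking of taxon contributions described in the talk.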