Community assembly through trait selection (CATS): modelling from incomplete information
Bill.Shipley@USherbrooke.ca

A seminar in three parts:
1. The ecological concept
2. The maximum entropy formalism (Edwin Jaynes)
3. The CATS model

Part 1: The ecological concept

Trait-based filtering
- Regional species pool: determined by traits + history
- Immigration rate: determined by abundance in the region + traits
- Abiotic filters (determined by traits)
- Biotic filters (determined by traits)
- Local community: relative abundance in the local community

A plant strategy (Grime) is a suite of physiological, morphological or phenological traits of individuals that affects the probabilities of survival, reproduction or immigration and that is systematically associated with particular environmental conditions.

The most common trait values in a local community will be those possessed by the individuals having the greatest probabilities of survival, reproduction and immigration: relative abundance ~ probability of passing the filters.

For a pool of S species with trait values t_1, ..., t_S and local relative abundances p_1, ..., p_S, the community-weighted mean trait value is

\bar{t} = \sum_{i=1}^{S} p_i t_i

This average trait value reflects the selective advantage or disadvantage of the trait in passing through the various abiotic and biotic filters.
[Schematic: species A-E of the pool, each with its trait value t_A, ..., t_E.]

Two consequences of Grime's ideas for community assembly:
1. If we know the values of the abiotic variables determining the filters, then we should be able to predict the "typical" values of the functional traits found in a local community, i.e. the community-weighted means.
2. If we know the "typical" values of the functional traits found in a community, and we know the actual trait values of the species in the regional species pool, then we should be able to predict which of these species will be dominant, which will be subordinate, and which will be rare or absent.

The three basic parts of the CATS model
- q: the distribution of relative abundances of the species in the metacommunity
- λ: the strength of selection on each trait in the local community (under the sign convention used below, a positive λ gives a greater probability of passing to species with smaller trait values)
- p: the distribution of relative abundances of the species in the local community, linked to the species traits t = {t_1, ..., t_S} through the community-weighted mean \bar{t} = \sum_{i=1}^{S} p_i t_i

Measuring abundances

Local abundance
- Abundance of plant species can be measured as numbers of stems, or as biomass or indirect measures of it (cover, dbh, ...).
- The units of observation are ambiguous: numbers of what (ramets ≠ genets)? which units of biomass?
- The units of observation are never independent.
- Most values will be zero (species present in the region but absent from the local site).

Metacommunity abundance
- Sometimes no information
- Sometimes vague information (very common, common, rare)
- Sometimes more quantitative information
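As a concrete illustration of the community-weighted mean defined above, here is a minimal sketch in base R; the five species, their relative abundances and their trait values are all hypothetical.

## Hypothetical five-species pool; numbers are illustrative only.
p <- c(A = 0.50, B = 0.25, C = 0.15, D = 0.07, E = 0.03)  # local relative abundances (sum to 1)
t <- c(A = 18.2, B = 22.5, C = 30.1, D = 12.7, E = 25.4)  # trait value of each species
cwm <- sum(p * t)   # community-weighted mean, t_bar = sum_i p_i * t_i
cwm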
Part 2: The maximum entropy formalism
Edwin Jaynes. Jaynes, E.T. 2003. Probability theory: the logic of science. Cambridge University Press.

How can we quantify learning?
- If the sun does set tonight (and our previous information told us it would, p = 0.99999), then we will have learned almost nothing new (almost no new information).
- If the sun doesn't set tonight (an event to which our previous information gave probability 1 - p = 0.00001), then we will have learned something incredible (lots of new information).

The amount of learning (new information) from an event of probability p is (Claude Shannon; historically measured with log2, we will use loge = ln):

I = \ln\left(\frac{1}{p}\right) = -\ln(p)

Average information content (i.e. average amount of new information)
- Specify all of the logically possible states in which some phenomenon can exist (i = 1, 2, ..., S).
- Based on what we know before observing the actual state, assign values between 0 and 1 to each possible state: p = (p_1, p_2, ..., p_S).
- The amount of new information that we will learn if state i occurs is I_i = -ln(p_i).
- The average amount of new information that we will learn is

I = \sum_{i=1}^{S} p_i I_i = \sum_{i=1}^{S} p_i\left(-\ln p_i\right) = -\sum_{i=1}^{S} p_i \ln p_i

This is the information entropy.

Information content and uncertainty
- New information = information that we don't yet possess = uncertainty.
- Information entropy measures the amount of new information we will gain once we learn the truth.
- Information entropy therefore measures the amount of information that we don't yet possess.
- Information entropy is a measure of the degree of uncertainty that we have about the state of a phenomenon.
- Maximum uncertainty = maximum entropy.

A betting game: game A
I bring you, blindfolded, to a field outside Sherbrooke. I tell you that there is a plant at your feet that belongs to either species A or species B. You must assign probabilities to the two possible states (species A or B) and will receive that proportion of $1,000,000 once you learn the species name. What is your answer? With p = (0.5, 0.5):

I = -\sum_{i=1}^{S} p_i \ln p_i = -(0.5\ln 0.5 + 0.5\ln 0.5) = 0.69

A betting game: game B
A new game with some new clues: (i) the site is a former cultivated field that was abandoned by a farmer last year; (ii) species A is an annual herb and species B is a climax tree. You must again assign probabilities to the two possible states and will receive that proportion of $1,000,000. What is your answer? With, say, p = (0.9, 0.1):

I = -\sum_{i=1}^{S} p_i \ln p_i = -(0.9\ln 0.9 + 0.1\ln 0.1) = 0.33

If you changed your bet in this second game, then these new clues provided you with some new, but incomplete, information before you learned the answer.
[Figure: entropy as a function of p(species A); maximum entropy (maximum uncertainty) at p = (0.5, 0.5); with p(annual herb) = 0.9 and p(climax tree) = 0.1, the drop from the maximum is the amount of information contained in the ecological clues, and the remaining entropy is the amount of uncertainty left given those clues.]

Relative entropy (Kullback-Leibler divergence)
Relative to a "prior" or reference distribution q:

-\sum_{i=1}^{S} p_i \ln\left(\frac{p_i}{q_i}\right)

Let q_i = 1/S, with S the fixed number of unordered, discrete states, so that q is a uniform distribution. Then

-\sum_{i=1}^{S} p_i \ln\left(\frac{p_i}{1/S}\right) = -\sum_{i=1}^{S} p_i \ln p_i - \ln S = -\sum_{i=1}^{S} p_i \ln p_i + \text{constant}

So, given a uniform prior, information entropy equals relative entropy up to a constant, and maximizing relative entropy = maximizing entropy.
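The two betting games and the relative-entropy definition are easy to reproduce numerically. A minimal sketch in base R; the 0.9/0.1 bet is the hypothetical answer used above.

## Shannon entropy (in nats) and Kullback-Leibler divergence of p from q;
## the KL divergence is the negative of the relative entropy maximised later.
entropy <- function(p) -sum(p * log(p))
kl      <- function(p, q) sum(p * log(p / q))

entropy(c(0.5, 0.5))          # game A: maximum uncertainty, ln(2) ~ 0.69
entropy(c(0.9, 0.1))          # game B: ~0.33 nats of uncertainty remain
kl(c(0.9, 0.1), c(0.5, 0.5))  # ~0.37 nats: the information in the clues,
                              # i.e. the drop in entropy from game A to game B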
The maximum entropy formalism in an ecological context: a three-step program
(Jaynes, E.T. 2003. Probability theory: the logic of science. Cambridge University Press.)

Step 1: specify the relative abundances in the regional metacommunity (the prior distribution).
Specify a reference (prior) distribution q that encodes what you know about the relative abundances of each species in the species pool before obtaining any information about the local community.
- "All I know is that there are S species in the metacommunity": q_i = 1/S.
- "All I know is that there are S species in the metacommunity and I know their relative abundances in this metacommunity": q_i = those metacommunity relative abundances.

Step 2: quantify what we know about trait-based community assembly in the local community.
- "I have measured the average values of my functional traits (the community-weighted trait values)": for example \bar{t}_1 = 23.3, \bar{t}_2 = 147.1, etc.
- Or: "I have measured the environmental conditions, and I know the function linking these environmental conditions to the community-weighted trait values": for example the regression y = 2.5 + 0.23x of a community-weighted trait (y) on an environmental variable (x).

Step 3: choose a probability distribution that agrees with what we know, but that doesn't include any more information (i.e. don't lie).
Choose values of p that agree with what we know:

\sum_{i=1}^{S} p_i = 1,\qquad \bar{t}_1 = \sum_{i=1}^{S} p_i t_{i1} = 23.3,\qquad \bar{t}_2 = \sum_{i=1}^{S} p_i t_{i2} = 147.1,\ \text{etc.}

and that maximize the remaining uncertainty

-\sum_{i=1}^{S} p_i \ln\left(\frac{p_i}{q_i}\right)

The solution is a general exponential distribution:

p_i = \frac{q_i\, e^{-\sum_{j} \lambda_j t_{ij}}}{\sum_{m=1}^{S} q_m\, e^{-\sum_{j} \lambda_j t_{mj}}}

where q_i is the prior probability of species i, t_{ij} is the value of trait j for species i, and λ_j is the amount by which a one-unit increase in trait j produces a proportional change in relative abundance (p_i).

Practical considerations
Except for maxent models with only a few constraints, we need numerical methods in order to fit them to data. I use a proportionality between the maximum likelihood of a multinomial distribution and the λ values of the solution to the maxent problem (Improved Iterative Scaling), available in the maxent() function of the FD package in R.
Della Pietra, S., Della Pietra, V. & Lafferty, J. 1997. Inducing features of random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence 19:1-13.
Laliberté, E. & Shipley, B. 2009. FD: measuring functional diversity from multiple traits, and other tools for functional ecology. R package, Vienna, Austria.

By permuting the trait vectors relative to the species' observed relative abundances, one can develop permutation tests of significance of model fit.
Shipley, B. 2010. Ecology 91:2794-2805.
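The exponential solution is straightforward to compute once the λ values are known. A minimal sketch in base R with a hypothetical four-species pool; a likelihood-based sketch of estimating λ itself appears at the end of the seminar.

## Predicted local relative abundances from the CATS solution, given a prior q,
## a species x trait matrix and trial values of lambda (all numbers hypothetical).
cats_p <- function(lambda, traits, q) {
  w <- q * exp(-as.vector(traits %*% lambda))  # q_i * exp(-sum_j lambda_j * t_ij)
  w / sum(w)                                   # normalise so that the p_i sum to 1
}

traits <- cbind(t1 = c(21.0, 25.2, 19.8, 27.5),
                t2 = c(150.3, 140.1, 160.7, 135.2))
q      <- c(0.40, 0.30, 0.20, 0.10)            # metacommunity relative abundances
lambda <- c(t1 = -0.20, t2 = 0.01)             # trial selection coefficients

p <- cats_p(lambda, traits, q)
p                        # predicted local relative abundances
sum(p * traits[, "t1"])  # implied community-weighted mean of trait 1

In effect, fitting reverses this calculation: the maxent() function mentioned above searches for the λ values whose implied community-weighted means match the observed constraints.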
CATS in practice
There are now many empirical applications of this model in many places around the world, applied at many geographical scales. Two examples follow.

First example: understory herbs of ponderosa pine forests
Shipley, B. et al. 2011. A strong test of a maximum entropy model of trait-based community assembly. Ecology 92:507-517. (Daniel Laughlin's PhD thesis.)
96 1-m2 quadrats containing the understory herbaceous plants of ponderosa pine forests (Arizona, USA). The quadrats were distributed across seven permanent sites within a 120 km2 landscape between 2000 and 2500 m altitude.

Available information
- 12 environmental variables measured in each quadrat
- 20 functional traits measured per species
- the relative abundance of each species in each quadrat

Results
- If a species is present and not rare (>10% relative abundance), CATS predicts its abundance well.
- If a species is present but rare (<10%), CATS cannot distinguish degrees of rarity.
- If a species is absent, CATS will predict it to be rare.
- Significant predictive ability is already reached with 3 traits; after about 7 traits, additional traits contribute mostly redundant information.
[Figure: predicted versus observed relative abundances, on an arithmetic scale (r = 0.97) and on a logarithmic scale (r = 0.58).]

Prediction from the environment alone
If we only know the environmental conditions of a site, and the general relationship between community-weighted traits and the environment, how well can CATS do? For one site, the actual measured community-weighted trait value was 9.5; the value predicted from the general relationship Y = 2.5 + 0.73X, given that the environmental value is 7, is 7.61.
[Figure: community-weighted trait mean versus the environmental variable, with the general relationship Y = 2.5 + 0.73X.]

The best possible prediction given the environment would be obtained with 79 separate generalized additive (form-free) regressions - one for each species in the species pool - of relative abundance versus the environmental variables, i.e. gam(S ~ X) for each species S.
[Figure: abundance of a species S versus the environment X, fitted with gam(S ~ X).]

[Figure: explained variance (R²) versus the number of traits in the model, for three types of model: maxent using the observed community-weighted means, the 79 GAMs of abundance ~ environment, and maxent using community-weighted means predicted from the environment; significance assessed at p < 1/1000.]

Second example: tropical forests in French Guiana
Shipley, B., Paine, C.E.T. & Baraloto, C. 2012. Quantifying the importance of local niche-based and stochastic processes to tropical tree community assembly. Ecology 93:760-769.

The unified neutral theory of biodiversity and biogeography (Stephen Hubbell)
- The per capita probabilities of immigration from the metacommunity, and the per capita probabilities of survival and reproduction, are equal for all species (demographic neutrality).
- Subsequent population dynamics in the local community are determined purely by random drift.
- Local relative abundance ~ metacommunity relative abundance + drift.
- "Neutral prior" = the metacommunity relative abundances.

Three model fits:
1. Fit the model using the traits but a uniform prior (i.e. no effect of immigration from the metacommunity).
2. Fit the model with permuted traits but with the neutral prior (i.e. no effect of local trait-based selection, but a contribution from immigration).
3. Fit the model with the traits and with the neutral prior (i.e. local trait-based selection plus immigration from the metacommunity).

Partition the total explained variation into:
1. that due only to immigration;
2. that due only to local trait-based selection;
3. that due jointly to immigration and traits (traits correlated with metacommunity abundances);
4. unexplained variation, attributed to demographic stochasticity.
A numerical sketch of this partition follows below.

We did this at three spatial scales: 1, 0.25 and 0.01 ha.
[Figure 1: proportion of the total information explained by trait-based selection, dispersal limitation from the metacommunity, their joint (traits x dispersal) component, and demographic stochasticity, as a function of spatial scale (ha); panel (a) site-level trait means, panel (b) metacommunity trait means.]
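The partition itself is ordinary commonality partitioning of the information explained by the three fits. A minimal sketch in base R, using hypothetical explained-information values; the real values come from the fitted CATS models in Shipley, Paine & Baraloto (2012).

## Hypothetical proportions of information explained by each of the three fits.
r2_traits_uniform <- 0.42  # fit 1: traits + uniform prior (no immigration)
r2_perm_neutral   <- 0.35  # fit 2: permuted traits + neutral prior (no selection)
r2_traits_neutral <- 0.58  # fit 3: traits + neutral prior (both processes)

pure_traits    <- r2_traits_neutral - r2_perm_neutral    # local trait-based selection only
pure_dispersal <- r2_traits_neutral - r2_traits_uniform  # immigration (dispersal) only
joint          <- r2_traits_uniform + r2_perm_neutral - r2_traits_neutral  # shared component
stochastic     <- 1 - r2_traits_neutral                  # demographic stochasticity

c(traits = pure_traits, dispersal = pure_dispersal, joint = joint, stochastic = stochastic)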
Practical considerations: how do we fit this model to empirical data?

A trick involving likelihood
Imagine a total of A independent allocations of individuals, or of units of biomass, to the S species in the species pool, with a_i the number of independent allocations to species i. The probability of the allocation vector a is multinomial:

P(a; A, p) = \frac{A!}{\prod_{i=1}^{S} a_i!}\ \prod_{i=1}^{S} p_i^{a_i}(\lambda, T_i, q_i)

Here p_i(λ, T_i, q_i) is the probability of a single allocation going to species i (i.e. the probability of sufficient resources being captured to produce one individual, or one unit of biomass, of species i); it is a function of the species' traits (T_i), of the strength of selection on those traits (λ), and of its metacommunity abundance (q_i).

The likelihood is

L(p; a, A) = \frac{A!}{\prod_{i=1}^{S} a_i!}\ \prod_{i=1}^{S} p_i^{a_i}(\lambda, T_i, q_i) = C \prod_{i=1}^{S} p_i^{a_i}(\lambda, T_i, q_i)

In practice we can never know this exactly, since neither individuals nor resources (biomass) are ever allocated independently.

Taking logarithms, dividing both sides by A, and rearranging, one obtains

\frac{\ln L(p; a, A) - \ln C}{A} = \sum_{i=1}^{S} \frac{a_i}{A} \ln p_i(\lambda, T_i, q_i) = \sum_{i=1}^{S} o_i \ln p_i(\lambda, T_i, q_i)

so that

\ln L(p; a, A) \sim \sum_{i=1}^{S} o_i \ln p_i(\lambda, T_i, q_i)

where o_i = a_i/A is the observed relative abundance of species i. Maximizing this maximizes the likelihood of the unknown multinomial distribution. But we already know that

p_i(\lambda, T_i, q_i) = \frac{q_i\, e^{-\sum_{j=1}^{k} \lambda_j t_{ij}}}{\sum_{m=1}^{S} q_m\, e^{-\sum_{j=1}^{k} \lambda_j t_{mj}}}

so, substituting,

\ln L(p \mid a, A) \sim -\sum_{i=1}^{S}\sum_{j=1}^{k} o_i \lambda_j t_{ij} - \sum_{i=1}^{S} o_i \ln\left(\sum_{m=1}^{S} q_m \exp\left(-\sum_{j=1}^{k} \lambda_j t_{mj}\right)\right) + \sum_{i=1}^{S} o_i \ln q_i

Choose the values of λ that maximize this function, given the vector of traits (t_i) for each species in the species pool and their metacommunity abundances (q_i).

The dual solution (Della Pietra et al. 1997): the values of λ that maximize the relative entropy in the maximum entropy formalism are the same values that maximize the likelihood of the multinomial distribution.
Della Pietra, S., Della Pietra, V. & Lafferty, J. 1997. Inducing features of random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence 19:1-13.
Improved Iterative Scaling algorithm → the maxent() function in the FD package of R.
Laliberté, E. & Shipley, B. 2009. FD: measuring functional diversity from multiple traits, and other tools for functional ecology. R package, Vienna, Austria.

Permutation tests for the CATS model
The fitted model chooses the values of p that agree with what we know,

\sum_{i=1}^{S} p_i = 1,\qquad \bar{t}_j = \sum_{i=1}^{S} p_i t_{ij} = \sum_{i=1}^{S} o_i t_{ij},

and that maximize the remaining uncertainty

-\sum_{i=1}^{S} p_i \ln\left(\frac{p_i}{q_i}\right)

H0: the trait values (t_ij) are independent of the observed relative abundances (o_i).

1. Calculate f(o, p, q) = \sum_{i=1}^{S} o_i \ln\left(p_i(\lambda, T_i, q)/q_i\right); the larger the value, the more the trait-based model improves the fit over the prior.
2. Randomly permute the vector of trait values (T*_i) among the species, so that the traits are independent of the observed relative abundances.
3. Calculate f^* = \sum_{i=1}^{S} o_i \ln\left(p_i(\lambda, T^*_i, q)/q_i\right).
4. Repeat steps 2 and 3 a large number of times.
5. Count the proportion of times that f^* > f. This is an estimate of the null probability.

Shipley, B. 2010. Ecology 91:2794-2805. Implemented in the maxent.test() function of the FD package.
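To make the dual approach concrete, here is a minimal, self-contained sketch in base R: λ is estimated by maximizing \sum_i o_i \ln p_i(\lambda, T, q) with a general-purpose optimiser (optim with BFGS, rather than the Improved Iterative Scaling used by maxent()), and a small permutation test is built on top of it. All data are hypothetical; maxent() and maxent.test() in the FD package are the production implementations.

## Hypothetical pool of 8 species with 2 traits, a uniform prior and observed
## relative abundances; all numbers are illustrative only.
set.seed(1)
S    <- 8
Tmat <- cbind(sla = rnorm(S, 20, 4), height = rnorm(S, 50, 10))
q    <- rep(1 / S, S)
o    <- c(0.30, 0.22, 0.15, 0.12, 0.09, 0.06, 0.04, 0.02)

## CATS prediction for given lambda (same exponential form as above).
p_cats <- function(lambda, Tmat, q) {
  w <- q * exp(-as.vector(Tmat %*% lambda))
  w / sum(w)
}

## Negative scaled log-likelihood, -sum_i o_i ln p_i, to be minimised over lambda.
negloglik <- function(lambda, Tmat, q, o) -sum(o * log(p_cats(lambda, Tmat, q)))

fit_cats <- function(Tmat, q, o)
  optim(rep(0, ncol(Tmat)), negloglik, Tmat = Tmat, q = q, o = o, method = "BFGS")

fit  <- fit_cats(Tmat, q, o)
lam  <- fit$par                 # estimated selection coefficients (lambda)
phat <- p_cats(lam, Tmat, q)    # predicted local relative abundances

## Permutation test: f = sum_i o_i ln(p_i / q_i), recomputed with trait rows
## shuffled among species so that traits are independent of the abundances.
f_stat <- function(p, q, o) sum(o * log(p / q))
f_obs  <- f_stat(phat, q, o)

nperm  <- 199
f_null <- replicate(nperm, {
  Tperm <- Tmat[sample(S), , drop = FALSE]
  f_stat(p_cats(fit_cats(Tperm, q, o)$par, Tperm, q), q, o)
})
p_value <- (sum(f_null >= f_obs) + 1) / (nperm + 1)  # estimated null probability
p_value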