Integrating Macroecological Metrics and Community Taxonomic Structure
John Harte, Andrew Rominger, Wenyu Zhang
SUPPLEMENTARY MATERIAL
Here we fill in mathematical details of how the results in the third column of Table 1 are derived
from the definitions in Table 2. Then we briefly discuss two possible extensions of the theory
presented here. For completeness, however, and at the risk of repetition, we first summarize the
ideas that lead to the constraints and defining equations for the metrics in Table 2.
Constraints and Defining Equations
The Extended Ecological Structure Function. An “ecological structure function”, denoted
R(n, ε|S0, N0, E0), is the core of ASNE (Harte et al., 2008). R is a joint conditional distribution
over abundance (n) and metabolic rate (ε) defined so that R·dε is the probability that if a species
is picked at random from the species pool, then it has abundance n, and if an individual is picked
at random from that species, then its metabolic energy requirement is in the interval (ε, ε + dε).
To construct AGSNE, we augment the list of state variables, A0, S0, N0, E0, by adding G0, the
number of genera in area A0. In analogy to R, a new joint, conditional probability distribution,
Q(m,n,ε|G0, S0, N0, E0), can be defined by:
Pick a genus at random from the pool of genera; then Qdε is the probability it has m species
and, if you pick one of those species, that it has n individuals, and that if you pick one of
those individuals from that species, that it has metabolic rate in the interval (ε, ε + dε).
Note that Q, the ecological structure function in the extended theory, is a function of the discrete
variables, m and n, and a continuous variable ε. For notational simplicity we will use the term
‘distribution’ regardless of whether the independent variable is discrete or continuous, with the
understanding that in the latter case a probability density function is intended. Also, for
notational convenience, in what follows we replace integrals over ε with sums over discrete
values of ε, but in the actual calculations we use integrals over continuous values of ε. As in
ASNE, we define the unit of energy such that ε = 1 is the lowest metabolic rate among the N0
individuals. We leave the limits off of summations, which are understood to range from 1 to S0,
N0, E0 for m, n, and ε respectively. Finally we do not show the state variables on which
distributions are contingent unless they are needed for clarity.
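The Constraints. The constraints on Q follow immediately from the definitions of the state
variables and of Q and are listed in Table 2.
$$\langle m \rangle = \frac{S_0}{G_0} = \sum_{m,n,\varepsilon} m\,Q(m,n,\varepsilon) \tag{S-1}$$
$$\langle n_G \rangle = \frac{N_0}{G_0} = \sum_{m,n,\varepsilon} m\,n\,Q(m,n,\varepsilon) \tag{S-2}$$
$$\langle \varepsilon_G \rangle = \frac{E_0}{G_0} = \sum_{m,n,\varepsilon} m\,n\,\varepsilon\,Q(m,n,\varepsilon) \tag{S-3}$$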
Here < > indicates expectation value, <m> is the average number of species per genus, <nG> is
the average number of individuals per genus and <εG> is the average metabolic rate per genus. In
similar notation, <nS> = N0/S0 = (G0/S0) <nG>, and <ε> = E0/N0 = (G0/N0)<εG>.
The Metrics. The MaxEnt solution for the AGSNE ecological structure function, Q, is readily
found using the same methods as in Harte et al. (2008) and explained in more detail in Harte
(2011):
$$Q(m,n,\varepsilon) = \frac{1}{Z(\lambda_1,\lambda_2,\lambda_3)}\, e^{-\lambda_1 m}\, e^{-\lambda_2 m n}\, e^{-\lambda_3 m n \varepsilon}. \tag{S-4}$$
Z, a normalizing factor, is evaluated from
$$Z(\lambda_1,\lambda_2,\lambda_3) = \sum_{m,n,\varepsilon} e^{-\lambda_1 m}\, e^{-\lambda_2 m n}\, e^{-\lambda_3 m n \varepsilon}. \tag{S-5}$$
Macroecological metrics extended to higher taxa describe the probability distributions of
metabolic energy rates, abundances, and species richness. Each metric is obtained directly from
Q either as a marginal distribution, or a conditional distribution obtained as the ratio of other
metrics in conformity with the identity: P(x|y) = P(x,y)/P(y).
The distribution of species richness, m, over genera, Γ(m), is given by:
$$\Gamma(m) = \sum_{n,\varepsilon} Q(m,n,\varepsilon). \tag{S-6}$$
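The distribution of abundances over species, φ(n), is given by:
$$\varphi(n) = \frac{\sum_{m,\varepsilon} m\,Q(m,n,\varepsilon)}{\sum_{m,n,\varepsilon} m\,Q(m,n,\varepsilon)} = \frac{G_0}{S_0}\sum_{m,\varepsilon} m\,Q(m,n,\varepsilon) \tag{S-7}$$
where the second equality follows from Eq. S-1.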
The distribution of abundances over species belonging to a genus with m species, φ(n|m), is
given by:
$$\varphi(n|m) = \frac{\sum_{\varepsilon} Q(m,n,\varepsilon)}{\sum_{n,\varepsilon} Q(m,n,\varepsilon)}. \tag{S-8}$$
Using Eq. S-6, this can be re-expressed as:
$$\varphi(n|m) = \frac{\sum_{\varepsilon} Q(m,n,\varepsilon)}{\Gamma(m)}. \tag{S-9}$$
For later use we note that the average abundance of a species in any genus with m species is
given by
$$\langle n|m \rangle = \frac{\sum_{n,\varepsilon} n\,Q(m,n,\varepsilon)}{\Gamma(m)}. \tag{S-10}$$
The distribution of metabolic rates over all individuals, Ψ(ε), is given by:
$$\Psi(\varepsilon) = \frac{\sum_{m,n} m\,n\,Q(m,n,\varepsilon)}{\sum_{m,n,\varepsilon} m\,n\,Q(m,n,\varepsilon)} = \frac{G_0}{N_0}\sum_{m,n} m\,n\,Q(m,n,\varepsilon) \tag{S-11}$$
where the second equality follows from Eq. S-2.
The distribution of metabolic rates over all individuals in a species with abundance n that is in a
genus with m species is given by:
$$\Theta(\varepsilon|m,n) = \frac{Q(m,n,\varepsilon)}{\sum_{\varepsilon} Q(m,n,\varepsilon)}. \tag{S-12}$$
Using Eq. S-9, this can be re-expressed as
$$\Theta(\varepsilon|m,n) = \frac{Q(m,n,\varepsilon)}{\Gamma(m)\,\varphi(n|m)}. \tag{S-13}$$
The ratios of state variables that appear in front of some of the summation signs above, and the
factors of m or mn that appear in the summands, arise from the way Q is defined; derivations
follow closely the example given in Box 7.2 of Harte (2011).
The last metric we introduce is an expression for the distribution of metabolic rates, ε, over the
individuals in a species selected at random from the pool of all species in genera with m species:
$$\xi(\varepsilon|m) = \sum_{n} \Theta(\varepsilon|m,n)\,\varphi(n|m). \tag{S-14}$$
The λi are Lagrange multipliers that are determined numerically from the values of the state
variables.
Deriving the Lagrange multipliers and the results in column 3 of Table 1
Equations (S-1 - S-14) are exact and can be solved numerically, but with some approximations
that are justified for most data sets we have examined, we can derive useful closed-form
expressions for the metrics from them. Upon inspection, these expressions permit an
understanding of the nature of the predicted patterns. The first assumption is that the number of
genera in the data set is sufficiently large so that terms of order exp(-G0) can be ignored
compared to terms of order 1. In practice, G0 > 5 (exp(-G0) < 0.01) is adequate. With that
assumption alone, most of the equations can be solved analytically but some of the resulting
expressions for the metrics are quite complicated. Further simplification occurs if E0 >> N0 >>
S0 >> G0 and N0 >> S0G0, where in practice the double inequality means a factor of at least five
difference between left and right hand sides. In the results shown below, we use an equal sign (=)
to indicate that only the first assumption (large enough G0) is needed, and an inexact equality
sign (≈) if the inequalities above are assumed as well. In every census data set we use here for
theory testing these assumptions are justified, and in most data sets we have encountered they are
also satisfied.
The mathematical steps that we do not describe here are tedious but straightforward, involving
nothing more than summations, integrations, and Taylor series expansions. Throughout, we
make frequent use of:
$$\sum_{m=1}^{M} \frac{e^{-\lambda m}}{m} = \ln\!\left(\frac{1}{1-e^{-\lambda}}\right) \approx \ln\!\left(\frac{1}{\lambda}\right) \tag{S-15}$$
where the equality is strictly correct only as exp(-λM) → 0 and the approximate equality is valid
if λ << 1. In our use of Eq. S-15, λ will be of order G0/S0 and M will equal S0 and so exp(-λM) is
of order exp(-G0), which we are assuming is very small.
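The two conditions attached to Eq. S-15 can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the values of λ and M are illustrative choices, not quantities taken from the paper.

```python
import math


def truncated_sum(lam: float, M: int) -> float:
    """Left-hand side of Eq. S-15: sum over m = 1..M of exp(-lam*m)/m."""
    return sum(math.exp(-lam * m) / m for m in range(1, M + 1))


def closed_form(lam: float) -> float:
    """Middle expression of Eq. S-15: ln(1/(1 - exp(-lam)))."""
    return -math.log1p(-math.exp(-lam))


def small_lambda_form(lam: float) -> float:
    """Right-hand expression of Eq. S-15: ln(1/lam)."""
    return -math.log(lam)


if __name__ == "__main__":
    lam, M = 0.05, 400  # illustrative values, chosen so that exp(-lam*M) = exp(-20)
    print(truncated_sum(lam, M), closed_form(lam), small_lambda_form(lam))
```

With exp(-λM) tiny, the first two values should agree closely, while truncating the sum at a small M (so that exp(-λM) is not small) should leave a visible gap.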
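The summation and integration over n and ε in Eq. S-5 are readily carried out and result in:
$$Z(\lambda_1,\lambda_2,\lambda_3) = \sum_{m,n} \frac{e^{-m(\lambda_1+\beta n)}}{\lambda_3 m n} = \frac{1}{\lambda_3}\sum_{m} \frac{e^{-\lambda_1 m}}{m}\,\ln\!\left(\frac{1}{1-e^{-\beta m}}\right) \tag{S-16}$$
where β ≡ λ2 + λ3.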
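As a sanity check on Eq. S-16, the double sum over m and n can be compared with the form in which the n-sum has been carried out using Eq. S-15. This is a sketch only: both functions return the product Zλ3 (so λ3 itself is not needed), and the parameter values in the test are arbitrary illustrative numbers.

```python
import math


def z_lambda3_double_sum(lam1: float, beta: float, M: int, N: int) -> float:
    """Z*lambda3 from the middle expression of Eq. S-16 (explicit sum over m and n)."""
    total = 0.0
    for m in range(1, M + 1):
        for n in range(1, N + 1):
            total += math.exp(-m * (lam1 + beta * n)) / (m * n)
    return total


def z_lambda3_single_sum(lam1: float, beta: float, M: int) -> float:
    """Z*lambda3 from the right-hand expression of Eq. S-16 (n summed in closed form)."""
    total = 0.0
    for m in range(1, M + 1):
        log_term = -math.log1p(-math.exp(-beta * m))  # ln(1/(1 - exp(-beta*m)))
        total += math.exp(-lam1 * m) / m * log_term
    return total
```

The two agree only when N is large enough that exp(-βmN) is negligible for every m, which is the same condition under which the equality in Eq. S-15 holds.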
To obtain expressions for the Lagrange multipliers as functions of the state variables, we have to
evaluate Eqs. S-1 – S-3. From Eq. S-1:
$$\langle m \rangle = \frac{S_0}{G_0} = \frac{1}{Z\lambda_3}\sum_{m} e^{-\lambda_1 m}\,\ln\!\left(\frac{1}{1-e^{-\beta m}}\right) \tag{S-17}$$
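while, from Eq. S-2:
$$\langle n_G \rangle = \frac{N_0}{G_0} = \sum_{m,n} \frac{e^{-m(\lambda_1+\beta n)}}{Z\lambda_3} = \frac{1}{Z\lambda_3}\sum_{m} \frac{e^{-\lambda_1 m}\,e^{-\beta m}}{1-e^{-\beta m}} \tag{S-18}$$
From Eq. S-3, and doing the integral over ε, we have:
$$\langle \varepsilon_G \rangle = \frac{E_0}{G_0} = \sum_{m,n} \frac{e^{-m(\lambda_1+\beta n)}}{Z}\left(\frac{1}{\lambda_3^{2} m n} + \frac{1}{\lambda_3}\right) = \frac{1}{\lambda_3} + \frac{N_0}{G_0} \tag{S-19}$$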
The terms 1/λ3 and N0/G0 in the final equality arise from use of Eqs. S-16 and S-18, respectively.
Hence, the third Lagrange multiplier is given by
$$\lambda_3 = \frac{G_0}{E_0 - N_0}. \tag{S-20}$$
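For readers who want the intermediate steps behind Eqs. S-19 and S-20, the ε-integral can be sketched as follows. The shorthand a ≡ λ3mn is introduced here purely for convenience and is not part of the original notation.

```latex
\begin{align*}
\int_{1}^{\infty} \varepsilon\, e^{-a\varepsilon}\, d\varepsilon
  &= e^{-a}\left(\frac{1}{a} + \frac{1}{a^{2}}\right), \qquad a \equiv \lambda_3 m n, \\
\sum_{m,n} \frac{m n\, e^{-\lambda_1 m}\, e^{-\lambda_2 m n}}{Z}\, e^{-a}\left(\frac{1}{a} + \frac{1}{a^{2}}\right)
  &= \sum_{m,n} \frac{e^{-m(\lambda_1 + \beta n)}}{Z}\left(\frac{1}{\lambda_3} + \frac{1}{\lambda_3^{2} m n}\right), \\
\frac{1}{\lambda_3}\sum_{m,n} \frac{e^{-m(\lambda_1 + \beta n)}}{Z \lambda_3 m n}
  &= \frac{1}{\lambda_3} \quad \text{(by Eq. S-16)}, \\
\sum_{m,n} \frac{e^{-m(\lambda_1 + \beta n)}}{Z \lambda_3}
  &= \frac{N_0}{G_0} \quad \text{(by Eq. S-18)}, \\
\frac{E_0}{G_0} = \frac{1}{\lambda_3} + \frac{N_0}{G_0}
  \;&\Longrightarrow\; \lambda_3 = \frac{G_0}{E_0 - N_0}.
\end{align*}
```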
To determine without further approximation the values of λ1 and β, Eqs. S-17 – S-19 have to be
evaluated numerically. If N0 >> S0 >> G0 >> 1, and in addition, N0 >> S0G0, then the
summation in Eq. S-16 can be approximated as:
$$Z\lambda_3 \approx \ln\!\left(\frac{1}{\beta}\right)\ln\!\left(\frac{1}{\lambda_1}\right). \tag{S-21}$$
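Under the same assumptions, Eq. S-17 simplifies to:
$$\langle m \rangle = \frac{S_0}{G_0} \approx \frac{\ln(1/\beta)}{Z\lambda_3\lambda_1} \approx \frac{1}{\lambda_1 \ln(1/\lambda_1)}, \tag{S-22}$$
and Eq. S-18 simplifies to:
$$\langle n_G \rangle = \frac{N_0}{G_0} \approx \frac{1}{\beta \ln(1/\beta)}. \tag{S-23}$$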
We note that from Eq. S-22, λ1 < G0/S0 << 1, from Eq. S-23, β < G0/N0 << 1, and because N0 >>
S0 by assumption, we have β << λ1. We also note that βm will be < 1 if S0G0 < N0 because m < S0.
We can use the more exact Eqs. S-16 – S-18 when testing theory, but Eqs. S-21 – S-23 provide a
way to obtain analytical initial guesses for λ1, β, and Z for numerical evaluation and can also be
used to provide initial insight into the behavior of the metrics derived below.
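One way to turn Eqs. S-20 to S-23 into starting values is sketched below. Eqs. S-22 and S-23 are transcendental in λ1 and β, so the sketch solves x·ln(1/x) = target by bisection on (0, 1/e), where that function is monotonically increasing. The function names and the state-variable values used in the test are hypothetical and do not come from the paper or its data sets.

```python
import math


def solve_x_log_inv_x(target: float) -> float:
    """Solve x*ln(1/x) = target for the small root in (0, 1/e) by bisection."""
    if not 0.0 < target < 1.0 / math.e:
        raise ValueError("target must lie strictly between 0 and 1/e")
    lo, hi = 0.0, 1.0 / math.e
    for _ in range(200):  # far more halvings than double precision needs
        mid = 0.5 * (lo + hi)
        if mid * math.log(1.0 / mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def initial_guesses(G0: float, S0: float, N0: float, E0: float):
    """Approximate (lambda1, beta, lambda3, Z) from the state variables."""
    lam3 = G0 / (E0 - N0)                                    # Eq. S-20
    lam1 = solve_x_log_inv_x(G0 / S0)                        # Eq. S-22
    beta = solve_x_log_inv_x(G0 / N0)                        # Eq. S-23
    z = math.log(1.0 / beta) * math.log(1.0 / lam1) / lam3   # Eq. S-21
    return lam1, beta, lam3, z
```

These values are only as good as the inequalities behind Eqs. S-21 to S-23; they are intended as seeds for a numerical solution of Eqs. S-17 and S-18, not as a replacement for it.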
Next, we evaluate the metrics Γ, φ, and Ψ. The sum and integral over n and ε in the derivation
of Γ(m) are straightforward, resulting in:
$$\Gamma(m) = \frac{e^{-\lambda_1 m}}{Z\lambda_3 m}\,\ln\!\left(\frac{1}{1-e^{-\beta m}}\right) \tag{S-24}$$
where we have assumed that βmN0 >> 1, which will always be true if G0 >> 1. If βm << 1,
which is the case if N0 >> G0S0, then Eq. S-21 is valid, and Eq. S-24 simplifies to:
$$\Gamma(m) \approx \frac{e^{-\lambda_1 m}}{m \ln(1/\lambda_1)}. \tag{S-25}$$
The difference between the m-dependence of the exact (Eq. S-24) and approximate (Eq. S-25)
distributions is that the former falls slightly more rapidly with increasing m because ln(1/(1-e^(-βm)))
is a slowly decreasing function of m. Numerical comparisons of Eqs. S-24 and S-25 for realistic
combinations of the state variables indicate that they differ by at most a few percent.
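A comparison of Eqs. S-24 and S-25 can be set up along the following lines. This is a sketch under stated assumptions: the exact form is normalized with Zλ3 computed from Eq. S-16 for the same λ1 and β, the sum over m is truncated at a user-chosen M, and the parameter values in the test are illustrative only.

```python
import math


def gamma_exact(lam1: float, beta: float, M: int) -> list:
    """Gamma(m) for m = 1..M from Eq. S-24, with Z*lambda3 taken from Eq. S-16."""
    weights = []
    for m in range(1, M + 1):
        log_term = -math.log1p(-math.exp(-beta * m))  # ln(1/(1 - exp(-beta*m)))
        weights.append(math.exp(-lam1 * m) / m * log_term)
    z_lam3 = sum(weights)
    return [w / z_lam3 for w in weights]


def gamma_approx(lam1: float, M: int) -> list:
    """Gamma(m) for m = 1..M from the approximate form, Eq. S-25."""
    norm = math.log(1.0 / lam1)
    return [math.exp(-lam1 * m) / (m * norm) for m in range(1, M + 1)]
```

Because the two forms differ only by the factor ln(1/(1-e^(-βm))) and a constant, the ratio of exact to approximate values is a monotonically decreasing function of m.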
Turning to the species abundance distribution, φ(n), the sum over m and integral over ε in Eq. S-7
result in:
$$\varphi(n) = \frac{G_0}{S_0}\,\frac{e^{-(\lambda_1+\beta n)}}{Z\lambda_3\, n\,[1-e^{-(\lambda_1+\beta n)}]}. \tag{S-26}$$
If the approximations leading to Eqs. S-21 - S-23 are valid, Eq. S-26 simplifies to:
$$\varphi(n) \approx \frac{\lambda_1 e^{-(\lambda_1+\beta n)}}{n \ln(1/\beta)\,(1-e^{-(\lambda_1+\beta n)})}. \tag{S-27}$$
For small n such that βn << λ1 << 1, which is roughly equivalent to n << N0/S0, Eq. S-27 can be
approximated by:
$$\varphi(n) \approx \frac{e^{-\beta n}}{n \ln(1/\beta)}. \tag{S-28}$$
Numerical evaluation of Eqs. S-26 and S-28 for realistic choices of state variables shows that the two
expressions differ by no more than 3 or 4% over the range of values of n. For much larger n,
such that βn >> λ1, or N0/G0 >> n >> N0/S0, Eq. S-27 is approximately:
$$\varphi(n) \approx \frac{\lambda_1 e^{-\beta n}}{n \ln(1/\beta)\,(1-e^{-\beta n})} \approx \frac{N_0}{G_0}\,\frac{\lambda_1 e^{-\beta n}}{n^{2}} \tag{S-29}$$
where the second approximate equality holds if βn << 1. In general, however, for large n the
expression in Eq. S-27 is needed because for n > N0/G0, βn is not small.
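The species abundance distribution of Eq. S-26 can be evaluated as in the sketch below. To keep the pieces mutually consistent, the sketch computes Zλ3 from Eq. S-16 and S0/G0 from Eq. S-17 for the supplied λ1 and β, rather than taking S0/G0 from data; that choice, the function names, and the test values are assumptions made for illustration.

```python
import math


def sad_exact(lam1: float, beta: float, M: int, N: int) -> list:
    """phi(n) for n = 1..N from Eq. S-26."""
    ms = range(1, M + 1)
    logs = [-math.log1p(-math.exp(-beta * m)) for m in ms]
    # Z*lambda3 from Eq. S-16 and <m> = S0/G0 from Eq. S-17.
    z_lam3 = sum(math.exp(-lam1 * m) / m * L for m, L in zip(ms, logs))
    mean_m = sum(math.exp(-lam1 * m) * L for m, L in zip(ms, logs)) / z_lam3
    phi = []
    for n in range(1, N + 1):
        x = math.exp(-(lam1 + beta * n))
        phi.append(x / (mean_m * z_lam3 * n * (1.0 - x)))
    return phi


def sad_small_n(n: int, beta: float) -> float:
    """Small-n log-series form, Eq. S-28."""
    return math.exp(-beta * n) / (n * math.log(1.0 / beta))
```

With M and N large enough that exp(-λ1·M) and exp(-β·N) are negligible, the values returned by `sad_exact` should sum to one, which gives a quick check on an implementation.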
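The distribution of abundances over the set of species in all the genera with m species, φ(n|m),
can also be evaluated from Eq. S-21, giving:
$$\varphi(n|m) = \frac{e^{-\beta m n}}{n \ln\!\left(\frac{1}{1-e^{-\beta m}}\right)} \approx \frac{e^{-\beta m n}}{n \ln\!\left(\frac{1}{\beta m}\right)}. \tag{S-30}$$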
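From Eq. S-30, the average abundance of the species in all the genera with m species is given by
$$\langle n|m \rangle = \sum_{n} n\,\frac{e^{-\beta m n}}{n \ln\!\left(\frac{1}{1-e^{-\beta m}}\right)} = \frac{e^{-\beta m}}{(1-e^{-\beta m})\,\ln\!\left(\frac{1}{1-e^{-\beta m}}\right)}. \tag{S-31}$$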
If βm << 1, which will be the case for all m if G0S0 << N0, or at least for small m if G0 << N0,
then Eq. S-31 can be approximated by
$$\langle n|m \rangle \approx \frac{e^{-\beta m}}{\beta m \ln\!\left(\frac{1}{\beta m}\right)} \approx \frac{N_0}{G_0 m} \tag{S-32}$$
where we have used Eq. S-23 and the inequality βm << 1 in the approximations in Eq. S-32.
To derive the variance of n at a given value of m, we need to calculate
$$\langle n^{2}|m \rangle = \sum_{n} n^{2}\,\frac{e^{-\beta m n}}{n \ln\!\left(\frac{1}{1-e^{-\beta m}}\right)} \tag{S-33}$$
which in the same approximation as above leads to
$$\mathrm{variance}(n|m) = \langle n^{2}|m \rangle - \langle n|m \rangle^{2} \approx \frac{N_0^{2}\,\ln(1/\beta)}{G_0^{2}\, m^{2}}. \tag{S-34}$$
From Eq. S-30 we can also derive an approximate value for the expected abundance of the
most abundant species in a genus with m species. To do this we first note that
$$\sum_{n=1}^{n_{max}} \varphi(n|m) = 1 - \frac{1}{2m}. \tag{S-35}$$
Here we have used the fact that on a rank-abundance curve, each of the m species has on average
a share of 1/m of total probability and thus the sum from nmax to N0 is 1/(2m). To approximate
the sum in Eq. S-35, we can no longer use Eq. S-15 because we cannot assume that
exp(-βmnmax) → 0. But we also cannot assume that βmnmax ≈ 0, in which case the summation
would yield a logarithm of nmax. Instead, we use a relatively accurate numerically-derived
approximation that is discussed in more detail in Harte (2011, Box C.1) and is valid if
βmnmax is of order 1:
$$\sum_{n=1}^{n_{max}} \frac{e^{-\beta m n}}{n} \approx 0.643\,\ln\!\left(\frac{n_{max}^{1.55}}{0.408 + (\beta m\, n_{max})^{1.55}}\right). \tag{S-36}$$
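The accuracy of the approximation in Eq. S-36 for a given parameter combination can be checked against the direct sum, as in this sketch. Here `a` stands for the product βm; the function names and the test values are illustrative and are not taken from the paper.

```python
import math


def partial_sum(a: float, n_max: int) -> float:
    """Direct evaluation of sum over n = 1..n_max of exp(-a*n)/n, with a = beta*m."""
    return sum(math.exp(-a * n) / n for n in range(1, n_max + 1))


def partial_sum_approx(a: float, n_max: int) -> float:
    """Right-hand side of Eq. S-36."""
    return 0.643 * math.log(n_max ** 1.55 / (0.408 + (a * n_max) ** 1.55))


def relative_error(a: float, n_max: int) -> float:
    """Relative difference between the approximation and the direct sum."""
    exact = partial_sum(a, n_max)
    return abs(partial_sum_approx(a, n_max) - exact) / exact
```

For example, `relative_error(0.01, 100)` probes the regime βm·nmax = 1 that the text identifies as the intended range of validity.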
Combining Eqs. S-30, S-35, and S-36, we arrive at
$$n_{max} \approx \frac{0.56}{\beta m\,(1-\beta m)^{0.643}}. \tag{S-37}$$
Because βm << 1, the m-dependence of this expression is ~ 1/m.
Turning to the distribution of metabolic rates, Ψ(ε), we simplify the notation by defining
γ(ε) ≡ λ2 +ελ3
(S-38)
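The summation over n in Eq. S-11 can be carried out, giving
$$\Psi(\varepsilon) = \frac{G_0}{N_0 Z}\sum_{m=1}^{S_0} \frac{m\,e^{-\lambda_1 m}\left(e^{-\gamma(\varepsilon) m} - e^{-\gamma(\varepsilon) m N_0} - (1-e^{-\gamma(\varepsilon) m})\,N_0\,e^{-\gamma(\varepsilon) m N_0}\right)}{(1-e^{-\gamma(\varepsilon) m})^{2}} \tag{S-39}$$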
The terms with exp(-mγ(ε)N0) can be neglected if λ2N0 >> 1, which is true if G0 >> 1. In that
case, Eq. S-39 simplifies to
$$\Psi(\varepsilon) = \frac{G_0}{N_0 Z}\sum_{m} \frac{m\,e^{-\lambda_1 m}\,e^{-\gamma(\varepsilon) m}}{(1-e^{-\gamma(\varepsilon) m})^{2}}. \tag{S-40}$$
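Using Eq. S-21, this can be approximated as:
$$\Psi(\varepsilon) \approx \frac{\beta\lambda_3}{\ln(1/\lambda_1)}\sum_{m} \frac{m\,e^{-(\lambda_1+\gamma(\varepsilon)) m}}{(1-e^{-\gamma(\varepsilon) m})^{2}}. \tag{S-41}$$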
Although the summation in Eq. S-40 or S-41 is not expressible in finite form, if γ(ε)m << 1 over
the range of m-values that contribute significantly to the sum, which roughly speaking will be the
case if ε << E0/S0G0, and β + λ1 << 1, then using Eqs. S-15 and S-22, this simplifies to
$$\Psi(\varepsilon) \approx \frac{\beta\lambda_3}{\gamma^{2}(\varepsilon)}. \tag{S-42}$$
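The approximate form in Eq. S-42 is simple enough to integrate by hand, which provides a check that it behaves as a probability density on ε ≥ 1. In the sketch below, the closed-form cumulative distribution 1 - β/γ(ε) is worked out here rather than quoted from the source, and the parameter values in the test are arbitrary.

```python
import math


def psi_approx(eps: float, lam2: float, lam3: float) -> float:
    """Eq. S-42: beta*lambda3/gamma(eps)^2, with gamma(eps) from Eq. S-38."""
    beta = lam2 + lam3
    gamma = lam2 + eps * lam3
    return beta * lam3 / gamma ** 2


def psi_approx_cdf(eps: float, lam2: float, lam3: float) -> float:
    """Integral of psi_approx from 1 to eps (derived for this sketch)."""
    beta = lam2 + lam3
    return 1.0 - beta / (lam2 + eps * lam3)


def midpoint_integral(f, a: float, b: float, steps: int) -> float:
    """Plain midpoint-rule quadrature, used only as an independent check."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))
```

Since the cumulative form tends to 1 as ε grows, the approximate density is normalized over ε from 1 to infinity.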
The distribution of metabolic rates over all individuals in a species with n individuals that is in a
genus with m species, as expressed in Eq. S-12, is readily evaluated:
$$\Theta(\varepsilon|m,n) = \lambda_3 m n\, e^{-\lambda_3 m n(\varepsilon-1)}. \tag{S-43}$$
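Finally, using Eqs. S-30 and S-43 we can evaluate Eq. S-14:
$$\xi(\varepsilon|m) = \frac{\lambda_3 m}{\ln\!\left(\frac{1}{1-e^{-\beta m}}\right)}\cdot\frac{e^{-m(\lambda_3(\varepsilon-1)+\beta)}}{1-e^{-m(\lambda_3(\varepsilon-1)+\beta)}}. \tag{S-44}$$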
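Eq. S-14 can also be evaluated by brute force, summing Eq. S-43 against the exact form of Eq. S-30 term by term. Each term is proportional to exp(-mn(λ3(ε-1)+β)), so the sum over n is a geometric series with ratio exp(-m(λ3(ε-1)+β)). The sketch below compares the direct sum with that geometric-series result; the function names and parameter values are illustrative assumptions.

```python
import math


def theta(eps: float, m: int, n: int, lam3: float) -> float:
    """Eq. S-43."""
    return lam3 * m * n * math.exp(-lam3 * m * n * (eps - 1.0))


def phi_cond(n: int, m: int, beta: float) -> float:
    """Exact form of Eq. S-30."""
    log_term = -math.log1p(-math.exp(-beta * m))  # ln(1/(1 - exp(-beta*m)))
    return math.exp(-beta * m * n) / (n * log_term)


def xi_direct(eps: float, m: int, beta: float, lam3: float, n_terms: int = 5000) -> float:
    """Eq. S-14 by direct summation over n = 1..n_terms."""
    return sum(theta(eps, m, n, lam3) * phi_cond(n, m, beta)
               for n in range(1, n_terms + 1))


def xi_closed(eps: float, m: int, beta: float, lam3: float) -> float:
    """Geometric-series evaluation of the same sum."""
    log_term = -math.log1p(-math.exp(-beta * m))
    r = math.exp(-m * (lam3 * (eps - 1.0) + beta))
    return lam3 * m / log_term * r / (1.0 - r)
```

The truncation `n_terms` needs to be large enough that exp(-βm·n_terms) is negligible; for the values used in the check this is comfortably satisfied.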
At large ε, this can be approximated by:
$$\xi(\varepsilon|m) \approx \frac{\lambda_3 m}{\ln\!\left(\frac{1}{\beta m}\right)}\, e^{-\lambda_3 m \varepsilon}. \tag{S-45}$$
We note that this is an increasing function of m for values of λ3m << 1, which is generally the
case for realistic values of the state variables.
Taking the mean of Eq. S-45, we obtain the approximate result:
$$\langle \varepsilon|m \rangle \approx \frac{1}{\lambda_3 m \ln\!\left(\frac{1}{\beta m}\right)} \tag{S-46}$$
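and for the variance we obtain:
$$\langle \varepsilon^{2}|m \rangle - \langle \varepsilon|m \rangle^{2} \approx \frac{1}{m^{2}\lambda_3^{2}\,\ln(1/\beta)}\left(2 - \frac{1}{\ln(1/\beta)}\right). \tag{S-47}$$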
In ASNE, the analog metric to ξ(ε|m) is defined in Table 1 in the text and is given by
Σn Θ(ε|n) φ(n). We note that this function was incorrectly defined and derived in Harte
(2011) and incorrectly defined and tested in Newman et al. (2015), cited in the reference section
of the text.
Two Possible Extensions
The genera-area relationship (GAR). In the ASNE version of METE, the species-area
relationship (SAR) is derived from the expression:
$$S(A) = S_0 \sum_{n_0} \varphi(n_0)\,[1 - \Pi(0|n_0, A, A_0)] \tag{S-48}$$
where Π(0|n0, A, A0) is the probability that if a species has n0 individuals in area A0, then it has 0
individuals in area A and S0 is the number of species in area A0. Both functions φ and Π are
predicted from the theory.
In the extended theory, the genera-area relationship (GAR) can be derived if Π(0|m, n0, A, A0) is
known. A genus will be found in an area A if at least one individual of at least one species in
that genus is found in area A. Hence we expect the GAR to be flatter than the SAR. We can
derive the GAR using the conditional abundance distribution φ(n|m) as a weighting function and
noting that a genus with m species will not be found in area A only if each of the m species in
that genus is not in A. For such a genus, the probability of not occurring in an area A is the
product of probabilities Π(0|m,n,A,A0) that each species in the genus does not occur in the area.
Determining the m-dependence of Π(0) and deriving the explicit functional form of the GAR
awaits further analysis.
Additional taxonomic categories. The methods used above to extend METE by adding an
additional taxonomic category, genus, can be generalized to an arbitrary number of nested
categories. For example, the natural extension of the entities R(n,ε|S0, N0, E0) in ASNE and
Q(m,n,ε|G0, S0, N0, E0) in AGSNE is, for AFGSNE, T(l,m,n,ε | F0, G0, S0, N0, E0) defined by
Pick a family at random from the pool of families; then T dε is the probability it has l
genera, and if you pick one of those genera at random, that it has m species, and if you
pick one of those species at random that it has n individuals, and if you pick one of those
individuals, it has a metabolic rate in the interval (ε, ε+dε).
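Paralleling Eq. S-4, the MaxEnt solution is of the form
$$T(l,m,n,\varepsilon) = \frac{e^{-\lambda_1 l}\, e^{-\lambda_2 l m}\, e^{-\lambda_3 l m n}\, e^{-\lambda_4 l m n \varepsilon}}{Z(\lambda_1,\lambda_2,\lambda_3,\lambda_4)}. \tag{S-49}$$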
Generalizing further, the constraints imposed by an entire taxonomic tree can be included within
the MaxEnt framework. We further note that the constraints could arise from knowledge of the
structure of the phylogenetic, rather than the taxonomic, tree. A full treatment of such
trees awaits further work.