ele12048-sup-0001-Supplementary-information

J. Chave – Pattern and scale in ecology
SUPPLEMENTARY INFORMATION
For “The problem of pattern and scale in ecology: what have we learned in 20 years?”
Jérôme Chave
SI1. Measuring the modularity of a network
The search for modules in complex systems has often been equated to that of ‘communities’, groups of actors who interact more than expected by chance (Freeman 1977). Although the theory has a long history (de Solla Price 1965, Freeman 1977), many methods to measure patterns of modularity have been developed in the wake of internet social networks in the early 2000s.
Modularity may be given a formal definition, as follows. In any network, biological or not, a module is defined as a relatively autonomous subset of components (nodes), such that any two components within the module are highly connected, whereas a component within the module is only loosely connected with any component outside the module. One measure of modularity, denoted by $Q$, measures the excess of links within modules relative to the number expected by chance, summed over all modules (Girvan & Newman 2002). We study a graph composed of $N$ nodes which are related by $K$ undirected links. If we assign each node to one of $M$ predefined communities, we may define the fraction $E_{ij}$ of links connecting nodes in community $i$ to nodes in community $j$ (such that $\sum_{i=1}^{M} \sum_{j=1}^{M} E_{ij} = 1$). Then, $a_i = \sum_{j=1}^{M} E_{ij}$ is the fraction of links joining at least one node in community $i$. If the links are randomly placed between nodes, irrespective of the community, then $E_{ij} = a_i a_j$. Modularity then is $Q = \sum_{i=1}^{M} (E_{ii} - a_i^2)$.
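As an illustration of the definition of modularity given above, the following minimal Python sketch computes Q for a small graph with a given assignment of nodes to communities. The function name `modularity` and the input format are hypothetical choices made for this example. The sketch also assumes that each link between two different communities is split equally between the (i, j) and (j, i) entries, so that the matrix of link fractions is symmetric and sums to 1; the text does not specify this detail.

```python
from collections import defaultdict


def modularity(edges, community):
    """Return Q = sum_i (E_ii - a_i^2) for an undirected graph.

    edges: iterable of (u, v) pairs, one pair per undirected link.
    community: dict mapping each node to its community label.
    """
    edges = list(edges)
    K = len(edges)
    E = defaultdict(float)  # E[(i, j)]: fraction of links between communities i and j
    for u, v in edges:
        i, j = community[u], community[v]
        # Assumption: each link contributes half its weight to (i, j) and half
        # to (j, i), so that E is symmetric and its entries sum to 1.
        E[(i, j)] += 0.5 / K
        E[(j, i)] += 0.5 / K
    a = defaultdict(float)  # a[i]: row sum of E for community i
    for (i, _), fraction in E.items():
        a[i] += fraction
    return sum(E[(i, i)] - a[i] ** 2 for i in a)


# Two triangles joined by a single link, each triangle being one community.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(modularity(edges, community))
```

In this example each triangle holds 3 of the 7 links and each row sum equals 1/2, so Q = 2 (3/7 - 1/4) = 5/14. Placing all nodes in a single community gives Q = 0.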
The concept of betweenness centrality is another useful way to quantify the notion of modularity (Freeman 1977, Girvan & Newman 2002). It measures how important an edge is as a connection route between a priori defined modules. Formally, let $\sigma_{ij}$ be the number of shortest paths between nodes $i$ and $j$, with $\sigma_{ij} \geq 1$. We focus on an edge $e$, and ask how many of the shortest paths connecting $i$ and $j$ go through $e$. Let $\sigma_{ij}^{e}$ be the number of these paths connecting $\{i, j\}$ and going through edge $e$. The betweenness of edge $e$, $B_e$, is defined as the sum, over all pairs $\{i, j\}$, of the ratio $\sigma_{ij}^{e}/\sigma_{ij}$ (Girvan & Newman 2002, Newman & Girvan 2004). Ma & Zeng (2003) used such an approach to understand the design principles and the patterns of organization in large metabolic networks, by identifying the most central metabolites of the networks based on their topology. Salathé & Jones (2010) investigated the spread of disease in networks with community structure, and found that community structure has a major impact on disease dynamics. Using the ideas of betweenness centrality, they found that in networks with strong community structure, immunization interventions targeted at individuals bridging communities are more effective than those simply targeting highly connected individuals.
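The definition of edge betweenness can be made concrete with a short brute-force Python sketch. It counts shortest paths with a breadth-first search from every node, then sums, for each edge, the fraction of shortest paths between each pair of nodes that use that edge. The function names `bfs_counts` and `edge_betweenness` are hypothetical, and this direct enumeration is chosen for readability only; it is not the algorithm used in the cited papers and would be too slow for large networks.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, source):
    """Distance and number of shortest paths from source to each reachable node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:          # first visit: v lies one step beyond u
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:  # u precedes v on some shortest path
                sigma[v] += sigma[u]
    return dist, sigma


def edge_betweenness(edges):
    """Return {edge: B_e}, summing over unordered node pairs {i, j}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    info = {s: bfs_counts(adj, s) for s in adj}
    betweenness = {}
    for u, v in edges:
        total = 0.0
        for i, j in combinations(adj, 2):
            dist_i, sigma_i = info[i]
            dist_j, sigma_j = info[j]
            if j not in dist_i:
                continue  # i and j are not connected
            through = 0
            # The edge may be crossed from u to v or from v to u.
            for x, y in ((u, v), (v, u)):
                if x in dist_i and y in dist_j and dist_i[x] + 1 + dist_j[y] == dist_i[j]:
                    through += sigma_i[x] * sigma_j[y]
            total += through / sigma_i[j]
        betweenness[(u, v)] = total
    return betweenness


# Two triangles joined by the bridge (2, 3).
bridged = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
B = edge_betweenness(bridged)
print(B[(2, 3)], B[(0, 1)])
```

In this example the bridge (2, 3) carries every shortest path between the two triangles (3 x 3 = 9 pairs), whereas the edge (0, 1) serves only the pair {0, 1}. This illustrates why edges with high betweenness tend to lie between modules.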
With the rapid development of high-throughput DNA sequencing technologies, it is becoming obvious that species need to be defined statistically. In the past, microbiologists have used 16S prokaryotic ribosomal DNA sequences to infer the similarity among microbial strains, and this has resulted in astounding discoveries, including altogether new lineages (Pace 1997). Early examples are the discovery of the cyanobacterium Prochlorococcus marinus in the late 1980s, and that of the picoplankton Ostreococcus tauri in 1994. Traditionally, techniques of species discovery have been predicated on the assumption that 16S DNA sequences with 97% similarity or more delimit species. However, the problem of sequence clustering is often far more complex than a simple 97% threshold (Meyer & Paulay 2005), as the rapid development of DNA-based identification of eukaryotic organisms and the large DNA barcoding programs for a wide range of clades have offered plenty of opportunities to verify (Moritz & Cicero 2004, Knowles & Carstens 2007).
Instead, techniques for modularity discovery should be implemented, and these are not unrelated to the multivariate analyses invented by Fisher (1936) to analyse Anderson’s (1936) Iris flower dataset. One intriguing technique to cluster data into natural groups (modules, operational taxonomic units, or others) is based on the physics of diluted magnets (Wiseman et al. 1998). Another approach makes use of so-called Markov clustering (Enright et al. 2002), which has been used to define molecular taxonomic units in fungi from sequence data (Zinger et al. 2009). With the explosion of environmental genomics approaches, there is no doubt that these sequence clustering techniques will become crucial to avoid statistical artefacts in the process of diversity discovery and community detection.
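To give a flavour of how Markov clustering operates on a similarity graph, here is a minimal Python sketch of its two alternating steps as they are commonly described: expansion (squaring a column-stochastic matrix, which spreads random-walk flow) and inflation (raising entries to a power and renormalising, which strengthens strong flows and weakens weak ones). The function names, the inflation value of 2, the fixed number of iterations and the threshold used to read off clusters are all assumptions made for this example. It is not the implementation of Enright et al. (2002), nor the pipeline of Zinger et al. (2009).

```python
def normalize_columns(M):
    """Rescale each column of a square matrix so that it sums to 1."""
    n = len(M)
    sums = [sum(M[r][c] for r in range(n)) for c in range(n)]
    return [[M[r][c] / sums[c] for c in range(n)] for r in range(n)]


def matmul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[r][k] * B[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


def markov_cluster(adjacency, inflation=2.0, iterations=50):
    """Cluster nodes of a graph given its symmetric adjacency (or similarity) matrix."""
    n = len(adjacency)
    # Self-loops are added so that every column has a positive sum.
    M = [[float(adjacency[r][c]) + (1.0 if r == c else 0.0) for c in range(n)]
         for r in range(n)]
    M = normalize_columns(M)
    for _ in range(iterations):
        M = matmul(M, M)                                   # expansion
        M = [[x ** inflation for x in row] for row in M]   # inflation
        M = normalize_columns(M)
    # Each row that still carries flow lists the members of one cluster.
    clusters = set()
    for r in range(n):
        members = frozenset(c for c in range(n) if M[r][c] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Two groups of three mutually similar sequences, with no similarity between groups.
triangles = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(markov_cluster(triangles))
```

For this deliberately simple input the two groups are recovered as separate clusters. With real sequence data the entries of the matrix would be pairwise similarity scores, and the outcome depends on the inflation parameter.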
References
Anderson, E. (1936). The species problem in Iris. Ann. Miss. Bot. Garden, 23, 457-509.
de Solla Price, D.J. (1965). Networks of scientific papers. Science, 149, 510-515.
Enright, A.J., van Dongen, S. & Ouzounis, C.A. (2002). An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res., 30, 1575-1584.
Fisher, R.A. (1936). The use of multiple measurements in taxonomic problems. Ann. Eugenics, 7, 179-188.
Freeman, L.C. (1977). A measure of centrality based on betweenness. Sociometry, 40, 35-41.
Girvan, M. & Newman, M.E.J. (2002). Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA, 99, 7821-7826.
Ma, H.-W. & Zeng, A.-P. (2003). The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics, 19, 1423-1430.
Meyer, C.P. & Paulay, G. (2005). DNA barcoding: error rates based on comprehensive sampling. PLoS Biol., 3, e422.
Newman, M.E.J. & Girvan, M. (2004). Finding and evaluating community structure in networks. Phys. Rev. E, 69, 026113.
Pace, N.R. (1997). A molecular view of microbial diversity and the biosphere. Science, 276, 734-740.
Salathé, M. & Jones, J.H. (2010). Dynamics and control of diseases in networks with community structure. PLoS Comp. Biol., 6, e1000736.
Wiseman, S., Blatt, M. & Domany, E. (1998). Superparamagnetic clustering of data. Phys. Rev. E, 57, 3767-3783.
Zinger, L., Coissac, E., Choler, P. & Geremia, R.A. (2009). Assessment of microbial communities by graph partitioning in a study of soil fungi in two alpine meadows. Appl. Environm. Microbiol., 75, 5863-5870.
SI2. Coarsening dynamics in discrete spatial models
Durrett & Levin (1994) famously illustrated the importance of being spatial and discrete by modelling three classical biological problems using four types of models, contrasting individual-based with mean-field models and spatial with non-spatial models. Here I offer a simplified version along the same lines of reasoning, using two spatial models that both exhibit domain growth dynamics. The spatial dynamics of these models is illustrated in Fig. 7 of the main text.
The voter model is the first model. It is usually defined using electors regularly placed on a grid who may each take one of two possible opinions (Clifford & Sudbury 1973), but here I translate the same model into a species coexistence model. In a square lattice, every site is occupied by an individual of one of two possible species. The total number of individuals is $N$. The dynamics describes how each site changes its species occupancy as a result of local interactions. During an infinitesimal time step, one randomly chosen individual dies, and it is replaced by the offspring of a randomly chosen neighbour (of four possible neighbours). A macroscopic time step consists of $N$ such draws (so that each individual is chosen once on average). As time $t$ goes by, clusters occupied by the same species grow in size, and the number of neighbouring pairs with different species (henceforth, zones of tension) declines proportionally to $1/\ln(t)$ (Fig. 7), as may be shown based on the duality of the voter model with a system of coalescing random walks. Now assume that, with some probability $m > 0$, a vacated site can be occupied by offspring produced anywhere in the lattice. With the addition of even a small amount of long-distance dispersal, the domain growth depicted in Fig. 7 is destroyed, and the model no longer shows any coarsening.
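The update rule of the voter model described above can be sketched in a few lines of Python. The lattice size, the periodic boundary conditions, the random initial configuration and all function names are assumptions made for this illustration; the text does not specify them, and this is not the code used to produce Fig. 7.

```python
import random

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def count_tensions(grid):
    """Number of nearest-neighbour pairs occupied by different species."""
    L = len(grid)
    return sum(
        (grid[x][y] != grid[(x + 1) % L][y]) + (grid[x][y] != grid[x][(y + 1) % L])
        for x in range(L) for y in range(L)
    )


def voter_sweep(grid, rng, m=0.0):
    """One macroscopic time step: N = L * L death and replacement events."""
    L = len(grid)
    for _ in range(L * L):
        x, y = rng.randrange(L), rng.randrange(L)          # individual that dies
        if rng.random() < m:
            # Long-distance dispersal: the parent is drawn anywhere in the lattice.
            px, py = rng.randrange(L), rng.randrange(L)
        else:
            # Local dispersal: the parent is one of the four nearest neighbours.
            dx, dy = rng.choice(NEIGHBOURS)
            px, py = (x + dx) % L, (y + dy) % L
        grid[x][y] = grid[px][py]


def simulate_voter(L=32, sweeps=50, m=0.0, seed=0):
    """Run the model from a random start; return the grid and the tension counts."""
    rng = random.Random(seed)
    grid = [[rng.randrange(2) for _ in range(L)] for _ in range(L)]
    history = [count_tensions(grid)]
    for _ in range(sweeps):
        voter_sweep(grid, rng, m)
        history.append(count_tensions(grid))
    return grid, history


grid, history = simulate_voter(L=16, sweeps=20, m=0.0, seed=1)
print(history[0], history[-1])
```

Running `simulate_voter` with `m = 0` and with a small positive `m` on a larger lattice, and comparing the two `history` lists, is one way to explore the contrast described in the text. Short runs on small lattices are noisy and should not be expected to reproduce the asymptotic behaviour.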
A second toy model is the majority model (known as the zero-temperature Ising model in physics, Krapivski et al. 2010). It is defined similarly to the voter model, but the local update rules are slightly different: all four neighbours send an offspring into the vacated cell, and the majority always wins. In case of a tie, the winner is chosen at random. In this model, isolated clusters are at a disadvantage and the number of zones of tension declines much faster than in the voter model, as $1/\sqrt{t}$. Importantly, the domain growth behaviour of this model is not altered by the addition of a probability $m > 0$ of long-distance dispersal (up to a point $m = m_c$ where a critical transition occurs).
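The majority rule differs from the voter rule only in how the vacated site is filled, as the following Python sketch shows. As before, the periodic boundaries and the function names are assumptions of this example. The way long-distance dispersal is implemented here (with probability m the site is filled by the offspring of a randomly chosen individual, otherwise by the local majority) is one possible reading of the text, not a confirmed specification.

```python
import random

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def majority_winner(neighbour_species, rng):
    """Species (0 or 1) held by most neighbours; a tie is broken at random."""
    ones = sum(neighbour_species)
    zeros = len(neighbour_species) - ones
    if ones == zeros:
        return rng.randrange(2)
    return 1 if ones > zeros else 0


def majority_sweep(grid, rng, m=0.0):
    """One macroscopic time step: N = L * L death and replacement events."""
    L = len(grid)
    for _ in range(L * L):
        x, y = rng.randrange(L), rng.randrange(L)          # individual that dies
        if rng.random() < m:
            # Assumed form of long-distance dispersal: a parent drawn anywhere.
            px, py = rng.randrange(L), rng.randrange(L)
            grid[x][y] = grid[px][py]
        else:
            around = [grid[(x + dx) % L][(y + dy) % L] for dx, dy in NEIGHBOURS]
            grid[x][y] = majority_winner(around, rng)


# A single individual of species 1 surrounded by species 0 cannot spread.
rng = random.Random(2)
lattice = [[0] * 8 for _ in range(8)]
lattice[4][4] = 1
majority_sweep(lattice, rng, m=0.0)
print(sum(map(sum, lattice)))
```

The example illustrates the disadvantage of isolated clusters noted in the text: every neighbour of the lone individual sees at least three individuals of the other species, so the minority species can only persist or disappear, never expand.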
These two models are lattice-based, and they could be approximated by non-spatial models, as in the example offered by Durrett & Levin (1994), or by following the methods developed in Bolker & Pacala (1997). In fact, for these particular cases, much is known about the macroscopic behaviour of the models (for overviews, see Hinrichsen 2000 and Krapivski et al. 2010).
References
Bolker, B. & Pacala, S.W. (1997). Using moment equations to understand stochastically driven spatial pattern formation in ecological systems. Theor. Pop. Biol., 52, 179-197.
Clifford, P. & Sudbury, A.W. (1973). A model for spatial conflict. Biometrika, 60, 581-588.
Durrett, R. & Levin, S.A. (1994). The importance of being discrete (and spatial). Theor. Pop. Biol., 46, 363-394.
Hinrichsen, H. (2000). Non-equilibrium critical phenomena and phase transitions into absorbing states. Adv. Phys., 49, 815-958.
Krapivski, P.R., Redner, S. & Ben-Naim, E. (2010). A Kinetic View of Statistical Physics. Cambridge University Press.