JEC_1795_sm_AppendixS3

Appendix S3 Supplemental methods, results and discussion pertaining to the assembly
of plant communities along an environmental gradient in the Serengeti ecosystem.
Supplementary Methods
Laboratory methods – Accessions of each species were collected from Serengeti and
voucher specimens were deposited at NHT; however, because numerous specimens of
each species were collected as part of the study, the voucher specimen deposited at NHT
may not be the same individual from which DNA was extracted. DNA was extracted
from leaves using the DNeasy Plant Mini Kit (Qiagen, Valencia, California, USA). The
polymerase chain reaction (PCR) was performed using Eppendorf Mastercycler gradient
or Mastercycler personal thermal cyclers in 50 µL volumes with the following reaction
components: 1 µL template DNA (~ 10-100 ng), 1X ExTaq buffer (PanVera / TaKaRa,
Madison, Wisconsin, USA), 200 µmol/L each dNTP, 3.0 mmol/L MgCl2, 0.1 µmol/L
each primer, and 1.25 units ExTaq (PanVera / TaKaRa). Reactions included bovine
serum albumin at a final concentration of 0.2 µg/µL, which is known to improve
amplification from difficult templates.
PCR cycling for the psbA-trnHGUG intergenic spacer region began with an initial
denaturation at 80°C for 5 min, followed by 35 cycles of denaturation at
94°C for 30 s, primer annealing at 50°C for 30 s, and extension of new strands at
72°C for 1 min. Each run ended with a final extension at 72°C for 10 min. The
following primers were used for both PCR and sequencing: trnHGUG (CGC GCA TGG
TGG ATT CAC AAT CC) (Tate & Simpson 2003) and psbA (GTT ATG CAT GAA
CGT AAT GCT C) (Sang et al. 1997). Because the average length of this region is
relatively short (~550 bp), only the trnH primer was used in sequencing in most cases.
PCR and external sequencing primers for the rpL16 intron, rpL16F71 (GCT ATG CTT
AGT GTG TGA CTC GTT G) and rpL16R1516 (CCC TTC ATT CTT CCT CTA TGT
TG), are from Shaw et al. (2005). Thermal cycling for this region began with
template DNA denaturation at 80°C for 5 min, followed by 30 cycles of
denaturation at 95°C for 1 min, primer annealing at 50°C for 1 min, a ramp
of 0.3°C/s to 65°C, and primer extension at 65°C for 4 min, and ended with a
final extension step of 5 min at 65°C.
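As a rough planning aid, the two cycling programs described above can be written down as plain data and their programmed times summed. This is only a sketch: the dictionary layout and the function name `program_minutes` are illustrative choices rather than part of the original protocol, and instrument ramping other than the stated 0.3°C/s ramp is ignored.

```python
# Sketch only: the two cycling programs as plain data (all times in seconds).
# The dictionary layout and names are illustrative, not from the protocol.

PSBA_TRNH = {
    "initial": 5 * 60,           # 80 °C for 5 min
    "cycles": 35,
    "per_cycle": [30, 30, 60],   # 94 °C, 50 °C, 72 °C
    "final": 10 * 60,            # 72 °C for 10 min
}

# Time needed to ramp from 50 °C to 65 °C at 0.3 °C/s.
RAMP_SECONDS = (65 - 50) / 0.3

RPL16 = {
    "initial": 5 * 60,                          # 80 °C for 5 min
    "cycles": 30,
    "per_cycle": [60, 60, RAMP_SECONDS, 240],   # 95 °C, 50 °C, ramp, 65 °C
    "final": 5 * 60,                            # 65 °C for 5 min
}


def program_minutes(program):
    """Return the programmed duration in minutes, ignoring other ramp times."""
    cycling = program["cycles"] * sum(program["per_cycle"])
    return (program["initial"] + cycling + program["final"]) / 60


if __name__ == "__main__":
    print("psbA-trnH:", program_minutes(PSBA_TRNH), "min")
    print("rpL16:", program_minutes(RPL16), "min")
```

Under these assumptions the psbA-trnH program amounts to 85 min and the rpL16 program to 215 min of programmed time; actual run times will be longer because of block heating and cooling.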
PCR products were checked on 1% agarose gels before being cleaned with ExoSAP-IT (USB, Cleveland, Ohio, USA). In a few species, poly-A/T regions prevented clean
sequencing of the rpL16 intron with the external primers. Additionally, the internal
sequencing primers of Zhang (2000) did not work in our hands; therefore, four internal
sequencing primers were created based on the alignment of all of the sequences in a
preliminary data set containing species from all three subfamilies. The sequences of the
internal primers were: rpL16A (ATA GTT GTA GCA ACT GCA), rpL16B (TGC AGT
TGC TAC AAC TAT), rpL16C (GAT TCA AAC CTA ACC ATT), and rpL16D (AAT
GGT TAG GTT TGA ATC). Primers rpL16A and rpL16C both sequence toward
rpL16R1516 and primers rpL16B and rpL16D sequence toward rpL16F71. All DNA
sequencing was performed with the ABI Prism BigDye Terminator Cycle Sequencing
Ready Reaction Kit, v. 3.1 (Perkin-Elmer / Applied Biosystems, Foster City, California,
USA) and electrophoresed and detected on an ABI Prism 3100 automated sequencer
(University of Tennessee Molecular Biology Resource Facility). Sequencher 4.7 (Gene
Codes) was used to edit the DNA strands. All sequences listed in Appendix 2 are being
deposited in GenBank (accession numbers are pending).
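The four internal rpL16 primers listed above appear to form two reverse-complement pairs (rpL16A with rpL16B, and rpL16C with rpL16D), which is consistent with each pair annealing at one site and reading in opposite directions. The following minimal sketch checks this from the printed sequences; the helper name `reverse_complement` is an illustrative choice.

```python
# Sketch: check that the internal rpL16 primers form reverse-complement pairs.
# Sequences are copied from the text with spaces removed.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


PRIMERS = {
    "rpL16A": "ATAGTTGTAGCAACTGCA",
    "rpL16B": "TGCAGTTGCTACAACTAT",
    "rpL16C": "GATTCAAACCTAACCATT",
    "rpL16D": "AATGGTTAGGTTTGAATC",
}

if __name__ == "__main__":
    for forward, reverse in (("rpL16A", "rpL16B"), ("rpL16C", "rpL16D")):
        paired = reverse_complement(PRIMERS[forward]) == PRIMERS[reverse]
        print(forward, reverse, "reverse complements:", paired)
```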
Data analysis and phylogenetic assessment – Alignment of DNA sequences was initially
performed with ClustalX (Thompson et al. 2001), with subsequent manual adjustment by
eye and cpDNA region concatenation in Mesquite v. 2.6 (Maddison & Maddison, 2009).
Variable positions in the data matrix were double checked against the original
chromatogram files using Sequencher 4.8 (GeneCodes Corp., Ann Arbor, Michigan,
USA) to make sure that all base calls were true at all variable positions. In all cases,
alignment of potentially informative positions was unambiguous. Because indels have
been shown to provide approximately one-third of the potentially informative characters
in a cpDNA data set (Shaw et al. 2005), indels were coded as binary characters except in
the case of one poly-A/T run toward the trnH end of psbA-trnH and six separate poly-A/T regions in the rpL16 intron. All of these poly-A/T regions were omitted from the
data set because they may be PCR artifacts that do not reflect the phylogenetic history of the
group or because alignment of these regions was ambiguous.
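The text does not state which indel-coding scheme was used. As an illustration only, the sketch below codes each distinct gap (defined by its start and end position in the alignment) as one binary presence/absence character. The toy alignment and the function names are hypothetical, and refinements such as scoring overlapping gaps as missing data are deliberately left out.

```python
# Sketch of a simple presence/absence indel coding; not necessarily the
# scheme used in the study. The toy alignment below is hypothetical.
import re


def gap_runs(seq):
    """Return the set of (start, end) intervals of gap runs in one sequence."""
    return {(m.start(), m.end()) for m in re.finditer(r"-+", seq)}


def code_indels(alignment):
    """Code every distinct gap interval as a binary character per taxon."""
    runs = {name: gap_runs(seq) for name, seq in alignment.items()}
    all_runs = sorted(set().union(*runs.values()))
    matrix = {
        name: [1 if run in taxon_runs else 0 for run in all_runs]
        for name, taxon_runs in runs.items()
    }
    return all_runs, matrix


if __name__ == "__main__":
    toy = {
        "taxonA": "ACGT--ACGT",
        "taxonB": "ACGT--ACGT",
        "taxonC": "ACGTTTACGT",
        "taxonD": "A---TTACGT",
    }
    intervals, coded = code_indels(toy)
    print(intervals)
    for name, row in coded.items():
        print(name, row)
```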
Parsimony and Bayesian analyses were conducted for the same taxon sample
using the concatenated data set. Analysis of phylogenetic relationships was first
conducted using the optimality criterion of maximum parsimony. Searches for most-parsimonious trees were executed in PAUP* v. 4.0 b10 (Swofford 2002) by a heuristic
search with tree bisection-reconnection (TBR) branch swapping and 1000 random
sequence addition replicates with the “collapse zero-length branches” option in effect.
Bootstrap support (Felsenstein 1985) was estimated with 1000 replications of heuristic
search and simple taxon addition with the constraint of ten million rearrangements per
replicate. Both the consistency and retention indices (CI and RI, respectively) were used
to assess the amount of homoplasy present in the data set. The GTR + I + G model of
nucleotide substitution was selected by MrModeltest 2 (Nylander 2004), which is a
simplified version of MODELTEST 3.06 (Posada & Crandall 1998), using the Akaike
information criterion (AIC). Bayesian analyses were performed with MrBayes 3.1.2
(Huelsenbeck & Ronquist 2001). No a priori assumptions about tree topology were
made. The Markov chain Monte Carlo (MCMC) process was set to run ten million
generations with two runs of four chains each. Aristida adoensis (subf. Arundinoideae)
was chosen as the outgroup for all analyses because this lineage is sister to all other
grass species within this study (Matthews et al. 2000; Zhang 2000).
Characterization of the cpDNA data set – For the analyses, sequence data were obtained
from two different noncoding cpDNA regions, the psbA-trnH intergenic spacer and the
rpL16 intron. As expected, given the complete linkage among cpDNA regions, data
derived from different cpDNA regions were congruent and thus were combined into a
single data set. The combined cpDNA data set consisted of 2006 aligned nucleotide
positions (with several large gaps opened to conservatively align regions that were very
divergent or contained repetitive sequence motifs). Within the combined data set there
were 289 parsimony informative characters (including multi-bp indels coded as binary
characters) and 175 variable but parsimony uninformative characters. The aligned data set of psbA-trnH consisted of 480 bp from the psbA end of the ~550 bp intergenic spacer
region. Within the aligned matrix of this region there were 54 parsimony
informative characters (including multi-bp indels coded as binary characters), 28 variable
but parsimony uninformative characters, and 398 constant characters. The aligned data
set of the rpL16 intron consisted of 1526 characters, with several multi-bp gaps opened
up to conservatively call some regions where alignment was impossible. Within this
aligned data set there were 235 parsimony informative characters, 147 variable but
parsimony uninformative characters, and 1144 constant characters.
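For readers unfamiliar with the three character classes used above, the sketch below applies the standard definition (a site is parsimony informative when at least two states each occur in at least two taxa) and checks that the per-region counts reported here sum to the combined totals. Treating gaps and ambiguous bases as missing data is an assumption of this sketch, and the function names are illustrative.

```python
# Sketch: classify alignment columns and cross-check the reported counts.
# Gaps and ambiguity codes are treated as missing data (an assumption).
from collections import Counter


def classify_site(column, missing="-?N"):
    """Classify one alignment column as constant, uninformative or informative."""
    counts = Counter(base for base in column.upper() if base not in missing)
    if len(counts) <= 1:
        return "constant"
    shared_states = sum(1 for n in counts.values() if n >= 2)
    return "informative" if shared_states >= 2 else "uninformative"


# Character counts as reported in the text for each region.
REPORTED = {
    "psbA-trnH": {"informative": 54, "uninformative": 28, "constant": 398},
    "rpL16": {"informative": 235, "uninformative": 147, "constant": 1144},
}


def totals(partitions):
    """Sum the character classes across regions."""
    combined = Counter()
    for counts in partitions.values():
        combined.update(counts)
    return dict(combined)


if __name__ == "__main__":
    print(classify_site("AACC"), classify_site("AAAC"), classify_site("AAAA"))
    combined = totals(REPORTED)
    print(combined, "total characters:", sum(combined.values()))
```

The region totals (480 and 1526 characters) add up to the 2006 aligned positions of the combined data set, and the informative and uninformative counts add up to the 289 and 175 characters reported for it.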
Details on a priori model and structural equation modeling (SEM) – Structural
equation modeling is a multivariate statistical approach that allows one to test observed
data against an a priori model specified by the analyst (Grace 2006). Our analysis was
driven by a set of a priori models that incorporated environmental covariation and the
effects, both direct and indirect, of environmental variables and traits on mean
phylogenetic distance (MPD). The four general a priori models (A-D) driving the analysis are
shown in Fig. 1. In the analysis itself, the boxes labeled ‘environmental factors’ and
‘trait distance’ are each represented by multiple indicators: rainfall, proportion dry
season rainfall and elevation in the case of ‘environmental factors’, and average
distance in SLA and average distance in maximum plant height in the case of ‘trait
distance’. The presence of a solid arrow in Fig. 1 indicates that direct effects from one
variable to another should be present under that particular model; no hypotheses are made
about the direction (i.e. positive or negative) of paths because they depend on specifics
such as the particular environmental variable and evolutionary trait relationships. The
simplest model (not shown) is a null model, in which none of our selected variables
explain variation in MPD. Model A occurs when the relationship between environmental
factors and MPD is indirect; under this model, the effect of environmental variation on
MPD is mediated through the direct effect of species richness on MPD. Model B occurs
when environmental variables, but not trait relationships or species richness, directly
explain variation in MPD. Reasonable explanations for this model include the
measurement of incorrect traits or a spatial/historical biogeographic effect which
modifies the pool of regionally available species. Model C occurs when environmental
variation controls trait relationships and traits directly affect MPD. Model C is most
consistent with support for phylogenetic community assembly theory. Finally, Model D
represents some combination or composite of the previous models.
Because all our measures of environmental variation occur at the site level rather
than at the scale of the 1 m² plots (i.e. average annual rainfall, proportion dry season
rainfall and elevation), the data were aggregated to the site level for the SEM analysis.
Therefore, the SEM analysis was conducted using n = 133 sites rather than 1330
individual 1 m² plots. This is desirable because within-site variation in MPD is
essentially unexplainable within the context of an SEM in which no within-site data are
available. Analysis of all 1330 plots separately would artificially inflate the sample
size and create a case of pseudoreplication that would invalidate the assessment of
individual path strengths.
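The aggregation step can be sketched as follows. The text does not say how plot-level values were summarized, so the use of a simple mean per site is an assumption, and the site labels and MPD values are hypothetical.

```python
# Sketch: collapse plot-level MPD values to one value per site.
# A simple mean is assumed; the records below are hypothetical.
from collections import defaultdict
from statistics import mean


def aggregate_to_sites(plot_records):
    """Average plot-level values within each site.

    plot_records: iterable of (site_id, mpd) pairs, one per plot.
    Returns a dict mapping site_id to the mean MPD of its plots.
    """
    by_site = defaultdict(list)
    for site_id, mpd in plot_records:
        by_site[site_id].append(mpd)
    return {site: mean(values) for site, values in by_site.items()}


if __name__ == "__main__":
    plots = [
        ("site_01", 0.42), ("site_01", 0.38),
        ("site_02", 0.55), ("site_02", 0.61),
    ]
    site_means = aggregate_to_sites(plots)
    print(len(site_means), "sites:", site_means)
```

Applied to the full data set, a step of this kind would reduce the 1330 plot records to the 133 site-level observations used in the SEM.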
Evaluation of the a priori models (Fig. 1) with SEM was based on maximum
likelihood procedures and was conducted in AMOS version 17.0 (Arbuckle 2007).
Models were fit iteratively using a model-pruning strategy in which non-significant paths
(P > 0.05) were trimmed from the model until only significant paths remained. In
addition, modification indices with a threshold value set to 4.0 were used in AMOS to
identify missing paths that substantially improved the model. Model fit was assessed
using the χ² statistic and the associated P-value. A model was deemed adequate when the
observed and expected covariances were not significantly different from one another
based on a critical value of P > 0.05.
Supplementary Results
Phylogenetic results – The topologies of the trees generated with the Bayesian and
maximum parsimony analyses were largely congruent; therefore, we present only the
Bayesian tree (Fig. 2). In the maximum parsimony analysis, the heuristic search found
112 equally parsimonious trees with 838 steps and consistency and retention indices of CI
= 0.70 and RI = 0.89, which indicate a small amount of homoplasy in the data set.
Bootstrap values are shown below the branches in Fig. 2. The posterior probability
support values, generated in MrBayes, were consistently higher for equivalent tree
branches compared to the parsimony analysis, which is commonly the case. Posterior
probabilities are shown above the branches in Fig. 2. One difference between topologies
generated by the separate analyses was that in the parsimony analysis Sporobolus rangei
was shown to be sister to the clade containing S. kentrophyllus through S. fimbriatus. A
second difference between the two analyses was that in the maximum parsimony analysis
Enteropogon macrostachyus was sister to the clade containing E. paspaloides through C.
roxburghiana + C. gayana.
Comparing the statistical fit of the various SE models – The final accepted model
demonstrated a superior fit to the alternative models (Fig. 1) and was most consistent
with Model C (χ² = 5.88, df = 5, p = 0.32, AIC = 51.88, BIC = 118.36; where AIC is
Akaike’s Information Criterion and BIC is the Bayesian information criterion). To assess
the fit of Model A the path between grass species richness and MPD was kept in the
model despite being non-significant (p = 0.35), which yielded a poor fit to the data (χ² =
36.79, df = 7, p < 0.001, AIC = 78.79, BIC = 139.49). For Model B, all paths were
significant, but no paths were included from trait distances to MPD (χ² = 22.16, df = 6, p
= 0.001, AIC = 66.16, BIC = 129.75). Model D was tested by including paths from trait
distances, grass species richness and environmental factors directly to MPD. This model
included several non-significant paths, including rainfall → MPD (p = 0.37),
proportion dry season rainfall → MPD (p = 0.37), and grass species richness → MPD (p
= 0.08), yielding a model with a lower chi-square but larger AIC and BIC than Model C
(χ² = 1.65, df = 2, p = 0.44, AIC = 53.65, BIC = 128.80).
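The reported fit statistics can be cross-checked with a short calculation. The sketch below assumes the chi-square-based definitions AIC = χ² + 2q and BIC = χ² + q ln(N), where q is the number of free parameters, and it assumes seven observed variables (three environmental indicators, two trait distances, grass species richness and MPD), so that q = 28 − df. Neither assumption is stated in the text; both are inferred from the reported numbers.

```python
# Sketch: recompute AIC and BIC from the reported chi-square and df values.
# Assumptions (not stated in the text): AIC = chi2 + 2q, BIC = chi2 + q*ln(N),
# and seven observed variables, giving 7 * 8 / 2 = 28 sample moments.
import math

N_SITES = 133
N_OBSERVED = 7
N_MOMENTS = N_OBSERVED * (N_OBSERVED + 1) // 2

# Values as reported in the text.
MODELS = {
    "A": {"chi2": 36.79, "df": 7, "aic": 78.79, "bic": 139.49},
    "B": {"chi2": 22.16, "df": 6, "aic": 66.16, "bic": 129.75},
    "C": {"chi2": 5.88, "df": 5, "aic": 51.88, "bic": 118.36},
    "D": {"chi2": 1.65, "df": 2, "aic": 53.65, "bic": 128.80},
}


def information_criteria(chi2, df, n=N_SITES, moments=N_MOMENTS):
    """Return (free parameters, AIC, BIC) under the assumptions above."""
    q = moments - df
    aic = chi2 + 2 * q
    bic = chi2 + q * math.log(n)
    return q, aic, bic


if __name__ == "__main__":
    for name, m in MODELS.items():
        q, aic, bic = information_criteria(m["chi2"], m["df"])
        print(f"Model {name}: q = {q}, AIC = {aic:.2f}, BIC = {bic:.2f}")
```

Under these assumptions the recomputed values agree with the reported AIC and BIC to within rounding. Model C has the smallest value of both criteria, while Model D gains a lower χ² only by spending three additional parameters.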
Supplementary Discussion
Comparison of the phylogeny with previous work – The phylogeny (Fig. 2) was highly
consistent with other published works, as described below. All of the species included in
this study belong to three subfamilies of the family Poaceae:
Arundinoideae, Chloridoideae, and Panicoideae; most species were from the latter
two subfamilies. Within subfamily Arundinoideae, only Aristida adoensis was sampled
and this species was used to root the rest of the phylogeny, which consisted of species
from subfamilies Chloridoideae and Panicoideae. Our results support earlier works that
showed the Chloridoideae and Panicoideae to be monophyletic subfamilies (Grass
Phylogeny Working Group 2001). As is apparent in Fig. 2, the species of subf.
Panicoideae segregate into two tribes, tribe Paniceae and tribe Andropogoneae. The
relationships shown among the taxa of tribes Andropogoneae and Paniceae are fully
consistent with the results of Aliscioni et al. (2003), who used chloroplast ndhF
sequences to derive a molecular phylogeny of Panicum. Within subfamily Chloridoideae
our results are also largely congruent with other molecular studies (Hilu & Alice 2001,
Roodt-Wilding & Spies 2006). Species within this subfamily segregate into two clades
that strongly agree with taxonomic relationships shown by Roodt-Wilding & Spies
(2006) in which the tribes Eragrostideae and Cynodonteae are paraphyletic and therefore
referred to as the ‘Eragrostideae’ and ‘Cynodonteae.’ For example, in agreement with
Roodt-Wilding and Spies (2006), we show that Dactyloctenium, Oropetium, Leptochloa,
Eleusine and Tragus, which are all members of tribe Eragrostideae, segregate with tribe
Cynodonteae species (Fig. 2).
REFERENCES
Aliscioni, S.S., Giussani, L.M., Zuloaga, F.O. & Kellogg, E.A. (2003) A molecular
phylogeny of Panicum (Poaceae: Paniceae): tests of monophyly and phylogenetic
placement within Panicoideae. American Journal of Botany, 90, 796-821.
Arbuckle, J.L. (2007) Amos 17.0 update to the Amos user’s guide. Smallwaters
Corporation, Chicago, Illinois, USA.
Felsenstein, J. (1985) Confidence limits on phylogenies: an approach using the bootstrap.
Evolution, 39, 783-791.
Grass Phylogeny Working Group, Barker, N.P., Clark, L.G., Davis, J.I., Duvall, M.R.,
Guala, G.F., Hsiao, C., Kellogg, E.A. & Linder, H.P. (2001) Phylogeny and
Subfamilial Classification of the Grasses (Poaceae). Annals of the Missouri
Botanical Garden, 88, 373-457.
Grace, J.B. (2006) Structural equation modeling and natural systems. Cambridge
University Press.
Hilu, K.W. & Alice, L.A. (2001) A phylogeny of Chloridoideae (Poaceae) based on
matK sequences. Systematic Botany, 26, 386-405.
Huelsenbeck, J.P. & Ronquist, F. (2001) MrBayes: Bayesian inference of phylogenetic
trees. Bioinformatics, 17, 754-755.
Matthews, S., Tsai, R.C. & Kellogg, E.A. (2000) Phylogenetic structure in the grass
family (Poaceae): Evidence from the nuclear gene phytochrome B. American
Journal of Botany, 87, 96-107.
Maddison, W.P. & Maddison, D.R. (2008) Mesquite: a modular system for evolutionary
analysis, ver. 2.5, website: http://mesquiteproject.org/mesquite/.
Nylander, J.A.A. (2002) Testing models of evolution—MrModeltest 1.1b. Computer
program and documentation distributed by author, website:
http://www.ebc.uu.se/systzoo/staff/nylander.html.
Peterson, P.M., Romaschenko, K. & Johnson, G. (2010) A classification of the
Chloridoideae (Poaceae) based on multi-gene phylogenetic trees. Molecular
Phylogenetics and Evolution, 55, 580-598.
Posada, D. & Crandall, K.A. (1998) MODELTEST: testing the model of DNA
substitution. Bioinformatics, 14, 817-818.
Roodt-Wilding, R. & Spies, J.J. (2006) Phylogenetic relationships in southern African
chloridoid grasses (Poaceae) based on nuclear and chloroplast data. Systematics
and Biodiversity, 4, 401-415.
Sang, T., Crawford, D.J. & Stuessy, T.F. (1997) Chloroplast DNA phylogeny, reticulate
evolution, and biogeography of Paeonia (Paeoniaceae). American Journal of
Botany, 84, 1120-1136.
Shaw, J., Lickey, E.B., Beck, J.T., Farmer, S.B., Liu, W., Miller, J., Siripun, K.C.,
Winder, C.T., Schilling, E.E. & Small, R. (2005) The tortoise and the hare II:
relative utility of 21 noncoding chloroplast DNA sequences for phylogenetic
analysis. American Journal of Botany, 92, 144-166.
Shaw, J., Lickey, E.B., Schilling, E.E. & Small, R. (2007) The tortoise and the hare III:
comparison of whole chloroplast genome sequences to choose noncoding regions
for phylogenetic studies in angiosperms. American Journal of Botany, 94, 275-288.
Swofford, D.L. (2002) PAUP*: phylogenetic analyses using parsimony (*and other
methods) version 4.0b10. Sinauer Associates, Sunderland, Massachusetts, USA.
Tate J.A. & Simpson, B.B. (2003) Paraphyly of Tarasa (Malvaceae) and diverse origins
of the polyploid species. Systematic Botany, 28, 723-737.
Thompson, J.D., Higgins, D.G. & Gibson, T.J. (2001) ClustalX. Program available at
ftp://ftp-igbmc.u-strasbg.fr/pub/clustalx/.
Zhang, W. (2000) Phylogeny of the grass family (Poaceae) from rpl16 intron sequence
data. Molecular Phylogenetics and Evolution, 15, 135-146.