ESM Methods - Proceedings of the Royal Society B

Electronic Supplementary Material
Methods
(a) Species sampling
Zalmoxidae were collected over multiple collecting trips mostly by P.P.S., including
Indonesia (2006), New Caledonia (2007), Fiji (2008), Palau (2010), the Philippines
(2010), and Australia (2011). Numerous museum collections of Zalmoxidae throughout
their known range were included, with the exception of Mauritius and the Seychelles
Islands. Data collected in a previous systematic study [1] were accessed from GenBank
and/or updated with new sequences. Collected specimens were preserved in 96% EtOH
and stored at -80 °C. The list of specimens, including voucher numbers, GenBank
accession codes, and collection details, is found in electronic supplementary material,
table S1.
(b) Molecular methods
Total DNA was extracted from the legs of animals using Qiagen’s DNeasy tissue kit
(Valencia, CA, USA). Purified genomic DNA was used as a template for PCR
amplification. Molecular markers consisted of two nuclear ribosomal genes (18S and 28S
rRNA), two nuclear protein-encoding genes (histones H3 and H4), and two mitochondrial
protein-encoding genes (cytochrome c oxidase subunit I and cytochrome b). Primer
sequences and fragment lengths are as in a previous study [1].
Polymerase chain reactions (PCR), visualization by agarose gel electrophoresis, and
direct sequencing were conducted as described in a previous study [2]. Chromatograms
obtained from the automatic sequencer were read and sequences assembled using the
sequence editing software Sequencher (Gene Codes Corporation, Ann Arbor, MI,
USA). Sequence data were edited in Se-Al ver. 2.0a11 [3].
(c) Phylogenetic analyses
Maximum likelihood (ML) and Bayesian inference (BI) analyses were conducted on
static alignments, which were inferred as follows. Sequences of ribosomal genes were
aligned using MUSCLE ver. 3.6 [4] with default parameters, and subsequently treated
with GBlocks v. 0.91b [5] to cull positions of ambiguous homology. Sequences of protein
encoding genes were aligned using MUSCLE ver. 3.6 with default parameters as well,
but alignments were confirmed using protein sequence translations prior to treatment
with GBlocks ver. 0.91b. The size of data matrices for each gene prior and subsequent to
treatment with GBlocks ver. 0.91b is provided in electronic supplementary material, table
S2.
ML analysis was conducted using RAxML ver. 7.2.7 [6] on 40 CPUs of a cluster at
Harvard University, FAS Research Computing (odyssey.fas.harvard.edu). For the
maximum likelihood searches, a unique GTR model of sequence evolution with
corrections for a discrete gamma distribution (GTR + Γ) was specified for each data
partition, and 500 independent searches were conducted. Nodal support was estimated via
the rapid bootstrap algorithm (1000 replicates) using the GTR-CAT model [7]. Bootstrap
resampling frequencies were thereafter mapped onto the optimal tree from the
independent searches.
BI analysis was performed using MrBayes ver. 3.1.2 [8] with a unique model of sequence
evolution with corrections for a discrete gamma distribution and a proportion of invariant
sites specified for each partition, as selected in Modeltest ver. 3.7 [9,10] under the Akaike
Information Criterion [11]. Model implementation for each dataset is indicated in
electronic supplementary material, table S3. Default priors were used starting with
random trees, and four runs, each with three hot and one cold Markov chains, were
performed until the average deviation of split frequencies reached <0.01 (4 × 10^7
generations). After burn-in samples were discarded, sampled trees were combined in a
single majority-rule consensus topology, and the percentage of sampled trees containing
each node was taken as its posterior probability.
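The summary step described above can be illustrated with a minimal sketch. It assumes that each sampled tree has already been reduced to a set of clades, with each clade stored as a frozenset of taxon labels; this encoding, the function names, the 25% burn-in fraction, and the toy trees are all hypothetical and are not taken from the study or from MrBayes itself.

```python
from collections import Counter

def clade_posteriors(runs, burnin_fraction=0.25):
    """Pool post-burn-in trees from several runs and return clade frequencies.

    Each run is a list of sampled trees; each tree is a set of clades, and
    each clade is a frozenset of taxon labels (hypothetical encoding).
    The burn-in fraction is an assumed value, not one stated in the text.
    """
    pooled = []
    for trees in runs:
        start = int(len(trees) * burnin_fraction)  # discard burn-in samples
        pooled.extend(trees[start:])
    counts = Counter(clade for tree in pooled for clade in tree)
    return {clade: n / len(pooled) for clade, n in counts.items()}

def majority_rule(posteriors, threshold=0.5):
    """Keep only clades found in more than `threshold` of the pooled trees."""
    return {c: p for c, p in posteriors.items() if p > threshold}

# Toy example with four taxa (illustrative only, not data from the study).
ab, cd, ac = frozenset("AB"), frozenset("CD"), frozenset("AC")
run1 = [{ac}, {ab, cd}, {ab, cd}, {ab}]
run2 = [{ac}, {ab, cd}, {ab}, {ab, cd}]
post = clade_posteriors([run1, run2], burnin_fraction=0.25)
consensus = majority_rule(post)
```

In this toy case the first tree of each run is discarded as burn-in, so the clade seen only in those trees does not contribute to the pooled frequencies.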
Parsimony analyses were based on a direct optimization (DO) approach [12] using the
program POY ver. 4.1.2 [13]. Tree searches were performed using the timed search
function in POY, i.e., multiple cycles of (a) building Wagner trees, (b) subtree pruning
and regrafting (SPR), (c) tree bisection and reconnection (TBR), (d) ratcheting [14], and
(e) tree-fusing [15,16], on 40 CPUs of a cluster at Harvard University, FAS Research
Computing (odyssey.fas.harvard.edu). Timed searches of 24 hours were run for the
individual and combined analyses of all molecules under a mixed parameter set, such that
ribosomal genes were weighted using the parameter set 3221 (indel opening cost = 3;
indel extension cost = 1; transversions = transitions = 2) and protein-encoding genes were
weighted using the parameter set 121 (indel cost = 2; transversion cost = 2; transition cost
= 1). The design of this parameter set follows previous exploration of Opiliones datasets
[17].
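To make the two parameter sets concrete, the sketch below scores a fixed pairwise alignment under each cost scheme. It is not POY's implementation: direct optimization evaluates costs on trees without a fixed alignment, whereas this example only sums costs over a given alignment. The dictionary layout and function names are hypothetical, and the convention that a run of n gap positions costs the opening cost plus n times the extension cost is an assumption.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# Costs as described in the text; the dictionary layout is hypothetical.
PARAMS_3221 = {"gap_open": 3, "gap_extend": 1, "tv": 2, "ts": 2}
PARAMS_121 = {"gap_open": 0, "gap_extend": 2, "tv": 2, "ts": 1}

def is_transition(x, y):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    return ({x, y} <= PURINES) or ({x, y} <= PYRIMIDINES)

def alignment_cost(seq1, seq2, params):
    """Sum substitution and indel costs over a pairwise alignment.

    Assumption: a run of n gap positions costs gap_open + n * gap_extend.
    """
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    cost = 0
    prev_gap = None  # which sequence carried a gap in the previous column
    for x, y in zip(seq1, seq2):
        if x == "-" and y == "-":
            continue  # columns that are gaps in both sequences are skipped
        if x == "-" or y == "-":
            gap_in = 1 if x == "-" else 2
            if gap_in != prev_gap:
                cost += params["gap_open"]  # a new gap run starts here
            cost += params["gap_extend"]
            prev_gap = gap_in
            continue
        prev_gap = None
        if x != y:
            cost += params["ts"] if is_transition(x, y) else params["tv"]
    return cost

# Toy alignment (illustrative only): two transitions, one two-base gap,
# and one transversion.
ribosomal_cost = alignment_cost("AC--GT", "ATGGAA", PARAMS_3221)
coding_cost = alignment_cost("AC--GT", "ATGGAA", PARAMS_121)
```

Under these assumptions the same alignment is cheaper with parameter set 121, because transitions and indels are both weighted less heavily than under 3221.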
Two iterative rounds of tree-fusing, taking all input trees from the timed search, were
conducted for the combined analysis of molecular data under the mixed parameter set.
Thereafter, the input trees from the timed search and the optimal trees from tree-fusing
were subjected to a 6-hour timed search as before. After a third round of tree-fusing, all
previous input trees, the optimal trees from tree-fusing, and the optimal trees from the
short timed search were subjected to another 24-hour timed search. Finally, the trees from
each previous step were subjected to 20 rounds of tree fusing under the mixed parameter
set to check for heuristic stability [18]. Nodal support for the optimal parameter set was
estimated via jackknifing (250 replicates) with a probability of deletion of e^-1 [19].
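The resampling scheme behind the jackknife can be sketched as follows. This is a simplified illustration that treats the data as a fixed matrix of characters, which is not how POY handles unaligned sequences under direct optimization; the function names, the random seed, and the matrix width of 1000 characters are hypothetical.

```python
import math
import random

def jackknife_columns(n_columns, rng, p_delete=math.exp(-1)):
    """Return the indices of characters retained in one jackknife replicate.

    Each character is deleted independently with probability p_delete.
    """
    return [i for i in range(n_columns) if rng.random() >= p_delete]

def jackknife_replicates(n_columns, n_replicates=250, seed=0):
    """Draw independent jackknife replicates with a reproducible seed."""
    rng = random.Random(seed)
    return [jackknife_columns(n_columns, rng) for _ in range(n_replicates)]

# Hypothetical matrix width, for illustration only.
N_COLUMNS = 1000
replicates = jackknife_replicates(N_COLUMNS)
mean_retained = sum(len(r) for r in replicates) / (len(replicates) * N_COLUMNS)
```

With a deletion probability of e^-1 (about 0.368), each replicate is expected to retain roughly 63% of the characters.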
(d) Estimation of divergence times
Ages of clades were inferred using BEAST ver. 1.6.1 [20,21]. We specified a unique
GTR model of sequence evolution with corrections for a discrete gamma distribution and
a proportion of invariant sites (GTR + Γ + I) for each partition (as with BI analysis).
Divergence time calibration drew upon a previous study of the suborder Laniatores [22],
wherein molecular dating was conducted using the same methodology and constrained
using fossil taxa. In the present study, we took the 95% HPD intervals to constrain three
nodes: the superfamily Zalmoxoidea, the superfamily Samooidea, and the split between
the two superfamilies (the root). We used normal distribution priors for the three nodes to
characterize the calibrations, upon observation of Gaussian posterior distributions for
these nodes’ age estimates from the previous study [22].
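One common way to turn a 95% HPD interval into a normal calibration prior is to centre the distribution on the interval midpoint and choose the standard deviation so that 95% of the density falls inside the interval. The text does not state how the conversion was done here, so the sketch below is only one possible convention; the function name and the example interval are hypothetical.

```python
from statistics import NormalDist

def normal_prior_from_interval(lower, upper, mass=0.95):
    """Return (mean, sd) of a normal whose central `mass` spans [lower, upper]."""
    if not lower < upper:
        raise ValueError("lower bound must be smaller than upper bound")
    z = NormalDist().inv_cdf(0.5 + mass / 2)  # about 1.96 for mass = 0.95
    mean = (lower + upper) / 2
    sd = (upper - lower) / (2 * z)
    return mean, sd

# Hypothetical interval in Myr, for illustration only (not from the study).
mean, sd = normal_prior_from_interval(100.0, 200.0)
prior = NormalDist(mean, sd)
covered = prior.cdf(200.0) - prior.cdf(100.0)
```

This treatment is only reasonable when the posterior being summarized is close to symmetric, which is the condition the Gaussian shape noted above is meant to satisfy.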
An uncorrelated lognormal clock model was inferred for each partition, and a Yule
speciation process was assumed for the tree prior. We selected the uncorrelated
lognormal model because its accuracy is comparable to an uncorrelated exponential
model, but it has narrower 95% highest posterior density intervals. Additionally, the
variance of the uncorrelated lognormal model can better accommodate data that are
already clock-like [20]. Priors were sequentially optimized in a series of iterative test
runs; the command files are available upon request from the authors. Four Markov chains
were run for 10^8 generations, sampling every 10^4 generations. Convergence diagnostics
were assessed using Tracer ver. 1.5 [23].
However, use of “secondary” calibrations (i.e., transitive use of divergence time
estimates across studies), and in particular with errorless point calibrations, has been
criticized for engendering spurious estimates [24, 25]. Additionally, normal distribution
priors can sometimes be inappropriate for dating analyses, particularly if the position of a
calibrator along a branch length is unknown [26]. To test the appropriateness of the
secondary calibrations we employed from the previous study, we constructed a 228-taxon
dataset, combining all 147 focal taxa with those employed in our previous study [22],
wherein all known families of Laniatores, as well as representatives of the other three
suborders, are sampled. We estimated divergence times in BEAST with the same number
of Markov chains and generations as for the 147-taxon dataset. Subsequent to alignment
and treatments with GBlocks ver. 0.91b (as above), the resulting 228-taxon matrix was
smaller than the original dataset (6172 versus 6563 nucleotide positions), owing to
sequence variability across Opiliones. We used fossil taxa to calibrate divergence times,
as follows. We constrained the age of Eupnoi to 410 Ma using the crown group Devonian
harvestman Eophalangium sheari [27]; a normal distribution with a standard deviation of
5 Myr was applied to this node to account for uncertainty in estimation of the fossil age.
Dyspnoi were constrained using the Carboniferous fossils Eotrogulus fayoli and
Nemastomoides elaveris [27]. Given that the most recent common ancestor of Dyspnoi
could be older than the age of these fossils (each represents an extant superfamily), we
used a lognormal distribution prior for the root of Dyspnoi, permitting its age to predate
the Carboniferous taxa (mean in real space of 300 Ma, offset of 2). We subsequently
compared estimates from this dataset to the 147-taxon dataset.
Tree files of estimated ages and 95% highest posterior density (HPD) intervals for both
the 147-taxon dataset and the 228-taxon dataset have been deposited in TreeBase.
(e) Ancestral range reconstruction
To maintain comparability between the BEAST, ML, BI, and DO topologies, we used
divergence time estimates from the 147-taxon dataset for biogeographic analyses.
Likelihood analysis of range evolution was conducted using the dated phylogeny and the
DEC model as implemented in the program Lagrange [28,29]. We coded the ranges of
terminals as 14 areas. We implemented three models: (a) an unconstrained model; (b) a
stepping-stone model, wherein spatial (but not temporal) information was incorporated;
and (c) a stratified model, wherein seven spans of geological time were delimited and the
relationships of the areas were recorded during each span of time. Geological events used
to delimit the time spans follow Hall [30] and Sanmartín & Ronquist [31]. The maximum
number of areas in ancestral ranges was held at two, a convention that reflects empirical
observations of Zalmoxidae species, the majority of which are narrowly distributed
endemics. Dispersal constraints were set to 1.0 (if landmasses were connected), 0.1 (if
landmasses were disjunct), or 0 (if landmasses did not exist). The list of areas and the
dispersal constraint matrices of the stratified model are provided in electronic
supplementary material, table S4 (Python scripts specifying dispersal constraint matrices
are available upon request from the authors).
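The constraint scheme described above can be sketched in a few lines of Python. This is not the authors' script: the area names, the connectivity pairs, the two time slices, and the convention of setting the diagonal to 1.0 are all hypothetical placeholders. Only the three constraint values (1.0, 0.1, 0), the count of 14 areas, and the maximum of two areas per ancestral range come from the text.

```python
from math import comb

CONNECTED, DISJUNCT, ABSENT = 1.0, 0.1, 0.0

def dispersal_matrix(areas, connected_pairs, absent_areas=()):
    """Build a symmetric dispersal-constraint matrix for one time slice."""
    index = {area: i for i, area in enumerate(areas)}
    n = len(areas)
    matrix = [[DISJUNCT] * n for _ in range(n)]  # default: disjunct landmasses
    for i in range(n):
        matrix[i][i] = CONNECTED  # convention of this sketch only
    for a, b in connected_pairs:
        i, j = index[a], index[b]
        matrix[i][j] = matrix[j][i] = CONNECTED
    for a in absent_areas:
        i = index[a]
        for j in range(n):
            matrix[i][j] = matrix[j][i] = ABSENT  # landmass did not yet exist
    return matrix

# Placeholder areas and time slices (hypothetical, not the study's coding).
areas = ["Area1", "Area2", "Area3"]
younger_slice = dispersal_matrix(areas, connected_pairs=[("Area1", "Area2")])
older_slice = dispersal_matrix(areas, connected_pairs=[], absent_areas=["Area3"])

# With 14 areas and at most two areas per ancestral range, the number of
# non-empty candidate ranges is C(14, 1) + C(14, 2).
n_candidate_ranges = comb(14, 1) + comb(14, 2)
```

A stratified model would hold one such matrix per span of geological time, whereas the stepping-stone model would use a single matrix for the whole tree.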
Bayesian analysis of range evolution was conducted using the program RASP [32]. As an
ultrametric tree is not required (the analysis does not account for time), we analyzed all
four topologies using RASP. The ranges of terminals were coded in the same manner as
for the DEC model. Two runs of 10^6 generations were run, sampling every 10^3
generations, such that the average deviation of split frequencies reached <0.001. After
burn-in samples were discarded, frequencies of the ancestral areas reconstructed at all
nodes were combined from the two runs.
(f) Analysis of diversification rate
Temporal shifts in diversification rate were examined with the R package LASER [33].
The dated phylogeny of the Zalmoxidae subtree was isolated from the BEAST topology
and pruned such that species represented by multiple terminals were subsequently
represented by a single specimen. Multiple diversification models were fitted to the dated
phylogeny of Zalmoxidae and the fit of the alternative models was compared using the
Akaike Information Criterion [11]. Diversification parameters were computed using the
best-fitting model amongst two constant rate and six variable rate models.
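The model comparison step can be illustrated with a short sketch of the AIC calculation, AIC = 2k - 2 ln L, where k is the number of free parameters and ln L is the maximized log-likelihood. The model labels, log-likelihoods, and parameter counts below are hypothetical and are not results from the study; the code also does not reproduce the LASER package, which is written in R.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: AIC = 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood

def rank_models(fits):
    """Return (name, AIC, delta-AIC) tuples sorted from best to worst fit."""
    scores = {name: aic(lnl, k) for name, (lnl, k) in fits.items()}
    best = min(scores.values())
    ranked = ((name, s, s - best) for name, s in scores.items())
    return sorted(ranked, key=lambda item: item[1])

# Hypothetical (log-likelihood, number of parameters) pairs, illustration only.
fits = {
    "constant_rate_a": (-120.0, 1),
    "constant_rate_b": (-119.5, 2),
    "variable_rate_a": (-115.0, 3),
}
ranking = rank_models(fits)
best_model = ranking[0][0]
```

The model with the lowest AIC is preferred; the delta-AIC column shows how much worse each alternative is after penalizing for its extra parameters.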
References
1 Sharma, P. P. 2012 New Australasian Zalmoxidae (Opiliones: Laniatores) and a new
case of male polymorphism in Opiliones. Zootaxa.
2 Sharma, P. & Giribet, G. 2009 Sandokanid phylogeny based on eight molecular
markers—The evolution of a southeast Asian endemic family of Laniatores
(Arachnida, Opiliones). Mol. Phylogenet. Evol. 52, 432-447.
(doi:10.1016/j.ympev.2009.03.013)
3 Rambaut, A. E. 1996 Se-Al sequence alignment editor. University of Oxford, UK.
Program and documentation available from: <http://evolve.zoo.ox.ac.uk/software>
4 Edgar, R. C. 2004 MUSCLE: multiple sequence alignment with high accuracy and
high throughput. Nucleic Acids Res. 32, 1792-1797. (doi:10.1093/nar/gkh340)
5 Castresana, J. 2000 Selection of conserved blocks from multiple alignments for their
use in phylogenetic analysis. Mol. Biol. Evol. 17, 540-552.
6 Stamatakis, A. 2006 RAxML-VI-HPC: maximum likelihood-based phylogenetic
analyses with thousands of taxa and mixed models. Bioinformatics 22, 2688-2690.
(doi:10.1093/bioinformatics/btl446)
7 Stamatakis, A., Hoover, P. & Rougemont, J. 2008 A rapid bootstrap algorithm for the
RAxML Web servers. Syst. Biol. 57, 758-771. (doi:10.1080/10635150802429642)
8 Ronquist, F. & Huelsenbeck, J. P. 2005 Bayesian analyses of molecular evolution
using MrBayes. In Statistical Methods in Molecular Evolution. (Ed Nielsen, R.) New
York, NY: Springer.
9 Posada, D. 2005 Modeltest 3.7. Program and documentation available from:
<http://darwin.uvigo.es/>
10 Posada, D. & Crandall, K. A. 1998 Modeltest: testing the model of DNA substitution.
Bioinformatics 14, 817-818. (doi:10.1093/bioinformatics/14.9.817)
11 Posada, D. & Buckley, T. 2004 Model selection and model averaging in
phylogenetics: advantages of Akaike information criterion and Bayesian approaches
over likelihood ratio tests. Syst. Biol. 53, 793-808.
(doi:10.1080/10635150490522304)
12 Wheeler, W. C. 1996 Optimization alignment: the end of multiple sequence
alignment in phylogenetics? Cladistics 12, 1-9. (doi:10.1111/j.1096-0031.1996.tb00189.x)
13 Varón, A., Vinh, L. S. & Wheeler, W. C. 2010 POY version 4: phylogenetic analysis
using dynamic homologies. Cladistics 26, 72-85. (doi:10.1111/j.1096-0031.2009.00282.x)
14 Nixon, K. C. 1999 The parsimony ratchet, a new method for rapid parsimony
analysis. Cladistics 15, 407-414. (doi:10.1006/clad.1999.0121)
15 Goloboff, P. A. 1999 Analyzing large data sets in reasonable times: solutions for
composite optima. Cladistics 15, 415-428. (doi:10.1111/j.1096-0031.1999.tb00278.x)
16 Goloboff, P. A. 2002 Techniques for analyzing large data sets. In Techniques in
Molecular Systematics and Evolution. (Eds DeSalle, R., Giribet, G., Wheeler, W. C.)
pp. 70-79. Basel, Switzerland: Birkhäuser Verlag.
17 Sharma, P. P., Vahtera, V., Kawauchi, G. Y. & Giribet, G. 2011 Running WILD: The
case for exploring mixed parameter sets in sensitivity analysis. Cladistics 27, 538-549. (doi:10.1111/j.1096-0031.2010.00345.x)
18 Giribet, G. 2007 Efficient tree searches with available algorithms. Evol. Bioinform. 3,
341-356.
19 Farris, J. S., Albert, V. A., Källersjö, M., Lipscomb, D. & Kluge, A. G. 1996
Parsimony jackknifing outperforms neighbor-joining. Cladistics 12, 99-124.
(doi:10.1111/j.1096-0031.1996.tb00196.x)
20 Drummond, A. J., Ho, S. Y. W., Phillips, M. J. & Rambaut, A. 2006 Relaxed
phylogenetics and dating with confidence. PLoS Biology 4, e88.
(doi:10.1371/journal.pbio.0040088)
21 Drummond, A. J. & Rambaut, A. 2007 BEAST: Bayesian evolutionary analysis by
sampling trees. BMC Evol. Biol. 7, 214. (doi:10.1186/1471-2148-7-214)
22 Sharma, P. P. & Giribet, G. 2011 The evolutionary and biogeographic history of the
armoured harvestmen—Laniatores phylogeny based on ten molecular markers, with
the description of two new families of Opiliones (Arachnida). Invertebr. Syst. 25,
106-142. (doi:10.1071/IS11002)
23 Rambaut, A. & Drummond, A. J. 2009 Tracer v. 1.5. Program and documentation
available from: <http://beast.bio.ed.ac.uk/Tracer/>
24 Shaul, S. & Graur, D. 2002 Playing chicken (Gallus gallus): methodological
inconsistencies of molecular divergence date estimates due to secondary calibration
points. Gene 300, 59-61. (doi:10.1016/S0378-1119(02)00851-X)
25 Graur, D. & Martin, W. 2004 Reading the entrails of chickens: molecular timescales
of evolution and the illusion of precision. Trends Genet. 20, 80-86.
(doi:10.1016/j.tig.2003.12.003)
26 Ho, S. Y. W. & Phillips, M. J. 2009 Accounting for calibration uncertainty in
phylogenetic estimation of evolutionary divergence times. Syst. Biol. 58, 367-380.
(doi:10.1093/sysbio/syp035)
27 Dunlop, J. A. 2007 Paleontology. In Harvestmen: The Biology of Opiliones. (Eds R.
Pinto-da-Rocha, G. Machado & G. Giribet) pp. 247-265. Cambridge, MA: Harvard
University Press.
28 Ree, R. H., Moore, B. R., Webb, C. O. & Donoghue, M. J. 2005 A likelihood
framework for inferring the evolution of geographic range on phylogenetic trees.
Evolution 59, 2299-2311. (doi:10.1554/05-172.1)
29 Ree, R. H. & Smith, S. A. 2008 Maximum likelihood inference of geographic range
evolution by dispersal, local extinction, and cladogenesis. Syst. Biol. 57, 4-14.
(doi:10.1080/10635150701883881)
30 Hall, R. 2002 Cenozoic geological and plate tectonic evolution of SE Asia and the
SW Pacific: computer-based reconstructions and animations. J. As. Earth Sci. 20,
353-434. (doi:10.1016/S1367-9120(01)00069-4)
31 Sanmartín, I. & Ronquist, F. 2004 Southern Hemisphere biogeography inferred by
event-based models: plant versus animal patterns. Syst. Biol. 53, 216-243.
(doi:10.1080/10635150490423430)
32 Yu, Y., Harris, A. J. & He, X.-J. 2011 RASP (Reconstruct Ancestral State in
Phylogenies) 2.0 beta. Program and documentation available from:
<http://mnh.scu.edu.cn/soft/blog/RASP/>
33 Rabosky, D. L. 2006 LASER: A maximum likelihood toolkit for detecting temporal
shifts in diversification rates from molecular phylogenies. Evol. Bioinform. 2, 247-250.