Descriptions of the Computer Programs Used in the Analyses

PAUP* (v. 4.0b10; Swofford, 2002) implements a range of deterministic strategies for
exploring tree space. Four different strategies were evaluated. First, a series of “successive
approximations” was used to estimate both the ML tree and the parameter values of the
substitution model (Sullivan et al., 2005). The specific implementation was inspired by Foster
(2003) and Sullivan (2005). The neighbor-joining tree based on absolute differences was used as
the starting tree, followed by one round of NNI branch swapping, two rounds of SPR swapping
and one round of TBR swapping. Between each branch-swapping round the substitution-model
parameter values were optimized, and were then fixed at these values for the subsequent tree
optimizations. The ApproxLim option was set to the default value (5%) for the NAD4 data, to
2% for the Isospora data and 1% for the HIV data.
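The alternation described above (fix the model, swap branches, re-estimate the model) can be sketched as a small Python helper that writes out a PAUP* command sequence. This is only an illustrative sketch, not the script used for the analyses: the command spellings are copied from the likelihood-ratchet setup file reproduced later in this supplement, the GTR+G+I settings (NST=6, gamma rates, invariable sites) are assumed from that same file, the starting tree is assumed to be already in memory, and the function name is hypothetical.

```python
def successive_approximation_commands(approx_lim=5,
                                      swaps=("nni", "spr", "spr", "tbr")):
    """Return PAUP* command strings for a successive-approximations schedule.

    Assumes a starting tree (e.g. the neighbor-joining tree) is already in
    memory. The model settings are an assumption borrowed from the ratchet
    setup file, not a statement of what the original analyses used.
    """
    # Re-estimate all substitution-model parameters on the current tree.
    estimate = ("LScores 1 / NST=6 BaseFreq=estimate RMatrix=estimate "
                "Rates=gamma Shape=estimate PInvar=estimate;")
    # Fix the parameters at the values just estimated for the next search.
    fix = ("LSet BaseFreq=previous NST=6 RMatrix=previous Rates=gamma "
           "Shape=previous PInvar=previous "
           f"ApproxLim={approx_lim} AdjustAppLim=no;")

    commands = ["Set Criterion=likelihood;", estimate]
    for swap in swaps:
        commands.append(fix)
        commands.append(
            f"HSearch Status=no Start=current Swap={swap} MulTrees=yes;")
        commands.append(estimate)
    return commands


if __name__ == "__main__":
    # ApproxLim=2 corresponds to the value quoted for the Isospora data.
    for line in successive_approximation_commands(approx_lim=2):
        print(line)
```

The default `swaps` tuple mirrors the schedule in the text: one NNI round, two SPR rounds and one TBR round, with the model parameters re-estimated between rounds.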
Second, the default PAUP* strategy was used, based on the stepwise-addition starting tree
followed by branch swapping. Three analyses were performed for each data set, using different
branch-swapping options: NNI based on ten random addition sequences, SPR based on three
random addition sequences, and a single TBR search (the latter is the PAUP* default option). In all
cases the parameters for the nucleotide-substitution model were fixed at the values determined
for the optimal tree found using the successive-approximations strategy above, and using the
same ApproxLim percentages.
Third, the quartet-puzzling strategy was used, with the default options. Fourth, the star
decomposition strategy was used for the NAD4 and Isospora data sets only, with the default
options. This latter analysis is impractical for the HIV data set (e.g. the analysis of the Isospora
data set took 8.5 weeks on the fastest computer used). In both cases the parameters for the
nucleotide-substitution model were fixed at the values determined for the optimal tree found
using the successive-approximations strategy.
Tree-Puzzle (v. 5.2; Schmidt et al., 2002) uses quartet puzzling to explore tree space
deterministically, while IQPNNI (v. 3.0; Vinh and von Haeseler, 2004) extends this idea by
adding branch swapping and a stopping criterion. PhyNav (v. 1.0; Vinh et al., 2005) creates
subsets larger than quartets, which are then stitched together. These programs all have roughly
the same user options. For Tree-Puzzle the GTR+G model was used with five categories for the
discrete gamma distribution, and GTR+G+I was used for the other two programs. For Tree-Puzzle,
the nucleotide-substitution rates were fixed at the values determined for the optimal tree
found using the successive-approximations strategy, while all other parameter values were
estimated by the program.
PhyML (v. 2.4.4; Guindon and Gascuel, 2003) uses a series of heuristics to explore a part
of tree space deterministically. Three versions of each analysis were run, using respectively the
NNI branch-swapping strategy described by Guindon and Gascuel (2003), the SPR branch-swapping
strategy described by Hordijk and Gascuel (2005) and the hybrid strategy described by
Hordijk and Gascuel (2005). All parameter values were estimated by the program.
RAxML-VI (v. 1.0; Stamatakis et al., 2005) has a range of strategies for stochastically
exploring parts of tree space. As recommended in the instructions, two different strategies were
used for each analysis. First, ten hill-climbing runs were performed based on random-addition-order
parsimony starting trees. Second, five simulated-annealing runs were performed based on
random-addition-order parsimony starting trees, with the analysis time-period being set to four
times the average time of the hill-climbing runs. All other default options were used, with the
parameter values of the GTR+G model being estimated by the program. Note that RAxML-VI-HPC
(Stamatakis, 2006), designed specifically for much larger data sets (e.g. >1,000 sequences),
employs a different set of search strategies to those evaluated here, which are actually much
closer to those implemented in RAxML-V (Stamatakis, 2005).
GARLI (v. 0.93, 0.942, 0.95; Zwickl, 2006) is basically a development of the GAML
version 1 (Lewis, 1998) and 2 (Brauer et al., 2002) programs, and uses a genetic algorithm to
explore tree space stochastically. The default options were used for all analyses, with all
parameter values being estimated by the program. Version 0.93 was used for analysing the main
three data sets, version 0.942 for the ancillary analyses, and 0.95 for the HKY analyses (see
below). Ten analyses were run for each data set, each starting from a random tree. In addition,
for each of the main data sets a single analysis was run starting from the neighbor-joining tree.
TreeFinder (v. May 2006; Jobb et al., 2004) currently uses an unspecified algorithm to
explore tree space deterministically (the algorithm described by Jobb et al., 2004 was used by
earlier versions). The default options were used for all analyses, except that ten starting trees
were created for each analysis using the “random walk” option with the neighbor-joining tree as
the centre tree. All parameter values were estimated by the program.
MultiPhyl (v. 1.0.6; Keane, 2006) uses a series of heuristics similar to those of PhyML in order to
explore a part of tree space deterministically. The default options were used for all analyses, with
SPR branch swapping. All parameter values were estimated prior to the analysis and then fixed
during the tree search. This program was run as a distributed analysis (i.e. using multiple
processors in a heterogeneous computer system) via the online service provided by the
Heterogeneous Distributed Computing group at the National University of Ireland, Maynooth
(http://www.cs.nuim.ie/distributed/multiphyl.php).
DPRml (Keane et al., 2005) is basically an extension of the fastDNAml program (Olsen et
al., 1994) that accepts a wider range of nucleotide-substitution models, using a series of
heuristics to explore tree space deterministically. It thus represents the older style of search
strategy, which the more recent programs are intended to supplant. This program was also run as
a distributed analysis via the facilities of the Heterogeneous Distributed Computing group, using
the default options.
REFERENCES
Brauer, M. J., M. T. Holder, L. A. Dries, D. J. Zwickl, P. O. Lewis, and D. M. Hillis. 2002.
Genetic algorithms and parallel processing in maximum-likelihood phylogeny inference.
Mol. Biol. Evol. 19:1717–1726.
Foster, P. G. 2003. Likelihood in molecular phylogenetics. Unpublished notes used for
Molecular Systematics course. Natural History Museum, London, U.K. July 2001;
September 2003.
(http://www.ch.embnet.org/CoursEMBnet/PHYL03/Slides/unix_like_pfoster.pdf;
http://bioinf.ncl.ac.uk/molsys/data/like.pdf)
Guindon, S., and O. Gascuel. 2003. A simple, fast, and accurate algorithm to estimate large
phylogenies by maximum likelihood. Syst. Biol. 52:696–704.
Hordijk, W., and O. Gascuel. 2005. Improving the efficiency of SPR moves in phylogenetic tree
search methods based on maximum likelihood. Bioinformatics 21:4338–4347.
Jobb, G., A. von Haeseler, and K. Strimmer. 2004. Treefinder: a powerful graphical analysis
environment for molecular phylogenetics. BMC Evol. Biol. 4:18.
Keane, T. M. 2006. Computational methods for statistical phylogenetic inference. Ph.D. thesis,
The National University of Ireland Maynooth, Ireland.
Keane, T. M., T. J. Naughton, S. A. A. Travers, J. O. McInerney, and G. P. McCormack. 2005.
DPRml: distributed phylogeny reconstruction by maximum likelihood. Bioinformatics
21:969–974.
Lewis, P. O. 1998. A genetic algorithm for maximum-likelihood phylogeny inference using
nucleotide sequence data. Mol. Biol. Evol. 15:277–283.
Olsen, G. J., H. Matsuda, R. Hagstrom, and R. Overbeek. 1994. FastDNAml: a tool for
construction of phylogenetic trees of DNA sequences using maximum likelihood. Comp.
Appl. Biosci. 10:41–48.
Schmidt, H. A., K. Strimmer, M. Vingron, and A. von Haeseler. 2002. Tree-Puzzle: maximum
likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics
18:502–504.
Stamatakis, A. 2005. An efficient program for phylogenetic inference using simulated annealing.
Page 198b in Proceedings of the 19th international parallel and distributed processing
symposium (IPDPS’05), and the 4th international workshop on high performance
computational biology (HiComB’05). IEEE Press, Piscataway NJ.
Stamatakis, A. 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with
thousands of taxa and mixed models. Bioinformatics 22:2688–2690.
Stamatakis, A., T. Ludwig, and H. Meier. 2005. RAxML-III: a fast program for maximum
likelihood-based inference of large phylogenetic trees. Bioinformatics 21:456–463.
Sullivan, J. 2005. Maximum likelihood methods for phylogeny estimation. Meth. Enzymol.
395:757–779.
Swofford, D. L. 2002. PAUP*. Phylogenetic analysis using parsimony (*and other methods).
Sinauer Associates, Sunderland MA.
Vinh, L. S., H. A. Schmidt, and A. von Haeseler. 2005. PhyNav: a novel approach to reconstruct
large phylogenies. Pages 386–393 in Classification, the ubiquitous challenge: proceedings
of the 28th annual conference of the Gesellschaft für Klassifikation e.V. (C. Weihs and W.
Gaul, eds). Springer-Verlag, Heidelberg.
Vinh, L. S., and A. von Haeseler. 2004. IQPNNI: moving fast through tree space and stopping in
time. Mol. Biol. Evol. 21:1565–1571.
Zwickl, D. J. 2006. Genetic algorithm approaches for the phylogenetic analysis of large
biological sequence datasets under the maximum likelihood criterion. Ph.D. thesis, The
University of Texas at Austin, U.S.A.
Maximum-Likelihood Computer Programs Available
Computer programs currently available that implement the maximum-likelihood criterion for evaluating phylogenetic trees based on
nucleotide data, and which were considered for inclusion in the evaluation.
_______________________________________________________________________________________________________
Program      Version   Reference                       Internet access
_______________________________________________________________________________________________________
Programs used
DPRml        1.0       Keane et al. (2005)             http://www.cs.nuim.ie/distributed/
GARLI        0.951 a   Zwickl (2006)                   http://www.bio.utexas.edu/faculty/antisense/garli/Garli.html
IQPNNI       3.0.1 b   Vinh and von Haeseler (2004)    http://www.cibiv.at/software/iqpnni/
MultiPhyl    1.0.6     Keane (2006)                    http://www.cs.nuim.ie/distributed/multiphyl.php
PAUP*        4.0b10    Swofford (2002)                 http://paup.csit.fsu.edu/
PhyML        2.4.4     Guindon and Gascuel (2003)      http://atgc.lirmm.fr/phyml/
PhyNav       1.0       Vinh et al. (2005)              http://www.cibiv.at/software/phynav/
RAxML-VI     1.0 c     Stamatakis et al. (2005)        http://icwww.epfl.ch/~stamatak/index-Dateien/Page443.htm
Tree-Puzzle  5.2       Schmidt et al. (2002)           http://www.tree-puzzle.de/
TreeFinder   May 2006  Jobb et al. (2004)              http://www.treefinder.de/

Programs not implementing the reversible GTR+G substitution model
FastDNAml    1.1.1a    Olsen et al. (1994)             http://geta.life.uiuc.edu/~gary/programs/fastDNAml.html
MetaPIGA     1.02      Lemmon and Milinkovitch (2002)  http://www.ulb.ac.be/sciences/ueg/html_files/MetaPIGA.html
MOLPHY       2.3b3     Adachi and Hasegawa (1996)      http://www.ism.ac.jp/ismlib/softother.e.html
NHML         3         Galtier and Gouy (1998)         www.genetix.univ-montp2.fr/nhml.htm
Phylip       3.66      Felsenstein (1989)              http://evolution.genetics.washington.edu/phylip.html
POY          3.0.11    Wheeler (2006)                  http://research.amnh.org/scicomp/projects/poy.php
TrExML       1.0       Wolf et al. (2000)              http://whitetail.bemidjistate.edu/trexml/trexml.man.html
SSA          1.0       Salter and Pearl (2001)         http://www.stat.unm.edu/~salter/software/ssa/ssa.html
SEMPHY       2.0       Friedman et al. (2002)          http://compbio.cs.huji.ac.il/semphy/

Programs not designed for extensive tree searches
APE d        1.8-4     Paradis et al. (2004)           http://cran.r-project.org/src/contrib/Descriptions/ape.html
DAMBE        4.5.20    Xia and Xie (2001)              http://dambe.bio.uottawa.ca/dambe.asp
HyPhy        0.99beta  Kosakovsky Pond et al. (2005)   http://www.hyphy.org/
PAML         3.15      Yang (1997)                     http://abacus.gene.ucl.ac.uk/software/paml.html
PHASE        2.0       Jow et al. (2002)               http://www.bioinf.manchester.ac.uk/resources/phase/
P4           0.83      Foster (2006)                   http://www.nhm.ac.uk/research-curation/projects/software/p4.html
_______________________________________________________________________________________________________
a Current release, but version 0.93 was used for the principal analyses and 0.942 for the other analyses.
b Current release, but version 3.0 was used for most of the analyses.
c The current release is RAxML-VI-HPC v2.2.0 (Stamatakis, 2006), which is actually quite a different program.
d PhyML can be used in conjunction with APE.
References
Adachi, J., and M. Hasegawa. 1996. MOLPHY version 2.3: programs for molecular phylogenetics based on maximum likelihood. Comput. Sci.
Monogr. 28:1–150.
Foster, P. G. 2006. P4, a python package for phylogenetics. Distributed by the author. Department of Zoology, Natural History Museum,
London, U.K. July 2006.
Friedman, N., M. Ninio, I. Pe'er, and T. Pupko. 2002. A structural EM algorithm for phylogenetic inference. J. Computat. Biol. 9:331–353.
Galtier, N., and M. Gouy. 1998. Inferring pattern and process: maximum-likelihood implementation of a nonhomogeneous model of DNA
sequence evolution for phylogenetic analysis. Mol. Biol. Evol. 15:871–879.
Guindon, S., and O. Gascuel. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol.
52:696–704.
Jobb, G., A. von Haeseler, and K. Strimmer. 2004. Treefinder: a powerful graphical analysis environment for molecular phylogenetics. BMC
Evol. Biol. 4:18.
Jow, H., C. Hudelot, M. Rattray, and P. Higgs. 2002. Bayesian phylogenetics using an RNA substitution model applied to early mammalian
evolution. Mol. Biol. Evol. 19:1591–1601.
Keane, T. M. 2006. Computational methods for statistical phylogenetic inference. Ph.D. thesis, The National University of Ireland Maynooth,
Ireland.
Keane, T. M., T. J. Naughton, S. A. A. Travers, J. O. McInerney, and G. P. McCormack. 2005. DPRml: distributed phylogeny reconstruction by
maximum likelihood. Bioinformatics 21:969–974.
Kosakovsky Pond, S. L., S. D. W. Frost, and S. V. Muse. 2005. HyPhy: hypothesis testing using phylogenies. Bioinformatics 21:676–679.
Lemmon, A. R., and M. C. Milinkovitch. 2002. The metapopulation genetic algorithm: an efficient solution for the problem of large phylogeny
estimation. Proc. Nat. Acad. Sci. U.S.A. 99:10516–10521.
Olsen, G. J., H. Matsuda, R. Hagstrom, and R. Overbeek. 1994. FastDNAml: a tool for construction of phylogenetic trees of DNA sequences
using maximum likelihood. Comp. Appl. Biosci. 10:41–48.
Paradis, E., J. Claude, and K. Strimmer. 2004. APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20:289–290.
Schmidt, H. A., K. Strimmer, M. Vingron, and A. von Haeseler. 2002. Tree-Puzzle: maximum likelihood phylogenetic analysis using quartets
and parallel computing. Bioinformatics 18:502–504.
Stamatakis, A., T. Ludwig, and H. Meier. 2005. RAxML-III: a fast program for maximum likelihood-based inference of large phylogenetic
trees. Bioinformatics 21:456–463.
Swofford, D. L. 2002. PAUP*. Phylogenetic analysis using parsimony (*and other methods). Sinauer Associates, Sunderland MA.
Vinh, L. S., H. A. Schmidt, and A. von Haeseler. 2005. PhyNav: a novel approach to reconstruct large phylogenies. Pages 386–393 in
Classification, the ubiquitous challenge: proceedings of the 28th annual conference of the Gesellschaft für Klassifikation e.V. (C. Weihs
and W. Gaul, eds). Springer-Verlag, Heidelberg.
Vinh, L. S., and A. von Haeseler. 2004. IQPNNI: moving fast through tree space and stopping in time. Mol. Biol. Evol. 21:1565–1571.
Wheeler, W. C. 2006. Dynamic homology and the likelihood criterion. Cladistics 22:157–170.
Wolf, M. J., S. Easteal, M. Kahn, B. D. McKay, and L. S. Jermiin. 2000. TrExML: a maximum likelihood approach for extensive tree-space
exploration. Bioinformatics 16:383–394.
Xia, X., and Z. Xie. 2001. DAMBE: data analysis in molecular biology and evolution. J. Hered. 92:371–373.
Zwickl, D. J. 2006. Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under the maximum
likelihood criterion. Ph.D. thesis, The University of Texas at Austin, U.S.A.
Input File for the Ratchet (Nixon) Analyses
#NEXUS
[
This is a fully commented setup file that can be used to implement the
likelihood ratchet using the PAUPRat program of Derek Sikes and Paul Lewis:
http://www.ucalgary.ca/~dsikes/software2.htm
Sikes, D.S. & Lewis, P.O. 2001. Beta software, version 1.
PAUPRat: PAUP* implementation of the parsimony ratchet.
Distributed by the authors. Department of Ecology and Evolutionary
Biology, University of Connecticut, Storrs, USA. June 2001.
You need to obtain a copy of the PAUPRat program, and to read its
instruction manual. Basically, you first run this setup file through
PAUPRat, and then you run the PAUPRat output file through PAUP*.
The input files for PAUP* are your data file and the control file created
by PAUPRat from this setup template. The output files from PAUP* are:
lratchet.log - a text file with the results
lratchet.tre - a treefile with the optimal trees from each iteration
model.out    - a text file with the model parameter values used
lratchet.tmp - a temporary file that you can discard.
]
[
The original idea for the parsimony ratchet was by Kevin Nixon:
Nixon, K.C. 1999. The parsimony ratchet: a new method for rapid parsimony
analysis. Cladistics 15: 407-414.
]
[
The original version of the likelihood ratchet was by Rutger Vos:
Vos, R.A. 2003. Accelerated likelihood surface exploration: the likelihood
ratchet. Systematic Biology 52: 368-373.
The original setup file (June 2002) was downloaded from:
http://www.sfu.ca/~rvosa/likelihoodratchet
]
[
Modifications (November 2006) were by David Morrison, to implement the
'ratchet' part of the procedure, as this was missing from the Vos version
(which generates a new starting tree for each iteration). Also, the
strategy now provides a series of initial "successive approximations" to
estimate both the starting tree and the substitution-model parameter values.
Finally, the tree-search strategy has been optimized for maximum-likelihood
analyses of up to 150 sequences.
]
[
The successive approximations were based on the ideas of:
Sullivan, J., Abdo, Z., Joyce, P. & Swofford, D.L. 2005. Evaluating
the performance of a successive-approximations approach to parameter
optimization in maximum-likelihood phylogeny estimation. Molecular
Biology & Evolution 22: 1386-1392.
The specific implementation was inspired by:
Sullivan, J. 2005. Maximum likelihood methods for phylogeny estimation.
Methods in Enzymology 395: 757-779.
and by Peter Foster:
http://bioinf.ncl.ac.uk/molsys/data/like.pdf
http://www.ch.embnet.org/CoursEMBnet/PHYL03/Slides/unix_like_pfoster.pdf
Foster, P.G. 2001. Likelihood in Molecular Phylogenetics. Unpublished
notes used for Molecular Systematics course. Natural History Museum,
London, UK. July, 2001; September 2003.
]
[
There are seven settings in this setup file that you might need to change:
(1) You must modify the 'nchar' command to match your data set.
(2) The default number of re-weighting iterations is 10, and the
percentage of characters to re-weight is 25. This produces 11 trees (the
initial tree plus 10 attempts to change island). You can change these
values using the 'nreps' and 'pct' commands (e.g. nreps=20 pct=15).
(3) The default re-weighting scheme treats all of the characters as
equal. You can change this using the 'wtmode' command.
(4) The default substitution model is GTR+G+I (general time reversible, with
gamma-distributed site-to-site variation and a proportion of invariable
sites). If you want to use a different model, then you need to change all of
the 'LScores' and 'LSet' commands. Note that the complexity of the model
does not affect the speed of the tree searches (since the model is fixed for
all searches), but does affect the speed of model estimation during the
initial successive approximations.
(5) The default tree-search strategy is SPR (subtree-prune-regraft) (the
PAUP* default is TBR, intended for parsimony searches). If you want to use
a different strategy, then you need to change all of the 'Swap=spr'
commands. If you want separate strategies for the re-weighted and
unweighted searches, then you need to change the commands labelled
'rewtdcmd' and 'normcmd', respectively.
(6) During the tree search the log-likelihood scores are not fully optimized
unless they are within 2% of the current optimum value (the PAUP* default is
5%, intended for <50 sequences). If you want to use a different strategy,
then you can change this using the 'ApproxLim' command (e.g. ApproxLim=1 for
data sets with larger negative log-likelihoods). Note that this value can
make a big difference to how long the ratchet takes to run; even a change in
value of 0.01% can be important for large data sets (multiple genes for >100
sequences). For a discussion, see:
Rogers J.S. & Swofford, D.L. 1998. A fast method for approximating maximum
likelihoods of phylogenetic trees from nucleotide sequences. Systematic
Biology 47: 77-89.
(7) Only one tree is saved during the re-weighted tree search, on the
principle that the optimal tree does not necessarily have to be found for
this search (only for the unweighted search). If you do want to find the
optimal tree, then you need to change the 'MulTrees=no' command. Also, you
might like to consider using the 'RearrLimit' or 'TimeLimit' commands if you
wish to prevent unduly long re-weighted searches.
]
[Start of instructions. Don't change.]
begin pauprat;
[Enter the number of characters after nchar= on the following line.]
dimensions nchar=10922;
[Enter the number of iterations after 'nreps=' and the fraction of
characters drawn after 'pct=' on the following line. The default
values seem to work, but you can always use more replicates and a
greater percentage (probably up to 35%, as for the parsimony
ratchet) if you expect a very complex landscape, or if you have a small
data set and/or a very fast computer. 'Seed=0' sets a randomly chosen
random-number seed, but you can pre-specify a particular seed if you
want exact repetition of the characters chosen for re-weighting.]
set seed=0 nreps=10 pct=25;
[Choose the weighting mode. The choices are: additive, multiplicative,
uniform. Typically, the default works fine unless you are using a
weighting scheme (i.e. a 'WtSet' command) based on codon positions, in
which case you might want to try 'mult'.]
set wtmode=uniform;
[Don't change this unless you want a lot of output.]
set terse;
[Opening message.]
startcmd "[!* * * * * * * * * * * * * * * * * * *]";
startcmd "[!* ----- Likelihood Ratchet v2 ----- *]";
startcmd "[!*         David A. Morrison         *]";
startcmd "[!*   Sveriges Lantbruksuniversitet   *]";
startcmd "[!*          November, 2006           *]";
startcmd "[!* * * * * * * * * * * * * * * * * * *]";
[Record the current time.]
startcmd "Time";
[The *.log file stores PAUP*'s display buffer.]
startcmd "Log File=lratchet.log";
[Automatically increase the 'maxtrees' setting. Don't change.]
startcmd "Set Increase=auto";
[Get the starting tree. No need to change unless you want to specify
a user starting tree, in which case use the 'GetTrees' command.]
startcmd "DSet Dist=logdet Objective=ME Rates=equal PInv=0 Subst=all
NegBrLen=setzero";
startcmd "NJ BioNJ=yes ShowTree=no BrLens=no BreakTies=systematic";
[Set the optimality criterion to ML. Don't change.]
startcmd "Set Criterion=likelihood";
[Optimize the substitution-model parameters.]
startcmd "LScores 1 / NST=6 BaseFreq=estimate RMatrix=estimate Rates=gamma
Shape=estimate PInvar=estimate";
[The *.tmp file contains the current working tree. It can be used
to re-start a ratchet run that has been interrupted. Don't change.]
startcmd "SaveTrees File=lratchet.tmp Replace";
startcmd "Time";
[Do an NNI search based on these parameter estimates, and then
optimize the substitution-model parameters again.]
startcmd "LSet BaseFreq=previous NST=6 RMatrix=previous Rates=gamma
Shape=previous PInvar=previous ApproxLim=2 AdjustAppLim=no";
startcmd "HSearch Status=no Start=current Swap=nni MulTrees=yes";
startcmd "SaveTrees File=lratchet.tmp Replace";
startcmd "LScores 1 / NST=6 BaseFreq=estimate RMatrix=estimate Rates=gamma
Shape=estimate PInvar=estimate";
startcmd "Time";
[Do an SPR search based on these parameter estimates, and then
optimize the substitution-model parameters again. Save the model
parameter values to the model.out file. The 'LongFmt' option is used only
to deal with a long-standing bug in PAUP* version 4b10.]
startcmd "LSet BaseFreq=previous NST=6 RMatrix=previous Rates=gamma
Shape=previous PInvar=previous ApproxLim=2 AdjustAppLim=no";
startcmd "HSearch Status=no Start=current Swap=spr MulTrees=yes";
startcmd "SaveTrees File=lratchet.tmp Replace";
startcmd "Default LScores LongFmt=yes";
startcmd "LScores 1 / NST=6 BaseFreq=estimate RMatrix=estimate Rates=gamma
Shape=estimate PInvar=estimate ScoreFile=model.out Replace";
startcmd "Default LScores LongFmt=no";
startcmd "Time";
[The *.tre file contains the set of solutions for the initial tree
plus all subsequent iterations. There will thus be at least nreps+1
trees in this file at the end. Don't change.]
startcmd "SaveTrees File=lratchet.tre Replace";
[Set the substitution-model parameters for the likelihood model used
in all subsequent iterations.]
startcmd "LSet BaseFreq=previous NST=6 RMatrix=previous Rates=gamma
Shape=previous PInvar=previous ApproxLim=2 AdjustAppLim=no";
[Commands for the branch-swapping cycles under the re-weighted scheme.
This is the tree search that tries to get to another island of trees.]
rewtdcmd "HSearch Status=no Start=1 Swap=spr MulTrees=no";
[Updates the *.tmp file to contain the current tree. Don't change.]
rewtdcmd "SaveTrees File=lratchet.tmp Replace";
rewtdcmd "Time";
[Commands for the branch-swapping cycles under the original weighting
scheme. This is the tree search that tries to find the peak of the
island.]
normcmd "HSearch Status=no Start=1 Swap=spr MulTrees=yes";
[Update the *.tmp file to contain the current starting tree.
Don't change.]
normcmd "SaveTrees File=lratchet.tmp Replace";
[Update the set of optimal trees over all iterations. Note that both the
'GetTrees' and 'SaveTrees' commands are used in order to get all of the
trees into a single Trees block in the treefile (the default in PAUP* is
to create a separate block for each ratchet iteration). Don't change.]
normcmd "GetTrees Rooted=no Unrooted=yes File=lratchet.tre Mode=7";
normcmd "SaveTrees File=lratchet.tre Replace";
normcmd "GetTrees Rooted=no Unrooted=yes File=lratchet.tmp Mode=3
Warntree=no";
normcmd "Time";
[Retrieve the final set of optimal trees at the end of the ratchet search.
Don't change.]
stopcmd "GetTrees File=lratchet.tre Mode=3";
[Print the negative log-likelihoods and the between-tree distances
for the set of optimal solutions. Note that the trees are numbered
in reverse order (i.e. the final-iteration tree is #1). There will be
more than nreps+1 trees if some of the iterations found several equally
optimal trees. Don't change.]
stopcmd "LScores All / SortTrees=yes";
stopcmd "TreeDist Metric=symdiff";
stopcmd "Time";
[Stop the logging of the display buffer.]
stopcmd "Log Stop";
[Final message.]
stopcmd "[!* * * * * * * * * * * * * * * * *]";
stopcmd "[!* -- THIS SEARCH IS COMPLETE -- *]";
stopcmd "[!*  A LOG FILE HAS BEEN WRITTEN  *]";
stopcmd "[!* AND ALL TREES HAVE BEEN SAVED *]";
stopcmd "[!*    IT IS OKAY TO QUIT PAUP    *]";
stopcmd "[!* * * * * * * * * * * * * * * * *]";
stopcmd "Quit";
[Define the name of the ratchet script file.]
write file=lratchet.nex;
end;
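
The control flow that this setup file asks PAUPRat and PAUP* to carry out can be summarised in a short Python sketch. It is a schematic of the ratchet idea as described in the comments above (perturb by re-weighting a percentage of characters, search, restore the original weights, search again), not a reimplementation of PAUPRat. The function names are hypothetical, `search` is a caller-supplied placeholder for a tree search under a given weight vector, setting a selected character's weight to 2 is an assumption made purely for illustration, and the `seed` argument is used literally here, whereas the setup file states that 'Seed=0' requests a randomly chosen seed.

```python
import random


def choose_reweighted(nchar, pct, rng):
    """Pick the indices of the characters to up-weight in one iteration."""
    k = round(nchar * pct / 100)
    return sorted(rng.sample(range(nchar), k))


def ratchet(start_tree, search, nchar, nreps=10, pct=25, seed=0):
    """Schematic ratchet loop; returns the initial tree plus one per iteration.

    search(tree, weights) is a hypothetical callable that returns the best
    tree reachable from `tree` under the character weights `weights`.
    """
    rng = random.Random(seed)
    equal = [1] * nchar
    current = search(start_tree, equal)       # initial tree
    trees = [current]
    for _ in range(nreps):
        weights = list(equal)
        for i in choose_reweighted(nchar, pct, rng):
            weights[i] = 2                    # assumed up-weighting scheme
        perturbed = search(current, weights)  # try to move to another island
        current = search(perturbed, equal)    # climb under original weights
        trees.append(current)
    return trees


if __name__ == "__main__":
    # Dummy search that just counts how many searches have been applied.
    result = ratchet(0, lambda tree, weights: tree + 1, nchar=100)
    print(len(result))
```

With the default `nreps=10` the list holds 11 entries, matching the "initial tree plus 10 attempts to change island" described in setting (2).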