TableX_TRSoftwareOverview

Table X. Software for strict gene tree reconciliation and gene tree estimation that is guided by reconciliation to a species tree.
some general cophylogeny estimation as well?
- Duplication cost
- Speciation cost
- Sorting cost
- HGT cost
(values or probs)
- Number cycles
(Cost values CAN be
determined
automatically)
?? Can values be
interpreted as RATES
or rate-dependent
parameters??
- Duplication rate (or
lambda distribution)
- MLE algorithm
- exhaustive ML (BOOL)
- Alg for hard instances
-Alg used to root the
species tree (
- Search heuristic
(randomized hill
climbing, partial queue
based, queue based)
C++
(Open)
MLE
CLI
- Species tree
- Gene tree labeled with
species names (not loci)
(Newick)
Parsimony
- Node labeled gene tree
(DLS)
- Rooted species tree
-Gene trees
(Newick/Nexus)
available
compiled for
linux, mac
os and
others
CmdLine
Python
Parsimony
Strict
Reconcil.
Java
(Open)
Designed
for inferring
ancestral
genes
Parsimony
Distance
Based??
Dollo Parsimony
- Species tree
- Gene tree
Must code by hand, no
GUI. Used by __ for
reconciliation.
- Species tree
(Newick)
- Gene sequences
(Fasta)
- List of orthologs and
paralogs
- BLAST parameters
Y
Y
Y
Y
N
N?
Y
Y
N?
Y
Y
Y
Y
N
N
N
N
N
Y
N
N
Y
Y
Y
Y
Y
N
N
N
Y
Y
N
N
N
Branch Length
- Visualization of the
cophylogeny (exportable
as PDF through printing,
and as SVG)
- Cophylogeny as tab
delimited “nexus” file
Save as .. not working on
macbook but save-all
produces a nexus file
- Cost of events
Gene
Tree
Non-binary
DupTree 1.48
Also now “part
of” iGTP
ETE
http://ete.cgenomics.org/
Not necessary to
eval code but
mention library,
this lib is used
by TreeKO
EvolMap ^ *
v 1.0
http://kosikweb.mcdb.ucsb.edu/evolmap/index.htm
Focus on
estimating
- Species tree
- Gene tree
- Tip map
(Single Nexus File)
Species
Tree
Unrooted
Python
(Open)
http://genome.cs.iastate.edu/CBL/DupTree
Dynamic Programming
User Set
Parameters
IT IS UNCLEAR IF
SORTING IS ‘LOSS’
OR ILS??
DrML *
v 0.91.05
Infers rooted
species trees
from genetree
data
Parsimony
(DP, Parameter
Adaptive)
Output
Branch Length
Java &
EclipseRCP
(Closed)
Input
Non-binary
[1]
CoRe-PA
http://pacosy.informatik.uni-leipzig.de/49-1CoRe-PA.html
Runtime
(Best case/
Worst case)
Unrooted
GUI
CLI
Feasible Data Set
Sizes
ie. How many
species or genes
are feasible with
this approach
given runtime and
memory
requirements
Sequence
Evolution
Installed as
downloaded
but does not
do anything
when
launched,
More of a
tree stats
paper
Algorithm or Heuristic Uses
Lineage Sorting
CopyCat 1.14
http://ab.inf.uni-tuebingen.de/software/copycat/welcome.html
Method
Horizontal Transfer
Interface
Gene Loss
Source
Code
Gene Duplication
Program
Y
N
N
ancestral
genome content
FORESTER *
(SDI/RSDI)
Java, Ruby
(LGPL v.3)
Parsimony
GIGA ^ *
v 1.0
v 1.1
[2]
GTP
Discussed at BipERL with link to
http://ginger.ucdavis.edu/gtp/gtp.html
that does not appear to have information
C
(Closed)
Distance
Based
Agglomerative
Clustering
GIGA is the name of the algorithm
-Species tree
-Gene tree
(phyloxml)
- Species tree
(Newick)
- Gene alignments
(Fasta)
-Node labeled gene tree
(phyloxml?)
- Node labeled gene tree
(NHX)
- Algorithm
(GSDI or SDI)
- Rooting function
- None available
Y
N
?
Y
Y
N
N
N
Y?
N
N?
Y
Y
N?
N
Parsimony
Citation as Sanderson, M.J. and McMahon, M.M. (2007) Inferring angiosperm phylogeny from EST data with widespread gene duplication. BMC Evol. Biol., 7 (Suppl. 1), S
iGTP
http://www.biomedcentral.com/1471-2105/11/574
[3]
Jane v 3.0
cophylogeny
http://www.cs.hmc.edu/~hadas/jane/register.py
Parsimony
Considers
duplication,
duplication-loss
and deep
coalescence
reconciliation cost
Java
(Free BSD
&
Apache 2.0)
Parsimony
by
Genetic
Algorithm
Heuristic
- Species tree “host”
- Gene tree “parasite”
- Guest2Host Tip Map
- Spatial proximity?
- Time zones (optional)
(Single Nexus File,
tab delim tree format)
- CLI produces mapping
text file to STDOUT
- Timing File produced
with –o option, this is for
visualizing in Jane at a
later time
- Number generations
(constrained: 1-500)
- Population size
(constrained: 1-1,500)
- Birth cost
- Duplication cost
- HGT cost
- Node based cost model
(Boolean)
- Failure to diverge cost
Y
Y
Y
N
C++
(Closed)
Binary only
available for
MacOS 10.4+,
Linux x86(64)
Parsimony
- Dated species tree
(Newick w length or
bootstrap values)
- Dated gene tree
(Newick w branch
length or bootstrap
values)
- Mapping of gene tree
onto species tree
- Details of the cost
-Duplication cost
-Loss cost
-Horizontal transfer cost
-Map root within
species tree (Boolean)
Y
Y
Y
N
Java
(Closed)
Parsimony
on
- Species tree
- Gene tree
- Node labeled gene tree
(Newick/NHX/Notung)
- Duplication cost
- Loss cost
Y
Y
N
usage jane-cli.sh
[4]
Mowgli (MPR)
http://www.atgc-montpellier.fr/MPR/
http://www.atgc-montpellier.fr/Mowgli/
[5]
Notung *
N
N
Y
N
Y
**
N
Y
N
Y
Y
**
***
v 2.6
Networks
(Newick/NHX/Notung)
- Gene alignments?
Phylodog
(unpublished)
Installation in
Progress
PhyloNet 2.4
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2533029/
http://bioinfo.cs.rice.edu/phylonet/index.html
PrIME-GSR *
v 1.0
C++? with
BoostMPI ?
MLE
Java
(Gnu GPL)
Does not do GTR
but does MAST
and HGT
C++, Perl
(Open)
Bayesian
RAP *
v.
http://pbil.univ-lyon1.fr/software/RAP/RAP.htm
Not command
line, should
probably use
Rap-Green
Java
(open)
[6]
Rap-Green 1.0
http://code.google.com/p/rapgreen/
Tagged Rap
This appears to
be the current
development version of RAP.
This does have
command-line
arguments
Softparsmap *
v 1.02
[7]
Roots to
minimize the
number of gene
duplications and
gene loss events.
SPIMAP ^ * *
v. 1.1
Need to point to
local python
library in
Home dir
SYNERGY ^
includes parameters used
- CAN REARRANGE TO
MINIMIZE COST TOO
- Rooted species tree
- Node labeled gene tree
- Reconciliation map?
- Species tree
(Newick/PRIME)
- Gene alignments
(Fasta)
- Gene to species Map
- Node labeled gene tree
(PRIME)
- Reconciliation map
(PRIME)
- Gene tree
- Unresolved species
tree
- Node labeled gene tree?
(Newick modified)
Java
(GNU-GPL)
- Conditional duplication
cost
- Edge weight threshold
- Unknown
- Substitution model
- Edge rate model
- Starting gene tree
- Duplication rate
- Loss rate
- Number of iterations
- Branchswapping (Y/N)
- thinning value <INT>
Y
N
N
Y
Y
Y
N
Y
N
N
Y
Y
N
N
Y
Y
-Node labeled gene tree
(phyloxml format)
Y
http://code.google.com/p/rapgreen/wiki/CommandLineDocumentation
Java
(open)
Parsimony
Soft
Parsimony
C++, Python
(GPL v.2)
Algorithm
Bayesian
-Sequence information
(Darwin XML)
- Gene Trees
(Newick w bootstrap
values)
- Species Tree
(Genbank files as
Nodes.dmp and
names.dmp)
- Property file
- Species tree
(Newick)
- Gene alignments
(Fasta)
- Gene to species Map
Y
- Node labels
()
- Reconciliation map
(recon.tab.txt)
- HKY parameters
- Duplication rate
- Loss rate
- Number of Iterations
Y
Y
Y
Y
Y
Tarzan v 0.9
Cophylogeny
http://pacosy.informatik.uni-leipzig.de/51-0Tarzan.html
TreeBeST *
v.1.9.2
(unpublished)
TreeFitter
http://sourceforge.net/projects/treefitter/
but nothing there.
Also nothing at
http://www.ebc.uu.se/systzoo/research/treefitter/treefitter.html
TreeKO *
that lacks
transferable
software
Java
(Closed)
C++, Java,
Perl
(GPL v.2)
GUI
Parsimony
MLE
- visualization of minimal
cost trees, no apparent
export
- Duplication cost
- Loss cost
- Speciation Cost
- Sorting Cost
- HGT cost
Y
Y
- Node labeled gene tree
(PRIME)
- Reconciliation map
(PRIME)
Y
Y
Y
Y
Y
Y?
Y
Y
Y
Y?
Y?
Parsimony
Python
Parsimony
- Species tree
- Gene tree
(Newick)
TreeKO Output text file
- Counts of duplications
and losses
Java
(MPL)
Parsimony
- Gene tree
- Species tree
- Host to Guest map
(Nexus format referred
to as Tanglegram)
- Jungle of solutions
described as graphs
(Graphviz dot format)
- Mapping Visualization
(pdf format)
/
http://treeko.cgenomics.org/doku.php?id=start
Dependent on ETE
Requires Python > 2.5
TreeMap 3.0
http://sydney.edu.au/engineering/it/~mcharles/
http://sydney.edu.au/engineering/it/~mcharles/software/treemap/TreeMap3.0b.zip
But newest is at Google:
https://sites.google.com/site/cophylogeny/software
-Species tree with
divergence times
- Gene tree with
divergence times
(tab delimited tree
format similar to that
used by Jane)
Example input
- Species tree (Newick)
- Gene alignments
(FASTA)
Finds the Set of
Pareto Optimal
Solution
N
N
N
- No obvious way to
set the costs in the GUI or
CLI.
- The CLI does not really
seem to work on the
RCLUSTER.
The current
development version of
the program allows some
command line options but
only produces unparsable
ascii trees and PDFs of
the maps
+ SPIMAP substitution rates are generated by the program spimap-train-rates, which comes with the SPIMAP program
^ journal article
* hyperlinks to software page and source code
** With Notung, either the species tree or the gene tree must be binary when the other is multifurcated
*** Notung in rearrange mode requires gene trees with edge weights representing bootstrap values, edge lengths, etc.
A reconciliation map is not an explicit output in the NOTUNG format output file. The species tree and the gene tree are both present, but an explicit mapping between the two is not given. It seems, however, that the location of a duplication node on the species tree can be inferred by fetching the subset of taxon names (leaf nodes) that are children of the duplication node. This should define a unique set from which we can identify the species tree edge on which the duplication event took place.
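The inference described above can be sketched as a lowest-common-ancestor lookup. This is a minimal illustration and not NOTUNG's own procedure: the nested-tuple tree encoding, the helper names `leaf_set` and `lca_clade`, and the example species are all assumptions made for the sketch.

```python
def leaf_set(tree):
    """Return the set of leaf names under a nested-tuple tree (leaves are strings)."""
    if isinstance(tree, str):
        return {tree}
    leaves = set()
    for child in tree:
        leaves |= leaf_set(child)
    return leaves


def lca_clade(species_tree, species):
    """Return the smallest clade of species_tree that contains every given species."""
    species = set(species)
    if not species <= leaf_set(species_tree):
        raise ValueError("species not found in species tree")
    node = species_tree
    while not isinstance(node, str):
        # Descend as long as a single child still contains every species.
        for child in node:
            if species <= leaf_set(child):
                node = child
                break
        else:
            # No single child contains them all, so this node is the LCA.
            return node
    return node


# Hypothetical species tree and duplication node (illustrative data only).
species_tree = ((("human", "chimp"), "mouse"), "chicken")
# Species names of the leaves below one duplication node in the gene tree.
dup_species = {"human", "mouse"}
dup_location = lca_clade(species_tree, dup_species)
print(dup_location)
```

The duplication is then placed on the edge leading into the returned clade. Recomputing leaf sets at every step is quadratic in the worst case, which is acceptable for a sketch but not for large trees.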
Looks like the following are an evolution of the same basic code base and developers
GTP -> DupTree -> iGTP
GIGA generates trees using a method somewhat similar to UPGMA; however, it takes evolutionary scenarios (i.e., duplication/loss) into account when joining clades
SPIMAP includes a training step that determines species specific evolutionary parameters
Does strict reconciliation always refer to placement of the nodes without modifying the topology of the gene tree???
A strict reconciliation takes a gene tree
Another program to consider for TEs is Jane (http://www.cs.hmc.edu/~hadas/jane/), which is specifically designed for host/parasite cophylogenies. It includes the ability to handle 'host switches', which can be construed in general terms as horizontal transfer.
What is the relationship between the Mowgli program and Phylodog??
TreeMap takes over two days and exceeds memory for trees with > 25 tips.
The analysis-> event costs option does not do anything.
Treebest may take unrooted or binary but not both??
Mowgli would possibly be a useful tool for the discovery of horizontal transfer events in transposable element host/parasite studies.
NOTES:
In addition to comparing trees, we may want to calculate the logML of the archetype tree reconciliation under multiple MCL criteria. This will help distinguish between a failure in the algorithm's implementation of the ML search and a failure in the ML model itself.
For example, if the observed ML (MLobs) of the archetype tree is greater than the MLobs of the tree determined by the algorithm, we can fault the algorithm for not properly finding the ML. This is likely to be helpful for algorithms that use a heuristic and not an exhaustive search to find the ML. If the MLobs of the determined tree is greater than the MLobs of the archetype tree, then we may fault the ML model itself; i.e., even a fully exhaustive search would find the incorrect reconciliation.
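The decision rule above can be written down as a small helper. This is only a sketch under stated assumptions: the function name `diagnose_ml_failure`, the tolerance `tol`, and the returned labels are hypothetical, and the comparison is only meaningful when the algorithm's reconciliation differs from the archetype.

```python
def diagnose_ml_failure(logl_archetype, logl_inferred, tol=1e-6):
    """Attribute an incorrect reconciliation to the search or to the model.

    logl_archetype: log-likelihood of the known (archetype) reconciliation.
    logl_inferred:  log-likelihood of the reconciliation the program returned.
    tol:            hypothetical tolerance below which the scores count as tied.
    """
    if logl_archetype > logl_inferred + tol:
        # A better-scoring solution exists that the search did not find.
        return "search"
    if logl_inferred > logl_archetype + tol:
        # The model itself prefers the wrong answer; exhaustive search would too.
        return "model"
    return "tie"


# Illustrative values only, not measured results.
print(diagnose_ml_failure(-100.0, -105.0))
print(diagnose_ml_failure(-105.0, -100.0))
```

Both scores must be computed under the same model and parameters for the comparison to be valid.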
With TreeBeST, the input species nodes must all be named, and fully sequenced genomes are indicated with * to take loss into account
In general all methods can use at least binary rooted trees.
- parsimony
- maximum likelihood
- Bayesian methods.
Tree estimation can be an outcome of the algorithm, or gene trees can be supplied as input.
Need to consider if they can consider the following for Species Trees/Gene Trees:
- Multifurcated trees
- Rooted or unrooted trees
- Branch lengths
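When preparing test inputs, these three properties can be checked directly on a Newick string. The following is a minimal sketch, assuming well-formed Newick without quoted labels or bracketed comments; `describe_newick` is a hypothetical helper and not part of any program in the table. Treating a root with exactly two children as "rooted" is a common convention, taken here as an assumption.

```python
def describe_newick(newick):
    """Summarize root degree, multifurcations, and branch lengths of a Newick string."""
    text = newick.strip().rstrip(";")
    child_counts = []  # children per internal node, in the order nodes close
    stack = []         # running child count for each currently open node
    for ch in text:
        if ch == "(":
            stack.append(1)
        elif ch == "," and stack:
            stack[-1] += 1
        elif ch == ")":
            child_counts.append(stack.pop())
    # The last node to close is the root.
    root_degree = child_counts[-1] if child_counts else 0
    inner = child_counts[:-1]
    return {
        "root_degree": root_degree,
        "looks_rooted": root_degree == 2,
        "multifurcating": any(count > 2 for count in inner),
        "has_branch_lengths": ":" in text,
    }


binary_tree = describe_newick("((A:0.1,B:0.2):0.3,C:0.4);")
polytomy_tree = describe_newick("(A,B,(C,D,E));")
print(binary_tree)
print(polytomy_tree)
```

A trifurcating root is reported through `root_degree` rather than `multifurcating`, since many programs read it as an unrooted binary tree.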
DrML
It is unclear if gene tree edges are considered in the reconciliation process.
Treebest now on github
https://github.com/lh3/repositories
It would be nice if loss cost could be set to be conditional on the number of extant genes in a taxon. ….
Also see similar database efforts available at
http://www.lirmm.fr/phylariane/resources.php
GIGA 1.0 works on Linux on the rcluster, but GIGA 1.1 has a problem with a glib library dependency
Bibliography
1. Meier-Kolthoff JP, Auch AF, Huson DH, Göker M: COPYCAT: cophylogenetic analysis tool. Bioinformatics 2007, 23:898-900.
2. Thomas PD: GIGA: a simple, efficient algorithm for gene tree inference in the genomic age. BMC Bioinformatics 2010, 11:312.
3. Chaudhary R, Bansal MS, Wehe A, Fernández-Baca D, Eulenstein O: iGTP: a software package for large-scale gene tree parsimony analysis. BMC Bioinformatics 2010, 11:574.
4. Conow C, Fielder D, Ovadia Y, Libeskind-Hadas R: Jane: a new tool for the cophylogeny reconstruction problem. Algorithms for Molecular Biology 2010, 5:16.
5. Doyon J-P, Scornavacca C, Gorbunov K, et al.: An Efficient Algorithm for Gene/Species Trees Parsimonious Reconciliation with Losses, Duplications and Transfers. In Comparative Genomics, Lecture Notes in Computer Science. Edited by Tannier E. Berlin, Heidelberg: Springer Berlin Heidelberg; 2011, 6398:93-108.
6. Dufayard J-F, Duret L, Penel S, et al.: Tree pattern matching in phylogenetic trees: automatic search for orthologs or paralogs in homologous gene sequence databases. Bioinformatics 2005, 21:2596-603.
7. Berglund-Sonnhammer A-C, Steffansson P, Betts MJ, Liberles DA: Optimal gene trees from sequences and species trees using a soft interpretation of parsimony. Journal of Molecular Evolution 2006, 63:240-50.