Table X. Software for strict gene tree reconciliation and gene tree estimation that is guided by reconciliation to a species tree. Some general cophylogeny estimation as well?

- Duplication cost - Speciation cost - Sorting cost - HGT cost (values or probs) - Number of cycles (cost values CAN be determined automatically) ?? Can values be interpreted as RATES or rate-dependent parameters??
- Duplication rate (or lambda distribution) - MLE algorithm - Exhaustive ML (BOOL) - Alg for hard instances - Alg used to root the species tree ( - Search heuristic (randomized hill climbing, partial queue based, queue based)
C++ (Open) MLE CLI
- Species tree - Gene tree labeled with species names (not loci) (Newick)
Parsimony
- Node labeled gene tree (DLS) - Rooted species tree - Gene trees (Newick/Nexus)
Available compiled for Linux, Mac OS and others
CmdLine Python Parsimony Strict Reconcil. Java (Open)
Designed for inferring ancestral genes
Parsimony Distance Based?? Dollo Parsimony
- Species tree - Gene tree
Must code by hand, no GUI. Used by __ for reconciliation.
- Species tree (Newick) - Gene sequences (Fasta) - List of orthologs and paralogs - BLAST parameters
Y Y Y Y N N? Y Y N? Y Y Y Y N N N N N Y N N Y Y Y Y Y N N N Y Y N N N
Branch Length
- Visualization of the cophylogeny (exportable as pdf through printing and svg) - Cophylogeny as tab delimited “nexus” file
Save as .. not working on macbook but save-all produces a nexus file
- Cost of events
Gene Tree Non-binary
DupTree 1.48 Also now “part of” iGTP
ETE http://ete.cgenomics.org/ Not necessary to eval code but mention library, this lib is used by TreeKO
EvolMap ^ * v 1.0 http://kosikweb.mcdb.ucsb.edu/evolmap/index.htm Focus on estimating
- Species tree - Gene tree - Tip map (Single Nexus File)
Species Tree Unrooted
Python (Open)
http://genome.cs.iastate.edu/CBL/DupTree
Dynamic Programming
User Set Parameters
IT IS UNCLEAR IF SORTING IS ‘LOSS’ OR ILS??
DrML * v 0.91.05 Infers rooted species trees from gene tree data
Parsimony (DP, Parameter Adaptive)
Output Branch Length
Java & EclipseRCP (Closed)
Input Non-binary
[1] CoRe-PA http://pacosy.informatik.uni-leipzig.de/49-1CoRe-PA.html
Runtime (Best case/Worst case)
Unrooted
GUI CLI
Feasible Data Set Sizes, i.e. how many species or genes are feasible with this approach given runtime and memory requirements
Sequence Evolution
Installed as downloaded but does not do anything when launched. More of a tree stats paper
Algorithm or Heuristic
Uses Lineage Sorting
CopyCat 1.14 http://ab.inf.uni-tuebingen.de/software/copycat/welcome.html
Method Horizontal Transfer Interface Gene Loss Source Code Gene Duplication Program
Y N N
ancestral genome content
FORESTER * (SDI/RSDI) Java, Ruby (LGPL v.3) Parsimony
GIGA ^ * v 1.0 v 1.1
[2] GTP Discussed at BioPerl with link to http://ginger.ucdavis.edu/gtp/gtp.html that does not appear to have information
C (Closed) Distance Based Agglomerative Clustering
GIGA is the name of the algorithm
- Species tree - Gene tree (phyloxml) - Species tree (Newick) - Gene alignments (Fasta) - Node labeled gene tree (phyloxml?) - Node labeled gene tree (NHX) - Algorithm (GSDI or SDI) - Rooting function - None available
Y N ? Y Y N N N Y? N N? Y Y N? N
Parsimony
Citation as Sanderson, M.J. and McMahon, M.M. (2007) Inferring angiosperm phylogeny from EST data with widespread gene duplication. BMC Evol. Biol., 7 (Suppl. 1), S
iGTP http://www.biomedcentral.com/14712105/11/574
[3] Jane v 3.0 cophylogeny http://www.cs.hmc.edu/~hadas/jane/register.py
Parsimony
Considers duplication, duplication-loss and deep coalescence reconciliation costs
Java (Free BSD & Apache 2.0)
Parsimony by Genetic Algorithm Heuristic
- Species tree “host” - Gene tree “parasite” - Guest2Host Tip Map - Spatial proximity?
- Time zones (optional) (Single Nexus File, tab delim tree format)
- CLI produces mapping text file to STDOUT - Timing file produced with -o option, this is for visualizing in Jane at a later time
- Number of generations (constrained: 1-500) - Population size (constrained: 1-1,500) - Birth cost - Duplication cost - HGT cost - Node based cost model (Boolean) - Failure to diverge cost
Y Y Y N
C++ (Closed) Binary only available for MacOS 10.4+, Linux x86(64)
Parsimony
- Dated species tree (Newick w length or bootstrap values) - Dated gene tree (Newick w branch length or bootstrap values)
- Mapping of gene tree onto species tree - Details of the cost
- Duplication cost - Loss cost - Horizontal transfer cost - Map root within species tree (Boolean)
Y Y Y N
Java (Closed)
Parsimony on
- Species tree - Gene tree
- Node labeled gene tree (Newick/NHX/Notung)
- Duplication cost - Loss cost
Y Y N
usage jane-cli.sh
[4] Mowgli (MPR) http://www.atgc-montpellier.fr/MPR/ http://www.atgc-montpellier.fr/Mowgli/
[5] Notung * N N Y N Y ** N Y N Y Y ** *** v 2.6
Networks (Newick/NHX/Notung) - Gene alignments?
Phylodog (unpublished) Installation in Progress
PhyloNet 2.4 http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2533029/ http://bioinfo.cs.rice.edu/phylonet/index.html
PrIME-GSR * v 1.0
C++? with BoostMPI ? MLE
Java (Gnu GPL) Does not do GTR but does MAST and HGT
C++, Perl (Open) Bayesian
RAP * v. http://pbil.univ-lyon1.fr/software/RAP/RAP.htm Not command line, should probably use Rap-Green
Java (open)
[6] Rap-Green 1.0 http://code.google.com/p/rapgreen/ Tagged Rap
This appears to be the current development version of RAP. This does have command line arguments
Softparsmap * v 1.02 [7] Roots to minimize the number of gene duplications and gene loss events.
SPIMAP ^ * * v. 1.1 Need to point to local python library in Home dir
SYNERGY ^
includes parameters used - CAN REARRANGE TO MINIMIZE COST TOO
- Rooted species tree - Node labeled gene tree - Reconciliation map?
- Species tree (Newick/PRIME) - Gene alignments (Fasta) - Gene to species Map
- Node labeled gene tree (PRIME) - Reconciliation map (PRIME)
- Gene tree - Unresolved species tree
- Node labeled gene tree? (Newick modified)
Java (GNU-GPL)
- Conditional duplication cost - Edge weight threshold
- Unknown
- Substitution model - Edge rate model - Starting gene tree - Duplication rate - Loss rate - Number of iterations - Branch swapping (Y/N) - Thinning value <INT>
Y N N Y Y Y N Y N N Y Y N N Y Y
- Node labeled gene tree (phyloxml format)
Y
http://code.google.com/p/rapgreen/wiki/CommandLineDocumentation
Java (open) Parsimony
Soft Parsimony
C++, Python (GPL v.2)
Algorithm Bayesian
- Sequence information (Darwin XML) - Gene trees (Newick w bootstrap values) - Species tree (Genbank files as Nodes.dmp and names.dmp) - Property file
- Species tree (Newick) - Gene alignments (Fasta) - Gene to species Map
Y
- Node labels () - Reconciliation map (recon.tab.txt)
- HKY parameters - Duplication rate - Loss rate - Number of iterations
Y Y Y Y Y
Tarzan v 0.9 Cophylogeny http://pacosy.informatik.uni-leipzig.de/51-0Tarzan.html
TreeBeST * v.1.9.2 (unpublished)
TreeFitter http://sourceforge.net/projects/treefitter/ but nothing there. Also nothing at http://www.ebc.uu.se/systzoo/research/treefitter/treefitter.html
TreeKO *
that lacks transferable software
Java (Closed)
C++, Java, Perl (GPL v.2)
GUI Parsimony MLE
- Visualization of minimal cost trees, no apparent export
- Duplication cost - Loss cost - Speciation cost - Sorting cost - HGT cost
Y Y
- Node labeled gene tree (PRIME) - Reconciliation map (PRIME)
Y Y Y Y Y Y? Y Y Y Y? Y?
Parsimony Python Parsimony
- Species tree - Gene tree (Newick)
TreeKO Output text file
- Counts of duplications and losses
Java (MPL) Parsimony
- Gene tree - Species tree - Host to Guest map (Nexus format referred to as Tanglegram)
- Jungle of solutions described as graphs (Graphviz dot format) - Mapping visualization (pdf format)
http://treeko.cgenomics.org/doku.php?id=start Dependent on ETE. Requires Python > 2.5
TreeMap 3.0 http://sydney.edu.au/engineering/it/~mcharles/ http://sydney.edu.au/engineering/it/~mcharles/software/treemap/TreeMap3.0b.zip But newest is at Google https://sites.google.com/site/cophylogeny/software
- Species tree with divergence times - Gene tree with divergence times (tab delimited tree format similar to that used by Jane)
Example input
- Species tree (Newick) - Gene alignments (FASTA)
Finds the set of Pareto optimal solutions
N N N
- No obvious way to set the costs in the GUI or CLI.
- The CLI does not really seem to work on the RCLUSTER. The current development version of the program allows some command line options but only produces unparsable ASCII trees and PDFs of the maps

+ SPIMAP substitution rates are generated by the program spimap-train-rates, which comes with the SPIMAP program
^ journal article
* hyperlinks to software page and source code
** With Notung, either the species tree or the gene tree must be binary when the other is multifurcated
*** Notung with rearrange mode requires gene trees with edge weights representing bootstrap values, edge lengths, etc.

A reconciliation map is not an explicit output in the NOTUNG format output file. The species tree is present and the gene tree is present, but an explicit mapping between the two is not given. It seems, however, that the location of duplication nodes on the species tree can be inferred by fetching the subset of taxon names/leaf nodes that are children of the duplication node.
This should define a unique set from which we can identify the edge on which the duplication event took place.

The following look like an evolution of the same basic code base by the same developers: GTP -> DupTree -> iGTP.

GIGA generates gene trees using a method somewhat similar to UPGMA; however, it takes evolutionary scenarios (i.e. duplication/loss) into account when joining clades.

SPIMAP includes a training step that determines species-specific evolutionary parameters.

Does strict reconciliation always refer to placement of the nodes without modifying the topology of the gene tree??? A strict reconciliation takes a gene tree

Another program to consider for TEs is Jane (http://www.cs.hmc.edu/~hadas/jane/), which is specifically designed for host/parasite cophylogenies. It includes the ability to handle ‘host switches’, which can be construed in general terms as horizontal transfer.

What is the relationship between the Mowgli program and Phylodog??

TreeMap takes over two days and exceeds memory for trees with > 25 tips. The analysis -> event costs option does not do anything.

TreeBeST may take unrooted or binary but not both??

Mowgli would possibly be a useful tool for the discovery of horizontal transfer events in transposable element host/parasite studies.

NOTES: In addition to comparing trees, we may want to calculate the logML of the archetype tree reconciliation under multiple MCL criteria. This will help distinguish between a failure of the algorithm to find the ML and a failure of the ML model itself. For example, if the ML observed (MLobs) for the archetype tree is greater than the MLobs of the tree determined by the algorithm, we can fault the algorithm for not properly finding the ML. This is likely to be helpful for algorithms that use a heuristic rather than an exhaustive search to find the ML. If the MLobs of the determined tree is greater than the MLobs of the archetype tree, then we may fault the ML model itself; i.e. even a fully exhaustive search would find the incorrect reconciliation.
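The comparison described in the NOTES above can be sketched as a small decision rule. This is only an illustrative sketch: the function name `diagnose_ml_failure`, its arguments, and the tolerance `tol` are hypothetical and do not belong to any of the programs in the table. It assumes both log-likelihoods were computed under the same model and parameters, and that we already know whether the program recovered the archetype reconciliation.

```python
def diagnose_ml_failure(loglik_archetype, loglik_inferred,
                        same_reconciliation, tol=1e-6):
    """Classify the outcome of an ML reconciliation run (hypothetical helper).

    loglik_archetype    -- log-likelihood of the known (archetype) reconciliation
    loglik_inferred     -- log-likelihood of the reconciliation the program returned
    same_reconciliation -- True if the program recovered the archetype
    tol                 -- differences smaller than this are treated as a tie
    """
    if same_reconciliation:
        # Nothing to fault: the program found the archetype.
        return "correct"
    if loglik_archetype > loglik_inferred + tol:
        # The archetype scores better under the model, so the search
        # (e.g. a heuristic) failed to find the ML solution.
        return "search failure"
    if loglik_inferred > loglik_archetype + tol:
        # A wrong reconciliation scores better than the archetype, so even
        # an exhaustive search would return the wrong answer.
        return "model failure"
    # Likelihoods are indistinguishable; the model cannot separate the two.
    return "tie"


if __name__ == "__main__":
    # Made-up log-likelihood values, for illustration only.
    print(diagnose_ml_failure(-100.0, -105.0, same_reconciliation=False))
    print(diagnose_ml_failure(-105.0, -100.0, same_reconciliation=False))
```

Running this rule over many simulated archetype trees would give, per program, a count of search failures versus model failures, which is the distinction the notes are after.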
With TreeBeST, the input species nodes must all be named, and fully sequenced genomes are indicated with * to take loss into account.

In general, all methods can use at least binary rooted trees.
- parsimony
- maximum likelihood
- Bayesian methods

Tree estimation can be an outcome of the algorithm, or gene trees can be supplied as input.

Need to consider whether the programs can handle the following for species trees/gene trees:
- Multifurcated trees
- Rooted or unrooted trees
- Branch lengths

DrML: it is unclear whether gene tree edges are considered in the reconciliation process.

TreeBeST is now on GitHub: https://github.com/lh3/repositories

It would be nice if loss cost could be set to be conditional on the number of extant genes in a taxon. ….

Also see similar database efforts available at http://www.lirmm.fr/phylariane/resources.php

GIGA 1.0 works on Linux on the rcluster, but GIGA 1.1 has a problem with a glib library dependency.

Bibliography
1. Meier-Kolthoff JP, Auch AF, Huson DH, Göker M: COPYCAT: cophylogenetic analysis tool. Bioinformatics 2007, 23:898-900.
2. Thomas PD: GIGA: a simple, efficient algorithm for gene tree inference in the genomic age. BMC Bioinformatics 2010, 11:312.
3. Chaudhary R, Bansal MS, Wehe A, Fernández-Baca D, Eulenstein O: iGTP: a software package for large-scale gene tree parsimony analysis. BMC Bioinformatics 2010, 11:574.
4. Conow C, Fielder D, Ovadia Y, Libeskind-Hadas R: Jane: a new tool for the cophylogeny reconstruction problem. Algorithms for Molecular Biology 2010, 5:16.
5. Doyon J-P, Scornavacca C, Gorbunov K, et al.: An Efficient Algorithm for Gene/Species Trees Parsimonious Reconciliation with Losses, Duplications and Transfers. In Comparative Genomics, Lecture Notes in Computer Science. Edited by Tannier E. Berlin, Heidelberg: Springer Berlin Heidelberg; 2011, 6398:93-108.
6. Dufayard J-F, Duret L, Penel S, et al.: Tree pattern matching in phylogenetic trees: automatic search for orthologs or paralogs in homologous gene sequence databases. Bioinformatics 2005, 21:2596-603.
7. Berglund-Sonnhammer A-C, Steffansson P, Betts MJ, Liberles DA: Optimal gene trees from sequences and species trees using a soft interpretation of parsimony. Journal of Molecular Evolution 2006, 63:240-50.