BALANCED MINIMUM EVOLUTION DISTANCE BASED PHYLOGENETIC RECONSTRUCTION 1. Compute distance matrix D. 2. Find binary tree using just D. Balanced Minimum Evolution (BME) is a distance based method to go from a distance matrix to a phylogenetic tree. MINIMUM EVOLUTION PHYLOGENETIC RECONSTRUCTION Tree topology being considered. Assign branch lengths using ME. Fixed distance matrix. Sum up branch lengths (ex. 36) Goal: Find tree topology T with smallest sum of branch lengths (assigned by ME). That is, find smallest sum of branch lengths for all (2n-5)!! binary tree topologies! MINIMUM EVOLUTION PHYLOGENETIC RECONSTRUCTION • Given the matrix of pairwise evolutionary distances, the ME approach estimates the length of any given tree topology and then selects the tree topology with shortest length. • Minimum evolution is conceptually close to characterbased parsimony. • Complies with Occam’s principle of scientific inference, which essentially maintains that simpler explanations are preferable to more complicated ones and that ad hoc explanations should be avoided. • Numerous variants of the ME principle exist, depending on how the branch lengths are estimated and how the tree length is calculated from these branch lengths. MINIMUM EVOLUTION PHYLOGENETIC RECONSTRUCTION Tree topology being considered. Assign branch lengths using ME. Fixed distance matrix. Sum up branch lengths (ex. 36) How do we assign branch lengths to a tree topology??? LEAST SQUARES ESTIMATE (HOW TO ASSIGN BRANCH LENGTHS TO A TREE TOPOLOGY) Least Squares Observe red data points. Find blue quadratic which minimizes sum of the squared distances from the red points to the blue quadratic. ME analogy for least squares on trees Red dots Blue quadratic Residual/Error Estimated distances (D) Binary tree Sum of branch lengths MINIMUM EVOLUTION PHYLOGENETIC RECONSTRUCTION Tree topology being considered. Assign branch lengths using least squares. Fixed distance matrix. Sum up branch lengths (ex. 36) Goal: Find tree topology T with smallest sum of branch lengths (assigned by ME). That is, find smallest sum of branch lengths for all (2n-5)!! binary tree topologies! LEAST SQUARES ASSIGNMENT OF BRANCH LENGTHS • If distance estimates are independent with the same variance, use ordinary least squares (OLS). • If distance estimates are independent with different variance, use weighted least squares (WLS). (This is BME!) • Well known that distance estimates obtained from sequences do not have the same variance, because the largest distances are much more variable than the shortest ones (Fitch and Margoliash, 1967) and are mutually dependent when they share a common history (or path) in the true phylogeny (Nei and Jin, 1989). • Thus ordinary least-squares poorly fits the features of evolutionary distance data. BALANCED MINIMUM EVOLUTION • In BME, sibling subtrees have equal weight, as opposed to the standard unweighted OLS, where all taxa have the same weight and thus the weight of a subtree is equal to the number of its taxa. • BME is consistent! • BME is NP-Hard [W. Day (87)]. • BME outperforms Neighbor Joining, BIONJ, WEIGHBOR and FITCH [Desper, Gascuel 2002]. • Software (and web version) FastME is a heuristic which finds the BME solution. Uses NNI and SPR moves. WHY IS IT CALLED “BALANCED”? = distance estimate. or is the balanced distance between taxa in A and B in tree T. If B is composed to two subtrees B1 and B2: PAUPLIN’S FORMULA (SHORTCUT FOR BME!) D is the distance matrix. T is the tree topology considered. is the sum of branch lengths assigned by BME. BME VERSION 2.0 (PAUPLIN’S FORMULA) Instead of assigning branch lengths to tree topology T using weighted least squares then summing edge lengths, cut to the chase and use Pauplin’s formula! Given distance matrix D, find binary tree T with the smallest sum of total branch lengths: EXERCISE Which tree is the BME optimal? Why? FASTME ON THE WEB http://www.atgc-montpellier.fr/fastme/ • Submit distance matrix in Phylip format. • Initial tree: OLS_GME, balanced_GME, NJ or BIONJ. • Finds optimal tree using moves: OLS_NNI or balanced_NNI. • Enter email and wait for results! • Self-contained executable available. COMPUTATIONAL EXAMPLE Download sequence at: http://dl.dropbox.com/u/623333/BME%20Example/GeneSeq8taxa .nex Calculate distance matrix (use HKY): http://bioweb2.pasteur.fr/phylogeny/intro-en.html Compute BME tree: http://www.atgc-montpellier.fr/fastme/ REFERENCES "Fast and accurate phylogeny reconstruction algorithms based on the minimum-evolution principle.” Desper R., Gascuel O., Journal of Computational Biology. 2002 9(5):687705. "Theoretical foundation of the balanced minimum evolution method of phylogenetic inference and its relationship to weighted least-squares tree fitting.” Desper R., Gascuel O., Molecular Biology and Evolution. 2004 21(3):587-598. "Getting a Tree Fast: Neighbor Joining, FastME, and Distance-Based Methods." Desper R., Gascuel O., Current Protocols in Bioinformatics. 2006 6.3.1-6.3.28. Edited by John Wiley & Sons "Neighbor-Joining Revealed." Gascuel O., Steel M., Molecular Biology and Evolution. 2006 23(11):1997-2000.