Claude dePamphilis
Lecture 1, Phylogenetics
Biol 597, Fall 2005

Why tree building?
We can learn a lot from trees. We are almost never interested in the tree itself as the end product, but in what we can learn from it. Trees:
- improve sequence alignment
- provide a platform for comparative biology
- trace the history of genomes, genes, and organisms
A tree is a way of organizing information that makes it possible to draw inferences or make predictions.

The course is roughly divided into five "units":
- Concepts of trees: viewing trees, interpreting trees, the difference between gene trees and species trees, and other basic concepts.
- Methods of tree building: begin with a multiple sequence alignment, then apply one of several tree-building approaches:
  o Parsimony
  o Distance models for sequence evolution; inference of constraint and adaptive evolution
  o Likelihood methods as a general tool for hypothesis testing
  o Confidence in trees
- Inferences from trees: gene family analysis, constraint in protein-coding regions, inference of ancestors.
- Software.
- Specialty topics: molecular clocks, Bayesian inference.

Below are some of the topics you should especially focus on; one or a few goals or questions (marked *) follow each section.

1. Concepts of trees and inferences based on trees
- Trees are graphical hypotheses of evolutionary history (and shared ancestry).
  o [Figure: E. coli, worm, and human drawn as a series of increasing complexity.] Problem: this treats contemporary organisms as if they were ancestors of one another.
  o [Figure: E. coli, worm, and human drawn on a tree, showing their shared ancestry.]
  o Branch points = nodes = ancestors.
  o End points = leaves = terminals = tips = present-day organisms or sequences.
- Views of trees
  o Any rotation about a node gives the same tree.
  o The topology defines the tree.
- Rooted trees and unrooted trees
  o Significance: rooted trees imply the ORDER of events.
- Trees can show branching order (cladogram) or degree of divergence (phylogram):
  o Branching order only (topology): cladogram.
  o Degree of divergence (branch lengths): phylogram.
  o Example: fish, chicken, mouse, and human drawn as a rooted phylogram.
- Species trees [ORTHOLOGOUS SEQUENCES]
  o Each leaf is a different species.
  o The data may be a single gene or many genes combined.
  o Usually gene or protein sequences, which must be aligned (multiple sequence alignment, MSA). See the Clustal paper.
  o A phylogenetic tree reveals species relationships. Example: the tree of life.
  o Monophyly, paraphyly, polyphyly (draw a tree):
    - Monophyletic: a common ancestor and all of its descendants. Examples: mammals, flowering plants.
    - Paraphyletic: a common ancestor, but not all of its descendants. Examples: prokaryotes; dinosaurs (excluding birds); "seed plant".
    - Polyphyletic: the common ancestor is not in the group. Example: organisms with color vision.
- Common ancestry: ancestral organisms
  o We will be able to infer features of these ancestral organisms.
  o If the tree is rooted, we can infer the order of events: who came first, the chicken or the egg?
  o Example: [Figure: rooted tree whose leaves are prokaryotes (pro, pro, pro, pro) and eukaryotes (euk, euk, euk).]
- Gene trees I
  o Contain different genes from one or more organisms. Example: globin genes, where each organism has two copies. [Figure: gene tree with leaves alpha, alpha, alpha, alpha, beta.]
  o Trace the history of gene duplications:
    - Paralogs: genes whose common ancestor is a gene duplication event.
    - Orthologs: genes whose common ancestor is a speciation event.
    - A gene family within one organism consists entirely of paralogs.
  o Common ancestry: ancestral genes. We can make predictions about these ancestral genes. Example: if motif A is present in each gene, the ancestor had it.
- Common case: a combination of gene and species trees, with multiple genes from multiple species.
  o Example: [Figure: drawn example of a gene tree and a species tree; e.g., the chicken or the egg?]
- HOMOPLASY: convergence, parallelism, reversal.
- Inference of ancestral states using ACCTRAN.

* Be able to "read" a phylogenetic tree and draw correct inferences about the monophyly of groups of organisms or sequences.
* Given a tree and a set of data for a given character, be able to infer the ancestral states of the character using the ACCTRAN method.

2. Methods of building phylogenetic trees
- Input for tree building: a multiple sequence alignment. Brief overview of Clustal.
- Parsimony, distance, and likelihood compared and contrasted.
- The basic approaches: similarities and differences.
- Standard (nonparametric) bootstrap in phylogenies: use and interpretation.
- Strengths and weaknesses of each of the major methods.
- PHYLIP as an introduction to computer programs for phylogeny.
* Be able to perform and interpret a small parsimony analysis by hand, as we did in class, or with PHYLIP using any of the main approaches, including the bootstrap.

3. Distance models of sequence evolution
* Contrast the different distance models for nucleotide (or protein) sequence evolution. What are some advantages and disadvantages of each?

4. Maximum likelihood as a general tool for hypothesis testing
* What is the likelihood ratio test, and how is it used to test a wide variety of hypotheses about sequence evolution, such as rates of evolution, monophyly of groups or sequences, similarity of the branching history of two trees, etc.?
* Be able to outline or diagram the goals and basic steps of a parametric bootstrap analysis, and its use in hypothesis testing in sequence studies.

5. Further concepts and their application
- Gene families II: reconciled gene trees.
- Long branch attraction: conditions and causes.
* The example given in class of phylogenetic analyses of invertebrate animals was a good example of a dataset where different methods gave different results, yet exploring those differences led to a better understanding of the history of the sequences. What were some "take-home lessons" from this example?
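For the section 2 goal of performing a small parsimony analysis by hand, the following is a minimal Python sketch of Fitch's small-parsimony count on a fixed rooted binary tree. It is not ACCTRAN: it only computes each node's downpass state set and the minimum number of changes. ACCTRAN is a separate rule for choosing among equally parsimonious ancestral reconstructions (favoring changes close to the root) and is not implemented here. The nested-tuple tree encoding, the function names `fitch` and `tree_length`, and the four-taxon alignment are hypothetical, chosen only for illustration.

```python
from typing import Dict, FrozenSet, Tuple, Union

# Hypothetical encoding: a leaf is a taxon name, an internal node is a pair (left, right).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, column: Dict[str, str]) -> Tuple[FrozenSet[str], int]:
    """Fitch downpass for one site: (state set at this node, minimum changes in this subtree)."""
    if isinstance(tree, str):
        return frozenset({column[tree]}), 0
    left, right = tree
    left_set, left_cost = fitch(left, column)
    right_set, right_cost = fitch(right, column)
    shared = left_set & right_set
    if shared:
        # The two children can agree on a state: no extra change needed here.
        return shared, left_cost + right_cost
    # No shared state: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree: Tree, alignment: Dict[str, str]) -> int:
    """Parsimony length: Fitch changes summed over all alignment columns."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        column = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += fitch(tree, column)[1]
    return total


# Hypothetical four-taxon alignment (made-up data).
alignment = {"A": "ACGTA", "B": "ACGTT", "C": "TTGCA", "D": "TTGCT"}
topologies = {
    "((A,B),(C,D))": (("A", "B"), ("C", "D")),
    "((A,C),(B,D))": (("A", "C"), ("B", "D")),
    "((A,D),(B,C))": (("A", "D"), ("B", "C")),
}
for label, tree in topologies.items():
    print(label, tree_length(tree, alignment))
```

With these made-up data the ((A,B),(C,D)) grouping is shortest (5 steps, versus 7 and 8). Because the Fitch length does not depend on where the root is placed, comparing these three rooted encodings amounts to comparing the three possible unrooted four-taxon topologies.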
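For section 3, the Jukes-Cantor (JC69) model is one of the simplest distance models to contrast with others. It assumes equal base frequencies and equal rates for all substitutions, and corrects the observed proportion of differing sites p for multiple hits with d = -(3/4) ln(1 - 4p/3). The Python sketch below is a minimal illustration; skipping gap sites and the function names `p_distance` and `jc69_distance` are assumptions, not part of the course material.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Observed proportion of differing sites, skipping any site with a gap ('-')."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no gap-free sites to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def jc69_distance(seq1: str, seq2: str) -> float:
    """Jukes-Cantor corrected distance (expected substitutions per site)."""
    p = p_distance(seq1, seq2)
    if p >= 0.75:
        # The log argument would be <= 0: the sequences look saturated under JC69.
        raise ValueError("p >= 0.75: JC69 distance is undefined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Made-up sequences that differ at 2 of 10 sites.
s1, s2 = "ACGTACGTAC", "ACGTTCGTAA"
print(p_distance(s1, s2), round(jc69_distance(s1, s2), 4))
```

The corrected distance (about 0.233) exceeds the raw p of 0.2 because JC69 assumes some sites changed more than once. That correction, and its breakdown as p approaches 0.75, are among the trade-offs to weigh when contrasting JC69 with richer models.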
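For section 4, the likelihood ratio test compares two nested models fitted to the same data. The statistic is 2(ln L1 - ln L0), which under the usual regularity conditions is approximately chi-square distributed, with degrees of freedom equal to the difference in the number of free parameters. The sketch below computes the statistic and a chi-square tail probability using only Python's standard library. The log-likelihood values are made up (in practice they come from a likelihood program), and the function names are hypothetical. As a worked case it uses the molecular clock test: for n sequences, the clock-constrained tree has n - 2 fewer free branch-length parameters than the unconstrained tree.

```python
import math


def lrt_statistic(lnL_null: float, lnL_alt: float) -> float:
    """Likelihood ratio statistic for nested models: 2 * (lnL_alt - lnL_null)."""
    return 2.0 * (lnL_alt - lnL_null)


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0.0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: closed-form sum exp(-y) * sum_{k < df/2} y^k / k!
        return math.exp(-y) * sum(y ** k / math.factorial(k) for k in range(df // 2))
    # Odd df: erfc term plus half-integer gamma terms.
    tail = sum(y ** (k - 0.5) / math.gamma(k + 0.5) for k in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * tail


# Hypothetical clock test on n = 5 sequences, so df = n - 2 = 3.
lnL_clock, lnL_no_clock = -2315.7, -2310.9
stat = lrt_statistic(lnL_clock, lnL_no_clock)
p_value = chi2_sf(stat, df=5 - 2)
print(f"LRT = {stat:.2f}, p = {p_value:.4f}")
```

Here the statistic (9.6) exceeds the 5% critical value for 3 degrees of freedom (about 7.81), so these made-up numbers would reject the clock. When the chi-square approximation is doubtful, for example when comparing tree topologies, a parametric bootstrap can instead simulate the distribution of the statistic under the null model.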