The aim in building a phylogenetic tree is to use knowledge of the characters of organisms to build a tree that reflects the relationships between them. Organisms with many characters in common are more likely to be related than those with few in common. We want to use characters that are homologous [shared because of common ancestry] rather than analogous [independently evolved].

But how is this to be done? It turns out that there are many approaches, the first of which is to apply parsimony. The basic idea of parsimony in tree building is to build a tree that requires the fewest evolutionary changes. In the following trees one species differs from the other three. In each tree a single evolutionary change is all that is required to build it. Similarly, we can [in the next slide] analyze a situation where two non-sister taxa (3 and 4) share a trait. There are two equally parsimonious explanations in this case. The same logic applies when dealing with multiple traits (three traits, each with two states, in the next example). Each trait is treated separately and the most parsimonious explanation is calculated. When the data are pooled, a total of five changes are present on the tree. It turns out that the tree we just dealt with is not the most parsimonious tree. It is possible to build a tree that has only three changes [it is impossible to have fewer than three changes].

In the previous example it was easy to see the minimum number of changes needed to make a most parsimonious tree. For larger trees this is not so simple to do. The Fitch algorithm can be used to figure out the minimum number of changes necessary for a given tree. The Fitch algorithm begins at the branch tips of a tree and proceeds towards the base of the tree. A running count is kept of the number of character changes needed. As we proceed down the tree each internal node is assigned one or more character states. Two rules are used to assign character states at nodes.

Rule 1.
If the two daughters of a node share no states in common, we assign to the node all possible states for both daughters. In other words, the set of possible states at the node is the union of the sets of possible states for daughters 1 and 2. In this case we increase the tally of character changes by one.

Rule 2. If the daughters of a node share one or more possible states of a trait, then we assign the shared states to the node. In other words, we assign the intersection of the sets of possible states for each daughter to the node. In this case we do not increase the tally of character changes.

The Fitch algorithm just tells us the minimum number of changes needed for a given tree. It does not tell us if a different tree would have fewer. In order to compare different trees to find the most parsimonious, we would have to repeat the Fitch process for all the trees.

Another approach to building phylogenetic trees is to use distance methods. In this approach pairwise distances (where distance is a measure of morphological or genetic differences between species) are calculated and used in tree construction. Distances can be:
› Counts of the number of character differences between species.
› Based on morphological measurements.
› In living species, most commonly counts of base pair differences in DNA sequences, or of differences in the amino acids they code for.

Because insertion/deletion mutations occur and can shift the reading frame of a length of DNA, sequences sometimes need to be aligned before using them to build a phylogenetic tree. Once distance measures have been calculated, the pairwise measures (differences between individual pairs of species) are arranged into a distance matrix. Once the distance measures are tabulated, we need to figure out how to arrange these data on a tree and decide how long to make the branches. For four species there is only one basic tree shape and only three pairwise species arrangements.
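As a rough illustration, the sketch below applies the two Fitch rules and the simplest distance measure (a count of character differences) to the same small character matrix. The species names, the trait values, the nested-tuple way of writing a tree, and the function names are all made up for this example; they are not the data from the slides. It is a minimal Python sketch under those assumptions, not a full phylogenetics program.

```python
from itertools import combinations

# Hypothetical data: four species, three two-state traits (one character per trait).
matrix = {"A": "110", "B": "110", "C": "000", "D": "001"}

# Hypothetical tree encoding: a tip is a species name, an internal node is a pair.
tree_1 = (("A", "C"), ("B", "D"))
tree_2 = (("A", "B"), ("C", "D"))


def fitch(tree, tip_states):
    """Return (set of possible states at this node, tally of changes below it)."""
    if isinstance(tree, str):                  # a tip: its state is observed
        return {tip_states[tree]}, 0
    left_set, left_tally = fitch(tree[0], tip_states)
    right_set, right_tally = fitch(tree[1], tip_states)
    tally = left_tally + right_tally
    shared = left_set & right_set
    if shared:                                 # Rule 2: intersection, tally unchanged
        return shared, tally
    return left_set | right_set, tally + 1     # Rule 1: union, tally goes up by one


def parsimony_score(tree, matrix):
    """Treat each trait separately, then pool the changes over all traits."""
    n_traits = len(next(iter(matrix.values())))
    total = 0
    for i in range(n_traits):
        tip_states = {species: row[i] for species, row in matrix.items()}
        total += fitch(tree, tip_states)[1]
    return total


def distance_matrix(matrix):
    """Pairwise distances as counts of character differences between species."""
    return {
        (a, b): sum(x != y for x, y in zip(matrix[a], matrix[b]))
        for a, b in combinations(sorted(matrix), 2)
    }


distances = distance_matrix(matrix)
print(parsimony_score(tree_1, matrix), parsimony_score(tree_2, matrix))
print(distances)
```

With this invented matrix the first tree needs five changes and the second needs three, so the second is the more parsimonious of the two. The distance table tells a similar story: A and B differ in none of the three traits, while A and D differ in all three.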
There are multiple statistical procedures that can be used to construct trees using distance data. The details of these are beyond the scope of this class. However, the aim of all of them is to find a tree topology (or structure) in which each pairwise distance in the tree is as close as possible to that in the data matrix. One philosophical objection to trees built using distance methods is that they don’t explicitly incorporate underlying evolutionary relationships. They are similarity measures (and assume that similarity reflects homology), but analogous traits may sometimes be used.

We have spent a lot of time looking at ways of assessing how well trees are supported by data. However, the big challenge in building phylogenies is in identifying potentially useful trees from the huge number of potential trees. It turns out that the number of potential phylogenetic trees increases faster than exponentially with the number of taxa in the tree; for example, 10 taxa can be arranged in more than 34 million different rooted, bifurcating trees. The challenge for phylogeneticists, who cannot search every possible tree, is to develop strategies to search only for plausible trees. Very computer-intensive algorithms are used to do this, but the underlying methodologies are beyond the scope of this class.

Phylogenetic trees are hypotheses about the relationships between taxa. Once a tree is constructed, how much confidence can we have that the tree (or some part of it) is correct? This is an issue of statistical confidence. There are a number of techniques that scientists have developed to measure how well the data support a given tree. One of the most widely used is bootstrap resampling. Bootstrap resampling is based on the idea that the data set that the phylogeny is based on is itself only one possible set of data that the tree could have been built with. How sensitive is the tree’s structure to the set of data we used? If we had used a similar but not identical set of data, would we have produced the same tree?
To carry out a bootstrap analysis we simply resample from our original character matrix. We randomly pick traits, with replacement, from our data set until we have a new data matrix the same size as the original, and the new data matrix is used to build a phylogenetic tree. That tree is then compared to the original tree. After repeated bootstrap resamplings we see how often the new trees match the original tree. If resampled trees match the original tree 90% of the time, we say the tree has 90% bootstrap support.

For a considerable period of time before widespread genomic analysis there was controversy about whether the closest relatives of the eutherian (or placental) mammals were the marsupials or the monotremes. In 2001 Killian et al. sequenced a large nuclear gene from 11 species of placental mammal, two marsupials and two monotremes. Using the sequence data they constructed a phylogeny of the mammals that indicated the placental and marsupial mammals were sister groups. To check how strongly their data supported the monophyly of the placental and marsupial mammals, Killian et al. carried out a bootstrap resampling analysis of their data. The results showed that the marsupials and placental mammals formed a monophyletic clade in 100% of the trees. The bootstrap analysis thus indicated that this data set strongly supported the monophyly of the placental and marsupial mammals. Since Killian’s paper, numerous other studies of nuclear DNA have supported this conclusion.
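The resampling procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used by Killian et al.: the four species and their trait values are invented, the tree for each resampled matrix is chosen by parsimony from the three possible pairings of four species, and a resample in which two pairings tie is counted as not matching the original tree. All function names are hypothetical.

```python
import random

# The three ways of pairing four species (hypothetical names A to D).
TOPOLOGIES = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]


def changes(tree, tip_states):
    """Fitch-style pass for one trait: return (possible states, changes needed)."""
    if isinstance(tree, str):
        return {tip_states[tree]}, 0
    (left, n_left), (right, n_right) = changes(tree[0], tip_states), changes(tree[1], tip_states)
    shared = left & right
    if shared:
        return shared, n_left + n_right
    return left | right, n_left + n_right + 1


def score(tree, matrix, columns):
    """Total changes on a tree over the chosen trait columns (repeats allowed)."""
    return sum(
        changes(tree, {species: row[i] for species, row in matrix.items()})[1]
        for i in columns
    )


def best_tree(matrix, columns):
    """Most parsimonious pairing, or None if two or more pairings tie."""
    scores = [score(tree, matrix, columns) for tree in TOPOLOGIES]
    winners = [t for t, s in zip(TOPOLOGIES, scores) if s == min(scores)]
    return winners[0] if len(winners) == 1 else None


def bootstrap_support(matrix, replicates=1000, seed=0):
    """Fraction of resampled matrices that give the same tree as the full data."""
    rng = random.Random(seed)                  # fixed seed so runs are repeatable
    n_traits = len(next(iter(matrix.values())))
    original = best_tree(matrix, range(n_traits))
    if original is None:
        raise ValueError("the full data set does not favour a single tree")
    matches = 0
    for _ in range(replicates):
        # Pick trait columns with replacement, same number as in the original.
        columns = [rng.randrange(n_traits) for _ in range(n_traits)]
        if best_tree(matrix, columns) == original:
            matches += 1
    return original, matches / replicates


# Invented matrix: four traits group A with B, one groups A with C, one is unique to D.
matrix = {"A": "111101", "B": "111001", "C": "000100", "D": "000010"}
tree, support = bootstrap_support(matrix)
print(tree, f"{support:.0%} bootstrap support")
```

Because one of the invented traits conflicts with the others, some resampled matrices will fail to recover the original tree, so the support here would be expected to fall below 100%. A data set in which every trait agrees would give 100% support, as in the mammal example.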