Split decomposition — Intro
In reconstructing a phylogenetic tree, we are essentially framing a series of hypotheses
about the relationships between species. Generating hypotheses is an important
component of the scientific process, but not the entirety. If we want to have confidence
in the tree we have just constructed (or in any other hypothesis), it’s important to test it.
One simple technique for testing phylogenetic hypotheses is split decomposition. This
procedure measures the support for evolutionary relationships among a set of four taxa,
or quartet. There are three different ways in which members of a quartet can be related,
assuming a strictly bifurcating tree:
Topology 1: ((A, B), (C, D))
Topology 2: ((A, C), (B, D))
Topology 3: ((A, D), (B, C))
Note that these are unrooted trees, also called phylogenetic networks: split decomposition
doesn’t depend on the location of the evolutionary root. Also note that because we are
interested in the system’s topology, the physical position of the taxa is irrelevant. For
example, the networks ((A,B), (D,C)) , ((C,D), (B,A)) , and ((D,C), (A,B)) are all
topologically equivalent to topology 1.
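The equivalence above can be checked mechanically. The following is a minimal Python sketch, assuming a topology is written as a pair of pairs; the helper name `canonical` is hypothetical and introduced only for illustration. It ignores the order of the two pairs and the order of the taxa within each pair.

```python
def canonical(topology):
    """Return an order-independent form of a quartet topology.

    `topology` is a pair of pairs such as (("A", "B"), ("D", "C")).
    Sets discard ordering, so twisted or mirrored drawings compare equal.
    """
    left, right = topology
    return frozenset({frozenset(left), frozenset(right)})


topology_1 = canonical((("A", "B"), ("C", "D")))

# The three rearrangements mentioned in the text.
variants = [
    (("A", "B"), ("D", "C")),
    (("C", "D"), ("B", "A")),
    (("D", "C"), ("A", "B")),
]
all_equivalent = all(canonical(v) == topology_1 for v in variants)

# Topology 2 pairs the taxa differently, so it is not equivalent.
differs_from_2 = canonical((("A", "C"), ("B", "D"))) != topology_1

print(all_equivalent, differs_from_2)
```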
Given a distance matrix for the four taxa, we can calculate the split indices for each of the
three topologies shown above. We begin by adding together the distances of taxa at
opposite corners of the phylogenetic network. Next, we sum the distances of taxa on
each end of the internal branch, and subtract this sum from the previous result. Finally,
we divide by two to obtain the length of the internal branch. It may be helpful to
envision this procedure graphically:
For topology 1, for example, we could calculate the split index as
\[ \frac{(AD + BC) - (AB + CD)}{2}, \]
where XY is the distance between taxa X and Y.
Alternatively, since the physical position of taxa doesn’t matter, we could just as well
twist topology 1 around the internal branch and then calculate its split index as
\[ \frac{(AC + BD) - (AB + CD)}{2}. \]
Thus, each topology has two split indices, which may or may not be equal.
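As a rough illustration of the calculation just described, here is a minimal Python sketch. The function name `split_indices`, the dictionary layout, and the distance values are all assumptions made for this example; the distances were chosen by hand to be consistent with topology 1 and are not taken from the worksheet.

```python
def split_indices(dist, pair_one, pair_two):
    """Return both split indices for the topology (pair_one, pair_two).

    `dist` maps a frozenset of two taxon names to their pairwise distance.
    """
    (w, x), (y, z) = pair_one, pair_two

    def d(p, q):
        return dist[frozenset((p, q))]

    # Distances between taxa on the same end of the internal branch.
    within = d(w, x) + d(y, z)
    # The two ways of pairing taxa across the internal branch.
    across_1 = d(w, z) + d(x, y)
    across_2 = d(w, y) + d(x, z)
    return (across_1 - within) / 2, (across_2 - within) / 2


# Hypothetical distances for a quartet whose true topology is ((A, B), (C, D)).
dist = {
    frozenset("AB"): 3, frozenset("CD"): 7,
    frozenset("AC"): 9, frozenset("BD"): 11,
    frozenset("AD"): 10, frozenset("BC"): 10,
}

index_1, index_2 = split_indices(dist, ("A", "B"), ("C", "D"))
print(index_1, index_2)  # 5.0 5.0
```

For topology 1 the first value corresponds to the (AD + BC) form and the second to the (AC + BD) form of the index.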
In practice, only one of the three possible topologies can correctly describe the four
taxa’s phylogenetic relationship. When the distances fit the tree exactly, both of that
topology’s split indices equal the length of its internal branch. We can also find an
“internal branch length” for the other two topologies by calculating their split indices;
however, since those topologies describe incorrect phylogenetic relationships, these
indices have no biological interpretation and need not be identical (or even positive).
Split indices therefore provide a way to test the relative support for each possible
topology of a phylogenetic network.
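To see why this works, the following short derivation may help. It assumes the distances are exactly additive on a tree with topology 1; the symbols a, b, c, d (terminal branch lengths leading to A, B, C, D) and i (internal branch length) are introduced here for illustration and do not appear in the original handout.

```latex
\begin{align*}
AB &= a + b, & CD &= c + d, \\
AC &= a + i + c, & BD &= b + i + d, \\
AD &= a + i + d, & BC &= b + i + c.
\end{align*}

% Correct topology ((A, B), (C, D)): both indices recover i.
\[
\frac{(AD + BC) - (AB + CD)}{2} = \frac{2i}{2} = i,
\qquad
\frac{(AC + BD) - (AB + CD)}{2} = \frac{2i}{2} = i.
\]

% Incorrect topology ((A, C), (B, D)): the indices differ and are not positive.
\[
\frac{(AD + BC) - (AC + BD)}{2} = 0,
\qquad
\frac{(AB + CD) - (AC + BD)}{2} = -i.
\]
```

Under these assumptions the incorrect topology 3 behaves the same way as topology 2, giving indices of 0 and -i.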
Excel worksheet: “Split decomposition”
Summary: This worksheet generates a pairwise distance matrix for a quartet, using data
entered by the user, randomly generated, or a combination of the two. The sheet then
calculates split indices for each of the three possible topologies.
In the first row of red-lined cells, enter any branch lengths you want fixed. You may also
enter a number into the next red cell: the worksheet will then generate random
branch lengths lower than or equal to this maximum. Note that user-entered
branch lengths override randomly generated ones.
If you want to control the phylogenetic network’s true topology, you may enter the letter
corresponding to taxon A’s closest relative into the final red cell. Otherwise, a random
closest relative will be chosen. The program will then calculate a distance matrix for the
four taxa and split indices for each of the three possible topologies.
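The worksheet's internals are not shown here, but the behavior described above can be approximated in a few lines of Python. This is only a sketch under stated assumptions: the function `make_quartet`, its parameter names, and the uniform random draw are hypothetical choices for illustration, not the worksheet's actual formulas.

```python
import random

TAXA = ("A", "B", "C", "D")


def make_quartet(fixed=None, max_length=10.0, closest_to_a=None, rng=None):
    """Build branch lengths, a true topology, and a distance matrix.

    fixed: optional dict of user-entered lengths, keyed by taxon or "internal".
    max_length: upper bound for randomly generated branch lengths.
    closest_to_a: "B", "C", or "D"; chosen at random when omitted.
    """
    rng = rng or random.Random()
    fixed = fixed or {}

    lengths = {}
    for name in TAXA + ("internal",):
        if name in fixed:
            lengths[name] = fixed[name]  # user-entered values take priority
        else:
            lengths[name] = rng.uniform(0, max_length)

    partner = closest_to_a or rng.choice(TAXA[1:])
    others = tuple(t for t in TAXA[1:] if t != partner)
    side = {"A": 0, partner: 0, others[0]: 1, others[1]: 1}

    dist = {}
    for k, x in enumerate(TAXA):
        for y in TAXA[k + 1:]:
            d = lengths[x] + lengths[y]
            if side[x] != side[y]:
                d += lengths["internal"]  # path crosses the internal branch
            dist[frozenset((x, y))] = d
    return lengths, (("A", partner), others), dist


lengths, topology, dist = make_quartet(
    fixed={"A": 2, "internal": 3}, closest_to_a="C", rng=random.Random(0)
)
print(topology)
```

Passing a seeded `random.Random` makes a run repeatable, which is convenient when comparing results across topologies.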
Points to note:
• The two taxa with the shortest distance between them aren’t necessarily each other’s
closest phylogenetic relatives. For example, on the following network, the true topology is
((A, C), (B, D)). However, the shortest distance between two taxa is between A and D.
Reconstruction methods that group the closest pair of taxa first, such as UPGMA, will
therefore tend to yield the incorrect topology ((A, D), (B, C)). It’s not until we perform
the split decomposition that the error becomes clear.
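[Figure: network of taxa A, B, C, and D with true topology ((A, C), (B, D)); the branch lengths shown are 4, 2, 3, 8, and 6.]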
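The point above can be reproduced numerically. In the sketch below, the assignment of lengths to branches is an assumption chosen so that the true topology is ((A, C), (B, D)) while A and D are the closest pair; it is not a reading of the figure, and the variable and function names are hypothetical.

```python
from itertools import combinations

# Hypothetical branch lengths (an assumption, not read from the figure).
terminal = {"A": 2, "B": 6, "C": 8, "D": 4}
internal = 3
side = {"A": 0, "C": 0, "B": 1, "D": 1}  # true topology ((A, C), (B, D))

dist = {}
for x, y in combinations("ABCD", 2):
    d = terminal[x] + terminal[y]
    if side[x] != side[y]:
        d += internal  # path crosses the internal branch
    dist[frozenset((x, y))] = d

closest_pair = min(dist, key=dist.get)


def split_indices(pair_one, pair_two):
    """Return both split indices for the topology (pair_one, pair_two)."""
    (w, x), (y, z) = pair_one, pair_two

    def d(p, q):
        return dist[frozenset((p, q))]

    within = d(w, x) + d(y, z)
    return ((d(w, z) + d(x, y) - within) / 2,
            (d(w, y) + d(x, z) - within) / 2)


topologies = {
    "((A, B), (C, D))": (("A", "B"), ("C", "D")),
    "((A, C), (B, D))": (("A", "C"), ("B", "D")),
    "((A, D), (B, C))": (("A", "D"), ("B", "C")),
}
indices = {name: split_indices(*pairs) for name, pairs in topologies.items()}

# Prefer the topology whose smaller split index is largest.
supported = max(indices, key=lambda name: min(indices[name]))

print(sorted(closest_pair), dist[closest_pair])
for name, values in indices.items():
    print(name, values)
print("supported:", supported)
```

With these lengths, A and D are the closest pair at a distance of 9, yet only ((A, C), (B, D)) has two equal, positive split indices; the other two topologies each get one zero and one negative index.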