Report - Computer Science

advertisement
1
Phylogenetic Tree Generation (August 2012)
Brandon J. Andrews

Abstract— In order to visualize differences between nucleotide
sequences it is often nice to have a visual representation in a tree
form. The algorithms for generating such trees are referred to as
phylogenetic tree generators and can generate trees by
calculating the difference between sequences and then clustering
them. The distance based algorithms create generally accurate
evolutionary paths showing where one genetic sequence might
have branched from. To solve this problem the Fitch-Margoliash
method is commonly used.
Index Terms—phylogenetic, bioinformatics, fitch
I. INTRODUCTION
P
trees allow for the visualization of the
similarities and dissimilarities between genetic sequences
and plot the possible evolutionary paths where the sequences
might have originated from. These algorithms allow
researchers to find information that would not be obvious from
just viewing the raw distance or other data calculated about
the sequences [1].
In some algorithms the evolutionary rate is taken into
consideration. So the further away from the root the longer it
took to for those changes to appear. Knowing the evolutionary
rate can then allow researchers to connect a time value to the
tree.
A tree can either be unrooted, indicating there is no start or
primary parent in the sequence, or rooted, indicating a possible
beginning to the sequence mutations.
There are many ways to create a phylogenetic tree. One of
the most common ways is via distance information between
each pair of sequences. The distances can be calculated in
many ways depending on the input data, but represent the
amount of dissimilarity in the sequences. It is often critical
that this distance data is linear in relationship to the amount of
dissimilarity and some algorithms require that criteria to
produce accurate trees.
Two distance based algorithms are usually discussed. The
Unweighted Pair Group Method with Arithmetic Mean
(UPGMA) method and the Fitch-Margoliash method. The
former is not used and produces inaccurate graphs since it
assumes each child has equal weighting, while the FithMargliash algorithm uses weighting to help closely preserve
the distances between nodes in the tree.
The Fitch-Algorithm is the focus in the following
paragraphs and the implementation.
HYLOGENETIC
Manuscript received August 16, 2012
II. RESEARCH AND DESIGN
A. Research
The research performed was simply to look for examples
and explanations of the algorithm and its proper
implementation. I found a slide which contained a brief
example of the algorithm which was enough to program the
whole algorithm from [2].
The other research was to get basic information about
phylogenetic trees; however, it turns out there’s a lot of
misinformation about how they are defined. Some papers for
instance will erroneously assume that they must be binary
trees when in fact a distance at a branch of zero indicates
multiple branches [1].
B. Design
Phylogenetic trees can be drawn linearly in two dimensions
or they can be drawn using polar coordinates which creates a
nice radial diagram. I chose to use a radial drawing for
aesthetic reasons. The core algorithm design was just ad-hoc
from an example in a Powerpoint presentation [2].
III. IMPLEMENTATION
The implementation used Javascript for computing and
HTML 5 Canvas for rendering. The Context2D API in Canvas
allows a user to easily draw lines, arcs, and text which are all
that is required to draw a phylogenetic tree from the outputted
nodes of the Fitch-Margoliash algorithm.
Since the algorithm can take in sequence data it often
requires that the distance matrix be computed. I used a very
simple hamming distance algorithm that counts the number of
mismatches and returns that as the distance into the distance
matrix for each pair of sequences. It works, but it is not a very
good algorithm for judging the dissimilarity of sequences. If a
single nucleotide is missing it skews the distances, so in
practice a better dissimilarity algorithm would be required.
IV. TESTING AND SAMPLE DATA
There is one set of test data that I kept finding over and over
again which is listed in Fig 1 [2]. Since the input to the
Fitch-Margoliash method is just a distance matrix it is either
generated from the sequence data or given separately in a
precomputed form. The example in Fig 1 was given
precomputed.
2
A
B
C
D
E
A
0
0
0
0
0
B
22
0
0
0
0
C
39
41
0
0
0
D
39
41
18
0
0
E
41
43
20
10
0
Fig. 1. This testing data found in multiple presentations.
V. RESULTS AND ANALYSIS
The Fitch-Margoliash method, while there might be minor
errors, generally creates the expected phylogenetic tree. The
lengths for each branch also correctly correspond to the
dissimilarity allowing easy visualization of which sequences
are similar and which are completely different.
To analyze the tree’s accuracy to the distance matrix one
only had to recomputed the distances between each node in
the tree and calculate the path distances through the branches.
An error could then be created to judge the accuracy of the
tree.
VI. CONCLUSION
The Fitch-Margoliash distance based method creates very
“accurate” trees that preserve most of the distances in the
inputted distance matrix. However, it’s important to realize
that the phylogenetic tree might not relate to any real
evolutionary mutations without further analysis, especially on
the scale of whole organisms [1].
REFERENCES
[1]
[2]
K. Louhisuo. (2004, May 4). Constructing phylogenetic trees with
UPGMA and Fitch-Margoliash. [Online]. Available:
http://www.niksula.cs.hut.fi/~klouhisu/Bioinfo/phyltree.pdf
J. Bacardit, and N. Krasnogor, Phylogenetic Trees [PPT]. Available:
http://www.cs.nott.ac.uk/~jqb/G53BIO/Slides/Phylogenetic%20Trees.pp
t
Brandon J. Andrews Lives in Kalamazoo, Michigan and
is a computer science graduate. Andrews received his
undergraduate computer science degree at Western
Michigan University (WMU), Kalamazoo, Michigan in
2010.
He works for the Office of Information Technology at
WMU maintaining printer and database servers while also being a part-time
teaching assistant.
Download