Class 9: Phylogenetic Trees

Phylogenetic Trees
The Tree of Life, Evolution
 Many theories of evolution
 Basic idea:
 speciation events lead to creation of different species
 Speciation caused by physical separation into groups where
different genetic variants become dominant
 Any two species share a (possibly distant) common ancestor
Phylogenies
 A phylogeny is a tree that describes the sequence of speciation
events that led to the formation of a set of current-day species
 Leaves - current-day species
 Internal nodes - hypothetical most recent common ancestors
 Edge lengths - “time” from one speciation to the next


Primate evolution
 Until the mid-1950s, phylogenies were constructed by experts based on
their opinion (subjective criteria)
 The Linnaean classification scheme implicitly assumes a tree
structure
 Since then, focus on objective criteria for constructing phylogenetic
trees
 Important for many aspects of biology
 Classification (systematics)
 Understanding biological mechanisms
 Taxonomy deals with the naming and ordering of taxa.
The Linnaean hierarchy:
1. Kingdom
2. Phylum (Division in botany)
3. Class
4. Order
5. Family
6. Genus
7. Species
Morphological vs. Molecular
 Classical phylogenetic analysis: morphological features
 number of legs, lengths of legs, etc.
 Modern biological methods allow us to use molecular features
 Gene sequences
 Protein sequences
 Analysis based on homologous sequences (e.g., globins) in
different species

Dangers in Molecular Phylogenies
 We have to remember that gene/protein sequences can be
homologous for different reasons:
 Orthologs -- sequences diverged after a speciation event
 Paralogs -- sequences diverged after a duplication event
 Xenologs -- sequences diverged after a horizontal transfer (e.g.,
by a virus)
Types of Trees
 A natural model to consider is that of rooted trees
 Depending on the model, data from current-day species does not
distinguish between different placements of the root
 An unrooted tree represents the same phylogeny without the root
node
 Trees can either contain distances, or simply links and nodes.
Positioning Roots in Unrooted Trees
 We can estimate the position of the root by introducing an outgroup:
 a set of species that are definitely distant from all the species of
interest
Type of Data
 Distance-based
 Input is a matrix of distances between species
 Can be the fraction of residues on which they disagree, an alignment
score between them, or …
 Character-based
 Examine each character (e.g., residue) separately
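One simple distance of the first kind is the fraction of aligned residues at which two sequences differ. The following is a minimal Python sketch, assuming the sequences are already aligned to equal length and treating gap characters as ordinary symbols. The function names and the toy sequences are hypothetical, chosen only for illustration.

```python
def p_distance(seq_a, seq_b):
    """Fraction of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return mismatches / len(seq_a)


def distance_matrix(seqs):
    """Map every ordered pair of species names to their p-distance."""
    return {(x, y): p_distance(seqs[x], seqs[y]) for x in seqs for y in seqs}


# Toy, made-up aligned sequences (not real data).
seqs = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "ACGAACGA"}
dist = distance_matrix(seqs)
```

A matrix built this way is the input expected by distance-based methods; a character-based method would instead look at each alignment column on its own.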
Simple Distance-Based Method
Input: distance matrix between species
Outline:
 Cluster species together
 Initially clusters are singletons
 At each iteration combine two “closest” clusters to get a new one
UPGMA Clustering
 Let Ci and Cj be clusters; define the distance between them to be
$$d(C_i, C_j) = \frac{1}{|C_i|\,|C_j|} \sum_{p \in C_i} \sum_{q \in C_j} d(p, q)$$
 When we combine two clusters, Ci and Cj, to form a new cluster Ck,
then
$$d(C_k, C_l) = \frac{|C_i|\, d(C_i, C_l) + |C_j|\, d(C_j, C_l)}{|C_i| + |C_j|}$$
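The clustering loop and the size-weighted update rule can be sketched in Python as follows. This is an illustrative sketch, not a reference implementation: distances are assumed to be stored in a dict keyed by `frozenset({a, b})`, clusters are represented as nested tuples `(left, right, height)`, and each new node is placed at half the distance between the merged clusters, which is the usual UPGMA convention but is not stated on the slide. The four-species matrix is made up.

```python
from itertools import combinations


def upgma(names, d):
    """Cluster leaves by average linkage.

    names: list of leaf labels
    d:     dict mapping frozenset({a, b}) -> distance between leaves
    Returns the root as a nested tuple (left, right, height).
    """
    d = dict(d)                       # work on a copy
    size = {n: 1 for n in names}      # cluster -> number of leaves in it
    while len(size) > 1:
        # Pick the two clusters with the smallest average distance.
        ci, cj = min(combinations(size, 2), key=lambda p: d[frozenset(p)])
        height = d[frozenset((ci, cj))] / 2   # assumed node placement
        ck = (ci, cj, height)
        ni, nj = size.pop(ci), size.pop(cj)
        # Size-weighted update of the distance to every remaining cluster.
        for cl in size:
            d[frozenset((ck, cl))] = (
                ni * d[frozenset((ci, cl))] + nj * d[frozenset((cj, cl))]
            ) / (ni + nj)
        size[ck] = ni + nj
    (root,) = size
    return root


# Made-up distances that happen to be clock-like (ultrametric).
pairs = {("A", "B"): 2, ("C", "D"): 4,
         ("A", "C"): 6, ("A", "D"): 6, ("B", "C"): 6, ("B", "D"): 6}
d = {frozenset(p): v for p, v in pairs.items()}
tree = upgma(["A", "B", "C", "D"], d)
```

Because the update only needs the two merged clusters' sizes and their old distances, the full double sum over leaf pairs never has to be recomputed.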
Molecular Clock
 UPGMA implicitly assumes that all distances measure time in the
same way
 A weaker requirement is additivity
 In a “real” tree, distances between species are the sum of
distances between intermediate nodes
[Figure: leaves i, j, k joined at a single internal node by edges of lengths a, b, c respectively]
$$d(i, j) = a + b$$
$$d(i, k) = a + c$$
$$d(j, k) = b + c$$
 Suppose input distances are additive. Then, for the node m at which
the path from k meets the path between i and j,
$$d(m, k) = \frac{1}{2}\left(d(i, k) + d(j, k) - d(i, j)\right)$$
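A quick numeric check of this identity on a three-leaf tree, written as a small Python sketch. The branch lengths a, b, c are hypothetical values chosen for illustration, and the function name is an assumption.

```python
def dist_to_branch_point(d_ik, d_jk, d_ij):
    """Distance from leaf k to the node where k's path joins the i-j path."""
    return 0.5 * (d_ik + d_jk - d_ij)


# Hypothetical branch lengths for leaves i, j, k.
a, b, c = 2.0, 3.0, 4.0

# Additive leaf-to-leaf distances implied by those branches.
d_ij, d_ik, d_jk = a + b, a + c, b + c

# The formula recovers c from leaf distances alone.
recovered_c = dist_to_branch_point(d_ik, d_jk, d_ij)

# By symmetry, swapping the roles of the leaves recovers a as well.
recovered_a = dist_to_branch_point(d_ij, d_ik, d_jk)
```

The point is that, under additivity, internal branch lengths can be computed from observed leaf distances without knowing a, b, c in advance.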
Neighbor Joining
 Can we use this fact to construct trees?
 Let
$$D(i, j) = d(i, j) - (r_i + r_j)$$
where
$$r_i = \frac{1}{|L| - 2} \sum_{k \in L} d(i, k)$$
Theorem: if D(i,j) is minimal (among all pairs of leaves), then i and j
are neighbors in the tree
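To see why the corrected score D is used rather than the raw distance d, consider a made-up four-leaf tree in which A and B are neighbors with branch lengths 1 and 4, C and D are neighbors with branch lengths 1 and 4, and the internal edge has length 1. The Python sketch below (function name and data layout are assumptions) shows that the raw closest pair is not a pair of neighbors, while the pair minimizing D is.

```python
from itertools import combinations


def nj_criterion(leaves, d):
    """Return {(i, j): D(i, j)} with D(i, j) = d(i, j) - (r_i + r_j)."""
    n = len(leaves)
    # r_i: sum of distances from i to all other leaves, divided by |L| - 2.
    r = {i: sum(d[frozenset((i, k))] for k in leaves if k != i) / (n - 2)
         for i in leaves}
    return {(i, j): d[frozenset((i, j))] - (r[i] + r[j])
            for i, j in combinations(leaves, 2)}


# Additive distances read off the hypothetical tree described above.
pairs = {("A", "B"): 5, ("C", "D"): 5, ("A", "C"): 3,
         ("A", "D"): 6, ("B", "C"): 6, ("B", "D"): 9}
d = {frozenset(p): v for p, v in pairs.items()}
leaves = ["A", "B", "C", "D"]

closest_raw = min(pairs, key=pairs.get)   # pair with smallest d
D = nj_criterion(leaves, d)
best_by_D = min(D, key=D.get)             # pair with smallest D
```

Here the smallest raw distance is between A and C, which sit on opposite sides of the internal edge, while D is minimal for the true neighbor pairs (A, B) and (C, D).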
Neighbor Joining
 Set L to contain all leaves
Iteration:
 Choose i,j such that D(i,j) is minimal
 Create new node k, and set
$$d(i, k) = \frac{1}{2}\left(d(i, j) + r_i - r_j\right)$$
$$d(j, k) = d(i, j) - d(i, k)$$
$$d(k, m) = \frac{1}{2}\left(d(i, m) + d(j, m) - d(i, j)\right)$$
 Remove i,j from L, and add k
Terminate:
 When |L| = 2, connect two remaining nodes
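The iteration can be sketched in Python as follows. This is a minimal sketch under stated assumptions: distances live in a dict keyed by `frozenset({a, b})`, internal nodes get generated names `node0`, `node1`, ... (so leaf labels are assumed not to use those names), ties in D are broken by taking the first minimal pair, and the output is simply a list of `(node, node, edge_length)` triples. The input matrix is made up and additive.

```python
from itertools import combinations


def neighbor_joining(leaves, d):
    """Build an unrooted tree as a list of (node, node, length) edges."""
    d = dict(d)                  # work on a copy
    L = list(leaves)
    edges = []
    next_id = 0
    while len(L) > 2:
        n = len(L)
        r = {i: sum(d[frozenset((i, m))] for m in L if m != i) / (n - 2)
             for i in L}
        # Choose i, j minimizing D(i, j) = d(i, j) - (r_i + r_j).
        i, j = min(combinations(L, 2),
                   key=lambda p: d[frozenset(p)] - (r[p[0]] + r[p[1]]))
        k = f"node{next_id}"     # hypothetical naming scheme
        next_id += 1
        d_ij = d[frozenset((i, j))]
        d_ik = 0.5 * (d_ij + r[i] - r[j])
        d_jk = d_ij - d_ik
        edges += [(i, k, d_ik), (j, k, d_jk)]
        # Remove i and j, then record distances from k to the rest.
        L = [m for m in L if m not in (i, j)]
        for m in L:
            d[frozenset((k, m))] = 0.5 * (
                d[frozenset((i, m))] + d[frozenset((j, m))] - d_ij)
        L.append(k)
    a, b = L
    edges.append((a, b, d[frozenset((a, b))]))
    return edges


# Made-up additive distances: A, B neighbors (branches 1 and 4),
# C, D neighbors (branches 1 and 4), internal edge of length 1.
pairs = {("A", "B"): 5, ("C", "D"): 5, ("A", "C"): 3,
         ("A", "D"): 6, ("B", "C"): 6, ("B", "D"): 9}
d = {frozenset(p): v for p, v in pairs.items()}
edges = neighbor_joining(["A", "B", "C", "D"], d)
```

On this additive input the sketch recovers the branch lengths used to generate the matrix; on non-additive input it still returns a tree, but the edge lengths are only estimates and can even come out negative.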
Distance Based Methods
 If we make strong assumptions on distances, we can reconstruct
trees
 In real life, measured distances are generally not exactly additive