# Document ```Tree Searching Methods
• Exhaustive search (exact)
• Branch-and-bound search (exact)
• Heuristic search methods (approximate)
– Branch swapping
– Star decomposition
Exhaustive Search
12
12
13
13
13
12
13
13
12
13
11
13
13
13
13
Searching for trees
• Generation of all possible trees
1.Generate all 3 trees for first 4 taxa:
Searching for trees
2. Generate all 15 trees for first 5 taxa:
(likewise for each of the other two 4-taxon trees)
Searching for trees
3. Full search tree:
Searching for trees
Branch and bound algorithm:
The search tree is the same as
for exhaustive search, with tree
lengths for a hypothetical data
set shown in boldface type. If a
tree lying at a node of this search
tree has a length that exceeds
the current lower bound on the
optimal tree length, this path of
the search tree is terminated
(indicated by a cross-bar), and
the algorithm backtracks and
takes the next available path.
When a tip of the search tree is
reached (i.e., when we arrive at
a tree containing the full set of
taxa), the tree is either optimal
(and hence retained) or
suboptimal (and rejected). When
all paths leading from the initial
3-taxon tree have been explored,
the algorithm terminates, and all
most-parsimonious trees will
have been identified. Asterisks
indicate points at which the
current lower bound is reduced.
Circled numbers represent the
order in which phylogenetic trees
are visited in the search tree.
2
1
3
1
2
1
3
1
2
4
3
2
4
3
4
Searching for trees
to the example used for branch-and-bound.
The best 4-taxon tree is determined by
evaluating the lengths of the three trees
obtained by joining taxon D to tree 1
containing only the first three taxa. Taxa E
and F are then connected to the five and
seven possible locations, respectively, on
trees 4 and 9, with only the shortest trees
found during each step being used for the
next step. In this example, the 233-step tree
obtained is not a global optimum. Circled
numbers indicate the order in which
phylogenetic trees are evaluated in the
• As Is
– add in order found in matrix
• Closest
– add unplaced taxa that requires smallest increase
• Furthest
– add unplaced taxa that requires largest increase
• Simple
– Farris’s (1970) “simple algorithm” uses a set of pairwise
reference distances
• Random
– random permutation of taxa is used to select the order
Branch swapping
Nearest Neighbor Interchange (NNI)
B
C
B
E
C
A
D
D
E
A
E
D
A
C
B
Branch swapping
Subtree Pruning and Regrafting (SPR)
B
A
E
B
D
A
C

D
F
C
G
D
C
E
E
A
F
F
G
B
G
a
Branch swapping
Tree Bisection and Reconnection (TBR)
E
D
B
A
F
G
B
A
A
C
C
E

D
D
C
B
D
F
G
E
F
G
E
E
C
D
A
F
G
F
B
G
B
A
C
Reconnection limits in TBR
3
2
r
4
x
s
5
u
t
v
z
y
w
1
3
2
6
4
u
x
x'
2
6
1
4
0
1
5
6
5
1
Reconnection distances:
3
w
3
2
4
v
z
1
5
2
0
1
2
6
Reconnection limits in TBR
3
2
4
r
(D)
s
2
1
5
5
4
v
y
y'
w
1
3
6
3
2
4
1
Reconnection distances:
1
1
6
5
1
0
0
1
6
In PAUP*, use “ReconLim” to set maximum reconnection distance
Star-decomposition search
Overview of maximum likelihood as used
in phylogenetics
• Overall goal: Find a tree topology (and associated parameter estimates)
that maximizes the probability of obtaining the observed data, given a
model of evolution
Likelihood(hypothesis) Prob(data|hypothesis)
Likelihood(tree,model) = k Prob(observed sequences|tree,model)
[not Prob(tree|data,model)]
Computing the likelihood of a single tree
1
j
N
(1) C…GGACA…C…GTTTA…C
(2) C…AGACA…C…CTCTA…C
(3) C…GGATA…A…GTTAA…C
(4) C…GGATA…G…CCTAG…C
(1)
(3)
C
C
A
(5)
(2)
(4)
(6)
G
Computing the likelihood of a single tree
Likelihood at site j =
C
Prob
C
A
G
C
+ Prob
A
C
But use Felsenstein (1981) pruning algorithm
A
G
A
C
…
G
C
A
+
A
+
Prob
C
T
T
Computing the likelihood of a single tree
L  L1L2
N
LN   L j
j1


ln L  ln L1  ln L2 
N
ln LN   ln L1
j1
Note: PAUP* reports -ln L, so lower -ln L implies higher likelihood
Finding the maximum-likelihood tree
(in principle)
• Evaluate the likelihood of each possible
tree for a given collection of taxa.
• Choose the tree topology which
maximizes the likelihood over all
possible trees.
Probability calculations require…
• An explicit model of substitution that specifies change
probabilities for a given branch length
“Instantaneous rate matrix”
 A rAA  CrAC  G rAG  T rAT 


 A rCA  CrCC  G rCG  T rCT 

Q
 A rGA  CrGC  G rGG  T rGT 


 A rTA  CrTC  G rTG  T rTT 

Jukes-Cantor
Kimura 2-parameter
Hasegawa-Kishino-Yano (HKY)
Felsenstein 1981, 1984
General time-reversible
• An estimate of optimal branch lengths in units of
expected amount of change ( = rate x time)
P(v)  e
Q

For example:
    







Q  
    








    







Q  
    








Kimura (1980) “2-parameter”
Jukes-Cantor (1969)
 

 A

Q
 A

 A
 C  G  T  


 G  T 
 C

 T  

 C  G
 
Hasegawa-Kishino-Yano (1985)


 A rAA  CrAC  G rAG  T rAT 



r

r

r

r
C CC
G CG
T CT 
Q   A CA
 A rGA  CrGC  G rGG  T rGT 



r

r

r

r
 A TA
C TC
G TG
T TT 
General-Time Reversible
E.g., transition probabilities for
HKY and F84:



   
1
j   A
     1e    j
e
j 



 j
j
 j 




 1

 


e    j e  A
Pij t    j   j 

1


 

 j

 j 




1

e


 j


(i  j)
(i  j, transition)
(i  j, transversion)
A Family of Reversible Substitution Models
The Relevance of Branch Lengths
C C
C
A
A
A
A
A
A
A
A
A
C C A
A
A
A
C
A
A
A
A
A
When does maximum likelihood work
better than parsimony?
• When you’re in the “Felsenstein Zone”
A
C
(Felsenstein, 1978)
B
D
In the Felsenstein Zone
A
B
0.8
0.8
0.1
0.1
0.1
C
Substitution rates:
Base frequencies:
D
A
C
G
T
A
5
6
2
A=0.1
C=0.2
G=0.3
T=0.4
C
5
3
8
G
6
3
1
T
2
8
1
-
In the Felsenstein Zone
Proportion correct
1
0.8
parsimony
ML-GTR
0.6
0.4
0.2
0
0
5000
Sequence Length
10000
The long-branch attraction (LBA) problem
Pattern type
1
A
I = Uninformative (constant)
4
A
The true phylogeny of
1, 2, 3 and 4
(zero changes required on any
tree)
A
2
A
3
The long-branch attraction (LBA) problem
Pattern type
1
A
A
I = Uninformative (constant)
II = Uninformative
4
A
G
The true phylogeny of
1, 2, 3 and 4
(one change required on any tree)
A
2
A
3
The long-branch attraction (LBA) problem
Pattern type
1
A
A
C
I = Uninformative (constant)
II = Uninformative
III = Uninformative
4
A
G
G
The true phylogeny of
1, 2, 3 and 4
(two changes required on any tree)
A
2
A
3
The long-branch attraction (LBA) problem
Pattern type
1
A
A
C
G
I = Uninformative (constant)
II = Uninformative
III = Uninformative
IV = Misinformative
4
A
G
G
G
The true phylogeny of
1, 2, 3 and 4
(two changes required on true tree)
A
2
A
3
The long-branch attraction (LBA) problem
… but this tree needs only one step
G
1
A
2
A
3
G
4
and suitability of models
(assumptions)
Consistency
If an estimator converges to the true value of a
parameter as the amount of data increases toward
infinity, the estimator is consistent.
When do both methods fail?
• When there is insufficient phylogenetic signal...
1
3
2
4
When does parsimony work “better”
than maximum likelihood?
• When you’re in the Inverse-Felsenstein (“Farris”) zone
A
C
(Siddall, 1998)
D
B
Siddall (1998) parameter space
a
b
b
a
b
0.75
pa
Both methods do poorly
Parsimony has higher
accuracy than likelihood
0
pb
0.75
Both methods do well
Parsimony vs. likelihood in the Inverse-Felsenstein Zone
1
B
B
B
B
B
B
B
B
B
B
B
0.9
0.8
B
J
15%
67.5%
J
0.7
Accuracy
67.5%
0.6
J
0.5 J
(expected differences/site)
J
J
0.4
J
J
J
J
J
J
0.3
B Parsimony
J ML/JC
0.2
0.1
0
20
100
1,000
Sequence length
10,000
100,000
Why does parsimony do so well in the
Inverse-Felsenstein zone?
C
CA
True synapomorphy
A
A
C
C
CA
A
CA
Apparent synapomorphies
actually due to
misinterpreted homoplasy
A
C
C
CA
GA
G
C
C
A
A
Parsimony vs. likelihood in the Felsenstein Zone
1
J
0.9
J
J
J
0.8
J
0.7
Accuracy
J
J
67.5%
67.5%
J
0.6
15%
0.5
J
(expected differences/site)
J
0.4
0.3 J
J
B Parsimony
J ML/JC
0.2
B
0.1
B
B
0
20
100
B
B
B
B
1,000
Sequence length
B
B
10,000
B
B
B
100,000
From the Farris Zone to the Felsenstein Zone
A
B
A
A
C
C
C
D
D
D
B
A
B
B
A
C
C
D
D
B
External branches = 0.5 or 0.05 substitutions/site, Jukes-Cantor model of nucleotide substitution
G
1.0 H
J
Simulation
results:
H
G
J
H
G
J
H
G
J
H
G
J
H
G
J
Accuracy
0.8
0.6
Parsimony
0.4
0.2
0
0.05
J
100 sites
G
1,000 sites
H
10,000 sites
0.04
0.03
0.02
Farris zon e
1.0 H
H
J
0.01
J
H
G
0
J
G
H
0.01
H
H
H
G
H
Accuracy
J
J
J
0.4
0.2
0
0.05
H
G
0.03
G
H
0.04
G
H
0.05
Felsenstein zone
H
H
G
H
G
H
G
J
G
J
J
G
J
J
G
G
0.6
G
H
0.02
Length of internal branch ( d)
G
0.8
J
J
J
100 sites
G
1,000 sites
H
10,000 sites
0.04
Farris zon e
0.03
0.02
J
Likelihood
J
G
J
H
G
J
G
HJ
ML/JC
0.01
0
0.01
0.02
Length of internal branch ( d)
0.03
0.04
0.05
Felsenstein zone
Maximum likelihood models are
oversimplifications of reality. If I assume the
wrong model, won’t my results be meaningless?
• Not necessarily (maximum likelihood is pretty robust)
Model used for simulation...
A
B
0.8
0.8
0.1
0.1
0.1
C
Substitution rates:
Base frequencies:
D
A
C
G
T
A
5
6
2
A=0.1
C=0.2
G=0.3
T=0.4
C
5
3
8
G
6
3
1
T
2
8
1
-
Performance of ML when its model is
violated (one example)
Among site rate heterogeneity
equal rates?
Lemur
Homo
Pan
Goril
Pongo
Hylo
Maca
AAGCTTCATAG
AAGCTTCACCG
AAGCTTCACCG
AAGCTTCACCG
AAGCTTCACCG
AAGCTTTACAG
AAGCTTTTCCG
TTGCATCATCCA
TTGCATCATCCA
TTACGCCATCCA
TTACGCCATCCA
TTACGCCATCCT
TTACATTATCCG
TTACATTATCCG
…TTACATCATCCA
…TTACATCCTCAT
…TTACATCCTCAT
…CCCACGGACTTA
…GCAACCACCCTC
…TGCAACCGTCCT
…CGCAACCATCCT
• Proportion of invariable sites
–
Some sites don’t change do to strong functional or structural constraint (Hasegawa et
al., 1985)
• Site-specific rates
–
Different relative rates assumed for pre-assigned subsets of sites
• Gamma-distributed rates
–
Rate variation assumed to follow a gamma distribution with shape parameter 
.
....
Performance of ML when its model is
violated (another example)
Modeling among-site rate variation with a gamma distribution...
0.08
=200
Frequency
0.06
=0.5
=2
0.04
=50
0.02
0
0
1
2
Rate
…can also estimate a proportion of “invariable” sites (pinv)
Performance of ML when its model is
violated (another example)
 = 0.5, pinv=0.5
Tree
 = 1.0, pinv=0.5
1
1
1
0 .9
0 .9
0 .9
0 .8
0 .8
0 .7
0 .4
0 .3
0 .2
0 .1
100
1000
1 0 0 00
GTRg
0 .4
HK Yg
0 .2
GTRi
HKYi
GTRer
HKYer
0 .1
p arsimo n y
0 .1
0 .3
1 0 0 0 00
HYYig
0 .5
0 .4
0
0
100
GTRi
0 .3
1000
1 0 0 00
100
1 0 0 0 00
1
0 .9
0 .9
0 .9
0 .8
0 .8
GTRig
0 .7
GTRig
0 .7
0 .6
HKYig
0 .6
0 .6
0 .5
GTRg
HKYg
GTRi
HKYi
GTRer
0 .5
HKYig
GTRg
HKYg
GTRi
HKYi
GTRer
HKYer
0 .3
p arsimo n y
0 .1
0 .4
0 .3
0 .2
HKYer
0 .1
p arsimo n y
100
1000
1 0 0 00
0 .1
1 0 0 0 00
100
1 0 0 00
1 0 0 0 00
100
0 .9
0 .9
0 .9
0 .8
0 .8
GTRig
0 .7
GTRig
0 .7
0 .6
HKYig
0 .6
HKYig
GTRg
0 .6
0 .5
0 .2
0 .3
HKYer
p arsimo n y
0 .1
0
100
1 0 0 00
1 0 0 0 00
0 .2
1000
1 0 0 00
1 0 0 0 00
0 .3
HKYer
p arsimo n y
0 .1
100
HKYer
p arsimo n y
0 .4
0
1000
GTRg
HKYg
GTRi
HKYi
GTRer
0 .5
HKYg
GTRi
HKYi
GTRer
0 .4
HKYi
GTRer
GTRig
HKYig
0 .8
0 .7
0 .3
1 0 0 0 00
0
1000
1
0 .4
1 0 0 00
0 .2
1
0 .5
1000
0 .4
1
GTRg
HKYg
GTRi
parsim ony
0 .5
0
0
HK Yer
0 .8
0 .7
0 .2
GRTer
0
1
0 .3
HK Yi
0 .2
1
0 .4
GTRig
0 .6
GTRg
HKTg
0 .5
GTRer
HKYer
Parsimo n y
0 .7
GTRig
HKYig
0 .6
GTRg
HKYg
GTRi
HKYi
0 .5
0 .8
0 .7
GTRig
HKYig
0 .6
Proportion Correct
a = 1.0, pinv=0.2
1000
1 0 0 00
Sequence Leng t h
1 0 0 0 00
0 .2
0 .1
0
100
1000
1 0 0 00
1 0 0 0 00
“MODERATE”–Felsenstein zone
 = 1.0, p
inv =0.5
1
0.9
0.8
JCer
0.7
JC+G
0.6
JC+I
JC+I+G
0.5
GTRer
GTR+G
0.4
GTR+I
0.3
GTR+I+G
parsimony
0.2
0.1
0
100
1000
10000
100000
“MODERATE”–InverseFelsenstein zone
1
0.9
0.8
JCer
0.7
JC+G
0.6
JC+I
JC+I+G
0.5
GTRer
GTR+G
0.4
GTR+I
0.3
GTR+I+G
parsimony
0.2
0.1
0
100
1000
10000
100000
Bayesian Inference in Phylogenetics
• Uses Bayes formula:
Pr(|D) = Pr(D|) Pr()
Pr(D)
 Pr(D|) Pr()
 L() Pr()
( =tree topology,
branch-lengths, and
substitution-model
parameters)
• Calculation involves integrating over all tree
topologies and model-parameter values,
subject to assumed prior distribution on
parameters
Bayesian Inference in Phylogenetics
• To approximate this posterior density (complicated
multidimensional integral) we use Markov chain Monte Carlo
(MCMC)
– Simulated Markov chain in which transition probabilities are
assigned such that the stationary distribution of the chain is
the posterior density of interest
– E.g., Metropolis-Hastings algorithm: Accept a proposed
move from one state  to another state * with probability
min(r,1) where
r = Pr(*|D) Pr(| *)
Pr(|D) Pr(*| )
– Sample chain at regular intervals to approximate posterior
distribution
• MrBayes (by John Huelsenbeck and Fredrik Ronquist) is most
popular Bayesian inference program
A brief intro to Markov chain Monte Carlo (MCMC)
B
“burn in”
C
B
Likelihood
C
A
D
B
A
C
B
D
B
D
A
B
D
AB|CD
D
C
C
B
B
A
C
D
A
A
B
C
B
A
D
A
C
...
D
A
B
A
A
C
AB|CD
C
C
D
AB|CD
D
D
AC|BD
Iterations
1. Initialize the chain, e.g., by picking a random state X0 (topology,branch lengths, substitution-model
parameters) from the assumed prior distribution
2. For each time t, sample a new candidate state Y from some proposal distribution q(.|Xt) (e.g.,
change branch lengths or topology plus branch lengths)
 PrY | Dq(X |Y) 
  (Y) Pr(D |Y) q(X |Y) 

(X,Y)

min
1,

min




Calculate acceptance probability
1,

Pr
X
|
D
q(Y
|
X)

(X)
Pr(D
|
X)
q(X |Y) 


 

3. If Y is accepted, let Xt+1 = Y; otherwise let Xt+1 = Xt
If the chain is run “long enough”, the stationary distribution of states in the chain will represent a

good approximation to the target
distribution (in this case, the Bayesian posterior)
Model-based distances
• Can also calculate pairwise distances based on these models
• These distances estimate the number of substitutions per site
that have accumulated since the two sequences shared a
common ancestor, allowing for superimposed substitutions
(“multiple hits”)
• E.g.:
– Jukes-Cantor distance
– Kimura 2-parameter distance
– General maximum-likelihood distances available for other
models

Distance-based optimality criteria
1
2
3
4

d12

a
d13
d23

d14
d24
d34

1
2
3
4
p12 = a+b
p13 = a+c+d
p14 = a+c+e
p23 = b+c+d
p24 = b+c+e
p34 = d+e
3
1
b
2
c
d
e
4
pij = dij for all i and j if the tree
topology is correct and distances
Distances in gene ral will not be add itive, so
choose opt imal tree acco rding to one of the
following c riteria (object ive funct ions ):
&quot;Goodness - of - fit&quot; : minimize
wij

i&lt;j
pij  dij
r
Typicall y, r = 2 (l east-squares) and wij = 1/dij2 (&quot;FitchMargoliash&quot; metho d)
#branches
&quot;Minimum - evolution&quot; : minimize

k =1
# branches
vk
or

k =1
vk
Distance-based optimality criteria
Minimum evolution and least-squares
Homo sapiens
Pan
0.045
0.050
Minumum
evolution
(ME)
0.015
0.050
0.044
0.286
0.085
LS branch lengths
Lemur catta
0.28611
0.04436
0.01511
0.04463
0.05044
0.05038
0.08485
0.57588
Least-Squares
Gorilla
Pongo
dij
pij
SS
0.39646
0.39838
0.09506
0.37222
0.11172
0.11431
0.37096
0.18107
0.19399
0.18820
0.39021
0.39602
0.09507
0.38084
0.11011
0.11592
0.37096
0.18894
0.19475
0.17958
0.000039
0.000006
0.000000
0.000074
0.000003
0.000003
0.000000
0.000062
0.000001
0.000074
0.000261
```